Regarding Data
We defined programs as a "set of instructions" in the previous section. What do those instructions actually do?
When we think of programs, there is a wide range of possibilities. A game is as much a program as a search engine. A sleep monitoring app is as much a program as a dating website. These are all programs despite how different they are. Perhaps in a dystopian future, all four of these are the same app.
But no matter how different one program may seem from another, all programs share a common goal: they move data around.
This is true regardless of the type of program. A search engine is a program that takes an arbitrary prompt from a human and returns useful links and answers. A sleep monitoring app takes numerical measurements from your body and makes inferences about your quality of sleep. A game takes inputs from a user to influence a simulation. When that simulation reaches some end state, the player is notified whether they won or lost.
All of these processes involve taking data from somewhere, performing some sort of calculation on data, combining it with other data, or moving that data somewhere else.
A programmer does two things when writing code: expressing data and processing data. As you learn the fundamentals of writing code, everything will fall into one of these two categories.
As an analogy, learning to speak and write in a language fundamentally requires you to use nouns and verbs. As you learn more advanced grammar and vocabulary, you can express more complicated thoughts, but at the core of each sentence there is still a subject and an action. The additional grammar and vocabulary allow you to describe what is happening to these nouns and verbs more expressively and with nuance, but nouns and verbs remain the fundamental components of the sentence. When you become fluent in a language, you don't consciously think about this grammar and the sentences flow naturally. When you're just beginning, though, it is helpful to identify this fundamental grammar.
Programming is no different.
Expressing Data
Expressing data is learning the nouns of a programming language. Data falls into two categories: primitive data and complex data.
Primitive Data Types
Primitive data is data that is so fundamental that it is not useful to break it down further.
There are 4 main categories of data that can be considered fundamental: Numbers, Strings, Booleans, and Null.
Numbers are pure numeric quantities. 4 is a number. -38.2 is a number. If a weather app knows that the temperature is 40 degrees, it would generally represent this as the number 40. The fact that the number measures degrees, and whether those degrees are Celsius or Fahrenheit, is something the code around it has to keep track of. The thing about fundamental data types is that they are truly fundamental. They lack units. They lack context. It is the program and the way it uses the data that gives it context.
A string is a term that programmers use when what they really mean to say is just "text". A string is a collection of characters that has some purpose. A blog post. A blog comment. This sentence you're reading right now. Your Facebook password. These are not numeric quantities. They are collections of literal characters. It is called a string because it's a "string of characters".
A boolean is a fancy term that means something is "true" or "false". Is a car in a car rental database an automatic or manual transmission? Is your social media account suspended? Is your operating system currently set to dark mode? The data that represents the answers to these yes/no questions is probably a boolean. One could alternatively represent a boolean using the numbers 1 and 0, or even the strings "yes" and "no". Sometimes programmers actually do this. But generally it's more useful to treat booleans as a separate category.
A null represents the intentional absence of data. What is someone's nickname in your contact list when they don't have one? Who is the manager of the CEO of a company? Neither question has an answer, and null is how a program records that.
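As a small preview, here is how the four primitive categories might be written down in one particular language. Python is used purely for illustration, and every name and value below is made up for this sketch. Other languages spell these ideas slightly differently; Python's word for null, for example, is `None`.

```python
# Numbers: pure quantities with no units attached.
temperature = 40      # Degrees? Celsius? The number itself does not say.
reading = -38.2

# Strings: text, written between quotes.
comment = "This sentence you're reading right now."

# Booleans: exactly one of two values, True or False.
dark_mode_enabled = True

# Null: Python spells it None. It marks an intentional absence of data.
nickname = None

# The surrounding code is what supplies context, such as the unit.
temperature_label = str(temperature) + " degrees Celsius"
print(temperature_label)  # prints "40 degrees Celsius"
```

Notice that nothing about `temperature` says "Celsius". That meaning only appears when the program attaches a label to the number.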
Using these 4 fundamental data types, any type of data can be expressed. However, sometimes ideas are more complex than a single number or a bit of text. These more complex ideas must be expressed as a combination of multiple pieces of data.
Complex Data
There are two fundamental ways to combine data to express more complex ideas: a structure and a collection.
When multiple pieces of data are gathered to represent a single complex idea, this is a structure. For example, a program may think of a person as a username, a password, an email address, and whether or not that person is online right now. This complex idea of a person includes 3 strings and a boolean. We combine these 4 pieces of data together to represent a person in a program.
The other way of combining data is a collection, which means you have multiple pieces of the same kind of data. This could be a list of tasks represented as strings, a receipt with a list of items purchased, or a lookup table of area codes and the cities they belong to. These are all collections of data.
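The person structure and the two kinds of collection described above could be sketched like this. This is only one possible way to write them, again in Python for illustration. The `Person` class, its field names, and all of the sample values are hypothetical.

```python
from dataclasses import dataclass


# A structure: different pieces of data gathered into one idea.
@dataclass
class Person:
    username: str    # string
    password: str    # string
    email: str       # string
    is_online: bool  # boolean


person = Person("ada", "correct horse", "ada@example.com", True)

# A collection: many pieces of the same kind of data (here, strings).
tasks = ["buy milk", "walk the dog", "pay rent"]

# A lookup table is also a collection: it pairs each area code with a city.
area_codes = {"212": "New York", "312": "Chicago", "415": "San Francisco"}

print(person.username)    # prints "ada"
print(len(tasks))         # prints 3
print(area_codes["312"])  # prints "Chicago"
```

The structure has a fixed set of named parts, while the collections can grow or shrink as tasks or area codes are added and removed.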
Are there counterexamples?
No.
You might be thinking that media is a counterexample. Images, sounds, video, animated 3D objects: are these made from fundamental data types? They actually are, despite their apparent uniqueness.
An image is a collection of pixels. A pixel is a color. A color is a structure that can be represented by a series of 3 numbers describing its red, green, and blue light intensity. An image is just a bunch of numbers.
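To make this concrete, here is a rough sketch of a tiny 2-by-2 image built from nothing but numbers. It assumes the common convention that each intensity runs from 0 to 255; the variable names are invented for this example, and real image formats store their numbers in more elaborate ways.

```python
# A color is a structure of three numbers: red, green, and blue intensity.
red = (255, 0, 0)
white = (255, 255, 255)

# An image is a collection of rows, and each row is a collection of pixels.
image = [
    [red, white],
    [white, red],
]

# Unpacking every pixel shows that the whole image is just numbers.
flat = [intensity for row in image for pixel in row for intensity in pixel]
print(flat)
# prints [255, 0, 0, 255, 255, 255, 255, 255, 255, 255, 0, 0]
```

Four pixels with three numbers each gives twelve numbers in total.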
A sound is harder to describe without diving into a long distracting technical discussion, but you'll just have to trust me that it boils down to a collection of numbers.
A video is a structure that is a sound, a collection of images, and a rate at which the images should be displayed in quick succession. These are all numbers.
A 3D object is a collection of coordinates, and colors or textures. Coordinates are numbers. A texture is an image. Images are numbers. A color is numbers. Again, we've boiled this down to numbers.
In fact, we can further argue that strings themselves are collections of characters, and each character is assigned a specific ID number by a character encoding standard such as Unicode. When a string needs to be shown on a screen, a font is just a lookup table (a type of collection) that converts character ID numbers to a visual representation. So in essence, a string is also a list of numbers. This distinction is not useful yet, so for now we will treat strings as a fundamental data type.
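You can see these character ID numbers for yourself. The sketch below uses Python's built-in `ord` and `chr` functions, which convert a character to its number and back again. The variable names are just for illustration.

```python
text = "Hi!"

# Look up the ID number of each character in the string.
code_points = [ord(ch) for ch in text]
print(code_points)  # prints [72, 105, 33]

# Going the other way turns the numbers back into the original text.
rebuilt = "".join(chr(n) for n in code_points)
print(rebuilt)      # prints "Hi!"
```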
But it does bring up an important point: fundamentally everything the computer does is move numbers around. We don't see these numbers when we look at an image on a screen. That is because a number in a computer is pure and lacks context other than the context that is given to it by how the program treats it. A program, for example, can take a list of numbers and interpret it as an image that is displayed on a screen.
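Here is a brief sketch of that idea: the same three numbers treated in two different ways. Nothing about the numbers changes; only the program's interpretation does. The names are hypothetical, and the example assumes the text is plain ASCII and that a color is three intensities from 0 to 255.

```python
numbers = [67, 97, 116]

# Interpretation 1: treat each number as a character ID and read it as text.
as_text = bytes(numbers).decode("ascii")
print(as_text)   # prints "Cat"

# Interpretation 2: treat the numbers as red, green, and blue intensities.
as_color = tuple(numbers)
print(as_color)  # prints (67, 97, 116)
```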
Are we talking about a specific programming language right now?
We're actually talking about all programming languages.
What has been expressed above is the abstract idea of the "nouns of programming". All spoken languages have nouns in the same way that all programming languages have the same concept of fundamental data types and combining data to express collections or structures. It doesn't matter if you are learning JavaScript, C++, or Python. You will generally see this same information in the first few chapters of any tutorial for any programming language with slight variances in nuance. The terms "primitives", "strings", "booleans", etc. are not simply lingo associated with a single programming language. These terms will recur in almost all programming languages.
On the other hand, as we begin to discuss the "verbs of programming", the ability to speak in abstract terms becomes much more difficult because programming languages process data very differently. As such, in the next section we will finally look at a specific programming language.
Summary
- All programs process data in some fashion, no matter what kind of program it is.
- When programmers write code, they are either expressing data or processing data. These are the nouns and verbs of programming.
- Data is expressed using primitive (fundamental) data types or complex data.
- Primitive data generally has 4 categories: numbers, strings, booleans, and null.
- Complex data has 2 categories: collections and structures.
- Structures are formed when a programmer combines multiple pieces of data to express a more complicated entity.
- Collections are formed when a programmer combines data of the same type to express that there are multiple entities of that type.
- Programming languages all express data in this same fundamental way.
- No matter how complex data is, it can always be boiled down to fundamental data types using the idea of collections and structures. This includes complex media like images and sound.
An exercise
As a thought exercise, think about how things in your day-to-day life can be expressed as complex combinations of fundamental data: the nutritional value of the food you eat, the weather forecast for the next 10 days, and so on. This ability to break complex ideas down into fundamental components is the first step to designing any program.
The terms introduced in this section are listed in the table below.
primitive type | A fundamental piece of data that cannot be productively broken down into smaller more fundamental types. |
complex type | A piece of data that is created by combining multiple pieces of data. |
string | A fundamental data type that represents some text. |
boolean | A fundamental data type that represents the state of being "true" or "false". |
null | A fundamental data type that represents the intentional absence of a value. |
collection | A complex piece of data that is created by combining multiple items of the same type of data to represent the existence of multiple entities. |
structure | A complex piece of data that is created by combining multiple pieces of data to represent a single more complex idea. |