Data items processed by computers form a data hierarchy that becomes larger and more complex in structure as we progress from the simplest data items (called “bits”) to richer ones, such as characters and fields. The following diagram illustrates a portion of the data hierarchy:
A bit (short for “binary digit,” a digit that can assume one of two values) is the smallest data item in a computer. It can have the value 0 or 1. Remarkably, the impressive functions performed by computers involve only the simplest manipulations of 0s and 1s: examining a bit’s value, setting a bit’s value and reversing a bit’s value (from 1 to 0 or from 0 to 1). Bits form the basis of the binary number system, which you can study in depth in our online “Number Systems” appendix.
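The three bit manipulations named above can be sketched with Python’s bitwise operators. This is a minimal illustration that assumes bits are numbered from 0 at the rightmost (least significant) position. The helper names `get_bit`, `set_bit` and `flip_bit` are hypothetical and not part of any library.

```python
def get_bit(n: int, i: int) -> int:
    """Examine bit i of n (bit 0 is the rightmost); returns 0 or 1."""
    return (n >> i) & 1

def set_bit(n: int, i: int) -> int:
    """Return n with bit i set to 1."""
    return n | (1 << i)

def flip_bit(n: int, i: int) -> int:
    """Return n with bit i reversed (0 becomes 1, 1 becomes 0)."""
    return n ^ (1 << i)

value = 0b01010010                        # an 8-bit pattern (decimal 82)
print(format(value, '08b'), value)        # 01010010 82
print(get_bit(value, 1))                  # 1
print(format(set_bit(value, 0), '08b'))   # 01010011
print(format(flip_bit(value, 6), '08b'))  # 00010010
```

Because `format(n, '08b')` displays an integer as a string of 0s and 1s, the example also shows how bits underlie the binary number system: `0b01010010` and `82` are the same value.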
Working with data in the low-level form of bits is tedious. Instead, people prefer to work with decimal digits (0–9), letters (A–Z and a–z) and special symbols such as $ @ % & * ( ) - + " : ; , ? and /. Digits, letters and special symbols are known as characters. The computer’s character set contains the characters used to write programs and represent data items. Computers process only 1s and 0s, so a computer’s character set represents every character as a pattern of 1s and 0s. Python uses Unicode® characters, commonly stored in the UTF-8 encoding, which represents each character with one, two, three or four bytes (8, 16, 24 or 32 bits, respectively). Unicode contains characters for many of the world’s languages. The ASCII (American Standard Code for Information Interchange) character set is a subset of Unicode that represents letters (a–z and A–Z), digits and some common special characters. You can view the ASCII subset of Unicode at
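To see characters as bit patterns, the sketch below encodes a few sample characters with Python’s built-in `str.encode('utf-8')` and prints each resulting byte as eight bits. The helper name `utf8_bits` and the choice of sample characters are illustrative assumptions; the characters are written as escape sequences so the source stays plain ASCII.

```python
def utf8_bits(ch: str) -> list[str]:
    """Return the 8-bit patterns that UTF-8 uses to store the character ch."""
    return [format(byte, '08b') for byte in ch.encode('utf-8')]

# Sample characters: A, e-acute, euro sign, grinning-face emoji.
for ch in ['A', '\u00e9', '\u20ac', '\U0001F600']:
    patterns = utf8_bits(ch)
    print(f'U+{ord(ch):04X}: {len(patterns)} byte(s) -> {" ".join(patterns)}')

# Every ASCII character (code points 0-127) fits in a single byte.
print(all(len(utf8_bits(chr(cp))) == 1 for cp in range(128)))  # True
```

The four samples need one, two, three and four bytes respectively, matching the range of sizes UTF-8 allows, while the ASCII subset always needs just one.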
The Unicode charts for all languages, symbols, emojis and more are viewable at
Just as characters are composed of bits, fields are composed of characters or bytes. A field is a group of characters or bytes that conveys meaning. For example, a field consisting of uppercase and lowercase letters can be used to represent a person’s name, and a field consisting of decimal digits could represent a person’s age.
Several related fields can be used to compose a record. In a payroll system, for example, the record for an employee might consist of the following fields (possible types for these fields are shown in parentheses):
• Employee identification number (a whole number).
• Name (a string of characters).
• Address (a string of characters).
• Hourly pay rate (a number with a decimal point).
• Year-to-date earnings (a number with a decimal point).
• Amount of taxes withheld (a number with a decimal point).
Thus, a record is a group of related fields. All the fields listed above belong to the same employee. A company might have many employees and a payroll record for each.
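A record maps naturally onto a Python class whose attributes are its fields. The sketch below uses a standard-library `dataclass` for the payroll record described above; the class name `EmployeeRecord`, the attribute names and the sample values are hypothetical. It uses `Decimal` rather than `float` for the money fields, an assumption made because binary floating point cannot represent many decimal fractions exactly.

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass
class EmployeeRecord:
    """One payroll record: a group of related fields for a single employee."""
    employee_id: int           # whole number
    name: str                  # string of characters
    address: str               # string of characters
    hourly_pay_rate: Decimal   # number with a decimal point
    ytd_earnings: Decimal      # number with a decimal point
    taxes_withheld: Decimal    # number with a decimal point

# Hypothetical payroll data: one record per employee.
payroll = [
    EmployeeRecord(1001, 'Ana Silva', '12 Elm St.', Decimal('25.50'),
                   Decimal('30600.00'), Decimal('6120.00')),
    EmployeeRecord(1002, 'Raj Patel', '9 Oak Ave.', Decimal('31.75'),
                   Decimal('38100.00'), Decimal('7620.00')),
]

for record in payroll:
    print(record.employee_id, record.name, record.hourly_pay_rate)
```

Each `EmployeeRecord` object groups the six fields of one employee, and the `payroll` list plays the role of the company’s collection of records.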
A file is a group of related records. More generally, a file contains arbitrary data in arbitrary formats. In some operating systems, a file is viewed simply as a sequence of bytes—any organization of the bytes in a file, such as organizing the data into records, is a view created by the application programmer. It’s not unusual for an organization to have many files, some containing billions, or even trillions, of characters of information.
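To illustrate that a file’s record structure is a view created by the program, the sketch below writes two hypothetical payroll records with Python’s standard `csv` module (comma-separated values is just one possible convention, chosen here as an assumption), then reads the same file twice: once as raw bytes and once parsed back into records and fields. The file name `payroll.csv` is illustrative and lives in a temporary directory.

```python
import csv
import os
import tempfile

# Hypothetical sample records: (employee ID, name, hourly pay rate).
rows = [(1001, 'Ana Silva', '25.50'), (1002, 'Raj Patel', '31.75')]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'payroll.csv')

    # Write one line of comma-separated fields per record.
    with open(path, 'w', newline='', encoding='utf-8') as f:
        csv.writer(f).writerows(rows)

    # Read the file back in binary mode: the OS just returns a sequence of bytes.
    with open(path, 'rb') as f:
        raw = f.read()

    # The application imposes the record/field view by parsing those bytes.
    with open(path, newline='', encoding='utf-8') as f:
        records = list(csv.reader(f))

print(raw)      # b'1001,Ana Silva,25.50\r\n1002,Raj Patel,31.75\r\n'
print(records)  # [['1001', 'Ana Silva', '25.50'], ['1002', 'Raj Patel', '31.75']]
```

Every parsed field comes back as a string, so converting `'1001'` to an integer or `'25.50'` to a number is again the application’s job.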
A database is a collection of data organized for easy access and manipulation. The most popular model is the relational database, in which data is stored in simple tables. A table includes records and fields. For example, a table of students might include first name, last name, major, year, student ID number and grade-point-average fields. The data for each student is a record, and the individual pieces of information in each record are the fields.
You can search, sort and otherwise manipulate the data, based on its relationship to multiple tables or databases. For example, a university might use data from the student database in combination with data from databases of courses, on-campus housing, meal plans, etc.
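These relational ideas can be tried with Python’s built-in `sqlite3` module, which provides a small relational database. This sketch assumes an in-memory database and two hypothetical tables, `students` and `housing`, filled with made-up rows. It searches and sorts one table, then combines the two through their shared `student_id` field with a SQL `JOIN`.

```python
import sqlite3

conn = sqlite3.connect(':memory:')   # throwaway in-memory database
conn.execute('''CREATE TABLE students (
    student_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT,
    major TEXT, year INTEGER, gpa REAL)''')
conn.execute('CREATE TABLE housing (student_id INTEGER, residence_hall TEXT)')

# Each tuple is one record; each position in it is one field.
conn.executemany('INSERT INTO students VALUES (?, ?, ?, ?, ?, ?)', [
    (1, 'Ana', 'Silva', 'Physics', 2, 3.8),
    (2, 'Raj', 'Patel', 'History', 3, 3.4),
    (3, 'Mei', 'Chen', 'Physics', 1, 3.9),
])
conn.executemany('INSERT INTO housing VALUES (?, ?)',
                 [(1, 'North Hall'), (3, 'West Hall')])

# Search and sort: Physics majors, highest GPA first.
physics = conn.execute(
    "SELECT first_name, gpa FROM students WHERE major = 'Physics' ORDER BY gpa DESC"
).fetchall()
print(physics)  # [('Mei', 3.9), ('Ana', 3.8)]

# Combine two tables through their shared student_id field.
housed = conn.execute('''SELECT s.first_name, h.residence_hall
    FROM students AS s JOIN housing AS h ON s.student_id = h.student_id
    ORDER BY s.last_name''').fetchall()
print(housed)   # [('Mei', 'West Hall'), ('Ana', 'North Hall')]
conn.close()
```

Because `JOIN` here is an inner join, Raj, who has no housing row, does not appear in the combined result.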
The table below shows some common byte measurements:
| Unit | Equals | Which is approximately |
|---|---|---|
| 1 kilobyte (KB) | 1024 bytes | 10³ bytes (1024 bytes exactly) |
| 1 megabyte (MB) | 1024 kilobytes | 10⁶ (1,000,000) bytes |
| 1 gigabyte (GB) | 1024 megabytes | 10⁹ (1,000,000,000) bytes |
| 1 terabyte (TB) | 1024 gigabytes | 10¹² (1,000,000,000,000) bytes |
| 1 petabyte (PB) | 1024 terabytes | 10¹⁵ (1,000,000,000,000,000) bytes |
| 1 exabyte (EB) | 1024 petabytes | 10¹⁸ (1,000,000,000,000,000,000) bytes |
| 1 zettabyte (ZB) | 1024 exabytes | 10²¹ (1,000,000,000,000,000,000,000) bytes |
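The gap between each unit in the table and its power-of-ten approximation can be checked with Python’s arbitrary-precision integers. The helper name `binary_vs_decimal` is hypothetical; it returns the exact byte count (1024 raised to a power) alongside the corresponding power of ten.

```python
UNITS = ['kilobyte', 'megabyte', 'gigabyte', 'terabyte',
         'petabyte', 'exabyte', 'zettabyte']

def binary_vs_decimal(power: int) -> tuple[int, int]:
    """Return (exact bytes, decimal approximation) for the unit 1024**power."""
    return 1024 ** power, 10 ** (3 * power)

for power, unit in enumerate(UNITS, start=1):
    exact, approx = binary_vs_decimal(power)
    excess = (exact - approx) / approx * 100  # how much larger the exact value is, in percent
    print(f'1 {unit:<9} = {exact:>29,} bytes, about 10^{3 * power} ({excess:.1f}% more)')
```

The approximation drifts as the units grow: a kilobyte is 2.4% larger than 10³ bytes, while a zettabyte is about 18.1% larger than 10²¹ bytes.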
The amount of data being produced worldwide is enormous and its growth is accelerating.
Big data applications deal with massive amounts of data. This field is growing quickly, creating many opportunities for software developers. Millions of IT jobs globally already support big data applications.