Byte

From wiki.gis.com

A byte (pronounced /ˈbaɪt/) is a basic unit of measurement of information storage in computer science. In many computer architectures it is a unit of memory addressing. There is no standard[citation needed], but a byte most often consists of eight bits.

A byte is an ordered collection of bits, with each bit denoting a single binary value of 1 or 0. The byte most often consists of 8 bits in modern systems; however, the size of a byte can vary and is generally determined by the underlying computer operating system or hardware. Historically, byte size was determined by the number of bits required to represent a single character from a Western character set. Its size was generally determined by the number of possible characters in the supported character set and was chosen to be a divisor of the computer's word size.
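The historical sizing rule described above can be sketched in Python. The function names `bits_for_charset` and `byte_size_for` are hypothetical. Picking the smallest divisor of the word size that is wide enough is a simplifying assumption for illustration, not a documented design procedure.

```python
def bits_for_charset(num_chars: int) -> int:
    """Smallest number of bits that gives each character a distinct code."""
    if num_chars < 1:
        raise ValueError("character set must not be empty")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1.
    return max(1, (num_chars - 1).bit_length())


def byte_size_for(num_chars: int, word_bits: int) -> int:
    """Smallest divisor of word_bits wide enough for the set (assumed rule)."""
    needed = bits_for_charset(num_chars)
    for size in range(needed, word_bits + 1):
        if word_bits % size == 0:
            return size
    raise ValueError("character set does not fit in one word")
```

Under these assumptions, a 36-bit word with a 64-character set gives `byte_size_for(64, 36) == 6`, and a 128-character set gives `byte_size_for(128, 36) == 9`, because seven and eight do not divide 36 evenly.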

The popularity of IBM's System/360 architecture starting in the 1960s and the explosion of microcomputers based on 8-bit microprocessors in the 1980s have made eight bits by far the most common size for a byte. The term octet is widely used as a more precise synonym where ambiguity is undesirable (for example, in protocol definitions).

Usage

A byte often designates a contiguous sequence of a fixed number of bits (binary digits). The use of a byte to mean 8 bits has become ubiquitous.

When used to describe hardware aspects of a binary computer, it is a contiguous sequence of bits that comprises the smallest addressable sub-field of the computer's natural word size, that is, the smallest unit of binary data on which meaningful computation can be applied. For example, the CDC 6000 series scientific mainframes divided their 60-bit floating-point words into 10 six-bit bytes. These bytes conveniently held character data from punched Hollerith cards, typically the upper-case alphabet and decimal digits. CDC also often referred to 12-bit quantities as bytes, each holding two 6-bit display code characters, due to the 12-bit I/O architecture of the machine. The PDP-10 used the assembly instructions LDB and DPB to load and deposit bytes of any width from 1 to 36 bits; these operations survive today in Common Lisp. Bytes of six, seven, or nine bits were used on some computers, for example within the 36-bit word of the PDP-10. The UNIVAC 1100/2200 series computers (now Unisys) addressed memory in both 6-bit (Fieldata) and 9-bit (ASCII) modes within their 36-bit word.
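Loading and depositing variable-width bytes within a word comes down to shifting and masking. The following Python sketch illustrates the idea; the function names borrow the LDB and DPB mnemonics, but the argument order and the choice to count `position` from the least significant bit are assumptions made for this example, not a description of the PDP-10 instruction format.

```python
WORD_BITS = 36  # word size of the PDP-10, as stated above


def ldb(word: int, size: int, position: int) -> int:
    """Load a byte: return `size` bits of `word`, starting `position` bits above the low end."""
    mask = (1 << size) - 1
    return (word >> position) & mask


def dpb(value: int, word: int, size: int, position: int) -> int:
    """Deposit a byte: return `word` with the selected field replaced by `value`."""
    mask = (1 << size) - 1
    cleared = word & ~(mask << position)
    # Truncate the result to the word size so the model stays within 36 bits.
    return (cleared | ((value & mask) << position)) & ((1 << WORD_BITS) - 1)


# Pack six 6-bit character codes into one 36-bit word, first code in the high bits.
codes = [1, 2, 3, 4, 5, 6]
word = 0
for i, code in enumerate(codes):
    word = dpb(code, word, 6, WORD_BITS - 6 * (i + 1))

# Read the six fields back out in the same order.
unpacked = [ldb(word, 6, WORD_BITS - 6 * (i + 1)) for i in range(6)]
```

Because each six-bit field corresponds to two octal digits, the packed word here equals `0o010203040506`, and `unpacked` matches `codes`.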

History

The term byte was coined by Dr. Werner Buchholz in July 1956, during the early design phase for the IBM Stretch computer.[1][2][3] Originally it was defined in instructions by a 4-bit byte-size field, allowing from one to sixteen bits (the production design reduced this to a 3-bit byte-size field, allowing from one to eight bits to be represented by a byte); typical I/O equipment of the period used six-bit bytes. A fixed eight-bit byte size was later adopted and promulgated as a standard by the System/360. The term "byte" comes from "bite," as in the smallest amount of data a computer could "bite" at once. The spelling change not only reduced the chance of a "bite" being mistaken for a "bit," but also was consistent with the penchant of early computer scientists to make up words and change spellings. A byte was also often referred to as "an 8-bit byte", reinforcing the notion that it was a tuple of n bits, and that other sizes were possible.

  1. A contiguous sequence of binary bits in a serial data stream, such as in modem or satellite communications, or from a disk-drive head, which is the smallest meaningful unit of data. These bytes might include start bits, stop bits, or parity bits, and thus could vary from 7 to 12 bits to contain a single 7-bit ASCII code.
  2. A data type in certain programming languages. The C and C++ programming languages, for example, define byte as "addressable unit of data large enough to hold any member of the basic character set of the execution environment" (clause 3.6 of the C standard). Since the C char integral data type must contain at least 8 bits (clause 5.2.4.2.1), a byte in C is capable of holding at least 256 different values (whether char is signed or unsigned does not matter). Various implementations of C and C++ define a "byte" as 8, 9, 16, 32, or 36 bits[4][5]. The actual number of bits in a particular implementation is given by the macro CHAR_BIT, defined in the header limits.h. Java's primitive byte data type is always defined as consisting of 8 bits and being a signed data type, holding values from −128 to 127.
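The difference between a signed and an unsigned 8-bit byte is only one of interpretation: the same 256 bit patterns are read either as 0 to 255 or, in two's complement, as −128 to 127. A minimal Python sketch, with hypothetical helper names `as_unsigned` and `as_signed`:

```python
def as_unsigned(bits: int) -> int:
    """Interpret the low 8 bits as an unsigned value in 0..255."""
    return bits & 0xFF


def as_signed(bits: int) -> int:
    """Interpret the low 8 bits as a two's-complement value in -128..127."""
    value = bits & 0xFF
    return value - 256 if value >= 128 else value


# Every one of the 256 patterns maps to a distinct value under either reading.
signed_values = {as_signed(pattern) for pattern in range(256)}
unsigned_values = {as_unsigned(pattern) for pattern in range(256)}
```

For example, the pattern `0xFF` reads as 255 unsigned and as −1 signed, while `0x80` reads as 128 unsigned and as −128 signed.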

Early microprocessors, such as the Intel 8008 (the direct predecessor of the 8080, and then the 8086), could perform a small number of operations on four bits; features such as the DAA (decimal adjust) instruction and the "half carry" flag were used to implement decimal arithmetic routines. These four-bit quantities were called "nybbles," in homage to the then-common 8-bit "bytes."
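To illustrate why four-bit quantities matter for decimal arithmetic, the Python sketch below splits a byte into its two nibbles and adds two packed decimal values (one decimal digit per nibble). It does not emulate the DAA instruction or any particular processor's flags; the names `split_nibbles` and `bcd_add` are hypothetical, and the code only shows the digit-by-digit arithmetic that such features support.

```python
def split_nibbles(byte: int) -> tuple[int, int]:
    """Return the (high, low) four-bit halves of an 8-bit value."""
    return (byte >> 4) & 0xF, byte & 0xF


def bcd_add(a: int, b: int) -> tuple[int, int]:
    """Add two packed-decimal bytes (two digits each); return (result, carry_out)."""
    a_hi, a_lo = split_nibbles(a)
    b_hi, b_lo = split_nibbles(b)

    low = a_lo + b_lo
    digit_carry = 1 if low > 9 else 0  # carry out of the low decimal digit
    low -= 10 * digit_carry

    high = a_hi + b_hi + digit_carry
    carry_out = 1 if high > 9 else 0   # carry out of the high decimal digit
    high -= 10 * carry_out

    return (high << 4) | low, carry_out
```

With this sketch, `bcd_add(0x38, 0x45)` returns `(0x83, 0)`, matching 38 + 45 = 83 in decimal.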

Historical IETF documents cite varying examples of byte sizes. RFC 608 mentions byte sizes for FTP hosts (the FTP-BYTE-SIZE attribute in host tables for the ARPANET) to be 36 bits for PDP-10 computers and 32 bits for IBM 360 systems.[6]

Unit symbol or abbreviation

IEEE 1541 and Metric-Interchange-Format specify "B" as the symbol for byte (e.g. MB means megabyte), while IEC 60027 seems silent on the subject. However, B also means bel (see decibel), another (logarithmic) unit used in the same field. The use of B to stand for bel is consistent with the metric system convention that capitalized symbols are for units named after a person (in this case Alexander Graham Bell); usage of a capital B to stand for byte is not consistent with this convention. There is little danger of confusing a byte with a bel because the bel's sub-multiple, the decibel (dB), is usually preferred, while use of the decibyte (dB) is extremely rare.

The unit symbol "KB" is a commonly used abbreviation for "kilobyte" but is often confused with "kb", which means "kilobit". IEEE 1541 specifies "b" as the symbol for bit; however, IEC 60027 and Metric-Interchange-Format specify "bit" (e.g. Mbit for megabit) as the symbol, achieving maximum disambiguation from byte.
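The practical effect of the symbol conventions above can be sketched as a small Python converter that reads a trailing "B" as byte and a trailing "b" or "bit" as bit. The function `to_bits` and its prefix table are hypothetical. The sketch assumes an 8-bit byte and treats all prefixes as decimal multiples, which is a simplification.

```python
BITS_PER_BYTE = 8  # assumption: the common 8-bit byte (octet)

# Assumption for this sketch: prefixes are read as decimal (SI) multiples.
DECIMAL_PREFIXES = {"": 1, "k": 10**3, "K": 10**3, "M": 10**6, "G": 10**9}


def to_bits(value: float, symbol: str) -> float:
    """Convert a quantity such as (1, "KB") or (1, "Mbit") to a number of bits."""
    if symbol.endswith("bit"):
        prefix, unit_bits = symbol[:-3], 1
    elif symbol.endswith("B"):
        prefix, unit_bits = symbol[:-1], BITS_PER_BYTE
    elif symbol.endswith("b"):
        prefix, unit_bits = symbol[:-1], 1
    else:
        raise ValueError(f"unrecognized unit symbol: {symbol!r}")
    return value * DECIMAL_PREFIXES[prefix] * unit_bits
```

Here `to_bits(1, "KB")` gives 8000 while `to_bits(1, "kb")` gives 1000, which is the factor-of-eight discrepancy that confusing the two symbols introduces.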

Lowercase "o" for "octet" is a commonly used symbol in several non-English-speaking countries, and is also used with metric prefixes (for example, "ko" and "Mo").

Today the harmonized ISO/IEC standard IEC 80000-13:2008 (Quantities and units, Part 13: Information science and technology) cancels and replaces subclauses 3.8 and 3.9 of IEC 60027-2:2005 (those related to information theory and prefixes for binary multiples). See Units of information#Byte for a detailed discussion of names for derived units.

Unit multiples

See also: Binary prefixes
[Figure: Percentage difference between the decimal and binary interpretations of the unit prefixes, which grows linearly when plotted against the logarithm of storage size.]

There has been considerable confusion about the meanings of SI (or metric) prefixes used with the word "byte", especially concerning prefixes such as kilo- (k or K) and mega- (M) as shown in the chart Prefixes for bit and byte. Since computer memory is designed with binary logic, memory sizes are usually expressed in multiples that are powers of 2, rather than 10. The software and computer industries often use binary estimates of the SI-prefixed quantities, while producers of computer storage devices prefer the SI values. This is why a hard drive may be specified as "100 GB" when it contains about 93 GiB (or 93 GB in traditional units) of addressable storage. Because of the confusion, a contract specifying a quantity of bytes must define the system of unit interpretation used.[citation needed]
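The gap between the two interpretations is simple arithmetic, sketched below in Python. The function names are hypothetical; the only facts assumed are that the decimal prefixes are powers of 1000 and the binary ones are powers of 1024.

```python
PREFIXES = ["kilo", "mega", "giga", "tera"]


def percent_difference(power: int) -> float:
    """Percentage by which 1024**power exceeds 1000**power."""
    return (1024**power / 1000**power - 1) * 100


def decimal_gb_to_gib(gigabytes: float) -> float:
    """Convert decimal gigabytes (10**9 bytes) to binary gibibytes (2**30 bytes)."""
    return gigabytes * 10**9 / 2**30


if __name__ == "__main__":
    for power, name in enumerate(PREFIXES, start=1):
        print(f"{name}: {percent_difference(power):.1f}% larger in binary")
    print(f"100 GB = {decimal_gb_to_gib(100):.2f} GiB")
```

The difference is roughly 2.4% at kilo, 4.9% at mega, 7.4% at giga and 10.0% at tera, and a drive sold as 100 GB holds about 93.13 GiB.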

See also

  • Bit
  • Octet (computing)
  • Word (computing)
  • Data hierarchy
  • Primitive data type
  • Nibble

References

  1. Origins of the Term "BYTE" Bob Bemer, accessed 2007-08-12
  2. TIMELINE OF THE IBM STRETCH/HARVEST ERA (1956–1961) computerhistory.org, '1956 July ... Werner Buchholz ... Werner's term "Byte" first popularized'
  3. byte catb.org, 'coined by Werner Buchholz in 1956'
  4. [26] Built-in / intrinsic / primitive data types, C++ FAQ Lite
  5. Integer Types In C and C++
  6. Host Names On-Line, M.D. Kudlick, SRI-ARC (January 10, 1974)