Binary file

From wiki.gis.com
A binary file (commonly, but not necessarily, with the extension .bin) is a computer file that may contain any type of data, encoded in binary form for computer storage and processing; an example is a computer document file containing formatted text. Many binary file formats contain parts that can be interpreted as text. Files that contain only textual data (without, for example, any formatting information) are called plain text files, and they are usually treated as distinct from binary files because binary files hold more than just plain text. In the context of downloads, a complete, ready-to-run program distributed without an installer is also often called a program binary, or binaries (as opposed to the source code).

Structure

Binary files are usually thought of as a sequence of bytes, which means the binary digits (bits) are grouped in eights. Binary files typically contain bytes that are intended to be interpreted as something other than text characters. Compiled computer programs are typical examples; indeed, compiled applications (object files) are sometimes referred to, particularly by programmers, as binaries. But binary files can also contain images, sounds, compressed versions of other files, and so on: in short, any type of file content whatsoever.

Some binary files contain headers, blocks of metadata used by a computer program to interpret the data in the file. For example, a GIF file can contain multiple images, and headers are used to identify and describe each block of image data. If a binary file does not contain any headers, it may be called a flat binary file.

Modern computers are digital; that is, all information is stored as a string of zeros and ones (off or on). Everything that a computer does is done by manipulating these digits. The concept is simple, but working out the details gets complicated.
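As an illustration of how a program uses a header to interpret a file, the following minimal Python sketch decodes the first 13 bytes of a GIF: the 6-byte signature and the 7-byte logical screen descriptor. The sample bytes are built by hand rather than read from a real file, and the function name parse_gif_header is an assumption made for this example.

```python
import struct

# Hand-built 13-byte example: the 6-byte signature followed by the
# 7-byte logical screen descriptor (not read from a real file).
header = b"GIF89a" + struct.pack("<HHBBB", 320, 200, 0x80, 0, 0)

def parse_gif_header(data: bytes) -> dict:
    """Decode the signature and logical screen descriptor of a GIF."""
    signature = data[:6]
    if signature not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file")
    # "<HHBBB": little-endian, two unsigned 16-bit integers, three unsigned bytes
    width, height, packed, background, aspect = struct.unpack("<HHBBB", data[6:13])
    return {
        "version": signature[3:].decode("ascii"),
        "width": width,
        "height": height,
        # The top bit of the packed byte says whether a global color table follows.
        "has_global_color_table": bool(packed & 0x80),
    }

info = parse_gif_header(header)
print(info)
```

Without the header, the same program would have no way of knowing how wide the image is or where the pixel data begins.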

1 bit = one on or off position

1 byte = 8 bits

So 1 byte can hold any one of 2^8 = 256 possible combinations of 0 and 1. Numbers written with just 0 and 1 are called binary numbers.

Every input is converted into digital data, a string of zeros and ones.
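The arithmetic above can be checked with a few lines of Python. This is only a small sketch; the sample values (the number 65 and the text "Hi") are arbitrary choices for illustration.

```python
# Number of distinct values one byte (8 bits) can hold.
combinations = 2 ** 8
print(combinations)            # 256

# The decimal number 65 written as 8 binary digits, and back again.
bits = format(65, "08b")
print(bits)                    # 01000001
print(int(bits, 2))            # 65

# Text input becomes bytes, and each byte is a pattern of eight bits.
encoded = "Hi".encode("ascii")
bit_string = " ".join(format(b, "08b") for b in encoded)
print(bit_string)              # 01001000 01101001
```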

Manipulation

Binary files that must pass through systems that do not allow all data values (such as e-mail) are often translated into a plain text representation (using, for example, Base64). This encoding has the disadvantage of increasing the file's size by approximately 33% during the transfer, because Base64 represents every 3 bytes as 4 text characters, as well as requiring translation back into binary after receipt. See Binary-to-text encoding for more on this subject.
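The round trip and the size growth can be sketched with Python's standard base64 module. The 3000 bytes of random data are an arbitrary stand-in for a real file.

```python
import base64
import os

raw = os.urandom(3000)                 # arbitrary binary data; any byte value may occur
encoded = base64.b64encode(raw)        # contains only ASCII characters
decoded = base64.b64decode(encoded)    # translation back into binary after receipt

print(len(raw), len(encoded))          # 3000 4000
overhead = len(encoded) / len(raw) - 1
print(f"size increase: {overhead:.0%}")
print(decoded == raw)                  # True
```

Every 3 input bytes become 4 output characters, which is where the overhead comes from. Line breaks added by some mail systems increase it slightly further.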

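Microsoft Windows allows the programmer to specify, when opening a file, whether it is text or binary (for example, through the mode argument of the C library function fopen); in text mode, line endings are translated as the data is read or written. Unix does not make this distinction and treats all files as binary. This reflects the fact that the distinction between the two types of files is to a certain extent arbitrary.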
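Python's built-in open function makes a comparable distinction on every platform, which gives a convenient way to see the effect. The sketch below shows Python's behavior, not the Windows call itself, and the file name sample.dat is a placeholder.

```python
import os
import tempfile

payload = b"line one\r\nline two\r\n"

path = os.path.join(tempfile.mkdtemp(), "sample.dat")  # placeholder file name
with open(path, "wb") as f:        # binary mode: bytes are written untouched
    f.write(payload)

with open(path, "rb") as f:        # binary mode: bytes come back exactly as stored
    as_binary = f.read()

with open(path, "r", encoding="ascii") as f:   # text mode: decodes and translates newlines
    as_text = f.read()

print(as_binary)       # b'line one\r\nline two\r\n'
print(repr(as_text))   # 'line one\nline two\n'
```

The file on disk is identical in both cases; only the interpretation applied while reading differs.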

Viewing

A hex editor or viewer may be used to view file data as a sequence of hexadecimal (or decimal, binary or ASCII character) values for corresponding bytes of a binary file.
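A very small hex viewer can be sketched in Python as follows. The function name hex_dump and the three-column layout (offset, hexadecimal values, printable ASCII) are choices made for this example, loosely following what common hex viewers display.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hexadecimal values and printable ASCII."""
    pad = width * 3 - 1                      # room for a full row of hex pairs
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and every other byte as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part.ljust(pad)}  {text_part}")
    return "\n".join(lines)

dump = hex_dump(b"GIF89a\x01\x00\x01\x00\x80\x00\x00")
print(dump)
```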

If a binary file is opened in a text editor, each group of eight bits will typically be translated as a single character, and you will see a (probably unintelligible) display of textual characters. If the file is opened in some other application, that application will have its own use for each byte: it may treat each byte as a number and output a stream of numbers between 0 and 255, or it may interpret the numbers in the bytes as colors and display the corresponding picture. If the file is itself treated as an executable and run, then the operating system will attempt to interpret the file as a series of instructions in its machine language.
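The different views described above can be imitated in Python on the same eight bytes. The byte values and the 4-pixel row width are arbitrary assumptions for illustration.

```python
data = bytes([72, 101, 108, 108, 111, 0, 255, 128])

# A text editor's view: one character per byte (Latin-1 maps every byte value).
as_text = data.decode("latin-1")

# A numeric view: a stream of numbers between 0 and 255.
as_numbers = list(data)

# An image viewer's view: each byte as a grey level, grouped here into
# rows of a hypothetical 4-pixel-wide picture.
as_pixels = [as_numbers[i:i + 4] for i in range(0, len(as_numbers), 4)]

print(repr(as_text))
print(as_numbers)    # [72, 101, 108, 108, 111, 0, 255, 128]
print(as_pixels)     # [[72, 101, 108, 108], [111, 0, 255, 128]]
```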

Interpretation

Standards are very important to binary files. For example, a binary file interpreted according to the ASCII character set will be displayed as text. A custom application can interpret the file differently: a byte may be part of a sound, or a pixel, or even an entire word. Binary data is meaningless in itself until an executed algorithm defines what should be done with each bit, byte, word or block. Thus, just examining the binary and attempting to match it against known formats can lead to the wrong conclusion as to what it actually represents. This fact can be used in steganography, where an algorithm interprets a binary data file differently to reveal hidden content. Without the algorithm, it can be very difficult, or even impossible, to tell that hidden content exists.
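To make this concrete, the sketch below applies several standard interpretations to the same four bytes using Python's struct module. The byte values are chosen for illustration only.

```python
import struct

raw = b"\x42\x28\x00\x00"

as_ascii = raw[:2].decode("ascii")                  # first two bytes as ASCII text
as_bytes = list(raw)                                # four separate small numbers
as_big_endian_int = struct.unpack(">I", raw)[0]     # one 32-bit unsigned integer
as_little_endian_int = struct.unpack("<I", raw)[0]  # same bits, opposite byte order
as_big_endian_float = struct.unpack(">f", raw)[0]   # IEEE 754 single-precision float

print(as_ascii)               # B(
print(as_bytes)               # [66, 40, 0, 0]
print(as_big_endian_int)      # 1109917696
print(as_little_endian_int)   # 10306
print(as_big_endian_float)    # 42.0
```

None of these readings is more correct than the others; only the format agreed between writer and reader determines which one was intended.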

Binary compatibility

Two files that are binary compatible will have the same pattern of zeros and ones in the data portion of the file. The file header, however, may be different.

The term is used most commonly to state that data files produced by one application are exactly the same as data files produced by another application. For example, many software companies now produce applications for Windows and the Macintosh that are binary compatible, which means that a file produced in a Windows environment is interchangeable with a file produced on a Macintosh. This avoids many of the conversion problems caused by importing and exporting data.
