Binary data is data whose unit can take on only two possible states. These are often labelled as 0 and 1 in accordance with the binary numeral system and Boolean algebra. Binary data occurs in many different technical and scientific fields, where it can be called by different names, including ''bit'' (binary digit) in computer science, ''truth value'' in mathematical logic and related domains, and ''binary variable'' in statistics.

Mathematical and combinatoric foundations

A discrete variable that can take only one state contains zero information, and 2 is the next natural number after 1. That is why the bit, a variable with only two possible values, is a standard primary unit of information. A collection of n bits may have 2^n states: see binary number for details. The number of states of a collection of discrete variables depends exponentially on the number of variables, and only as a power law on the number of states of each variable. Ten bits have more (1024) states than three decimal digits (1000). For any positive integer k, 10k bits are more than sufficient to represent information (a number or anything else) that requires 3k decimal digits, so the information contained in discrete variables with 3, 4, 5, 6, 7, 8, 9, 10... states can always be represented by allocating two, three, or four times more bits. So, using any other small number than 2 does not provide an advantage.

Moreover, Boolean algebra provides a convenient mathematical structure for a collection of bits, with the semantics of a collection of propositional variables. Boolean algebra operations are known as "bitwise operations" in computer science. Boolean functions are also well-studied theoretically and easily implementable, either with computer programs or by so-named logic gates in digital electronics. This contributes to the use of bits to represent different data, even those originally not binary.
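The counting argument and the bitwise operations described above can be illustrated with a short Python sketch. The helper names `states` and `bits_needed` are invented for this example and do not come from any library.

```python
def states(num_variables: int, states_per_variable: int = 2) -> int:
    """Number of joint states of a collection of discrete variables."""
    return states_per_variable ** num_variables


def bits_needed(num_states: int) -> int:
    """Smallest number of bits whose 2**bits states cover num_states."""
    return (num_states - 1).bit_length()


ten_bits = states(10)         # states of ten bits
three_digits = states(3, 10)  # states of three decimal digits

# Bitwise operators apply a Boolean operation to every bit position at once.
a, b = 0b1100, 0b1010
and_ab = a & b  # AND
or_ab = a | b   # OR
xor_ab = a ^ b  # XOR
```

Under these definitions a variable with a single state needs zero bits, and a decimal digit (10 states) needs four bits, which is in line with the observation that a small constant factor of extra bits is enough to cover any small number of states.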

In statistics

In statistics, binary data is a statistical data type consisting of categorical data that can take exactly two possible values, such as "A" and "B", or "heads" and "tails". It is also called dichotomous data, and an older term is quantal data. The two values are often referred to generically as "success" and "failure". As a form of categorical data, binary data is nominal data, meaning the values are qualitatively different and cannot be compared numerically. However, the values are frequently represented as 1 or 0, which corresponds to counting the number of successes in a single trial: 1 (success) or 0 (failure). More intuitively, binary data can be represented as count data.

Often, binary data is used to represent one of two conceptually opposed values, e.g.:

* the outcome of an experiment ("success" or "failure")
* the response to a yes–no question ("yes" or "no")
* presence or absence of some feature ("is present" or "is not present")
* the truth or falsehood of a proposition ("true" or "false", "correct" or "incorrect")

However, it can also be used for data that is assumed to have only two possible values, even if they are not conceptually opposed or do not conceptually represent all possible values in the space. For example, binary data is often used to represent the party choices of voters in elections in the United States, i.e. Republican or Democratic. In this case, there is no inherent reason why only two political parties should exist, and indeed, other parties do exist in the U.S., but they are so minor that they are generally simply ignored.

Modeling continuous data (or categorical data of more than 2 categories) as a binary variable for analysis purposes is called dichotomization (creating a dichotomy). Like all discretization, it involves discretization error, but the goal is to learn something valuable despite the error: treating it as negligible for the purpose at hand, but remembering that it cannot be assumed to be negligible in general.
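As a minimal sketch of dichotomization, the following Python snippet turns a continuous measurement into a binary variable by thresholding. The function name `dichotomize`, the sample ages, and the cut-off of 18 are all made up for illustration.

```python
def dichotomize(values, threshold):
    """Code each value as 1 if it is at or above the threshold, else 0."""
    return [1 if v >= threshold else 0 for v in values]


ages = [12, 17, 18, 35, 64]       # made-up sample data
is_adult = dichotomize(ages, 18)  # binary variable derived from ages
```

The discretization error is visible here: ages 18 and 64 become indistinguishable once coded, which may or may not matter for the analysis at hand.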

Binary variables

A binary variable is a random variable of binary type, meaning with two possible values. Independent and identically distributed (i.i.d.) binary variables follow a Bernoulli distribution, but in general binary data need not come from i.i.d. variables. Total counts of i.i.d. binary variables (equivalently, sums of i.i.d. binary variables coded as 1 or 0) follow a binomial distribution, but when binary variables are not i.i.d., the distribution need not be binomial.
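The relationship between the two distributions can be sketched in Python using their standard probability mass functions. The function names and the example parameters below are chosen for this illustration only.

```python
from math import comb


def bernoulli_pmf(x: int, p: float) -> float:
    """P(X = x) for a single binary variable coded as 1 (success) or 0 (failure)."""
    return p if x == 1 else 1.0 - p


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(k successes) for the sum of n i.i.d. binary variables with success probability p."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


# Probability of exactly 2 successes in 3 fair trials.
two_of_three = binomial_pmf(2, 3, 0.5)
```

With n = 1 the binomial distribution reduces to the Bernoulli distribution, so `binomial_pmf(x, 1, p)` and `bernoulli_pmf(x, p)` agree.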

Counting

Like categorical data, binary data can be converted to a vector of count data by writing one coordinate for each possible value, and counting 1 for the value that occurs, and 0 for the value that does not occur. For example, if the values are A and B, then the data set A, A, B can be represented in counts as (1, 0), (1, 0), (0, 1). Once converted to counts, binary data can be grouped and the counts added. For instance, if the set A, A, B is grouped, the total counts are (2, 1): 2 A's and 1 B (out of 3 trials).

Since there are only two possible values, this can be simplified to a single count (a scalar value) by considering one value as "success" and the other as "failure", coding a value of the success as 1 and of the failure as 0 (using only the coordinate for the "success" value, not the coordinate for the "failure" value). For example, if the value A is considered "success" (and thus B is considered "failure"), the data set A, A, B would be represented as 1, 1, 0. When this is grouped, the values are added, while the number of trials is generally tracked implicitly. For example, A, A, B would be grouped as 1 + 1 + 0 = 2 successes (out of n = 3 trials). Going the other way, count data with n = 1 is binary data, with the two classes being 0 (failure) or 1 (success).

Counts of i.i.d. binary variables follow a binomial distribution, with n the total number of trials (points in the grouped data).
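The A, A, B example can be reproduced in a few lines of Python. This is only a sketch of the bookkeeping; the variable names are invented for the illustration.

```python
data = ["A", "A", "B"]

# One coordinate per possible value: (count of A, count of B) for each observation.
count_vectors = [(1, 0) if value == "A" else (0, 1) for value in data]

# Grouping adds the vectors coordinate by coordinate.
totals = tuple(sum(column) for column in zip(*count_vectors))

# Scalar coding: treat A as "success" (1) and B as "failure" (0).
coded = [1 if value == "A" else 0 for value in data]
successes = sum(coded)  # number of successes
n = len(coded)          # number of trials, tracked alongside the count
```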

Regression

Regression analysis on predicted outcomes that are binary variables is known as binary regression; when binary data is converted to count data and modeled as i.i.d. variables (so they have a binomial distribution), binomial regression can be used. The most common regression methods for binary data are logistic regression, probit regression, or related types of binary choice models.

Similarly, counts of i.i.d. categorical variables with more than two categories can be modeled with a multinomial regression. Counts of non-i.i.d. binary data can be modeled by more complicated distributions, such as the beta-binomial distribution (a compound distribution). Alternatively, the ''relationship'' can be modeled without needing to explicitly model the distribution of the output variable using techniques from generalized linear models, such as quasi-likelihood and a quasibinomial model.
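Logistic and probit regression differ in the function that maps a linear predictor to a success probability: the logistic function in the first case, the standard normal cumulative distribution function in the second. The sketch below shows only these two mappings, not a fitting procedure; the coefficients `intercept` and `slope` are hypothetical values picked for the example.

```python
import math


def logistic(eta: float) -> float:
    """Inverse logit: success probability under a logistic model."""
    return 1.0 / (1.0 + math.exp(-eta))


def normal_cdf(eta: float) -> float:
    """Standard normal CDF: success probability under a probit model."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


# Hypothetical coefficients, for illustration only.
intercept, slope = -1.0, 0.5
x = 2.0
eta = intercept + slope * x  # linear predictor

p_logit = logistic(eta)
p_probit = normal_cdf(eta)
```

Both functions map any real-valued linear predictor into the interval (0, 1), which is what makes them suitable for modeling the probability of a binary outcome.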

In computer science

In modern computers, binary data refers to any data represented in binary form rather than interpreted on a higher level or converted into some other form. At the lowest level, bits are stored in a bistable device such as a flip-flop. While most binary data has symbolic meaning (except for don't cares), not all binary data is numeric. Some binary data corresponds to computer instructions, such as the data within processor registers decoded by the control unit during the fetch-decode-execute cycle. Computers rarely modify individual bits for performance reasons. Instead, data is aligned in groups of a fixed number of bits, usually 1 byte (8 bits). Hence, "binary data" in computers is actually a sequence of bytes. On a higher level, data is accessed in groups of 1 word (4 bytes) for 32-bit systems and 2 words for 64-bit systems.

In applied computer science and in the information technology field, the term ''binary data'' is often specifically opposed to ''text-based data'', referring to any sort of data that cannot be interpreted as text. The "text" vs. "binary" distinction can sometimes refer to the semantic content of a file (e.g. a written document vs. a digital image). However, it often refers specifically to whether the individual bytes of a file are interpretable as text (see character encoding) or cannot be so interpreted. When this last meaning is intended, the more specific terms ''binary format'' and ''text(ual) format'' are sometimes used. Semantically textual data can be represented in binary format (e.g. when compressed or in certain formats that intermix various sorts of formatting codes, as in the doc format used by Microsoft Word); conversely, image data is sometimes represented in textual format (e.g. the X PixMap image format used in the X Window System).

At the physical level, 1 and 0 are commonly represented as two different voltage levels, for example 1 for the higher voltage and 0 for the lower voltage. There are many ways to store the two states. A floppy disk, for instance, contains a magnetic medium coated with ferromagnetic material, whose domains stay aligned in a particular direction and so retain a remanent magnetic field even after the applied current or magnetic field is removed. When data is written, a magnetic field applied in one direction leaves the domains in an orientation that is read as 1, and a field applied in the opposite direction leaves an orientation that is read as 0. In this way, generally, 1 and 0 data are stored.
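The byte-level view and the "text" vs. "binary" distinction can be sketched in Python. The helper `is_decodable_text` and the sample byte values are assumptions made for this illustration; checking whether bytes decode under one particular encoding (UTF-8 here) is only one rough way to draw the line.

```python
def is_decodable_text(data: bytes, encoding: str = "utf-8") -> bool:
    """Return True if every byte sequence in data is valid in the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


text_bytes = "binary".encode("utf-8")               # bytes that represent text
other_bytes = bytes([0x89, 0x50, 0x4E, 0x47, 0xFF])  # arbitrary non-text bytes

# Each byte is a group of 8 bits; show the first byte of the text bit by bit.
first_byte_bits = format(text_bytes[0], "08b")

# Changing a single bit still means reading and writing a whole byte.
flags = 0b00000101
flags_with_bit1_set = flags | (1 << 1)
```

Here `is_decodable_text(text_bytes)` is true while `is_decodable_text(other_bytes)` is false, because 0x89 cannot begin a valid UTF-8 sequence.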
