In digital circuits and machine learning, a one-hot is a group of bits among which the legal combinations of values are only those with a single high (1) bit and all the others low (0). A similar implementation in which all bits are '1' except one '0' is sometimes called one-cold. In statistics, dummy variables represent a similar technique for representing categorical data.
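The legality rule can be checked in a few lines. This is a minimal Python sketch, assuming a group of bits is modelled as a non-negative integer; the helper names `is_one_hot` and `is_one_cold` are hypothetical. A one-cold group is tested by inverting it within its width, which turns it into a one-hot group.

```python
def is_one_hot(bits: int) -> bool:
    """True if exactly one bit of a non-negative integer is set."""
    # bits & (bits - 1) clears the lowest set bit; the result is 0 only
    # when at most one bit was set, so zero must be excluded separately.
    return bits != 0 and (bits & (bits - 1)) == 0


def is_one_cold(bits: int, width: int) -> bool:
    """True if exactly one of the `width` low bits is 0 and the rest are 1."""
    mask = (1 << width) - 1
    if bits & ~mask:
        return False  # bits set outside the assumed register width
    # Inverting within the register width turns one-cold into one-hot.
    return is_one_hot(~bits & mask)


print(is_one_hot(0b0100))      # True
print(is_one_hot(0b0110))      # False
print(is_one_cold(0b1011, 4))  # True
```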
Applications
Digital circuitry
One-hot encoding is often used for indicating the state of a state machine. When using binary encoding, a decoder is needed to determine the state. A one-hot state machine, however, does not need a decoder, as the state machine is in the nth state if, and only if, the nth bit is high.
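The difference can be sketched in Python, assuming the state register is modelled as an integer; the function names are hypothetical. With binary encoding the whole register must be compared against the state number, while with one-hot encoding a single bit answers the question.

```python
def in_state_binary(state_reg: int, n: int) -> bool:
    # Binary encoding: every bit of the register takes part in the
    # comparison, which is the work a hardware decoder does for state n.
    return state_reg == n


def in_state_one_hot(state_reg: int, n: int) -> bool:
    # One-hot encoding: the machine is in state n iff bit n is high,
    # so only that single bit is inspected.
    return bool((state_reg >> n) & 1)


print(in_state_binary(0b101, 5))    # True
print(in_state_one_hot(1 << 5, 5))  # True
```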
A ring counter with 15 sequentially ordered states is an example of a state machine. A 'one-hot' implementation would have 15 flip-flops chained in series, with the Q output of each flip-flop connected to the D input of the next and the D input of the first flip-flop connected to the Q output of the 15th flip-flop. The first flip-flop in the chain represents the first state, the second represents the second state, and so on to the 15th flip-flop, which represents the last state. Upon reset of the state machine, all of the flip-flops are reset to '0' except the first in the chain, which is set to '1'. The next clock edge arriving at the flip-flops advances the one 'hot' bit to the second flip-flop. The 'hot' bit advances in this way until the 15th state, after which the state machine returns to the first state.
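The 15-state ring counter described above can be simulated in software. This is a minimal Python sketch, assuming bit i of an integer stands for the Q output of flip-flop i + 1; `reset` and `clock` are hypothetical names. Each clock edge is a one-position rotation of the register.

```python
N_STATES = 15
MASK = (1 << N_STATES) - 1


def reset() -> int:
    # All flip-flops cleared except the first, which is set to 1.
    return 0b1


def clock(state: int) -> int:
    # Each D input is the previous flip-flop's Q output, so every bit moves
    # up one position; the 15th Q output wraps round to the first D input.
    return ((state << 1) & MASK) | (state >> (N_STATES - 1))


state = reset()
trace = []
for _ in range(N_STATES + 1):
    trace.append(state.bit_length() - 1)  # index of the 'hot' flip-flop
    state = clock(state)
print(trace)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 0]
```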
An address decoder converts from binary to one-hot representation.
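A software model of that conversion is short. The following Python sketch is illustrative only; `address_decoder` is a hypothetical name and the output lines are assumed to be returned as a list indexed by line number.

```python
def address_decoder(address: int, n_outputs: int) -> list[int]:
    """Return the output lines of a binary-to-one-hot decoder."""
    if not 0 <= address < n_outputs:
        raise ValueError("address out of range")
    # Exactly one output line, the one selected by the address, is high.
    return [1 if line == address else 0 for line in range(n_outputs)]


print(address_decoder(0b10, 4))  # [0, 0, 1, 0]
```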
A priority encoder converts from one-hot representation to binary.
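The reverse direction can be sketched the same way, assuming the input lines arrive as a list where a higher index means a more significant line; `priority_encoder` is a hypothetical name. A priority encoder reports the index of the most significant active line, so for a valid one-hot input it returns the position of the single high bit.

```python
def priority_encoder(lines: list[int]) -> int:
    """Return the index of the most significant active input line."""
    # Scan from the most significant line down; the first active one wins.
    for index in range(len(lines) - 1, -1, -1):
        if lines[index]:
            return index
    raise ValueError("no input line is active")


print(priority_encoder([0, 0, 1, 0]))  # 2
print(priority_encoder([0, 1, 1, 0]))  # 2 (highest active line wins)
```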
Comparison with other encoding methods
Advantages
*Determining the state has a low and constant cost of accessing one flip-flop
*Changing the state has the constant cost of accessing two flip-flops
*Easy to design and modify
*Easy to detect illegal states
*Takes advantage of an FPGA's abundant flip-flops
*Using a one-hot implementation typically allows a state machine to run at a faster clock rate than any other encoding of that state machine
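Two of the advantages listed above can be made concrete. This Python sketch assumes the state register is an integer of known width; `is_legal` and `flip_flops_changed` are hypothetical helpers. An illegal state is any pattern without exactly one high bit, and moving between two distinct one-hot states always clears one bit and sets another.

```python
def is_legal(state: int, width: int) -> bool:
    # A legal one-hot state has exactly one of its `width` bits high.
    return 0 <= state < (1 << width) and bin(state).count("1") == 1


def flip_flops_changed(old: int, new: int) -> int:
    # Bits that differ between the two states are the flip-flops that toggle.
    return bin(old ^ new).count("1")


print(is_legal(0b00100, 5))                  # True
print(is_legal(0b00110, 5))                  # False: two hot bits
print(flip_flops_changed(0b00100, 0b10000))  # 2
```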
Disadvantages
*Requires more flip-flops than other encodings, making it impractical for PAL (programmable array logic) devices
*Many of the states are illegal: n flip-flops have 2^n possible bit patterns, but only n of them are legal one-hot states
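The cost of both disadvantages can be counted. This Python sketch compares the flip-flops needed by binary and one-hot encodings of the same machine and counts the unused one-hot patterns; the function names are hypothetical.

```python
def flip_flops_needed(n_states: int) -> dict[str, int]:
    return {
        # Binary needs ceil(log2(n_states)) bits, computed exactly with
        # integer arithmetic; at least one flip-flop is always required.
        "binary": max(1, (n_states - 1).bit_length()),
        # One-hot needs one flip-flop per state.
        "one-hot": n_states,
    }


def illegal_one_hot_patterns(n_states: int) -> int:
    # n flip-flops give 2**n bit patterns, of which only n are one-hot.
    return 2 ** n_states - n_states


print(flip_flops_needed(15))         # {'binary': 4, 'one-hot': 15}
print(illegal_one_hot_patterns(15))  # 32753
```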
Natural language processing
In natural language processing, a one-hot vector is a 1 × N matrix (vector) used to distinguish each word in a vocabulary from every other word in the vocabulary. The vector consists of 0s in all cells with the exception of a single 1 in a cell used uniquely to identify the word. One-hot encoding ensures that a machine learning model does not assume that higher numbers are more important. For example, the value '8' is bigger than the value '1', but that does not make '8' more important than '1'. The same is true for words: if 'laughter' happened to receive a larger index than 'laugh', that would not make 'laughter' more important than 'laugh'.
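A vocabulary and its one-hot vectors can be built as follows. This is a minimal Python sketch, assuming words are indexed in order of first appearance; `build_vocabulary` and `one_hot_vector` are hypothetical names, and real NLP libraries usually store such vectors sparsely.

```python
def build_vocabulary(words: list[str]) -> dict[str, int]:
    # Assign each distinct word a unique position, in order of first appearance.
    vocab: dict[str, int] = {}
    for word in words:
        vocab.setdefault(word, len(vocab))
    return vocab


def one_hot_vector(word: str, vocab: dict[str, int]) -> list[int]:
    # A 1 x N vector of zeros with a single 1 at the word's position.
    vector = [0] * len(vocab)
    vector[vocab[word]] = 1
    return vector


vocab = build_vocabulary("laugh and laughter and tears".split())
print(vocab)                              # {'laugh': 0, 'and': 1, 'laughter': 2, 'tears': 3}
print(one_hot_vector("laughter", vocab))  # [0, 0, 1, 0]
```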
Machine learning and statistics
In machine learning, one-hot encoding is a frequently used method for handling categorical data. Because many machine learning models need their input variables to be numeric, categorical variables must be transformed during pre-processing.
Categorical data can be either nominal or ordinal. Ordinal data has a ranked order for its values and can therefore be converted to numerical data through ordinal encoding. An example of ordinal data would be the ratings on a test ranging from A to F, which could be ranked using the numbers 6 to 1. Since there is no quantitative relationship between the individual values of a nominal variable, applying ordinal encoding to it can create a fictional ordinal relationship in the data. Therefore, one-hot encoding is often applied to nominal variables in order to improve the performance of the algorithm.
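The contrast between ordinal and nominal data can be shown with plain dictionaries. In this Python sketch the grade scale follows the example above, while the colour variable is a hypothetical nominal example added for illustration.

```python
GRADES = ["A", "B", "C", "D", "E", "F"]
# Ordinal encoding: the best grade gets the highest number (A -> 6 ... F -> 1),
# so numeric comparisons agree with the ranking of the grades.
ORDINAL = {grade: len(GRADES) - i for i, grade in enumerate(GRADES)}

print(ORDINAL)                      # {'A': 6, 'B': 5, 'C': 4, 'D': 3, 'E': 2, 'F': 1}
print(ORDINAL["B"] > ORDINAL["D"])  # True: B ranks above D

# Applying the same idea to a nominal variable invents an order:
colours = ["red", "green", "blue"]
fake_ordinal = {colour: i for i, colour in enumerate(colours)}
print(fake_ordinal["blue"] > fake_ordinal["red"])  # True, but meaningless
```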
In this method, a new column is created for each unique value in the original categorical column. These dummy variables are then filled with zeros and ones (1 meaning TRUE, 0 meaning FALSE).
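The column-per-value construction can be sketched without any library. In this Python example, `one_hot_columns` and the `is_` column prefix are hypothetical; columns are sorted by category name only to give a stable order.

```python
def one_hot_columns(values: list[str]) -> dict[str, list[int]]:
    # One new column per unique value, sorted for a stable column order.
    categories = sorted(set(values))
    # Each column holds 1 (TRUE) where the row has that value, else 0 (FALSE).
    return {
        f"is_{category}": [1 if value == category else 0 for value in values]
        for category in categories
    }


colour = ["red", "green", "blue", "green"]
for name, column in one_hot_columns(colour).items():
    print(name, column)
# is_blue [0, 0, 1, 0]
# is_green [0, 1, 0, 1]
# is_red [1, 0, 0, 0]
```

For tabular data in Python, `pandas.get_dummies` performs the same transformation on a Series or DataFrame.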
Because this process creates multiple new variables, it is prone to creating a 'big p' problem (too many predictors) if there are many unique values in the original column. Another downside of one-hot encoding is that it causes multicollinearity between the individual variables: since exactly one dummy variable is 1 in every row, the dummy columns always sum to 1, so any one of them can be computed from the others. This potentially reduces the model's accuracy.
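The sum-to-one property is easy to verify, and a common remedy, not described above, is to drop one category so that only k - 1 dummies remain for k categories (pandas exposes this as the `drop_first` option of `get_dummies`). The following Python sketch assumes that remedy; `one_hot_rows` is a hypothetical name, and the dropped category is represented by an all-zeros row.

```python
def one_hot_rows(values: list[str], drop_first: bool = False) -> list[list[int]]:
    categories = sorted(set(values))
    if drop_first:
        # k - 1 dummies: the first category is encoded as all zeros.
        categories = categories[1:]
    return [[1 if value == c else 0 for c in categories] for value in values]


colour = ["red", "green", "blue", "green"]
full = one_hot_rows(colour)
print([sum(row) for row in full])  # [1, 1, 1, 1]: every row sums to 1
reduced = one_hot_rows(colour, drop_first=True)
print(reduced)                     # [[0, 1], [1, 0], [0, 0], [1, 0]]
```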
Also, if the categorical variable is an output variable, the predicted values may need to be converted back into categorical form in order to be presented in an application.
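Converting back usually means picking the position of the largest output. This Python sketch assumes the model emits one score per category in a known order; `decode` is a hypothetical name. For an exact one-hot vector the result is simply the position of the single 1.

```python
def decode(scores: list[float], categories: list[str]) -> str:
    # Pick the category with the largest output (argmax); for an exact
    # one-hot vector this is the position of the single 1.
    if len(scores) != len(categories):
        raise ValueError("one score per category is required")
    best = max(range(len(scores)), key=scores.__getitem__)
    return categories[best]


labels = ["blue", "green", "red"]
print(decode([0, 1, 0], labels))        # green
print(decode([0.1, 0.2, 0.7], labels))  # red
```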
In practical usage, this transformation is often performed directly by a function that takes categorical data as input and outputs the corresponding dummy variables. An example is the dummyVars function of the caret package in R.
[Kuhn, Max. “dummyVars”. RDocumentation. https://www.rdocumentation.org/packages/caret/versions/6.0-86/topics/dummyVars]
See also
*Bi-quinary coded decimal
*Binary decoder
*Gray code
*Kronecker delta
*Indicator vector
*Serial decimal
*Single-entry vector
*Unary numeral system
*Uniqueness quantification
*XOR gate