Data binning, also called data discrete binning or data bucketing, is a data pre-processing technique used to reduce the effects of minor observation errors. The original data values which fall into a given small interval, a ''bin'', are replaced by a value representative of that interval, often a central value (mean or median). It is related to quantization: data binning operates on the abscissa axis while quantization operates on the ordinate axis. Binning is a generalization of rounding.
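As a minimal sketch of the idea, the following Python snippet replaces each value with the mean of the values that share its bin. The function name `bin_by_mean`, the fixed bin `width`, and the `origin` parameter are illustrative assumptions, not part of any particular library.

```python
import math
from collections import defaultdict
from statistics import mean


def bin_by_mean(values, width, origin=0.0):
    """Replace each value by the mean of all values that fall in its bin.

    Bins are the half-open intervals [origin + k*width, origin + (k+1)*width).
    """
    def index(v):
        return math.floor((v - origin) / width)

    # Collect the values that fall into each bin.
    groups = defaultdict(list)
    for v in values:
        groups[index(v)].append(v)

    # One representative (here the mean) per occupied bin.
    representative = {k: mean(vs) for k, vs in groups.items()}
    return [representative[index(v)] for v in values]


readings = [1.0, 1.5, 2.5, 3.5, 9.0]
smoothed = bin_by_mean(readings, width=2.0)
print(smoothed)  # [1.25, 1.25, 3.0, 3.0, 9.0]
```

Swapping `mean` for `statistics.median` gives the median variant mentioned above.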
Statistical data binning is a way to group numbers of more-or-less continuous values into a smaller number of "bins". For example, if you have data about a group of people, you might want to arrange their ages into a smaller number of age intervals (for example, grouping every five years together). It can also be used in multivariate statistics, binning in several dimensions at once.
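The age example can be sketched in a few lines of Python. The sample data, the helper `age_bin`, and the 10 cm height bins used for the two-dimensional case are assumptions chosen only for illustration.

```python
from collections import Counter


def age_bin(age, width=5):
    """Return the inclusive (lower, upper) age interval containing `age`."""
    lower = (age // width) * width
    return (lower, lower + width - 1)


# One dimension: count people per five-year age interval.
ages = [21, 23, 25, 29, 30, 34, 41]
counts = Counter(age_bin(a) for a in ages)

# Two dimensions: bin (age, height in cm) pairs into a grid of cells,
# using five-year age bins and 10 cm height bins.
people = [(21, 158), (23, 171), (25, 175), (41, 179)]
cells = Counter(((a // 5) * 5, (h // 10) * 10) for a, h in people)

print(counts[(20, 24)])  # 2
print(cells[(20, 170)])  # 1
```

Each key of `cells` identifies one rectangular bin by its lower corner, which is how binning extends to several variables at once.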
In digital image processing, "binning" has a very different meaning. Pixel binning is the process of combining blocks of adjacent pixels throughout an image, by summing or averaging their values, during or after readout. It reduces the amount of data and lowers the relative noise level of the result.
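The following sketch shows pixel binning applied after readout, in software, to an image stored as a list of rows. The function name `bin_pixels` and its parameters are hypothetical. Binning during readout happens on the sensor itself and is not modelled here.

```python
def bin_pixels(image, factor=2, average=False):
    """Combine each factor x factor block of pixels into one output pixel."""
    rows, cols = len(image), len(image[0])
    if rows % factor or cols % factor:
        raise ValueError("image dimensions must be multiples of factor")

    binned = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            # Gather the block whose top-left corner is (r, c).
            block = [image[r + dr][c + dc]
                     for dr in range(factor)
                     for dc in range(factor)]
            total = sum(block)
            out_row.append(total / len(block) if average else total)
        binned.append(out_row)
    return binned


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
print(bin_pixels(image))                # [[14, 22], [46, 54]]
print(bin_pixels(image, average=True))  # [[3.5, 5.5], [11.5, 13.5]]
```

With `factor=2` the output holds a quarter as many values as the input, which is the data reduction described above.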
Example usage
Histograms are an example of data binning used in order to observe underlying frequency distributions. They typically occur in one-dimensional space and in equal intervals for ease of visualization.
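A one-dimensional histogram with equal-width bins can be computed as in the sketch below. The function `histogram` is written from scratch for illustration. Placing the maximum value in the last bin is one common convention and is an assumption here.

```python
def histogram(values, num_bins):
    """Count values in `num_bins` equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("values must not all be equal")
    width = (hi - lo) / num_bins

    counts = [0] * num_bins
    for v in values:
        # The maximum value would land on the right edge, so clamp it
        # into the last bin.
        i = min(int((v - lo) / width), num_bins - 1)
        counts[i] += 1

    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


data = [0, 1, 2, 2, 3, 5, 7, 8, 9, 10]
edges, counts = histogram(data, num_bins=5)
print(edges)   # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
print(counts)  # [2, 3, 1, 1, 3]
```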
Data binning may be used when small instrumental shifts in the spectral dimension from mass spectrometry (MS) or nuclear magnetic resonance (NMR) experiments would otherwise be falsely interpreted as representing different components when a collection of data profiles is subjected to pattern recognition analysis. A straightforward way to cope with this problem is to use binning techniques in which the spectrum is reduced in resolution to a sufficient degree to ensure that a given peak remains in its bin despite small spectral shifts between analyses. For example, in NMR the chemical shift axis may be discretized and coarsely binned, and in MS the spectral accuracies may be rounded to integer atomic mass unit values. Also, several digital camera systems incorporate an automatic pixel binning function to improve image contrast.
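To illustrate the mass spectrometry case, the sketch below rounds each peak position to the nearest integer mass and accumulates the intensities that land in the same bin. The peak lists are made-up values, and summing intensities within a bin is an assumption; other ways of combining them are possible.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks):
    """Bin (mass, intensity) peaks to integer mass values.

    Intensities of peaks that round to the same integer are summed.
    """
    binned = defaultdict(float)
    for mass, intensity in peaks:
        # Round half up explicitly; Python's round() rounds halves to even.
        binned[math.floor(mass + 0.5)] += intensity
    return dict(binned)


# Two hypothetical runs of the same sample with small instrumental shifts.
run_a = [(100.02, 5.0), (150.98, 2.0)]
run_b = [(99.97, 5.0), (151.03, 2.0)]

print(bin_spectrum(run_a))  # {100: 5.0, 151: 2.0}
print(bin_spectrum(run_b))  # {100: 5.0, 151: 2.0}
```

After binning, both runs yield the same profile, so the shifted peaks are no longer treated as different components.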
Binning is also used in machine learning to speed up the decision-tree boosting method for supervised classification and regression in algorithms such as Microsoft's LightGBM and scikit-learn's Histogram-based Gradient Boosting Classification Tree.
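The general idea behind this speed-up can be sketched as follows: each continuous feature is mapped once to a small set of integer bin codes, so that a tree only has to consider a handful of candidate split points per feature. The snippet is a simplified illustration under that assumption, using quantile-style thresholds; it does not reproduce the actual implementation of LightGBM or scikit-learn, and the function names are hypothetical.

```python
from bisect import bisect_right


def fit_thresholds(values, max_bins):
    """Pick up to max_bins - 1 thresholds at evenly spaced ranks."""
    ordered = sorted(values)
    n = len(ordered)
    # A set removes duplicate thresholds caused by repeated values.
    cuts = {ordered[(i * n) // max_bins] for i in range(1, max_bins)}
    return sorted(cuts)


def to_bin_codes(values, thresholds):
    """Map each value to the index of the bin it falls into."""
    return [bisect_right(thresholds, v) for v in values]


feature = [0.1, 0.4, 0.35, 0.8, 0.75, 0.2, 0.9, 0.55]
thresholds = fit_thresholds(feature, max_bins=4)
codes = to_bin_codes(feature, thresholds)

print(thresholds)  # [0.35, 0.55, 0.8]
print(codes)       # [0, 1, 1, 3, 2, 0, 3, 2]
```

With the feature reduced to four codes, a split search needs to evaluate at most three thresholds instead of one per distinct value.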
See also
* Binning (disambiguation)
* Discretization of continuous features
* Grouped data
* Histogram
* Level of measurement
* Quantization (signal processing)
* Rounding