The BFR algorithm, named after its inventors Bradley, Fayyad and Reina, is a variant of
k-means algorithm
''k''-means clustering is a method of vector quantization, originally from signal processing, that aims to partition ''n'' observations into ''k'' clusters in which each observation belongs to the cluster with the nearest mean (cluster centers ...
that is designed to cluster data in a high-dimensional
Euclidean space
Euclidean space is the fundamental space of geometry, intended to represent physical space. Originally, that is, in Euclid's ''Elements'', it was the three-dimensional space of Euclidean geometry, but in modern mathematics there are Euclidean sp ...
. It makes a very strong assumption about the shape of clusters: they must be
normally distributed
In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is
:
f(x) = \frac e^
The parameter \mu is ...
about a
centroid
In mathematics and physics, the centroid, also known as geometric center or center of figure, of a plane figure or solid figure is the arithmetic mean position of all the points in the surface of the figure. The same definition extends to any ...
. The
mean
There are several kinds of mean in mathematics, especially in statistics. Each mean serves to summarize a given group of data, often to better understand the overall value ( magnitude and sign) of a given data set.
For a data set, the '' ari ...
and
standard deviation for a cluster may differ for different dimensions, but the dimensions must be independent.
References
{{Reflist
Cluster analysis algorithms