Linear separability
In Euclidean geometry, linear separability is a property of two sets of points. This is most easily visualized in two dimensions (the Euclidean plane) by thinking of one set of points as being colored blue and the other set of points as being colored red. These two sets are ''linearly separable'' if there exists at least one line in the plane with all of the blue points on one side of the line and all of the red points on the other side. This idea immediately generalizes to higher-dimensional Euclidean spaces if the line is replaced by a hyperplane. The problem of determining whether a pair of sets is linearly separable, and finding a separating hyperplane if they are, arises in several areas. In statistics and machine learning, classifying certain types of data is a problem for which good algorithms exist that are based on this concept.


Mathematical definition

Let X_0 and X_1 be two sets of points in an ''n''-dimensional Euclidean space. Then X_0 and X_1 are ''linearly separable'' if there exist ''n'' + 1 real numbers w_1, w_2, \ldots, w_n, k such that every point x \in X_0 satisfies \sum_{i=1}^n w_i x_i > k and every point x \in X_1 satisfies \sum_{i=1}^n w_i x_i < k, where x_i is the ''i''-th component of x. Equivalently, two sets are linearly separable precisely when their respective convex hulls are disjoint (colloquially, do not overlap). Geometrically, the map x \mapsto \sum_{i=1}^n w_i x_i projects every point onto a line, and k is a value on that line such that one set projects entirely above it and the other entirely below.
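
This definition can be tested computationally. The following is a minimal sketch (an illustration, not part of the original article) that phrases the search for w_1, \ldots, w_n, k as a linear-programming feasibility problem using SciPy's linprog; the unit gap in the constraints is harmless, since any strict separation can be rescaled.

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import linprog

def linearly_separable(X0, X1):
    """Return True iff point sets X0 and X1 admit a separating hyperplane."""
    X0, X1 = np.asarray(X0, dtype=float), np.asarray(X1, dtype=float)
    n = X0.shape[1]
    # Decision variables z = (w_1, ..., w_n, k). We ask for
    #   w.x >= k + 1 for x in X0   and   w.x <= k - 1 for x in X1.
    A_ub = np.vstack([
        np.hstack([-X0, np.ones((len(X0), 1))]),   # -w.x + k <= -1
        np.hstack([X1, -np.ones((len(X1), 1))]),   #  w.x - k <= -1
    ])
    b_ub = -np.ones(len(X0) + len(X1))
    res = linprog(c=np.zeros(n + 1), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (n + 1))
    return res.success  # LP feasible  <=>  linearly separable

print(linearly_separable([[0, 0], [1, 0]], [[0, 2], [1, 2]]))  # True
print(linearly_separable([[0, 0], [1, 1]], [[0, 1], [1, 0]]))  # False (XOR)
</syntaxhighlight>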


Examples

Three non-collinear points in two classes ('+' and '-') are always linearly separable in two dimensions, whatever the labeling (the all-'+' case is similar to the all-'-' case). However, not all sets of four points, no three collinear, are linearly separable in two dimensions. For example, four points with the two '+' points and the two '-' points at opposite corners of a square would need ''two'' straight lines to separate them and thus are not linearly separable. Notice that three collinear points of the form '+ - +' are also not linearly separable.
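
These small cases can also be checked mechanically. The sketch below (an illustration, not part of the original article) runs the classic perceptron learning rule, which finds a separating line whenever one exists: it succeeds for every labeling of three non-collinear points but fails on the four-point XOR arrangement.

<syntaxhighlight lang="python">
import itertools

def perceptron_separates(points, labels, epochs=1000):
    """Run the perceptron rule; return True iff a perfect separator is found."""
    w, b = [0.0] * len(points[0]), 0.0
    for _ in range(epochs):
        updated = False
        for x, y in zip(points, labels):  # labels are +1 / -1
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                updated = True
        if not updated:  # every point classified correctly
            return True
    return False

triangle = [(0, 0), (1, 0), (0, 1)]
for labels in itertools.product([1, -1], repeat=3):
    assert perceptron_separates(triangle, labels)  # always separable

xor_points = [(0, 0), (1, 1), (0, 1), (1, 0)]
print(perceptron_separates(xor_points, [1, 1, -1, -1]))  # False
</syntaxhighlight>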


Number of linear separations

Let T(N, K) be the number of ways to linearly separate ''N'' points (in general position) in ''K'' dimensions. Then
:T(N, K)=\left\{\begin{array}{ll} 2^N & K \geq N \\ 2 \sum_{k=0}^{K-1}\binom{N-1}{k} & K < N \end{array}\right.
When ''K'' is large, T(N, K)/2^N is very close to one when N \leq 2K, but very close to zero when N > 2K. In words, one perceptron unit can almost certainly memorize a random assignment of binary labels on N points when N \leq 2K, but almost certainly not when N > 2K.
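
The formula is easy to evaluate directly. The short sketch below (an illustration, not part of the original article) prints the fraction T(N, K)/2^N just below, at, and above the N = 2K threshold; at N = 2K the fraction is exactly 1/2.

<syntaxhighlight lang="python">
from math import comb

def T(N, K):
    """Number of linear separations of N points in general position in K dims."""
    if K >= N:
        return 2 ** N
    return 2 * sum(comb(N - 1, k) for k in range(K))

K = 50
for N in (80, 100, 120):  # below, at, and above the N = 2K threshold
    print(N, T(N, K) / 2 ** N)
</syntaxhighlight>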


Linear separability of Boolean functions in ''n'' variables

A Boolean function in ''n'' variables can be thought of as an assignment of ''0'' or ''1'' to each vertex of a Boolean hypercube in ''n'' dimensions. This gives a natural division of the vertices into two sets. The Boolean function is said to be ''linearly separable'' provided these two sets of points are linearly separable. The number of distinct Boolean functions is 2^{2^n}, where ''n'' is the number of variables passed into the function. Such functions are also called linear threshold logic, or perceptrons. The classical theory is summarized in the threshold-logic literature, as Knuth notes. The number of linearly separable Boolean functions is known exactly only up to the n = 9 case, but its order of magnitude is known quite precisely: it has upper bound 2^{n^2 - n \log_2 n + O(n)} and lower bound 2^{n^2 - n \log_2 n - O(n)}. It is co-NP-complete to decide whether a Boolean function given in disjunctive or conjunctive normal form is linearly separable.

{| class="wikitable"
|+ Number of linearly separable Boolean functions in each dimension
! Number of variables !! Boolean functions !! Linearly separable Boolean functions
|-
| 2 || 16 || 14
|-
| 3 || 256 || 104
|-
| 4 || 65536 || 1882
|-
| 5 || 4294967296 || 94572
|-
| 6 || 18446744073709551616 || 15028134
|-
| 7 || ≈3.402823669×10^38 || 8378070864
|-
| 8 || ≈1.157920892×10^77 || 17561539552946
|-
| 9 || ≈1.340780792×10^154 || 144130531453121108
|}
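
For small ''n'' the counts in the table can be reproduced by brute force. The sketch below (an illustration, not part of the original article) enumerates threshold gates over a small integer range and counts the distinct truth tables they realize; a weight range of [-5, 5] is assumed to suffice here, since the minimal integer weights of threshold functions are known to be small for n \leq 3.

<syntaxhighlight lang="python">
from itertools import product

def count_threshold_functions(n, wmax=5):
    """Count distinct Boolean functions realized by integer threshold gates."""
    inputs = list(product((0, 1), repeat=n))
    tables = set()
    for weights in product(range(-wmax, wmax + 1), repeat=n):
        lo = sum(min(w, 0) for w in weights)   # least possible weighted sum
        hi = sum(max(w, 0) for w in weights)   # greatest possible weighted sum
        for theta in range(lo - 1, hi + 1):    # all distinct integer cutoffs
            table = tuple(int(sum(w * x for w, x in zip(weights, v)) > theta)
                          for v in inputs)
            tables.add(table)
    return len(tables)

print(count_threshold_functions(2))  # 14
print(count_threshold_functions(3))  # 104
</syntaxhighlight>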


Threshold logic

A linear threshold logic gate is a Boolean function defined by ''n'' weights w_1, \dots, w_n and a threshold \theta. It takes ''n'' binary inputs x_1, \dots, x_n, and outputs 1 if \sum_i w_i x_i > \theta, and otherwise outputs 0. Because for any fixed ''n'' there are only finitely many Boolean functions computable by such a gate, it is always possible to take the w_1, \dots, w_n, \theta to be integers. Let W(n) be the smallest number W such that every possible real threshold function of ''n'' variables can be realized using integer weights of absolute value \leq W. It is known that
:\frac{1}{2} n \log n - 2n + o(n) \leq \log_2 W(n) \leq \frac{1}{2} n \log n - n + o(n).
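
As a concrete illustration (a sketch, not part of the original article), the majority function of three inputs is a threshold gate with integer weights w_1 = w_2 = w_3 = 1 and threshold \theta = 1:

<syntaxhighlight lang="python">
from itertools import product

def threshold_gate(weights, theta, inputs):
    """Output 1 iff the weighted sum of the binary inputs exceeds theta."""
    return int(sum(w * x for w, x in zip(weights, inputs)) > theta)

# Majority of three inputs: output 1 iff at least two inputs are 1.
for x in product((0, 1), repeat=3):
    print(x, threshold_gate((1, 1, 1), 1, x))
</syntaxhighlight>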


Support vector machines

Classifying data is a common task in machine learning. Suppose some data points, each belonging to one of two sets, are given and we wish to create a model that will decide which set a ''new'' data point will be in. In the case of support vector machines, a data point is viewed as a ''p''-dimensional vector (a list of ''p'' numbers), and we want to know whether we can separate such points with a (''p'' − 1)-dimensional hyperplane. This is called a linear classifier. There are many hyperplanes that might classify (separate) the data. One reasonable choice as the best hyperplane is the one that represents the largest separation, or margin, between the two sets. So we choose the hyperplane so that the distance from it to the nearest data point on each side is maximized. If such a hyperplane exists, it is known as the ''maximum-margin hyperplane'' and the linear classifier it defines is known as a ''maximum margin classifier''.

More formally, given some training data \mathcal{D}, a set of ''n'' points of the form
:\mathcal{D} = \left\{ (\mathbf{x}_i, y_i) \mid \mathbf{x}_i \in \mathbb{R}^p,\, y_i \in \{-1,1\} \right\}_{i=1}^n
where each y_i is either 1 or −1, indicating the set to which the point \mathbf{x}_i belongs, and each \mathbf{x}_i is a ''p''-dimensional real vector. We want to find the maximum-margin hyperplane that divides the points having y_i = 1 from those having y_i = -1. Any hyperplane can be written as the set of points \mathbf{x} satisfying
:\mathbf{w} \cdot \mathbf{x} - b = 0,
where \cdot denotes the dot product and \mathbf{w} is the (not necessarily normalized) normal vector to the hyperplane. The parameter \tfrac{b}{\|\mathbf{w}\|} determines the offset of the hyperplane from the origin along the normal vector \mathbf{w}. If the training data are linearly separable, we can select two parallel hyperplanes that separate the data with no points between them, and then maximize the distance between them.
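
A minimal sketch of this computation, assuming scikit-learn is available (the article does not prescribe any library): a hard margin can be approximated by making the SVC penalty parameter C very large.

<syntaxhighlight lang="python">
# Fit a (nearly) hard-margin linear SVM on a separable toy set and recover
# the maximum-margin hyperplane w.x - b = 0 and the margin width 2/||w||.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [1.0, 3.0]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1e6)  # large C approximates a hard margin
clf.fit(X, y)

w = clf.coef_[0]          # normal vector of the separating hyperplane
b = -clf.intercept_[0]    # sklearn stores w.x + intercept, so b = -intercept
print("w =", w, "b =", b, "margin width =", 2 / np.linalg.norm(w))
</syntaxhighlight>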


See also

* Clustering (statistics)
* Hyperplane separation theorem
* Kirchberger's theorem
* Perceptron
* Vapnik–Chervonenkis dimension

