In computer vision, the essential matrix is a 3 \times 3 matrix, \mathbf{E}, that relates corresponding points in stereo images assuming that the cameras satisfy the pinhole camera model.


Function

More specifically, if \mathbf{y} and \mathbf{y}' are homogeneous ''normalized'' image coordinates in image 1 and 2, respectively, then
: (\mathbf{y}')^\top \, \mathbf{E} \, \mathbf{y} = 0
if \mathbf{y} and \mathbf{y}' correspond to the same 3D point in the scene. The above relation, which defines the essential matrix, was published in 1981 by H. Christopher Longuet-Higgins, introducing the concept to the computer vision community. Richard Hartley and Andrew Zisserman's book reports that an analogous matrix appeared in photogrammetry long before that. Longuet-Higgins' paper includes an algorithm for estimating \mathbf{E} from a set of corresponding normalized image coordinates, as well as an algorithm for determining the relative position and orientation of the two cameras given that \mathbf{E} is known. Finally, it shows how the 3D coordinates of the image points can be determined with the aid of the essential matrix.


Use

The essential matrix can be seen as a precursor to the ''fundamental matrix'', \mathbf{F}. Both matrices can be used for establishing constraints between matching image points, but the essential matrix can only be used in relation to calibrated cameras, since the inner camera parameters (matrices \mathbf{K} and \mathbf{K}') must be known in order to achieve the normalization. If, however, the cameras are calibrated, the essential matrix can be useful for determining both the relative position and orientation between the cameras and the 3D position of corresponding image points. The essential matrix is related to the fundamental matrix by
: \mathbf{E} = (\mathbf{K}')^\top \; \mathbf{F} \; \mathbf{K}
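The two relations in this paragraph can be sketched as follows, assuming NumPy and inner parameter matrices whose last row is (0, 0, 1). The function names `essential_from_fundamental` and `normalize_points` and all numeric values are hypothetical, chosen for illustration. The code forms \mathbf{E} = (\mathbf{K}')^\top \mathbf{F} \mathbf{K} and maps pixel coordinates to the normalized coordinates \mathbf{y} = \mathbf{K}^{-1}\mathbf{x} that the essential matrix expects.

```python
import numpy as np


def essential_from_fundamental(F, K, K_prime):
    """E = (K')^T F K, where K and K' are the inner parameter matrices of cameras 1 and 2."""
    return K_prime.T @ F @ K


def normalize_points(K, pixels):
    """Map N x 2 pixel coordinates to N x 3 homogeneous normalized coordinates y = K^-1 x."""
    homogeneous = np.column_stack([pixels, np.ones(len(pixels))])
    y = np.linalg.solve(K, homogeneous.T).T
    return y / y[:, 2:3]  # rescale so that the third coordinate is exactly 1


# Hypothetical calibration values, for illustration only
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
K_prime = np.array([[750.0, 0.0, 300.0], [0.0, 760.0, 250.0], [0.0, 0.0, 1.0]])

# A valid essential matrix: [t]_x for t = (1, 0.1, 0.3) and R = I
E_known = np.array([[0.0, -0.3, 0.1], [0.3, 0.0, -1.0], [-0.1, 1.0, 0.0]])

# Build the matching fundamental matrix, then recover E from it
F = np.linalg.inv(K_prime).T @ E_known @ np.linalg.inv(K)
E = essential_from_fundamental(F, K, K_prime)

y = normalize_points(K, np.array([[400.0, 300.0], [100.0, 50.0]]))
```

Going from \mathbf{F} back to \mathbf{E} only requires the two calibration matrices, which is why the essential matrix is tied to calibrated cameras.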


Derivation and definition

This derivation follows the paper by Longuet-Higgins. Two normalized cameras project the 3D world onto their respective image planes. Let the 3D coordinates of a point P be (x_1, x_2, x_3) and (x'_1, x'_2, x'_3) relative to each camera's coordinate system. Since the cameras are normalized, the corresponding image coordinates are
: \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \frac{1}{x_3} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}   and   \begin{pmatrix} y'_1 \\ y'_2 \end{pmatrix} = \frac{1}{x'_3} \begin{pmatrix} x'_1 \\ x'_2 \end{pmatrix}
A homogeneous representation of the two image coordinates is then given by
: \begin{pmatrix} y_1 \\ y_2 \\ 1 \end{pmatrix} = \frac{1}{x_3} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}   and   \begin{pmatrix} y'_1 \\ y'_2 \\ 1 \end{pmatrix} = \frac{1}{x'_3} \begin{pmatrix} x'_1 \\ x'_2 \\ x'_3 \end{pmatrix}
which can also be written more compactly as
: \mathbf{y} = \frac{1}{x_3} \, \tilde{\mathbf{x}}   and   \mathbf{y}' = \frac{1}{x'_3} \, \tilde{\mathbf{x}}'
where \mathbf{y} and \mathbf{y}' are homogeneous representations of the 2D image coordinates and \tilde{\mathbf{x}} and \tilde{\mathbf{x}}' are proper 3D coordinates, but in two different coordinate systems. Another consequence of the normalized cameras is that their respective coordinate systems are related by means of a translation and a rotation. This implies that the two sets of 3D coordinates are related as
: \tilde{\mathbf{x}}' = \mathbf{R} \, (\tilde{\mathbf{x}} - \mathbf{t})
where \mathbf{R} is a 3 \times 3 rotation matrix and \mathbf{t} is a 3-dimensional translation vector. The essential matrix is then defined as
: \mathbf{E} = \mathbf{R} \, [\mathbf{t}]_\times
where [\mathbf{t}]_\times is the matrix representation of the cross product with \mathbf{t}. Note: Here, the transformation \mathbf{R} will transform points in the 2nd view to the 1st view.

For the definition of \mathbf{E}, we are only interested in the orientations of the normalized image coordinates (see also: triple product). As such, we don't need the translational component when substituting image coordinates into the essential equation. To see that this definition of \mathbf{E} describes a constraint on corresponding image coordinates, multiply \mathbf{E} from the left and right with the 3D coordinates of point P in the two different coordinate systems:
: (\tilde{\mathbf{x}}')^\top \, \mathbf{E} \, \tilde{\mathbf{x}} \stackrel{1}{=} (\tilde{\mathbf{x}} - \mathbf{t})^\top \, \mathbf{R}^\top \, \mathbf{R} \, [\mathbf{t}]_\times \, \tilde{\mathbf{x}} \stackrel{2}{=} (\tilde{\mathbf{x}} - \mathbf{t})^\top \, [\mathbf{t}]_\times \, \tilde{\mathbf{x}} \stackrel{3}{=} 0
1. Insert the above relation between \tilde{\mathbf{x}}' and \tilde{\mathbf{x}} and the definition of \mathbf{E} in terms of \mathbf{R} and \mathbf{t}.
2. \mathbf{R}^\top \, \mathbf{R} = \mathbf{I} since \mathbf{R} is a rotation matrix.
3. Properties of the matrix representation of the cross product: [\mathbf{t}]_\times \tilde{\mathbf{x}} = \mathbf{t} \times \tilde{\mathbf{x}} is orthogonal to both \tilde{\mathbf{x}} and \mathbf{t}.

Finally, it can be assumed that both x_3 and x'_3 are > 0, since otherwise the point is not visible in both cameras. This gives
: 0 = (\tilde{\mathbf{x}}')^\top \, \mathbf{E} \, \tilde{\mathbf{x}} = \frac{1}{x'_3} (\tilde{\mathbf{x}}')^\top \, \mathbf{E} \, \frac{1}{x_3} \tilde{\mathbf{x}} = (\mathbf{y}')^\top \, \mathbf{E} \, \mathbf{y}
which is the constraint that the essential matrix defines between corresponding image points.
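The derivation can be checked numerically. This is a minimal sketch assuming NumPy; the pose values and the helper names `skew` and `rotation_y` are illustrative assumptions. It builds \mathbf{E} = \mathbf{R}[\mathbf{t}]_\times, maps one point through \tilde{\mathbf{x}}' = \mathbf{R}(\tilde{\mathbf{x}} - \mathbf{t}), and evaluates (\mathbf{y}')^\top \mathbf{E} \mathbf{y}, which should vanish up to rounding.

```python
import numpy as np


def skew(t):
    """Matrix representation [t]_x of the cross product: skew(t) @ v equals np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])


def rotation_y(angle):
    """Rotation by `angle` radians about the second coordinate axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


# Assumed example pose, using the convention x' = R (x - t)
R = rotation_y(0.2)
t = np.array([1.0, 0.2, 0.1])
E = R @ skew(t)

x = np.array([0.5, -0.3, 5.0])   # point P in camera-1 coordinates
x_prime = R @ (x - t)             # the same point in camera-2 coordinates
y = x / x[2]                      # homogeneous normalized image coordinates, image 1
y_prime = x_prime / x_prime[2]    # homogeneous normalized image coordinates, image 2

residual = y_prime @ E @ y        # zero up to floating-point rounding
```

Dividing by the positive depths x_3 and x'_3 only rescales the residual, which is why the constraint carries over from 3D coordinates to image coordinates.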


Properties

Not every arbitrary 3 \times 3 matrix can be an essential matrix for some stereo cameras. To see this, notice that it is defined as the matrix product of one rotation matrix and one skew-symmetric matrix, both 3 \times 3. The skew-symmetric matrix must have two singular values which are equal and another which is zero. Multiplication by the rotation matrix does not change the singular values, which means that the essential matrix also has two singular values which are equal and one which is zero. The properties described here are sometimes referred to as ''internal constraints'' of the essential matrix.

If the essential matrix \mathbf{E} is multiplied by a non-zero scalar, the result is again an essential matrix which defines exactly the same constraint as \mathbf{E} does. This means that \mathbf{E} can be seen as an element of a projective space, that is, two such matrices are considered equivalent if one is a non-zero scalar multiple of the other. This is a relevant position, for example, if \mathbf{E} is estimated from image data. However, it is also possible to take the position that \mathbf{E} is defined as
: \mathbf{E} = [\mathbf{t}']_\times \, \mathbf{R}
where \mathbf{t}' = -\mathbf{R}\mathbf{t} (by the identity \mathbf{R}\,[\mathbf{t}]_\times = [\mathbf{R}\mathbf{t}]_\times \mathbf{R}, this is -\mathbf{R}\,[\mathbf{t}]_\times), and then \mathbf{E} has a well-defined "scaling". It depends on the application which position is the more relevant.

The constraints can also be expressed as
: \det \mathbf{E} = 0
and
: 2\,\mathbf{E}\mathbf{E}^\top\mathbf{E} - \operatorname{tr}(\mathbf{E}\mathbf{E}^\top)\,\mathbf{E} = 0
Here, the last equation is a matrix constraint, which can be seen as 9 constraints, one for each matrix element. These constraints are often used for determining the essential matrix from five corresponding point pairs.

The essential matrix has five or six degrees of freedom, depending on whether or not it is seen as a projective element. The rotation matrix \mathbf{R} and the translation vector \mathbf{t} have three degrees of freedom each, in total six. If the essential matrix is considered as a projective element, however, one degree of freedom related to scalar multiplication must be subtracted, leaving five degrees of freedom in total.
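The internal constraints can be checked numerically. This sketch assumes NumPy, an arbitrary example pose, and a hypothetical helper `internal_constraint_residuals`. For \mathbf{E} = \mathbf{R}[\mathbf{t}]_\times the two nonzero singular values both equal the length of \mathbf{t}, while a generic matrix such as diag(1, 2, 3) violates both equations.

```python
import numpy as np


def skew(t):
    """Matrix representation [t]_x of the cross product with t."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])


def rotation_z(angle):
    """Rotation by `angle` radians about the third coordinate axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def internal_constraint_residuals(E):
    """Return det(E) and the Frobenius norm of 2 E E^T E - tr(E E^T) E."""
    EEt = E @ E.T
    cubic = 2.0 * EEt @ E - np.trace(EEt) * E
    return np.linalg.det(E), np.linalg.norm(cubic)


t = np.array([0.4, -1.0, 2.0])
E = rotation_z(0.3) @ skew(t)
singular_values = np.linalg.svd(E, compute_uv=False)  # in descending order
det_E, cubic_norm = internal_constraint_residuals(E)

# A generic 3 x 3 matrix is not an essential matrix
det_generic, cubic_generic = internal_constraint_residuals(np.diag([1.0, 2.0, 3.0]))
```

For diag(1, 2, 3) the cubic expression is diag(-12, -12, 12), so its norm is \sqrt{432} rather than zero.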


Estimation

Given a set of corresponding image points, it is possible to estimate an essential matrix which satisfies the defining epipolar constraint for all the points in the set. However, if the image points are subject to noise, which is the common case in any practical situation, it is not possible to find an essential matrix which satisfies all constraints exactly. Depending on how the error related to each constraint is measured, it is possible to determine or estimate an essential matrix which optimally satisfies the constraints for a given set of corresponding image points. The most straightforward approach is to set up a total least squares problem, commonly known as the eight-point algorithm.
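The total least squares formulation can be sketched as follows, assuming NumPy, noise-free synthetic data, and a hypothetical function name `eight_point_essential`. Each correspondence contributes one row, the Kronecker product of \mathbf{y}' and \mathbf{y}, to a homogeneous linear system in the nine entries of \mathbf{E}; the unit-norm minimizer is the right singular vector for the smallest singular value. The final projection onto singular values (1, 1, 0) is an extra step assumed here to enforce the internal constraints; it is not part of the least squares problem itself.

```python
import numpy as np


def skew(t):
    """Matrix representation [t]_x of the cross product with t."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])


def eight_point_essential(y, y_prime):
    """Estimate E (up to scale) from N >= 8 correspondences.

    y, y_prime: N x 3 arrays of homogeneous normalized image coordinates.
    """
    # (y'_k)^T E y_k = 0 is linear in the 9 entries of E (row-major order)
    A = np.array([np.kron(yp, yk) for yk, yp in zip(y, y_prime)])
    # Total least squares: the unit vector e minimizing |A e| is the last right singular vector
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)
    # Assumed extra step: enforce the internal constraints (singular values 1, 1, 0)
    U, _, Vt = np.linalg.svd(E)
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt


# Synthetic, noise-free example (assumed pose and scene), convention x' = R (x - t)
rng = np.random.default_rng(0)
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t = np.array([1.0, 0.3, -0.2])
X = np.column_stack([rng.uniform(-2, 2, 20), rng.uniform(-2, 2, 20), rng.uniform(4, 10, 20)])
X_prime = (X - t) @ R.T          # each row is R (x - t)
y = X / X[:, 2:3]
y_prime = X_prime / X_prime[:, 2:3]

E_est = eight_point_essential(y, y_prime)
E_true = R @ skew(t)
```

With noise-free data the estimate matches \mathbf{R}[\mathbf{t}]_\times up to scale and sign; with noisy data the residuals (\mathbf{y}')^\top \mathbf{E} \mathbf{y} are only approximately zero.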


Extracting rotation and translation

Given that the essential matrix has been determined for a stereo camera pair (for example, using the estimation method above), this information can also be used for determining the rotation \mathbf{R} and translation \mathbf{t} (up to a scaling) between the two cameras' coordinate systems. In these derivations, \mathbf{E} is seen as a projective element rather than having a well-determined scaling.


Finding one solution

The following method for determining \mathbf{R} and \mathbf{t} is based on performing a singular value decomposition (SVD) of \mathbf{E}; see Hartley & Zisserman's book. It is also possible to determine \mathbf{R} and \mathbf{t} without an SVD, for example, following Longuet-Higgins' paper. An SVD of \mathbf{E} gives
: \mathbf{E} = \mathbf{U} \, \mathbf{\Sigma} \, \mathbf{V}^\top
where \mathbf{U} and \mathbf{V} are orthogonal 3 \times 3 matrices and \mathbf{\Sigma} is a 3 \times 3 diagonal matrix with
: \mathbf{\Sigma} = \begin{pmatrix} s & 0 & 0 \\ 0 & s & 0 \\ 0 & 0 & 0 \end{pmatrix}
The diagonal entries of \mathbf{\Sigma} are the singular values of \mathbf{E}, which, according to the internal constraints of the essential matrix, must consist of two identical values and one zero value. Define
: \mathbf{W} = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}   with   \mathbf{W}^{-1} = \mathbf{W}^\top = \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}
and make the following ansatz:
: [\mathbf{t}]_\times = \mathbf{V} \, \mathbf{W} \, \mathbf{\Sigma} \, \mathbf{V}^\top
: \mathbf{R} = \mathbf{U} \, \mathbf{W}^{-1} \, \mathbf{V}^\top
Since \mathbf{\Sigma} may not completely fulfill the constraints when dealing with real-world data (e.g., camera images), the alternative
: [\mathbf{t}]_\times = \mathbf{V} \, \mathbf{Z} \, \mathbf{V}^\top   with   \mathbf{Z} = \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
may help.
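The ansatz can be sketched in NumPy as follows; `decompose_essential` is a hypothetical name and the example pose is an assumption. The sign flips on \mathbf{U} and \mathbf{V} change \mathbf{E} at most by a sign, which is allowed because \mathbf{E} is a projective element, and they make \det(\mathbf{R}) = 1, as the proof below requires. Averaging the two largest singular values is one simple way, assumed here, to build \mathbf{\Sigma} when they are not exactly equal.

```python
import numpy as np

W = np.array([[0.0, -1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])


def skew(t):
    """Matrix representation [t]_x of the cross product with t."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])


def decompose_essential(E):
    """One (R, [t]_x, t) with R [t]_x = +/-E, following the SVD ansatz."""
    U, S, Vt = np.linalg.svd(E)
    # Flipping U or V changes E only by a sign and makes det(R) = +1
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    s = 0.5 * (S[0] + S[1])            # the two nonzero singular values, ideally equal
    Sigma = np.diag([s, s, 0.0])
    R = U @ W.T @ Vt                   # R = U W^-1 V^T, since W^-1 = W^T
    t_cross = Vt.T @ W @ Sigma @ Vt    # [t]_x = V W Sigma V^T
    t = np.array([t_cross[2, 1], t_cross[0, 2], t_cross[1, 0]])  # read t off [t]_x
    return R, t_cross, t


# Assumed example pose: rotation about the second axis plus a translation
angle = 0.2
R_true = np.array([[np.cos(angle), 0.0, np.sin(angle)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(angle), 0.0, np.cos(angle)]])
t_true = np.array([0.5, -1.0, 0.25])
E = R_true @ skew(t_true)

R, t_cross, t = decompose_essential(E)
```

The returned pair is one member of the four solution classes; the recovered \mathbf{t} equals \pm\mathbf{t}_{true} here only because this \mathbf{E} has its natural scaling.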


Proof

First, these expressions for \mathbf{R} and [\mathbf{t}]_\times do satisfy the defining equation for the essential matrix:
: \mathbf{R} \, [\mathbf{t}]_\times = \mathbf{U} \, \mathbf{W}^{-1} \, \mathbf{V}^\top \, \mathbf{V} \, \mathbf{W} \, \mathbf{\Sigma} \, \mathbf{V}^\top = \mathbf{U} \, \mathbf{\Sigma} \, \mathbf{V}^\top = \mathbf{E}
Second, it must be shown that this [\mathbf{t}]_\times is a matrix representation of the cross product for some \mathbf{t}. Since
: \mathbf{W} \, \mathbf{\Sigma} = \begin{pmatrix} 0 & -s & 0 \\ s & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
it is the case that \mathbf{W} \, \mathbf{\Sigma} is skew-symmetric, i.e., (\mathbf{W} \, \mathbf{\Sigma})^\top = -\mathbf{W} \, \mathbf{\Sigma}. This is also the case for our [\mathbf{t}]_\times, since
: ([\mathbf{t}]_\times)^\top = \mathbf{V} \, (\mathbf{W} \, \mathbf{\Sigma})^\top \, \mathbf{V}^\top = -\mathbf{V} \, \mathbf{W} \, \mathbf{\Sigma} \, \mathbf{V}^\top = -[\mathbf{t}]_\times
According to the general properties of the matrix representation of the cross product, it then follows that [\mathbf{t}]_\times must be the cross product operator of exactly one vector \mathbf{t}. Third, it must also be shown that the above expression for \mathbf{R} is a rotation matrix. It is the product of three matrices which are all orthogonal, which means that \mathbf{R}, too, is orthogonal, or \det(\mathbf{R}) = \pm 1. To be a proper rotation matrix it must also satisfy \det(\mathbf{R}) = 1. Since, in this case, \mathbf{E} is seen as a projective element, this can be accomplished by reversing the sign of \mathbf{E} if necessary.


Finding all solutions

So far one possible solution for \mathbf{R} and \mathbf{t} has been established given \mathbf{E}. It is, however, not the only possible solution, and it may not even be a valid solution from a practical point of view. To begin with, since the scaling of \mathbf{E} is undefined, the scaling of \mathbf{t} is also undefined. It must lie in the null space of \mathbf{E}, since
: \mathbf{E} \, \mathbf{t} = \mathbf{R} \, [\mathbf{t}]_\times \, \mathbf{t} = \mathbf{0}
For the subsequent analysis of the solutions, however, the exact scaling of \mathbf{t} is not as important as its "sign", i.e., in which direction it points. Let \hat{\mathbf{t}} be a normalized vector in the null space of \mathbf{E}. It is then the case that both \hat{\mathbf{t}} and -\hat{\mathbf{t}} are valid translation vectors relative to \mathbf{E}. It is also possible to change \mathbf{W} into \mathbf{W}^{-1} in the derivations of \mathbf{R} and \mathbf{t} above. For the translation vector this only causes a change of sign, which has already been described as a possibility. For the rotation, on the other hand, this will produce a different transformation, at least in the general case.

To summarize, given \mathbf{E} there are two opposite directions which are possible for \mathbf{t} and two different rotations which are compatible with this essential matrix. In total this gives four classes of solutions for the rotation and translation between the two camera coordinate systems. On top of that, there is also an unknown scaling s > 0 for the chosen translation direction.

It turns out, however, that only one of the four classes of solutions can be realized in practice. Given a pair of corresponding image coordinates, three of the solutions will always produce a 3D point which lies ''behind'' at least one of the two cameras and therefore cannot be seen. Only one of the four classes will consistently produce 3D points which are in front of both cameras. This must then be the correct solution. Still, it has an undetermined positive scaling related to the translation component.

The above determination of \mathbf{R} and \mathbf{t} assumes that \mathbf{E} satisfies the internal constraints of the essential matrix. If this is not the case, as typically happens when \mathbf{E} has been estimated from real (and noisy) image data, it has to be assumed that it approximately satisfies the internal constraints. The vector \hat{\mathbf{t}} is then chosen as the right singular vector of \mathbf{E} corresponding to the smallest singular value.
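Enumerating the four classes and keeping the one that places points in front of both cameras can be sketched as below, assuming NumPy, noise-free synthetic data, and hypothetical helper names (`candidate_poses`, `depths`, `select_pose`). The depths of a correspondence come from least squares on x_3 \mathbf{R}\mathbf{y} - x'_3 \mathbf{y}' = \mathbf{R}\mathbf{t}, which follows from \tilde{\mathbf{x}}' = \mathbf{R}(\tilde{\mathbf{x}} - \mathbf{t}). Counting the points with both depths positive, rather than testing a single pair, is a design choice assumed here to tolerate noise.

```python
import numpy as np

W = np.array([[0.0, -1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])


def skew(t):
    """Matrix representation [t]_x of the cross product with t."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])


def candidate_poses(E):
    """The four (R, t_hat) classes: two rotations times two directions of t."""
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0:      # sign flips keep det(R) = +1; E is projective
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    t_hat = Vt[2]                 # right singular vector for the smallest singular value
    rotations = (U @ W.T @ Vt, U @ W @ Vt)   # the W^-1 and W variants
    return [(R, sign * t_hat) for R in rotations for sign in (1.0, -1.0)]


def depths(R, t, y, y_prime):
    """Least-squares depths (x_3, x'_3) of one correspondence under x' = R (x - t)."""
    # x = x_3 y and x' = x'_3 y' give x_3 R y - x'_3 y' = R t (3 equations, 2 unknowns)
    A = np.column_stack([R @ y, -y_prime])
    (depth, depth_prime), *_ = np.linalg.lstsq(A, R @ t, rcond=None)
    return depth, depth_prime


def select_pose(E, y, y_prime):
    """Pick the candidate that places the most points in front of both cameras."""
    def points_in_front(pose):
        R, t = pose
        return sum(1 for yk, ypk in zip(y, y_prime) if min(depths(R, t, yk, ypk)) > 0)
    return max(candidate_poses(E), key=points_in_front)


# Synthetic, noise-free example (assumed pose and scene)
rng = np.random.default_rng(1)
angle = 0.15
R_true = np.array([[np.cos(angle), 0.0, np.sin(angle)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(angle), 0.0, np.cos(angle)]])
t_true = np.array([1.0, -0.2, 0.3])
X = np.column_stack([rng.uniform(-2, 2, 10), rng.uniform(-2, 2, 10), rng.uniform(4, 8, 10)])
X_prime = (X - t_true) @ R_true.T   # each row is R (x - t)
y = X / X[:, 2:3]
y_prime = X_prime / X_prime[:, 2:3]

R_best, t_best = select_pose(R_true @ skew(t_true), y, y_prime)
```

The selected translation is a unit vector, so the positive scaling of \mathbf{t} remains undetermined, as the text states.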


3D points from corresponding image points

Many methods exist for computing (x_1, x_2, x_3) given corresponding normalized image coordinates (y_1, y_2) and (y'_1, y'_2), if the essential matrix is known and the corresponding rotation and translation transformations have been determined.
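One of the simplest of these methods is midpoint triangulation, chosen here for illustration and not prescribed by the text: find the closest points on the two viewing rays in the least-squares sense and return their midpoint. The sketch assumes NumPy, the convention \tilde{\mathbf{x}}' = \mathbf{R}(\tilde{\mathbf{x}} - \mathbf{t}) from the derivation, and a hypothetical helper name `triangulate_midpoint`. Because \mathbf{t} recovered from the essential matrix is known only up to scale, the reconstructed point inherits the same scale.

```python
import numpy as np


def triangulate_midpoint(R, t, y, y_prime):
    """Estimate (x_1, x_2, x_3) in camera-1 coordinates from one correspondence.

    Assumes x' = R (x - t): camera 1 is at the origin, camera 2's center is t,
    and camera 2's viewing ray through y' has direction R^T y' in camera-1 coordinates.
    """
    d1, d2 = y, R.T @ y_prime
    # Closest points lam * d1 and t + mu * d2 on the two rays: lam * d1 - mu * d2 ~= t
    A = np.column_stack([d1, -d2])
    (lam, mu), *_ = np.linalg.lstsq(A, t, rcond=None)
    return 0.5 * (lam * d1 + t + mu * d2)


# Example (assumed pose): recover a known point from noise-free projections
angle = 0.1
R = np.array([[np.cos(angle), -np.sin(angle), 0.0],
              [np.sin(angle), np.cos(angle), 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([1.0, 0.0, 0.2])
x = np.array([0.3, -0.4, 6.0])
x_prime = R @ (x - t)
x_hat = triangulate_midpoint(R, t, x / x[2], x_prime / x_prime[2])
```

Doubling \mathbf{t} while keeping the same image coordinates doubles the reconstructed point, which illustrates the undetermined positive scaling discussed above.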


See also

* Bundle adjustment
* Epipolar geometry
* Fundamental matrix
* Geometric camera calibration
* Triangulation (computer vision)
* Trifocal tensor


Toolboxes


Essential Matrix Estimation in MATLAB (Manolis Lourakis).


External links


An Investigation of the Essential Matrix by R.I. Hartley

