Computer stereo vision
Computer stereo vision is the extraction of 3D information from digital images, such as those obtained by a CCD camera. By comparing information about a scene from two vantage points, 3D information can be extracted by examining the relative positions of objects in the two images. This is similar to the biological process of stereopsis.


Outline

In traditional stereo vision, two cameras, displaced horizontally from one another, are used to obtain two differing views on a scene, in a manner similar to human binocular vision. By comparing these two images, the relative depth information can be obtained in the form of a disparity map, which encodes the difference in horizontal coordinates of corresponding image points. The values in this disparity map are inversely proportional to the scene depth at the corresponding pixel location.

For a human to compare the two images, they must be superimposed in a stereoscopic device, with the image from the right camera being shown to the observer's right eye and from the left one to the left eye. In a computer vision system, several pre-processing steps are required (a sketch of the pipeline is given after this list).
# The image must first be undistorted, such that barrel distortion and tangential distortion are removed. This ensures that the observed image matches the projection of an ideal pinhole camera.
# The image must be projected back to a common plane to allow comparison of the image pairs, known as image rectification.
# An information measure which compares the two images is minimized. This gives the best estimate of the position of features in the two images and creates a disparity map.
# Optionally, the computed disparity map is projected into a 3D point cloud. By utilising the cameras' projective parameters, the point cloud can be computed such that it provides measurements at a known scale.
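The following is a minimal sketch of these four steps using the OpenCV library in Python. The calibration parameters (camera matrices K1 and K2, distortion coefficients d1 and d2, the rotation R and translation T between the cameras), the image file names, and the matcher settings are all illustrative placeholders rather than values from this article.

```python
# Sketch of the stereo pre-processing pipeline with OpenCV (placeholder inputs).
import cv2
import numpy as np

# Placeholder calibration results, assumed to come from a prior calibration step.
K1 = np.array([[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]])
K2 = K1.copy()
d1 = np.zeros(5)                       # distortion coefficients, left camera
d2 = np.zeros(5)                       # distortion coefficients, right camera
R = np.eye(3)                          # rotation between the two cameras
T = np.array([[-0.1], [0.0], [0.0]])   # 10 cm horizontal baseline
size = (640, 480)

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)     # placeholder file names
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Steps 1 and 2: undistort both images and project them onto a common plane.
R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, d1, K2, d2, size, R, T)
m1x, m1y = cv2.initUndistortRectifyMap(K1, d1, R1, P1, size, cv2.CV_32FC1)
m2x, m2y = cv2.initUndistortRectifyMap(K2, d2, R2, P2, size, cv2.CV_32FC1)
left_r = cv2.remap(left, m1x, m1y, cv2.INTER_LINEAR)
right_r = cv2.remap(right, m2x, m2y, cv2.INTER_LINEAR)

# Step 3: minimise a matching cost along each scanline to obtain a disparity map.
matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=9)
disparity = matcher.compute(left_r, right_r).astype(np.float32) / 16.0

# Step 4 (optional): reproject the disparity map to a metric 3D point cloud.
points_3d = cv2.reprojectImageTo3D(disparity, Q)
```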


Active stereo vision

Active stereo vision is a form of stereo vision which actively employs a light source, such as a laser or structured light, to simplify the stereo matching problem. The opposing term is passive stereo vision.


Conventional structured-light vision (SLV)

Conventional structured-light vision (SLV) employs structured light or a laser, and finds projector-camera correspondences.


Conventional active stereo vision (ASV)

Conventional active stereo vision (ASV) employs structured light or a laser; however, the stereo matching is performed only for camera-camera correspondences, in the same way as in passive stereo vision.


Structured-light stereo (SLS)

Structured-light stereo (SLS) is a hybrid technique which utilizes both camera-camera and projector-camera correspondences (W. Jang, C. Je, Y. Seo, and S. W. Lee, "Structured-Light Stereo: Comparative Analysis and Integration of Structured-Light and Active Stereo for Measuring Dynamic Shape", Optics and Lasers in Engineering, Volume 51, Issue 11, pp. 1255–1264, November 2013).


Applications

3D stereo displays find many applications in entertainment, information transfer and automated systems. Stereo vision is highly important in fields such as robotics to extract information about the relative position of 3D objects in the vicinity of autonomous systems. Other applications for robotics include object recognition, where depth information allows the system to separate occluding image components, such as one chair in front of another, which the robot may otherwise not be able to distinguish as a separate object by any other criteria.

Scientific applications for digital stereo vision include the extraction of information from aerial surveys, calculation of contour maps, geometry extraction for 3D building mapping, photogrammetric satellite mapping, and calculation of 3D heliographic information such as that obtained by the NASA STEREO project.


Detailed definition

A pixel records color at a position. The position is identified by its location in the grid of pixels (x, y) and the depth to the pixel, ''z''. Stereoscopic vision gives two images of the same scene, from different positions. In the adjacent diagram, light from the point ''A'' is transmitted through the entry points of pinhole cameras at ''B'' and ''D'', onto image screens at ''E'' and ''H''.

In the attached diagram the distance between the centers of the two camera lenses is ''BD = BC + CD''. The triangles are similar,
* ''ACB'' and ''BFE''
* ''ACD'' and ''DGH''
:\begin{align} d &= EF + GH \\ &= BF\left(\frac{EF}{BF} + \frac{GH}{BF}\right) \\ &= BF\left(\frac{BC}{AC} + \frac{CD}{AC}\right) \\ &= BF\,\frac{BC + CD}{AC} \\ &= BF\,\frac{BD}{AC} \\ &= \frac{k}{z} \end{align}
where
* ''k = BD · BF''
* ''z = AC'' is the distance from the camera plane to the object.

So assuming the cameras are level, and the image planes are flat on the same plane, the displacement in the y axis between the same pixel in the two images is,
:d = \frac{k}{z}
where ''k'' is the distance between the two cameras times the distance from the lens to the image.

The depth components in the two images, z_1 and z_2, are given by,
:z_2(x, y) = \min \left\{ v : v = z_1\left(x, y + \frac{k}{v}\right) \right\}
:z_1(x, y) = \min \left\{ v : v = z_2\left(x, y - \frac{k}{v}\right) \right\}
These formulas allow for the occlusion of voxels, seen in one image on the surface of the object, by closer voxels seen in the other image on the surface of the object.
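To make the relation ''d = k/z'' concrete, the following short Python sketch evaluates it for a few made-up numbers; the focal length, baseline and disparity values are purely illustrative.

```python
# Depth from disparity, d = k / z, with k = baseline * focal length (illustrative values).
focal_length_px = 700.0   # distance from the lens to the image plane, in pixels
baseline_m = 0.10         # distance between the two cameras, in metres
k = baseline_m * focal_length_px

for disparity_px in (70.0, 35.0, 7.0):
    z = k / disparity_px  # depth is inversely proportional to disparity
    print(f"disparity {disparity_px:5.1f} px  ->  depth {z:5.1f} m")
# Halving the disparity doubles the estimated depth, as d = k/z predicts.
```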


Image rectification

Where the image planes are not co-planar, image rectification is required to adjust the images as if they were co-planar. This may be achieved by a linear transformation. The images may also need rectification to make each image equivalent to the image taken from a pinhole camera projecting to a flat plane.
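When the cameras are not calibrated, one common way to obtain such a transformation is to estimate the fundamental matrix from point correspondences and derive a rectifying homography for each image from it. The sketch below assumes OpenCV and matched point lists that would, in practice, come from a feature matcher; the function name is an illustrative choice, not a standard API.

```python
# Uncalibrated rectification: estimate the fundamental matrix from point matches,
# then derive homographies that warp both images onto a common plane.
import cv2
import numpy as np

def rectify_uncalibrated(left_img, right_img, pts_left, pts_right):
    """pts_left, pts_right: Nx2 float32 arrays of matched points between the images."""
    h, w = left_img.shape[:2]
    F, inlier_mask = cv2.findFundamentalMat(pts_left, pts_right, cv2.FM_RANSAC)
    ok, H1, H2 = cv2.stereoRectifyUncalibrated(pts_left, pts_right, F, (w, h))
    if not ok:
        raise RuntimeError("rectification failed")
    left_rect = cv2.warpPerspective(left_img, H1, (w, h))
    right_rect = cv2.warpPerspective(right_img, H2, (w, h))
    return left_rect, right_rect
```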


Smoothness

Smoothness is a measure of the similarity of colors. Given the assumption that a distinct object has a small number of colors, similarly-colored pixels are more likely to belong to a single object than to multiple objects. The method described below for evaluating smoothness is based on information theory, and on the assumption that the color of a voxel influences the color of nearby voxels according to a normal distribution in the distance between points. The model is based on approximate assumptions about the world. Another method based on prior assumptions of smoothness is auto-correlation.

Smoothness is a property of the world rather than an intrinsic property of an image. An image comprising random dots would have no smoothness, and inferences about neighboring points would be useless. In principle, smoothness, as with other properties of the world, should be learned. This appears to be what the human vision system does.


Information measure


Least squares information measure

The normal distribution is
:P(x, \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}
Probability is related to information content described by message length ''L'',
:P(x) = 2^{-L(x)}
:L(x) = -\log_2 P(x)
so,
:L(x, \mu, \sigma) = \log_2(\sigma\sqrt{2\pi}) + \frac{(x - \mu)^2}{2\sigma^2} \log_2 e
For the purposes of comparing stereoscopic images, only the relative message length matters. Based on this, the information measure ''I'', called the Sum of Squares of Differences (SSD), is
:I(x, \mu, \sigma) = \frac{(x - \mu)^2}{\sigma^2}
where,
:L(x, \mu, \sigma) = \log_2(\sigma\sqrt{2\pi}) + I(x, \mu, \sigma) \frac{\log_2 e}{2}
Because of the cost in processing time of squaring numbers in SSD, many implementations use the Sum of Absolute Differences (SAD) as the basis for computing the information measure. Other methods use normalized cross correlation (NCC).
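The following NumPy sketch shows the three matching costs side by side for a pair of small grayscale patches; the function names and toy patches are illustrative, not taken from this article.

```python
# Patch-matching costs used in stereo correspondence (illustrative sketch).
import numpy as np

def ssd(a, b):
    """Sum of Squares of Differences."""
    d = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sum(d * d))

def sad(a, b):
    """Sum of Absolute Differences: avoids the cost of squaring."""
    return float(np.sum(np.abs(a.astype(np.float64) - b.astype(np.float64))))

def ncc(a, b):
    """Normalized cross correlation: tolerant of brightness and contrast changes."""
    a = a.astype(np.float64) - a.mean()
    b = b.astype(np.float64) - b.mean()
    return float(np.sum(a * b) / (np.sqrt(np.sum(a * a)) * np.sqrt(np.sum(b * b))))

# Toy patches: b is a brightened copy of a, c is unrelated noise.
rng = np.random.default_rng(0)
a = rng.integers(0, 255, (9, 9)).astype(np.float64)
b = np.clip(a * 1.1 + 10.0, 0, 255)
c = rng.integers(0, 255, (9, 9)).astype(np.float64)

print("SSD a~b:", ssd(a, b), "  a~c:", ssd(a, c))
print("SAD a~b:", sad(a, b), "  a~c:", sad(a, c))
print("NCC a~b:", round(ncc(a, b), 3), "  a~c:", round(ncc(a, c), 3))
```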


Information measure for stereoscopic images

The least squares measure may be used to measure the information content of the stereoscopic images, given depths at each point z(x, y). Firstly the information needed to express one image in terms of the other is derived. This is called I_m. A color difference function should be used to fairly measure the difference between colors. The color difference function is written ''cd'' in the following. The measure of the information needed to record the color matching between the two images is,
:I_m(z_1, z_2) = \frac{1}{2} \sum_{x, y} \operatorname{cd}\left(\operatorname{color}_1\left(x, y + \frac{k}{z_2(x, y)}\right), \operatorname{color}_2(x, y)\right)^2

An assumption is made about the smoothness of the image. Assume that two pixels are more likely to be the same color, the closer the voxels they represent are. This measure is intended to favor colors that are similar being grouped at the same depth. For example, if an object in front occludes an area of sky behind, the measure of smoothness favors the blue pixels all being grouped together at the same depth.

The total measure of smoothness uses the distance between voxels as an estimate of the expected standard deviation of the color difference,
:I_s(z_1, z_2) = \frac{1}{2} \sum_{i \in \{1, 2\}} \sum_{x_1, y_1} \sum_{x_2, y_2} \frac{\operatorname{cd}(\operatorname{color}_i(x_1, y_1), \operatorname{color}_i(x_2, y_2))^2}{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_i(x_1, y_1) - z_i(x_2, y_2))^2}

The total information content is then the sum,
:I_t(z_1, z_2) = I_m(z_1, z_2) + I_s(z_1, z_2)
The z component of each pixel must be chosen to give the minimum value for the information content. This will give the most likely depths at each pixel. The minimum total information measure is,
:I_\min = \min_{z_1, z_2} \left\{ I_t(z_1, z_2) \right\}
The depth functions for the left and right images are the pair,
:(z_1, z_2) \in \left\{ (z_1, z_2) : I_t(z_1, z_2) = I_\min \right\}
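The sketch below is a deliberately simplified analog of this measure for a single pair of 1-D scanlines: it works directly with integer disparities rather than depths, uses squared grayscale difference in place of the color difference function ''cd'', and restricts the smoothness term to adjacent pixels. It only illustrates the structure of a combined matching-plus-smoothness cost and its exhaustive minimization; all names and values are illustrative.

```python
# Brute-force minimisation of an I_m + I_s style measure on tiny 1-D "images".
from itertools import product

left  = [10, 10, 200, 200, 200, 10]   # toy scanline with a bright object in the middle
right = [10, 200, 200, 200, 10, 10]   # same scene, object shifted by one pixel
max_disp = 2

def total_information(disp):
    # Matching term: compare each right-image pixel with its left-image correspondence.
    i_m = 0.0
    for x, d in enumerate(disp):
        xl = min(x + d, len(left) - 1)
        i_m += 0.5 * (left[xl] - right[x]) ** 2
    # Smoothness term: colour difference between neighbours, divided by the squared
    # distance between the voxels they represent (approximated here by the disparity gap).
    i_s = 0.0
    for x in range(len(disp) - 1):
        cd2 = (right[x] - right[x + 1]) ** 2
        dist2 = 1 + (disp[x] - disp[x + 1]) ** 2
        i_s += 0.5 * cd2 / dist2
    return i_m + i_s

# Exhaustive search over all disparity assignments (exponential; illustration only).
best = min(product(range(max_disp + 1), repeat=len(right)), key=total_information)
print("lowest-cost disparities per pixel:", best)
```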


Methods of implementation

The minimization problem is NP-complete, which means that a general, exact solution will take a long time to compute. However, methods exist that approximate the result in a reasonable amount of time using heuristics, as do methods based on neural networks. Efficient implementation of stereoscopic vision is an area of active research.
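One widely used family of heuristics is local block matching: each pixel independently receives the disparity whose surrounding window gives the lowest matching cost (a winner-take-all choice), instead of minimizing the global measure. The NumPy sketch below uses SAD as the cost; the function name, window size and search range are illustrative.

```python
# Local winner-take-all block matching with a SAD cost (heuristic, not a global optimum).
import numpy as np

def block_match(left, right, max_disp=32, half_win=3):
    """Return an integer disparity map for rectified grayscale images left and right."""
    h, w = left.shape
    left = left.astype(np.float32)
    right = right.astype(np.float32)
    disparity = np.zeros((h, w), dtype=np.int32)
    for y in range(half_win, h - half_win):
        for x in range(half_win + max_disp, w - half_win):
            patch = left[y - half_win:y + half_win + 1, x - half_win:x + half_win + 1]
            best_cost, best_d = np.inf, 0
            for d in range(max_disp + 1):
                cand = right[y - half_win:y + half_win + 1,
                             x - d - half_win:x - d + half_win + 1]
                cost = np.sum(np.abs(patch - cand))   # SAD over the window
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disparity[y, x] = best_d
    return disparity
```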


See also

* 3D reconstruction from multiple images
* 3D scanner
* Autostereoscopy
* Computer vision
* Epipolar geometry
* Semi-global matching
* Structure from motion
* Stereo camera
* Stereophotogrammetry
* Stereopsis
* Stereoscopic depth rendition
* Stixel
* Trifocal tensor - for trifocal stereoscopy (using three images instead of two)


References


External links


* Tutorial on uncalibrated stereo vision
* Stereo Vision and Rover Navigation Software for Planetary Exploration
* Calculator for choosing stereo baseline and focal length, and for computing expected depth measurement errors
* LIBELAS: Library for Efficient Large-scale Stereo Matching
* Viva3D Stereo Vision manual & depth tutorial