Computer stereo vision

Computer stereo vision is the extraction of 3D information from digital images, such as those obtained by a CCD camera. By comparing information about a scene from two vantage points, 3D information can be extracted by examining the relative positions of objects in the two images. This is similar to the biological process of stereopsis.


Outline

In traditional stereo vision, two cameras, displaced horizontally from one another, are used to obtain two differing views of a scene, in a manner similar to human binocular vision. By comparing these two images, the relative depth information can be obtained in the form of a disparity map, which encodes the difference in horizontal coordinates of corresponding image points. The values in this disparity map are inversely proportional to the scene depth at the corresponding pixel location. For a human to compare the two images, they must be superimposed in a stereoscopic device, with the image from the right camera shown to the observer's right eye and the image from the left camera shown to the left eye.

In a computer vision system, several pre-processing steps are required; a code sketch of this pipeline follows the list.
# The images must first be undistorted, so that barrel distortion and tangential distortion are removed. This ensures that the observed image matches the projection of an ideal pinhole camera.
# The images must be projected back to a common plane to allow comparison of the image pairs, a step known as image rectification.
# An information measure which compares the two images is minimized. This gives the best estimate of the position of features in the two images, and creates a disparity map.
# Optionally, the resulting disparity map is projected into a 3D point cloud. By utilising the cameras' projective parameters, the point cloud can be computed such that it provides measurements at a known scale.
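The following is a minimal sketch of these four steps using the OpenCV library in Python. The file names, calibration values (camera matrices, distortion coefficients, and the rotation and translation between the cameras) and matcher parameters are illustrative placeholders only; in practice they come from a prior stereo-calibration procedure.

import numpy as np
import cv2

# Illustrative inputs: a grey-scale image pair and placeholder calibration data.
# In a real system K1, K2, D1, D2, R and T come from stereo calibration.
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)
size = (left.shape[1], left.shape[0])                    # (width, height)

K1 = K2 = np.array([[700.0,   0.0, 320.0],               # focal length and
                    [  0.0, 700.0, 240.0],               # principal point, in pixels
                    [  0.0,   0.0,   1.0]])
D1 = D2 = np.zeros(5)                                    # lens distortion coefficients
R = np.eye(3)                                            # rotation between the cameras
T = np.array([-0.06, 0.0, 0.0])                          # 6 cm horizontal baseline

# Steps 1 and 2: undistort both images and rectify them onto a common plane.
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(K1, D1, K2, D2, size, R, T)
m1x, m1y = cv2.initUndistortRectifyMap(K1, D1, R1, P1, size, cv2.CV_32FC1)
m2x, m2y = cv2.initUndistortRectifyMap(K2, D2, R2, P2, size, cv2.CV_32FC1)
left_r = cv2.remap(left, m1x, m1y, cv2.INTER_LINEAR)
right_r = cv2.remap(right, m2x, m2y, cv2.INTER_LINEAR)

# Step 3: minimise a matching cost along each scanline to produce a disparity map
# (semi-global block matching; OpenCV returns disparities scaled by 16).
matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
disparity = matcher.compute(left_r, right_r).astype(np.float32) / 16.0

# Step 4 (optional): reproject the disparity map into a metric 3D point cloud
# using the reprojection matrix Q built from the cameras' projective parameters.
points_3d = cv2.reprojectImageTo3D(disparity, Q)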


Active stereo vision

Active stereo vision is a form of stereo vision which actively employs a light source, such as a laser or structured light, to simplify the stereo matching problem. The opposite term is passive stereo vision.
* Conventional structured-light vision (SLV) employs structured light or a laser and finds projector-camera correspondences.
* Conventional active stereo vision (ASV) employs structured light or a laser, but the stereo matching is performed only for camera-camera correspondences, in the same way as in passive stereo vision.
* Structured-light stereo (SLS) is a hybrid technique which uses both camera-camera and projector-camera correspondences.


Applications

3D stereo displays find many applications in entertainment, information transfer and automated systems. Stereo vision is highly important in fields such as robotics, where it is used to extract information about the relative position of 3D objects in the vicinity of autonomous systems. Other applications for robotics include object recognition, where depth information allows the system to separate occluding image components, such as one chair in front of another, which the robot may otherwise not be able to distinguish as a separate object by any other criterion. Scientific applications of digital stereo vision include the extraction of information from aerial surveys, the calculation of contour maps, geometry extraction for 3D building mapping, photogrammetric satellite mapping, and the calculation of 3D heliographic information such as that obtained by the NASA STEREO project.


Detailed definition

A pixel records color at a position. The position is identified by its coordinates in the grid of pixels (x, y) and the depth to the pixel, ''z''. Stereoscopic vision gives two images of the same scene, taken from different positions. In the adjacent diagram, light from the point ''A'' is transmitted through the entry points of pinhole cameras at ''B'' and ''D'', onto image screens at ''E'' and ''H''. In the attached diagram the distance between the centers of the two camera lenses is ''BD = BC + CD''. The triangles are similar:
* ''ACB'' and ''BFE''
* ''ACD'' and ''DGH''
Therefore the disparity ''d'' between the images of ''A'' on the two screens is
:\begin{align} d &= EF + GH \\ &= BF\left(\frac{EF}{BF} + \frac{GH}{BF}\right) \\ &= BF\left(\frac{BC}{AC} + \frac{CD}{AC}\right) \\ &= BF\left(\frac{BC + CD}{AC}\right) \\ &= BF\,\frac{BD}{AC} \\ &= \frac{k}{z} \end{align}
where
* ''k'' = ''BD'' · ''BF''
* ''z'' = ''AC'' is the distance from the camera plane to the object.
So, assuming the cameras are level and the image planes lie on the same plane, the displacement in the y axis between the same pixel in the two images is
:d = \frac{k}{z}
where ''k'' is the distance between the two cameras multiplied by the distance from the lens to the image. The depth components in the two images, z_1 and z_2, are given by
:z_2(x, y) = \min \left\{ z_1(x, w) : w = y + \frac{k}{z_1(x, w)} \right\}
:z_1(x, y) = \min \left\{ z_2(x, w) : w = y - \frac{k}{z_2(x, w)} \right\}
These formulas allow for the occlusion of voxels that are seen in one image, on the surface of the object, by closer voxels seen in the other image.
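As a concrete example of the relation ''d'' = ''k''/''z'', depth can be recovered from a measured disparity by inverting it, ''z'' = ''k''/''d''. A minimal Python sketch with illustrative numbers:

def depth_from_disparity(disparity, baseline, focal_length_px):
    """Invert d = k / z, with k = baseline * focal length (focal length in pixels)."""
    k = baseline * focal_length_px
    return k / disparity

# Illustrative values: a 0.06 m baseline and a 700-pixel focal length give
# k = 42; a disparity of 21 pixels then corresponds to a depth of 2.0 m.
print(depth_from_disparity(21.0, 0.06, 700.0))   # prints 2.0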


Image rectification

Where the image planes are not coplanar, image rectification is required to adjust the images as if they were coplanar. This may be achieved by a projective linear transformation (a homography) applied to each image. The images may also need rectification to make each image equivalent to the image taken from a pinhole camera projecting to a flat plane.
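When the camera calibration is not known, the rectifying transformations can be estimated from point correspondences alone. The sketch below does this with OpenCV's uncalibrated rectification, which returns one homography per image; the feature detector, matching strategy and file names are illustrative choices, not part of the method described above.

import numpy as np
import cv2

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Match features between the two images (ORB keypoints, brute-force Hamming matching).
orb = cv2.ORB_create(2000)
kp1, des1 = orb.detectAndCompute(left, None)
kp2, des2 = orb.detectAndCompute(right, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# Estimate the fundamental matrix, then one rectifying homography per image.
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)
size = (left.shape[1], left.shape[0])
ok, H1, H2 = cv2.stereoRectifyUncalibrated(pts1[mask.ravel() == 1],
                                           pts2[mask.ravel() == 1], F, size)

# Warp both images so that corresponding points lie on the same image rows.
left_rect = cv2.warpPerspective(left, H1, size)
right_rect = cv2.warpPerspective(right, H2, size)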


Smoothness

Smoothness is a measure of the similarity of colors. Given the assumption that a distinct object has a small number of colors, similarly colored pixels are more likely to belong to a single object than to multiple objects. The method used here for evaluating smoothness is based on information theory and on the assumption that the color of a voxel influences the color of nearby voxels according to a normal distribution in the distance between points. The model is based on approximate assumptions about the world. Another method based on prior assumptions of smoothness is autocorrelation. Smoothness is a property of the world rather than an intrinsic property of an image. An image comprising random dots would have no smoothness, and inferences about neighboring points would be useless. In principle, smoothness, as with other properties of the world, should be learned. This appears to be what the human vision system does.


Information measure


Least squares information measure

The normal distribution is
:P(x, \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}
Probability is related to information content, described by the message length ''L'', by
:P(x) = 2^{-L(x)}
:L(x) = -\log_2 P(x)
so,
:L(x, \mu, \sigma) = \log_2\left(\sigma\sqrt{2\pi}\right) + \frac{(x - \mu)^2}{2\sigma^2} \log_2 e
For the purposes of comparing stereoscopic images, only the relative message length matters. Based on this, the information measure ''I'', called the sum of squared differences (SSD), is
:I(x, \mu, \sigma) = \frac{(x - \mu)^2}{2\sigma^2}
where
:L(x, \mu, \sigma) = \log_2\left(\sigma\sqrt{2\pi}\right) + I(x, \mu, \sigma) \log_2 e
Because of the cost in processing time of squaring numbers in SSD, many implementations use the sum of absolute differences (SAD) as the basis for computing the information measure. Other methods use normalized cross-correlation (NCC).
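A minimal numpy sketch of the three measures applied to a pair of equally sized image patches; the patch contents are whatever the surrounding matcher supplies.

import numpy as np

def ssd(a, b):
    """Sum of squared differences: a matching cost, lower is better."""
    d = a.astype(float) - b.astype(float)
    return np.sum(d * d)

def sad(a, b):
    """Sum of absolute differences: cheaper than SSD, as no squaring is needed."""
    return np.sum(np.abs(a.astype(float) - b.astype(float)))

def ncc(a, b):
    """Normalized cross-correlation: a similarity score, higher is better."""
    a = a.astype(float) - a.mean()
    b = b.astype(float) - b.mean()
    return np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b))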


Information measure for stereoscopic images

The least squares measure may be used to measure the information content of the stereoscopic images, given depths at each point z(x, y). Firstly the information needed to express one image in terms of the other is derived. This is called I_m. A color difference function should be used to fairly measure the difference between colors. The color difference function is written ''cd'' in the following. The measure of the information needed to record the color matching between the two images is
:I_m(z_1, z_2) = \frac{1}{2\sigma^2} \sum_{x, y} \operatorname{cd}\left(\operatorname{color}_1\left(x, y + \frac{k}{z_2(x, y)}\right), \operatorname{color}_2(x, y)\right)^2
where \sigma is the expected standard deviation of the color difference between matching pixels. An assumption is made about the smoothness of the image. Assume that two pixels are more likely to have the same color the closer the voxels they represent are to each other. This measure is intended to favor colors that are similar being grouped at the same depth. For example, if an object in front occludes an area of sky behind, the measure of smoothness favors the blue pixels all being grouped together at the same depth. The total measure of smoothness uses the distance between voxels as an estimate of the expected standard deviation of the color difference,
:I_s(z_1, z_2) = \frac{1}{2} \sum_{i \in \{1, 2\}} \sum_{x_1, y_1} \sum_{x_2, y_2} \frac{\operatorname{cd}\left(\operatorname{color}_i(x_1, y_1), \operatorname{color}_i(x_2, y_2)\right)^2}{(x_1 - x_2)^2 + (y_1 - y_2)^2 + \left(z_i(x_1, y_1) - z_i(x_2, y_2)\right)^2}
The total information content is then the sum,
:I_t(z_1, z_2) = I_m(z_1, z_2) + I_s(z_1, z_2)
The z component of each pixel must be chosen to give the minimum value for the information content. This will give the most likely depths at each pixel. The minimum total information measure is
:I_{\min} = \min_{z_1, z_2} I_t(z_1, z_2)
The depth functions for the left and right images are the pair
:(z_1, z_2) \in \left\{ (z_1, z_2) : I_t(z_1, z_2) = I_{\min} \right\}
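Below is a sketch of the matching term I_m for grey-scale images, taking the squared grey-level difference as the color difference function ''cd''. The depth map z_2, the constant ''k'' and the standard deviation sigma are assumed to be supplied by the caller, and the nearest-pixel lookup is a simplification.

import numpy as np

def matching_information(img1, img2, z2, k, sigma):
    """I_m: information needed to express image 2 in terms of image 1.

    Each pixel (x, y) of image 2 is compared with the pixel of image 1
    displaced by the disparity d = k / z2(x, y) along the second axis
    (nearest-pixel lookup, clipped at the image border). z2 is assumed
    to hold strictly positive depths.
    """
    h, w = img2.shape
    xs, ys = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    shifted = np.clip(np.rint(ys + k / z2).astype(int), 0, w - 1)
    cd = img1[xs, shifted].astype(float) - img2.astype(float)
    return np.sum(cd ** 2) / (2.0 * sigma ** 2)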


Methods of implementation

The minimization problem is NP-complete. This means that a general solution to this problem will take a long time to reach. However, methods exist for computers based on heuristics that approximate the result in a reasonable amount of time. Methods based on neural networks also exist. Efficient implementation of stereoscopic vision is an area of active research.
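For illustration, the sketch below is one of the simplest such heuristics: exhaustive winner-take-all block matching with a SAD cost over a fixed disparity range on an already rectified grey-scale pair. The window size and disparity range are arbitrary illustrative values, and the loop-based form is written for clarity rather than speed.

import numpy as np

def block_matching_disparity(left, right, max_disp=64, window=5):
    """Winner-take-all SAD block matching on a rectified grey-scale pair.

    For every pixel of the left image, every candidate disparity along the
    horizontal scanline is tried, and the one whose window best matches the
    right image is kept.
    """
    h, w = left.shape
    half = window // 2
    left = left.astype(np.float32)
    right = right.astype(np.float32)
    disparity = np.zeros((h, w), dtype=np.float32)
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1]
            best_cost, best_d = np.inf, 0
            for d in range(max_disp):
                cand = right[y - half:y + half + 1,
                             x - d - half:x - d + half + 1]
                cost = np.sum(np.abs(patch - cand))      # SAD cost
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disparity[y, x] = best_d
    return disparity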


See also

* 3D reconstruction from multiple images
* 3D scanner
* Autostereoscopy
* Computer vision
* Epipolar geometry
* Semi-global matching
* Structure from motion
* Stereo camera
* Stereophotogrammetry
* Stereopsis
* Stereoscopic depth rendition
* Stixel
* Trifocal tensor - for trifocal stereoscopy (using three images instead of two)




External links


* Tutorial on uncalibrated stereo vision
* Stereo Vision and Rover Navigation Software for Planetary Exploration