The following outline is provided as an overview of and topical guide to computer vision:

Computer vision – interdisciplinary field that deals with how computers can be made to gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to automate tasks that the human visual system can do. Computer vision tasks include methods for acquiring digital images (through image sensors), image processing, and image analysis, to reach an understanding of digital images. In general, it deals with the extraction of high-dimensional data from the real world in order to produce numerical or symbolic information that the computer can interpret. The image data can take many forms, such as video sequences, views from multiple cameras, or multi-dimensional data from a medical scanner. As a technological discipline, computer vision seeks to apply its theories and models to the construction of computer vision systems. As a scientific discipline, computer vision is concerned with the theory behind artificial systems that extract information from images.

Branches of computer vision

* Computer stereo vision
* Underwater computer vision

History of computer vision

Computer vision subsystems

Image enhancement

* Image denoising
* Image histogram
* Inpainting
* Histogram equalization
* Tone mapping
* Retinex
* Gamma correction
* Anisotropic diffusion (Perona–Malik equation)
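As a rough illustration of two entries in this list, the sketch below counts the number of pixels at each tonal value (an image histogram) and then applies one common form of histogram equalization based on the cumulative histogram. The function names `image_histogram` and `equalize`, the nested-list image representation, and the 256-level default are assumptions made for this example only.

```python
def image_histogram(pixels, levels=256):
    """Count how many pixels take each tonal value in 0..levels-1."""
    counts = [0] * levels
    for row in pixels:
        for value in row:
            counts[value] += 1
    return counts


def equalize(pixels, levels=256):
    """Spread tonal values using the cumulative histogram (one common variant)."""
    hist = image_histogram(pixels, levels)
    total = sum(hist)

    # Cumulative distribution: number of pixels at or below each level.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    # Smallest non-zero cumulative count, i.e. the darkest level present.
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: nothing to stretch, return a copy unchanged.
        return [list(row) for row in pixels]

    # Lookup table mapping each old level to its equalized level.
    lut = [
        max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1)))
        for c in cdf
    ]
    return [[lut[value] for value in row] for row in pixels]


# Tiny 2x3 grayscale example (values chosen arbitrarily).
image = [[52, 52, 60],
         [60, 60, 200]]
histogram = image_histogram(image)
equalized = equalize(image)
```

In this toy image the three tonal values present are remapped so that the darkest becomes 0 and the brightest becomes 255; real pipelines would normally use an array library rather than nested lists.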

Transformations

* Affine transform
* Homography (computer vision)
* Hough transform
* Radon transform
* Walsh–Hadamard transform

Filtering, Fourier and wavelet transforms and image compression

* Image compression
* Filter bank
* Gabor filter
* JPEG 2000
* Adaptive filtering

Color vision

* Visual perception
* Human visual system model
* Color matching function
* Color space
* Color appearance model
* Color management system
* Color mapping
* Color model
* Color profile

Feature extraction

* Active contour
* Blob detection
* Canny edge detector
* Contour detection
* Edge detection
* Edge linking
* Harris Corner Detector
* Histogram of oriented gradients (HOG)
* Random sample consensus (RANSAC)
* Scale-invariant feature transform (SIFT)
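Random sample consensus (RANSAC), listed above, estimates model parameters from data that contains outliers by repeatedly fitting a model to a small random sample and keeping the candidate that the most points agree with. The following is a minimal sketch for the simplest case, a 2D line; the function name `fit_line_ransac`, its parameters, and their default values are hypothetical choices for illustration, not part of any particular library.

```python
import random


def fit_line_ransac(points, iterations=200, threshold=0.1, seed=0):
    """Estimate (slope, intercept) of y = m*x + b from points with outliers.

    Returns the best model found and the list of points that support it.
    """
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    best_model, best_inliers = None, []
    for _ in range(iterations):
        # 1. Draw a minimal sample: two points define a line.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # a vertical line cannot be written as y = m*x + b
        m = (y2 - y1) / (x2 - x1)
        b = y1 - m * x1
        # 2. Collect the points that agree with this candidate model.
        inliers = [(x, y) for x, y in points
                   if abs(y - (m * x + b)) <= threshold]
        # 3. Keep the candidate supported by the most points.
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (m, b), inliers
    return best_model, best_inliers


# Ten points on y = 2x + 1 plus three arbitrary outliers.
line_points = [(x, 2 * x + 1) for x in range(10)]
stray_points = [(1, 9), (4, -3), (8, 2)]
model, support = fit_line_ransac(line_points + stray_points)
```

A fuller implementation would usually refit the model to all inliers (for example by least squares) after the loop and choose the iteration count from an assumed outlier ratio; both steps are omitted here for brevity.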

Pose estimation

* Bundle adjustment
* Articulated body pose estimation (BoPoE)
* Direct linear transformation (DLT)
* Epipolar geometry
* Fundamental matrix
* Pinhole camera model
* Projective geometry
* Trifocal tensor
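The pinhole camera model in this list relates a point in three-dimensional space to its projection onto the image plane of an ideal pinhole camera. A minimal sketch of that relationship follows, assuming the point is already expressed in camera coordinates, the image plane sits in front of the camera (so the image is not inverted), and pixels are square. The name `project_pinhole` and its parameters are hypothetical and chosen only for this example.

```python
def project_pinhole(point_3d, focal_length, principal_point=(0.0, 0.0)):
    """Project a 3D point (camera coordinates) onto the image plane.

    Uses the ideal pinhole relations u = f*X/Z + cx and v = f*Y/Z + cy.
    """
    x, y, z = point_3d
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    cx, cy = principal_point
    u = focal_length * x / z + cx
    v = focal_length * y / z + cy
    return (u, v)


# A point 4 units in front of the camera, focal length 100 (in pixels),
# principal point at pixel (50, 50).
pixel = project_pinhole((1.0, 2.0, 4.0), 100.0, (50.0, 50.0))
```

Moving the same point twice as far from the camera halves its offset from the principal point, which is the perspective foreshortening the model is meant to capture. Lens distortion and camera rotation or translation are deliberately left out of this sketch.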

Registration

* Active appearance model (AAM)
* Cross-correlation
* Geometric hashing
* Graph cut segmentation
* Least squares estimation
* Image pyramid
* Image segmentation
* Level-set method
* Markov random fields
* Medial axis
* Motion field
* Motion vector
* Multispectral imaging
* Normalized cut segmentation
* Optical flow
* Particle filtering
* Scale space
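An image pyramid, one of the entries above, is a multi-scale representation built by repeatedly smoothing and subsampling an image. The sketch below shows the idea with the simplest possible smoothing, a 2x2 box average; this is a stand-in chosen for brevity, whereas Gaussian smoothing is the more usual choice. The names `downsample` and `build_pyramid` and the nested-list image format are assumptions for this example.

```python
def downsample(image):
    """Halve width and height by averaging each 2x2 block of pixels."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def build_pyramid(image):
    """Return [original, half size, quarter size, ...] down to one pixel."""
    levels = [image]
    # Stop once either dimension can no longer be halved.
    while len(levels[-1]) >= 2 and len(levels[-1][0]) >= 2:
        levels.append(downsample(levels[-1]))
    return levels


# 4x4 test image holding the values 0..15 in row-major order.
base = [[4 * r + c for c in range(4)] for r in range(4)]
pyramid = build_pyramid(base)
```

For the 4x4 input this yields three levels (4x4, 2x2 and 1x1). Coarse levels like these are commonly used to make registration and matching cheaper before refining at full resolution.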

Visual recognition

* Object recognition
* Scale-invariant feature transform (SIFT)
* Gesture recognition
* Bag-of-words model in computer vision
* Kadir–Brady saliency detector
* Eigenface

Commercial computer vision systems

* 5DX
* Aphelion (software)
* Microsoft PixelSense
* Poseidon drowning detection system
* Visage SDK

Applications

* 3D reconstruction from multiple images
* Audio-visual speech recognition
* Augmented reality
* Augmented reality-assisted surgery
* Automated optical inspection
* Automatic image annotation
* Automatic number plate recognition
* Automatic target recognition
* Check weigher
* Closed-circuit television
* Computer stereo vision
* Contextual image classification
* DARPA LAGR Program
* Digital video fingerprinting
* Document mosaicing
* Facial recognition systems
* GazoPa
* Geometric feature learning
* Image collection exploration
* Image retrieval
** Content-based image retrieval
** Reverse image search
* Image-based modeling and rendering
* Integrated mail processing
* Iris recognition
* Machine vision
* Mobile mapping
* Navigation system components for:
** Autonomous cars
** Mobile robots
* Object detection
* Optical braille recognition
* Optical character recognition
** Intelligent character recognition
* Pedestrian detection
* People counter
* Physical computing
* Red light camera
* Remote sensing
* Smart camera
* Traffic enforcement camera
* Traffic sign recognition
* Vehicle infrastructure integration
* Velocity Moments
* Video content analysis
* View synthesis
* Visual sensor network
* Visual Word
* Water remote sensing

Computer vision companies

* 3DFLOW
* Automatix
* Clarifai
* Cognex Corporation
* Datagen
* Diffbot
* IBM
* InspecVision
* Isra Vision
* Kinesense
* Mobileye
* Scantron Corporation
* Teledyne DALSA
* VIEW Engineering
* Zivid
* Warden Machinery

Computer vision publications

* Electronic Letters on Computer Vision and Image Analysis
* International Journal of Computer Vision

Computer vision organizations

* Conference on Computer Vision and Pattern Recognition
* European Conference on Computer Vision
* International Conference on Computer Vision
* International Conferences in Central Europe on Computer Graphics, Visualization and Computer Vision

Persons influential in computer vision

External links

* USC Iris computer vision conference list
* A complete list of papers of the most relevant computer vision conferences.
* Computer Vision Online – News, source code, datasets and job offers related to computer vision.
* CVonline – Bob Fisher's Compendium of Computer Vision.
* British Machine Vision Association – Supporting computer vision research within the UK via the BMVC and MIUA conferences, ''Annals of the BMVA'' (open-source journal), BMVA Summer School and one-day meetings.