In the fields of
computing
Computing is any goal-oriented activity requiring, benefiting from, or creating computer, computing machinery. It includes the study and experimentation of algorithmic processes, and the development of both computer hardware, hardware and softw ...
and
computer vision
Computer vision tasks include methods for image sensor, acquiring, Image processing, processing, Image analysis, analyzing, and understanding digital images, and extraction of high-dimensional data from the real world in order to produce numerical ...
, pose (or spatial pose) represents the
position and the
orientation of an object, each usually in
three dimensions. Poses are often stored internally as
transformation matrices. The term “pose” is largely synonymous with the term “transform”, but a transform may often include
scale, whereas pose does not.
In computer vision, the pose of an object is often estimated from camera input by the process of ''
pose estimation''. This information can then be used, for example, to allow a robot to manipulate an object or to avoid moving into the object based on its perceived position and orientation in the environment. Other applications include skeletal action recognition.
Pose estimation
The specific task of determining the pose of an object in an image (or stereo images, image sequence) is referred to as ''pose estimation''. Pose estimation problems can be solved in different ways depending on the image sensor configuration, and choice of methodology. Three classes of methodologies can be distinguished:
* Analytic or geometric methods: Given that the image sensor (camera) is calibrated and the mapping from 3D points in the scene and 2D points in the image is known. If also the geometry of the object is known, it means that the projected image of the object on the camera image is a well-known function of the object's pose. Once a set of control points on the object, typically corners or other feature points, has been identified, it is then possible to solve the pose transformation from a set of equations which relate the 3D coordinates of the points with their 2D image coordinates. Algorithms that determine the pose of a
point cloud
A point cloud is a discrete set of data Point (geometry), points in space. The points may represent a 3D shape or object. Each point Position (geometry), position has its set of Cartesian coordinates (X, Y, Z). Points may contain data other than ...
with respect to another point cloud are known as
point set registration algorithms, if the correspondences between points are not already known.
*
Genetic algorithm
In computer science and operations research, a genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection that belongs to the larger class of evolutionary algorithms (EA). Genetic algorithms are commonly used to g ...
methods: If the pose of an object does not have to be computed in real-time a
genetic algorithm
In computer science and operations research, a genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection that belongs to the larger class of evolutionary algorithms (EA). Genetic algorithms are commonly used to g ...
may be used. This approach is robust especially when the images are not perfectly calibrated. In this particular case, the pose represent the
genetic representation
In computer programming, genetic representation is a way of presenting solutions/individuals in evolutionary computation methods. The term encompasses both the concrete data structures and data types used to realize the genetic material of the can ...
and the error between the projection of the object control points with the image is the
fitness function
A fitness function is a particular type of objective or cost function that is used to summarize, as a single figure of merit, how close a given candidate solution is to achieving the set aims. It is an important component of evolutionary algorit ...
.
* Learning-based methods: These methods use artificial learning-based system which learn the mapping from 2D image features to pose transformation. In short, this means that a sufficiently large set of images of the object, in different poses, must be presented to the system during a learning phase. Once the learning phase is completed, the system should be able to present an estimate of the object's pose given an image of the object.
Camera pose
See also
*
Gesture recognition
*
Homography (computer vision)
*
Camera calibration
*
Structure from motion
*
Essential matrix and
Trifocal tensor (relative pose)
References
{{Reflist
Computer vision
Geometry in computer vision
Robot control