3D pose estimation is a process of predicting the transformation of an object from a user-defined reference pose, given an image or a

3D scan 3D scanning is the process of analyzing a real-world object or environment to collect data on its shape and possibly its appearance (e.g. color). The collected data can then be used to construct digital 3D models. A 3D scanner can be based on ...

. It arises in

computer vision Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to understand and automate tasks that the human ...

robotics Robotics is an interdisciplinary branch of computer science and engineering. Robotics involves design, construction, operation, and use of robots. The goal of robotics is to design machines that can help and assist humans. Robotics integrat ...

where the pose or transformation of an object can be used for alignment of a

computer-aided design Computer-aided design (CAD) is the use of computers (or ) to aid in the creation, modification, analysis, or optimization of a design. This software is used to increase the productivity of the designer, improve the quality of design, improve co ...

models, identification,

grasping A grasp is an act of taking, holding or seizing firmly with (or as if with) the hand. An example of a grasp is the handshake, wherein two people grasp one of each other's like hands. In zoology particularly, prehensility is the quality of an app ...

, or manipulation of the object. The image data from which the pose of an object is determined can be either a single image, a stereo image pair, or an image sequence where, typically, the camera is moving with a known velocity. The objects which are considered can be rather general, including a living being or body parts, e.g., a head or hands. The methods which are used for determining the pose of an object, however, are usually specific for a class of objects and cannot generally be expected to work well for other types of objects.

From an uncalibrated 2D camera

It is possible to estimate the 3D rotation and translation of a 3D object from a single 2D photo, if an approximate 3D model of the object is known and the corresponding points in the 2D image are known. A common technique for solving this has recently been "POSIT", where the 3D pose is estimated directly from the 3D model points and the 2D image points, and corrects the errors iteratively until a good estimate is found from a single image. Most implementations of POSIT only work on non-coplanar points (in other words, it won't work with flat objects or planes). Another approach is to register a 3D CAD model over the photograph of a known object by

optimizing Mathematical optimization (alternatively spelled ''optimisation'') or mathematical programming is the selection of a best element, with regard to some criterion, from some set of available alternatives. It is generally divided into two subfi ...

a suitable distance measure with respect to the pose parameters. The distance measure is computed between the object in the photograph and the 3D

CAD model Computer-aided design (CAD) is the use of computers (or ) to aid in the creation, modification, analysis, or optimization of a design. This software is used to increase the productivity of the designer, improve the quality of design, improve co ...

projection at a given pose.

Perspective projection Linear or point-projection perspective (from la, perspicere 'to see through') is one of two types of graphical projection perspective in the graphic arts; the other is parallel projection. Linear perspective is an approximate representation ...

orthogonal projection In linear algebra and functional analysis, a projection is a linear transformation P from a vector space to itself (an endomorphism) such that P\circ P=P. That is, whenever P is applied twice to any vector, it gives the same result as if it wer ...

is possible depending on the pose representation used. This approach is appropriate for applications where a 3D CAD model of a known object (or object category) is available.

From a calibrated 2D camera

Given a 2D image of an object, and the camera that is calibrated with respect to a world coordinate system, it is also possible to find the pose which gives the 3D object in its object coordinate system. This works as follows.

Extracting 3D from 2D

Starting with a 2D image, image points are extracted which correspond to corners in an image. The projection rays from the image points are reconstructed from the 2D points so that the 3D points, which must be incident with the reconstructed rays, can be determined.

Pseudocode

The algorithm for determining pose estimation is based on the

iterative closest point Iterative closest point (ICP) is an algorithm employed to minimize the difference between two clouds of points. ICP is often used to reconstruct 2D or 3D surfaces from different scans, to localize robots and achieve optimal path planning (espec ...

algorithm. The main idea is to determine the correspondences between 2D image features and points on the 3D model curve. (a) Reconstruct projection rays from the image points (b) Estimate the nearest point of each projection ray to a point on the 3D contour (c) Estimate the pose of the contour with the use of this correspondence set (d) goto (b) The above algorithm does not account for images containing an object that is partially occluded. The following algorithm assumes that all contours are rigidly coupled, meaning the pose of one contour defines the pose of another contour. (a) Reconstruct projection rays from the image points (b) For each projection ray R: (c) For each 3D contour: (c1) Estimate the nearest point P1 of ray R to a point on the contour (c2) if (n

1) choose P1 as actual P for the point-line correspondence (c3) else compare P1 with P: if dist(P1, R) is smaller than dist(P, R) then choose P1 as new P (d) Use (P, R) as correspondence set. (e) Estimate pose with this correspondence set (f) Transform contours, goto (b)

Estimating pose through comparison

Systems exist which use a database of an object at different rotations and translations to compare an input image against to estimate pose. These systems accuracy is limited to situations which are represented in their database of images, however the goal is to recognize a pose, rather than determine it.

Software

posest
a GPL C/

C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...

library for 6DoF pose estimation from 3D-2D correspondences.
diffgeom2pose
fast Matlab solver for 6DoF pose estimation from only ''two'' 3D-2D correspondences of points with directions (vectors), or points at curves (point-tangents). The points can be SIFT attributed with feature directions.
MINUS
C++ package for (relative) pose estimation of three views. Includes cases of three corresponding points with lines at these points (as in feature positions and orientations, or curve points with tangents), and also for three corresponding points and one line correspondence.

References

{{Reflist, 30em

Bibliography

*Rosenhahn, B. "Foundations about 2D-3D Pose Estimation." *Rosenhahn, B
"Pose Estimation of 3D Free-form Contours in Conformal Geometry."
*Athitsos, V
"Estimating 3D Hand Pose from a Cluttered Image."

External links

* Estimación de una Postura 3D Computer vision Geometry in computer vision Robot control