Facial motion capture is the process of electronically converting the movements of a person's face into a digital database using cameras or laser scanners. This database may then be used to produce computer graphics (CG) and computer animation for movies, games, or real-time avatars. Because the motion of CG characters is derived from the movements of real people, it results in more realistic and nuanced character animation than if the animation were created manually.

A facial motion capture database describes the coordinates or relative positions of reference points on the actor's face. The capture may be in two dimensions, in which case the capture process is sometimes called "expression tracking", or in three dimensions. Two-dimensional capture can be achieved using a single camera and capture software. This produces less sophisticated tracking and is unable to fully capture three-dimensional motions such as head rotation. Three-dimensional capture is accomplished using multi-camera rigs or laser marker systems. Such systems are typically far more expensive, complicated, and time-consuming to use. Two predominant technologies exist: marker-based and markerless tracking systems.

Facial motion capture is related to body motion capture, but is more challenging due to the higher resolution required to detect and track the subtle expressions produced by small movements of the eyes and lips. These movements are often less than a few millimeters, requiring greater resolution and fidelity, and different filtering techniques, than are usually used in full-body capture. The additional constraints of the face also allow more opportunities for using models and rules.

Facial expression capture is similar to facial motion capture. It is a process of using visual or mechanical means to manipulate computer-generated characters with input from human faces, or to recognize emotions from a user.
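To make the idea of a capture database more concrete, the sketch below stores one frame as named reference points with 3D coordinates, converts them to positions relative to an anchor point, and projects them to 2D by dropping depth. It is a minimal illustration under assumed conventions: the `CaptureFrame` class, the point names, and the use of the nose tip as anchor are hypothetical, and the orthographic projection is a simplification of a real camera model.

```python
from dataclasses import dataclass

# Hypothetical layout: each reference point maps to (x, y, z) coordinates.
Point3 = tuple[float, float, float]


@dataclass
class CaptureFrame:
    time_s: float
    points: dict[str, Point3]

    def relative_to(self, anchor: str) -> dict[str, Point3]:
        """Express every reference point relative to an anchor point,
        so that a rigid translation of the whole head cancels out."""
        ax, ay, az = self.points[anchor]
        return {
            name: (x - ax, y - ay, z - az)
            for name, (x, y, z) in self.points.items()
        }

    def project_2d(self) -> dict[str, tuple[float, float]]:
        """Orthographic projection onto the image plane: the depth (z)
        component is discarded, as in a single-camera 2D capture."""
        return {name: (x, y) for name, (x, y, _z) in self.points.items()}


# Two frames that differ only in the depth of one lip corner.
frame_a = CaptureFrame(0.00, {"nose_tip": (0.0, 0.0, 10.0), "lip_corner_left": (-2.5, -3.0, 8.0)})
frame_b = CaptureFrame(0.04, {"nose_tip": (0.0, 0.0, 10.0), "lip_corner_left": (-2.5, -3.0, 9.5)})
print(frame_a.relative_to("nose_tip")["lip_corner_left"])  # (-2.5, -3.0, -2.0)
print(frame_a.project_2d() == frame_b.project_2d())        # True: the depth change is invisible in 2D
```

The two example frames produce identical 2D projections even though the 3D positions differ, a small instance of why single-camera capture cannot fully recover three-dimensional motion.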

History

One of the first papers discussing performance-driven animation was published by Lance Williams in 1990. In it, he describes 'a means of acquiring the expressions of real faces, and applying them to computer-generated faces' (Lance Williams, "Performance-Driven Facial Animation", Computer Graphics, Volume 24, Number 4, August 1990).

Technologies

Marker-based

Traditional marker-based systems apply up to 350 markers to the actor's face and track the marker movement with high-resolution cameras. This has been used on movies such as ''The Polar Express'' and ''Beowulf'' to allow an actor such as Tom Hanks to drive the facial expressions of several different characters. Unfortunately, this is relatively cumbersome and makes the actor's expressions overly driven once the smoothing and filtering have taken place. Next-generation systems such as CaptiveMotion use offshoots of the traditional marker-based system with higher levels of detail. Active LED marker technology is currently being used to drive facial animation in real time to provide user feedback.
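The smoothing and filtering step mentioned above can be illustrated with a deliberately simple filter. The sketch below assumes marker positions arrive as per-frame lists of (x, y, z) tuples in millimeters; it computes each marker's displacement from a neutral frame and applies a centered moving average. The function names and the moving-average choice are illustrative stand-ins, not the filters used by any particular production system.

```python
def displacements(frames, neutral):
    """frames: list of frames, each a list of (x, y, z) marker positions.
    neutral: marker positions captured with a neutral expression.
    Returns, per frame, each marker's offset from its neutral position."""
    return [
        [tuple(p - n for p, n in zip(pos, rest)) for pos, rest in zip(frame, neutral)]
        for frame in frames
    ]


def moving_average(signal, window):
    """Centered moving average of a 1D list; the window is truncated at the edges."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


# One marker whose vertical position jumps 1.5 mm for a single frame.
neutral = [(0.0, 0.0, 0.0)]
frames = [[(0.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)], [(0.0, 1.5, 0.0)], [(0.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)]]
dy = [frame[0][1] for frame in displacements(frames, neutral)]
print(moving_average(dy, 3))  # [0.0, 0.5, 0.5, 0.5, 0.0]
```

The brief 1.5 mm movement is spread over three frames at a third of its amplitude, which shows how filtering that suppresses marker jitter can also flatten small, fast facial movements.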

Markerless

Markerless technologies use features of the face such as nostrils, the corners of the lips and eyes, and wrinkles, and then track them. This technology is discussed and demonstrated at CMU, IBM, the University of Manchester (where much of this started with Tim Cootes, Gareth Edwards and Chris Taylor) and other locations, using active appearance models, principal component analysis, eigen tracking, deformable surface models and other techniques to track the desired facial features from frame to frame. This technology is much less cumbersome and allows greater expression for the actor. These vision-based approaches can also track pupil movement, eyelids, and the occlusion of teeth by the lips and tongue, which are obvious problems in most computer-animated features. Typical limitations of vision-based approaches are resolution and frame rate, both of which are becoming less of an issue as high-speed, high-resolution CMOS cameras become available from multiple sources.

The technology for markerless face tracking is related to that in a facial recognition system, since a facial recognition system can potentially be applied sequentially to each frame of video, resulting in face tracking. For example, the Neven Vision system (formerly Eyematics, later acquired by Google) allowed real-time 2D face tracking with no person-specific training; their system was also amongst the best-performing facial recognition systems in the U.S. Government's 2002 Facial Recognition Vendor Test (FRVT). On the other hand, some recognition systems do not explicitly track expressions or even fail on non-neutral expressions, and so are not suitable for tracking. Conversely, systems such as deformable surface models pool temporal information to disambiguate and obtain more robust results, and thus cannot be applied to a single photograph.

Markerless face tracking has progressed to commercial systems such as Image Metrics, which has been applied in movies such as the ''The Matrix'' sequels and ''The Curious Case of Benjamin Button''. The latter used the Mova system to capture a deformable facial model, which was then animated with a combination of manual and vision tracking. ''Avatar'' was another prominent performance capture movie; however, it used painted markers rather than being markerless.
Dynamixyz is another commercial system currently in use.

Markerless systems can be classified according to several distinguishing criteria:

* 2D versus 3D tracking
* whether person-specific training or other human assistance is required
* real-time performance (which is only possible if no training or supervision is required)
* whether they need an additional source of information such as projected patterns or invisible paint, as used in the Mova system.

To date, no system is ideal with respect to all these criteria. For example, the Neven Vision system was fully automatic and required no hidden patterns or per-person training, but was 2D. The Face/Off system (Thibaut Weise, H. Li, L. Van Gool and M. Pauly, "Face/off: Live Facial Puppetry", ACM Symposium on Computer Animation, 2009) is 3D, automatic, and real-time, but requires projected patterns.
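Active appearance models and principal component analysis, both named among the markerless techniques above, rely on a statistical shape model: aligned training shapes are flattened into vectors of landmark coordinates, and PCA finds the main directions in which they vary. The sketch below computes only the first mode, by power iteration, in plain Python. The toy "mouth opening" training data, the function names, and the assumption that shapes are already aligned are all illustrative. Projecting a new shape onto the mode and reconstructing it discards variation the model never saw, which is one way such models constrain tracking.

```python
import math


def mean_shape(shapes):
    """Element-wise mean of flattened shapes (x1, y1, x2, y2, ...)."""
    n = len(shapes)
    return [sum(s[i] for s in shapes) / n for i in range(len(shapes[0]))]


def covariance(shapes, mean):
    """Sample covariance matrix of the shapes (needs at least two shapes)."""
    n, d = len(shapes), len(mean)
    centred = [[s[i] - mean[i] for i in range(d)] for s in shapes]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)] for i in range(d)]


def first_mode(cov, iterations=200):
    """Dominant eigenvector and eigenvalue of a covariance matrix by power iteration."""
    d = len(cov)
    # Arbitrary fixed start vector; a start with no component along the
    # dominant mode would converge to a different one.
    v = [1.0 / (i + 1) for i in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("start vector has no component along any mode of variation")
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v (v has unit length): the variance along the mode.
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, variance


def project(shape, mean, mode):
    """Shape parameter b: coordinate of the centred shape along the mode."""
    return sum((s - m) * e for s, m, e in zip(shape, mean, mode))


def reconstruct(b, mean, mode):
    """Shape generated by the model for parameter b."""
    return [m + b * e for m, e in zip(mean, mode)]


# Toy training set: two lip landmarks (x, y) that move apart as the mouth opens.
shapes = [[0.0, 1.0 + t, 0.0, -1.0 - t] for t in (0.0, 0.5, 1.0)]
mean = mean_shape(shapes)
mode, variance = first_mode(covariance(shapes, mean))
noisy = [0.2, 1.8, 0.0, -1.8]  # sideways jitter the model has never seen
print(reconstruct(project(noisy, mean, mode), mean, mode))  # approximately [0.0, 1.8, 0.0, -1.8]
```

A full shape model would keep several modes and limit each parameter to a few standard deviations; a library SVD would replace the hand-written power iteration in practice.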

Facial expression capture

Technology

Digital video-based methods are increasingly preferred, as mechanical systems tend to be cumbersome and difficult to use. Using digital cameras, the input user's expressions are processed to provide the head pose, which allows the software to then find the eyes, nose and mouth. The face is initially calibrated using a neutral expression. Then, depending on the architecture, the eyebrows, eyelids, cheeks, and mouth can be processed as differences from the neutral expression. This is done, for instance, by looking for the edges of the lips and recognizing them as a unique object. Often contrast-enhancing makeup or markers are worn, or some other method is used to make the processing faster. As with voice recognition, the best techniques are only good 90 percent of the time, requiring a great deal of tweaking by hand, or tolerance for errors.

Since computer-generated characters don't actually have muscles, different techniques are used to achieve the same results. Some animators create bones or objects that are controlled by the capture software and move them accordingly, which, when the character is rigged correctly, gives a good approximation. Since faces are very elastic, this technique is often mixed with others, adjusting the weights differently for skin elasticity and other factors depending on the desired expressions.
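The calibrate-then-difference approach, and the idea of capture software moving bones or other rig controls, can be sketched as follows. Everything specific here is hypothetical: the `REGIONS` index layout, the per-region `GAINS` standing in for rig weights, the use of vertical displacement only, and the clamping range. A production rig would map many more measurements to many more controls, with weights tuned per character.

```python
# Hypothetical layout: indices of tracked 2D feature points in each facial region.
REGIONS = {
    "brows": [0, 1],
    "mouth": [2, 3],
}

# Hypothetical per-region gains standing in for rig weights.
GAINS = {"brows": 1.0, "mouth": 0.8}


def region_offsets(frame, neutral, regions):
    """Mean vertical displacement of each region's points relative to the
    neutral-expression calibration frame (image y assumed to point up)."""
    return {
        name: sum(frame[i][1] - neutral[i][1] for i in idx) / len(idx)
        for name, idx in regions.items()
    }


def control_values(offsets, gains, limit=1.0):
    """Scale each region offset by its gain and clamp to [-limit, limit],
    giving values that could drive bones or other rig controls."""
    return {
        name: max(-limit, min(limit, gains[name] * off))
        for name, off in offsets.items()
    }


neutral = [(0.0, 5.0), (2.0, 5.0), (0.0, -3.0), (2.0, -3.0)]  # calibration frame
frame = [(0.0, 5.5), (2.0, 5.5), (0.0, -4.0), (2.0, -4.0)]    # brows raised, mouth points lowered
print(control_values(region_offsets(frame, neutral, REGIONS), GAINS))  # {'brows': 0.5, 'mouth': -0.8}
```

Clamping keeps a tracking error from pushing a control outside the range the rig was built for, one simple form of the tolerance for errors described above.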

Usage

Several commercial companies are developing products that have been used, but they are rather expensive. Facial capture is expected to become a major input device for computer games once the software is available in an affordable form; however, such hardware and software do not yet exist, despite research over the last 15 years producing results that are almost usable.

References

External links

Carnegie Mellon University

Delft University of Technology

Sheffield and Otago