In the field of

gesture recognition Gesture recognition is a topic in computer science and language technology with the goal of interpreting human gestures via mathematical algorithms. It is a subdiscipline of computer vision. Gestures can originate from any bodily motion or sta ...

and

image processing An image is a visual representation of something. It can be two-dimensional, three-dimensional, or somehow otherwise feed into the visual system to convey information. An image can be an artifact, such as a photograph or other two-dimensio ...

, finger tracking is a

high-resolution Image resolution is the detail an image holds. The term applies to digital images, film images, and other types of images. "Higher resolution" means more image detail. Image resolution can be measured in various ways. Resolution quantifies how cl ...

technique developed in 1969 that is employed to know the consecutive position of the fingers of the user and hence represent objects in 3D. In addition to that, the finger tracking technique is used as a tool of the computer, acting as an external device in our computer, similar to a

keyboard Keyboard may refer to: Text input * Keyboard, part of a typewriter * Computer keyboard ** Keyboard layout, the software control of computer keyboards and their mapping ** Keyboard technology, computer keyboard hardware and firmware Music * Mu ...

and a

mouse A mouse ( : mice) is a small rodent. Characteristically, mice are known to have a pointed snout, small rounded ears, a body-length scaly tail, and a high breeding rate. The best known mouse species is the common house mouse (''Mus musculus' ...

Introduction

The finger tracking system is focused on user-data interaction, where the user interacts with virtual data, by handling through the fingers the

volumetric Volume is a measure of occupied three-dimensional space. It is often quantified numerically using SI derived units (such as the cubic metre and litre) or by various imperial or US customary units (such as the gallon, quart, cubic inch). The def ...

of a 3D object that we want to represent. This system was born based on the human-computer interaction problem. The objective is to allow the communication between them and the use of

gesture A gesture is a form of non-verbal communication or non-vocal communication in which visible bodily actions communicate particular messages, either in place of, or in conjunction with, speech. Gestures include movement of the hands, face, or ...

s and hand movements to be more intuitive, Finger tracking systems have been created. These systems track in real time the position in 3D and 2D of the orientation of the fingers of each marker and use the intuitive hand movements and gestures to interact.

Types of tracking

There are many options for the implementation of finger tracking, principally those used with or without an

interface Interface or interfacing may refer to: Academic journals * ''Interface'' (journal), by the Electrochemical Society * '' Interface, Journal of Applied Linguistics'', now merged with ''ITL International Journal of Applied Linguistics'' * '' Int ...

Tracking with interface

This system mostly uses inertial and optical

motion capture Motion capture (sometimes referred as mo-cap or mocap, for short) is the process of recording the movement of objects or people. It is used in military, entertainment, sports, medical applications, and for validation of computer vision and robo ...

systems.

Inertial motion capture gloves

Inertial motion capture systems are able to capture finger motion by reading the rotation of each finger segment in 3D space. Applying these rotations to

kinematic chain In mechanical engineering, a kinematic chain is an assembly of rigid bodies connected by joints to provide constrained (or desired) motion that is the mathematical model for a mechanical system. Reuleaux, F., 187''The Kinematics of Machine ...

, the whole human hand can be tracked in real time, without occlusion and wireless.

Hand inertial motion capture systems, like for example Synertial mocap gloves, use tiny IMU based sensors, located on each finger segment. Precise capture requires at least 16 sensors to be used. There are also mocap glove models with less sensors (13 / 7 sensors) for which the rest of the finger segments is interpolated (proximal segments) or extrapolated (distal segments). The sensors are typically inserted into textile glove which makes the use of the sensors more comfortable.

Inertial sensors can capture movement in all 3 directions, which means finger and thumb flexion, extension and abduction can be detected.

=Hand skeleton

= Since inertial sensors track only rotations, the rotations have to be applied to some hand skeleton in order to get proper output. To get precise output (for example to be able to touch the fingertips), the hand skeleton has to be properly scaled to match the real hand. For this purpose manual measurement of the hand or automatic measurement extraction can be used.

=Hand position tracking

= On the top of finger tracking, many users require positional tracking for the whole hand in space. Multiple methods can be used for this purpose: * Capturing the whole body using inertial mocap system (hand skeleton is attached at the end of body skeleton kinematic chain). Position of the palm is determined from the body. * Capturing position of the palm (forearm) using optical mocap system. * Capturing position of the palm (forearm) using other position tracking method, widely used in VR headsets (for example HTC Vive Lighthouse).

=Disadvantages of inertial motion capture systems

= Inertial sensors have two main disadvantages connected with finger tracking: * Problems capturing the absolute position of the hand in space. * Magnetic interference * The metal materials use to interfere with sensors. This problem may be noticeable mainly because hands are often in contact with different things, often made of metal. Current generations of motion capture gloves are able to withstand magnetic interference. The degree to which they are immune to magnetic interference depends on manufacturer, price range and number of sensors used in the mocap glove. Notably, stretch sensors are silicone-based capacitors that are completely unaffected by magnetic interference.

Optical motion capture systems

a tracking of the location of the markers and patterns in 3D is performed, the system identifies them and labels each marker according to the position of the user’s fingers. The

coordinate In geometry, a coordinate system is a system that uses one or more numbers, or coordinates, to uniquely determine the position of the points or other geometric elements on a manifold such as Euclidean space. The order of the coordinates is sign ...

s in 3D of the labels of these markers are produced in real time with other applications.

= Markers

= Some of the optical systems, like Vicon or ART, are able to capture hand motion through markers. In each hand we have a marker per each “operative” finger. Three high-resolution cameras are responsible for capturing each marker and measure its positions. This will be only produced when the camera is able to see them. The visual markers, usually known as rings or bracelets, are used to recognize user gesture in 3D. In addition, as the classification indicates, these rings act as an interface in 2D.

Occlusion as an interaction method

The visual occlusion is a very intuitive method to provide a more realistic viewpoint of the virtual information in three dimensions. The interfaces provide more natural

3D interaction In computing, 3D interaction is a form of human-machine interaction where users are able to move and perform interaction in 3D space. Both human and machine process information where the physical position of elements in the 3D space is relevant. ...

techniques over base 6.

Marker functionality

Markers operate through

interaction point In particle physics, an interaction point (IP) is the place where particles collide in an accelerator experiment. The ''nominal'' interaction point is the design position, which may differ from the ''real'' or ''physics'' interaction point, where ...

s, which are usually already set and we have the knowledge about the regions. Because of that, it is not necessary to follow each marker all the time; the multipointers can be treated in the same way when there is only one operating pointer. To detect such pointers through an interaction, we enable

ultrasound Ultrasound is sound waves with frequencies higher than the upper audible limit of human hearing. Ultrasound is not different from "normal" (audible) sound in its physical properties, except that humans cannot hear it. This limit varies ...

infrared Infrared (IR), sometimes called infrared light, is electromagnetic radiation (EMR) with wavelengths longer than those of Light, visible light. It is therefore invisible to the human eye. IR is generally understood to encompass wavelengths from ...

sensors. The fact that many pointers can be handled as one, problems would be solved. In the case when we are exposed to operate under difficult conditions like bad illumination,

motion blur Motion blur is the apparent streaking of moving objects in a photograph or a sequence of frames, such as a film or animation. It results when the image being recorded changes during the recording of a single exposure, due to rapid movement or lo ...

s, malformation of the marker or occlusion. The system allows following the object, even though if some markers are not visible. Because of the spatial relationships of all the markers are known, the positions of the markers that are not visible can be computed by using the markers that are known. There are several methods for marker detection like border marker and estimated marker methods. * The Homer technique includes ray selection with direct handling: An object is selected and then its position and orientation are handled like if it was connected directly to the hand. * The Conner technique presents a set of 3D widgets that permit an indirect interaction with the virtual objects through a virtual widget that acts as an intermediary.

=Fusing data with optical motion capture systems

= Because of marker occlusion during capture, tracking fingers is the most challenging part for optical motion capture systems (like Vicon, Optitrack, ART, ..). Users of optical mocap systems claims that the most post-process work is usually due to finger capture. As the inertial mocap systems (if properly calibrated) are mostly without the need for post-process, the typical use for high end mocap users is to fuse data from inertial mocap systems (fingers) with optical mocap systems (body + position in space).
The process of fusing mocap data is based on matching time codes of each frame for inertial and optical mocap system data source. This way any 3rd party software (for example MotionBuilder, Blender) can apply motions from two sources, independently of the mocap method used.

Stretch sensor finger tracking

Stretch sensor enabled motion capture systems use flexible parallel plate capacitors to detect differences in capacitance when the sensors stretch, bend, shear or are subjected to pressure. Stretch sensors are commonly silicone-based, which means they are unaffected by magnetic interference, occlusion or positional drift (common in inertial systems). The robust and flexible qualities of these sensors leads to high fidelity finger tracking and feature in mocap gloves produced by StretchSense.

= Articulated hand tracking

= Articulated hand tracking is simpler and less expensive than many methods because it only needs one

camera A camera is an optical instrument that can capture an image. Most cameras can capture 2D images, with some more advanced models being able to capture 3D images. At a basic level, most cameras consist of sealed boxes (the camera body), with ...

. This simplicity results in less precision. It provides a new base for new interactions in the modeling, the control of the

animation Animation is a method by which still figures are manipulated to appear as moving images. In traditional animation, images are drawn or painted by hand on transparent celluloid sheets to be photographed and exhibited on film. Today, most ani ...

and the added realism. It uses a glove composed of a set of colors which are assigned according to the position of the fingers. This color test is limited to the vision system of the computers and based on the capture function and the position of the color, the position of the hand is known.

Tracking without interface

In terms of

visual perception Visual perception is the ability to interpret the surrounding environment through photopic vision (daytime vision), color vision, scotopic vision (night vision), and mesopic vision (twilight vision), using light in the visible spectrum ref ...

, the legs and hands can be modeled as articulated mechanisms, system of rigid bodies that are connected between them to articulations with one or more degrees of freedom. This model can be applied to a more reduced scale to describe hand motion and based on a wide scale to describe a complete body motion. A certain finger motion, for example, can be recognized from its usual angles and it does not depend on the position of the hand in relation to the camera. Many tracking systems are based on a model focused on a problem of sequence estimation, where a sequence of images is given and a model of changing, we estimate the 3D configuration for each photo. All the possible hand configurations are represented by

vector Vector most often refers to: *Euclidean vector, a quantity with a magnitude and a direction *Vector (epidemiology), an agent that carries and transmits an infectious pathogen into another living organism Vector may also refer to: Mathematic ...

s on a state space, which codes the position of the hand and the angles of the finger’s joint. Each hand configuration generates a set of images through the detection of the borders of the occlusion of the finger’s joint. The estimation of each image is calculated by finding the state vector that better fits to the measured characteristics. The finger joints have the added 21 states more than the rigid body movement of the palms; this means that the cost computational of the estimation is increased. The technique consists of label each finger joint links is modeled as a cylinder. We do the axes at each joint and bisector of this axis is the projection of the joint. Hence we use 3 DOF, because there are only 3 degrees of movement. In this case, it is the same as in the previous

typology Typology is the study of types or the systematic classification of the types of something according to their common characteristics. Typology is the act of finding, counting and classification facts with the help of eyes, other senses and logic. Ty ...

as there is a wide variety of deployment thesis on this subject. Therefore, the steps and treatment technique are different depending on the purpose and needs of the person who will use this technique. Anyway, we can say that a very general way and in most systems, you should carry out the following steps: * Background subtraction: the idea is to convolve all the images that are captured with a Gauss filter of 5x5, and then these are scaled to reduce noisy pixel data. * Segmentation: a binary mask application is used to represent with a white color, the pixels that belong to the hand and to apply the black color to the foreground skin image. * Region extraction: left and right hand detection based on a comparison between them. * Characteristic extraction: location of the fingertips and to detect if it is a peak or a valley. To classify the point, peaks or valleys, these are transformed to 3D vectors, usually named pseudo vectors in the xy-plane, and then to compute the cross product. If the sign of the z component of the cross product is positive, we consider that the point is a peak, and in the case that the result of the cross product is negative, it will be a valley. * Point and pinch gesture recognition: taking into account the points of reference that are visible (fingertips) a certain gesture is associated. * Pose estimation: a procedure which consists on identify the position of the hands through the use of algorithms that compute the distances between positions.

Other tracking techniques

It is also possible to perform active tracking of fingers. The Smart Laser Scanner is a marker-less finger tracking system using a modified laser scanner/projector developed at the University of Tokyo in 2003-2004. It is capable of acquiring three-dimensional coordinates in real time without the need of any image processing at all (essentially, it is a rangefinder scanner that instead of continuously scanning over the full field of view, restricts its scanning area to a very narrow window precisely the size of the target). Gesture recognition has been demonstrated with this system. The sampling rate can be very high (500 Hz), enabling smooth trajectories to be acquired without the need of filtering (such as Kalman).

Application

Definitely, the finger tracking systems are used to represent a

virtual reality Virtual reality (VR) is a simulated experience that employs pose tracking and 3D near-eye displays to give the user an immersive feel of a virtual world. Applications of virtual reality include entertainment (particularly video games), edu ...

. However its application has gone to professional level

3D modeling In 3D computer graphics, 3D modeling is the process of developing a mathematical coordinate-based representation of any surface of an object (inanimate or living) in three dimensions via specialized software by manipulating edges, vertices, a ...

, companies and projects directly in this case overturned. Thus such systems rarely have been used in consumer applications due to its high price and complexity. In any case, the main objective is to facilitate the task of executing commands to the computer via natural language or interacting gesture. The objective is centered on the following idea computers should be easier in terms of usage if there is a possibility to operate through natural language or gesture interaction. The main application of this technique is to highlight the 3D design and animation, where software like Maya and 3D StudioMax employ these kinds of tools. The reason is to allow a more accurate and simple control of the instructions that we want to execute. This technology offers many possibilities, where the sculpture, building and modeling in 3D in real time through the use of a computer is the most important.

References

* Anderson, D., Yedidia, J., Frankel, J., Marks, J., Agarwala, A., Beardsley, P., Hodgins, J., Leigh, D., Ryall, K., & Sullivan, E. (2000)
Tangible interaction + graphical interpretation: a new approach to 3D modeling
SIGGRAPH. p. 393-402. * Angelidis, A., Cani, M.-P., Wyvill, G., & King, S. (2004)
Swirling-Sweepers: Constant-volume modeling
Pacific Graphics. p. 10-15. * Grossman, T., Wigdor, D., & Balakrishnan, R. (2004)
Multi finger gestural interaction with 3D volumetric displays
UIST. p. 61-70. * Freeman, W. & Weissman, C. (1995)
Television control by hand gestures
International Workshop on Automatic Face and Gesture Recognition. p. 179-183. * Ringel, M., Berg, H., Jin, Y., & Winograd, T. (2001)
Barehands: implement-free interaction with a wallmounted display
CHI Extended Abstracts. p. 367-368. * Cao, X. & Balakrishnan, R. (2003)
VisionWand: interaction techniques for large displays using a passive wand tracked in 3D
UIST. p. 173-182. * A. Cassinelli, S. Perrin and M. Ishikawa
Smart Laser-Scanner for 3D Human-Machine Interface
ACM SIGCHI 2005 (CHI '05) International Conference on Human Factors in Computing Systems, Portland, OR, USA April 2–07, 2005, pp. 1138 – 1139 (2005).

External links

* http://www.synertial.com/ * http://www.vicon.com/ *https://stretchsense.com * http://www.dgp.toronto.edu/~ravin/videos/graphite2006_proxy.mov{{dead link, date=April 2019 * https://web.archive.org/web/20091211043000/http://actuality-medical.com/Home.html * http://www.dgp.toronto.edu/ * http://www.k2.t.u-tokyo.ac.jp/perception/SmartLaserTracking/ * Finger trackin
using markers
o
without markers

3D Hand Tracking
3D imaging Computing input devices Articles containing video clips Gesture recognition Fingers Tracking