In image analysis, the generalized structure tensor (GST) is an extension of the Cartesian structure tensor to curvilinear coordinates.
It is mainly used to detect and to represent the "direction" parameters of curves, just as the Cartesian structure tensor detects and represents the direction in Cartesian coordinates. Curve families generated by pairs of locally orthogonal functions have been the best studied.
It is a widely known method in applications of image and video processing, including computer vision, such as biometric identification by fingerprints and studies of human tissue sections.
GST in 2D and locally orthogonal bases
Let the term image represent a function $f(\xi(x,y), \eta(x,y))$, where $x, y$ are real variables and $\xi$ and $\eta$ are real-valued functions. GST represents the direction along which the image $f$ can undergo an infinitesimal translation with minimal (total least squares) error, along the "lines" fulfilling the following conditions:
1. The "lines" are ordinary lines in the curvilinear coordinate basis $\xi, \eta$
: $\cos(\theta)\,\xi(x,y) + \sin(\theta)\,\eta(x,y) = \text{constant}$
which are curves in Cartesian coordinates, as the equation above shows. The error is measured in the $L^2$ sense, so the minimality of the error refers to the L2 norm.
2. The functions $\xi(x,y), \eta(x,y)$ constitute a harmonic pair, i.e. they fulfill the Cauchy–Riemann equations,
: $\frac{\partial \xi}{\partial x} = \frac{\partial \eta}{\partial y}, \qquad \frac{\partial \xi}{\partial y} = -\frac{\partial \eta}{\partial x}$
Accordingly, such curvilinear coordinates $\xi, \eta$ are locally orthogonal.
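The two conditions can be checked numerically for a concrete pair. The sketch below assumes the log-polar pair $\xi = \log\sqrt{x^2+y^2}$, $\eta = \operatorname{atan2}(y, x)$ (the real and imaginary parts of $\log z$) and uses finite differences on a grid that avoids the origin; the grid extent and all variable names are illustrative choices, not part of the method.

```python
import numpy as np

# Sample a grid that avoids the origin, where log(z) is singular.
axis = np.linspace(1.0, 2.0, 201)
x, y = np.meshgrid(axis, axis)          # rows follow y, columns follow x
step = axis[1] - axis[0]

# Assumed example pair: real and imaginary parts of log(z).
xi = np.log(np.hypot(x, y))             # logarithm of the radius
eta = np.arctan2(y, x)                  # polar angle

# np.gradient returns the derivative along axis 0 (y) first, then axis 1 (x).
xi_y, xi_x = np.gradient(xi, step, edge_order=2)
eta_y, eta_x = np.gradient(eta, step, edge_order=2)

# Cauchy-Riemann residuals; both should vanish up to discretisation error.
cr1 = np.max(np.abs(xi_x - eta_y))
cr2 = np.max(np.abs(xi_y + eta_x))

# Local orthogonality: the gradients of xi and eta are perpendicular.
dot = np.max(np.abs(xi_x * eta_x + xi_y * eta_y))

print(cr1, cr2, dot)
```

All three printed residuals should be small compared with the gradient magnitudes, which are of order one on this grid.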
Then GST consists in
: $\iint w(\xi,\eta) \begin{pmatrix} \left(\frac{\partial f}{\partial \xi}\right)^2 & \frac{\partial f}{\partial \xi}\frac{\partial f}{\partial \eta} \\ \frac{\partial f}{\partial \xi}\frac{\partial f}{\partial \eta} & \left(\frac{\partial f}{\partial \eta}\right)^2 \end{pmatrix} d\xi\, d\eta = (\lambda_{max} - \lambda_{min}) \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix} \begin{pmatrix} \cos\theta & \sin\theta \end{pmatrix} + \lambda_{min} I$
where $0 \le \lambda_{min} \le \lambda_{max}$ are the errors of (infinitesimal) translation in the best direction (designated by the angle $\theta$) and the worst direction (designated by $\theta + \pi/2$). The function $w(\xi,\eta)$ is the window function defining the "outer scale" wherein the detection of $\theta$ is carried out; it can be omitted if it is already included in $f$ or if $f$ is the full image (rather than a local one). The matrix $I$ is the identity matrix. Using the chain rule, it can be shown that the integration above can be implemented as convolutions in Cartesian coordinates applied to the ordinary structure tensor when the pair $\xi, \eta$ consists of the real and imaginary parts of an analytic function $g(z)$,
: $g(z) = \xi(x,y) + i\,\eta(x,y)$
where $z = x + iy$.
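Examples of analytic functions include $g(z) = \log z = \log(x + iy)$, as well as the monomials $g(z) = z^n = (x + iy)^n$ and $g(z) = z^{n/2} = (x + iy)^{n/2}$, where $n$ is an arbitrary positive or negative integer. The monomials $g(z) = z^n$ are also referred to as harmonic functions in computer vision and image processing.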
Thereby, the Cartesian structure tensor is a special case of GST where $\xi = x$ and $\eta = y$, i.e. the harmonic function is simply $g(z) = z = x + iy$. Thus, by choosing a harmonic function $g$, one can detect all curves that are linear combinations of its real and imaginary parts by convolutions on (rectangular) image grids only, even if $\xi, \eta$ are non-Cartesian. Furthermore, the convolution computations can be done by using complex filters applied to the complex version of the structure tensor. For this reason, GST implementations have frequently used the complex version of the structure tensor rather than the (1,1) tensor.
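The chain-rule step referred to in this section can be sketched as follows. The notation is an assumption made for this derivation only: $F$ denotes the image expressed in the curvilinear coordinates, so that $f(x,y) = F(\xi(x,y), \eta(x,y))$, subscripts denote partial derivatives, and $g(z) = \xi + i\eta$ is analytic with derivative $g'(z) = \xi_x + i\eta_x$.

```latex
\begin{align}
f_x + i f_y
 &= F_\xi\,(\xi_x + i\xi_y) + F_\eta\,(\eta_x + i\eta_y) \\
 &= F_\xi\,(\xi_x - i\eta_x) + F_\eta\,(\eta_x + i\xi_x)
    && \text{(Cauchy--Riemann: } \xi_y = -\eta_x,\ \eta_y = \xi_x\text{)} \\
 &= \overline{g'(z)}\,\bigl(F_\xi + iF_\eta\bigr),
    && g'(z) = \xi_x + i\eta_x .
\end{align}
```

Since the area element transforms as $d\xi\,d\eta = |g'(z)|^2\,dx\,dy$, it follows that $(F_\xi + iF_\eta)^2\,d\xi\,d\eta = \frac{g'(z)}{\overline{g'(z)}}\,(f_x + if_y)^2\,dx\,dy$. The factor $g'(z)/\overline{g'(z)}$ has unit modulus, so under these assumptions the change of coordinates enters the Cartesian computation only as a complex phase that can be absorbed into the window function.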
Complex version of GST
As there is a complex version of the ordinary structure tensor, there is also a complex version of the GST
: $\kappa_{20} = (\lambda_{max} - \lambda_{min})\, e^{i2\theta} = w * \left(\frac{\partial f}{\partial x} + i\,\frac{\partial f}{\partial y}\right)^2$
which is identical to its cousin, with the difference that $w$ is a complex filter. Recall that in the ordinary structure tensor $w$ is a real filter, usually defined by a sampled and scaled Gaussian that delineates the neighborhood, also known as the outer scale. This simplicity is one reason why GST implementations have predominantly used the complex version above. For curve families $\xi, \eta$ defined by analytic functions $g$, it can be shown that the neighborhood-defining function is complex-valued,
: $w = (x + iy)^n\, e^{-\frac{x^2 + y^2}{2\sigma^2}}$,
up to a constant factor, a so-called symmetry derivative of a Gaussian. Thus, the orientation-wise variation of the pattern to be looked for is directly incorporated into the neighborhood-defining function, and the detection occurs in the space of the (ordinary) structure tensor.
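As an illustration of such a complex neighborhood function, the sketch below samples $(x + iy)^n$ multiplied by a Gaussian on a square pixel grid. The function name `symmetry_derivative_filter`, its parameters, the omission of any normalization constant, and the use of the conjugate $(x - iy)^{|n|}$ for negative $n$ are assumptions made for this example rather than a prescribed implementation.

```python
import numpy as np

def symmetry_derivative_filter(n, sigma, half_size):
    """Sample (x + iy)^n times a Gaussian on a (2*half_size + 1)^2 grid.

    For negative n the conjugate (x - iy)^|n| is used; this is an assumed
    convention that keeps the filter finite at the origin.
    """
    coords = np.arange(-half_size, half_size + 1, dtype=float)
    x, y = np.meshgrid(coords, coords)      # rows follow y, columns follow x
    gaussian = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    base = x + 1j * y if n >= 0 else x - 1j * y
    return base ** abs(n) * gaussian

# Second-order filter: the phase is twice the polar angle of each pixel.
w = symmetry_derivative_filter(n=2, sigma=3.0, half_size=9)
centre = 9
print(np.angle(w[centre + 1, centre + 1]))  # pixel at x = 1, y = 1
```

The magnitude of the filter delineates the neighborhood, while its phase $n\varphi$ (with $\varphi$ the polar angle) encodes the orientation-wise variation of the sought pattern.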
Basic concept for its use in image processing and computer vision
Efficient detection of $\theta$ in images is possible by image processing for a pair $\xi$, $\eta$. Complex convolutions (or the corresponding matrix operations) and point-wise non-linear mappings are the basic computational elements of GST implementations. A total least squares error estimate of $2\theta$ is then obtained along with the two errors, $\lambda_{max}$ and $\lambda_{min}$. In analogy with the Cartesian structure tensor, the estimated angle is in double-angle representation, i.e. $2\theta$ is delivered by the computations and can be used as a shape feature, whereas $\lambda_{max} - \lambda_{min}$, alone or in combination with $\lambda_{max} + \lambda_{min}$, can be used as a quality (confidence, certainty) measure for the angle estimation.
Logarithmic spirals, including circles, can for instance be detected by (complex) convolutions and non-linear mappings. The spirals can be in gray (valued) images or in a binary image, i.e. the locations of edge elements of the concerned patterns, such as contours of circles or spirals, need not be known or otherwise marked.
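A minimal sketch of such a detection on a synthetic gray-valued logarithmic spiral is given below, assuming the log-polar pair $\xi = \log r$, $\eta = \varphi$. To stay short it evaluates the response only at the known pattern centre, as a weighted sum rather than a full convolution. The pattern parameters `a` and `b`, the factor $r^2$ in the weight, and the inner mask that suppresses the aliased core are choices made for this example.

```python
import numpy as np

# Pixel grid centred on the pattern (rows follow y, columns follow x).
coords = np.arange(-64, 65, dtype=float)
x, y = np.meshgrid(coords, coords)
r = np.hypot(x, y)
phi = np.arctan2(y, x)

# Synthetic spiral f = cos(a*xi + b*eta) with xi = log r and eta = phi.
# b is an integer so that f is continuous across the branch cut of phi.
a, b = 1.0, 2.0
xi = np.log(np.maximum(r, 1.0))       # clamp avoids log(0) at the centre pixel
f = np.cos(a * xi + b * phi)

# Complex gradient image and its point-wise square (the non-linear mapping).
f_y, f_x = np.gradient(f)
squared_gradient = (f_x + 1j * f_y) ** 2

# Complex weight for g(z) = log z: phase exp(-2i*phi) inside a Gaussian window.
# The r**2 factor and the mask r >= 8 are choices made for this sketch.
sigma = 24.0
weight = (x - 1j * y) ** 2 * np.exp(-r**2 / (2.0 * sigma**2)) * (r >= 8.0)

# Responses at the pattern centre (an inner product instead of a convolution).
i20 = np.sum(weight * squared_gradient)                   # double-angle estimate
i11 = np.sum(np.abs(weight) * np.abs(squared_gradient))   # total gradient energy

estimated_theta = 0.5 * np.angle(i20)
expected_theta = np.arctan2(b, a)     # direction in (xi, eta) coordinates
certainty = np.abs(i20) / i11         # close to 1 for a clean spiral
print(estimated_theta, expected_theta, certainty)
```

No binarization or edge marking of the image is involved: the gray values enter only through the gradient, and the complex weight does the rest. Replacing the weighted sum by a convolution of `squared_gradient` with the complex filter would give the same kind of response at every pixel.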
The generalized structure tensor can be used as an alternative to the Hough transform in image processing and computer vision to detect patterns whose local orientations can be modelled, for example junction points. The main differences comprise:
*Negative as well as complex votes are allowed;
*With one template, multiple patterns belonging to the same family can be detected;
*Image binarization is not required.
Physical and mathematical interpretation
The curvilinear coordinates of GST can explain physical processes applied to images. A well-known pair of processes consists of rotation and zooming. These are related to the coordinate transformation $\xi = \log\sqrt{x^2 + y^2}$ and $\eta = \operatorname{atan2}(y, x)$.
If an image $f$ consists of iso-curves that can be explained by $\xi$ alone, i.e. its iso-curves are circles, $f(\xi,\eta) = h(\xi)$, where $h$ is any real-valued differentiable function of one variable, the image is invariant to rotations (around the origin).
The zooming (including unzooming) operation is modeled similarly. If the image has iso-curves that look like a "star" or bicycle spokes, i.e. $f(\xi,\eta) = h(\eta)$ for some differentiable 1D function $h$, then the image $f$ is invariant to scaling (w.r.t. the origin).
In combination,
: $f(\xi,\eta) = h(\cos(\theta)\,\xi + \sin(\theta)\,\eta)$
is invariant to a certain amount of rotation combined with scaling, where the amount is specified by the parameter $\theta$.
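The link between these physical processes and translations in curvilinear coordinates can be made explicit for the log-polar pair, i.e. assuming $\xi + i\eta = \log z$. The scale factor $s > 0$ and the rotation angle $\alpha$ are symbols introduced only for this illustration.

```latex
\begin{align}
z \mapsto s\,e^{i\alpha} z
\quad\Longrightarrow\quad
\log z \mapsto \log z + \log s + i\alpha ,
\qquad\text{i.e.}\qquad
(\xi, \eta) \mapsto (\xi + \log s,\ \eta + \alpha).
\end{align}
```

Scaling thus acts as a translation along $\xi$ and rotation as a translation along $\eta$. An image whose values depend only on $\cos(\theta)\,\xi + \sin(\theta)\,\eta$ is therefore unchanged by exactly those combinations of scaling and rotation for which $\cos(\theta)\log s + \sin(\theta)\,\alpha = 0$.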
Analogously, the Cartesian structure tensor is a representation of a translation too. Here the physical process consists of an ordinary translation of a certain amount along $x$ combined with a translation along $y$,
: $\cos(\theta)\,x + \sin(\theta)\,y = \text{constant}$
where the amount is specified by the parameter $\theta$. Evidently, $\theta$ here represents the direction of the line.
Generally, the estimated $\theta$ represents the direction (in $\xi, \eta$ coordinates) along which infinitesimal translations leave the image invariant or, in practice, least variant. With every curvilinear coordinate basis pair, there is thus a pair of infinitesimal translators, a linear combination of which is a differential operator. The latter are related to Lie algebra.
Miscellaneous
"Image" in the context of the GST can mean both an ordinary image and an image neighborhood thereof (local image), depending on context. For example, a photograph is an image as is any neighborhood of it.
See also
* Structure tensor
* Hough transform
* Tensor
* Gaussian
* Corner detection
* Edge detection
* Affine shape adaptation
* Directional derivative
* Differential operator
* Lie algebra