computer vision Computer vision is an Interdisciplinarity, interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to understand and automate t ...

, a stixel (portmanteau of "stick" and "

pixel In digital imaging, a pixel (abbreviated px), pel, or picture element is the smallest addressable element in a raster image, or the smallest point in an all points addressable display device. In most digital display devices, pixels are the s ...

") is a superpixel representation of depth information in an image, in the form of a vertical stick that approximates the closest obstacles within a certain vertical slice of the scene. Introduced in 2009, stixels have applications in

robotic navigation Robot localization denotes the robot's ability to establish its own position and orientation within the frame of reference. Path planning is effectively an extension of localisation, in that it requires the determination of the robot's current pos ...

and

advanced driver-assistance system An advanced driver-assistance system (ADAS) is any of a groups of electronic technologies that assist drivers in driving and parking functions. Through a safe human-machine interface, ADAS increase car and road safety. ADAS uses automated technol ...

s, where they can be used to define a representation of robotic environments and traffic scenes with a medium level of abstraction.

Definition

One of the problems of scene understanding in computer vision is to determine horizontal freespace around the camera, where the agent can move, and the vertical obstacles delimiting it. An image can be paired with depth information (produced e.g. from stereo

disparity Disparity and disparities may refer to: in healthcare: * Health disparities in finance: * Income disparity between females and males. ** Male–female income disparity in the United States **Income gender gap * Economic inequality * Income ineq ...

lidar Lidar (, also LIDAR, or LiDAR; sometimes LADAR) is a method for determining ranges (variable distance) by targeting an object or a surface with a laser and measuring the time for the reflected light to return to the receiver. It can also be ...

, or monocular depth estimation), allowing a dense tridimensional reconstruction of the observed scene. One drawback of dense reconstruction is the large amount of data involved, since each pixel in the image is mapped to an element of a point cloud. Vision problems characterised by planar freespace delimited by mostly vertical obstacles, such as traffic scenes or robotic navigation, can benefit from a condensed representation that allows to save memory and processing time. Stixels are thin vertical rectangles representing a slice of a vertical surface belonging to the closest obstacle in the observed scene. They allow to dramatically reduce the amount of information needed to represent a scene in such problems. A stixel is characterised by three parameters: vertical coordinate of the bottom, height of the stick, and depth. Stixels have fixed width, with each stixel spanning over a certain number of image columns, allowing downsampling of the horizontal image resolution. In the original formulation, each column of the image would contain at most one stixel, and later extensions were developed to allow multiple stixels on each column, allowing to represent multiple objects at different distances.

Stixel estimation

The input to stixel estimation is a dense

depth map In 3D computer graphics and computer vision, a depth map is an image or image channel that contains information relating to the distance of the surfaces of scene objects from a viewpoint. The term is related (and may be analogous) to ''depth ...

, that can be computed from stereo disparity or other means. The original approach computes an

occupancy grid Occupancy Grid Mapping refers to a family of computer algorithms in probabilistic robotics for mobile robots which address the problem of generating maps from noisy and uncertain sensor measurement data, with the assumption that the robot pose is k ...

that can be segmented to estimate the freespace, with

dynamic programming Dynamic programming is both a mathematical optimization method and a computer programming method. The method was developed by Richard Bellman in the 1950s and has found applications in numerous fields, from aerospace engineering to economics. I ...

providing an efficient method to find an optimal segmentation. Alternative approaches can be used instead of occupancy grid mapping, such as manifold-based methods. The freespace boundary provides the base points of the obstacles at closest longitudinal distance, however multiple objects at different distances might appear in each column of the image. To fully define the obstacles, their height should be estimated, and this is accomplished by segmenting the depth of the object from the depth of the background. A membership function over the pixels can be defined based on the depth value, where the membership represents the confidence of a pixel belonging to the closest vertical obstacle or to the background, and a cut separating the obstacles from the background can again be computed effectively with dynamic programming. Once both the freespace and the obstacle height are known, the stixels can be estimated by fusing the information over the columns spanned by each stixel, and finally a refined depth of the stixel can be estimated via model fitting over the depth of the pixels covered by the stixel, possibly paired with confidence information (e.g. disparity confidence produced by methods such as

semi-global matching Semi-global matching (SGM) is a computer vision algorithm for the estimation of a dense disparity map from a rectified stereo image pair, introduced in 2005 by Heiko Hirschmüller while working at the German Aerospace Center.Hirschmüller (2005), ...

References

Sources

* * * * * {{DEFAULTSORT:Stixel Computer vision