RoboCup Rescue arena map generated by the robot Hector from Darmstadt at the 2010 German Open

Simultaneous localization and mapping (SLAM) is the computational problem of constructing or updating a map of an unknown environment while simultaneously keeping track of an agent's location within it. While this initially appears to be a chicken-or-the-egg problem, there are several algorithms known to solve it, at least approximately, in tractable time for certain environments. Popular approximate solution methods include the particle filter, extended Kalman filter, covariance intersection, and GraphSLAM. SLAM algorithms are based on concepts in computational geometry and computer vision, and are used in robot navigation, robotic mapping and odometry for virtual reality or augmented reality.

SLAM algorithms are tailored to the available resources and are not aimed at perfection but at operational compliance. Published approaches are employed in self-driving cars, unmanned aerial vehicles, autonomous underwater vehicles, planetary rovers, newer domestic robots and even inside the human body.

Mathematical description of the problem

Given a series of controls $u_t$ and sensor observations $o_t$ over discrete time steps $t$, the SLAM problem is to compute an estimate of the agent's state $x_t$ and a map of the environment $m_t$. All quantities are usually probabilistic, so the objective is to compute

$$P(m_{t+1}, x_{t+1} \mid o_{1:t+1}, u_{1:t})$$

Applying Bayes' rule gives a framework for sequentially updating the location posteriors, given a map and a transition function $P(x_t \mid x_{t-1})$:

$$P(x_t \mid o_{1:t}, u_{1:t}, m_t) = \sum_{m_{t-1}} P(o_t \mid x_t, m_t, u_{1:t}) \sum_{x_{t-1}} P(x_t \mid x_{t-1}) \, P(x_{t-1} \mid m_t, o_{1:t-1}, u_{1:t}) / Z$$

Similarly, the map can be updated sequentially by

$$P(m_t \mid x_t, o_{1:t}, u_{1:t}) = \sum_{x_t} \sum_{m_t} P(m_t \mid x_t, m_{t-1}, o_t, u_{1:t}) \, P(m_{t-1}, x_t \mid o_{1:t-1}, m_{t-1}, u_{1:t})$$

Like many inference problems, a joint solution for the two variables can be found, up to a local optimum, by alternating updates of the two beliefs in a form of the expectation–maximization algorithm.
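The location update above can be made concrete with a discrete (histogram) Bayes filter. The following is a minimal sketch under assumptions that are not in the text: the map is known and fixed, the world is a hypothetical cyclic 1D corridor of labelled cells, the motion model shifts the agent by `u` cells with small probabilities of under- or overshooting, and the sensor reports a cell's label correctly with probability `p_hit`. `predict` carries out the inner sum over $x_{t-1}$, and `correct` multiplies by the observation likelihood and divides by the normalizer $Z$. All function names and probabilities are illustrative.

```python
from typing import List


# Hypothetical motion model on a cyclic 1D corridor: the commanded shift of u
# cells happens with p_exact, one cell short with p_under, one cell long with p_over.
def predict(belief: List[float], u: int, p_exact: float = 0.8,
            p_under: float = 0.1, p_over: float = 0.1) -> List[float]:
    """Prediction: sum over x_{t-1} of P(x_t | x_{t-1}) * bel(x_{t-1})."""
    n = len(belief)
    predicted = [0.0] * n
    for prev, p in enumerate(belief):
        for step, w in ((u, p_exact), (u - 1, p_under), (u + 1, p_over)):
            predicted[(prev + step) % n] += w * p
    return predicted


def correct(belief: List[float], observation: str, world_map: List[str],
            p_hit: float = 0.9, p_miss: float = 0.1) -> List[float]:
    """Correction: multiply by P(o_t | x_t, m) and divide by the normalizer Z."""
    weighted = [b * (p_hit if world_map[i] == observation else p_miss)
                for i, b in enumerate(belief)]
    z = sum(weighted)
    return [w / z for w in weighted]


def bayes_filter_step(belief: List[float], u: int, observation: str,
                      world_map: List[str]) -> List[float]:
    """One full update: move by u cells, then incorporate the new observation."""
    return correct(predict(belief, u), observation, world_map)


world_map = ["door", "wall", "wall", "door", "wall"]      # hypothetical known map
belief = [1.0 / len(world_map)] * len(world_map)           # uniform prior
belief = correct(belief, "door", world_map)                 # sees a door
belief = bayes_filter_step(belief, 1, "wall", world_map)    # moves 1 cell, sees a wall
print([round(b, 3) for b in belief])
```

After seeing a door, moving one cell and seeing a wall, the posterior peaks at cells 1 and 4, the two walls that directly follow a door; the symmetry of this particular map leaves them tied, which is the kind of ambiguity later observations must resolve.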

Algorithms

Statistical techniques used to approximate the above equations include Kalman filters and particle filters (the algorithm behind Monte Carlo Localization). They provide an estimation of the posterior probability distribution for the pose of the robot and for the parameters of the map. Methods which conservatively approximate the above model using covariance intersection can reduce algorithmic complexity for large-scale applications without relying on statistical independence assumptions. Other approximation methods achieve improved computational efficiency by using simple bounded-region representations of uncertainty. Set-membership techniques are mainly based on interval constraint propagation. They provide a set which encloses the pose of the robot and a set approximation of the map.
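Covariance intersection fuses two estimates $(a, P_a)$ and $(b, P_b)$ whose cross-correlation is unknown via $P_c^{-1} = \omega P_a^{-1} + (1-\omega) P_b^{-1}$ and $x_c = P_c(\omega P_a^{-1} a + (1-\omega) P_b^{-1} b)$ with $\omega \in [0, 1]$. The sketch below is a minimal 2D illustration; choosing $\omega$ by a grid search that minimizes the trace of $P_c$ is one common criterion and is an assumption here, as are the helper names and the example estimates.

```python
from typing import List, Tuple

Mat = List[List[float]]
Vec = List[float]


def inv2(m: Mat) -> Mat:
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def mat_vec(m: Mat, v: Vec) -> Vec:
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def covariance_intersection(a: Vec, pa: Mat, b: Vec, pb: Mat,
                            steps: int = 101) -> Tuple[Vec, Mat, float]:
    """Fuse (a, pa) and (b, pb) without assuming they are independent.

    P_c^-1 = w * pa^-1 + (1 - w) * pb^-1
    x_c    = P_c (w * pa^-1 a + (1 - w) * pb^-1 b)
    Here w is picked by a grid search that minimizes trace(P_c).
    """
    ia, ib = inv2(pa), inv2(pb)
    best = None
    for k in range(steps):
        w = k / (steps - 1)
        info = [[w * ia[i][j] + (1.0 - w) * ib[i][j] for j in range(2)]
                for i in range(2)]
        pc = inv2(info)
        trace = pc[0][0] + pc[1][1]
        if best is None or trace < best[0]:
            best = (trace, w, pc)
    _, w, pc = best
    ya, yb = mat_vec(ia, a), mat_vec(ib, b)
    fused = mat_vec(pc, [w * ya[i] + (1.0 - w) * yb[i] for i in range(2)])
    return fused, pc, w


# Two hypothetical estimates of one landmark, each precise along a different axis.
x, p, w = covariance_intersection([0.0, 0.0], [[1.0, 0.0], [0.0, 4.0]],
                                  [1.0, 1.0], [[4.0, 0.0], [0.0, 1.0]])
print(x, p, w)
```

In this example the fused covariance is diag(1.6, 1.6), whereas a naive Kalman-style fusion that assumed independence would report diag(0.8, 0.8); the larger value is the conservatism that makes the result consistent for any actual correlation.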

Bundle adjustment, and more generally maximum a posteriori (MAP) estimation, is another popular technique for SLAM using image data. It jointly estimates poses and landmark positions, increasing map fidelity, and is used in commercialized SLAM systems such as Google's ARCore, which replaced Google's earlier augmented reality computing platform, Tango (formerly Project Tango). MAP estimators compute the most likely explanation of the robot poses and the map given the sensor data, rather than trying to estimate the entire posterior probability.

New SLAM algorithms remain an active research area and are often driven by differing requirements and assumptions about the types of maps, sensors and models, as detailed below. Many SLAM systems can be viewed as combinations of choices from each of these aspects.

Mapping

Topological maps are a method of environment representation which capture the connectivity (i.e., topology) of the environment rather than creating a geometrically accurate map. Topological SLAM approaches have been used to enforce global consistency in metric SLAM algorithms. In contrast, grid maps use arrays (typically square or hexagonal) of discretized cells to represent the world metrically, and make inferences about which cells are occupied. Typically the cells are assumed to be statistically independent to simplify computation. Under this assumption, $P(m_t \mid x_t, m_{t-1}, o_t)$ is set to 1 if the new map's cells are consistent with the observation $o_t$ at location $x_t$, and to 0 if they are inconsistent.

Modern self-driving cars mostly simplify the mapping problem to almost nothing by making extensive use of highly detailed map data collected in advance. This can include map annotations down to the level of marking the locations of individual white line segments and curbs on the road. Location-tagged visual data such as Google's Street View may also be used as part of maps. Essentially, such systems reduce the SLAM problem to a simpler localization-only task, perhaps allowing only moving objects such as cars and people to be updated in the map at runtime.
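The grid-map update with independent cells can be sketched in a few lines. The text describes a hard 0/1 consistency rule; the sketch below instead uses the common log-odds occupancy-grid update, a probabilistic softening of that rule, so treat it as an illustrative alternative rather than the formulation above. The inverse sensor model values (0.7 and 0.3), the 1D row of cells and the function names are all hypothetical.

```python
import math
from typing import List

# Hypothetical inverse sensor model, expressed in log-odds.
L_OCC = math.log(0.7 / 0.3)   # evidence for "occupied" at the beam endpoint
L_FREE = math.log(0.3 / 0.7)  # evidence for "free" along the beam


def update_row(log_odds: List[float], robot_cell: int, hit_cell: int) -> None:
    """Update one row of cells for a range beam cast in the +x direction.

    Each cell is updated on its own, which is exactly the cell-independence
    assumption: no cell's update looks at any other cell.
    """
    for c in range(robot_cell + 1, min(hit_cell, len(log_odds))):
        log_odds[c] += L_FREE
    if 0 <= hit_cell < len(log_odds):
        log_odds[hit_cell] += L_OCC


def occupancy_probability(l: float) -> float:
    """Convert log-odds back to P(occupied)."""
    return 1.0 - 1.0 / (1.0 + math.exp(l))


row = [0.0] * 6  # log-odds 0 means P(occupied) = 0.5, i.e. unknown
for _ in range(2):
    update_row(row, robot_cell=0, hit_cell=4)  # two identical range readings
print([round(occupancy_probability(l), 3) for l in row])
```

Working in log-odds turns the per-cell Bayes update into an addition, which is why this representation is popular for large grids.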

Sensing

SLAM systems typically use several different types of sensors, and the strengths and limitations of various sensor types have been a major driver of new algorithms. Statistical independence is a mandatory requirement for coping with metric bias and with measurement noise. Different types of sensors give rise to different SLAM algorithms whose assumptions are most appropriate to the sensors. At one extreme, laser scans or visual features provide details of many points within an area, sometimes rendering SLAM inference unnecessary because shapes in these point clouds can be easily and unambiguously aligned at each step via image registration. At the opposite extreme, tactile sensors are extremely sparse, as they contain only information about points very close to the agent, so purely tactile SLAM requires strong prior models to compensate. Most practical SLAM tasks fall somewhere between these visual and tactile extremes.

Sensor models divide broadly into landmark-based and raw-data approaches. Landmarks are uniquely identifiable objects in the world whose location can be estimated by a sensor, such as Wi-Fi access points or radio beacons. Raw-data approaches make no assumption that landmarks can be identified, and instead model $P(o_t \mid x_t)$ directly as a function of the location.

Common sensors include one-dimensional (single-beam) or 2D (sweeping) laser rangefinders, 3D high-definition light detection and ranging (lidar), 3D flash lidar, 2D or 3D sonar sensors, and one or more 2D cameras. Since the invention of local features such as SIFT, there has been intense research into visual SLAM (VSLAM) using primarily visual (camera) sensors, because of the increasing ubiquity of cameras such as those in mobile devices. Follow-up research includes. Both visual and lidar sensors are informative enough to allow for landmark extraction in many cases. Other recent forms of SLAM include tactile SLAM (sensing by local touch only), radar SLAM, acoustic SLAM, and Wi-Fi SLAM (sensing by the signal strengths of nearby Wi-Fi access points). Recent approaches apply quasi-optical wireless ranging for multilateration (as in real-time locating systems, RTLS) or multi-angulation in conjunction with SLAM to compensate for erratic wireless measurements. A kind of SLAM for pedestrians uses a shoe-mounted inertial measurement unit as the main sensor and relies on the fact that pedestrians avoid walls to automatically build floor plans of buildings as part of an indoor positioning system.

For some outdoor applications, the need for SLAM has been almost entirely removed by high-precision differential GPS sensors. From a SLAM perspective, these may be viewed as location sensors whose likelihoods are so sharp that they completely dominate the inference. However, GPS sensors may occasionally degrade or go down entirely, e.g. during times of military conflict, which are of particular interest to some robotics applications.
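The wireless multilateration mentioned above can be illustrated with range measurements to beacons at known positions. The sketch below assumes 2D, at least three non-collinear beacons and already-measured ranges; it linearizes by subtracting the first beacon's circle equation from the others and solves the resulting least-squares problem. This differencing trick is one common textbook approach, not the method of any particular RTLS product, and the beacon layout is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def multilaterate(beacons: List[Point], ranges: List[float]) -> Point:
    """Least-squares 2D position from ranges to >= 3 non-collinear beacons.

    Subtracting beacon 0's circle equation from beacon i's gives a linear row:
    2(xi-x0) x + 2(yi-y0) y = r0^2 - ri^2 + xi^2 - x0^2 + yi^2 - y0^2
    """
    (x0, y0), r0 = beacons[0], ranges[0]
    ata = [[0.0, 0.0], [0.0, 0.0]]  # accumulates A^T A
    atb = [0.0, 0.0]                # accumulates A^T b
    for (xi, yi), ri in zip(beacons[1:], ranges[1:]):
        row = (2.0 * (xi - x0), 2.0 * (yi - y0))
        rhs = r0**2 - ri**2 + xi**2 - x0**2 + yi**2 - y0**2
        for i in range(2):
            atb[i] += row[i] * rhs
            for j in range(2):
                ata[i][j] += row[i] * row[j]
    # Solve the 2x2 normal equations with Cramer's rule.
    det = ata[0][0] * ata[1][1] - ata[0][1] * ata[1][0]
    x = (ata[1][1] * atb[0] - ata[0][1] * atb[1]) / det
    y = (ata[0][0] * atb[1] - ata[1][0] * atb[0]) / det
    return x, y


beacons = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]  # hypothetical layout
true_pos = (3.0, 4.0)
ranges = [((bx - true_pos[0])**2 + (by - true_pos[1])**2) ** 0.5 for bx, by in beacons]
print(multilaterate(beacons, ranges))  # approximately (3.0, 4.0)
```

With noisy, erratic ranges the same least-squares fix becomes one noisy observation among many, which is why such measurements are combined with SLAM rather than trusted on their own.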

Kinematics modeling

The $P(x_t \mid x_{t-1})$ term represents the kinematics of the model, which usually includes information about action commands given to a robot. As part of the model, the kinematics of the robot is included to improve estimates of sensing under conditions of inherent and ambient noise. The dynamic model balances the contributions from various sensors and various partial error models, and finally results in a sharp virtual depiction as a map, with the location and heading of the robot as a cloud of probability. Mapping is the final depiction of such a model; the map is either such a depiction or the abstract term for the model. For 2D robots, the kinematics are usually given by a mixture of rotation and "move forward" commands, which are implemented with additional motor noise. Unfortunately, the distribution formed by independent noise in angular and linear directions is non-Gaussian, but it is often approximated by a Gaussian. An alternative approach is to ignore the kinematic term and read odometry data from robot wheels after each command; such data may then be treated as one of the sensors rather than as kinematics.
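The non-Gaussian shape produced by a rotate-then-move-forward command with independent angular and linear noise can be seen by sampling. The sketch below is a minimal illustration with hypothetical noise levels and function names: with heading noise only, every sample lies on an arc of radius 1, yet the mean of a fitted Gaussian falls inside the arc, where the robot can never actually be.

```python
import math
import random
from typing import List, Tuple

Pose = Tuple[float, float, float]  # x, y, heading in radians


def sample_motion(pose: Pose, rotation: float, forward: float,
                  rot_std: float, fwd_std: float, rng: random.Random) -> Pose:
    """Rotate, then move forward, each with independent Gaussian motor noise."""
    x, y, th = pose
    th_new = th + rotation + rng.gauss(0.0, rot_std)
    dist = forward + rng.gauss(0.0, fwd_std)
    return (x + dist * math.cos(th_new), y + dist * math.sin(th_new), th_new)


def gaussian_fit(poses: List[Pose]) -> Tuple[Tuple[float, float], List[List[float]]]:
    """Mean and covariance of (x, y): the usual Gaussian approximation."""
    n = len(poses)
    mx = sum(p[0] for p in poses) / n
    my = sum(p[1] for p in poses) / n
    cxx = sum((p[0] - mx) ** 2 for p in poses) / n
    cyy = sum((p[1] - my) ** 2 for p in poses) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in poses) / n
    return (mx, my), [[cxx, cxy], [cxy, cyy]]


rng = random.Random(0)
start: Pose = (0.0, 0.0, 0.0)
# Heading noise only: every sample lies on an arc of radius 1 around the start.
samples = [sample_motion(start, 0.0, 1.0, rot_std=0.5, fwd_std=0.0, rng=rng)
           for _ in range(2000)]
(mean_x, mean_y), cov = gaussian_fit(samples)
print(round(mean_x, 3), round(mean_y, 3))  # the fitted mean lies inside the arc
```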

Moving objects

Non-static environments, such as those containing other vehicles or pedestrians, continue to present research challenges. SLAM with DATMO (detection and tracking of moving objects) is a model which tracks moving objects in a similar way to the agent itself.

Loop closure

Loop closure is the problem of recognizing a previously visited location and updating beliefs accordingly. This can be a problem because model or algorithm errors can assign low priors to the location. Typical loop closure methods apply a second algorithm to compute some measure of similarity between sensor measurements, and reset the location priors when a match is detected. For example, this can be done by storing and comparing bag-of-words vectors of scale-invariant feature transform (SIFT) features from each previously visited location.
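A bag-of-words loop-closure check can be sketched as follows. It assumes, hypothetically, that SIFT descriptors have already been quantized into integer "visual word" IDs against some vocabulary (that step is not shown), and it compares places by cosine similarity of their word histograms. The threshold, the `skip_recent` rule and all function names are illustrative assumptions.

```python
import math
from collections import Counter
from typing import List, Optional


def bow_vector(word_ids: List[int]) -> Counter:
    """Histogram of visual-word IDs (features assumed already quantized)."""
    return Counter(word_ids)


def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def detect_loop_closure(query: Counter, database: List[Counter],
                        threshold: float = 0.8, skip_recent: int = 1) -> Optional[int]:
    """Index of the most similar earlier place above threshold, else None.

    The newest skip_recent entries are ignored so that consecutive, trivially
    similar frames are not reported as loop closures.
    """
    candidates = database[:max(0, len(database) - skip_recent)]
    best_idx, best_sim = None, threshold
    for i, vec in enumerate(candidates):
        sim = cosine_similarity(query, vec)
        if sim >= best_sim:
            best_idx, best_sim = i, sim
    return best_idx


places = [bow_vector([1, 2, 3, 4]), bow_vector([5, 6, 7]), bow_vector([8, 9])]
print(detect_loop_closure(bow_vector([1, 2, 3, 4, 4]), places))  # 0
print(detect_loop_closure(bow_vector([10, 11]), places))          # None
```

A detected match would then be used to reset or re-weight the location prior toward the matched place, as described above.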

Exploration

Active SLAM studies the combined problem of SLAM and deciding where to move next to build the map as efficiently as possible. The need for active exploration is especially pronounced in sparse sensing regimes such as tactile SLAM. Active SLAM is generally performed by approximating the entropy of the map under hypothetical actions. Multi-agent SLAM extends this problem to the case of multiple robots coordinating themselves to explore optimally.
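Approximating map entropy under hypothetical actions can be sketched for an occupancy grid with independent cells, where the map entropy is simply the sum of per-cell binary entropies. The action-scoring rule below is a deliberately crude assumption: it supposes each observed cell would become fully known, so an action's expected entropy reduction equals the current entropy of the cells it would see. The grid values, action names and visibility sets are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cell_entropy(p: float) -> float:
    """Binary entropy (bits) of one occupancy cell with P(occupied) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def map_entropy(grid: Sequence[float]) -> float:
    """Map entropy under the independent-cell assumption: a plain sum."""
    return sum(cell_entropy(p) for p in grid)


def choose_action(grid: Sequence[float], visible: Dict[str, List[int]]) -> str:
    """Pick the action whose observed cells currently carry the most entropy.

    Crude approximation: each observed cell is assumed to become fully known,
    so the expected entropy reduction equals those cells' current entropy.
    """
    return max(visible, key=lambda a: sum(cell_entropy(grid[c]) for c in visible[a]))


grid = [0.5, 0.5, 0.5, 0.02, 0.98, 0.01]                      # hypothetical 1D map
visible = {"turn_left": [0, 1, 2], "turn_right": [3, 4, 5]}   # hypothetical views
print(map_entropy(grid), choose_action(grid, visible))
```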

Biological inspiration

In neuroscience, the hippocampus appears to be involved in SLAM-like computations, giving rise to place cells, and has formed the basis for bio-inspired SLAM systems such as RatSLAM.

Collaborative SLAM

Collaborative SLAM combines sensor data from multiple robots or users to generate 3D maps. This capability was demonstrated by a number of teams in the 2021 DARPA Subterranean Challenge.

Specialized SLAM methods

Acoustic SLAM

An extension of the common SLAM problem has been applied to the acoustic domain, where environments are represented by the three-dimensional (3D) positions of sound sources; this is termed aSLAM (Acoustic Simultaneous Localization and Mapping). Early implementations of this technique have used direction-of-arrival (DoA) estimates of the sound source location, and rely on principal techniques of sound localization to determine source locations. An observer or robot must be equipped with a microphone array to use acoustic SLAM, so that DoA features can be properly estimated. Acoustic SLAM has laid foundations for further studies in acoustic scene mapping, and can play an important role in human–robot interaction through speech. To map multiple and occasionally intermittent sound sources, an acoustic SLAM system uses foundations in random finite set theory to handle the varying presence of acoustic landmarks. However, the nature of acoustically derived features leaves acoustic SLAM susceptible to problems of reverberation, inactivity, and noise within an environment.
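The DoA features mentioned above can be illustrated with the simplest microphone array, two microphones. The sketch below assumes a far-field source, a known microphone spacing `d` and a speed of sound of about 343 m/s; it estimates the time difference of arrival by brute-force cross-correlation and converts it to an angle from the array broadside with $\theta = \arcsin(c\tau / d)$. The signals and parameters are hypothetical, and the sketch covers only the DoA step, not the random-finite-set mapping.

```python
import math
from typing import Sequence

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def estimate_lag(a: Sequence[float], b: Sequence[float], max_lag: int) -> int:
    """Lag k (samples) maximizing sum_n a[n] * b[n + k]; k > 0 means b hears it later."""
    best_k, best_score = 0, -math.inf
    for k in range(-max_lag, max_lag + 1):
        score = sum(a[n] * b[n + k] for n in range(len(a)) if 0 <= n + k < len(b))
        if score > best_score:
            best_k, best_score = k, score
    return best_k


def doa_degrees(lag: int, sample_rate: float, mic_spacing: float) -> float:
    """Far-field direction of arrival relative to the array broadside: asin(c * tau / d)."""
    tau = lag / sample_rate
    s = max(-1.0, min(1.0, SPEED_OF_SOUND * tau / mic_spacing))  # clamp noisy values
    return math.degrees(math.asin(s))


# Hypothetical two-microphone recording: the same click reaches mic b 3 samples later.
mic_a = [0.0] * 40
mic_b = [0.0] * 40
mic_a[10], mic_b[13] = 1.0, 1.0
lag = estimate_lag(mic_a, mic_b, max_lag=5)
print(lag, round(doa_degrees(lag, sample_rate=48000.0, mic_spacing=0.1), 1))
```

Reverberation adds extra correlation peaks and silence removes the main one, which is one concrete way the susceptibilities listed above show up.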

Audiovisual SLAM

Originally designed for human–robot interaction, Audio-Visual SLAM is a framework that fuses landmark features obtained from both the acoustic and visual modalities within an environment. Human interaction is characterized by features perceived not only in the visual modality but also in the acoustic modality; as such, SLAM algorithms for human-centered robots and machines must account for both sets of features. An Audio-Visual framework estimates and maps the positions of human landmarks using visual features such as human pose and audio features such as human speech, and fuses the beliefs for a more robust map of the environment. For applications in mobile robotics (e.g., drones, service robots), it is valuable to use low-power, lightweight equipment such as monocular cameras or microelectronic microphone arrays. Audio-Visual SLAM can also allow these sensors to complement each other: the full field of view and unobstructed feature representations inherent to audio sensors compensate for the narrow field of view, feature occlusions, and optical degradations common to lightweight visual sensors. The susceptibility of audio sensors to reverberation, sound source inactivity, and noise can likewise be compensated through fusion of landmark beliefs from the visual modality. Complementary function between the audio and visual modalities in an environment can prove valuable for the creation of robots and machines that fully interact with human speech and human movement.
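Fusing audio and visual beliefs can be illustrated in the simplest possible setting: two independent 1D Gaussian estimates of a person's bearing, combined by inverse-variance weighting (the product of two Gaussians). This is a minimal sketch, not the framework described above; it assumes independent errors, ignores angle wrap-around, and the numbers and function names are hypothetical. It does show the complementary behaviour: either modality can carry the estimate alone when the other is missing.

```python
from typing import Optional, Tuple

Belief = Tuple[float, float]  # (mean bearing in degrees, variance)


def fuse(a: Belief, b: Belief) -> Belief:
    """Product of two 1D Gaussians: an inverse-variance weighted average."""
    (ma, va), (mb, vb) = a, b
    var = 1.0 / (1.0 / va + 1.0 / vb)
    return var * (ma / va + mb / vb), var


def fuse_audio_visual(audio: Optional[Belief],
                      visual: Optional[Belief]) -> Optional[Belief]:
    """Use whichever modality is available; fuse when both are."""
    if audio is None:
        return visual   # e.g. the speaker is silent
    if visual is None:
        return audio    # e.g. the person is occluded or outside the camera's view
    return fuse(audio, visual)


print(fuse_audio_visual((30.0, 25.0), (20.0, 4.0)))  # sharper visual estimate dominates
print(fuse_audio_visual((30.0, 25.0), None))          # audio alone while occluded
```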

Implementation methods

Various SLAM algorithms are implemented in the open-source Robot Operating System (ROS) libraries, often used together with the Point Cloud Library for 3D maps or visual features from OpenCV.

EKF SLAM

In robotics, EKF SLAM is a class of algorithms which uses the extended Kalman filter (EKF) for SLAM. Typically, EKF SLAM algorithms are feature based and use the maximum likelihood algorithm for data association. In the 1990s and 2000s, EKF SLAM was the de facto method for SLAM, until the introduction of FastSLAM. Associated with the EKF is the Gaussian noise assumption, which significantly impairs EKF SLAM's ability to deal with uncertainty: when the uncertainty in the posterior is large, the EKF's linearization fails.
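The core idea of EKF SLAM, a single state vector holding both the robot and the landmarks with one joint covariance, can be shown with a 1D robot and one landmark. This is a minimal sketch under stated assumptions: data association is taken as known, the measurement is the hypothetical relative position $z = l - x$ plus noise, and because this model is linear the EKF here coincides with an ordinary Kalman filter; a real EKF SLAM system would linearize a nonlinear range-bearing model around the current estimate. Function names and noise values are illustrative.

```python
from typing import List, Tuple

Vec = List[float]
Mat = List[List[float]]


def ekf_predict(s: Vec, P: Mat, u: float, q: float) -> Tuple[Vec, Mat]:
    """Motion step: the robot moves by u, the landmark is static.

    The motion Jacobian is the identity, so P' = P + Q with Q = diag(q, 0).
    """
    s_new = [s[0] + u, s[1]]
    P_new = [[P[0][0] + q, P[0][1]], [P[1][0], P[1][1]]]
    return s_new, P_new


def ekf_update(s: Vec, P: Mat, z: float, r: float) -> Tuple[Vec, Mat]:
    """Measurement step for z = l - x + noise (relative landmark position)."""
    H = [-1.0, 1.0]  # Jacobian of h(s) = s[1] - s[0]
    PHt = [P[i][0] * H[0] + P[i][1] * H[1] for i in range(2)]
    S = H[0] * PHt[0] + H[1] * PHt[1] + r   # innovation variance
    K = [PHt[0] / S, PHt[1] / S]              # Kalman gain
    innovation = z - (s[1] - s[0])
    s_new = [s[0] + K[0] * innovation, s[1] + K[1] * innovation]
    HP = [H[0] * P[0][j] + H[1] * P[1][j] for j in range(2)]
    P_new = [[P[i][j] - K[i] * HP[j] for j in range(2)] for i in range(2)]  # (I - K H) P
    return s_new, P_new


# State [robot x, landmark l]; the landmark starts essentially unknown.
s, P = [0.0, 0.0], [[0.0, 0.0], [0.0, 1e6]]
s, P = ekf_update(s, P, z=5.0, r=0.01)   # landmark seen 5 m ahead
s, P = ekf_predict(s, P, u=1.0, q=0.1)   # drive 1 m forward
s, P = ekf_update(s, P, z=4.0, r=0.01)   # landmark now 4 m ahead
print(s, P)
```

After the second observation the off-diagonal covariance terms are positive: the robot and landmark estimates have become correlated. Maintaining these cross-terms for every landmark pair is what makes the full covariance grow quadratically with map size.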

GraphSLAM

GraphSLAM is a SLAM algorithm which uses sparse information matrices produced by generating a factor graph of observation interdependencies (two observations are related if they contain data about the same landmark). It is based on optimization algorithms.
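The information-matrix view behind GraphSLAM can be sketched with a tiny 1D pose graph. This is a simplified assumption-laden illustration: the variables are robot poses only (no landmarks), the constraints are linear relative measurements (three odometry edges and one loop closure), and the system $\Omega \mu = \xi$ is solved with a dense solver for readability, whereas a real implementation would exploit sparsity and iterate on nonlinear constraints. Each constraint adds entries only for the two variables it connects, so unrelated entries of $\Omega$ stay zero. Names and values are hypothetical.

```python
from typing import List, Tuple

# (i, j, measured x_j - x_i, information weight); all values are hypothetical.
Constraint = Tuple[int, int, float, float]


def build_information(n: int, constraints: List[Constraint],
                      prior_weight: float = 1e6) -> Tuple[List[List[float]], List[float]]:
    """Accumulate the information matrix Omega and vector xi of a 1D pose graph."""
    omega = [[0.0] * n for _ in range(n)]
    xi = [0.0] * n
    omega[0][0] += prior_weight  # anchor x_0 at 0: relative constraints leave the origin free
    for i, j, z, w in constraints:
        # Each constraint adds w * ((x_j - x_i) - z)^2 to the least-squares cost.
        omega[i][i] += w
        omega[j][j] += w
        omega[i][j] -= w
        omega[j][i] -= w
        xi[i] -= w * z
        xi[j] += w * z
    return omega, xi


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Dense Gaussian elimination with partial pivoting (sparse solvers are used in practice)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


odometry = [(0, 1, 1.0, 1.0), (1, 2, 1.0, 1.0), (2, 3, 1.0, 1.0)]
loop_closure = [(0, 3, 2.7, 1.0)]  # revisiting says x_3 is only 2.7 from x_0
omega, xi = build_information(4, odometry + loop_closure)
print(solve(omega, xi))  # the 0.3 disagreement is spread over all four edges
```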

History

A seminal work in SLAM is the research of Smith and Cheeseman on the representation and estimation of spatial uncertainty in 1986. Other pioneering work in this field was conducted by the research group of Hugh F. Durrant-Whyte in the early 1990s, which showed that solutions to SLAM exist in the infinite data limit. This finding motivates the search for algorithms which are computationally tractable and approximate the solution. The acronym SLAM was coined in the paper "Localization of Autonomous Guided Vehicles", which first appeared at ISR in 1995.

The self-driving cars STANLEY and JUNIOR, led by Sebastian Thrun, won the DARPA Grand Challenge and came second in the DARPA Urban Challenge, respectively, in the 2000s; both included SLAM systems, bringing SLAM to worldwide attention. Mass-market SLAM implementations can now be found in consumer robot vacuum cleaners and in virtual reality headsets such as the Meta Quest 2 and PICO 4 for markerless inside-out tracking.


External links

Probabilistic Robotics by Sebastian Thrun, Wolfram Burgard and Dieter Fox, with a clear overview of SLAM.
SLAM For Dummies (A Tutorial Approach to Simultaneous Localization and Mapping).
Andrew Davison research page at the Department of Computing, Imperial College London, about SLAM using vision.
openslam.org: a good collection of open source code and explanations of SLAM.
Matlab Toolbox of Kalman Filtering applied to Simultaneous Localization and Mapping: vehicle moving in 1D, 2D and 3D.
at the German Aerospace Center (DLR), including the related Wi-Fi SLAM and PlaceSLAM approaches.
SLAM lecture: online SLAM lecture based on Python.