Simulated annealing

Simulated annealing (SA) is a probabilistic technique for approximating the global optimum of a given function. Specifically, it is a
metaheuristic
to approximate global optimization in a large search space for an
optimization problem
. It is often used when the search space is discrete (for example the
traveling salesman problem
, the
boolean satisfiability problem
,
protein structure prediction
, and
job-shop scheduling
). For problems where finding an approximate global optimum is more important than finding a precise local optimum in a fixed amount of time, simulated annealing may be preferable to exact algorithms such as gradient descent or
branch and bound
. The name of the algorithm comes from annealing in metallurgy, a technique involving heating and controlled cooling of a material to alter its physical properties. Both the temperature and the physical properties attained are attributes of the material that depend on its thermodynamic free energy; heating and cooling the material affects both the temperature and the thermodynamic free energy or Gibbs energy.

Simulated annealing can be used for very hard computational optimization problems where exact algorithms fail; even though it usually only achieves an approximate solution to the global minimum, this can be enough for many practical problems. The problems solved by SA are typically formulated as the minimization of an objective function of many variables, subject to several constraints. In practice, the constraints can be penalized as part of the objective function.

Similar techniques have been independently introduced on several occasions, including Pincus (1970), Khachaturyan et al. (1979, 1981), Kirkpatrick, Gelatt and Vecchi (1983), and Černý (1985). In 1983, this approach was used by Kirkpatrick, Gelatt Jr. and Vecchi for a solution of the traveling salesman problem. They also proposed its current name, simulated annealing.

This notion of slow cooling implemented in the simulated annealing algorithm is interpreted as a slow decrease in the probability of accepting worse solutions as the solution space is explored. Accepting worse solutions allows for a more extensive search for the global optimal solution. In general, simulated annealing algorithms work as follows. The temperature progressively decreases from an initial positive value to zero. At each time step, the algorithm randomly selects a solution close to the current one, measures its quality, and moves to it according to temperature-dependent probabilities of selecting better or worse solutions, which during the search respectively remain at 1 (or positive) and decrease toward zero. The simulation can be performed either by a solution of kinetic equations for density functions or by using the stochastic sampling method. The method is an adaptation of the
Metropolis–Hastings algorithm
, a
Monte Carlo method
to generate sample states of a thermodynamic system, published by N. Metropolis et al. in 1953.


Overview

The
state
of some
physical system
s, and the function ''E''(''s'') to be minimized, is analogous to the
internal energy
of the system in that state. The goal is to bring the system, from an arbitrary ''initial state'', to a state with the minimum possible energy.
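
As a concrete illustration, in the traveling salesman problem the energy E(s) of a state can be taken to be the total length of the tour. The following Python sketch assumes a hypothetical representation in which a state is a list of city indices and cities is a list of (x, y) coordinates:

    import math

    def tour_length(tour, cities):
        # Energy E(s): total length of the closed tour described by 'tour'.
        # 'cities' maps each city index to (x, y) coordinates (assumed layout).
        total = 0.0
        for i in range(len(tour)):
            x1, y1 = cities[tour[i]]
            x2, y2 = cities[tour[(i + 1) % len(tour)]]  # wrap around to the start
            total += math.hypot(x2 - x1, y2 - y1)
        return total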


The basic iteration

At each step, the simulated annealing heuristic considers some neighboring state ''s*'' of the current state ''s'', and
probabilistic
ally decides between moving the system to state ''s*'' or staying in state ''s''. These probabilities ultimately lead the system to move to states of lower energy. Typically this step is repeated until the system reaches a state that is good enough for the application, or until a given computation budget has been exhausted.


The neighbours of a state

Optimization of a solution involves evaluating the neighbours of a state of the problem, which are new states produced through conservatively altering a given state. For example, in the
traveling salesman problem
each state is typically defined as a
permutation
of the cities to be visited, and the neighbors of any state are the set of permutations produced by swapping any two of these cities. The well-defined way in which the states are altered to produce neighboring states is called a "move", and different moves give different sets of neighboring states. These moves usually result in minimal alterations of the last state, in an attempt to progressively improve the solution through iteratively improving its parts (such as the city connections in the traveling salesman problem). Simple
heuristic
s like
hill climbing
, which move by finding better neighbour after better neighbour and stop when they have reached a solution which has no neighbours that are better solutions, cannot guarantee to lead to any of the existing better solutions; their outcome may easily be just a local optimum, while the actual best solution would be a global optimum that could be different.
Metaheuristic
s use the neighbours of a solution as a way to explore the solution space, and although they prefer better neighbours, they also accept worse neighbours in order to avoid getting stuck in local optima; they can find the global optimum if run for a long enough amount of time.
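
For illustration, a pair-swap move for the traveling salesman problem might be sketched in Python as follows (the representation of a tour as a list of city indices is an assumption made for this example):

    import random

    def swap_neighbour(tour):
        # Produce a neighbouring tour by swapping two randomly chosen cities.
        i, j = random.sample(range(len(tour)), 2)   # two distinct positions
        neighbour = list(tour)                      # copy; the current state is left intact
        neighbour[i], neighbour[j] = neighbour[j], neighbour[i]
        return neighbour

Different moves (swaps of consecutive cities, segment reversals, and so on) define different neighbourhoods and, as discussed in later sections, can strongly affect the performance of the search.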


Acceptance probabilities

The probability of making the transition from the current state s to a candidate new state s_\mathrm{new} is specified by an ''acceptance probability function'' P(e, e_\mathrm{new}, T), that depends on the energies e = E(s) and e_\mathrm{new} = E(s_\mathrm{new}) of the two states, and on a global time-varying parameter T called the ''temperature''. States with a smaller energy are better than those with a greater energy. The probability function P must be positive even when e_\mathrm{new} is greater than e. This feature prevents the method from becoming stuck at a local minimum that is worse than the global one.

When T tends to zero, the probability P(e, e_\mathrm{new}, T) must tend to zero if e_\mathrm{new} > e and to a positive value otherwise. For sufficiently small values of T, the system will then increasingly favor moves that go "downhill" (i.e., to lower energy values), and avoid those that go "uphill." With T=0 the procedure reduces to the greedy algorithm, which makes only the downhill transitions.

In the original description of simulated annealing, the probability P(e, e_\mathrm{new}, T) was equal to 1 when e_\mathrm{new} < e; i.e., the procedure always moved downhill when it found a way to do so, irrespective of the temperature. Many descriptions and implementations of simulated annealing still take this condition as part of the method's definition. However, this condition is not essential for the method to work. The P function is usually chosen so that the probability of accepting a move decreases when the difference e_\mathrm{new} - e increases; that is, small uphill moves are more likely than large ones. However, this requirement is not strictly necessary, provided that the above requirements are met.

Given these properties, the temperature T plays a crucial role in controlling the evolution of the state s of the system with regard to its sensitivity to the variations of system energies. To be precise, for a large T, the evolution of s is sensitive to coarser energy variations, while it is sensitive to finer energy variations when T is small.
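
A common choice satisfying these properties is the exponential acceptance function attributed to Kirkpatrick et al. and discussed later in this article; a minimal Python sketch:

    import math

    def acceptance_probability(e, e_new, t):
        # Downhill moves are always accepted; uphill moves are accepted with a
        # probability that shrinks as the energy increase grows or as T falls.
        if e_new < e:
            return 1.0
        if t <= 0:
            return 0.0                    # at T = 0 the procedure becomes greedy
        return math.exp(-(e_new - e) / t)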


The annealing schedule

The name and inspiration of the algorithm demand an interesting feature related to the temperature variation to be embedded in the operational characteristics of the algorithm: a gradual reduction of the temperature as the simulation proceeds. The algorithm starts initially with T set to a high value (or infinity), and then it is decreased at each step following some ''annealing schedule'', which may be specified by the user but must end with T=0 towards the end of the allotted time budget. In this way, the system is expected to wander initially towards a broad region of the search space containing good solutions, ignoring small features of the energy function; then drift towards low-energy regions that become narrower and narrower; and finally move downhill according to the steepest descent heuristic.

For any given finite problem, the probability that the simulated annealing algorithm terminates with a global optimal solution approaches 1 as the annealing schedule is extended. This theoretical result, however, is not particularly helpful, since the time required to ensure a significant probability of success will usually exceed the time required for a complete search of the
solution space
.
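
Two simple schedules are sketched below in Python; they are illustrative choices rather than prescriptions, since in practice the schedule must be tuned to the problem. The second matches the pseudocode in the next section, where the temperature is computed from the remaining fraction of the time budget:

    def geometric_temperature(t0, alpha, k):
        # Geometric (exponential) cooling: T_k = T0 * alpha**k, with 0 < alpha < 1.
        return t0 * (alpha ** k)

    def linear_temperature(t0, fraction_remaining):
        # Temperature proportional to the remaining fraction of the time budget,
        # so that T reaches 0 exactly when the budget runs out.
        return t0 * fraction_remaining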


Pseudocode

The following pseudocode presents the simulated annealing heuristic as described above. It starts from a state s0 and continues until a maximum of kmax steps have been taken. In the process, the call neighbour(s) should generate a randomly chosen neighbour of a given state s; the call random(0, 1) should pick and return a value in the range [0, 1], uniformly at random. The annealing schedule is defined by the call temperature(r), which should yield the temperature to use, given the fraction r of the time budget that remains.
* Let s = s0
* For k = 0 through kmax (exclusive):
** T ← temperature( 1 - (k+1)/kmax )
** Pick a random neighbour, snew ← neighbour(s)
** If P(E(s), E(snew), T) ≥ random(0, 1):
*** s ← snew
* Output: the final state s
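
A direct Python transcription of this pseudocode might look as follows, using the standard exponential acceptance rule for P; the small usage example at the end (minimizing x^2 over the integers with unit steps as moves) is purely hypothetical:

    import math
    import random

    def simulated_annealing(s0, k_max, energy, neighbour, temperature):
        # s0: initial state; k_max: iteration budget; energy(s) computes E(s);
        # neighbour(s) returns a random neighbour of s; temperature(r) is the
        # annealing schedule, here given the remaining fraction of the budget.
        s = s0
        for k in range(k_max):
            t = temperature(1 - (k + 1) / k_max)
            s_new = neighbour(s)
            e, e_new = energy(s), energy(s_new)
            # Standard acceptance rule: always move downhill, sometimes uphill.
            if e_new < e or (t > 0 and random.random() < math.exp(-(e_new - e) / t)):
                s = s_new
        return s

    # Hypothetical usage:
    best = simulated_annealing(
        s0=25,
        k_max=10_000,
        energy=lambda x: x * x,
        neighbour=lambda x: x + random.choice([-1, 1]),
        temperature=lambda frac: 10.0 * frac,
    )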


Selecting the parameters

In order to apply the simulated annealing method to a specific problem, one must specify the following parameters: the state space, the energy (goal) function E(), the candidate generator procedure neighbour(), the acceptance probability function P(), and the annealing schedule temperature() and initial temperature. These choices can have a significant impact on the method's effectiveness. Unfortunately, there are no choices of these parameters that will be good for all problems, and there is no general way to find the best choices for a given problem. The following sections give some general guidelines.


Sufficiently near neighbour

Simulated annealing may be modeled as a random walk on a search graph, whose vertices are all possible states, and whose edges are the candidate moves. An essential requirement for the neighbour() function is that it must provide a sufficiently short path on this graph from the initial state to any state which may be the global optimum; that is, the diameter of the search graph must be small. In the traveling salesman example above, for instance, the search space for n = 20 cities has n! = 2,432,902,008,176,640,000 (2.4 quintillion) states; yet the number of neighbors of each vertex is \sum_{k=1}^{n-1} k = \frac{n(n-1)}{2} = 190 edges (coming from n choose 2), and the diameter of the graph is n-1.


Transition probabilities

To investigate the behavior of simulated annealing on a particular problem, it can be useful to consider the ''transition probabilities'' that result from the various design choices made in the implementation of the algorithm. For each edge (s,s') of the search graph, the transition probability is defined as the probability that the simulated annealing algorithm will move to state s' when its current state is s. This probability depends on the current temperature as specified by temperature(), on the order in which the candidate moves are generated by the neighbour() function, and on the acceptance probability function P(). (Note that the transition probability is not simply P(e, e', T), because the candidates are tested serially.)


Acceptance probabilities

The specification of neighbour(), P(), and temperature() is partially redundant. In practice, it is common to use the same acceptance function P for many problems and adjust the other two functions according to the specific problem. In the formulation of the method by Kirkpatrick et al., the acceptance probability function P(e, e', T) was defined as 1 if e' < e, and \exp(-(e'-e)/T) otherwise. This formula was superficially justified by analogy with the transitions of a physical system; it corresponds to the
Metropolis–Hastings algorithm
, in the case where T=1 and the proposal distribution of Metropolis–Hastings is symmetric. However, this acceptance probability is often used for simulated annealing even when the neighbour() function, which is analogous to the proposal distribution in Metropolis–Hastings, is not symmetric, or not probabilistic at all. As a result, the transition probabilities of the simulated annealing algorithm do not correspond to the transitions of the analogous physical system, and the long-term distribution of states at a constant temperature T need not bear any resemblance to the thermodynamic equilibrium distribution over states of that physical system, at any temperature. Nevertheless, most descriptions of simulated annealing assume the original acceptance function, which is probably hard-coded in many implementations of SA.

In 1990, Moscato and Fontanari, and independently Dueck and Scheuer, proposed that a deterministic update (i.e. one that is not based on the probabilistic acceptance rule) could speed up the optimization process without impacting the final quality. Moscato and Fontanari conclude from observing the analog of the "specific heat" curve of the "threshold updating" annealing originating from their study that "the stochasticity of the Metropolis updating in the simulated annealing algorithm does not play a major role in the search of near-optimal minima". Instead, they proposed that "the smoothening of the cost function landscape at high temperature and the gradual definition of the minima during the cooling process are the fundamental ingredients for the success of simulated annealing." The method was subsequently popularized under the name "threshold accepting" due to Dueck and Scheuer's denomination. In 2001, Franz, Hoffmann and Salamon showed that the deterministic update strategy is indeed the optimal one within the large class of algorithms that simulate a random walk on the cost/energy landscape.
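
As a sketch of the difference, the deterministic threshold-accepting rule can be written roughly as follows; the threshold plays the role of the temperature and is lowered over the run, and the exact schedule is an implementation choice:

    def threshold_accept(e, e_new, threshold):
        # Deterministic rule: accept the candidate whenever it is not worse than
        # the current state by more than the current threshold; no randomness.
        return e_new - e < threshold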


Efficient candidate generation

When choosing the candidate generator neighbour(), one must consider that after a few iterations of the simulated annealing algorithm, the current state is expected to have much lower energy than a random state. Therefore, as a general rule, one should skew the generator towards candidate moves where the energy of the destination state s' is likely to be similar to that of the current state. This
heuristic
(which is the main principle of the
Metropolis–Hastings algorithm
) tends to exclude "very good" candidate moves as well as "very bad" ones; however, the former are usually much less common than the latter, so the heuristic is generally quite effective. In the traveling salesman problem above, for example, swapping two ''consecutive'' cities in a low-energy tour is expected to have a modest effect on its energy (length); whereas swapping two ''arbitrary'' cities is far more likely to increase its length than to decrease it. Thus, the consecutive-swap neighbour generator is expected to perform better than the arbitrary-swap one, even though the latter could provide a somewhat shorter path to the optimum (with n-1 swaps, instead of n(n-1)/2). A more precise statement of the heuristic is that one should try first candidate states s' for which P(E(s), E(s'), T) is large. For the "standard" acceptance function P above, this means that E(s') - E(s) is on the order of T or less. Thus, in the traveling salesman example above, one could use a neighbour() function that swaps two random cities, where the probability of choosing a city-pair vanishes as their distance increases beyond T.
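
A consecutive-swap generator of this kind might be sketched as follows (again assuming a tour is represented as a list of city indices):

    import random

    def consecutive_swap_neighbour(tour):
        # Swap two adjacent cities; the change in tour length is typically small,
        # so the candidate's energy stays close to that of the current state.
        i = random.randrange(len(tour))
        j = (i + 1) % len(tour)             # the next city, wrapping around
        neighbour = list(tour)
        neighbour[i], neighbour[j] = neighbour[j], neighbour[i]
        return neighbour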


Barrier avoidance

When choosing the candidate generator neighbour() one must also try to reduce the number of "deep" local minima: states (or sets of connected states) that have much lower energy than all their neighbouring states. Such "closed
catchment
basins" of the energy function may trap the simulated annealing algorithm with high probability (roughly proportional to the number of states in the basin) and for a very long time (roughly exponential on the energy difference between the surrounding states and the bottom of the basin). As a rule, it is impossible to design a candidate generator that will satisfy this goal and also prioritize candidates with similar energy. On the other hand, one can often vastly improve the efficiency of simulated annealing by relatively simple changes to the generator. In the traveling salesman problem, for instance, it is not hard to exhibit two tours A, B, with nearly equal lengths, such that (1) A is optimal, (2) every sequence of city-pair swaps that converts A to B goes through tours that are much longer than both, and (3) A can be transformed into B by flipping (reversing the order of) a set of consecutive cities. In this example, A and B lie in different "deep basins" if the generator performs only random pair-swaps; but they will be in the same basin if the generator performs random segment-flips.


Cooling schedule

The physical analogy that is used to justify simulated annealing assumes that the cooling rate is low enough for the probability distribution of the current state to be near
thermodynamic equilibrium
at all times. Unfortunately, the ''relaxation time'' (the time one must wait for the equilibrium to be restored after a change in temperature) strongly depends on the "topography" of the energy function and on the current temperature. In the simulated annealing algorithm, the relaxation time also depends on the candidate generator, in a very complicated way. Note that all these parameters are usually provided as black box functions to the simulated annealing algorithm. Therefore, the ideal cooling rate cannot be determined beforehand, and should be empirically adjusted for each problem. Adaptive simulated annealing algorithms address this problem by connecting the cooling schedule to the search progress. Another adaptive approach, Thermodynamic Simulated Annealing, automatically adjusts the temperature at each step based on the energy difference between the two states, according to the laws of thermodynamics.


Restarts

Sometimes it is better to move back to a solution that was significantly better rather than always moving from the current state. This process is called ''restarting'' of simulated annealing. To do this we set s and e to sbest and ebest and perhaps restart the annealing schedule. The decision to restart could be based on several criteria. Notable among these are restarting based on a fixed number of steps, based on whether the current energy is too high compared to the best energy obtained so far, restarting randomly, etc.
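
One possible restart criterion (a hypothetical choice, not prescribed by the method) returns to the best state seen so far after a fixed number of steps without improvement:

    def maybe_restart(s, e, s_best, e_best, steps_since_improvement, patience=1000):
        # If the search has stagnated for 'patience' steps, jump back to the best
        # state found so far; otherwise keep the current state.
        if steps_since_improvement >= patience:
            return s_best, e_best
        return s, e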


Related methods

* Interacting Metropolis–Hastings algorithms (a.k.a. sequential Monte Carlo) combine simulated annealing moves with an acceptance-rejection of the best-fitted individuals equipped with an interacting recycling mechanism.
* Quantum annealing uses "quantum fluctuations" instead of thermal fluctuations to get through high but thin barriers in the target function.
* Stochastic tunneling attempts to overcome the increasing difficulty simulated annealing runs have in escaping from local minima as the temperature decreases, by 'tunneling' through barriers.
* Tabu search normally moves to neighbouring states of lower energy, but will take uphill moves when it finds itself stuck in a local minimum, and avoids cycles by keeping a "taboo list" of solutions already seen.
* Dual-phase evolution is a family of algorithms and processes (to which simulated annealing belongs) that mediate between local and global search by exploiting phase changes in the search space.
* Reactive search optimization focuses on combining machine learning with optimization, by adding an internal feedback loop to self-tune the free parameters of an algorithm to the characteristics of the problem, of the instance, and of the local situation around the current solution.
* Genetic algorithms maintain a pool of solutions rather than just one. New candidate solutions are generated not only by "mutation" (as in SA), but also by "recombination" of two solutions from the pool. Probabilistic criteria, similar to those used in SA, are used to select the candidates for mutation or combination, and for discarding excess solutions from the pool.
* Memetic algorithms search for solutions by employing a set of agents that both cooperate and compete in the process; sometimes the agents' strategies involve simulated annealing procedures for obtaining high-quality solutions before recombining them. Annealing has also been suggested as a mechanism for increasing the diversity of the search.
* Graduated optimization digressively "smooths" the target function while optimizing.
* Ant colony optimization (ACO) uses many ants (or agents) to traverse the solution space and find locally productive areas.
* The cross-entropy method (CE) generates candidate solutions via a parameterized probability distribution. The parameters are updated via cross-entropy minimization, so as to generate better samples in the next iteration.
* Harmony search mimics musicians in an improvisation process, where each musician plays a note in search of the best harmony together.
* Stochastic optimization is an umbrella set of methods that includes simulated annealing and numerous other approaches.
* Particle swarm optimization is an algorithm modeled on swarm intelligence that finds a solution to an optimization problem in a search space, or models and predicts social behavior in the presence of objectives.
* The runner-root algorithm (RRA) is a meta-heuristic optimization algorithm for solving unimodal and multimodal problems, inspired by the runners and roots of plants in nature.
* The intelligent water drops algorithm (IWD) mimics the behavior of natural water drops to solve optimization problems.
* Parallel tempering is a simulation of model copies at different temperatures (or Hamiltonians) to overcome the potential barriers.
* Multi-objective simulated annealing algorithms have been used in multi-objective optimization.


Further reading

*A. Das and B. K. Chakrabarti (Eds.), ''[ftp://nozdr.ru/biblio/kolxoz/M/MP/Das%20A.,%20Chakrabarti%20B.K.%20(eds.)%20Quantum%20Annealing%20and%20Related%20Optimization%20Methods%20(LNP0679,%20Springer,%202005)(384s)_MP_.pdf Quantum Annealing and Related Optimization Methods]'', Lecture Notes in Physics, Vol. 679, Springer, Heidelberg (2005)
*V. Vassilev, A. Prahova: "The Use of Simulated Annealing in the Control of Flexible Manufacturing Systems", International Journal INFORMATION THEORIES & APPLICATIONS, Volume 6/1999


External links



* A Javascript app that allows you to experiment with simulated annealing. Source code included.
* "General Simulated Annealing Algorithm": an open-source MATLAB program for general simulated annealing exercises.
* Self-Guided Lesson on Simulated Annealing: a Wikiversity project.
* "Google in superposition of using, not using quantum computer": Ars Technica discusses the possibility that the D-Wave computer being used by Google may, in fact, be an efficient simulated annealing co-processor.
* A Simulated Annealing-Based Multiobjective Optimization Algorithm: AMOSA.