
Iterated Local Search

Helena R. Lourenço
Universitat Pompeu Fabra, Barcelona, Spain
[anonimizat]

Olivier C. Martin*
Université Paris-Sud, Orsay, France
[anonimizat]

Thomas Stützle†
Darmstadt University of Technology, Darmstadt, Germany
[anonimizat]
Abstract
Iterated Local Search has many of the desirable features of a metaheuristic: it is simple, easy to implement, robust, and highly effective. The essential idea of Iterated Local Search lies in focusing the search not on the full space of solutions but on a smaller subspace defined by the solutions that are locally optimal for a given optimization engine. The success of Iterated Local Search lies in the biased sampling of this set of local optima. How effective this approach turns out to be depends mainly on the choice of the local search, the perturbations, and the acceptance criterion. So far, in spite of its conceptual simplicity, it has led to a number of state-of-the-art results without the use of too much problem-specific knowledge. But with further work so that the different modules are well adapted to the problem at hand, Iterated Local Search can often become a competitive or even state-of-the-art algorithm. The purpose of this review is both to give a detailed description of this metaheuristic and to show where it stands in terms of performance.
* O.M. acknowledges support from the Institut Universitaire de France.
† This work was partially supported by the "Metaheuristics Network", a Research Training Network funded by the Improving Human Potential programme of the CEC, grant HPRN-CT-1999-00106. The information provided is the sole responsibility of the authors and does not reflect the Community's opinion. The Community is not responsible for any use that might be made of data appearing in this publication.

1 Introduction
The importance of high performance algorithms for tackling difficult optimization problems cannot be overstated, and in many cases the only available methods are metaheuristics. When designing a metaheuristic, it is preferable that it be simple, both conceptually and in practice. Naturally, it also must be effective, and if possible, general purpose. If we think of a metaheuristic as simply a construction for guiding (problem-specific) heuristics, the ideal case is when the metaheuristic can be used without any problem-dependent knowledge. An excellent example of this is simulated annealing. Given a neighborhood structure as in local search, it guides the local changes, allowing for non-improving changes from time to time. This avoids premature trapping of the local search and is surprisingly effective. Thus simulated annealing has the very satisfying property of being conceptually simple and general purpose, while requiring no problem-specific knowledge.¹
As metaheuristics have become more and more sophisticated, this ideal case has been pushed aside in the quest for greater performance. As a consequence, problem-specific knowledge (in addition to that built into the heuristic being guided) must now be incorporated into metaheuristics in order to reach the state-of-the-art level. Unfortunately, this makes the boundary between heuristics and metaheuristics fuzzy, and we run the risk of losing both simplicity and generality. To counter this, we appeal to modularity and try to break a metaheuristic algorithm into a few parts, each with its own specificity. In particular, we would like to have a totally general purpose part, while any problem-specific knowledge built into the metaheuristic would be restricted to another part. Finally, to the extent possible, we prefer to leave untouched the embedded heuristic (which is to be "guided") because of its potential complexity; thus we ask that it can be kept as a "black-box" routine. Iterated local search provides a simple way to satisfy these requirements.
The essence of the iterated local search metaheuristic can be given in a nutshell: one iteratively builds a sequence of solutions generated by the embedded heuristic, leading to far better solutions than if one were to use repeated random trials of that heuristic. This simple idea has a long history, and its rediscovery by many authors has led to many different names for iterated local search, such as iterated descent [9, 8], large-step Markov chains [48], iterated Lin-Kernighan [36], chained local optimization [47], or combinations of these [2]. Readers interested in the historical developments of these ideas should consult the review [37]. For us, there are two main points that make an algorithm an iterated local search: (i) there must be a single chain that is being followed (this then excludes population-based algorithms); (ii) the search for better solutions occurs in a reduced space defined by the output of a black-box heuristic.

¹ Simulated annealing does need to be adapted to each problem through a temperature schedule, but this can be done blindly by trial and error, with no knowledge of the problem being treated.

Historically, local search has been the most frequently used embedded heuristic, but in fact it can be any optimizer, deterministic or not.
The purpose of this review is both to give a detailed description of iterated local search and to show where it stands in terms of performance. So far, in spite of its conceptual simplicity, it has led to a number of state-of-the-art results without the use of too much problem-specific knowledge. Perhaps this is because iterated local search is very malleable, many implementation choices being left to the developer. This chapter is organized as follows. First we give a high-level presentation of iterated local search in Section 2. Then we discuss the importance of the different parts of the metaheuristic in Section 3, especially the subtleties associated with perturbing the solutions. In Section 4 we go over past work testing iterated local search in practice, while in Section 5 we discuss similarities and differences between iterated local search and other metaheuristics. The chapter closes with a summary of what has been achieved so far and an outlook on what the near future may look like.
2 Iterating a local search
2.1 General framework
We assume we have been given a problem-specific algorithm that from now on we shall refer to as a local search (even if in fact it is not a true local search). This algorithm is implemented via a computer routine that we call LocalSearch. The question we ask is "Can such an algorithm be improved by the use of iteration?". Our answer is "YES", and in fact the improvements obtained in practice are usually significant. Only in rather pathological cases where the iteration method is "incompatible" with the local search will the improvement be minimal. In the same vein, in order to have the most improvement possible, it is necessary to have some understanding of the way LocalSearch works. However, to keep this presentation as simple as possible, we shall ignore for the time being these complications; the additional complexities associated with tuning the iteration to the local search procedure will be discussed in Section 3. Furthermore, all issues associated with the actual speed of the algorithm are omitted in this first section as we wish to focus solely on the high-level architecture of iterated local search.
Let C be the cost function of our combinatorial optimization problem; C is to be minimized. We label candidate solutions or simply "solutions" by s, and denote by S the set of all s (for simplicity S is taken to be finite, but it does not matter much). Finally, for the purposes of this high-level presentation, it is simplest to assume that the local search procedure is deterministic:² for a given input s it always outputs the same solution s*, whose cost is less than or equal to C(s).

Figure 1: Probability densities of costs. The curve labeled s gives the cost density for all solutions, while the curve labeled s* gives it for the solutions that are local optima.
LocalSearch then defines a many-to-one mapping from the set S to the smaller set S* of locally optimal solutions s*. If one wants to have a kind of pictorial view of this, one can introduce what is called the basin of attraction of a local minimum s* as the set of s that are mapped to s* under the local search routine. LocalSearch then takes one from a starting solution to a solution at the bottom of the corresponding basin of attraction.
Now take an s or an s* at random. Typically, the distribution of costs found has a very rapidly rising part at the lowest values. In Figure 1 we show the kind of distributions found in practice for combinatorial optimization problems having a finite solution space. The distribution of costs is bell-shaped, with a mean and variance that are significantly smaller for solutions in S* than for those in S. As a consequence, it is much better to use local search than to sample randomly in S if one seeks low-cost solutions. The essential ingredient necessary for local search is a neighborhood structure. This means that S is a "space" with some topological structure, not just a set. Having such a space allows one to move from one solution s to a better one in an intelligent way, something that would not be possible if S were just a set.
² The reader can check that very little of what we say really uses this property, and in practice, many successful implementations of iterated local search have non-deterministic local searches.

Now the question is how to go beyond this use of LocalSearch. More precisely, given the mapping from S to S*, how can one further reduce the costs found without opening up and modifying LocalSearch, leaving it as a "black box" routine?
2.2 Random restart
The simplest possibility to improve upon a cost found by LocalSearch is to repeat the search from another starting point. Every s* generated is then independent, and the use of multiple trials allows one to reach into the lower part of the distribution. Note that this sampling is not uniform; the probability of hitting a given s* varies with the size of its basin of attraction. Although such a "random restart" approach with independent samplings is sometimes a useful strategy (in particular when all other options fail), it breaks down as the instance size grows because in that limit the tail of the distribution of costs collapses. Indeed, empirical studies [37] and general arguments [56] indicate that local search algorithms on large generic instances lead to costs that: (i) have a mean that is a fixed (percentage) excess above the optimum cost; (ii) have a distribution that becomes arbitrarily peaked about its mean when the instance size goes to infinity. This second property makes it impossible in practice to find an s* whose cost is even a little bit lower than the typical cost. Note however that there do exist many solutions of significantly lower cost; it is just that random sampling has a lower and lower probability of finding them as the instance size increases. To reach those configurations, a biased sampling is necessary; this is precisely what is accomplished by a stochastic search.
2.3 Searching in S*
To overcome the problem just mentioned associated with large instance sizes, reconsider what local search does: it takes one from S where C has a large mean to S* where C has a smaller mean. It is then most natural to invoke recursion: use local search to go from S* to a smaller space S** where the mean cost will be still lower! That would correspond to an algorithm with one local search nested inside another. Such a construction could be iterated to as many levels as desired, leading to a hierarchy of nested local searches. But upon closer scrutiny, we see that the problem is precisely how to formulate local search beyond the lowest level of the hierarchy: local search requires a neighborhood structure and this is not a priori given. The fundamental difficulty is to define neighbors in S* so that they can be enumerated and accessed efficiently. Furthermore, it is desirable for nearest neighbors in S* to be relatively close when using the distance in S; if this were not the case, a stochastic search on S* would have little chance of being effective.

Upon further thought, it transpires that one can introduce a good neighborhood structure on S* as follows. First, one recalls that a neighborhood structure on a set S induces a canonical neighborhood structure on subsets of S: two subsets are nearest neighbors simply if they contain solutions that are nearest neighbors. Second, take these subsets to be the basins of attraction of the s*; in effect, we are led to identify any s* with its basin of attraction. This then immediately gives the "standard" notion of neighborhood on S*, a notion which can be stated in a simple way as follows: s*_1 and s*_2 are neighbors in S* if their basins of attraction "touch" (i.e., contain nearest-neighbor solutions in S). Unfortunately this definition has the major drawback that one cannot in practice list the neighbors of an s* because there is no computationally efficient method for finding all solutions s in the basin of attraction of s*. Nevertheless, we can stochastically generate nearest neighbors as follows. Starting from s*, create a randomized path in S, s_1, s_2, ..., s_i, where s_{j+1} is a nearest neighbor of s_j. Determine the first s_j in this path that belongs to a different basin of attraction, so that applying local search to s_j leads to an s*' ≠ s*. Then s*' is a nearest neighbor of s*.
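This stochastic generation of nearest neighbors in S* can be sketched in a few lines of Python; random_neighbor and local_search are assumed helper routines (hypothetical names), and solutions are assumed to be comparable with !=.

    def nearest_neighbor_in_s_star(s_star, random_neighbor, local_search):
        # Walk a randomized path s1, s2, ... in S until the current
        # point falls into a basin of attraction different from that
        # of s_star; the corresponding local optimum is returned.
        s = s_star
        while True:
            s = random_neighbor(s)      # step to a nearest neighbor in S
            s_new = local_search(s)     # map the point back to S*
            if s_new != s_star:         # first s_j in a different basin
                return s_new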
Given this procedure, we can in principle perform a local search³ in S*. Extending the argument recursively, we see that it would be possible to have an algorithm implementing nested searches, performing local search on S, S*, S**, etc. in a hierarchical way. Unfortunately, the implementation of nearest-neighbor search at the level of S* is much too costly computationally because of the number of times one has to execute LocalSearch before finding a new basin of attraction. Thus we are led to abandon the (stochastic) search for nearest neighbors in S*; instead we use a weaker notion of closeness which then allows for a fast stochastic search in S*. Our construction leads to a (biased) sampling of S*; such a sampling will be better than a random one if it is possible to find appropriate computational ways to go from one s* to another.
2.4 Iterated Local Search
We want to explore S* using a walk that steps from one s* to a "nearby" one, without the constraint of using only nearest neighbors as defined above. Iterated local search (ILS) achieves this heuristically as follows. Given the current s*, we first apply a change or perturbation that leads to an intermediate state s' (which belongs to S). Then LocalSearch is applied to s' and we reach a solution s*' in S*. If s*' passes an acceptance test, it becomes the next element of the walk in S*; otherwise, one returns to s*. The resulting walk is a case of a stochastic search in S*, but where neighborhoods are never explicitly introduced.

³ Note that the local search finds neighbors stochastically; generally there is no efficient way to ensure that one has tested all the neighbors of any given s*.

Figure 2: Pictorial summary of iterated local search. Starting with a local minimum s*, we apply a perturbation leading to a less good solution s'. After applying LocalSearch, we find a new local minimum s*'.

This iterated local search procedure should lead to good biased sampling as long as the perturbations are neither too small nor too large. If they are too small, s' will often belong to the basin of attraction of s* and few new solutions of S* will be explored. If on the contrary the perturbations are too large, s' will belong to a random basin of attraction, there will be no bias in the sampling, and we will recover a random restart type algorithm.
The overall ILS procedure is pictorially illustrated in Figure 2. To be complete, let us note that generally the iterated local search walk will not be reversible; in particular one may sometimes be able to step from s*_1 to s*_2 but not from s*_2 to s*_1. However this "unfortunate" aspect of the procedure does not prevent ILS from being very effective in practice.
Since deterministic perturbations may lead to short cycles (for instance of length 2), one should randomize the perturbations or have them be adaptive so as to avoid this kind of cycling. If the perturbations depend on any of the previous s*, one has a walk in S* with memory. Now the reader may have noticed that aside from the issues of perturbations (which use the structure on S), our formalism reduces the problem to that of a stochastic search on S*. Then all of the bells and whistles (exploration, exploitation, tabu, adaptive perturbations and acceptance criteria, etc.) that are commonly used in that context may be applied here. This leads us to define iterated local search algorithms as metaheuristics having the following high-level architecture:

procedure Iterated Local Search
    s_0 = GenerateInitialSolution
    s* = LocalSearch(s_0)
    repeat
        s' = Perturbation(s*, history)
        s*' = LocalSearch(s')
        s* = AcceptanceCriterion(s*, s*', history)
    until termination condition met
end
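The following Python sketch is a fairly direct transcription of this pseudocode; the five callables and the fixed iteration budget used as the termination condition are placeholders to be supplied by the user, and tracking of the best solution found has been added for convenience.

    def iterated_local_search(generate_initial_solution, local_search,
                              perturbation, acceptance_criterion, cost,
                              n_iterations):
        # history records the walk in S*; it may be consulted by
        # Perturbation and AcceptanceCriterion, or ignored entirely
        # for a memoryless (Markovian) ILS.
        history = []
        s_star = local_search(generate_initial_solution())
        best = s_star
        for _ in range(n_iterations):   # termination condition
            s_prime = perturbation(s_star, history)
            s_star_prime = local_search(s_prime)
            s_star = acceptance_criterion(s_star, s_star_prime, history)
            history.append(s_star)
            if cost(s_star) < cost(best):
                best = s_star
        return best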
In practice, much of the potential complexity of ILS is hidden in the history dependence. If there happens to be no such dependence, the walk has no memory: the perturbation and acceptance criterion do not depend on any of the solutions visited previously during the walk, and one accepts or not s*' with a fixed rule. This leads to random walk dynamics on S* that are "Markovian", the probability of making a particular step from s*_1 to s*_2 depending only on s*_1 and s*_2. Most of the work using ILS has been of this type, though recent studies show unambiguously that incorporating memory enhances performance [59].
Staying within Markovian walks, the most basic acceptance criteria will use only the difference in the costs of s* and s*'; this type of dynamics for the walk is then very similar in spirit to what occurs in simulated annealing. A limiting case of this is to accept only improving moves, as happens in simulated annealing at zero temperature; the algorithm then does (stochastic) descent in S*. If we add to such a method a CPU time criterion to stop the search for improvements, the resulting algorithm pretty much has two nested local searches; to be precise, it has a local search operating on S embedded in a stochastic search operating on S*. More generally, one can extend this type of algorithm to more levels of nesting, having a different stochastic search algorithm for S*, S**, etc. Each level would be characterized by its own type of perturbation and stopping rule; to our knowledge, such a construction has never been attempted.
We can summarize this section by saying that the potential power of iterated local search lies in its biased sampling of the set of local optima. The efficiency of this sampling depends both on the kinds of perturbations and on the acceptance criteria. Interestingly, with the most naïve implementations of these parts, iterated local search is much better than random restart.⁴

⁴ This is analogous to the fact that simulated annealing performs far better than local search when both use the same neighborhood structure.

But still much better results can be obtained if the iterated local search modules are optimized. First, the acceptance criteria can be adjusted empirically as in simulated annealing without knowing anything about the problem being optimized. This kind of optimization will be familiar to any user of metaheuristics, though the questions of memory may become quite complex. Second, the Perturbation routine can incorporate as much problem-specific information as the developer is willing to put into it. In practice, this rule of thumb can be used as a guide: "a good perturbation transforms one excellent solution into another". These different aspects show that iterated local search algorithms can have a wide range of complexity, but complexity may be added progressively and in a modular way. (Recall in particular that all of the fine-tuning that resides in the embedded local search can be ignored if one wants, and it does not appear in the metaheuristic per se.) This makes iterated local search an appealing metaheuristic for both academic and industrial applications. The cherry on the cake is speed: as we shall soon see, one can perform k local searches embedded within an iterated local search much faster than if the k local searches are run from random restart.
3 Getting high performance
Given all these advantages, we hope the reader is now motivated to go on and consider the more nitty-gritty details that arise when developing an ILS for a new application. In this section, we will illustrate the main issues that need to be tackled when optimizing an ILS in order to achieve high performance.
There are four components to consider: GenerateInitialSolution, LocalSearch, Perturbation, and AcceptanceCriterion. Before attempting to develop a state-of-the-art algorithm, it is relatively straightforward to develop a more basic version of ILS. Indeed, (i) one can start with a random solution or one returned by some greedy construction heuristic; (ii) for most problems a local search algorithm is readily available; (iii) for the perturbation, a random move in a neighborhood of higher order than the one used by the local search algorithm can be surprisingly effective; and (iv) a reasonable first guess for the acceptance criterion is to force the cost to decrease, corresponding to a first-improvement descent in the set S*. Basic ILS implementations of this type usually lead to much better performance than random restart approaches. The developer can then run this basic ILS to build his intuition and try to improve the overall algorithm performance by improving each of the four modules. This should be particularly effective if it is possible to take into account the specificities of the combinatorial optimization problem under consideration. In practice, this tuning is easier for ILS than for memetic algorithms or tabu search, to name but two other metaheuristics.

The reason may be that the complexity of ILS is reduced by its modularity, the function of each component being relatively easy to understand. Finally, the last task to consider is the overall optimization of the ILS algorithm; indeed, the different components affect one another and so it is necessary to understand their interactions. However, because these interactions are so problem dependent, we wait till the end of this section before discussing that kind of "global" optimization.

Perhaps the main message here is that the developer can choose the level of optimization he wants. In the absence of any optimizations, ILS is a simple, easy to implement, and quite effective metaheuristic. But with further work on its four components, ILS can often be turned into a very competitive or even state-of-the-art algorithm.
3.1 Initial solution
Local search applied to the initial solution s_0 gives the starting point s*_0 of the walk in the set S*. Starting with a good s*_0 can be important if high-quality solutions are to be reached as fast as possible.

Standard choices for s_0 are either a random initial solution or a solution returned by a greedy construction heuristic. A greedy initial solution s_0 has two main advantages over random starting solutions: (i) when combined with local search, greedy initial solutions often result in better quality solutions s*_0; (ii) a local search from greedy solutions takes, on average, fewer improvement steps and therefore requires less CPU time.⁵
The question of an appropriate initial solution for (random restart) local search carries over to ILS because of the dependence of the walk in S* on the initial solution s*_0. Indeed, when starting with a random s_0, ILS may take several iterations to catch up in quality with runs using an s*_0 obtained by a greedy initial solution. Hence, for short computation times the initial solution is certainly important to achieve the highest quality solutions possible. For larger computation times, the dependence on s_0 of the final solution returned by ILS reflects just how fast, if at all, the memory of the initial solution is lost when performing the walk in S*.
Let us illustrate the tradeoffs between random and greedy initial solutions when using an ILS algorithm for the permutation flow shop problem (FSP) [58]. That ILS algorithm uses a straightforward local search implementation, random perturbations, and always applies Perturbation to the best solution found so far.

⁵ Note that the best possible greedy initial solution need not be the best choice when combined with a local search. For example, in [37], it is shown that the combination of the Clarke-Wright starting tour (one of the best performing TSP construction heuristics) with local search resulted in worse local optima than starting from random initial solutions when using 3-opt. Additionally, greedy algorithms which generate very high quality initial solutions can be quite time-consuming.

In Figure 3 we show how the average solution cost evolves with the number of iterations for two instances. The averages are for 10 independent runs when starting from a random initial solution or from an initial solution returned by the NEH heuristic [55]. (NEH is one of the best performing constructive heuristics for the FSP.) For short runs, the curve for the instance on the right shows that the NEH initial solutions lead to better average solution quality than the random initial solutions. But at longer times, the picture is not so clear; sometimes random initial solutions lead to better results, as we see for the instance on the left. This kind of test was also performed for ILS applied to the TSP [2]. Again it was observed that the initial solution had a significant influence on quality for short to medium sized runs.
Figure 3: The plots show the average solution quality as a function of the number of iterations for an ILS algorithm applied to the FSP on instances ta051 and ta056, comparing NEH starts with random starts.
In general, there will not always be a clear-cut answer regarding the best choice of an initial solution, but greedy initial solutions appear to be recommendable when one needs low-cost solutions quickly. For much longer runs, the initial solution seems to be less relevant, so the user can choose the initial solution that is the easiest to implement. If however one has an application where the influence of the initial solution does persist for long times, probably the ILS walk is having difficulty in exploring S* and so other perturbations or acceptance criteria should be considered.
3.2 Perturbation
The main drawback of local descent is that it gets trapped in local optima that are significantly worse than the global optimum. Much like simulated annealing, ILS escapes from local optima by applying perturbations to the current local minimum.
We will refer to the strength of a perturbation as the number of solution components which are modified. For instance for the TSP, it is the number of edges that are changed in the tour, while in the flow shop problem, it is the number of jobs which are moved in the perturbation. Generally, the local search should not be able to undo the perturbation, otherwise one will fall back into the local optimum just visited. Surprisingly often, a random move in a neighborhood of higher order than the one used by the local search algorithm can achieve this and will lead to a satisfactory algorithm. Still better results can be obtained if the perturbations take into account properties of the problem and provide a good match with the local search algorithm.
By how much should the perturbation change the current solution? If the perturbation is too strong, ILS may behave like a random restart, so better solutions will only be found with a very low probability. On the other hand, if the perturbation is too small, the local search will often fall back into the local optimum just visited and the exploration of the search space will be very limited. An example of a simple but effective perturbation for the TSP is the double-bridge move. This perturbation cuts four edges (and is thus of "strength" 4) and introduces four new ones as shown in Figure 4. Notice that each bridge is a 2-change, but neither of the 2-changes individually keeps the tour connected. Nearly all ILS studies of the TSP have incorporated this kind of perturbation, and it has been found to be effective for all instance sizes. This is almost certainly because it changes the topology of the tour and can operate on quadruples of very distant cities, whereas local search always modifies the tour among nearby cities. (One could imagine more powerful local searches which would include such double-bridge changes, but the computational cost would be far greater than for the local search methods used today.) In effect, the double-bridge perturbation cannot be undone easily, neither by simple local search algorithms such as 2-opt or 3-opt, nor by Lin-Kernighan [42], which is currently the champion (variable-depth) local search algorithm for the TSP. Furthermore, the perturbation does not increase the tour length much, so even if the current solution is very good, one is almost sure the next one will be good too. These two properties of the perturbation (its small strength and its fundamentally different nature from the changes used in local search) make the TSP the perfect application for iterated local search. But for other problems, finding an effective perturbation may be more difficult.
We will now consider the optimization of the perturbation, assuming the other modules are fixed. In problems like the TSP, one can hope to have an acceptable iterated local search when using perturbations of fixed size (independent of the instance size). On the contrary, for more difficult problems, fixed-strength perturbations may lead to unsatisfactory performance. Of course, the strength of the perturbations used is not the whole story; their nature is almost always very important and will also be discussed.

Figure 4: Schematic representation of the double-bridge move. The four dotted edges are removed and the remaining parts A, B, C, D are reconnected by the new edges a, b, c, d.
Finally we will close by pointing out that the perturbation strength has an effect on the speed of the local search: weak perturbations usually lead to faster execution of LocalSearch. All these different aspects need to be considered when optimizing this module.
3.2.1 Perturbation strength
For some problems, an appropriate perturbation strength is very small and seems to be rather independent of the instance size. This is the case for both the TSP and the FSP, and interestingly iterated local search for these problems is very competitive with today's best metaheuristic methods. We can also consider other problems where instead one is driven to large perturbation sizes. Consider the example of an ILS algorithm for the quadratic assignment problem (QAP). We use an embedded 2-opt local search algorithm, the perturbation is a random exchange of the locations of k items, where k is an adjustable parameter, and Perturbation always modifies the best solution found so far. We applied this ILS algorithm to four QAPLIB instances⁶ from four different classes of QAP instances [62]; computational results are given in Table 1. A first observation is that the best perturbation size is strongly dependent on the particular instance. For two of the instances, the best performance was achieved when as many as 75% of the solution components were altered by the perturbation.

⁶ QAPLIB is accessible at http://serv1.imm.dtu.dk/~sk/qaplib/.

Table 1: The first column is the name of the QAP instance; the number gives its size n. The successive columns are for perturbation sizes 3, n/12, ..., n. A perturbation of size n corresponds to random restart. The table shows the mean solution cost, averaged over 10 independent runs for each instance. The CPU time for each trial is 30 sec. for kra30a, 60 sec. for tai60a and sko64, and 120 sec. for tai60b, on a Pentium III 500 MHz PC.

instance    3      n/12   n/6    n/4    n/3    n/2    3n/4   n
kra30a      2.51   2.51   2.04   1.06   0.83   0.42   0.0    0.77
ste36a      1.96   1.96   1.99   1.17   0.37   0.24   0.88   1.42
sko64       0.65   1.04   0.50   0.37   0.29   0.29   0.82   0.93
tai60a      2.31   2.24   1.91   1.71   1.86   2.94   3.13   3.18
tai60b      2.44   0.97   0.67   0.96   0.82   0.50   0.14   0.43
Additionally, for a too small perturbation strength, the ILS performed worse than random restart (which corresponds to perturbation strength n). However, the fact that random restart for the QAP may perform, on average, better than a basic ILS algorithm is a bit misleading: in the next section we will show that by simply modifying the acceptance criterion a bit, ILS becomes far better than random restart. Thus one should keep in mind that the optimization of an iterated local search may require more than the optimization of the individual components.
3.2.2 Adaptive perturbations
The behavior of ILS for the QAP and also for other combinatorial optimization problems [34, 58] shows that there is no a priori single best size for the perturbation. This motivates the possibility of modifying the perturbation strength and adapting it during the run.

One possibility to do so is to exploit the search history. For the development of such schemes, inspiration can be taken from what is done in the context of tabu search [7, 6]. In particular, Battiti and Protasi proposed [6] a reactive search algorithm for MAX-SAT which fits perfectly into the ILS framework. They perform a directed perturbation scheme which is implemented by a tabu search algorithm, and after each perturbation they apply a standard local descent algorithm.
Another way of adapting the perturbation strength was introduced in the simple variable neighborhood search (simple VNS) scheme [54, 31]. In simple VNS the perturbations are always applied to the best solution found so far, and the perturbations correspond to random moves in some higher order neighborhoods N_1, N_2, ..., N_l, ordered according to some criterion, for example, their size.

We refer to Section 5 for a more detailed discussion of VNS.
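A minimal sketch of such a strength-adaptation rule, in the spirit of simple VNS, is given below; the bounds k_min and k_max are illustrative parameters, not values from the text.

    def vns_style_strength(k, improved, k_min=1, k_max=10):
        # Reset to the smallest neighborhood after an improvement;
        # otherwise move to the next larger one, up to k_max.
        return k_min if improved else min(k + 1, k_max)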
3.2.3 More complex perturbation schemes
Perturbations can be more complex than changes in a higher order neighborhood. For example, in the perturbation scheme introduced in [17] for the TSP, the authors first change slightly the city coordinates rather than perturbing the current tour. Then they apply the local search using these perturbed city locations, obtaining a new tour which in our notation is the perturbed solution s'. Finally, running LocalSearch on s' using the unperturbed city coordinates, they obtain the new candidate tour s*'.
Other sophisticated ways to generate good perturbations consist in optimizing a sub-part of the problem. If this task is difficult for the embedded heuristic, good results can follow. Such an approach was proposed by Lourenço [43] in the context of the job shop scheduling problem (JSP). Her perturbation schemes are based on defining one- or two-machine sub-problems by fixing a number of variables in the current solution and solving these sub-problems, either heuristically [44] or to optimality using for instance Carlier's exact algorithm [14] or the early-late algorithm [44]. These schemes work well because: (i) local search is unable to undo the perturbations; (ii) after the perturbation, the solutions tend to be very good and also have "new" parts that are optimized.
3.2.4 Speed
In the context of "easy" problems where ILS can work very well with weak (fixed size) perturbations, there is another reason why that metaheuristic can perform much better than random restart: speed. Indeed, LocalSearch will usually execute much faster on a solution obtained by applying a small perturbation to a local optimum than on a random solution. As a consequence, iterated local search can run many more local searches than can random restart in the same CPU time. As a qualitative example, consider again Euclidean TSPs. O(n) local changes have to be applied by the local search to reach a local optimum from a random start, whereas empirically a nearly constant number is necessary in ILS when using the s' obtained with the double-bridge perturbation. Hence, in a given amount of CPU time, ILS can sample many more local optima than can random restart. This speed factor can give ILS a considerable advantage over other restart schemes.
Let us illustrate this speed factor using the example of the TSP. We compare the number of local searches performed in a given amount of CPU time by: (i) random restart; (ii) ILS using a double-bridge move; (iii) ILS using five simultaneous double-bridge moves. (For both ILS implementations, we used random starts and the routine AcceptanceCriterion accepted only shorter tours.)

For our numerical tests we used a fast 3-opt implementation with standard speed-up techniques. In particular, it used a fixed radius nearest neighbor search within candidate lists of the 40 nearest neighbors for each city, and don't-look bits [10, 37, 48]. Initially, all don't-look bits are turned off (set to 0). If for a node no improving move can be found, its don't-look bit is turned on (set to 1) and the node is not considered as a starting node for finding an improving move in the next iteration. When an arc incident to a node is changed by a move, the node's don't-look bit is turned off again. In addition, when running ILS, after a perturbation we only turn off the don't-look bits of the 25 cities around each of the four breakpoints in the current tour. All three algorithms were run for 120 seconds on a 266 MHz Pentium II processor on a set of TSPLIB⁷ instances ranging from 100 up to 5915 cities. Results are given in Table 2. For small instances, we see that iterated local search ran between 2 and 10 times as many local searches as random restart. Furthermore, this advantage of ILS grows fast with increasing instance size: for the largest instance, the first ILS algorithm ran approximately 260 times as many local searches as random restart in our allotted time. Obviously, this speed advantage of ILS over random restart is strongly dependent on the strength of the perturbation applied. The larger the perturbation size, the more the solution is modified and generally the longer the subsequent local search takes. This fact is intuitively obvious and is confirmed by Table 2 in the context of the TSP.
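The reset step described above, reactivating only the cities near the perturbation, might look as follows in Python; this is a sketch under the assumption that dont_look is a list indexed by city and cut_positions holds the four breakpoint positions in the tour.

    def reset_dont_look_bits(dont_look, tour, cut_positions, radius=25):
        # Turn the don't-look bits back off (0) only for the cities
        # within `radius` tour positions of each breakpoint, so the
        # next local search scans just the perturbed region.
        n = len(tour)
        for p in cut_positions:
            for offset in range(-radius, radius + 1):
                dont_look[tour[(p + offset) % n]] = 0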
3.3 Acceptance criterion
ILS does a randomized walk in S*, the space of the local minima. The perturbation mechanism together with the local search defines the possible transitions between a current solution s* in S* and a "neighboring" solution s*' also in S*. The procedure AcceptanceCriterion then determines whether s*' is accepted or not as the new current solution. AcceptanceCriterion has a strong influence on the nature and effectiveness of the walk in S*. Roughly, it can be used to control the balance between exploitation (often called intensification) and exploration (diversification) of that search space. A simple way to illustrate this is to consider a Markovian acceptance criterion. A very strong exploitation is achieved if only better solutions are accepted. We call this acceptance criterion Better and it is defined for minimization problems as:

    Better(s*, s*', history) = s*'  if C(s*') < C(s*)
                               s*   otherwise                        (1)
⁷ TSPLIB is accessible at www.iwr.uni-heidelberg.de/iwr/comopt/software/TSPLIB95.

Table 2: The first column gives the name of the TSP instance, which specifies its size. The next columns give the number of local searches performed when using: (i) random restart (#LS_RR); (ii) ILS with a single double-bridge perturbation (#LS_1-DB); (iii) ILS with a perturbation consisting of five double-bridge moves (#LS_5-DB). All algorithms were run 120 secs. on a Pentium 266 MHz PC.

instance    #LS_RR   #LS_1-DB   #LS_5-DB
kroA100      17507      56186      34451
d198          7715      36849      16454
lin318        4271      25540       9430
pcb442        4394      40509      12880
rat783        1340      21937       4631
pr1002         910      17894       3345
pcb1173        712      18999       3229
d1291          835      23842       4312
fl1577         742      22438       3915
pr2392         216      15324       1777
pcb3038        121      13323       1232
fl3795         134      14478       1773
rl5915          34       8820        556
At the opposite extreme is the random walk acceptance criterion (denoted by RW) which always applies the perturbation to the most recently visited local optimum, irrespective of its cost:

    RW(s*, s*', history) = s*'                                       (2)
This criterion clearly favors exploration over exploitation.
Many intermediate choices between these two extreme cases are possible. In one of the first ILS algorithms, the large-step Markov chains algorithm proposed by Martin, Otto, and Felten [48, 49], a simulated annealing type acceptance criterion was applied. We call it LSMC(s*, s*', history). In particular, s*' is always accepted if it is better than s*. Otherwise, if s*' is worse than s*, s*' is accepted with probability exp{(C(s*) − C(s*'))/T}, where T is a parameter called temperature which is lowered during the run as in simulated annealing. Note that LSMC approaches the RW acceptance criterion if T is very high, while at very low temperatures LSMC is similar to the Better acceptance criterion. An interesting possibility for LSMC is to allow non-monotonic temperature schedules as proposed in [35] for simulated annealing or in tabu thresholding [26]. This can be most effective if it is done using memory: when further exploitation no longer seems useful, increase the temperature to do exploration for a limited time, then resume exploitation.

More complicated acceptance criteria can also be considered. In particular, one may completely restart the ILS algorithm after a phase of exploitation (of course this is a rather extreme way to switch from exploitation to exploration). For instance one can restart the ILS algorithm from a new initial solution if no improved solution has been found for a given number of iterations. The restart of the algorithm can easily be modeled by the acceptance criterion called Restart(s*, s*', history). Let i_last be the last iteration in which a better solution has been found and i be the iteration counter. Then Restart(s*, s*', history) is defined as

    Restart(s*, s*', history) = s*'  if C(s*') < C(s*)
                                s    if C(s*') ≥ C(s*) and i − i_last > i_r
                                s*   otherwise                       (3)

where i_r is a parameter that indicates that the algorithm should be restarted if no improved solution was found for i_r iterations. Typically, s can be generated in different ways. The simplest strategy is to generate a new solution randomly or by a greedy randomized heuristic.
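The three criteria of equations (1)-(3) are easily expressed as interchangeable Python routines matching the AcceptanceCriterion slot of the high-level architecture; how the iteration counters and the fresh solution s are supplied is an implementation choice, shown here with hypothetical extra arguments.

    def better(s_star, s_star_prime, history, cost):
        # Equation (1): accept only improving solutions.
        return s_star_prime if cost(s_star_prime) < cost(s_star) else s_star

    def random_walk(s_star, s_star_prime, history):
        # Equation (2): always accept the new local optimum.
        return s_star_prime

    def restart(s_star, s_star_prime, history, cost,
                i, i_last, i_r, new_solution):
        # Equation (3): like Better, but restart from a fresh solution
        # when no improvement has occurred for more than i_r iterations.
        if cost(s_star_prime) < cost(s_star):
            return s_star_prime
        if i - i_last > i_r:
            return new_solution()       # e.g. random or greedy randomized
        return s_star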
3.3.1 Example 1: TSP
Let us consider the effect of the two acceptance criteria RW and Better. We performed our tests on the TSP as summarized in Table 3. We give the average percentage excess over the known optimal solutions when using 10 independent runs on our set of benchmark instances. In addition we also give this excess for the random restart 3-opt algorithm. First, we observe that both ILS schemes lead to a significantly better average solution quality than random restart, in particular for larger instances, confirming again the claims given in Section 2. Second, given that one expects the good solutions for the TSP to cluster (see Section 3.5), a good strategy should incorporate exploitation. It is thus not surprising to see that the Better criterion leads to shorter tours than the RW criterion.

The runs given in this example are rather short. For much longer runs, the Better strategy comes to a point where it no longer finds improved tours and exploration should be considered again. Clearly it will be possible to improve the results significantly by alternating phases of exploitation and exploration.

Table 3: Influence of the acceptance criterion for various TSP instances. The first column gives the instance name and its size. The next columns give the excess percentage length of the tours obtained using: random restart (RR), iterated local search with RW, and iterated local search with Better. The data is averaged over 10 independent runs. All algorithms were run 120 secs. on a Pentium 266 MHz PC.

instance    Δavg(RR)   Δavg(RW)   Δavg(Better)
kroA100      0.0        0.0        0.0
d198         0.003      0.0        0.0
lin318       0.66       0.30       0.12
pcb442       0.83       0.42       0.11
rat783       2.46       1.37       0.12
pr1002       2.72       1.55       0.14
pcb1173      3.12       1.63       0.40
d1291        2.21       0.59       0.28
fl1577      10.3        1.20       0.33
pr2392       4.38       2.29       0.54
pcb3038      4.21       2.62       0.47
fl3795      38.8        1.87       0.58
rl5915       6.90       2.13       0.66
3.3.2 Example 2: QAP
Let us come back to ILS for the QAP discussed previously. For this problem we found that the acceptance criterion Better together with a (poor) choice of the perturbation strength could result in worse performance than random restart. In Table 4 we give results for the same ILS algorithm, except that we now also consider the use of the RW and Restart acceptance criteria. We see that the ILS algorithms using these modified acceptance criteria are much better than random restart, the only exception being RW with a small perturbation strength on tai60b.

This example shows that there are strong inter-dependences between the perturbation strength and the acceptance criterion. Rarely is this inter-dependence completely understood. But, as a general rule of thumb, when it is necessary to allow for exploration, we believe it is best to do so by accepting numerous small perturbations rather than by accepting one large perturbation.

Most of the acceptance criteria applied so far in ILS algorithms are either fully Markovian or make use of the search history in a very limited way. We expect that there will be many more ILS applications in the future making strong use of the search history, and in particular the alternation between exploitation and exploration is likely to be an essential feature in these applications.

Table 4: Further tests on the QAP benchmark problems using the same perturbations and CPU times as before. Here we consider three different choices for the acceptance criterion. Clearly, the inclusion of exploration significantly lowers the mean cost found.

instance  acceptance   3      n/12   n/6    n/4    n/3    n/2    3n/4   n
kra30a    Better       2.51   2.51   2.04   1.06   0.83   0.42   0.0    0.77
kra30a    RW           0.0    0.0    0.0    0.0    0.0    0.02   0.47   0.77
kra30a    Restart      0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.77
sko64     Better       0.65   1.04   0.50   0.37   0.29   0.29   0.82   0.93
sko64     RW           0.11   0.14   0.17   0.24   0.44   0.62   0.88   0.93
sko64     Restart      0.37   0.31   0.14   0.14   0.15   0.41   0.79   0.93
tai60a    Better       2.31   2.24   1.91   1.71   1.86   2.94   3.13   3.18
tai60a    RW           1.36   1.44   2.08   2.63   2.81   3.02   3.14   3.18
tai60a    Restart      1.83   1.74   1.45   1.73   2.29   3.01   3.10   3.18
tai60b    Better       2.44   0.97   0.67   0.96   0.82   0.50   0.14   0.43
tai60b    RW           0.79   0.80   0.52   0.21   0.08   0.14   0.28   0.43
tai60b    Restart      0.08   0.08   0.005  0.02   0.03   0.07   0.17   0.43
3.4 Local search
So far we have treated the local search algorithm as a black box which is called many times by ILS. Since the behavior and performance of the overall ILS algorithm is quite sensitive to the choice of the embedded heuristic, one should optimize this choice whenever possible. In practice, there may be many quite different algorithms that can be used for the embedded heuristic. (As stressed at the beginning of the chapter, the heuristic need not even be a local search.) One might think that the better the local search, the better the corresponding ILS. Generally, this is true. For instance in the context of the TSP, Lin-Kernighan [42] is a better local search than 3-opt, which itself is better than 2-opt. Using a fixed type of perturbation such as the double-bridge move, one finds that iterated Lin-Kernighan gives better solutions than iterated 3-opt, which itself gives better solutions than iterated 2-opt [37, 61]. But if we assume that the total computation time is fixed, it might be better to apply more frequently a faster but less effective local search algorithm than a slower and more powerful one. Clearly which choice is best depends on just how much more time is needed to run the better heuristic. If the speed difference is not large, for instance if it is independent of the instance size, then it is usually worth using the better heuristic. This is the most frequent case; for instance in the TSP, 3-opt is a bit slower than 2-opt, but the improvement in the quality of the tours is well worth the extra CPU time, be it using random restart or iterated local search. The same comparison applies to using Lin-Kernighan rather than 3-opt.

However, there are other cases where the increase in CPU time is so large that it is best not to use the "better" local search. For example, again in the context of the TSP, it is known that 4-opt gives slightly better solutions than 3-opt, but it is typically O(N) times slower for N cities, so it is better not to use 4-opt as the local search embedded in ILS.
There are also other aspects that should be considered when selecting a local search. Clearly, there is not much point in having an excellent local search if it will systematically undo the perturbation; however this issue is one of globally optimizing iterated local search, so it will be postponed till the next sub-section. Another important aspect is whether one can really get the speed-ups that were mentioned in sub-section 3.2. There we saw that a standard trick for LocalSearch was to introduce don't-look bits. These give a large gain in speed if the bits can be reset also after the application of the perturbation. This requires that the developer be able to access the source code of LocalSearch. A state-of-the-art ILS will take advantage of all possible speed-up tricks, and thus the LocalSearch most likely will not be a true black box.
Finally, there may be some advantages in allowing LocalSearch to sometimes generate worse solutions. For instance, if we replace the local search heuristic by tabu search or short simulated annealing runs, the corresponding ILS may perform better. This seems most promising when standard local search methods perform poorly. Such is indeed the case in the job-shop scheduling problem: the use of tabu search as the embedded heuristic gives rise to a very effective iterated local search [45].
3.5 Global optimization of ILS
So far, we have considered representative issues arising when optimizing separately each of the four components of an iterated local search. In particular, when illustrating various important characteristics of one component, we kept the other components fixed. But clearly the optimization of one component depends on the choices made for the others; as an example, we made it clear that a good perturbation must have the property that it cannot be easily undone by the local search. Thus, at least in principle, one should tackle the global optimization of an ILS. Since at present there is no theory for analyzing a metaheuristic such as iterated local search, we content ourselves here with just giving a rough idea of how such a global optimization can be approached in practice.
If we reconsider the sub-section on the effect of the initial solution, we see that GenerateInitialSolution is to a large extent irrelevant when the ILS performs well and rapidly loses the memory of its starting point. Hereafter we assume that this is the case; then the optimization of GenerateInitialSolution can be ignored and we are left with the joint optimization of the three other components.

Clearly the best choice of Perturbation depends on the choice of LocalSearch, while the best choice of AcceptanceCriterion depends on the choices of LocalSearch and Perturbation. In practice, we can approximate this global optimization problem by successively optimizing each component, assuming the others are fixed, until no improvements are found for any of the components. Thus the only difference with what has been presented in the previous sub-sections is that the optimization has to be iterative. This does not guarantee global optimization of the ILS, but it should lead to an adequate optimization of the overall algorithm.
Given these approximations, we should make more precise what it is that we are in fact trying to optimize. For most users, it will be the mean (over starting solutions) of the best cost found during a run of a given length. Then the "best" choice for the different components is a well posed problem, though it is intractable without further restrictions. Furthermore, in general the detailed instance that will be considered by the user is not known ahead of time, so it is important that the resulting ILS algorithm be robust. Thus it is preferable not to optimize it to the point where it is sensitive to the details of the instance. This robustness seems to be achieved in practice: researchers implement versions of iterated local search with some level of global optimization, and then test with some degree of success the performance on standard benchmarks.
At the risk of repeating ourselves, let us highlight the main dependencies of the components:

1. the perturbation should not be easily undone by the local search; if the local search has obvious short-comings, a good perturbation should compensate for them.

2. the combination Perturbation-AcceptanceCriterion determines the relative balance of exploitation and exploration; large perturbations are only useful if they can be accepted, which occurs only if the acceptance criterion is not too biased towards better solutions.
As a general guideline, LocalSearch should be as powerful as possible as long as it is not too costly in CPU time. Given such a choice, then find a well adapted perturbation following the discussion in Section 3.2; to the extent possible, take advantage of the structure of the problem. Finally, set the AcceptanceCriterion routine so that S* is sampled adequately. With this point of view, the overall optimization of the ILS is nearly a bottom-up process, but with iteration.

Search space characteristics. Perhaps the core issue is what to put into Perturbation: can one restrict the perturbations to be weak? From a theoretical point of view, the answer to this question depends on whether the best solutions "cluster" in S*. In some problems (and the TSP is one of them), there is a strong correlation between the cost of a solution and its "distance" to the optimum: in effect, the best solutions cluster together, i.e., have many similar components. This has been referred to in many different ways: "Massif Central" phenomenon [22], principle of proximate optimality [28], and replica symmetry [52]. If the problem under consideration has this property, it is not unreasonable to hope to find the true optimum using a biased sampling of S*. In particular, it is clear that it is useful to use exploitation to improve the probability of hitting the global optimum.
There are however other types of problems where the clustering is incomplete, i.e., where very distant solutions can be nearly as good as the optimum. Examples of combinatorial optimization problems in this category are graph bisection and MAX-SAT. When the space of solutions has this property, new strategies have to be used. Clearly, it is still necessary to use exploitation to get the best solution in one's current neighborhood, but generally this will not lead to the optimum. After an exploitation phase, one must go explore other regions of S*. This can be attempted by using "large" perturbations whose strength grows with the instance size. Another possibility is to restart the algorithm from scratch and repeat another exploitation phase. Clearly, the balance between exploitation and exploration is very important here and is a challenging problem.
4 Selected applications of ILS
ILS algorithms have been applied successfully to a variety of combinatorial optimization problems. In some cases, these algorithms achieve extremely high performance and even constitute the current state-of-the-art metaheuristics, while in other cases the ILS approach is merely competitive with other metaheuristics. In this section, we cover some of the most studied problems, with a stress on the traveling salesman problem and scheduling problems.
4.1 ILS for the TSP
The TSP is probably the best-known combinatorial optimization problem. De facto, it is a standard test-bed for the development of new algorithmic ideas; a good performance on the TSP is taken as evidence of the value of new methods. As for many metaheuristic algorithms, the first ILS algorithms were introduced and tested on the TSP. Probably the oldest attempt is due to Baum [9, 8].
23

hismethod iterateddescent ; histestsused2-optastheembeddedheuristic,random
3-changes as the perturbations, and imposed the tour length to decrease (thus thename of the method). His results were not impressive, in part because he consid-ered the non-Euclidean TSP, which in practice is substantially more difficult than
the Euclidean TSP. A major improvement in the performance of ILS algorithms
came from the large-step Markov chain (LSMC) algorithm proposed by Martin,
Otto, and Felten [48]. They used a simulated annealing like acceptance criterion(LSMC) from which the algorithm’s name is derived and considered both the ap-
plication of 3-optlocal search andthe Lin-Kernighan heuristic ( LK)which isthe
best performing local search algorithm for the TSP. But probably the key ingredi-entoftheirworkistheintroduction ofthedouble-bridge movefortheperturbation.This choice made the approach very powerful for the Euclidean TSP, and that en-couraged muchmoreworkalongtheselines. Inparticular, Johnson [36,37]coinedthe term “iterated Lin-Kernighan” (IKL) for his implementation of ILS using theLin-Kernighan as the local search. The main differences with the LSMC imple-mentation are: (i)double-bridge movesarerandomrather thanbiased ones; (ii)the
costs are improving (only better tours are accepted, corresponding to the choice
Better in our notation). Since these initial studies, other ILS variants have been
proposed, and Johnson and McGeoch [37] give a summary of the situation as of1997.
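The double-bridge move itself takes only a few lines. The sketch below is our own illustration of the unbiased random variant: the tour is cut into four segments A, B, C, D and reconnected as A, D, C, B, which changes four tour edges at once, a move that a single 2-opt or 3-opt step cannot undo.

    import random

    def double_bridge(tour):
        # Random double-bridge perturbation (assumes len(tour) >= 4):
        # cut the cyclic tour into segments A, B, C, D and reconnect
        # them in the order A, D, C, B.
        n = len(tour)
        i, j, k = sorted(random.sample(range(1, n), 3))  # three cut points
        a, b, c, d = tour[:i], tour[i:j], tour[j:k], tour[k:]
        return a + d + c + b

The biased variant discussed next would instead choose the cut points so that the newly created edges are short.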
Currently the highest performance ILS for the TSP is the chained LK code by Applegate, Bixby, Chvátal, and Cook, which is available as part of the Concorde software package at www.keck.caam.rice.edu/concorde.html. These authors have provided very detailed descriptions of their implementation, and so we refer the reader to their latest article [1] for details. Furthermore, Applegate, Cook, and Rohe [2] performed thorough experimental tests of this code by considering the effect of the different modules: (i) initial tour; (ii) implementations of the LK heuristic; (iii) types of perturbations. Their tests were performed on very large instances with up to 25 million cities. For the double-bridge move, they considered the effects of forcing the edges involved to be “short” as well as the random case. Their conclusion is that the best performance is obtained when the double-bridge moves are biased towards short edge lengths. However, the strength of the bias towards short edges should be adapted to the available computation time: the shorter the computation time, the shorter the edges should be. In their tests of the influence of the initial tour, they concluded that the worst performance is obtained with random initial tours or those returned by the nearest neighbor heuristic, while the best results were obtained with the Christofides algorithm [16], the greedy heuristic [10], or the Quick-Boruvka heuristic proposed in that article. With long runs of their algorithm on TSPLIB instances with more than 10,000 cities they obtained an impressive performance, always finding solutions with less than 0.3% excess length over the lower bounds for these instances. For the largest instance considered, with 25 million cities, they reached a solution only 0.3% above the estimated optimum.
Apart from these works, two new ILS algorithms for the TSP have been proposed since the review article of Johnson and McGeoch. The first algorithm is due to Stützle [59, 61]; he examined the run-time behavior of ILS algorithms for the TSP and concluded that ILS algorithms with the Better acceptance criterion show a type of stagnation behavior for long run-times [59], as expected when performing an intense exploitation search. To avoid such stagnation, restarts and a particular acceptance criterion to diversify the search were proposed. The goal of this strategy is to force the search to continue from a position that is beyond a certain minimal distance from the current position. This idea is implemented as follows. Let s_c be the solution from which to escape (s_c is typically chosen as s*_best, the best solution found in the current search) and let the distance d(s, s') between two tours s and s' be the number of edges in which they differ. Then the following steps are repeated until a solution beyond a minimal distance d_min from s_c is obtained:

(1) Generate p copies of s_c.
(2) To each of the p solutions apply Perturbation followed by LocalSearch.
(3) Choose the best q solutions, 1 ≤ q ≤ p, as candidate solutions.
(4) Let s* be the candidate solution with maximal distance to s_c. If d(s*, s_c) ≤ d_min, then repeat at (2); otherwise return s*.

The purpose of step (3) is to choose good quality solutions, while step (4) guarantees that the point from which the search will be continued is sufficiently different from (far from) s_c. The attempts are continued until a new solution is accepted, but one gives up after some maximum number of iterations. Computational results for this way of alternating between exploitation and exploration show that the method is very effective, even when using only a 3-opt local search [61, 60].
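A direct transcription of these four steps into Python might look as follows; this is a sketch only, since the refill of the candidate pool between rounds and the give-up rule are our own assumptions, and perturb, local_search, cost, and distance are problem-specific.

    import itertools

    def escape(s_c, perturb, local_search, cost, distance,
               p, q, d_min, max_rounds=50):
        pool = [s_c] * p                                        # step (1)
        s_star = s_c
        for _ in range(max_rounds):
            pool = [local_search(perturb(s)) for s in pool]     # step (2)
            pool = sorted(pool, key=cost)[:q]                   # step (3)
            s_star = max(pool, key=lambda s: distance(s, s_c))  # step (4)
            if distance(s_star, s_c) > d_min:
                return s_star
            # Refill the pool back to p solutions (our assumption: recycle
            # the q surviving candidates) and try again.
            pool = list(itertools.islice(itertools.cycle(pool), p))
        return s_star   # give up after a maximum number of rounds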
The second ILS developed for the TSP since 1997 is that of Katayama and Narihisa [38]. They introduce a new perturbation mechanism which they called a genetic transformation. The genetic transformation mechanism uses two tours, one of which is the best found so far, s*_best, while the second solution s' is a tour found earlier in the search. First a random 4-opt move is performed on s*_best, resulting in s*'. Then the subtours that are shared among s*' and s' are enumerated. The resulting parts are then reconnected with a greedy algorithm. Computational experiments with an iterated LK algorithm using the genetic transformation method instead of the standard double-bridge move have shown that the approach is very effective; further studies should be forthcoming.
4.2 ILS for scheduling problems
ILS has also been applied successfully to scheduling problems. Here we summarize the different uses of ILS for tackling these types of problems, ranging from single machine to complex multi-machine scheduling.
4.2.1 Single Machine Total Weighted Tardiness Problem (SMTWTP)
Congram, Potts and van de Velde [18] have presented an ILS algorithm for the SMTWTP based on a dynasearch local search. Dynasearch uses dynamic programming to find a best move which is composed of a set of independent interchange moves; each such move exchanges the jobs at positions i and j, j ≠ i. Two interchange moves are independent if they do not overlap, that is, if for two moves involving positions i, j and k, l we have that min{i, j} > max{k, l} or vice versa. This neighborhood is of exponential size and dynasearch explores it in polynomial time.
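The independence condition is easy to state in code; the following small sketch (our own illustration) tests it for two interchange moves given as position pairs:

    def independent(move_a, move_b):
        # Two interchange moves (i, j) and (k, l) are independent iff
        # their position ranges do not overlap: one lies entirely
        # above the other.
        i, j = move_a
        k, l = move_b
        return min(i, j) > max(k, l) or min(k, l) > max(i, j)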
The perturbation consists of a series of random interchange moves. They also exploit a well-known property of the SMTWTP: there exists an optimal solution in which non-late jobs are sequenced in non-decreasing order of the due dates. This property is used in two ways: to diversify the search in the perturbation step and to reduce the computation time of the dynasearch. In the acceptance criterion, Congram et al. introduce a backtrack step: after β iterations in which every new local optimum is accepted, the algorithm restarts with the best solution found so far. In our notation, the backtrack step is a particular choice for the history dependence built into AcceptanceCriterion.
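As we read it, the backtrack step amounts to a small piece of state-carrying acceptance logic; the bookkeeping details below are our own assumptions, not the authors' exact implementation.

    def backtrack_accept(s_new, state, beta, cost):
        # Accept every new local optimum, but after beta acceptances
        # backtrack: restart the walk from the best solution found so far.
        # `state` is a dict initialized to {'count': 0, 'best': s0}.
        state['count'] += 1
        if cost(s_new) < cost(state['best']):
            state['best'] = s_new
        if state['count'] >= beta:          # backtrack step
            state['count'] = 0
            return state['best']
        return s_new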
Congram et al. used several different embedded LocalSearch algorithms, all associated with the interchange neighborhood. These heuristics were: (i) dynasearch; (ii) a local search based on first-improvement descent; (iii) a local search based on best-improvement descent. They then performed tests to evaluate these algorithms using random restart and compared them to using iterated local search. While random restart dynasearch performed only slightly better than the two simpler descent methods, the ILS with dynasearch significantly outperformed the other two iterated descent algorithms, which in turn were far superior to the random restart versions. The authors also show that the iterated dynasearch algorithm significantly improves over the previously best known algorithm, a tabu search presented in [19].
4.2.2 Single and parallel machine scheduling

Brucker, Hurink, and Werner [11, 12] apply the principles of ILS to a number of one-machine and parallel-machine scheduling problems. They introduce a local search method which is based on two types of neighborhoods. At each step one goes from one feasible solution to a neighboring one with respect to the secondary neighborhood. The main difference with standard local search methods is that this secondary neighborhood is defined on the set of locally optimal solutions with respect to the first neighborhood. Thus in fact this is an ILS with two nested neighborhoods: searching in their primary neighborhood corresponds to our local search phase, while searching in their secondary neighborhood is like our perturbation phase. The authors also note that the second neighborhood is problem specific; this is what arises in ILS, where the perturbation should be adapted to the problem. The search at a higher level reduces the search space and at the same time leads to better results.
4.2.3 Flow shop scheduling

Stützle [58] applied ILS to the flow shop problem (FSP). The algorithm is based on a straightforward first-improvement local search using the insertion neighborhood, where a job at position i is removed and inserted at position j ≠ i. The initial schedule is constructed by the NEH heuristic [55], while the perturbation is composed of a number of two different kinds of moves: swaps, which exchange the positions of two adjacent jobs, and interchange moves, which have no constraint on adjacency. Experimentally, it was found that perturbations with just a few swap and interchange moves were sufficient to obtain very good results. The article also compares different acceptance criteria, of which ConstTemp, which is the same as the LSMC acceptance criterion except that it uses a constant temperature T_c, was found to be superior to Better. The computational results show that despite the simplicity of the approach, the quality of the solutions obtained is comparable to that of the best performing local search algorithms for the FSP; we refer to [58] for a more detailed discussion.
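ConstTemp is just the Metropolis rule at a fixed temperature. A minimal sketch, with T_c left as a problem-specific tuning parameter:

    import math
    import random

    def const_temp_accept(s, s_new, cost, T_c):
        # Always accept improvements; accept a worsening of `delta`
        # with probability exp(-delta / T_c), T_c held constant.
        delta = cost(s_new) - cost(s)
        if delta <= 0 or random.random() < math.exp(-delta / T_c):
            return s_new
        return s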
ILS has also been used to solve the flow shop problem with several stages in series. Yang, Kreipl and Pinedo [65] presented such a method; at each stage, instead of a single machine, there is a group of identical parallel machines. Their metaheuristic has two phases that are repeated iteratively. In the first phase, the operations are assigned to the machines and an initial sequence is constructed. The second phase uses an ILS to find better schedules for each machine at each stage by modifying the sequence of each machine. (This part is very similar in spirit to the approach of Kreipl for the minimum total weighted tardiness job shop problem [41] that will be presented below.) Yang, Kreipl and Pinedo also proposed a “hybrid” metaheuristic: they first apply a decomposition procedure that solves a series of single-stage subproblems; then they follow this by their ILS. The process is repeated until a satisfactory solution is obtained.
4.2.4 Job shop scheduling

Lourenço [43] and Lourenço and Zwijnenburg [45] used ILS to tackle the job shop scheduling problem (JSP). They performed extensive computational tests, comparing different ways to generate initial solutions, various local search algorithms, different perturbations, and three acceptance criteria. While they found that the initial solution had only a very limited influence, the other components were very important. Perhaps the heart of their work is the way they perform the perturbations. They consider relaxations of the problem at hand corresponding to the optimization of just some of the jobs. Then they use exact methods to solve these sub-problems, generating the perturbation move. This has the great advantage that much problem-specific knowledge is built into the perturbation. Such a problem-specific perturbation also leads to solutions which are difficult to find when using local moves. As for the local search, three alternatives were considered: local descent, short simulated annealing runs, and short tabu search runs. Best results were obtained using tabu search in the local search phase. Not surprisingly, ILS performed better than random restart given the same amount of time, for any choice of the embedded local search heuristic.
In more recent work on the job shop scheduling problem, Balas and Vazacopoulos [4] presented a variable depth search heuristic which they called guided local search (GLS). This is in fact a case of an ILS algorithm. GLS is based on the concept of neighborhood trees, proposed by the authors, where each node corresponds to a solution and the child nodes are obtained by performing an interchange on some critical arc. In their work, the interchange move consists in reversing more than one arc and can be seen as a particular kind of variable depth interchange. They embedded GLS within the shifting bottleneck (SB) procedure by replacing the reoptimization cycle of the SB procedure with a number of cycles of the GLS procedure. They call this procedure SB-GLS1. Later, they also proposed a variant of this method, SB-GLS2, which works as follows. After all machines have been sequenced, they iteratively remove one machine and apply GLS to a smaller instance defined by the remaining machines. Then GLS is again applied on the initial instance containing all machines. This procedure is an ILS where a perturbed solution is obtained by applying a (variable depth) local search to a part of an instance. In some sense, both heuristics are similar to the one proposed by Lourenço [43]; the main difference is that Lourenço's heuristic applies the perturbation to complete schedules, whereas the SB-GLS heuristics start from an empty (infeasible) schedule and iteratively optimize machine by machine until all machines have been scheduled, in an SB style followed by a local search application. They also differ in the local search algorithm applied. The authors performed a computational experiment comparing with other metaheuristics and concluded that SB-GLS1 and SB-GLS2 are robust and efficient, and provide schedules of high quality in a reasonable computing time.
Recently, Kreipl applied ILS to the total weighted tardiness job shop scheduling problem (TWTJSP) [41]. The TWTJSP is closer to real-life problems than the classical JSP with makespan objective, because it takes into account release and due dates and also weights that indicate the importance of each job. Kreipl uses an ILS algorithm with the RW acceptance criterion. The algorithm starts with an initial solution obtained by the shortest processing time rule [33]. The local search is based on a neighborhood which consists in reversing critical arcs and arcs adjacent to them, where a critical arc has to be an element of at least one critical path (there may exist several critical paths). One original aspect of this ILS is the perturbation step: Kreipl applies a few steps of a simulated annealing type algorithm with the Metropolis acceptance criterion [51] but with a fixed temperature T_c. For this perturbation phase a smaller neighborhood than the one used in the local search phase is taken: while in the local search phase any critical arc can be reversed, during the diversification phase only the critical arcs belonging to the critical path having the job with the highest impact on the objective function are considered. (It should be noted that the perturbation phase leads, in general, to some intermediate solution which is not locally optimal.) The number of iterations performed in the perturbation depends on how good the incumbent solution is: in promising regions, only a few steps are applied to stay near good solutions; otherwise, a “large” perturbation is applied to permit the algorithm to escape from this poor region. Computational results with the ILS algorithm on a set of benchmark instances have shown a very promising performance compared to an earlier shifting bottleneck heuristic [57] proposed for the same problem.
4.3 ILS for other problems

4.3.1 Graph bipartitioning

ILS algorithms have been proposed and tested on a number of other problems, though not as thoroughly as the ones we have discussed so far. We consider here the graph bipartitioning problem. Given a (weighted) graph and a bisection or partition of its vertices into two sets A and B of equal size, call the cut of the partition the sum of the weights of the edges connecting the two parts. The graph partitioning problem is to find the partition with the minimum cut. Martin and Otto [46, 47] introduced an ILS for this problem following their earlier work on the TSP. For the local search, they used the Kernighan-Lin variable depth local search algorithm (KL) [39], which is the analog for this problem of the LK algorithm. In effect, KL intelligently finds m vertices of one set to be exchanged with m of the other. Then, when considering possible perturbations, they noticed a particular weakness of the KL local search: KL frequently generates partitions with many “islands”, i.e., the two sets A and B are typically highly fragmented (disconnected). Thus they introduced perturbations that exchanged vertices between these islands rather than between the whole sets A and B. This works as follows: choose at random one of the cut edges, i.e., an edge connecting A and B. This edge connects two “seed” vertices, each belonging to its own island. Around each seed, iteratively grow a connected cluster of vertices within each island. When a target cluster size or a whole island size is reached, stop the growth. The two clusters are then exchanged, and this is the perturbation move. Finally, for the acceptance criterion, Martin and Otto used the Better acceptance criterion. The overall algorithm significantly improved over the embedded local search (random restart of KL); it also improved over simulated annealing if the acceptance criterion was optimized.
At the time of that work, simulated annealing was the state-of-the-art method for the graph bisection problem. Since then, many other metaheuristics [5, 50] have been developed for this problem, so the performance that must be reached is much higher now. Furthermore, given that the graph bipartitioning problem has a low cost-distance correlation [50], ILS has difficulty sampling all good low-cost solutions. To overcome this, some form of history dependence most certainly would have to be built into either the perturbation or the acceptance criterion.
4.3.2 MAX-SAT
Battiti and Protasi present an application of reactive search to the MAX-SAT problem [6]. Their algorithm consists of two phases: a local search phase and a diversification (perturbation) phase. Because of this, their approach fits perfectly into the ILS framework. Their perturbation is obtained by running a tabu search on the current local minimum so as to guarantee that the modified solution s' is sufficiently different from the current solution s*. Their measure of difference is just the Hamming distance; the minimum distance is set by the length of a tabu list that is adjusted during the run of the algorithm. For the LocalSearch, they use a standard greedy descent local search for the MAX-SAT problem. Depending on the distance between s*' and s*, the tabu list length for the perturbation phase is dynamically adjusted. The next perturbation phase is then started based on solution s*', corresponding to the RW acceptance criterion. This work illustrates very nicely how one can dynamically adjust the perturbation strength in an ILS run. We conjecture that similar schemes will prove useful for optimizing ILS algorithms in a nearly automatic way.
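Although Battiti and Protasi's exact update rules are more refined, the flavor of such a reactive adjustment can be conveyed in a few lines; the factors and bounds below are purely illustrative choices of ours.

    def adjust_tabu_length(tabu_len, distance, d_target, n_vars):
        # Lengthen the tabu list (i.e., strengthen the perturbation) when
        # the new local optimum is too close to the previous one, and
        # shorten it when the distance overshoots the target.
        if distance < d_target:
            tabu_len = min(int(tabu_len * 1.1) + 1, n_vars - 1)
        else:
            tabu_len = max(int(tabu_len * 0.9), 1)
        return tabu_len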
4.3.3 Prize-collecting Steiner tree problem
The last combinatorial optimization problem we discuss is the prize-collecting Steiner tree problem on graphs. Canuto, Resende and Ribeiro [13] presented several local search strategies for this problem: iterative improvement, multi-start with perturbations, path-relinking, variable neighborhood search, and an algorithm based on the integration of all these. They showed that all these strategies are effective in improving solutions; in fact, in many of their tests they found the optimal solution. One of their proposed heuristics, local search with perturbations, is in fact an ILS. In that approach, they first generated initial solutions by the primal-dual algorithm of Goemans and Williamson (GW) [29], but with a slightly modified cost function. Canuto et al. proposed two perturbation schemes: perturbation by eliminations and perturbation by prize changes. In the first scheme, the perturbation is done by resetting to zero the prizes of some persistent nodes which appeared in the solution built by GW and remained at the end of local search in the previous iteration. The perturbation by prize changes consists in the introduction of some noise into the node prizes. This feature of always applying the perturbation to the last solution obtained by the local search phase corresponds, in our notation, to the ILS-RW choice.
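For instance, the prize-change perturbation might be sketched as follows; the multiplicative noise model is our own illustrative choice, since the article only specifies that noise is introduced into the prizes.

    import random

    def perturb_prizes(prizes, noise=0.2):
        # Multiply each node prize by a random factor in
        # [1 - noise, 1 + noise] before rerunning GW and the local search.
        return {v: p * random.uniform(1 - noise, 1 + noise)
                for v, p in prizes.items()}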
4.4 Summary
The examples we have chosen in this section stress several points that have already been mentioned before. First, the choice of the local search algorithm is usually quite critical if one is to obtain peak performance. In most of the applications, the best performing ILS algorithms apply much more sophisticated local search algorithms than simple best- or first-improvement descent methods. Second, the other components of an ILS also need to be optimized if the state of the art is to be achieved. This optimization should be global, and to succeed it should involve the use of problem-specific properties. Examples of this last point were given for instance in the scheduling applications: there the good perturbations were not simply random moves; rather, they involved re-optimizations of significant parts of the instance (cf. the job shop case).

The final picture we reach is one where (i) ILS is a versatile metaheuristic which can easily be adapted to different combinatorial optimization problems; (ii) sophisticated perturbation schemes and search space exploration are the essential ingredients to achieve the best possible ILS performance.
5 Relation to other metaheuristics
In this section we highlight the similarities and differences between ILS and other well-known metaheuristics. We shall distinguish metaheuristics which are essentially variants of local search from those which generate solutions using a mechanism that is not necessarily based on an explicit neighborhood structure. Among the first class, which we call neighborhood based metaheuristics, are methods like simulated annealing (SA) [40, 15], tabu search (TS) [24, 25, 28] and guided local search (GLS) [64]. The second class comprises metaheuristics like GRASP [21], ant colony optimization (ACO) [20], evolutionary algorithms (EA) [3, 53], scatter search [27], variable neighborhood search (VNS) [31, 54] and ILS. Some metaheuristics of this second class, like EAs and ACO, do not necessarily make use of local search algorithms; however, a local search can be embedded in them, in which case the performance is usually enhanced. The other metaheuristics in this class explicitly use embedded local search algorithms as an essential part of their structure. For simplicity, we will assume in what follows that all the metaheuristics of this second class do incorporate local search algorithms. In this case, such metaheuristics generate iteratively input solutions that are passed to a local search; they can thus be interpreted as multi-start algorithms, using the most general meaning of that term. This is why we call them multi-start based metaheuristics.
5.1 Neighborhood based metaheuristics
Neighborhood based metaheuristics are extensions of iterative improvement algorithms which avoid getting stuck in locally optimal solutions by allowing moves to worse solutions in the local search. Different metaheuristics of this class differ mainly in their move strategies. In the case of simulated annealing, the neighborhood is sampled randomly and worse solutions are accepted with a probability which depends on a temperature parameter and the degree of deterioration incurred; better neighboring solutions are usually accepted, while much worse neighboring solutions are accepted only with a low probability. In the case of (simple) tabu search strategies, the neighborhood is explored in an aggressive way and cycles are avoided by declaring attributes of visited solutions as tabu. Finally, in the case of guided local search, the evaluation function is dynamically modified by penalizing certain solution components. This allows the search to escape from a solution that is a local optimum of the original objective function.
Obviously, any of these neighborhood based metaheuristics can be used as the LocalSearch algorithm of an ILS. In general, however, those algorithms do not halt, so it is necessary to limit their run time if they are to be embedded in ILS. One particular advantage of combining neighborhood based metaheuristics with ILS is that they often obtain much better solutions than iterative descent algorithms. But this advantage usually comes at the cost of larger computation times. Since these metaheuristics allow one to obtain better solutions at the expense of greater computation times, we are confronted with the following optimization problem when using them within an ILS: “For how long should one run the embedded search in order to achieve the best tradeoff between computation time and solution quality?” (This is not a particular problem of ILS; it arises for all multi-start type metaheuristics.) This is very analogous to the question of whether it is best to have a fast but not so good local search or a slower but more powerful one. The answer depends of course on the total amount of computation time available, and on just how the costs improve with time.
A different type of connection between ILS, SA and TS arises from certain similarities in the algorithms. For example, SA can be seen as an ILS without a local search phase (SA samples the original space S and not the reduced space S*) and where the acceptance criterion is LSMC(s*, s*', history). While SA does not employ memory, the use of memory is the main feature of TS, which makes strong use of historical information at multiple levels. Given its effectiveness, it can be expected that this kind of use of memory will become more frequent in future ILS applications. Furthermore, TS, as one prototype of a memory-intensive search procedure, can be a valuable source of inspiration for deriving ILS variants with a more direct memory usage. On the other hand, TS strategies may also be improved by features of ILS algorithms and by some insights gained from research on ILS.
5.2 Multi-start based metaheuristics
Multi-start based metaheuristics can be classified into constructive metaheuristics and perturbation-based metaheuristics.

Well-known examples of constructive metaheuristics are ant colony optimization and GRASP, which both use a probabilistic solution construction phase. An important difference between ACO and GRASP is that ACO has an indirect memory of the search process which is used to bias the construction process, whereas GRASP does not have that kind of memory. An obvious difference between ILS and constructive metaheuristics is that ILS never constructs solutions. However, both generate a sequence of solutions, and if the constructive metaheuristic uses an embedded local search, both go from one local minimum to another. So it might be said that the perturbation phase of an ILS is replaced by a (memory-dependent) construction phase in these constructive metaheuristics. But another connection can be made: ILS can be used instead of the embedded “local search” in an algorithm like ant colony optimization or GRASP. This is one way to generalize ILS, but it is not specific to these kinds of metaheuristics: whenever one has an embedded local search, one can try to replace it by an iterated local search.
Perturbation-based metaheuristics differ in the techniques they use to actually perturb solutions. Before going into details, let us introduce one additional feature for classifying metaheuristics: we will distinguish between population-based algorithms and those that use a single current solution (a population of size 1). For example, EAs, scatter search, and ant colony optimization are population-based, while ILS uses a single solution at each step. Whether or not a metaheuristic is population-based is important for the type of perturbation that can be applied. If no population is used, new solutions are generated by applying perturbations to single solutions; this is what happens for ILS and VNS. If a population is present, one can also use the possibility of recombining several solutions into a new one. Such combinations of solutions are implemented by “crossover” operators in EAs or in the recombination of multiple solutions in scatter search.

In general, population-based metaheuristics are more complex to use than those following a single solution: they require mechanisms to manage a population of solutions and, more importantly, it is necessary to find effective operators for the combination of solutions. Most often, this last task is a real challenge. The complexity of these population-based local search hybrid methods can be justified if they lead to better performance than non-population based methods. Therefore, one question of interest is whether using a population of solutions is really useful. Unfortunately, very few systematic comparisons of algorithms address that issue [19, 23, 37, 60, 63]. Clearly, for some problems such as the TSP with high cost-distance correlations, the use of a single element in the population leads to good results, so the advantage of population-based methods is small or nil. However, for other problems (with lower cost-distance correlations), it is clear that the use of a population is an appropriate way to achieve search space exploration. Thus population-based methods are desirable if their complexity is not overwhelming. Because of this, population-based extensions of ILS are promising approaches.
To date, several population-based extensions of ILS have been proposed [1, 34, 59]. The approaches proposed in [34, 59] keep the simplicity of ILS algorithms by leaving the perturbations unchanged: one parent is perturbed to give one child. But given that there is a population, the evolution depends on competition among its members and only the fittest survive. One can give up this simplicity, as was done in the approach of Applegate et al. [1]. Given the solutions in a population that have been generated by an ILS, they define a smaller instance by freezing the components that are in common in all parents. (They do this in the context of the TSP; the subtours that are in common are then fixed in the sub-problem.) They then reoptimize this smaller problem using ILS. This idea is tested in [1], and they find very high quality solutions to large TSP instances.
Finally, let us discuss variable neighborhood search (VNS), which is the metaheuristic closest to ILS. VNS begins by observing that the concept of local optimality is conditional on the neighborhood structure used in a local search. Then VNS systematizes the idea of changing the neighborhood during the search to avoid getting stuck in poor quality solutions. Several VNS variants have been proposed. The most widely used one, basic VNS, is in fact an ILS algorithm which typically uses the Better acceptance criterion and a systematic way of varying the perturbation strength. To do so, basic VNS orders its neighborhoods as N_1, ..., N_m, where the order is chosen according to the neighborhood size. Let k be a counter variable, k = 1, 2, ..., m, and initially set k = 1. If the perturbation and the subsequent local search result in a new best solution, then k is reset to 1; otherwise k is increased by one.
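In code, this schedule amounts to a one-line counter update around the usual perturb-and-improve step. The sketch below is our reading of basic VNS (wrapping k back to 1 after the largest neighborhood is our own assumption; shake(s, k) is a problem-specific routine that draws a random solution from N_k(s)).

    def basic_vns(s0, local_search, shake, cost, m, max_iter=1000):
        s = local_search(s0)
        k = 1
        for _ in range(max_iter):
            s_new = local_search(shake(s, k))  # perturb in N_k, then improve
            if cost(s_new) < cost(s):          # Better acceptance criterion
                s, k = s_new, 1                # improvement: smallest kick
            else:
                k = k + 1 if k < m else 1      # escalate the kick strength
        return s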
There are several extensions of this basic VNS scheme. In skewed VNS, the acceptance criterion is modified in such a way that worse solutions are accepted, taking into account their distance from the current solution. It is noteworthy that the ILS variants of VNS have obtained world-class performance on a variety of combinatorial optimization problems such as the p-Median problem [30] and clustering [32], as well as others [31].
A major difference between ILS and VNS is the philosophy underlying the two metaheuristics: ILS explicitly has the goal of building a walk in the set of locally optimal solutions, while VNS is based on changing neighborhoods during the search.
Clearly, there are major points in common between most of today's high performance metaheuristics. How can one summarize how iterated local search differs from the others? We shall proceed by enumeration, as the diversity of today's metaheuristics seems to forbid any simpler approach. When comparing to ACO and GRASP, we see that ILS uses perturbations to create new solutions; this is quite different in principle and in practice from using construction. When comparing to EAs and scatter search, we see that ILS as we defined it has a population size of 1; therefore no recombination operators need be defined. We could continue like this, but we cannot expect the boundaries between all metaheuristics to be so clear-cut. Not only are hybrid methods very often the way to go, but most often one can smoothly go from one metaheuristic to another. In addition, as mentioned at the beginning of this chapter, the distinction between heuristic and metaheuristic is rarely unambiguous. So our point of view is not that iterated local search has essential features that are absent in other metaheuristics; rather, when considering the basic structure of iterated local search, some simple yet powerful ideas transpire, and these can be of use in most metaheuristics, being close or not in spirit to iterated local search.
6 Conclusions
ILS has many of the desirable features of a metaheuristic: it is simple, easy to implement, robust, and highly effective. The essential idea of ILS lies in focusing the search not on the full space of solutions but on a smaller subspace defined by the solutions that are locally optimal for a given optimization engine. The success of ILS lies in the biased sampling of this set of local optima. How effective this approach turns out to be depends mainly on the choice of the local search, the perturbations, and the acceptance criterion. Interestingly, even when using the most naïve implementations of these parts, ILS can do much better than random restart. But with further work so that the different modules are well adapted to the problem at hand, ILS can often become a competitive or even state-of-the-art algorithm. This dichotomy is important because the optimization of the algorithm can be done progressively, and so ILS can be kept at any desired level of simplicity. This, plus the modular nature of iterated local search, leads to short development times and gives ILS an edge over more complex metaheuristics in the world of industrial applications. As an example of this, recall that ILS essentially treats the embedded heuristic as a black box; upgrading an ILS to take advantage of a new and better local search algorithm is then nearly immediate. Because of all these features, we believe that ILS is a promising and powerful algorithm to solve real complex problems in industry and services, in areas ranging from finance to production management and logistics. Finally, let us note that although the present review was given in the context of tackling combinatorial optimization problems, in reality much of what we covered can be extended in a straightforward manner to continuous optimization problems.
The ideas and results presented in this chapter leave many questions unanswered. Clearly, more work needs to be done to better understand the interplay between the ILS modules GenerateInitialSolution, Perturbation, LocalSearch, and AcceptanceCriterion. In particular, we expect significant improvements to arise through the intelligent use of memory, explicit intensification and diversification strategies, and greater problem-specific tuning. The exploration of these issues has barely begun but should lead to higher performance iterated local search algorithms.
Looking ahead towards future research directions, we expect ILS to be applied to new kinds of problems. Some challenging examples are: (i) problems where the constraints are very severe and so most metaheuristics fail; (ii) multi-objective problems, bringing one closer to real problems; (iii) dynamic or real-time problems where the problem data vary during the solution process.
References
[1] D. Applegate, R. Bixby, V. Chvátal, and W. Cook. Finding tours in the TSP. Preliminary version of a book chapter available via www.keck.caam.rice.edu/concorde.html, 2000.
[2] D. Applegate, W. Cook, and A. Rohe. Chained Lin-Kernighan for large traveling salesman problems. Technical Report No. 99887, Forschungsinstitut für Diskrete Mathematik, University of Bonn, Germany, 1999.
[3] T. Bäck. Evolutionary Algorithms in Theory and Practice. Oxford University Press, 1996.
[4] E. Balas and A. Vazacopoulos. Guided local search with shifting bottleneck
for job shop scheduling. Management Science , 44(2):262–275, 1998.
[5] R. Battiti and A. Bertossi. Greedy, prohibition, and reactive heuristics for
graph-partitioning. IEEE Transactions on Computers, 48(4):361–385, 1999.
[6] R. Battiti and M. Protasi. Reactive search, a history-based heuristic for MAX-SAT. ACM Journal of Experimental Algorithmics, 2, 1997.
[7] R. Battiti and G. Tecchiolli. The reactive tabu search. ORSA Journal on
Computing , 6(2):126–140, 1994.
[8] E. B. Baum. Iterated descent: A better algorithm for local search in combinatorial optimization problems. Technical report, Caltech, Pasadena, CA, 1986. Manuscript.
[9] E. B. Baum. Towards practical “neural” computation for combinatorial op-
timization problems. In J. Denker, editor, Neural Networks for Computing ,
pages 53–64, 1986. AIP conference proceedings.
[10] J. L. Bentley. Fast algorithms for geometric traveling salesman problems.
ORSA Journal on Computing, 4(4):387–411, 1992.
[11] P. Brucker, J. Hurink, and F. Werner. Improving local search heuristics for
some scheduling problems — part I. Discrete Applied Mathematics , 65(1–
3):97–122, 1996.
[12] P. Brucker, J. Hurink, and F. Werner. Improving local search heuristics for
some scheduling problems — part II. Discrete Applied Mathematics , 72(1–
2):47–69, 1997.
[13] S. A. Canuto, M. G. C. Resende, and C. C. Ribeiro. Local search with perturbations for the prize-collecting Steiner tree problem in graphs. Submitted to Networks, 2000.
[14] J. Carlier. The one-machine sequencing problem. European Journal of Op-
erational Research , 11:42–47, 1982.
[15] V. Černý. A thermodynamical approach to the traveling salesman problem. Journal of Optimization Theory and Applications, 45(1):41–51, 1985.
[16] N. Christofides. Worst-Case Analysis of a New Heuristic for the Travelling
Salesman Problem. Technical Report 388, Graduate School of Industrial Ad-ministration, Carnegie-Mellon University, Pittsburgh, PA,1976.
[17] B. Codenotti, G. Manzini, L. Margara, and G. Resta. Perturbation: An effi-
cient technique for the solution of very large instances of the Euclidean TSP.INFORMSJournal on Computing , 8:125–133, 1996.
[18] R. K. Congram, C. N. Potts, and S. L. Van de Velde. An iterated dynasearch
algorithm for the single-machine total weighted tardiness scheduling problem. INFORMS Journal on Computing, to appear, 2000.
[19] H. A. J. Crauwels, C. N. Potts, and L. N. Van Wassenhove. Local search heuristics for the single machine total weighted tardiness scheduling problem. INFORMS Journal on Computing, 10(3):341–350, 1998.
[20] M. Dorigo and G. Di Caro. The Ant Colony Optimization meta-heuristic.
In D. Corne, M. Dorigo, and F. Glover, editors, New Ideas in Optimization ,
pages 11–32. McGraw Hill, 1999.
[21] T. A. Feo and M. G. C. Resende. Greedy randomized adaptive search procedures. Journal of Global Optimization, 6:109–133, 1995.
[22] C. Fonlupt, D. Robilliard, P. Preux, and E.-G. Talbi. Fitness landscape and performance of meta-heuristics. In S. Voss, S. Martello, I. H. Osman, and C. Roucairol, editors, Meta-Heuristics: Advances and Trends in Local Search Paradigms for Optimization, pages 257–268. Kluwer Academic Publishers, Boston, MA, 1999.
[23] C. Glass and C. Potts. A comparison of local search methods for flow shop
scheduling. Annals of Operations Research , 63:489–509, 1996.
[24] F. Glover. Tabu Search – Part I. ORSA Journal on Computing, 1(3):190–206, 1989.
[25] F. Glover. Tabu Search – Part II. ORSA Journal on Computing , 2(1):4–32,
1990.
[26] F. Glover. Tabu thresholding: Improved search by nonmonotonic trajectories. ORSA Journal on Computing, 7(4):426–442, 1995.
[27] F. Glover. Scatter search and path relinking. In D. Corne, M. Dorigo, and F. Glover, editors, New Ideas in Optimization, pages 297–316. McGraw Hill, 1999.
[28] F. Glover and M. Laguna. Tabu Search. Kluwer Academic Publishers, Boston, MA, 1997.
[29] M. X. Goemans and D. P. Williamson. The primal dual method for ap-
proximation algorithms and its application to network design problems.In D. Hochbaum, editor, Approximation algorithms for NP-hard problems ,
pages 144–191. PWS Publishing, 1996.
[30] P. Hansen and N. Mladenović. Variable neighbourhood search for the p-Median. Location Science, 5:207–226, 1997.
[31] P. Hansen and N. Mladenović. An introduction to variable neighborhood search. In S. Voss, S. Martello, I. H. Osman, and C. Roucairol, editors, Meta-Heuristics: Advances and Trends in Local Search Paradigms for Optimization, pages 433–458. Kluwer Academic Publishers, Boston, MA, 1999.
[32] P. Hansen and N. Mladenović. J-MEANS: A new local search heuristic for minimum sum-of-squares clustering. Pattern Recognition, to appear, 2000.
[33] R. Haupt. A survey of priority rule-based scheduling. OR Spektrum , 11:3–6,
1989.
[34] I. Hong, A. B. Kahng, and B. R. Moon. Improved large-step Markov chain
variants for the symmetric TSP. Journal of Heuristics , 3(1):63–81, 1997.
[35] T. C. Hu, A. B. Kahng, and C.-W. A. Tsao. Old bachelor acceptance: A
new class of non-monotone threshold accepting methods. ORSA Journal on
Computing , 7(4):417–425, 1995.
[36] D. S. Johnson. Local optimization and the travelling salesman problem. In
Proceedings of the17th Colloquium on Automata, Languages, and Program-ming, volume 443 of LNCS,pages 446–461. Springer Verlag, Berlin, 1990.
[37] D. S. Johnson and L. A. McGeoch. The travelling salesman problem: A case study in local optimization. In E. H. L. Aarts and J. K. Lenstra, editors, Local Search in Combinatorial Optimization, pages 215–310. John Wiley & Sons, Chichester, England, 1997.
[38] K. Katayama and H. Narihisa. Iterated local search approach using genetic
transformation to the traveling salesman problem. In Proc. of GECCO’99 ,
volume 1, pages 321–328. Morgan Kaufmann, 1999.
[39] B. W. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs. Bell System Technical Journal, 49:213–219, 1970.
[40] S. Kirkpatrick, C. D. Gelatt Jr., and M. P. Vecchi. Optimization by simulated annealing. Science, 220:671–680, 1983.
[41] S. Kreipl. A large step random walk for minimizing total weighted tardiness
in a job shop. Journal of Scheduling , 3(3):125–138, 2000.
[42] S. Lin and B. W. Kernighan. An effective heuristic algorithm for the travelling salesman problem. Operations Research, 21:498–516, 1973.
[43] H. R. Lourenço. Job-shop scheduling: Computational study of local search and large-step optimization methods. European Journal of Operational Research, 83:347–364, 1995.
[44] H. R. Lourenço. A polynomial algorithm for a special case of the one-machine scheduling problem with time-lags. Technical Report Economic Working Papers Series, No. 339, Universitat Pompeu Fabra, 1998. Submitted to Journal of Scheduling.
[45] H. R. Lourenço and M. Zwijnenburg. Combining the large-step optimization with tabu-search: Application to the job-shop scheduling problem. In I. H. Osman and J. P. Kelly, editors, Meta-Heuristics: Theory & Applications, pages 219–236. Kluwer Academic Publishers, 1996.
[46] O. Martin and S. W. Otto. Partitioning of unstructured meshes for load balancing. Concurrency: Practice and Experience, 7:303–314, 1995.
[47] O. Martin and S. W. Otto. Combining simulated annealing with local search
heuristics. Annals of Operations Research , 63:57–75, 1996.
[48] O. Martin, S. W. Otto, and E. W. Felten. Large-step Markov chains for the
traveling salesman problem. Complex Systems , 5(3):299–326, 1991.
[49] O. Martin, S. W. Otto, and E. W. Felten. Large-step Markov chains for
the TSP incorporating local search heuristics. Operations Research Letters ,
11:219–224, 1992.
[50] P. Merz and B. Freisleben. Fitness landscapes, memetic algorithms and
greedy operators for graph bi-partitioning. Evolutionary Computation ,
8(1):61–91, 2000.
[51] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. Equation of state calculations for fast computing machines. Journal of Chemical Physics, 21:1087–1092, 1953.
[52] M. Mézard, G. Parisi, and M. A. Virasoro. Spin-Glass Theory and Beyond, volume 9 of Lecture Notes in Physics. World Scientific, Singapore, 1987.
[53] Z. Michalewicz and D. B. Fogel. How to Solve it: Modern Heuristics .
Springer Verlag, Berlin, 2000.
[54] N. Mladenović and P. Hansen. Variable neighborhood search. Computers & Operations Research, 24:1097–1100, 1997.
[55] M. Nawaz, E. Enscore Jr., and I. Ham. A Heuristic Algorithm for the m-Machine, n-Job Flow-Shop Sequencing Problem. OMEGA, 11(1):91–95, 1983.
[56] G. R. Schreiber and O. C. Martin. Cut size statistics of graph bisection heuristics. SIAM Journal on Optimization, 10(1):231–251, 1999.
[57] M. Singer and M. Pinedo. A shifting bottleneck heuristic for minimizing the total weighted tardiness in a job shop. IIE Scheduling and Logistics, 30:109–118, 1997.
[58] T. Stützle. Applying iterated local search to the permutation flow shop problem. Technical Report AIDA-98-04, FG Intellektik, TU Darmstadt, August 1998.
[59] T. Stützle. Local Search Algorithms for Combinatorial Problems — Analysis, Improvements, and New Applications. PhD thesis, Darmstadt University of Technology, Department of Computer Science, 1998.
[60] T. Stützle, A. Grün, S. Linke, and M. Rüttger. A comparison of nature inspired heuristics on the traveling salesman problem. In Deb et al., editors, Proc. of PPSN-VI, volume 1917 of LNCS, pages 661–670. Springer Verlag, Berlin, 2000.
[61] T. Stützle and H. H. Hoos. Analyzing the run-time behaviour of iterated local search for the TSP. Technical Report IRIDIA/2000-01, IRIDIA, Université Libre de Bruxelles, 2000. Available at http://www.intellektik.informatik.tu-darmstadt.de/˜tom/pub.html.
[62] É. D. Taillard. Comparison of iterative searches for the quadratic assignment problem. Location Science, 3:87–105, 1995.
[63] R. J. M. Vaessens, E. H. L. Aarts, and J. K. Lenstra. Job shop scheduling by
local search. INFORMS Journal on Computing, 8:302–317, 1996.
[64] C. Voudouris and E. Tsang. Guided Local Search. Technical Report CSM-247, Department of Computer Science, University of Essex, 1995.
[65] Y. Yang, S. Kreipl, and M. Pinedo. Heuristics for minimizing total weighted
tardiness in flexible flowshops. Journal of Scheduling , 3(2):89–108, 2000.