The Digital Labyrinth
Chapter 4
The Lie of the Land
What makes some fitness landscapes smooth and tractable while others are hellishly rugged and uncorrelated? Factors in the description of the problem domain – engineering, biology, economics or computing – can have a radical impact on the randomness and hence searchability of the underlying fitness landscape.1
Why are some fitness landscapes smooth with a small number of peaks of similar height? Other landscapes are rugged like the Aplps, with large numbers of high peaks separated by deep valleys? In the extreme case, landscapes can be totally random with neighbouring points having no obvious relationship. This chapter will explore the factors that shape the degree of ruggedness of a fitness landscape.
Let us look at a couple of hypothetical and very different landscapes.
Firstly consider a landscape containing only one peak, with smoothly sloping sides that glide away in all directions from the central peak. This is a ‘Mount Fujiyama’ landscape if you like. In such a landscape each point differs only slightly in fitness from its neighbours; points down the slope having slightly lower fitness and points upstream being slightly fitter.
In the landscape of Mount Fujiyama our simplest strategy, Hill Climbing, will always find the optimal solution – there are no local peaks on which we can get stranded. Wherever we start on the landscape, we just inspect our immediate neighbours and follow a path to the neighbour with the highest local fitness. Wherever we start, the gradient always points inexorably upwards and over time our meandering will always find the central peak of highest global fitness.
Such a landscape is called highly correlated – points with the most similar fitness ratings are located in the same proximity in the hyperspace of possibilities. Each point is close in fitness to its nearest neighbours. By always moving towards points of higher fitness we must move inexorably uphill to the optimally fit solution.
Now consider an alternative landscape where adjacent points can have wildly different fitness ratings. Such rugged landscapes are poorly correlated and are not realistically searchable using either hill climbing or simulated annealing strategies. Any movement will almost certainly strand us on a local peak of intermediate fitness. If we gradually increase the degree of ruggedness we move from a highly jagged landscape of sharp peaks with very steep sides to a totally random landscape.
In a random landscape the fitness values of neighbouring points vary totally at random – each point is an island to itself with vertical sides. From any starting point, a random landscape offers no clues or gradients to help us locate points of even modestly higher fitness – let alone the global fittest peak. A random landscape offers no smooth gradient to guide us in which way our next move should be. The highest peaks are randomly distributed and at large distances apart in ‘design space’.
Local Peaks Everywhere!
A key question for any landscape is how many local peaks are there and how close together are they?
Consider our generic landscape where each point has a fitness value determined by N parameters (or ‘genes’ if you like). Each parameter can take one of M possible values, but to keep things simple lets limit each parameter to one of two possible values: 0 and 1 say. In such a simple setup, a system of 5 parameters (N=5) might be represented by 25 possible candidate solutions. Each solution (or “cell”) is a 5-digit binary string giving the value of each of the N parameters – 11011, 11100, 10100 for example.
In our binary landscape, each cell is surrounded by N immediate neighbours, where each neighbour differs by a single change (flipping a ‘0’ to a ‘1’ for example) in one of the N parameter slots. Each neighbour will have a slightly different fitness value to our central cell. Therefore the chance that the central point is a local peak is simply 1/ (N+1). Given that the entire landscape contains 2N points, the number of local peaks is given by 2N/(N+1).
Consider some numbers with a simple example. Let there be N parameters, each of which can either be 0 or 1. Then there are 2N points in our fitness landscape and each point is surrounded by N neighbours. There are 2N/(N+1) local peaks and so even for a modest value of N=50 we have over 22 billion local peaks! Real world fitness landscapes are many orders of magnitude larger than this!
The next thing to consider is how far we must move across the landscape before we arrive at the next local peak. In the world of Mount Fujiyama we will always move the distance between our starting point and the central global peak. In highly rugged landscapes however there are a very large number of local peaks. These local peaks are close together and have similar and modest fitness values. From any starting point, any uphill path will quickly terminate on a nearby local peak - we can move only a short distance in design space before being trapped. The chance that the next local peak will be the global optimal peak will be infinitesimally small – only 1 in 2N/(N+1).
In general then, for almost any starting point on such a landscape, we will only ever explore a very tiny part of the entire landscape and never even get close to the global optimum fitness. The optimal solution we seek may lie either just on the other side of the next un-crossable valley or be vastly far away. It doesn’t matter since we can never escape our immediate neighbourhood.
For the naive hill climber the problem is even harder than it first appears. As one moves gradually uphill, there are fewer and fewer upward-leading paths and the myopic searcher must try an ever increasing number of blind alleys before the next upward step is found. This means that, since it requires some finite time and effort to try alternate paths, improvement becomes slower and slower, with each step delivering diminishing returns.
For random landscapes, the rate at which progress slows is determined by the number of possible values M for each cell. For each step upwards, the number of routes uphill available to us decreases by a factor of M. Thus for our simple binary landscape (M=2), then the number of possible routes upward halves after each step. At each step we will have to try twice (or M times in general) as many routes as before, so that after the J th step we will have to try MJ routes! Thus for our simple binary landscape, after only 10 steps we will need to test 1024 routes.
What does this mean in practice? In rugged fitness landscapes we see an initial rapid increase in fitness as we explore the starting neighbourhood, but this increase slows exponentially while also taking exponentially longer for each succeeding step. Simple hill climbing strategies can take a long time to achieve modest improvements in fitness and will generally become isolated near to our starting point on a low peak of poor fitness. It becomes absurd to think of searching such spaces using our simple gradient-ascent based strategies: Hill Climbing and Simulated Annealing.
The ruggedness, or degree of correlation, of the fitness landscape has a direct and profound impact upon the difficulty of design. A perfectly correlated landscape (Mount Fujiyama) offers an obvious and continuous path towards an optimal design that satisfies all our constraints and performance criteria. If our domain results in a highly rugged fitness landscape, then it will be VASTLY difficult to search the space for the best designs. Any slight variation of a solution in such a landscape is likely to plunge us down a ravine of poorer fitness. In other worlds, solutions will be highly brittle and lacking in any ability to adapt smoothly to changes in the domain (and hence the underlying fitness landscape).
Does this sound familiar? The fitness landscape of computer programs – the Library of Turing – is indeed a rugged and uncharted landscape. Taking a single step in almost any direction (corresponding perhaps to changing a single bit in a computer program) is likely to plunge us into a deep chasm of malfunction. Computer programs are fragile and brittle creatures that, if they work at all, balance precariously on their (local?) peaks of fitness, courting disaster all around.
The Cost of Compromise
So what determines whether a fitness landscape for some domain is smooth or rugged? What is it about some areas of optimisation that makes them solvable in finite time and resources and what makes others highly expensive, brittle and possibly intractable?
Most problems in design are about finding the optimal settings for the parameters that affect the design and determine its fitness rating. If our design parameters are completely independent then we can optimise each individually without affecting the fitness contribution of the other parameters. We simply assign the most optimal value to each parameter and there will emerge a single obvious combination which is the single globally fittest solution. With independent parameters therefore we have a single-peaked Mount Fujiyama landscape. Such landscapes are easy to search and the fittest solution is readily found.
In most real world situations however, our design parameters are not independent. Changing the value of one parameter will have some effect on one or more other parameters. Our parameters are said to be cross-coupled, and each can have a positive or negative effect on the fitness contributions of linked parameters. For example, if we optimise for cost then speed might be compromised. If we optimise for power then weight, cost and fuel consumption become compromised.
In genetics this phenomenon is well-known and is called epistatis. Genes do not act independently but operate within a complex web of interactions with other genes - one gene will have some effect, positive or negative, on how well other genes are able to express themselves. These genes in turn affect others in a complex web of positive and negative feedback.
The NK Model
How can we model this cross-coupling and investigate how this affects the shape of the resulting fitness landscape? An original and very powerful idea for this is the NK Model, devised by Stuart Kauffman at the Santa Fe Institute – a leading authority on the study and chaos and complexity.
The fitness contribution of each of the N parameters might be affected by input from K other parameters selected at random. Each of the K inputs to each parameter affects its fitness contribution by some randomly assigned weighting, positive or negative. In real systems, particularly those involving genetic epistatic couplings, the parameters interact in highly complex and unpredictable ways that are difficult to quantise. Even if we admit that we cannot model the epistatis exactly, it turns out that randomising the degree that epistatic coupling has on the fitness contributions of coupled parameters yields a useful analytical model.
As each parameter tries to optimise its own fitness contribution, it may adversely or positively affect the fitness contributions of parameters with which it has linkage. This cross-coupling of K inputs for each of the N parameters creates a web of conflicting constraints where none of the parameters can achieve its optimal fitness contribution in isolation. To see this simply, consider two parameters A and B. If neither are coupled to any other parameters then they are free to take their most optimum values, thus maximising the fitness of the design as a whole. However, if A and B are linked to another parameter, C say, then the value of A is modified by the value of C and likewise for B. Typically though the value of C that increases the fitness of A might decrease the fitness of B, and vice versa. Neither A nor B can achieve the same level of fitness contribution that they can without this epistatic coupling.
Suppose for simplicity our N parameters are binary and can each only be either 0 or 1. This is analogous to a simple genome where each gene can be one of two possible alternatives, called alleles. Suppose we consider the model [N=3, K=2], then we have 8 possible combinations of the3 parameters, each of which is coupled to 2 others.
What happens to the fitness landscape as we vary K between zero and the maximum value N-1?
Consider the case where K=0. Now each parameter is totally independent as it is not coupled to any other parameter. Each parameter is free to take any of its allowed values and the fitness of the design (organism if you like) is the normalised sum of the fitness contributions from each parameter. There must necessarily be one special case where all of the parameters are optimal and we have a point of greatest fitness – the global peak in our fitness landscape. Any other point on the landscape will have lower fitness but can move to a point of higher fitness by changing the value of just one of its parameters. Where our parameters are essentially discrete values (i.e. digital in nature as are genes) this corresponds to a single point mutation. Thus any point in the landscape lies on a route that leads to the global maximum by a series of single-parameter mutations. There can be no local minima in such a landscape since all points of lower fitness are linked to points of higher fitness. So for K=0 we have the simple single-peaked perfectly correlated landscape of Mount Fujiyama. Such a landscape is said to be highly correlated and smooth – all neighbouring points have only a slight variation in fitness rating. The fitness value of any one point tells me a lot about the fitness of my immediate neighbours. Furthermore, my neighbours provide a clear and unequivocal clue to the path I must take to globalise my overall fitness.
At the opposite extreme, suppose that every parameter is linked to every other parameter – the worst case where K is N-1. Any slight variation in the value of a single parameter will ripple throughout the other parameters, resulting in randomised changes of fitness contribution in all the other parameters. The overall result is the maximally random fitness landscape where there is a vast number of local peaks of poor average fitness. In such a landscape we can only explore an infinitesimal region of the whole universe of possibilities before being trapped on some lowly local peak.
The value of K, as a fraction of the total number of parameters N, becomes a critical control for the degree of ruggedness of the landscape. K is essentially a measure of how cross-coupled the system is. As the value of K is increased from 0 to its maximum value N-1, the landscape becomes progressively more rugged. The number of conflicting constraints increases with K so that there emerge a large number of low-fitness compromise solutions. As K increases the fitness values of the immediate neighbours of every point in our fitness landscape begin to differ to a larger and larger degree. Finally, at the maximally connected value, K = N-1, neighbouring points are totally randomised and have no correlation (or gradient) at all. Adaptation and search across such fitness landscapes becomes impossible using the simple peak-seeking strategies we have covered so far. The high peaks become lower in fitness and move further apart and the number of low fitness local peaks rapidly becomes hyper astronomical.
As K increases, not only does the number of peaks increase exponentially, upward paths through the landscape become progressively more difficult to find. Consequently, the time taken to make each improvement step becomes exponentially longer as we have an ever greater number of dead-ends to explore and discard. The value of K affects the ease with which a system can explore the landscape and adapt to find a good design solution. With high values of K even a small change in the value of one parameter can have large effects on parameters throughout the system.