The Digital Labyrinth

Chapter 3

The Library Of Babel

What actually is design, in the most abstract terms? We explore the concept of search spaces and fitness landscapes. How large is the space of possible designs? How can we best explore these spaces to find optimal solutions in engineering, architecture, computing, medicine, chemistry and, ultimately, life itself?

“By this art you may contemplate the variations of the 23 letters…”
The Anatomy of Melancholy, Robert Burton, 1621

The process of design is one of winnowing the wheat from the chaff. How do we find the best solution from a host of competing possibilities? Is there a single optimal solution or are there many equally valid alternatives – or none at all? How large is the number of possible solutions that we must explore? Can we just examine each in turn to find the solution that best fits the object of our design – whether in function, performance, ergonomics, thermodynamics or even aesthetics? Is blind search ever a remotely practical strategy or does it become rapidly and forever useless once we move beyond toy problems?  In this chapter we examine just how large search spaces become once we begin to quantify the parameters that characterise our domain of design.  

In this chapter the capitalised word ‘Design’ means the abstract process of design itself rather than any specific product of Design – a lower-case ‘design’.

Whatever the field of application, the nature of Design is characterised by a given set of parameters, each of which can take a range of possible values. These parameters may be direct physical correlates of some property of the design – height, width, angle or density, for example. More generally, the design parameters relate to more abstract or complex properties and behaviours – much in the way that the specific coding of a gene has a deterministic but indirect effect on the physical shape of the resulting protein.

The ‘best’ design is the one that has the optimal combination of values for the subject parameters. The definition of ‘best’ depends wholly upon the application domain of the Design in question. For example, it may be the wing shape with the greatest lift and least drag. Alternatively, the ‘best’ design for a girder may be the one with the highest load-bearing strength for the least cross-section. One can imagine a host of possible designs for the simple paper-clip, yet only a very few deliver its functional purpose while using the least amount of wire and being suitably sized and rounded for human fingers. To borrow a clichéd yet powerful metaphor from nature, we can replace the concept of ‘best’ with the ‘fittest’ design.

For example, we might be designing a new turbine blade for a jet engine. We can characterise the blade in terms of its profile, angle of attack, thickness, centre of gravity, taper, length and other similar physical properties. These are the parameters of our candidate designs. We also need to be clear about the desired properties we seek in the successful design. Perhaps the best blade will deliver the greatest airflow with the least drag and turbulence, and thus also be the most energy-efficient. Other possible criteria might be the quantity of expensive material used (titanium, perhaps) or the cost and difficulty of manufacture. Given our criteria for success, we can propose a host of possible turbine blades, each with different values for the parameters, and then test the performance of each (in a wind tunnel, perhaps, or in a computer simulation). How well does each candidate blade meet our success criteria? Some blades will be better, or ‘fitter’, than others – the candidate blades forming a set ordered by performance.

Consider the space of possible parameter settings. This is the space of possible designs that must be explored to find an optimal solution. Simplistically, if we have N parameters which can each take one of M possible discrete values, then we have a total space of M^N (M raised to the power N) possible solutions. In real-world design, of course, our parameters are usually continuous variables rather than discrete values.

For kindergarten domains defined by perhaps 5 parameters, each with say 4 possible values, we have a small and easily searchable space of only 4^5 (i.e. 1,024) possible designs. However, real-world search spaces, even for seemingly trivial examples, rapidly become astronomically vast. Although many magnitudes larger, such search spaces are analogous to the famous parable of the chessboard with 1 grain of rice on the first square, two on the next, then four, and so on. The last square of the chessboard will hold 2^63 grains of rice – and the board as a whole 2^64 − 1 grains, far more rice than has ever been harvested in human history.

Here is another counter-intuitive example. Take a sheet of paper and fold it in two upon itself. Then fold this in two again and repeat the process, each time folding the paper back on itself to double the thickness. Suppose that a sheet of paper is 1 mm thick; then after ten folds our folded pile is 2^10 mm high – 1,024 mm, or just over a metre. This seems about right. But if we continue to fold, doubling each time, after a mere 64 folds our pile is 2^64 mm high – roughly 1.8 × 10^19 mm, or nearly 20 trillion kilometres. This is vastly larger than our entire solar system and around half the distance to the nearest star! It seems incredible but is irrefutable given the nature of an exponential geometric progression.
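The arithmetic of both doubling examples can be checked in a few lines (a minimal sketch in Python):

```python
# Chessboard parable: 1 grain on square 1, doubling on each square.
last_square = 2 ** 63                     # grains on the 64th square
total = 2 ** 64 - 1                       # grains over the whole board
print(f"last square: {last_square:,} grains")

# Paper folding: a 1 mm sheet doubled 64 times.
height_mm = 2 ** 64
height_km = height_mm / 1e6               # 1 km = 10^6 mm
light_year_km = 9.46e12                   # approximate length of a light year
print(f"pile height: {height_km:.2e} km, "
      f"about {height_km / light_year_km:.2f} light years")
```

Sixty-four doublings take a millimetre-thick sheet to roughly two light years – about half the distance to the nearest star.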

The Library of Babel

Let us explore another, far more interesting, thought experiment which demonstrates how quickly problems spiral exponentially out of control. The problem of Design is concerned with hyper-astronomically VAST spaces which must somehow be explored in finite time.

The Argentinean writer Jorge Luis Borges created a wonderful thought experiment in his short story The Library of Babel, part of the anthology Labyrinths (1962). Imagine a library, he said, containing all the possible books that have been and could ever be written. Borges described the library as a vast and unknown number of hexagonal chambers lined with bookshelves. Looking over the railings around the bookcases, one sees vertical air shafts that appear to soar both upwards and downwards as far as the eye can see. Each hexagonal chamber is connected through doorways to six adjacent hexagonal book-lined chambers. The entire library is therefore a vast yet finite honeycomb of almost uncountable chambers, each containing many billions of books.

The Library of Babel

In his story, Borges describes the plight of the human inhabitants, where the library is their entire universe. They are each born in some random chamber and must spend their lifetimes endlessly searching the Library for some specific book, only to die fruitlessly a few leagues from the hexagon where they were born. They are then tossed over the railing around the bookshelf, to fall endlessly down the almost infinitely long air shaft. A bleak tale indeed!

First: how many possible books are there?  Limit each book to 500 pages, each containing a unique but random combination of letters, numbers and punctuation symbols (including space).  How large would such a library have to be to accommodate this finite and, at first sight, searchable collection? 

Suppose each book has 500 pages, each page containing 40 lines of 50 characters, and suppose each character can be one of, say, 100 symbols (to include the upper and lower cases of the Latin alphabet, the numerals and all of the punctuation symbols). The total number of possible books is then 100 raised to the power 500 × 40 × 50, i.e. 100^1,000,000 (which is 10^2,000,000). This number is beyond any capacity to grasp its VAST size. In comparison, the number of elementary particles in the entire visible universe is about 100^40 – a number infinitesimally insignificant compared to 100^1,000,000. The Library of Babel would therefore be VASTLY bigger than quintillions of entire universes, and then some!
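Because the numbers themselves are far too large to write out, it is easiest to compare them by their exponents alone (a sketch in Python):

```python
# Library of Babel size, compared by powers of ten.
chars_per_book = 500 * 40 * 50        # 1,000,000 characters per book
library_exp = 2 * chars_per_book      # 100^1,000,000 = 10^2,000,000
particles_exp = 2 * 40                # 100^40 = 10^80 particles in the universe

print(f"books in the library: 10^{library_exp:,}")
print(f"particles in the visible universe: ~10^{particles_exp}")
print(f"ratio: 10^{library_exp - particles_exp:,}")
```

Even dividing the library among every particle in the visible universe leaves each particle with 10^1,999,920 books.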

A view down an airshaft in the Library of Babel

Although we have arbitrarily limited our books to 500 pages, the Library of Babel still contains ALL possible books. Longer works can simply be split into 500-page volumes that fit easily into the library. The library must then contain the sum of human knowledge, art, religion, literature, philosophy and scientific theory, no less.

What would the books in the library look like? 

The first book might have entirely blank pages. The second might contain only a single letter ‘A’ as the first character. The next might contain only a single ‘A’, but in the second character position. And so on, and so on. Most books in the library will be nothing but gibberish. Billions upon billions of books will contain the odd recognisable word or phrase amongst the gibberish. However, it is a certainty that among the books there MUST be perfectly rendered copies of classics such as Anna Karenina. There must be one perfect copy of Anna Karenina, as well as copies that contain a single-letter typo, then two typos, and so on. Consider the astonishing fact that there must be some 99,000,000 books that differ from the canonical Anna Karenina by a single typo – one for each of the 99 alternative symbols in each of the 1,000,000 character positions. Even copies containing 1,000 or so typos will still be easily recognisable and perfectly readable as Anna Karenina. The Library also contains perfect and error-strewn translations of Anna Karenina into French, German, Italian, Spanish and every other language. The books recognisable as Anna Karenina would by themselves occupy whole galaxies of space, requiring many centuries to explore even at the speed of light!
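The typo-counting above is easy to verify directly, under the chapter's assumptions of 1,000,000 characters per book and 100 possible symbols:

```python
from math import comb

chars = 1_000_000                          # characters per book
symbols = 100                              # possible symbols per character

# One typo: choose a position, then one of the 99 wrong symbols for it.
one_typo = chars * (symbols - 1)
print(f"{one_typo:,} one-typo variants")

# Two typos: choose two positions, then a wrong symbol for each.
two_typos = comb(chars, 2) * (symbols - 1) ** 2
print(f"{two_typos:,} two-typo variants")
```

The counts explode combinatorially with each additional typo, which is why the recognisable copies alone fill 'galaxies' of shelf space.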

It is also astonishing that amongst the books there must be the most perfect and accurate biography of your life, from birth up to the moment of your death. There will also, of course, be perfectly accurate biographies of every person who is living, has ever lived or, indeed, will ever live! Alongside the accurate biographies there will be uncountable spoof versions in which you become Superman, or invent a time machine and return to murder Caesar on his way to the Senate.

How would one even begin to search such a library for a specific volume? Do our normal pragmatic notions of search have any remote application in such an inconceivably vast hyperspace? In his story, Borges made things awkward by placing the books at random on the bookshelves. But suppose that in our version of the library the books are ordered alphabetically. Let us initially group together all the books starting with ‘A’. Within this group we can create a subgroup whose second character is ‘A’. Repeating this for each of our possible 100 characters, we get 100 subgroups. Each of these subgroups contains a further 100 subgroups, which each in turn contain 100 subgroups, and so on. If each subgroup represents a logical ‘shelf’ of our library, and each shelf contains 100 other shelves at right-angles to it, then physically we would require 1,000,000 dimensions to represent our whole library!

Suppose we find ourselves at the bookshelf containing the perfect canonical copy of Anna Karenina. Which direction should we move through the library, shelf by innumerable shelf, book by book, to find the perfect copy of, say, David Copperfield? Our path through the library between these two books will be some gazillions of light years long but at each step, each decision point, there is no clue as to which direction to take through our 1,000,000-dimensional hyperspace.

The Library of Mendel 

The Library of Babel is actually an all-encompassing superset which contains some  important and interesting subsidiary libraries. The Library of Mendel is a fascinating library containing all possible genomes. Each genome in the library is a set of genes that together encode for the physical body – the phenotype – of some logically possible creature. 

 The cells of living organisms each contain long strips of DNA, assembled into the famous double helix structure discovered by Watson and Crick in 1953. The DNA strands are grouped together into separate chromosomes, where each strip of DNA is divided into separate genes of varying lengths.  The purpose of each gene is to encode for a particular protein molecule, to be synthesised by the cell as part of its chemical metabolism. 

Each protein is assembled from a specific sequence of its building blocks – amino acids. Each gene specifies a particular protein by encoding the exact sequence of amino acids using a digital code expressed in terms of four simple chemicals: the four nucleotide bases A, G, C and T (Adenine, Guanine, Cytosine and Thymine). Since there are twenty different amino acids that must be encoded, two bases are not enough (4^2 = 16), so three are needed (4^3 = 64). Bases in the gene sequence are therefore grouped into codons – sets of three, each of which encodes one of the twenty possible amino acids. Since three bases can encode up to 64 values and we only need 20 (plus ‘stop’ signals), there is a lot of redundancy in this coding scheme. While containing well-defined genes, each strand of DNA also includes long sequences that do not encode useful proteins. Once called ‘junk DNA’, these non-coding sequences actually play a vital part in the machinery of cell division and chromosome organisation.
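The codon arithmetic can be verified by enumerating every ordered triple of the four bases (a sketch in Python):

```python
from itertools import product

bases = "AGCT"

# Two bases per codon would give only 16 combinations - too few for 20 amino acids.
pairs = ["".join(p) for p in product(bases, repeat=2)]
print(len(pairs))     # 16

# Three bases give 64 distinct codons - more than enough, hence the redundancy.
codons = ["".join(c) for c in product(bases, repeat=3)]
print(len(codons))    # 64
```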

So how large is the Library of Mendel? The human genome contains some 3 billion base pairs, each one of the 4 bases A, G, C and T. Thus there can be 4^3,000,000,000 possible genomes of that length in the library. Each genome will typically span many ‘books’ in the Library of Babel but this, as we have said, is not an issue logically.

One can think of the Library of Mendel as the space of all possible creatures that can be encoded as finite strings of DNA. Like the books in the Library of Babel, the vast majority of these genomes are gibberish, describing creatures that have not and could never live – physically impossible or grotesque monsters. More accurately, the gibberish genomes describe phenotypes that could never exist as viable organisms. The Library of Mendel must, however, also contain descriptions of real creatures: plants, bacteria, viruses and animals. Each will be surrounded in ‘Mendel space’ by closely related cousins – ancestors, related species or even future descendants. The genomes of real extinct creatures such as T. rex and the woolly mammoth exist in the Library of Mendel, as do fantastic but logically possible creations such as Pegasus and the Medusa. Fascinatingly, the Library also contains species of humanity yet to evolve, a million years in the future.

The Library of Turing 

As another example, consider the set of possible Turing Machines. Each Turing Machine is characterised by its unique rule table. Just as Turing proposed for his Universal Turing Machine, we can devise some arbitrary coding scheme by which each rule table can be encoded as a finite string of alphanumeric symbols. Thus each possible Turing Machine can be found within the Library of Babel. The Library of Turing then contains all possible algorithms that can be finitely defined in terms of a Turing Machine (i.e. the computable functions). The art of computer programming can then be seen as an exploration of the gazillions of bookshelves in the Library of Turing, seeking the best computer program for our specific application.

The Library of Turing is a remarkably brittle place, where a single character variation in a particular Turing Machine will most likely render it useless and unable to perform its computation. In contrast, books in the Library of Babel are extremely tolerant of typographic errors. Recall how our copies of Anna Karenina were all perfectly recognisable and readable even with many typos per page. Natural languages such as English are expressed in coding schemes with high levels of redundancy. Computer programs, by contrast, are close to maximally compressed and are intolerant of even single-character changes.

These theoretical libraries are each repositories of encoded information that must be interpreted through some medium or mechanism in order to have some effect – whether physical or mental – in the real world.  Minds are able to interpret the natural language text stored in books to form mental concepts – literary experience. Genomes are translated into physical phenotypes by the cellular processes of protein transcription and assembly within cells. Turing Machine rule tables are interpreted by a Universal Turing Machine into actual Turing Machines to implement a specific algorithm.

The correspondence between the three libraries can be summarised as follows:

                       Computing Substrate         Genotype      Expression          Phenotype
  Library of Babel     Human Mind                  Text          Reading             Literary Experience
  Library of Mendel    Ribosome                    DNA           RNA Transcription   Protein
  Library of Turing    Universal Turing Machine    Rule Table    Execution           Algorithm

The libraries of Babel, Mendel and Turing are particular examples of vast search spaces devoid of signposts and direction signs to guide our exploration. Starting at any shelf in the library, in what direction must we travel to find a specific entry (book, genome or Turing Machine)? Given the hyper-astronomical spaces involved, there is simply not enough time to search them exhaustively before the heat death of the universe. How can we find our proverbial needle in a quadrillion quadrillion universes of hay? This is the topic of the next section.

A Fitness Landscape

Each location in our logical search space, such as the Library of Babel, may be characterised by some measure of performance or ‘fitness’. For example, in the Library of Babel we may rate each book with a score depending upon how many errors it contains from the perfect canonical version of Anna Karenina.  In biology the ‘fitness’ may be a measure of how well an organism (phenotype) is adapted to survive and reproduce given the current environment. In engineering we may measure fitness in terms of cost, speed, performance, throughput or, more typically, some combination.

If our search space had only two degrees of freedom (rather than the millions of real-world examples), we could draw a two-dimensional surface divided into cells, one cell per permutation of our two parameters, and then plot each cell’s fitness score vertically. The result, in three dimensions, would be a landscape of hills and valleys, peaks, troughs and plateaus.

Fitness landscapes can be visualised as ranges of mountains. Each ‘cell’ in the landscape represents a possible candidate design in the search space. At each peak in the landscape, neighbouring cells have lower fitness and all paths therefore lead downwards. In the valleys, most paths point upwards towards points of higher fitness. A peak may represent the highest fitness in its neighbourhood (a local peak), while the best possible fitness is represented by a single special peak of greatest fitness – the global peak.

For the moment let us assume that our fitness landscape has two important properties (not necessarily true as we shall see later):

  1. Neighbouring cells in the landscape have similar fitness values. Each cell is surrounded by a neighbourhood of cells whose fitness varies by some small degree, and therefore there is a clear gradient (slope up or down) to the landscape in any direction. Such landscapes are called ‘correlated’, and the degree of correlation can range from very high (a smooth, gently rolling landscape) to almost none (a jagged and randomised landscape).

  2. The topology of the fitness landscape is static in time.  Once defined by the starting conditions (design variables) the hills and valleys of the landscape do not move.  From any given starting point, we always have the same opportunities and gradients to explore in searching the landscape.

Some Hill Climbing Strategies

How are we to explore our fitness landscape from any starting point and move to a point of optimal fitness?   

As a starting point, note that we have no ‘God’s eye view’  of the fitness landscape – we are bound like ants to its surface. All we can ‘see’ are our nearest neighbours clustered around us. Given an all-encompassing  viewpoint, we could easily see the lie of the land, spy out the best path to the highest peak  and reach the optimal solution. Given our somewhat myopic ‘ant’s eye view’, are there strategies we can use to find higher points of fitness in the landscape using only the clues provided by our immediate neighbours?

 The simplest strategy is called Hill Climbing: move to the cell with the highest fitness among your immediate neighbours – provided it is fitter than your current cell. In other words, move up if you can but never down. If you are fitter than all of your neighbours, stay put. Repeating this strategy, you migrate inexorably uphill to points of higher and higher fitness until you reach a point where all your neighbours have lower fitness. You have reached a peak in the fitness landscape and can get no higher.

Usually, however, this peak will be only a local peak in some region of the landscape. It may have merely mediocre fitness, far inferior to other local peaks and certainly poorer than the global peak. Peaks of much higher fitness may exist relatively close by but remain forever beyond us, across an uncrossable valley. Thus simple hill climbing will usually strand us on poorly rated local peaks when we could be finding far better positions.
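The hill-climbing strategy can be sketched in a few lines. The two-parameter fitness function here is purely illustrative – a single smooth peak, so this climber always succeeds; on a rugged landscape the same loop would strand itself on the first local peak it met:

```python
# A minimal hill-climbing sketch on a toy two-parameter landscape.
def fitness(x, y):
    return -(x - 3) ** 2 - (y - 7) ** 2       # one peak, at (3, 7)

def hill_climb(x, y):
    while True:
        # the "ant's eye view": inspect only the four immediate neighbours
        neighbours = [(x + dx, y + dy)
                      for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]
        best = max(neighbours, key=lambda p: fitness(*p))
        if fitness(*best) <= fitness(x, y):
            return x, y                       # a peak: no neighbour is fitter
        x, y = best                           # step uphill, never down

print(hill_climb(0, 0))                       # climbs to the peak at (3, 7)
```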

In nature, how do organisms use hill climbing to explore and adapt on fitness landscapes? The physical bodies of living organisms – phenotypes – are determined by their unique set of genes, packaged together into separate chromosomes within each cell. Setting aside species that reproduce sexually, primitive organisms such as bacteria (the prokaryotes) reproduce by simple cell division – binary fission – giving rise to a new daughter cell genetically identical to the parent.

If this process were perfect, the organism would increase in number exponentially but would remain at the same point in the fitness landscape, its genotype unchanged over time. Assuming the underlying fitness landscape also remains unchanged (which is not remotely the case normally), there is no driver for adaptation and evolution, since all genetic variation has ceased.

 However, cell division can give rise to slight variants in genetic makeup due to two factors: random mutation and copying errors. Electromagnetic radiation or chemical factors can occasionally cause damage to the DNA molecules that constitute the cell’s genes. Some mutations affect just a few nucleotide bases – these are single gene mutations – while others affect the structure of one or more chromosomes - these are chromosome structure mutations. In single gene mutations, a random change in one or more nucleotides in a gene will encode for a different amino acid. Mutation can also result in gibberish that does not encode any valid amino acid. Either no protein can then be synthesised by the altered gene or a different protein is assembled. The same effect can arise because the cellular machinery for mitosis may occasionally mis-copy a base nucleotide within a gene, giving the same effect as a random mutation. Chromosome structure mutations result from errors in cell division that cause a section of a chromosome to break off, be duplicated or move onto another chromosome.

Most mutations are deleterious and often lethal to the organism and are not passed on to succeeding generations. Some mutations though may have either a neutral or an advantageous effect upon the fitness of the offspring. For example, a new protein caused by a point mutation may prove a more efficient catalyst (enzyme) for an important biochemical pathway. In such cases we have a viable new organism with a slightly altered genome occupying a new position on the fitness landscape.  Mutations with a lower fitness score will normally be outnumbered over time by variants of higher fitness since reproductive success is a primary measure of fitness.  Thus, over time, our organism flows over the landscape to points of successively higher fitness until becoming stranded on a local peak.  

The rate of mutation is an important factor in determining how well an organism can migrate across the landscape. Too low a mutation rate and too few new variants arise, so the population tends to become stranded. Too high a rate and the organism will be shaken off any peak it finds, its advantageous mutations swamped by deleterious ones.

Hill climbing is a good start as a search strategy but only really works on very smooth and gently sloping landscapes.  Are there better strategies we might use?

Simulated Annealing

A strategy that can be much better than simple hill climbing comes from a technique used in metal-working called annealing. A smith repeatedly heats a piece of iron, works it, and then allows it to cool. Over multiple cycles of heating and gradual cooling, the internal structure of the iron settles into a stronger configuration.

How does this work? As the iron is heated, its molecules become randomised by thermal motion. As the iron cools, its molecules group themselves into crystals of various shapes and sizes. On each heating cycle the iron is raised to a slightly cooler temperature, so the molecular crystals are disrupted less and less. Over time the crystals settle into more and more optimal configurations, where their size and shape allow them to pack together into the strongest molecular structure. After many cycles the crystals are larger and have fewer defects, giving the iron greater strength.

In computer science this technique has been adopted into an algorithm called Simulated Annealing or SA. This algorithm can be used to explore a fitness landscape just like simple hill climbing but has the advantage of being able to escape from local peaks and find much better peaks of fitness (but not necessarily the global peak.)  Simulated annealing is still one of the most powerful numerical techniques used for hard optimisation problems. SA is highly applicable in a computer to some forms of hard problem but, as we shall see, is rarely applicable in the real world because of the dreadful penalties and pains one would endure in trying to live one’s life using simulated annealing!

How does simulated annealing work?

Starting from any random point in the fitness landscape, we examine the fitness of our immediate neighbours. If we were simple hill climbing we would always move to the neighbour with the highest fitness and then repeat the process. In simulated annealing we usually move to the fittest neighbour but, with a certain probability, make a chaotic jump and choose at random a neighbour with lower fitness. Thus, while in general we are climbing upwards, we also explore less fit regions which might uncover new upward paths to even higher peaks. Just as the blacksmith slowly reduces the temperature of the heated iron in annealing, we reduce the probability of choosing a lower cell as the process continues until, eventually, the probability (the ‘temperature’) is zero.

 Essentially the algorithm works by shaking us off a local maximum onto another path that may eventually lead to a better peak. This is just like annealing, where the heating shakes the molecules of iron into new configurations in which they may find new, better crystalline structures. On each iteration we may choose to step downhill and adopt a poorer solution in the hope that it will lead us to another local peak of better fitness than the previous one. Note that SA does not guarantee to find the global optimal peak; its success depends very much on the degree of ruggedness of the fitness landscape, the initial ‘temperature’ chosen and its rate of cooling.


A good example of a hard combinatorial problem well-suited to simulated annealing is the Travelling Salesman Problem.  A salesman has to visit a fixed number of towns in a day’s work. What is the best sequence of towns to visit where the overall distance travelled is the least?  What is the shortest route that covers all towns once only, starting and finishing at the same town? 

For even a relatively small number of towns, the number of possible routes quickly becomes astronomical. For example, with 20 towns to visit and the starting town fixed, the number of orderings of the remaining towns is 19 × 18 × ... × 2 × 1, written as 19 factorial or 19! We can halve this number because a route and its reverse count as the same journey. Astonishingly, 19!/2 is roughly 6 × 10^16. If we evaluated each and every possible route in turn, even at one per millisecond on a computer, the time taken would be nearly two million years.
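The route-counting arithmetic can be checked directly:

```python
from math import factorial

towns = 20
routes = factorial(towns - 1) // 2        # fix the start town; halve for direction
print(f"{routes:,} distinct routes")

# Exhaustive search at one route evaluated per millisecond:
years = routes / 1000 / 3600 / 24 / 365
print(f"{years:,.0f} years of blind search")
```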

Clearly finding the optimal solution by blind search is not an option – even for a modest number of towns. Maybe we could just find a solution that is ‘good enough’ – not the very best but still much shorter than most routes?

 

Travelling Salesman Problem for only 10 cities

We can use simulated annealing to find excellent solutions without having to build the entire fitness landscape first – equivalent to a blind search of the entire space.  Firstly set a start ‘temperature’ – the probability of choosing a less fit path over a fitter one. Now start at any point at random – i.e. pick a possible route at random. Now we can construct our immediate neighbours by swapping pairs of towns in our current route. Evaluate the fitness of each neighbour by adding the distances between towns. Now we select either the route with the shortest path or, with a probability determined by the current temperature, a random neighbour with a lower fitness. We move to the selected new route and continue the process, reducing the temperature a fraction on each iteration. After a number of iterations depending upon the rate of temperature decrease, we settle on our final route when the temperature reaches zero.
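The procedure just described can be sketched as follows. The town coordinates are invented for illustration, and the swap-two-towns move and geometric cooling schedule are just one simple choice among many:

```python
import math
import random

random.seed(1)
# 20 hypothetical towns scattered at random in a unit square
towns = [(random.random(), random.random()) for _ in range(20)]

def length(route):
    """Total length of the closed tour visiting towns in the given order."""
    return sum(math.dist(towns[route[i]], towns[route[i - 1]])
               for i in range(len(route)))

route = list(range(len(towns)))           # start from an arbitrary route
temp = 1.0                                # the starting 'temperature'
while temp > 1e-3:
    # build a neighbour of the current route by swapping two towns
    i, j = random.sample(range(len(route)), 2)
    candidate = route[:]
    candidate[i], candidate[j] = candidate[j], candidate[i]
    delta = length(candidate) - length(route)
    # always accept an improvement; accept a worse route with
    # probability e^(-delta / temperature)
    if delta < 0 or random.random() < math.exp(-delta / temp):
        route = candidate
    temp *= 0.999                         # cool slowly on each iteration

print(f"final tour length: {length(route):.3f}")
```

As the temperature falls, downhill jumps become rarer and the search settles onto a (usually very good, though not necessarily optimal) tour.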

For fitness landscapes with the right degree of ruggedness, simulated annealing can work very well. If the temperature is reduced too quickly then the system will settle too fast on a not-so-good solution. If the temperature is reduced too slowly, the system will take huge amounts of time wandering at random, without achieving much improvement.

Simulated annealing can be a superb algorithm for certain hard mathematical optimisation problems, but in the real world it has some severe drawbacks. Firstly, it can take an awfully long time exploring solutions – usually far too long for useful human decision making. Secondly, in real-world scenarios there is usually a tangible cost to selecting a worse solution – possibly one that endangers life or economic viability.

The strategy we adopt to search our fitness landscape is radically affected by its topology or ‘ruggedness’ – how steep are the sides of the hills and valleys and how closely related in fitness are neighbouring points? But what exactly do we mean by ‘ruggedness’?  What factors determine if the landscape is smooth with shallow peaks or massively random and craggy with deep ravines between close peaks?  This is the subject of the next chapter.