Evolutionary computing lets anyone discover new laws of nature – but does this mean the end of science as we know it?
IT WAS long after midnight when Michael Schmidt noticed a strangely familiar equation pop up on his monitor. His computer was crunching the data from an experiment to measure the chaotic motion of a double pendulum, a type that has one swinging arm hanging from another.
Schmidt recorded this movement using a motion-tracking camera, which fed numbers into his computer. What he was looking for was an equation describing the pendulum’s motion.
Initially, the task looked hopeless. When a double pendulum moves chaotically, its arms swing in a way that is almost impossible to predict, with seemingly no pattern whatsoever. For a human, finding an equation for this would be almost impossible. And yet the computer found something. To Schmidt, a PhD student studying computer science at Cornell University in Ithaca, New York, it was a hugely significant moment. “It’s probably the most exciting thing that has happened to me in science,” he says.
That’s because Schmidt’s computer had found one of the immutable laws of nature: the law of conservation of energy, which says that the total energy of an isolated system never changes. What had taken many scientists hundreds of years to discover took his computer just one day (see diagram).
Schmidt and his supervisor Hod Lipson had hit upon a new way of doing science, no less. Their method bodes well for areas of research thought to be too complicated to follow set rules. It also promises a future in which computers can find the laws of nature faster than we can, leaving humans forever playing catch-up.
The approach is a radical departure from the usual scientific method. Normally, scientists propose a hypothesis to explain an observation. They then devise an experiment to test their hypothesis, throwing it out if the experiment shows it to be wrong. The hypothesis is then revised and tested over and over again in experiments until it is generally accepted to be true.
Schmidt and Lipson’s approach is the very opposite: rather than coming up with a hypothesis to test, they carry out experiments first, feeding the data into their computer to discover the laws of nature (see diagram).
Their success is all down to something called evolutionary computing. This is where robots or computers are given a goal – learning to fly, say – and produce lots of programs that could potentially achieve it. These programs are tested against the goal, and the most promising ones are selected and merged. This process repeats until, after many generations of testing, a program is produced that can complete the set task perfectly. In fact, Lipson is best known for his work as a robotics engineer, and in particular for creating software that can evolve to control weird and wonderful machines, like robotic aircraft, walking robots and the parts for a device that prints food.
Evolutionary computing allows computers to do things that they haven’t been explicitly programmed to do, and is already being used to solve problems ranging from creating train timetables to designing aircraft.
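To make the generate-test-select loop concrete, here is a toy sketch in Python (not Lipson’s actual software). The goal is deliberately trivial, evolving a string of bits until they are all ones, but the cycle of testing candidates against the goal, keeping the most promising and breeding them with mutation and crossover is the same in spirit; the population size and mutation rate are made-up values.

```python
import random

random.seed(1)
TARGET = [1] * 20                  # the set goal: a string of all ones

def fitness(genome):
    """Test a candidate against the goal: count matching bits."""
    return sum(g == t for g, t in zip(genome, TARGET))

def crossover(mum, dad):
    """Merge two promising candidates at a random cut point."""
    cut = random.randrange(len(mum))
    return mum[:cut] + dad[cut:]

def mutate(genome, rate=0.05):
    """Occasionally flip a bit, the analogue of a genetic mutation."""
    return [1 - g if random.random() < rate else g for g in genome]

# Start from random candidates, then repeat: test, select, breed.
population = [[random.randint(0, 1) for _ in TARGET] for _ in range(30)]
for generation in range(200):
    population.sort(key=fitness, reverse=True)
    if fitness(population[0]) == len(TARGET):
        break                      # a candidate completes the task perfectly
    parents = population[:10]      # keep only the most promising
    population = parents + [mutate(crossover(random.choice(parents),
                                             random.choice(parents)))
                            for _ in range(20)]

print(generation, fitness(population[0]))
```

Because the best candidates are carried over unchanged each generation, the top fitness can only improve, which is why the loop reliably converges on this toy problem.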
The same process is behind Lipson and Schmidt’s law-finding computer. It begins by randomly stringing together simple mathematical expressions to create equations: 10,000 of them to be exact. It then tests each equation to see how well it describes the data – a practice that has been used by researchers since the start of the 19th century, when mathematicians Carl Friedrich Gauss and Adrien-Marie Legendre developed a method for finding which mathematical equation best fits a set of data points. But that is where the similarity with traditional methods ends.
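The fitting step itself is the least-squares idea that Gauss and Legendre pioneered: choose the equation whose predictions minimise the summed squared misfit to the data. A minimal sketch, with made-up numbers, fits a straight line y = ax + b:

```python
# Ordinary least squares: choose the line y = a*x + b that minimises
# the summed squared misfit to the data points (made-up numbers here).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]    # roughly y = 2x + 1, plus noise

n = len(xs)
sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))

# Closed-form solution of the "normal equations"
a = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # best-fit slope
b = (sy - a * sx) / n                           # best-fit intercept
print(round(a, 2), round(b, 2))                 # close to 2 and 1
```

Eureqa uses the same misfit measure, but applies it to thousands of candidate equations at once rather than to a single assumed form.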
None of Schmidt and Lipson’s equations work particularly well – most are complete nonsense – but by simple chance some fit a little better than others. The software then “breeds” these equations together to produce the next generation, ensuring that the offspring are different to the parents.
The difference could be small, with just one term in the equation being altered – equivalent to a genetic mutation in living creatures. Other offspring are created by combining one half of a “father” equation with another half of a “mother” equation to create one that shares characteristics of both. The software then tries to fit these new equations to the data, before repeating the whole process. For the first few thousand generations, most of the equations produced are useless. But gradually, some emerge that fit the measurements rather well.
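A stripped-down sketch of this "symbolic regression" loop, in Python with toy numbers rather than Eureqa’s actual code: equations are trees built from x, constants and arithmetic operators; mutation replaces one subtree (one "term"), crossover grafts part of a "father" equation into a "mother" equation, and a small size penalty nudges the search toward simpler formulas. The population size and rates are made up for speed (Eureqa starts with 10,000 equations).

```python
import random

random.seed(0)
xs = [i / 10 for i in range(-20, 21)]
ys = [x * x + x for x in xs]       # "measured" data hiding the law y = x^2 + x

OPS = {'add': lambda a, b: a + b, 'sub': lambda a, b: a - b,
       'mul': lambda a, b: a * b}

def random_expr(depth=3):
    """Randomly string together simple expressions into an equation tree."""
    if depth == 0 or random.random() < 0.3:
        return 'x' if random.random() < 0.5 else random.choice([-2.0, -1.0, 1.0, 2.0])
    return (random.choice(list(OPS)), random_expr(depth - 1), random_expr(depth - 1))

def evaluate(e, x):
    if e == 'x':
        return x
    if isinstance(e, float):
        return e
    op, a, b = e
    return OPS[op](evaluate(a, x), evaluate(b, x))

def size(e):
    return 1 if not isinstance(e, tuple) else 1 + size(e[1]) + size(e[2])

def score(e):
    """Mean squared misfit, plus a small penalty favouring simpler equations."""
    if size(e) > 100:              # guard against runaway tree growth
        return float('inf')
    mse = sum((evaluate(e, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return mse + 0.01 * size(e)

def mutate(e):
    """Alter one term: replace a random subtree with a fresh one."""
    if not isinstance(e, tuple) or random.random() < 0.2:
        return random_expr(2)
    op, a, b = e
    return (op, mutate(a), b) if random.random() < 0.5 else (op, a, mutate(b))

def crossover(mum, dad):
    """Graft part of a 'father' equation into a 'mother' equation."""
    if not isinstance(mum, tuple) or random.random() < 0.3:
        return dad if not isinstance(dad, tuple) else random.choice(dad[1:])
    op, a, b = mum
    return (op, crossover(a, dad), b) if random.random() < 0.5 else (op, a, crossover(b, dad))

population = [random_expr() for _ in range(100)]
first_best = min(score(e) for e in population)
for _ in range(40):
    population.sort(key=score)
    survivors = population[:20]    # the equations that fit a little better
    population = survivors + [mutate(crossover(random.choice(survivors),
                                               random.choice(survivors)))
                              for _ in range(80)]

best = min(population, key=score)
print(score(best))                 # never worse than first_best, usually far lower
```

Because survivors carry over unchanged, the best score can only fall from one generation to the next; whether the search recovers y = x² + x exactly depends on the random seed.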
Wheat from the chaff
So far, so good. The problem is that while most of these equations look the part, they have nothing to do with the laws of nature, which turn out to be extremely difficult to pin down.
Yet it is possible. That’s because physical laws have an important property: they are invariant in some way. This means that some aspect of the equation cannot change, whatever is done to the system it describes. The total amount of energy in an isolated system, for example, never changes, no matter how its parts move.
Finding these invariants is hard for both humans and computers. The trouble is that there are an infinite number of equations that fit the data perfectly and are also invariant. Think x=x or 2x=2x and so on. And, of course, the evolutionary algorithm finds plenty of these too. These trivial answers swamp the solutions, making it impossible to find the important ones. “It’s like arguing with a teenager,” says Lipson. “It just keeps coming back with things that are irrelevant.”
Schmidt and Lipson knew they needed a way to separate the wheat from the chaff. “But the longer I worked at it, the harder it seemed,” says Schmidt.
It took six months, but Schmidt and Lipson hit upon a solution. To get an idea of how it works, imagine you find a number of equations that seem to relate the height of the pendulum’s midpoint and its horizontal position. While a trivial answer would simply relate these two variables, a physical law of motion describes a much deeper connection between them. This means that it can be used to predict, for example, how the height and position change with time. So Schmidt and Lipson decided to use this to test their equations: not only must an equation match the data, in the pendulum’s case it must also describe how the pendulum changes in time.
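One way to turn that idea into a score (a simplified stand-in for Schmidt and Lipson’s actual criterion, which compares pairs of partial derivatives) is geometric: if a candidate quantity f really is conserved, its rate of change along the measured trajectory must be zero, so the gradient of f must be perpendicular to the direction the system is moving. A toy check on simulated harmonic-oscillator data:

```python
import math

# Simulated samples of a unit harmonic oscillator: x(t) = cos t, v(t) = -sin t.
# In the real experiment the numbers would come from motion-tracking data.
dt = 0.1
ts = [dt * i for i in range(120)]
xs = [math.cos(t) for t in ts]
vs = [-math.sin(t) for t in ts]

def grad(f, x, v, h=1e-6):
    """Numerical gradient (df/dx, df/dv) of a candidate quantity f(x, v)."""
    return ((f(x + h, v) - f(x - h, v)) / (2 * h),
            (f(x, v + h) - f(x, v - h)) / (2 * h))

def invariance_score(f):
    """Mean |cosine| of the angle between grad f and the measured motion.

    If f is truly conserved, df/dt = 0 along the data, so grad f is
    perpendicular to the velocity (dx/dt, dv/dt) and the score is near 0.
    """
    total, n = 0.0, 0
    for i in range(1, len(ts) - 1):
        # Central finite differences give the measured rates of change.
        dx = (xs[i + 1] - xs[i - 1]) / (2 * dt)
        dv = (vs[i + 1] - vs[i - 1]) / (2 * dt)
        gx, gv = grad(f, xs[i], vs[i])
        dot = gx * dx + gv * dv                     # df/dt along the data
        norm = math.hypot(gx, gv) * math.hypot(dx, dv)
        if norm > 1e-9:
            total += abs(dot) / norm
            n += 1
    return total / n

energy  = lambda x, v: x * x + v * v   # the true invariant (twice the energy)
trivial = lambda x, v: x + v           # matches the data's range, not its physics

print(invariance_score(energy))        # near zero
print(invariance_score(trivial))       # much larger
```

A candidate like x + v fits the numbers in some static sense but says nothing about how the system evolves, so it fails this test, while the energy-like combination passes.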
The pair first tested this idea using a simple experiment, and immediately hit pay dirt. They collected data from a simple harmonic oscillator consisting of a mass slung between two springs. After a few hours of crunching this data, the computer spat out an equation that describes the motion of the mass in terms of its acceleration, velocity and position, and two other equations that Lipson and Schmidt verified describe the total energy in the system.
They went on to try a more complex system of springs and masses, then a simple pendulum, before finally taking on the most complex system of them all: the chaotic double pendulum. Schmidt and Lipson published their work in the journal Science in 2009 (vol 324, p 81).
The reaction was immediate. The pair were inundated with calls from other researchers wanting to put their own data through the evolutionary algorithm. “We got requests from all kinds of people working in different fields,” says Lipson.
Unable to cope with the demand, they decided to automate the system and make it available for free online. It wasn’t easy, though. Whenever you start an evolutionary algorithm working, there are all kinds of parameters that need to be fine-tuned by a human. Without this fine tuning, the process can take 10 times longer than it should.
So Schmidt tweaked the algorithm and is now confident that it is well set-up for anyone who wants to use it. “That’s a pretty big deal in the world of evolutionary algorithms,” he says. “It could turn anyone into a Newton or an Einstein.”
Today, the algorithm is called Eureqa and has thousands of users all over the world, with people using it for everything from financial forecasting to particle physics. One person even uses it to analyse the statistics of Australian rules football games.
Perhaps the most interesting example of Eureqa’s use is on data gathered by Gürol Süel, a biologist at the University of Texas Southwestern Medical Centre in Dallas. Süel studies a harmless rod-shaped bacterium called Bacillus subtilis, commonly found in soil. “It’s often studied because it is related to anthrax,” he says.
B. subtilis is interesting because it can change into a spore with a relatively tough shell when it is exposed to harsh conditions. Süel’s interest is in unpicking the network of genes that govern this transformation, known as differentiation. “We want to know how cells make these decisions,” he says.
To find out, he studied which genes are active at any one time by attaching different fluorescent markers to the proteins they produce. Measuring the intensity of the fluorescence tells him how much protein is being made by each gene. Over time, Süel has built up a remarkable database showing which factors switch different genes on and off – the role that nutrients play, for example, and how some genes control others. It turns out there is a huge network of effects at work.
Making sense of all this data is extremely difficult. On the molecular level, Süel says there is a probabilistic interaction between biomolecules that controls what is going on. This is analogous to the way that molecules hit each other in a gas. It is impossible to determine which molecules are moving at what speed and how they interact with each other; the system is just too complex. Instead, you have to think in terms of probabilities, the chances that molecules will bump into each other.
Süel wondered whether there was a deeper underlying law for the control of cell differentiation, however, and noted that there are well-defined laws that perfectly describe the behaviour of gas molecules en masse. These are the laws of thermodynamics and they form one of the cornerstones of modern science.
Based on this, he puzzled over his measurements and came up with a mathematical formula that more or less described the data. However, it was a complex beast with about 16 variables corresponding to things like gene expression rates and the rate at which various different proteins degraded.
Such a complex expression is obviously hard to work with. So Süel turned to Eureqa to see what the evolutionary algorithm would make of his data. Eureqa also spat out an equation that described the data, but this one was much simpler, containing only seven variables.
Süel took a closer look and noticed that it wasn’t so different after all. In fact, the only difference was that some of the terms had been lumped together. Clearly, there were variables that he thought were important but turned out to have little influence on proceedings. “That in itself was a useful insight,” he says.
Beyond our grasp
Eureqa had another surprise up its sleeve. It is one thing to find an equation that seems to describe your data but quite another to find a natural law that has much broader predictive power. Yet that is exactly what Eureqa came up with – a biological law of invariance that is equivalent to a conservation law in physics.
Many philosophers of science question whether something as complex as biology really can be reduced to laws. Süel’s work suggests that it can, but there is a catch. “This equation is symmetric, it’s beautiful, it has to be true,” says Lipson. “But we don’t know what it means.”
Süel, who has been puzzling over the equation for months, thinks it may be related to a feedback mechanism in the cell. “We have an idea but we don’t know for sure,” he says.
That’s exciting but also worrying. What if we never find out what Süel’s new law describes? “Maybe we will just have to call it ‘smell’, or something like that, and accept it for what it is without ever really understanding it,” suggests Lipson.
An important part of science is the gaining of insight – the feeling of knowing something, says Steven Strogatz, an applied mathematician also at Cornell who was not involved in Lipson’s or Süel’s work. “We’ve had a period of a few hundred years or so when humans could understand things in a very satisfying way,” he says. “That period may be coming to an end.”
Much of physics is about systems that have only a handful of variables, and even then many people struggle to understand the equations that describe them. When it comes to biology, economics and climate science, however, most of the systems we wish to understand have many more variables, says Strogatz. Cells may have millions, making them very complex indeed.
It may be that we are unable to understand systems as complex as these on an intuitive level, he says. And perhaps this exposes a very human type of arrogance – that we feel we are entitled to know how the universe works. Strogatz questions this: “Why should we expect to understand everything?”
That leads to a somewhat depressing picture for the future of science. Lipson speculates that his algorithm will allow laws of nature to be extracted from data at rates that are currently unheard of, and that this kind of machine learning will become the norm. It means that we humans will forever be playing catch-up, never quite understanding what the machines are telling us.
This has more than a passing resemblance to the singularity, the long-fabled point at which machines start learning faster than humans can keep up. “This is a post-singularity vision of science,” says Lipson.
Süel is more optimistic about the role of humans. He points out that you need to have some understanding of a system to know what questions to ask of it. Only then can you generate data that will help answer new questions. Without the years he spent studying B. subtilis, he wouldn’t have been able to design a suitable experiment. But he is also the first to admit that Eureqa is important. “It’s going to change the way we do biology,” he says. “And scientists are going to be at the heart of this change.”
Justin Mullins is a consultant for New Scientist