DeepMind compares AlphaDev’s discovery to one of AlphaGo’s weird but winning moves in its Go match against grandmaster Lee Sedol in 2016. “All the experts looked at this move and said ‘This isn’t the right thing to do, this is a poor move,’” says Mankowitz. “But actually it was the right move and AlphaGo ended up not just winning the game but also influencing the strategies that professional Go players started using.”
Sanders is impressed, but does not think the results should be oversold. “I agree that machine learning techniques are increasingly a game changer in programming and everybody is expecting that AIs will soon be able to invent new better algorithms,” he says. “But we are not quite there yet.”
For one thing, Sanders points out that AlphaDev only uses a subset of the instructions available in assembly. Many existing sorting algorithms use instructions that AlphaDev did not try, he says. Without using those instructions it is harder to compare AlphaDev to the best rival approaches.
It’s true that AlphaDev has its limits. The longest algorithm that AlphaDev produced was 130 instructions long, for sorting a list of up to five items. At each step, AlphaDev picked from 297 possible assembly instructions (out of many more). “Beyond 297 instructions and assembly games of more than 130 instructions long, learning became slow,” says Mankowitz.
That’s because even with 297 instructions (or game moves), the number of possible algorithms AlphaDev could construct is larger than the number of possible games in chess (around 10^120) and the number of atoms in the universe (around 10^80).
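To get a feel for that scale, here is a back-of-the-envelope sketch. It rests on a simplifying assumption the article does not spell out: that any of the 297 instructions may occupy each of up to 130 positions, so it counts raw instruction sequences rather than valid sorting programs. The names are illustrative and are not taken from DeepMind's code.

```python
# Back-of-the-envelope estimate of the search space size.
# Assumption (not from DeepMind): every one of the 297 instructions is
# allowed at each of 130 positions, so this counts raw sequences only.
NUM_INSTRUCTIONS = 297
MAX_PROGRAM_LENGTH = 130

CHESS_GAMES = 10**120        # estimate quoted in the article
ATOMS_IN_UNIVERSE = 10**80   # estimate quoted in the article


def count_sequences(num_instructions: int, length: int) -> int:
    """Number of distinct instruction sequences of exactly `length` steps."""
    return num_instructions ** length


search_space = count_sequences(NUM_INSTRUCTIONS, MAX_PROGRAM_LENGTH)

# Python integers are arbitrary precision, so the digit count gives the
# order of magnitude directly.
order_of_magnitude = len(str(search_space)) - 1

print(f"roughly 10^{order_of_magnitude} possible sequences")
print("more than chess games:", search_space > CHESS_GAMES)
print("more than atoms in the universe:", search_space > ATOMS_IN_UNIVERSE)
```

Under this assumption the count comes out hundreds of orders of magnitude above both benchmarks, which is why exhaustively trying every program is out of the question.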
For longer algorithms, the team plans to adapt AlphaDev to work with C++ instructions instead of assembly. With less fine-grained control AlphaDev might miss certain shortcuts, but it would make the approach applicable to a wider range of algorithms.
Sanders would also like to see a more exhaustive comparison with the best human-devised approaches, especially for longer algorithms. DeepMind says that’s part of its plan. Mankowitz wants to combine AlphaDev with the best human-devised methods, getting AlphaDev to build on human intuition rather than starting from scratch.
After all, there may be more speedups to be found. “For a human to do this, it requires significant expertise and a huge amount of hours, maybe days, maybe weeks, to look through these programs and identify improvements,” says Mankowitz. “As a result, it hasn’t been attempted before.”