In a game of chess, the outsider often gauges the quality of the gameplay by the entertainment and surprise factor offered by the players’ choice of moves. A good chess match is a mind-bending dance of strategy and power: a true spectator sport.
The advent of artificial intelligence (AI) — specifically, software chess engines — seems to have painted this vibrant tableau in shades of monotony.
Since world chess champion Garry Kasparov’s defeat by IBM’s Deep Blue in the late ’90s, human chess players have grudgingly accepted that chess engines can beat them. However, they found solace in the fact that chess engines were utilitarian in their style: A classic chess engine may well consider tens of millions of alternative moves per second, but its playing style, especially with a limited lookahead, is boring. Unadorned and calculated. There’s none of the dynamism or creativity. None of the humanness. Developers did add some randomness to introduce a semblance of unpredictability, but this resulted in predictable sequences of generally good moves interspersed with occasional mistakes.
As the AI meticulously generates all possible move sequences, it assigns an evaluation score to each resulting game state. This score takes into account the relative power balance between the players after a move. For example, if white gains a pawn, the score is +1, but if black gains a knight, the score for white is -3. More sophisticated engines incorporate positional information, such as the location of the pieces on the board and even factors like piece mobility and the safety of the king.
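The material part of such an evaluation score can be sketched in a few lines of Python. This is a minimal illustration, not any particular engine's code: the function name `material_score` is hypothetical, the position is given as the piece-placement field of a FEN string, and the bishop, rook and queen values are the usual textbook ones rather than figures taken from this article.

```python
# Material values: pawn = 1 and knight = 3 as in the text above;
# bishop, rook and queen use conventional textbook values (an assumption).
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}


def material_score(placement: str) -> int:
    """Return the material balance from white's point of view.

    `placement` is the piece-placement field of a FEN string:
    uppercase letters are white pieces, lowercase letters are black,
    digits and '/' describe empty squares and rank breaks.
    """
    score = 0
    for ch in placement:
        value = PIECE_VALUES.get(ch.lower())
        if value is None:
            continue  # a digit or '/' separator, not a piece
        score += value if ch.isupper() else -value
    return score


START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"
WHITE_UP_A_PAWN = "rnbqkbnr/ppppppp1/8/8/8/8/PPPPPPPP/RNBQKBNR"
BLACK_UP_A_KNIGHT = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/R1BQKBNR"
```

A positional engine would add further terms to this sum, such as piece mobility or king safety, each with its own weight.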
The addition of randomness forces chess engines to choose a move sequence at random from those with similar scores, or even occasionally play a randomly generated move, just to spice things up.
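One way this kind of randomization could look in practice is sketched below. The function `pick_move` and its parameters `tolerance` and `blunder_rate` are hypothetical names chosen for illustration, and the default values are arbitrary.

```python
import random


def pick_move(scored_moves, tolerance=0.2, blunder_rate=0.05, rng=None):
    """Pick a move from a list of (move, score) pairs.

    With probability `blunder_rate`, return any move at random.
    Otherwise choose uniformly among the moves whose score lies
    within `tolerance` of the best available score.
    """
    rng = rng or random.Random()
    if rng.random() < blunder_rate:
        return rng.choice(scored_moves)[0]  # the occasional random move
    best = max(score for _, score in scored_moves)
    candidates = [move for move, score in scored_moves
                  if best - score <= tolerance]
    return rng.choice(candidates)


# Hypothetical scores for three opening moves
moves = [("e4", 0.30), ("d4", 0.25), ("a3", -0.50)]
```

With `blunder_rate=0` the engine only ever picks among near-best moves, which is exactly why the result still looks like a long run of sensible moves.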
But these chess engines still spit out long sequences of unimaginative good moves, peppered with the odd bad one.
And, as human players know all too well, blind randomization is unlikely to land a player on a winning streak.
The subsequent generation of chess programs, powered by artificial neural networks (ANNs), learns from gameplay examples rather than predetermined formulas. These neural networks take as input a binary encoding of the board: for each square, which piece type occupies it and which player it belongs to.
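A binary board encoding of this kind might be built as follows. This is a sketch under stated assumptions: real engines use a variety of input layouts, and the 768-bit scheme here (64 squares times 12 piece-and-color combinations), along with the name `encode_position`, is chosen only for illustration.

```python
PIECES = "PNBRQKpnbrqk"  # six white piece types, then six black


def encode_position(placement: str) -> list[int]:
    """One-hot encode a FEN piece-placement string into 768 bits.

    Each square gets 12 bits, one per piece type and color;
    an empty square leaves all 12 of its bits at zero.
    """
    bits = [0] * (64 * len(PIECES))
    square = 0
    for ch in placement:
        if ch == "/":
            continue  # rank separator
        if ch.isdigit():
            square += int(ch)  # run of empty squares
            continue
        bits[square * len(PIECES) + PIECES.index(ch)] = 1
        square += 1
    if square != 64:
        raise ValueError("placement must describe exactly 64 squares")
    return bits


start_bits = encode_position("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR")
```

In the starting position, 32 of the 768 bits are set, one for each piece on the board.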
The outputs are the evaluation function values, leading to rapid and unpredictable gameplay as the bulk of computations are conducted during training, not live games. Developers use millions of online chess games played between humans or by humans against machines to build the training data. As the outcome of each game is known, it’s possible to tune the models more finely, selecting the best move for each scenario. Once trained, the ANN lets the chess engine quickly and efficiently evaluate the score for each possible move. ANN-based chess engines become even more positional. And boring.
Google’s AlphaZero changed the landscape. It implements a new ANN-based approach that can be trained to play not just chess, but other board games. Given a game state, the AlphaZero engine computes a policy that maps the game state to the probability distribution of making each of the possible moves. In chess, that’s 4,672 possible moves — for white’s first move. For human players, this is ridiculous: There are 20 legal moves for white’s first move. And that’s the point.
AlphaZero includes all sorts of moves that are illegal, like selecting empty squares, selecting opponent’s pieces, making knight moves for rooks, or making long diagonal moves for pawns. It also includes moves that pass through other blocking pieces.
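The figure of 4,672 can be reconstructed from the move encoding described in the published AlphaZero chess setup, where every square on the board is paired with every type of move a piece could conceivably make from it. The breakdown below is a sketch of that arithmetic; the constant names are hypothetical.

```python
# Every one of the 64 squares is a potential "pick up a piece here" square,
# whether or not one of the mover's pieces actually stands on it.
FROM_SQUARES = 8 * 8

QUEEN_STYLE = 8 * 7      # 8 directions x 1 to 7 squares of travel
KNIGHT_JUMPS = 8         # the 8 possible knight jumps
UNDERPROMOTIONS = 3 * 3  # 3 pawn directions x (knight, bishop, rook)

POLICY_SIZE = FROM_SQUARES * (QUEEN_STYLE + KNIGHT_JUMPS + UNDERPROMOTIONS)
```

Because the encoding ignores what actually stands on each square, most of these 4,672 entries are illegal or impossible in any given position.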
During training, nothing is learned or imposed about avoiding invalid moves. The engine just post-processes the ANN output, filtering out illegal or impossible moves by setting their effective probability to zero. Then it re-normalizes the probabilities across the remaining valid moves. A philosopher could argue that this engine is devoid of ethics as it does not distinguish illegal from impossible, but the resulting ANN structure is simpler than one expressing only valid moves. On a modern processor, AlphaZero needs just a few tens of milliseconds to make a move.
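The filtering and re-normalization step can be sketched in a few lines. The function name `mask_and_renormalize` is hypothetical, the four-entry policy stands in for the full 4,672-entry vector, and the uniform fallback for a network that puts no probability on any legal move is an assumption of this sketch, not something the text above describes.

```python
def mask_and_renormalize(policy, legal):
    """Zero out illegal moves and rescale the rest to sum to 1.

    `policy` is a list of raw move probabilities; `legal` is the set
    of indices that are valid moves in the current position.
    """
    masked = [p if i in legal else 0.0 for i, p in enumerate(policy)]
    total = sum(masked)
    if total == 0.0:
        # Assumed fallback: no mass on any legal move, so spread it evenly.
        return [1.0 / len(legal) if i in legal else 0.0
                for i in range(len(policy))]
    return [p / total for p in masked]


raw_policy = [0.5, 0.25, 0.125, 0.125]  # toy network output
legal_moves = {1, 2}                    # only moves 1 and 2 are valid
cleaned = mask_and_renormalize(raw_policy, legal_moves)
```

Here the network's favorite move, index 0, is simply discarded, and the two legal moves end up with probabilities of two thirds and one third.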
This speed enabled AlphaZero to play against itself in millions of games, completing its training with reinforcement learning, which privileges moves that lie on a sequence that led to victory in the past. Of course, to know whether a move is on a winning sequence, one must complete the game, so AlphaZero’s reinforcement training was performed by playing “fast and dumb,” i.e., using a very shallow search depth. Playing dumb in the reinforcement training phase maximizes the number of games that end in victories and defeats rather than in uninformative draws. As a result, AlphaZero considers fewer positions than the algorithmic chess engines of the past.
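The idea of privileging moves that lay on a winning sequence can be illustrated with a deliberately simplified toy. This is not AlphaZero's actual update rule: the lookup table of preferences, the function name `reinforce` and the `step` size are all assumptions made for this sketch.

```python
from collections import defaultdict


def reinforce(preferences, game, result, step=0.1):
    """Nudge move preferences after one finished self-play game.

    `game` is a list of (state, move, player) tuples, with player
    "white" or "black". `result` is +1 for a white win, -1 for a
    black win and 0 for a draw. The winner's moves are rewarded and
    the loser's are penalized; a draw changes nothing.
    """
    for state, move, player in game:
        reward = result if player == "white" else -result
        preferences[(state, move)] += step * reward
    return preferences


prefs = defaultdict(float)
toy_game = [("s0", "e4", "white"), ("s1", "e5", "black")]  # hypothetical states
reinforce(prefs, toy_game, result=1)  # white won this game
```

Note that a drawn game, with `result=0`, leaves every preference untouched, which is one way to see why draws are uninformative for this kind of training.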
AlphaZero and successors like Deep Chess have a distinctive style, steering away from merely seeking positional or material advantage. Their style is alien — best likened to atonal music: difficult to appreciate for anyone but the chess elite, and certainly useless for the average human amateur to learn or improve their game.
It is interesting that we humans still describe an intelligent chess engine’s style in positional terms.
Like the ’90s Deep Blue victory, today’s post-AlphaZero scenario highlights some general problems that we will have to solve to be able to work together with the super-human AI engines of the future. AI decision-making must be intelligible to humans for us to accept its decisions. This interpretability needs to be wired into the AI training. Plus, interacting with humans is a crucial step for AI engine evolution: Playing against all possible competitors makes these engines stronger than any human can individually hope to become.
The game of chess, always a metaphor for life, suggests that controlling the evolution of future AI engines may become more akin to taming a tiger than training a pet.
Ernesto Damiani is senior director of the Robotics and Intelligent Systems Institute at Khalifa University.