DeepMind, the AI lab backed by Google parent company Alphabet, has long invested in game-playing AI systems. It’s the lab’s philosophy that games, while lacking an obvious commercial application, are uniquely relevant challenges of cognitive and reasoning capabilities. This makes them useful benchmarks of AI progress. In recent decades, games have given rise to the kind of self-learning AI that powers computer vision, self-driving cars, and natural language processing.
In a continuation of its work, DeepMind has created a system called Player of Games, which the company first revealed in a research paper published on the preprint server Arxiv.org this week. Unlike the game-playing systems DeepMind developed previously, like the chess-winning AlphaZero and StarCraft II-besting AlphaStar, Player of Games can perform well at both perfect information games (e.g., the Chinese board game Go and chess) and imperfect information games (e.g., poker).
Tasks like route planning around congestion, contract negotiations, and even interacting with customers all involve compromise and consideration of how people’s preferences coincide and conflict, as in games. Even when AI systems are self-interested, they might stand to gain by coordinating, cooperating, and interacting among groups of people or organizations. Systems like Player of Games, then, which can reason about others’ goals and motivations, could pave the way for AI that can successfully work with others — including handling questions that arise around maintaining trust.
Imperfect versus perfect
In imperfect information games, some information is hidden from players during play. By contrast, perfect information games show all information to every player from the start.
Perfect information games require a decent amount of forethought and planning to play well. Players have to process what they see on the board and determine what their opponents are likely to do while working toward the ultimate goal of winning. Imperfect information games, on the other hand, require players to account for what is hidden from them when deciding how to act next in order to win, including potentially bluffing or teaming up against an opponent.
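The distinction can be sketched in a few lines of Python. This is a toy illustration only, not DeepMind's representation: the `BoardState` and `CardState` classes, their fields, and the `observation` method are hypothetical names introduced here to show how two different game states can look identical to one player when information is hidden.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class BoardState:
    """Toy perfect-information state: every player sees the whole board."""
    squares: Tuple[str, ...]

    def observation(self, player: int) -> Tuple[str, ...]:
        # Nothing is hidden, so every player's observation is the full state.
        return self.squares


@dataclass(frozen=True)
class CardState:
    """Toy imperfect-information state: each player holds one private card."""
    private_cards: Tuple[str, str]  # index is the player id (0 or 1)
    public_pot: int

    def observation(self, player: int) -> Dict[str, object]:
        # A player sees their own card and the pot, never the opponent's card.
        return {"my_card": self.private_cards[player], "pot": self.public_pot}


board = BoardState(squares=("X", "O", " "))
same_view = board.observation(0) == board.observation(1)

hand_a = CardState(private_cards=("K", "Q"), public_pot=2)
hand_b = CardState(private_cards=("K", "J"), public_pot=2)
# Player 0 cannot tell these two different states apart.
indistinguishable = hand_a.observation(0) == hand_b.observation(0)
# Player 1 can, because their own private card differs.
distinguishable = hand_a.observation(1) != hand_b.observation(1)
```

In the card example, player 0 has to reason over every card the opponent might hold, which is the kind of uncertainty that makes bluffing possible.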
Systems like AlphaZero excel at perfect information games like chess, while algorithms like DeepStack and Libratus perform remarkably well at imperfect information games like poker. But DeepMind claims that Player of Games is the first “general and sound search algorithm” to achieve strong performance across both perfect and imperfect information games.
“[Player of Games] learns to play [games] from scratch, simply by repeatedly playing the game in self-play,” DeepMind senior research scientist Martin Schmid, one of the co-creators of Player of Games, told VentureBeat via email. “This is a step towards generality — Player of Games is able to play both perfect and imperfect information games, while trading away some strength in performance. AlphaZero is stronger than Player of Games in perfect information games, but [it’s] not designed for imperfect information games.”
While Player of Games is extremely generalizable, it can’t play just any game. Schmid says that the system needs to think about all the possible perspectives of each player given an in-game situation. While there’s only a single perspective in perfect information games, there can be many such perspectives in imperfect information games — for example, around 2,000 for poker. Moreover, unlike MuZero, DeepMind’s successor to AlphaZero, Player of Games also needs knowledge of the rules of the game it’s playing. MuZero can pick up the rules of perfect information games on the fly.
In its research, DeepMind evaluated Player of Games, trained using Google’s TPUv4 accelerator chipsets, on chess, Go, Texas Hold’Em, and the strategy board game Scotland Yard. For Go, it set up a 200-game tournament between AlphaZero and Player of Games. Across chess and Go, DeepMind also pitted Player of Games against top-performing systems including the Go programs GnuGo and Pachi and the chess engine Stockfish, as well as AlphaZero. Player of Games’ Texas Hold’Em match was played against the openly available Slumbot, and the algorithm played Scotland Yard against a bot developed by Joseph Antonius Maria Nijssen, which the DeepMind coauthors nicknamed “PimBot.”
In chess and Go, Player of Games proved to be stronger than Stockfish and Pachi in certain — but not all — configurations, and it won 0.5% of its games against the strongest AlphaZero agent. Despite the steep losses against AlphaZero, DeepMind believes that Player of Games was performing at the level of “a top human amateur,” and possibly even at the professional level.
Player of Games fared better at poker and Scotland Yard. Against Slumbot, the algorithm won on average by 7 milli big blinds per hand (mbb/hand), where 1 mbb/hand corresponds to one big blind won per 1,000 hands. (A big blind is equal to the minimum bet.) Meanwhile, in Scotland Yard, DeepMind reports that Player of Games won “significantly” against PimBot, even when PimBot was given more opportunities to search for the winning moves.
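To make the unit concrete, here is a small worked conversion in Python. Only the 7 mbb/hand rate comes from DeepMind's reported result; the helper function names and the 100-chip big blind are assumptions chosen purely for illustration.

```python
def mbb_per_hand(total_big_blinds_won: float, hands_played: int) -> float:
    """Win rate in milli big blinds per hand (1 mbb = 1/1000 of a big blind)."""
    if hands_played <= 0:
        raise ValueError("hands_played must be positive")
    return 1000.0 * total_big_blinds_won / hands_played


def expected_big_blinds(rate_mbb_per_hand: float, hands_played: int) -> float:
    """Expected total big blinds won over a number of hands at a given rate."""
    return rate_mbb_per_hand * hands_played / 1000.0


# Reported average margin against Slumbot.
rate = 7.0

# At 7 mbb/hand, 1,000 hands yield 7 big blinds on average.
per_thousand = expected_big_blinds(rate, 1_000)

# Hypothetical stakes, for illustration only: a big blind worth 100 chips.
big_blind_chips = 100
chips_per_hand = rate / 1000.0 * big_blind_chips
```

At the assumed 100-chip big blind, that margin works out to a bit under one chip per hand on average, which is why such results are reported over very large numbers of hands.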
Schmid believes that Player of Games is a big step toward truly general game-playing systems, but far from the last one. The general trend in the experiments was that the algorithm performed better given more computational resources (Player of Games trained on a dataset of 17 million “steps,” or actions, for Scotland Yard alone), and Schmid expects this approach will scale in the foreseeable future.
“[O]ne would expect that the applications that benefited from AlphaZero might also benefit from Player of Games,” Schmid said. “Making these algorithms even more general is exciting research.”
Of course, approaches that favor massive amounts of compute put organizations with fewer resources, like startups and academic institutions, at a disadvantage. This has become especially true in the language domain, where massive models like OpenAI’s GPT-3 have achieved leading performance but at resource requirements — often millions of dollars — far exceeding the budgets of most research groups.
Costs sometimes rise above what’s considered acceptable even at a deep-pocketed firm like DeepMind. For AlphaStar, the company’s researchers purposefully didn’t try multiple ways of architecting a key component because the training cost would have been too high in executives’ minds. DeepMind notched its first profit only last year, when it raked in £826 million ($1.13 billion) in revenue. The year prior, DeepMind recorded losses of $572 million and took on a billion-dollar debt.
It’s estimated that AlphaZero cost tens of millions of dollars to train. DeepMind didn’t disclose the research budget for Player of Games, but it isn’t likely to be low considering the number of training steps for each game ranged from the hundreds of thousands to millions.
As the research eventually transitions from games to other, more commercial domains, like app recommendations, datacenter cooling optimization, weather forecasting, materials modeling, mathematics, health care, and atomic energy computation, the effects of the inequity are likely to become starker. “[A]n interesting question is whether this level of play is achievable with less computational resources,” Schmid and his fellow coauthors ponder — but leave unanswered — in the paper.