Learning in game-theoretical models
Amazon Research Award recipient Éva Tardos studies complex theoretical questions that have far-ranging practical consequences.
Game theory is a mathematical way to describe strategic reasoning. In the game-theoretical sense, a game involves players who choose actions and, depending on their choices and those of the other players, receive different levels of reward. Since 2005, the Nobel Prize in economics has gone to game-theoretical work four times.
Éva Tardos, the Jacob Gould Schurman Professor of Computer Science at Cornell University, is the winner of both the Association for Computing Machinery’s Gödel Prize and the Institute of Electrical and Electronics Engineers’ John von Neumann Medal (named for the man who created game theory, among other accomplishments). Her research focuses on algorithmic game theory, or the application of game theory to algorithm design.
In 2018, Tardos received an Amazon Research Award to pursue the topic of learning in games: Over repeated iterations of the same game, can the players learn the strategies that will maximize their rewards? And can the game be structured so that the individual players’ reward maximization strategies also maximize the common good?
“The question I’m most fascinated by has three prongs,” Tardos says. “One is, ‘What can we say about the quality of outcomes if people learn?’ Another is, ‘What does it mean to learn?’ When I look at what users did, what conditions of learning do people actually satisfy?
“And third — and maybe that's in some ways the most actionable — is, ‘What is the right form of learning in a changing environment?’ If you’re Amazon, and you want to learn how to price your products, what is your stockpile? How many books do you have? If you're selling them, you're going to have less. There is some carryover effect over time. What does that tell you? What is the right format of learning in cases when there's a changing environment, and there's a carryover effect? And then of course, do people learn in that way?”
Concepts of learning
As an example of a game, consider a penalty kick in soccer, in which the kicker shoots at either the right or left half of the goal, and the goalie guesses which way to dive. In the simplest game-theoretical model of the game, if both goalie and kicker pick the same direction, the goalie wins; if they pick different directions, the kicker wins.
In this model, if both players are trying to maximize their chances of winning, their optimal strategy is to go left or right randomly, with equal probability in either direction. If one player deviates from that strategy, the other player has an opportunity to increase his or her winning percentage.
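The equilibrium logic above can be checked directly. The sketch below (an illustration, not from Tardos’s work; the payoff values are the simple 0/1 model described in the text) represents the penalty kick as a two-by-two zero-sum game and verifies that, against a 50/50 goalie, the kicker gains nothing by deviating, while a goalie who leans one way can be exploited:

```python
# A minimal sketch of the penalty-kick game as a 2x2 zero-sum game.
# Payoffs are from the kicker's perspective: 1 if the players pick
# different directions (kicker scores), 0 if they match (goalie saves).

ACTIONS = ["left", "right"]

def kicker_payoff(kick, dive):
    return 1.0 if kick != dive else 0.0

def expected_payoff(kick_probs, dive_probs):
    """Kicker's expected payoff given both players' mixed strategies."""
    return sum(
        kick_probs[k] * dive_probs[d] * kicker_payoff(k, d)
        for k in ACTIONS for d in ACTIONS
    )

uniform = {"left": 0.5, "right": 0.5}

# Against a 50/50 goalie, both pure kicks yield the same expected
# payoff (0.5), so the kicker has no profitable deviation -- and by
# symmetry, neither does the goalie. That is the equilibrium condition.
for k in ACTIONS:
    pure = {a: 1.0 if a == k else 0.0 for a in ACTIONS}
    print(k, expected_payoff(pure, uniform))  # 0.5 either way

# But if the goalie leans left (say 70/30), kicking right exploits it:
leaning = {"left": 0.7, "right": 0.3}
print(expected_payoff({"left": 0.0, "right": 1.0}, leaning))  # 0.7
```

The same check, run on the leaning goalie, shows why random play is essential: any predictable bias hands the opponent a better-than-even response.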
A set of strategies that no player in a game has an incentive to change unilaterally is called a Nash equilibrium. The penalty-kick game is a zero-sum game: if one player wins, the other loses. But many real-world scenarios — for instance, choosing driving routes during rush hour — can be modeled as non-zero-sum games, and they have Nash equilibria, too.
One early hypothesis about learning in game theory was that, over repeated iterations of a game, players converge toward the Nash equilibrium. But more recent research suggests that that’s unlikely, as Nash equilibria for complex games are intractably hard to compute.
In many circumstances, Tardos explains, game theorists have settled for a more relaxed standard of learning, called “no-regret learning,” which has the advantage of being algorithmically achievable.
“If there is a single strategy that would have been consistently pretty good over time, then please do at least as well as that one,” Tardos says. “If there is a route on which every day you would get to work pretty damn fast, you don't have to drive that, but if you're doing worse than that, something went wrong. Your learning should be good enough to observe that that's better than what you are doing. And that's called no-regret learning.”
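One classic algorithm that meets the no-regret standard is multiplicative weights (“Hedge”). The sketch below is an illustration in the spirit of Tardos’s commuting example, not code from her research; the two “routes,” the loss ranges, and the learning rate are all assumptions chosen for the demo. It plays repeated rounds, down-weights actions that did badly, and ends up doing nearly as well as the best single fixed route in hindsight:

```python
import math
import random

def hedge(loss_seq, n_actions, eta=0.1):
    """Multiplicative-weights ("Hedge") learner, a standard no-regret
    algorithm. loss_seq is a list of per-round loss vectors in [0, 1].
    Returns regret: the learner's total expected loss minus the total
    loss of the best single fixed action in hindsight."""
    weights = [1.0] * n_actions
    total_loss = 0.0
    for losses in loss_seq:
        z = sum(weights)
        probs = [w / z for w in weights]
        # Expected loss of the randomized learner this round.
        total_loss += sum(p * l for p, l in zip(probs, losses))
        # Exponentially down-weight actions that performed badly.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    best_fixed = min(sum(l[a] for l in loss_seq) for a in range(n_actions))
    return total_loss - best_fixed

# Two hypothetical "routes" with random commute-time losses; route 0 is
# better on average -- the "consistently pretty good" strategy in the quote.
random.seed(0)
T = 2000
losses = [[random.uniform(0.0, 0.8), random.uniform(0.2, 1.0)]
          for _ in range(T)]
regret = hedge(losses, n_actions=2)
print(f"average regret per round: {regret / T:.4f}")
```

The no-regret guarantee is exactly the property in the quote: the learner’s average loss approaches that of the best fixed route, without the learner ever needing to know in advance which route that is.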
Much of Tardos’s recent work on learning in game theory has focused on games with carryover effects. What are the best learning algorithms for such games? Under what circumstances can learning occur? And how do the learned strategies compare to the optimal distribution of strategies?
Tardos has investigated these questions in the contexts of two applications in particular: ad auctions, in which advertisers bid for ad space on websites, and packet-switched network routing, of the type we see on the Internet.
In the case of ad auctions, the carryover effect is that a successful bid for an ad reduces the budget that the ad buyer has for additional purchases. Tardos and her colleagues have analyzed real-world data and concluded that, in ad auctions, no-regret learning can occur, but only for ad buyers with adequate resources. Otherwise, budget limitations prevent them from exploring the space of options thoroughly enough to identify good strategies.
In the case of packet-switched routing, the carryover effect is that unsuccessful packet transmission causes senders to resend packets, which increases network congestion. Tardos and her colleagues showed that learning can ensure efficient system performance, but only if each router in the network can handle enough incoming packets simultaneously.
Here, however, Tardos and her colleagues’ analysis was theoretical, so they could compare players’ learned strategies to the optimal strategy that an omniscient planner would choose by allocating network bandwidth according to senders’ transmission needs. They found that, if senders are simply trying to learn strategies that maximize their own network throughputs, then in order to ensure that everyone’s packets get through, the router capacity needs to be about twice what it would be in the optimal case.
In a follow-up study, however, Tardos and one of her students showed that a better learning algorithm could nudge the players’ learned strategies closer to the optimum. If the players are patient enough — if they adhere to a given transmission strategy long enough to get a reliable signal of its long-term efficacy — then learning will lead to efficient routing with only about 1.6 times the optimal router capacity.
These are preliminary results, but they demonstrate a methodology for making progress on a set of very difficult, interrelated problems. In ongoing work, Tardos is generalizing the same techniques of analysis to the relationship between product pricing and inventory management, in which the carryover effect is the inventory on hand, which in turn depends on sales rates at different price points. And that’s a problem of obvious interest to Amazon.
“There are questions that we’re not answering that would be lovely to answer,” Tardos says. “And these are all ongoing projects. So maybe we will answer them eventually.”