NeurIPS reinforcement-learning-challenge winners announced
AWS sponsored the challenge and provided resources for participants to prepare and process data and then train, deploy, and test their models.
Competitions are a key part of the annual NeurIPS conference program. This year, 16 competitions were accepted, and a quarter of them focused on facilitating scientific progress in deep reinforcement learning (RL), in which agents learn to maximize some reward through trial-and-error exploration of their environments.
In recent years, RL has led to breakthroughs in gaming, autonomous driving, electric-grid management, and other areas. The Amazon SageMaker RL team was excited to collaborate with AIcrowd in supporting training and evaluations for the Procgen Challenge, which was sponsored by Amazon Web Services.
To win this challenge, competitors had to develop new RL models that maximized sample efficiency and generalization. The Amazon SageMaker RL team open-sourced a starter notebook using Anyscale’s Ray RLlib, a library for implementing RL applications with the Ray distributed-learning framework. This helped participants iterate faster; in fact, with Amazon SageMaker notebook instances, the competitors got results in less than an hour for a few US dollars.
The challenge featured two tracks, generalization and sample efficiency, and comprised three rounds of competition that attracted more than 500 participants on 82 teams. Participants could compete in one or both of the tracks.
Round one thinned the field to 50 teams, and round two identified 10 finalists. In the final two rounds, AIcrowd ran 33,000 models that consumed more than 230,000 virtual-CPU hours and 28,500 GPU hours. During the entire competition, 172,000 models were evaluated using Amazon SageMaker.
The winning teams
Congratulations to the winner of the generalization track, the two-person team of Dipam Chakraborty and Nhat Quang Tran, and the winner of the sample-efficiency track, the two-person team of Adrien Gaidon and Blake Wulfe. Both teams’ solutions were based on modifications to the phasic policy gradient (PPG) algorithm, a new reinforcement learning algorithm that preserves feature sharing between the policy and value function, while otherwise decoupling their training. Both teams used hyperparameter tuning to optimize their approaches.
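The decoupling described above can be pictured as a training schedule. The following is a minimal sketch, assuming the alternation described in the PPG paper: several policy-phase iterations are followed by an auxiliary phase in which value information is distilled into the shared network. The function name `ppg_schedule` and the iteration counts are hypothetical, no learning takes place here, and this is not the winning teams' code.

```python
from typing import Iterator, Tuple


def ppg_schedule(
    num_phases: int, policy_iters: int, aux_epochs: int
) -> Iterator[Tuple[str, int]]:
    """Yield (step_kind, phase_index) pairs in the order PPG-style training runs them.

    Hypothetical helper for illustration only: it emits the schedule,
    not the gradient updates themselves.
    """
    for phase in range(num_phases):
        # Policy phase: the policy and the value function are trained
        # on freshly collected rollouts.
        for _ in range(policy_iters):
            yield ("policy", phase)
        # Auxiliary phase: value features are distilled into the shared
        # network while the policy's outputs are held close to their
        # previous values.
        for _ in range(aux_epochs):
            yield ("auxiliary", phase)


# Illustrative counts, chosen small so the printed schedule stays short.
for kind, phase in ppg_schedule(num_phases=2, policy_iters=3, aux_epochs=1):
    print(phase, kind)
```

The ratio of policy iterations to auxiliary epochs is one of the hyperparameters a team could tune.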
Dipam and Quang applied several modifications to the original PPG algorithm, which allowed their agents to generalize best to previously unseen environments. More details about their approach can be found in their presentation video from the competition, while AIcrowd hosts their evaluation videos and code.
Adrien and Blake’s modifications of PPG included data augmentation during the auxiliary phase but not during the policy phase. They also experimented with reward normalization and reward shaping. Their approach achieved the best performance on sample efficiency, or using the smallest number of samples to reach a specified reward value. This made their model the fastest to train. Their presentation video is also online, as are their evaluation videos and code.
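To make the notion of sample efficiency concrete, here is a minimal sketch that reads a logged learning curve and reports the first point at which the mean reward reaches a target. The function name `samples_to_threshold`, the learning curves, and the threshold are all hypothetical; the competition's actual scoring procedure is not described in this article.

```python
from typing import Optional, Sequence


def samples_to_threshold(
    timesteps: Sequence[int],
    mean_rewards: Sequence[float],
    threshold: float,
) -> Optional[int]:
    """Return the first logged timestep at which mean reward reaches the threshold."""
    if len(timesteps) != len(mean_rewards):
        raise ValueError("timesteps and mean_rewards must have the same length")
    for step, reward in zip(timesteps, mean_rewards):
        if reward >= threshold:
            return step
    return None  # the threshold was never reached in the logged data


# Hypothetical learning curves for two agents, logged every 100,000 samples.
steps = [100_000, 200_000, 300_000, 400_000]
agent_a = [2.0, 5.5, 8.1, 9.0]
agent_b = [1.0, 3.0, 6.0, 8.5]

print(samples_to_threshold(steps, agent_a, threshold=8.0))  # 300000
print(samples_to_threshold(steps, agent_b, threshold=8.0))  # 400000
```

In this made-up example, agent A is the more sample-efficient of the two because it needs fewer environment samples to reach the same reward.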
As sponsor, AWS awarded the top teams $9,000 in cash and $9,000 in AWS credits.
Background on the challenge
The challenge, designed by AIcrowd in collaboration with OpenAI, was based on the OpenAI Procgen Benchmark. One of the designers’ goals was a centralized and accessible leaderboard to measure sample efficiency and generalization in RL. More information about the design of the challenge is available online.
The Procgen Benchmark is a suite of 16 procedurally generated gym environments that provide direct measures of how quickly an RL agent learns generalizable skills. Agents were evaluated in procedurally generated instances of each of these environments, which were publicly accessible, and in four secret test environments created for the competition. By aggregating performance across so many diverse environments, we obtained high-quality metrics with which to judge the underlying algorithms.
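One plausible way to aggregate scores across environments with different reward scales is to min-max normalize each environment's return and then average. The sketch below assumes that approach; the article does not state the exact formula the competition used, and the environment names, bounds, and raw returns are illustrative placeholders.

```python
from statistics import mean
from typing import Dict, Tuple


def normalized_return(raw: float, r_min: float, r_max: float) -> float:
    """Rescale a raw return so that r_min maps to 0 and r_max maps to 1."""
    if r_max <= r_min:
        raise ValueError("r_max must be greater than r_min")
    return (raw - r_min) / (r_max - r_min)


def aggregate_score(
    raw_returns: Dict[str, float],
    bounds: Dict[str, Tuple[float, float]],
) -> float:
    """Average the normalized returns over all environments."""
    return mean(
        normalized_return(raw_returns[env], *bounds[env]) for env in raw_returns
    )


# Placeholder (r_min, r_max) bounds and raw returns, for illustration only.
bounds = {"coinrun": (5.0, 10.0), "bigfish": (0.0, 40.0)}
raw = {"coinrun": 7.5, "bigfish": 10.0}

print(aggregate_score(raw, bounds))  # 0.375
```

Normalizing before averaging keeps an environment with a large raw reward range from dominating the aggregate.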
Since each Procgen environment was generated procedurally, it required agents to generalize to never-before-seen situations. As a result, these environments provided a robust test of an agent's ability to learn in many diverse settings. Moreover, Procgen environments are designed to be lightweight and simple to use. Participants with limited computational resources could easily reproduce baseline results and run new experiments. More details about the design principles and the individual environments can be found in the paper “Leveraging procedural generation to benchmark reinforcement learning”.
The Amazon SageMaker RL team is grateful for the opportunity to sponsor this challenge. We want to once again congratulate all participants, particularly our winners, and would like to especially thank AIcrowd for its role in supporting the competition.