Inverse reinforcement learning with natural language goals
Humans generally use natural language (NL) to communicate task requirements to each other. Ideally, NL should also be usable for communicating goals to autonomous machines (e.g., robots) to minimize friction in task specification. However, understanding NL goals and mapping them to sequences of states and actions is challenging. Specifically, existing work along these lines has had difficulty generalizing learned policies to new NL goals and environments. In this paper, we propose a novel adversarial inverse reinforcement learning algorithm that learns a language-conditioned policy and reward function. To improve the generalization of both, we use a variational goal generator to relabel trajectories and sample diverse goals during training. Our algorithm outperforms multiple baselines by a large margin on a vision-based NL instruction-following dataset (Room-2-Room), demonstrating a promising advance toward specifying agent goals with NL instructions.
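The abstract's core loop can be illustrated with a minimal, self-contained sketch: an adversarial IRL loop that alternately trains a language-conditioned discriminator (serving as the reward) and a language-conditioned policy, with hindsight goal relabeling standing in for the paper's variational goal generator. Everything here — the 1-D toy environment, the trivial goal "encoding," the expert data, and the update rules — is an illustrative assumption, not the paper's implementation.

```python
import math, random

random.seed(0)

STATES = range(5)                  # toy 1-D corridor: states 0..4
ACTIONS = (-1, +1)                 # step left / step right
GOALS = ("go left", "go right")    # toy natural-language goals

def encode(goal):
    # Trivial stand-in for a language encoder: +1 for "right", -1 for "left".
    return 1.0 if goal == "go right" else -1.0

def features(s, a, goal):
    # Joint (state, action, goal) features shared by reward and policy.
    g = encode(goal)
    return (a * g, s * g, 1.0)

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def policy_probs(w_pi, s, goal):
    # Softmax policy over actions, conditioned on the NL goal.
    logits = [dot(w_pi, features(s, a, goal)) for a in ACTIONS]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def rollout(w_pi, goal, s=2, horizon=4):
    traj = []
    for _ in range(horizon):
        p = policy_probs(w_pi, s, goal)
        a = random.choices(ACTIONS, weights=p)[0]
        traj.append((s, a))
        s = min(max(s + a, 0), 4)
    return traj, s

def relabel(traj, final_state):
    # Hindsight relabeling: assign the goal the trajectory actually achieved
    # (a crude stand-in for sampling from a learned variational goal generator).
    return "go right" if final_state >= 2 else "go left"

def discriminator_reward(w_d, s, a, goal):
    # AIRL-style reward: the discriminator's logit on (state, action, goal).
    return dot(w_d, features(s, a, goal))

def train(iters=200, lr=0.1):
    w_d = [0.0, 0.0, 0.0]    # reward (discriminator) weights
    w_pi = [0.0, 0.0, 0.0]   # policy weights
    expert = {"go right": (2, +1), "go left": (2, -1)}  # one expert step per goal
    for _ in range(iters):
        goal = random.choice(GOALS)
        traj, final = rollout(w_pi, goal)
        goal = relabel(traj, final)            # train on an achieved goal
        es, ea = expert[goal]
        x_e = features(es, ea, goal)
        for s, a in traj:
            # Discriminator step: logistic-loss gradient, expert labeled 1,
            # policy samples labeled 0.
            x_p = features(s, a, goal)
            p_e = 1 / (1 + math.exp(-dot(w_d, x_e)))
            p_p = 1 / (1 + math.exp(-dot(w_d, x_p)))
            w_d = [w + lr * ((1 - p_e) * xe - p_p * xp)
                   for w, xe, xp in zip(w_d, x_e, x_p)]
        for s, a in traj:
            # Policy step: vanilla policy gradient on the learned reward.
            r = discriminator_reward(w_d, s, a, goal)
            probs = policy_probs(w_pi, s, goal)
            for i, act in enumerate(ACTIONS):
                grad = (1.0 if act == a else 0.0) - probs[i]
                x = features(s, act, goal)
                w_pi = [w + lr * r * grad * xi for w, xi in zip(w_pi, x)]
    return w_d, w_pi

w_d, w_pi = train()
```

The key design point the abstract highlights appears in `relabel`: every rollout is repurposed as expert-adjacent data for some goal it did satisfy, so the discriminator and policy see a far more diverse goal distribution than the original instructions alone would provide.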