T3GDT: Three-tier tokens to Guide Decision Transformer for offline meta reinforcement learning
2023
Offline meta-reinforcement learning (OMRL) aims to generalize an agent’s knowledge from training tasks with offline data to a new, unseen RL task given only a few demonstration trajectories. This paper proposes T3GDT: Three-tier Tokens to Guide Decision Transformer for OMRL. First, our approach learns a global token from a task’s demonstrations to summarize its transition dynamics and reward pattern. This global token specifies the task identity and is prepended as the first token to prompt the task’s RL roll-out. Second, at each time step t, we learn adaptive tokens retrieved from the most relevant experiences in the demonstrations; these tokens are fused to improve action prediction at time step t. Third, we replace the lookup-table-based time embedding with a Time2Vec embedding that incorporates temporal neighborhood relationships into a better time representation for RL. Empirically, we compare T3GDT with Prompt Decision Transformer variants and MACAW across five RL environments from the MuJoCo control and Meta-World benchmarks.
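The Time2Vec embedding mentioned above follows a published formulation (Kazemi et al., 2019): one linear component plus sinusoidal components, so nearby time steps receive similar representations at multiple frequencies. A minimal NumPy sketch of that formula — the embedding size `k` and the random `omega`/`phi` here are purely illustrative; in the paper's setting these parameters would be learned with the transformer:

```python
import numpy as np

def time2vec(t, omega, phi):
    """Time2Vec embedding of a scalar time step t.

    Component 0 is linear: omega[0] * t + phi[0].
    Components 1..k-1 are periodic: sin(omega[i] * t + phi[i]),
    capturing temporal neighborhood structure at several scales.
    """
    v = omega * t + phi      # affine transform of t, shape (k,)
    v[1:] = np.sin(v[1:])    # periodic activation on all but the first slot
    return v

# Illustration with random (hypothetical) parameters.
rng = np.random.default_rng(0)
k = 8                                  # assumed embedding size
omega, phi = rng.normal(size=k), rng.normal(size=k)
emb = time2vec(5.0, omega, phi)        # shape (k,): 1 linear + 7 periodic features
```

Unlike a lookup table, this embedding is defined for any real-valued time step and encodes how close two time steps are, rather than treating them as unrelated indices.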