T3GDT: Three-tier tokens to Guide Decision Transformer for offline meta reinforcement learning
2023
Offline meta-reinforcement learning (OMRL) aims to generalize an agent’s knowledge from training tasks with offline data to a new, unseen RL task given only a few demonstration trajectories. This paper proposes T3GDT: Three-tier Tokens to Guide Decision Transformer for OMRL. First, our approach learns a global token from a task’s demonstrations to summarize its transition dynamics and reward pattern. This global token specifies the task identity and is prepended as the first token to prompt the task’s RL roll-out. Second, at each time step t, we learn adaptive tokens retrieved from the most relevant experiences in the demonstrations; these tokens are fused to improve action prediction at time step t. Third, we replace the lookup-table-based time embedding with a Time2Vec embedding that incorporates temporal neighborhood relationships into a better time representation for RL. Empirically, we compare T3GDT with Prompt Decision Transformer variants and MACAW across five RL environments from the MuJoCo control and Meta-World benchmarks.
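The Time2Vec embedding mentioned above follows a published formulation (Kazemi et al., 2019): one linear component plus sinusoidal components, so nearby time steps receive similar representations at multiple frequencies. A minimal NumPy sketch of that formula — the embedding size `k` and the random `omega`/`phi` here are purely illustrative; in the paper's setting these parameters would be learned with the transformer:

```python
import numpy as np

def time2vec(t, omega, phi):
    """Time2Vec embedding of a scalar time step t.

    Component 0 is linear: omega[0] * t + phi[0].
    Components 1..k-1 are periodic: sin(omega[i] * t + phi[i]),
    capturing temporal neighborhood structure at several scales.
    """
    v = omega * t + phi      # affine transform of t, shape (k,)
    v[1:] = np.sin(v[1:])    # periodic activation on all but the first slot
    return v

# Illustration with random (hypothetical) parameters.
rng = np.random.default_rng(0)
k = 8                                  # assumed embedding size
omega, phi = rng.normal(size=k), rng.normal(size=k)
emb = time2vec(5.0, omega, phi)        # shape (k,): 1 linear + 7 periodic features
```

Unlike a lookup table, this embedding is defined for any real-valued time step and encodes how close two time steps are, rather than treating them as unrelated indices.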