Contextual deep reinforcement learning with adaptive value-based clustering
2025
Applications of reinforcement learning (RL) in real-world scenarios are often limited by poor generalizability across heterogeneous environments. Contextual RL offers a principled solution to this issue by capturing environmental heterogeneity through observable contextual variables. However, directly applying contextual RL may not achieve optimal results when contexts exhibit high randomness and variance and model complexity is constrained by computational resources. In this paper, we introduce a novel approach that automatically clusters contextual environments and learns a customized policy for each cluster. Our algorithm leverages context embeddings derived from the hidden layers of the value function of a pretrained RL agent, ensuring that environments within each cluster share similar transition kernels and reward functions. This general meta-framework can be applied with any RL algorithm that learns a value function. Empirical results from our simulations demonstrate that the composite policy, formed by aggregating the contextual RL policies from each cluster, significantly outperforms a single baseline policy trained on all contexts.
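The sketch below is a minimal, hypothetical illustration of the clustering step described above, not the paper's implementation: each context is embedded by averaging the hidden-layer activations of a pretrained value network over a batch of probe states, and the embeddings are then grouped with k-means. The names `ValueNet`, `probe_states`, the averaging scheme, and the choice of k-means are all assumptions made for illustration.

```python
# Illustrative sketch (not the authors' implementation): cluster contexts by the
# hidden-layer activations of a pretrained value network, then assign each
# context to a cluster so a separate policy can be trained per cluster.
import numpy as np
import torch
import torch.nn as nn
from sklearn.cluster import KMeans


class ValueNet(nn.Module):
    """Toy value network V(s, c) taking a state and a context as input (assumed architecture)."""

    def __init__(self, state_dim: int, context_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim + context_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
        )
        self.head = nn.Linear(hidden_dim, 1)

    def embed(self, states: torch.Tensor, contexts: torch.Tensor) -> torch.Tensor:
        # Hidden-layer activations used as the context embedding.
        return self.body(torch.cat([states, contexts], dim=-1))

    def forward(self, states: torch.Tensor, contexts: torch.Tensor) -> torch.Tensor:
        return self.head(self.embed(states, contexts))


def cluster_contexts(value_net: ValueNet,
                     contexts: torch.Tensor,
                     probe_states: torch.Tensor,
                     n_clusters: int = 4) -> np.ndarray:
    """Assign each context to a cluster based on its value-function embedding."""
    with torch.no_grad():
        embeddings = []
        for c in contexts:
            # Average the hidden activations over the probe states to get one
            # embedding vector per context (an assumed design choice).
            c_rep = c.expand(probe_states.shape[0], -1)
            embeddings.append(value_net.embed(probe_states, c_rep).mean(dim=0).numpy())
    embeddings = np.stack(embeddings)
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embeddings)


if __name__ == "__main__":
    state_dim, context_dim = 8, 3
    net = ValueNet(state_dim, context_dim)  # in practice, a pretrained agent's value network
    contexts = torch.randn(20, context_dim)
    probe_states = torch.randn(128, state_dim)
    labels = cluster_contexts(net, contexts, probe_states, n_clusters=4)
    print(labels)  # one cluster id per context; a separate policy is then trained per cluster
```

Under this sketch, the composite policy would simply dispatch each new environment to the policy of its assigned cluster at deployment time.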