Bayesian online non-stationary detection for robust reinforcement learning
2024
Reinforcement Learning (RL) has achieved state-of-the-art performance in stationary environments with effective simulators. However, lifelong and open-world RL applications, such as robotics, stock trading, and recommendation systems, change over time in adversarial ways. Non-stationary environments pose a challenge for RL agents because the data distribution continually drifts away from the one seen during training, degrading performance. We propose using a robust Bayesian online detector that tracks agent performance to detect non-stationarities in the environment. Additionally, we propose a new metric, hindsight approximate reward (HAR), that relies solely on state and action information to detect adversarial changes in the environment, making it well suited to real-world settings with missing or delayed feedback. We demonstrate that the proposed Bayesian detector, combined with HAR or expected reward as a metric, detects a range of non-stationary changes in dynamic control tasks more effectively than baseline non-stationarity tests.
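The abstract does not specify the detector's implementation. As an illustration only, the sketch below shows a standard Bayesian online changepoint detector (in the style of Adams & MacKay, 2007) run over a scalar performance stream such as per-step reward; the same loop could track expected reward or a HAR-style signal. The Gaussian reward model, the constant hazard rate, and all hyperparameters are assumptions of this sketch, not the paper's method.

```python
# Minimal Bayesian online changepoint detection sketch (Adams & MacKay style)
# applied to a stream of per-step rewards. All modeling choices below are
# illustrative assumptions, not the detector proposed in the paper.
import numpy as np
from scipy import stats


def bocpd(xs, hazard=1 / 100, mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
    """Return R, where R[r, t] = P(current run has length r at step t)."""
    T = len(xs)
    R = np.zeros((T + 1, T + 1))
    R[0, 0] = 1.0
    # Normal-Gamma posterior parameters, one entry per run-length hypothesis.
    mu = np.array([mu0])
    kappa = np.array([kappa0])
    alpha = np.array([alpha0])
    beta = np.array([beta0])
    for t, x in enumerate(xs):
        # Student-t predictive density of x under each run-length hypothesis.
        pred = stats.t.pdf(
            x, df=2 * alpha, loc=mu,
            scale=np.sqrt(beta * (kappa + 1) / (alpha * kappa)))
        # Either the current run grows by one step, or a changepoint resets it.
        R[1:t + 2, t + 1] = R[:t + 1, t] * pred * (1 - hazard)
        R[0, t + 1] = np.sum(R[:t + 1, t] * pred * hazard)
        R[:, t + 1] /= R[:, t + 1].sum()
        # Conjugate update of the posterior parameters (old values on the right).
        beta = np.append(beta0, beta + kappa * (x - mu) ** 2 / (2 * (kappa + 1)))
        mu = np.append(mu0, (kappa * mu + x) / (kappa + 1))
        kappa = np.append(kappa0, kappa + 1)
        alpha = np.append(alpha0, alpha + 0.5)
    return R


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Reward stream whose mean drops at t = 200, mimicking an environment shift.
    rewards = np.concatenate(
        [rng.normal(1.0, 0.2, 200), rng.normal(0.2, 0.2, 200)])
    R = bocpd(rewards)
    short_run_mass = R[:5].sum(axis=0)  # P(run length < 5) at each step
    detected = np.flatnonzero(short_run_mass[20:] > 0.5) + 20  # skip burn-in
    print("candidate changepoints near:", detected[:5])
```

In this toy run, posterior mass concentrates on short run lengths shortly after the simulated shift at t = 200, which is the signal a detector of this kind would use to flag a non-stationarity.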