CHMARL: A multimodal benchmark for cooperative, heterogeneous multi-agent reinforcement learning
We propose a vision-and-language benchmark for cooperative, heterogeneous multi-agent learning. We introduce a multimodal benchmark dataset with tasks that require collaboration among multiple heterogeneous agents in a rich multiroom home environment. We provide an integrated learning framework, multimodal implementations of state-of-the-art methods, and a consistent evaluation protocol. Our experiments investigate the impact of different modalities on learning performance, and we also introduce a simple message-passing method between agents. The results suggest that multimodality introduces unique challenges for cooperative multi-agent learning and that there is significant room for advancing MARL methods in such settings.
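The message-passing idea mentioned above can be sketched as follows. This is an illustrative toy, not the benchmark's actual method: the agent count, observation and message dimensions, and the randomly initialized linear message encoders are all assumptions made for the example. Each agent encodes its local observation into a fixed-size message, broadcasts it, and receives the average of the other agents' messages.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for the sketch (not from the paper).
N_AGENTS, OBS_DIM, MSG_DIM = 3, 8, 4

# One linear message encoder per agent, randomly initialized for illustration.
W_msg = rng.normal(size=(N_AGENTS, OBS_DIM, MSG_DIM))

def exchange_messages(observations):
    """One round of message passing: every agent broadcasts a message
    and receives the mean of all *other* agents' messages."""
    msgs = np.stack([obs @ W for obs, W in zip(observations, W_msg)])
    inbox = []
    for i in range(N_AGENTS):
        others = np.delete(msgs, i, axis=0)   # exclude the agent's own message
        inbox.append(others.mean(axis=0))     # aggregate by averaging
    return np.stack(inbox)

obs = rng.normal(size=(N_AGENTS, OBS_DIM))
inbox = exchange_messages(obs)
print(inbox.shape)  # one aggregated message vector per agent: (3, 4)
```

In a full pipeline, each agent's policy would condition on both its own (possibly multimodal) observation and its aggregated inbox; the averaging aggregator here is just one simple, permutation-invariant choice.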