ICLR-23

Conservative Bayesian Model-Based Value Expansion for Offline Policy Optimization

A model-based offline RL algorithm that trades off the uncertainty of the learned dynamics model against that of the value function through Bayesian posterior estimation, achieving state-of-the-art performance on a variety of D4RL benchmark tasks.
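The central idea above can be illustrated with a minimal sketch. The sketch weighs value targets that rely more on model rollouts against targets that rely more on the learned value function, then acts conservatively on the result. It assumes, as a simplification not stated in this summary, that each h-step value-expansion target is an independent Gaussian whose mean and variance come from an ensemble of samples. Under a flat prior, the posterior over the true value is then the precision-weighted (inverse-variance) combination of those Gaussians, and a lower confidence bound gives a conservative target. The function names and the conservatism parameter `beta` are hypothetical, chosen for illustration only.

```python
import numpy as np


def gaussian_posterior(means, variances):
    """Combine independent Gaussian estimates of one quantity.

    With a flat prior, the posterior is a product of Gaussians:
    precision = sum of precisions, mean = precision-weighted mean.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    precisions = 1.0 / variances
    post_var = 1.0 / precisions.sum()
    post_mean = post_var * (precisions * means).sum()
    return post_mean, post_var


def conservative_target(means, variances, beta=1.0):
    """Lower confidence bound on the posterior mean (hypothetical knob beta)."""
    mu, var = gaussian_posterior(means, variances)
    return mu - beta * np.sqrt(var)


def combine_horizon_targets(targets, beta=1.0, eps=1e-8):
    """targets: array of shape (n_samples, n_horizons), one column per
    rollout horizon h (assumed here to come from an ensemble; needs n_samples >= 2).
    Horizons whose samples disagree more receive less weight."""
    targets = np.asarray(targets, dtype=float)
    means = targets.mean(axis=0)
    variances = targets.var(axis=0, ddof=1) + eps  # eps avoids division by zero
    return conservative_target(means, variances, beta)


if __name__ == "__main__":
    # Two horizon estimates: a confident one (var 1) and a noisier one (var 4).
    mu, var = gaussian_posterior([10.0, 12.0], [1.0, 4.0])
    print(mu, var)  # the posterior mean leans toward the lower-variance estimate
    print(conservative_target([10.0, 12.0], [1.0, 4.0], beta=1.0))
```

Precision weighting lets whichever source is currently more certain, model rollouts or the value function, dominate the target automatically. The `beta * sqrt(var)` penalty is one simple way to add the conservatism that offline settings usually call for.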

Jihwan Jeong, Xiaoyu Wang, Michael Gimelfarb, Hyunwoo Kim, Baher Abdulhai, Scott Sanner
