Benchmark

Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation

Off-policy evaluation (OPE) aims to estimate the performance of hypothetical policies using data generated by a different policy. …

Yuta Saito, Shunsuke Aihara, Megumi Matsutani, Yusuke Narita

Evaluating the Robustness of Off-Policy Evaluation

Off-policy evaluation (OPE), or offline evaluation in general, evaluates the performance of hypothetical policies leveraging only …

Yuta Saito, Takuma Udagawa, Haruka Kiyohara, Kazuki Mogi, Yusuke Narita, Kei Tateno