Off-Policy Evaluation

Policy-Adaptive Estimator Selection for Off-Policy Evaluation

Off-policy evaluation (OPE) aims to accurately evaluate the performance of counterfactual policies using only offline logged data. …

Takuma Udagawa, Haruka Kiyohara, Yusuke Narita, Yuta Saito, Kei Tateno

Counterfactual Evaluation and Learning for Interactive Systems

Counterfactual estimators enable the use of existing log data to estimate how some new target recommendation policy would have …

Yuta Saito, Thorsten Joachims

Off-Policy Evaluation for Large Action Spaces via Embeddings

Off-policy evaluation (OPE) in contextual bandits has seen rapid adoption in real-world systems, since it enables offline evaluation of …

Yuta Saito, Thorsten Joachims

Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model

In real-world recommender systems and search engines, optimizing ranking decisions to present a ranked list of relevant items is …

Haruka Kiyohara, Yuta Saito, Tatsuya Matsuhiro, Yusuke Narita, Nobuyuki Shimizu, Yasuo Yamamoto

Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation

Off-policy evaluation (OPE) aims to estimate the performance of hypothetical policies using data generated by a different policy. …

Yuta Saito, Shunsuke Aihara, Megumi Matsutani, Yusuke Narita

Counterfactual Learning and Evaluation for Recommender Systems

Counterfactual estimators enable the use of existing log data to estimate how some new target recommendation policy would have …

Yuta Saito, Thorsten Joachims

Evaluating the Robustness of Off-Policy Evaluation

Off-policy evaluation (OPE), or offline evaluation in general, evaluates the performance of hypothetical policies leveraging only …

Yuta Saito, Takuma Udagawa, Haruka Kiyohara, Kazuki Mogi, Yusuke Narita, Kei Tateno

Optimal Off-Policy Evaluation from Multiple Logging Policies

We study off-policy evaluation (OPE) from multiple logging policies, each generating a dataset of fixed size, i.e., stratified …

Nathan Kallus, Yuta Saito, Masatoshi Uehara

Doubly Robust Estimator for Ranking Metrics with Post-Click Conversions

Post-click conversion, a pre-defined action on a web service after a click, is an essential form of feedback, as it directly …

Yuta Saito

Data-Driven Off-Policy Estimator Selection: An Application in User Marketing on An Online Content Delivery Service

Off-policy evaluation (OPE) is the method that attempts to estimate the performance of decision-making policies using historical data …

Yuta Saito, Takuma Udagawa, Kei Tateno