Effective Off-Policy Evaluation and Learning in Contextual Combinatorial Bandits

We explore off-policy evaluation and learning (OPE/L) in contextual combinatorial bandits (CCB), where a policy selects a subset in the …

Tatsuhiro Shimizu, Koichi Tanaka, Ren Kishimoto, Haruka Kiyohara, Masahiro Nomura, Yuta Saito

Hyperparameter Optimization Can Even be Harmful in Off-Policy Learning and How to Deal with It

There has been a growing interest in off-policy evaluation in the literature such as recommender systems and personalized medicine. We …

Yuta Saito, Masahiro Nomura

Long-term Off-Policy Evaluation and Learning

Short- and long-term outcomes of an algorithm often differ, with damaging downstream effects. A known example is a click-bait …

Yuta Saito, Himan Abdollahpouri, Jesse Anderton, Ben Carterette, Mounia Lalmas

Off-Policy Evaluation of Slate Bandit Policies via Optimizing Abstraction

We study off-policy evaluation (OPE) in slate contextual bandits where a policy selects multi-dimensional actions known as slates. This …

Haruka Kiyohara, Masahiro Nomura, Yuta Saito

Scalable and Provably Fair Exposure Control for Large-Scale Recommender Systems

Typical recommendation and ranking methods aim to optimize the satisfaction of users, but they are often oblivious to their impact on …

Riku Togashi, Kenshi Abe, Yuta Saito

Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation

Off-Policy Evaluation (OPE) aims to assess the effectiveness of counterfactual policies using only offline logged data and is often …

Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito

Off-Policy Evaluation of Ranking Policies under Diverse User Behavior

Ranking interfaces are everywhere in online platforms. There is thus an ever growing interest in their Off-Policy Evaluation (OPE), …

Haruka Kiyohara, Tatsuya Matsuhiro, Yusuke Narita, Nobuyuki Shimizu, Yasuo Yamamoto, Yuta Saito

Off-Policy Evaluation for Large Action Spaces via Conjunct Effect Modeling

We study off-policy evaluation (OPE) of contextual bandit policies for large discrete action spaces where conventional …

Yuta Saito, Qingyang Ren, Thorsten Joachims

Policy-Adaptive Estimator Selection for Off-Policy Evaluation

Off-policy evaluation (OPE) aims to accurately evaluate the performance of counterfactual policies using only offline logged data. …

Takuma Udagawa, Haruka Kiyohara, Yusuke Narita, Yuta Saito, Kei Tateno

Fair Ranking as Fair Division: Impact-Based Individual Fairness in Ranking

Rankings have become the primary interface of many two-sided markets. Many have noted that the rankings not only affect the …

Yuta Saito, Thorsten Joachims