Yuta Saito
Off-Policy Evaluation
Effective Off-Policy Evaluation and Learning in Contextual Combinatorial Bandits
We explore off-policy evaluation and learning (OPE/L) in contextual combinatorial bandits (CCB), where a policy selects a subset in the …
Tatsuhiro Shimizu, Koichi Tanaka, Ren Kishimoto, Haruka Kiyohara, Masahiro Nomura, Yuta Saito
Long-term Off-Policy Evaluation and Learning
Short- and long-term outcomes of an algorithm often differ, with damaging downstream effects. A known example is a click-bait …
Yuta Saito, Himan Abdollahpouri, Jesse Anderton, Ben Carterette, Mounia Lalmas
Off-Policy Evaluation of Slate Bandit Policies via Optimizing Abstraction
We study off-policy evaluation (OPE) in slate contextual bandits where a policy selects multi-dimensional actions known as slates. This …
Haruka Kiyohara, Masahiro Nomura, Yuta Saito
Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation
Off-Policy Evaluation (OPE) aims to assess the effectiveness of counterfactual policies using only offline logged data and is often …
Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito
Off-Policy Evaluation of Ranking Policies under Diverse User Behavior
Ranking interfaces are everywhere in online platforms. There is thus an ever growing interest in their Off-Policy Evaluation (OPE), …
Haruka Kiyohara, Tatsuya Matsuhiro, Yusuke Narita, Nobuyuki Shimizu, Yasuo Yamamoto, Yuta Saito
Off-Policy Evaluation for Large Action Spaces via Conjunct Effect Modeling
We study off-policy evaluation (OPE) of contextual bandit policies for large discrete action spaces where conventional …
Yuta Saito, Qingyang Ren, Thorsten Joachims
Policy-Adaptive Estimator Selection for Off-Policy Evaluation
Off-policy evaluation (OPE) aims to accurately evaluate the performance of counterfactual policies using only offline logged data. …
Takuma Udagawa, Haruka Kiyohara, Yusuke Narita, Yuta Saito, Kei Tateno
Counterfactual Evaluation and Learning for Interactive Systems
Counterfactual estimators enable the use of existing log data to estimate how some new target recommendation policy would have …
Yuta Saito, Thorsten Joachims
Off-Policy Evaluation for Large Action Spaces via Embeddings
Off-policy evaluation (OPE) in contextual bandits has seen rapid adoption in real-world systems, since it enables offline evaluation of …
Yuta Saito, Thorsten Joachims
Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model
In real-world recommender systems and search engines, optimizing ranking decisions to present a ranked list of relevant items is …
Haruka Kiyohara, Yuta Saito, Tatsuya Matsuhiro, Yusuke Narita, Nobuyuki Shimizu, Yasuo Yamamoto