齋藤優太
Off-Policy Evaluation
Effective Off-Policy Evaluation and Learning in Contextual Combinatorial Bandits
We explore off-policy evaluation and learning (OPE/L) in contextual combinatorial bandits (CCB), where a policy selects a subset in the …
Tatsuhiro Shimizu, Koichi Tanaka, Ren Kishimoto, Haruka Kiyohara, Masahiro Nomura, Yuta Saito
Cite
Long-term Off-Policy Evaluation and Learning
Short- and long-term outcomes of an algorithm often differ, with damaging downstream effects. A known example is a click-bait …
Yuta Saito, Himan Abdollahpouri, Jesse Anderton, Ben Carterette, Mounia Lalmas
Cite
Code
arXiv
Proceedings
Off-Policy Evaluation of Slate Bandit Policies via Optimizing Abstraction
We study off-policy evaluation (OPE) in slate contextual bandits where a policy selects multi-dimensional actions known as slates. This …
Haruka Kiyohara, Masahiro Nomura, Yuta Saito
Cite
Code
arXiv
Proceedings
Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation
Off-Policy Evaluation (OPE) aims to assess the effectiveness of counterfactual policies using only offline logged data and is often …
Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito
Cite
Code
arXiv
Off-Policy Evaluation of Ranking Policies under Diverse User Behavior
Ranking interfaces are everywhere in online platforms. There is thus an ever growing interest in their Off-Policy Evaluation (OPE), …
Haruka Kiyohara, Tatsuya Matsuhiro, Yusuke Narita, Nobuyuki Shimizu, Yasuo Yamamoto, Yuta Saito
Cite
Code
Poster
Slides
arXiv
Proceedings
Off-Policy Evaluation for Large Action Spaces via Conjunct Effect Modeling
We study off-policy evaluation (OPE) of contextual bandit policies for large discrete action spaces where conventional …
Yuta Saito, Qingyang Ren, Thorsten Joachims
Cite
Code
Poster
Slides
arXiv
Proceedings
Policy-Adaptive Estimator Selection for Off-Policy Evaluation
Off-policy evaluation (OPE) aims to accurately evaluate the performance of counterfactual policies using only offline logged data. …
Takuma Udagawa, Haruka Kiyohara, Yusuke Narita, Yuta Saito, Kei Tateno
Cite
Code
Slides
arXiv
オフ方策評価の基礎と動向 (Fundamentals and Trends in Off-Policy Evaluation)
齋藤優太
Society journal
Counterfactual Evaluation and Learning for Interactive Systems
Counterfactual estimators enable the use of existing log data to estimate how some new target recommendation policy would have …
Yuta Saito, Thorsten Joachims
Cite
Code
Proceedings
Website
Off-Policy Evaluation for Large Action Spaces via Embeddings
Off-policy evaluation (OPE) in contextual bandits has seen rapid adoption in real-world systems, since it enables offline evaluation of …
Yuta Saito, Thorsten Joachims
Cite
Code
Video
Slides
arXiv
Proceedings