Yuta Saito
Off-Policy Evaluation
Off-Policy Evaluation and Learning for the Future under Non-Stationarity
Ranking interfaces are everywhere in online platforms. There is thus an ever growing interest in their Off-Policy Evaluation (OPE), …
Tatsuhiro Shimizu, Kazuki Kawamura, Takanori Muroi, Yusuke Narita, Kei Tateno, Takuma Udagawa, Yuta Saito
OpenReview
A General Framework for Off-Policy Learning with Partially-Observed Reward
Off-policy learning (OPL) in contextual bandits aims to learn a decision-making policy that maximizes the target rewards by using only …
Rikiya Takehi, Masahiro Asami, Kosuke Kawakami, Yuta Saito
OpenReview
Cross-Domain Off-Policy Evaluation and Learning for Contextual Bandits
Off-Policy Evaluation and Learning (OPE/L) in contextual bandits is rapidly gaining popularity in real systems because new policies can …
Yuta Natsubori, Masataka Ushiku, Yuta Saito
OpenReview
POTEC: Off-Policy Contextual Bandits for Large Action Spaces via Policy Decomposition
We study off-policy learning (OPL) of contextual bandit policies in large discrete action spaces where existing methods – most of …
Yuta Saito, Jihan Yao, Thorsten Joachims
OpenReview
Effective Off-Policy Evaluation and Learning in Contextual Combinatorial Bandits
We explore off-policy evaluation and learning (OPE/L) in contextual combinatorial bandits (CCB), where a policy selects a subset in the …
Tatsuhiro Shimizu, Koichi Tanaka, Ren Kishimoto, Haruka Kiyohara, Masahiro Nomura, Yuta Saito
Long-term Off-Policy Evaluation and Learning
Short- and long-term outcomes of an algorithm often differ, with damaging downstream effects. A known example is a click-bait …
Yuta Saito, Himan Abdollahpouri, Jesse Anderton, Ben Carterette, Mounia Lalmas
Code
arXiv
Proceedings
Off-Policy Evaluation of Slate Bandit Policies via Optimizing Abstraction
We study off-policy evaluation (OPE) in slate contextual bandits where a policy selects multi-dimensional actions known as slates. This …
Haruka Kiyohara, Masahiro Nomura, Yuta Saito
Code
arXiv
Proceedings
Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation
Off-Policy Evaluation (OPE) aims to assess the effectiveness of counterfactual policies using only offline logged data and is often …
Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito
Code
arXiv
Off-Policy Evaluation of Ranking Policies under Diverse User Behavior
Ranking interfaces are everywhere in online platforms. There is thus an ever growing interest in their Off-Policy Evaluation (OPE), …
Haruka Kiyohara, Tatsuya Matsuhiro, Yusuke Narita, Nobuyuki Shimizu, Yasuo Yamamoto, Yuta Saito
Code
Poster
Slides
arXiv
Proceedings
Off-Policy Evaluation for Large Action Spaces via Conjunct Effect Modeling
We study off-policy evaluation (OPE) of contextual bandit policies for large discrete action spaces where conventional …
Yuta Saito, Qingyang Ren, Thorsten Joachims
Code
Poster
Slides
arXiv
Proceedings