Article Information

  • Title: A linear response bandit problem
  • Authors: Alexander Goldenshluger; Assaf Zeevi
  • Journal: Stochastic Systems
  • Print ISSN: 1946-5238
  • Year: 2013
  • Volume: 3
  • Issue: 1
  • Pages: 230-261
  • Publisher: Institute for Operations Research and the Management Sciences (INFORMS), Applied Probability Society
  • Abstract: We consider a two-armed bandit problem which involves sequential sampling from two non-homogeneous populations. The response in each is determined by a random covariate vector and a vector of parameters whose values are not known a priori. The goal is to maximize cumulative expected reward. We study this problem in a minimax setting, and develop rate-optimal policies that combine myopic action based on least squares estimates with a suitable "forced sampling" strategy. It is shown that the regret grows logarithmically in the time horizon n and no policy can achieve a slower growth rate over all feasible problem instances. In this setting of linear response bandits, the identity of the sub-optimal action changes with the values of the covariate vector, and the optimal policy is subject to sampling from the inferior population at a rate that grows like √n.
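
The abstract describes policies that pair myopic (greedy) actions based on least-squares estimates with a forced-sampling strategy. The sketch below is a minimal illustration of that general idea only; the forced-sampling schedule, noise model, estimator details, and all parameter values are assumptions made for illustration and are not the paper's rate-optimal construction, for which the article itself should be consulted.

```python
import numpy as np

# Illustrative sketch: a two-armed linear-response bandit policy that plays the
# arm with the larger predicted reward under least-squares estimates, except on
# a small set of pre-scheduled "forced sampling" rounds for each arm.
rng = np.random.default_rng(0)
d, n = 3, 5000                                      # covariate dimension, horizon (assumed values)
beta = [rng.normal(size=d), rng.normal(size=d)]     # unknown arm parameters (simulated)

def forced_times(n_rounds, per_arm=20):
    """Assumed forced-sampling schedule: a few geometrically spaced rounds per arm."""
    ts = np.unique(np.geomspace(1, n_rounds // 2, per_arm).astype(int))
    return {0: set(2 * ts), 1: set(2 * ts + 1)}

forced = forced_times(n)
X = [np.zeros((0, d)) for _ in range(2)]            # covariates observed per arm
Y = [np.zeros(0) for _ in range(2)]                 # rewards observed per arm
regret = 0.0

for t in range(1, n + 1):
    x = rng.normal(size=d)                          # random covariate vector for this round
    if t in forced[0]:                              # forced sampling overrides the greedy choice
        a = 0
    elif t in forced[1]:
        a = 1
    else:
        est = []
        for k in range(2):
            if X[k].shape[0] >= d:                  # enough data for a least-squares fit
                bhat, *_ = np.linalg.lstsq(X[k], Y[k], rcond=None)
            else:
                bhat = np.zeros(d)
            est.append(x @ bhat)
        a = int(np.argmax(est))                     # myopic action under the LS estimates
    r = x @ beta[a] + rng.normal(scale=0.5)         # observed reward (assumed Gaussian noise)
    X[a] = np.vstack([X[a], x])
    Y[a] = np.append(Y[a], r)
    regret += max(x @ beta[0], x @ beta[1]) - x @ beta[a]

print(f"cumulative regret after {n} rounds: {regret:.1f}")
```

Running the sketch on simulated data shows the qualitative behavior discussed in the abstract: the greedy component exploits the least-squares estimates, while the forced-sampling rounds keep both arms' estimates from degenerating when one arm would otherwise rarely be played.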