Our paper "Algorithms with Logarithmic or Sublinear Regret for Constrained Contextual Bandits" has been accepted to Neural Information Processing Systems (NIPS) 2015, one of the top conferences in machine learning. The paper is available at the following links:
http://arxiv.org/abs/1504.06937
http://www.scholat.com/portalPaperInfo.html?paperID=24402&Entry=huasenwu
The contextual bandit problem is an important extension of the classic multi-armed bandit (MAB) problem. Budget and time constraints are important issues in practical systems, yet efficient algorithms for constrained contextual bandits have not been studied. We propose a simple algorithm, UCB-ALP, that achieves logarithmic regret except in the boundary cases. The framework of UCB-ALP is presented as follows. To the best of our knowledge, this is the first work that shows how to achieve logarithmic regret in constrained contextual bandits.
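For readers who want a feel for the idea, here is a rough Python sketch of one way to combine upper-confidence-bound (UCB) estimates with an adaptive linear program, as the name UCB-ALP suggests. It is not taken from this post and is only an illustration under several assumptions: every action costs one unit of budget, the context distribution is known, rewards are Bernoulli, and the function names `alp_probabilities` and `run_ucb_alp` are hypothetical. Please refer to the paper for the actual algorithm and its analysis.

```python
import math
import random


def alp_probabilities(ucb_best, context_probs, avg_budget):
    """Greedy solution of the relaxed LP (assumed unit costs):
    maximize sum_j p_j * pi_j * u_j  s.t.  sum_j p_j * pi_j <= avg_budget.
    Contexts with higher estimated reward are served first."""
    order = sorted(range(len(ucb_best)), key=lambda j: -ucb_best[j])
    probs = [0.0] * len(ucb_best)
    remaining = max(avg_budget, 0.0)
    for j in order:
        if remaining <= 0:
            break
        take = min(1.0, remaining / context_probs[j])
        probs[j] = take
        remaining -= take * context_probs[j]
    return probs


def run_ucb_alp(true_means, context_probs, horizon, budget, seed=0):
    """Simulate the sketch; true_means[j][k] is the (hypothetical) mean
    reward of arm k under context j. Returns (total reward, budget spent)."""
    rng = random.Random(seed)
    n_ctx, n_arms = len(true_means), len(true_means[0])
    counts = [[0] * n_arms for _ in range(n_ctx)]
    sums = [[0.0] * n_arms for _ in range(n_ctx)]
    total_reward, remaining_budget = 0.0, budget

    for t in range(1, horizon + 1):
        if remaining_budget <= 0:
            break
        # UCB index for every (context, arm) pair, capped at 1.
        ucb = [
            [
                1.0 if counts[j][k] == 0 else min(
                    1.0,
                    sums[j][k] / counts[j][k]
                    + math.sqrt(math.log(t) / (2 * counts[j][k])),
                )
                for k in range(n_arms)
            ]
            for j in range(n_ctx)
        ]
        best_arm = [max(range(n_arms), key=ucb[j].__getitem__) for j in range(n_ctx)]
        ucb_best = [ucb[j][best_arm[j]] for j in range(n_ctx)]

        # Adapt to the remaining budget per remaining round.
        rounds_left = horizon - t + 1
        probs = alp_probabilities(ucb_best, context_probs, remaining_budget / rounds_left)

        # Observe a context, then act with the LP probability or skip.
        j = rng.choices(range(n_ctx), weights=context_probs)[0]
        if rng.random() < probs[j]:
            k = best_arm[j]
            reward = 1.0 if rng.random() < true_means[j][k] else 0.0
            counts[j][k] += 1
            sums[j][k] += reward
            total_reward += reward
            remaining_budget -= 1

    return total_reward, budget - remaining_budget
```

For example, `run_ucb_alp([[0.9, 0.1], [0.3, 0.2]], [0.5, 0.5], horizon=200, budget=100)` simulates two contexts and two arms with a budget that covers only half of the rounds, so the sketch has to skip some low-value contexts.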


