The weekly SILO Seminar Series is made possible through the generous support of the 3M Company and its Advanced Technology Group


with additional support from the Analytics Group of the Northwestern Mutual Life Insurance Company

Northwestern Mutual

Towards a better understanding of best arm identification in bounded multi-armed bandits

Ervin Tanczos, Postdoc - UW–Madison

Date and Time: Feb 01, 2017 (12:30 PM)
Location: Orchard room (3280) at the Wisconsin Institute for Discovery Building



We present ongoing work regarding best arm identification in multi-armed bandit problems, when the reward distributions are bounded. Although this is a standard assumption in this context, state of the art methods (such as the lil-UCB algorithm) use sub-Gaussian concentration bounds for the mean rewards.
However, one can take advantage of the boundedness of the rewards to derive stronger concentration inequalities for the empirical means that involve the Kullback-Leibler divergence of certain Bernoulli distributions. Although such methods exist in the literature, they exhibit sub-optimal performance in other parameters of the problem (such as the number of arms). Our goal is to combine ideas of the lil-UCB algorithm and the KL concentration bounds to obtain methods with improved performance.
This is joint work with Robert Nowak and Kevin Jamieson.