(order is the same as in the schedule)
Leon Bottou, Jonas Peters, D. Max Chickering, Patrice Simard.
Title: Causal Reasoning and Learning Systems.
Abstract: Using the ad placement problem as an example, we describe an
intricate link between causation and the interpretation of
experimental data in learning systems. Following this lead gives
(a) a “non-regret” perspective on exploration/exploitation decisions,
(b) means to leverage the fine structure of the causal graph,
(c) interesting differential techniques,
(d) plenty of open questions.
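The counterfactual estimates that drive this framework can be computed from randomized logs by importance weighting. A minimal sketch of that estimator, assuming a logging policy that records the probability of each action it takes (the function and variable names are our own illustration, not the speakers' code):

```python
import numpy as np

def counterfactual_value(rewards, actions, logging_probs, new_policy_prob):
    """Importance-weighted estimate of the reward a candidate ad-placement
    policy *would* have earned on data logged under a randomized policy.

    rewards[t]         -- observed reward at time t (e.g. click revenue)
    actions[t]         -- action taken by the logging policy at time t
    logging_probs[t]   -- probability the logging policy gave that action
    new_policy_prob(a) -- probability the candidate policy gives action a
    """
    w = np.array([new_policy_prob(a) for a in actions]) / np.asarray(logging_probs)
    return float(np.mean(w * np.asarray(rewards)))
```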
Title: On Bayesian Upper Confidence Bounds for Bandit Problems and Optimality of Thompson Sampling
Abstract: We present two algorithms based on Bayesian modelling of the MAB that are proved to be optimal in terms of frequentist regret. The first, Bayes-UCB, uses a Bayesian Upper Confidence Bound based on quantiles of the posterior distribution as an index. The second is Thompson Sampling, a randomized Bayesian algorithm dating back to 1933, whose first finite-time regret analysis was proposed at COLT 2012 and for which we now have an optimal finite-time analysis.
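As a concrete reference, here is a minimal sketch of Thompson Sampling for Bernoulli rewards with Beta(1, 1) priors (a toy implementation of the classical algorithm, not the speakers' code); Bayes-UCB keeps the same posterior updates but plays the arm with the highest posterior quantile instead of a sampled one:

```python
import numpy as np

def thompson_sampling(true_means, horizon, seed=0):
    """Thompson Sampling for a Bernoulli bandit with Beta(1, 1) priors."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    alpha = np.ones(k)  # posterior Beta parameters: successes + 1
    beta = np.ones(k)   # posterior Beta parameters: failures + 1
    total = 0.0
    for _ in range(horizon):
        theta = rng.beta(alpha, beta)   # one draw from each posterior
        arm = int(np.argmax(theta))     # play the sampled best arm
        r = float(rng.random() < true_means[arm])
        alpha[arm] += r
        beta[arm] += 1.0 - r
        total += r
    return total
```

For Bayes-UCB, the posterior draw would be replaced by a quantile index such as scipy.stats.beta.ppf(1 - 1.0/(t + 1), alpha, beta), up to the logarithmic correction used in the paper's analysis.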
Title: Two basic problems in finite stochastic optimization
Abstract: I will present a new theoretical perspective on two basic problems arising in stochastic optimization. The first is arguably the most elementary problem in stochastic optimization: assume that one wants to find the maximum of a function defined on a finite set, and that one is given a budget of n noisy evaluations. What is the best sequential allocation procedure for the evaluations?
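The abstract leaves the answer open; one well-studied benchmark for this fixed-budget setting is the Successive Rejects allocation of Audibert and Bubeck, sketched below against a generic noisy-evaluation oracle (the interface and names are our assumptions):

```python
import numpy as np
from math import ceil

def successive_rejects(evaluate, K, budget):
    """Fixed-budget best-option identification via Successive Rejects.
    `evaluate(i)` returns one noisy evaluation of option i; the budget
    is assumed to comfortably exceed the number of options K."""
    log_bar = 0.5 + sum(1.0 / i for i in range(2, K + 1))
    active = list(range(K))
    sums = np.zeros(K)
    counts = np.zeros(K)
    n_prev = 0
    for k in range(1, K):                       # K - 1 elimination phases
        n_k = ceil((budget - K) / (log_bar * (K + 1 - k)))
        for i in active:                        # top up each surviving option
            for _ in range(n_k - n_prev):
                sums[i] += evaluate(i)
                counts[i] += 1
        # Discard the empirically worst remaining option.
        active.remove(min(active, key=lambda i: sums[i] / counts[i]))
        n_prev = n_k
    return active[0]                            # recommended option
```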
The second problem that I will discuss is inspired by the security analysis of a power system. We formalize this problem as follows: let X be a set, and A a subset of X of “interesting” elements. One can access X only through requests to a finite set of probabilistic experts. More precisely, when one makes a request to the i^th expert, the latter independently draws a point at random from a fixed probability distribution P_i over X. One is interested in rapidly discovering as many elements of A as possible by making sequential requests to the experts.
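The abstract again stops short of the solution; the sketch below shows one natural greedy strategy for this discovery problem, driven by a Good-Turing estimate of how much undiscovered interesting mass each expert still holds. The estimator choice, interface, and names are our assumptions, not the talk's algorithm:

```python
from collections import Counter

def discover(experts, in_A, budget):
    """Sequentially query probabilistic experts to uncover elements of A.
    `experts[i]()` draws one element of X from P_i; `in_A(x)` tests x in A.
    Greedy rule: query the expert whose Good-Turing estimate of still-hidden
    interesting mass is largest."""
    K = len(experts)
    draws = [Counter() for _ in range(K)]   # what each expert has shown us
    found = set()

    def gt_index(i):
        n = sum(draws[i].values())
        if n == 0:
            return float("inf")             # try every expert at least once
        # Fraction of expert i's draws that were interesting elements it has
        # produced exactly once: a Good-Turing missing-mass proxy.
        once = sum(1 for x, c in draws[i].items() if c == 1 and in_A(x))
        return once / n

    for _ in range(budget):
        i = max(range(K), key=gt_index)
        x = experts[i]()
        draws[i][x] += 1
        if in_A(x):
            found.add(x)
    return found
```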
Title: The knowledge gradient for optimal learning
Abstract: The knowledge gradient is a policy for efficiently learning the best of a set of choices by maximizing the marginal value of information, a form of steepest ascent for a belief model. The policy has no tunable parameters and has been adapted to both online (bandit) and offline (ranking and selection) problems. Versions of the knowledge gradient have been derived for both discrete and continuous alternatives, and for a variety of Bayesian belief models, including lookup tables with correlated beliefs, parametric models, and two forms of nonparametric models. We have also derived the knowledge gradient for a robust (min-max) objective. We report on comparisons against popular policies such as Boltzmann exploration, epsilon-greedy, interval estimation, and UCB. We will discuss applications in drug discovery (a problem with 90,000 alternatives) and the tuning of parameters for black-box simulations (vector-valued continuous parameters). Additional information may be found at http://optimallearning.princeton.edu.
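To make the policy concrete, here is a minimal sketch of the knowledge-gradient computation for the simplest belief model mentioned above: independent (uncorrelated) normal beliefs with known Gaussian measurement noise. The code is our illustration of the standard formula, not material from the talk:

```python
import numpy as np
from scipy.stats import norm

def knowledge_gradient(mu, sigma2, noise_var):
    """Knowledge-gradient value of measuring each alternative once, under
    independent Gaussian beliefs N(mu[i], sigma2[i]) and Gaussian
    measurement noise of variance noise_var."""
    mu = np.asarray(mu, float)
    sigma2 = np.asarray(sigma2, float)
    # Std. dev. of the predictive change in the posterior mean.
    sigma_tilde = sigma2 / np.sqrt(sigma2 + noise_var)
    # Gap between each alternative and the best of the *other* alternatives.
    best = mu.max()
    runner_up = np.partition(mu, -2)[-2]
    best_other = np.where(mu == best, runner_up, best)
    zeta = -np.abs(mu - best_other) / sigma_tilde
    # Expected improvement: sigma_tilde * (zeta * Phi(zeta) + phi(zeta)).
    return sigma_tilde * (zeta * norm.cdf(zeta) + norm.pdf(zeta))

# The policy measures the alternative with the largest KG value:
# next_x = int(np.argmax(knowledge_gradient(mu, sigma2, noise_var)))
```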