Odalric-Ambrym Maillard

I am currently a Postdoctoral Researcher at the Technion.
Associated tags : discussing articles

Blogs

Odalric-Ambrym Maillard

odalric-ambrym.maillard.over-blog.com
Articles : 19
Since : 11/10/2009
Category : Tech & Science

Articles to discover

Selecting the State-Representation in Reinforcement Learning

This page is dedicated to starting discussions about the article "Selecting the State-Representation in Reinforcement Learning". Feel free to post any comment, suggestion, question, correction, extension... I will enjoy discussing this with you. Abstract: "The problem of selecting the right state-representation in a reinforcement learning problem is

Online allocation and homogeneous partitioning for piecewise constant mean-approximation.

By Alexandra Carpentier and Odalric-Ambrym Maillard. In Proceedings of the 25th conference on advances in Neural Information Processing Systems, NIPS '12, 2012. Abstract: In the setting of active learning for the multi-armed bandit, where the goal of a learner is to estimate with equal precision the mean of a finite number of arms, recent results s

Linear regression with random projections.

By Odalric-Ambrym Maillard and Rémi Munos. In Journal of Machine Learning Research, 2012, vol. 13, pp. 2735-2772. Abstract: We investigate a method for regression that makes use of a randomly generated subspace G_P (of finite dimension P) of a given large (possibly infinite) dimensional function space F, for example, L_{2}([0,1]^d). G_P is defined as

Robust Risk-averse Multi-armed Bandits

Odalric-Ambrym Maillard. In Algorithmic Learning Theory, 2013. Abstract: We study a variant of the standard stochastic multi-armed bandit problem when one is not interested in the arm with the best mean, but instead in the arm maximizing some coherent risk measure criterion. Further, we are studying the deviations of the regret instead of the less i

Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation.

Olivier Cappé, Aurélien Garivier, Odalric-Ambrym Maillard, Rémi Munos, Gilles Stoltz. In The Annals of Statistics, 2013. Abstract: We consider optimal sequential allocation in the context of the so-called stochastic multi-armed bandit model. We describe a generic index policy, in the sense of Gittins (1979), based on upper confidence bounds of t

Optimal regret bounds for selecting the state representation in reinforcement learning.

Odalric-Ambrym Maillard, Phuong Nguyen, Ronald Ortner, Daniil Ryabko. In Proceedings of the 30th international conference on machine learning, ICML 2013, 2013. Abstract: We consider an agent interacting with an environment in a single stream of actions, observations, and rewards, with no reset. This process is not assumed to be a Markov Decision Pr

Concentration inequalities for sampling without replacement.

Rémi Bardenet, Odalric-Ambrym Maillard. In Bernoulli Journal, 2014. Abstract: Concentration inequalities quantify the deviation of a random variable from a fixed value. In spite of numerous applications, such as opinion surveys or ecological counting procedures, few concentration results are known for the setting of sampling without replacement fr

Hierarchical Optimistic Region Selection driven by Curiosity.

By Odalric-Ambrym Maillard. In Proceedings of the 25th conference on advances in Neural Information Processing Systems, NIPS '12, 2012. Abstract: This paper aims to take a step forwards making the term "intrinsic motivation" from reinforcement learning theoretically well founded, focusing on curiosity-driven learning. To that end, we consider the

Latent Bandits

Odalric-Ambrym Maillard, Shie Mannor. In International Conference on Machine Learning, 2014. Abstract: We consider a multi-armed bandit problem where the reward distributions are indexed by two sets (one for arms, one for type) and can be partitioned into a small number of clusters according to the type. First, we consider the setting where all r

Competing with an Infinite Set of Models in Reinforcement Learning

Phuong Nguyen, Odalric-Ambrym Maillard, Daniil Ryabko, Ronald Ortner. In International Conference on Artificial Intelligence and Statistics, 2013. Abstract: We consider a reinforcement learning setting where the learner also has to deal with the problem of finding a suitable state-representation function from a given set of models. This has to be do