If you want to make good decisions from data, you need good data. Traditional statistics and machine learning have made great progress in learning from fixed datasets, but even an optimal algorithm for learning from a fixed dataset can be arbitrarily bad when the decisions it makes affect the data it receives. I'm trying to design algorithms that learn to take good actions (and so may affect their environment) in a manner that is simultaneously computationally tractable and statistically efficient.
Statistically efficient RL requires "deep exploration". Previous approaches to deep exploration have not been computationally tractable beyond small-scale problems. This dissertation presents an alternative approach through the use of randomized value functions.
Some of the previously published results for posterior sampling without episodic reset are incorrect. This note clarifies some of the issues in this space and presents some conjectures towards future solutions.
A previously published proof of the lower bounds on what is possible for any reinforcement learning algorithm is incorrect. This note clarifies some of the issues and presents some further conjectures on what might be true in this space.
Computational results demonstrate that PSRL dramatically outperforms UCRL2. We provide insight into the extent of this performance boost and the phenomenon that drives it.
Deep exploration and deep reinforcement learning. Takes the insight from efficient exploration via randomized value functions and attains state-of-the-art results on Atari. Includes some sweet vids.
A principled approach to efficient exploration with generalization that can be implemented for deep learning models at scale. Use an augmented bootstrap to approximate the posterior distribution.
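The core bootstrap idea can be sketched in a few lines. This is a minimal illustration, not the paper's exact algorithm: each "head" is trained on its own bootstrap resample of the data, and the spread of the heads' estimates stands in for posterior uncertainty.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed data: noisy rewards from a single action.
data = rng.normal(loc=1.0, scale=0.5, size=50)

def bootstrap_posterior_samples(data, n_heads=100, rng=rng):
    """Approximate a posterior over the mean by fitting each
    'head' to an independent bootstrap resample of the data."""
    n = len(data)
    samples = []
    for _ in range(n_heads):
        resample = rng.choice(data, size=n, replace=True)
        samples.append(resample.mean())  # each head's point estimate
    return np.array(samples)

samples = bootstrap_posterior_samples(data)
# The spread across heads approximates posterior uncertainty, which
# can then drive exploration (e.g. act greedily w.r.t. one sampled head).
```

In the paper this idea is scaled up to deep networks (one head per bootstrap mask), but the resample-and-refit mechanism is the same.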
You can combine efficient exploration and generalization, all without a model-based planning step. Some cool empirical results and also some theory. My favorite paper.
The first general analysis of model-based RL in terms of the dimensionality, rather than the cardinality, of the system. Several new state-of-the-art results, including linear systems.
If the environment is a structured graph (aka factored MDP), then you can exploit that to learn quickly. You can adapt UCB-style approaches for this; posterior sampling gets it for free.
You don't need to use loose UCB-style algorithms to get regret bounds for reinforcement learning. Posterior sampling is more efficient in terms of computation and data, and shares similar guarantees.
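The sample-then-optimize loop behind posterior sampling is easy to illustrate on a toy problem. This is a hypothetical two-armed bandit stand-in (the paper treats full episodic MDPs): sample one model from the posterior, act optimally for that sample, update, repeat. No confidence bounds are ever constructed.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy two-armed Bernoulli bandit (hypothetical setup for illustration).
true_means = np.array([0.3, 0.7])
# Beta(1, 1) priors over each arm's reward probability.
successes = np.ones(2)
failures = np.ones(2)

for t in range(500):
    # 1. Sample one model from the posterior.
    sampled_means = rng.beta(successes, failures)
    # 2. Act optimally for the sampled model.
    arm = int(np.argmax(sampled_means))
    # 3. Update the posterior with the observed reward.
    reward = rng.random() < true_means[arm]
    successes[arm] += reward
    failures[arm] += 1 - reward

# Play concentrates on the better arm as the posterior sharpens.
```

The same three-step loop is what PSRL runs per episode, with a sampled MDP in place of sampled arm means.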
We apply deep learning techniques to energy load forecasting across 20 geographic regions. We found that recurrent network architectures were particularly suited to this task. Class project for CS 229 in my first quarter at Stanford.