About me

I am a research scientist at Google DeepMind working to solve artificial intelligence. My research focus is on decision making under uncertainty, which is often called reinforcement learning. I want to design autonomous agents that teach themselves to do well in any task. If we can do this, then we will be well on our way to general AI.

I completed my Ph.D. at Stanford University, advised by Benjamin Van Roy. My thesis, Deep Exploration via Randomized Value Functions, takes some steps towards a practical reinforcement learning algorithm that combines efficient generalization and exploration.

Before coming to Stanford I studied maths at Oxford University and worked for J.P. Morgan as a credit derivatives strategist. I spent the summer of 2015 working for Google in Mountain View and, after a great internship in 2016, joined DeepMind full time in London. If you want to know more about what I'm thinking, check out my blog.

Research Highlights

Quick links and catchy taglines

Why is Posterior Sampling Better than Optimism for Reinforcement Learning?

EWRL 2016 (full oral)

Computational results demonstrate that PSRL dramatically outperforms UCRL2. We provide insight into the extent of this performance boost and the phenomenon that drives it.

Deep Exploration via Bootstrapped Deep Q-Networks

NIPS 2016

Deep exploration meets deep reinforcement learning. Takes the insight from efficient exploration via randomized value functions and attains state-of-the-art results on Atari. Includes some sweet vids.

Generalization and Exploration via Randomized Value Functions

ICML 2016

You can combine efficient exploration and generalization, all without a model-based planning step. Some cool empirical results and also some theory. My favorite paper.

Model-based Reinforcement Learning and the Eluder Dimension

NIPS 2014

The first general analysis of model-based RL in terms of the dimensionality, rather than the cardinality, of the system. Several new state-of-the-art results, including linear systems.

Near-optimal Reinforcement Learning in Factored MDPs

NIPS 2014 (Spotlight), INFORMS 2014

If the environment is a structured graph (aka a factored MDP), then you can exploit that structure to learn quickly. You can adapt UCB-style approaches for this; posterior sampling gets it for free.

(More) Efficient Reinforcement Learning via Posterior Sampling

NIPS 2013, RLDM 2013

You don't need to use loose UCB-style algorithms to get regret bounds for reinforcement learning. Posterior sampling is more efficient in terms of computation and data, and shares similar guarantees.


Past courses

MS&E 145 - Introduction to Financial Analysis - lead instructor

Finance for engineers, taught to over 75 undergraduate juniors and seniors. Everything from the time value of money to CAPM to elementary option pricing and portfolio optimization. Practical data analysis skills taught through spreadsheets.

MS&E 338 - Reinforcement Learning - assistant instructor

An advanced Ph.D.-level course aimed at graduate students looking to engage in research. I managed the class research projects and gave several lectures throughout the course.

Want more?

Hit me up at any of the links below.

Here is a copy of my CV.