CS 294: Deep Reinforcement Learning, Fall 2015

Instructors: John Schulman, Pieter Abbeel
GSI: Rocky Duan
Lectures: Mondays and Wednesdays. Session 1: 10:00am-11:30am in 405 Soda Hall; Session 2: 2:30pm-4:00pm in 250 Sutardja Dai Hall.
Office Hours: Tuesday 4pm-5pm, Thursday 11am-12pm, both in 511 Soda Hall.
Communication: Piazza will be used for announcements, general questions about the course, clarifications about assignments, student questions to each other, discussions about material, and so on. To sign up, go to the Piazza website and sign up with “UC Berkeley” and “CS294-112” for your school and class.

This course assumes some familiarity with reinforcement learning, numerical optimization, and machine learning. Students who are not familiar with the concepts below are encouraged to brush up using the references provided below this list. We will review this material in class, but the review will be brief.

  • Reinforcement learning and MDPs
    • Definition of MDPs
    • Exact algorithms: policy and value iteration
    • Search algorithms
  • Numerical Optimization
    • gradient descent, stochastic gradient descent
    • backpropagation algorithm
  • Machine Learning
    • Classification and regression problems: what loss functions are used, how to fit linear and nonlinear models
    • Training/test error, overfitting.
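As a refresher on the exact algorithms listed above, here is a small value-iteration sketch in NumPy. The two-state, two-action MDP below is entirely made up (the transition probabilities and rewards are hypothetical, chosen only to make the example runnable):

```python
import numpy as np

# Toy MDP (hypothetical numbers, for illustration only).
# P[a][s][s'] = transition probability, R[s][a] = expected reward.
n_states, n_actions = 2, 2
P = np.array([[[0.9, 0.1],    # action 0 taken from state 0
               [0.1, 0.9]],   # action 0 taken from state 1
              [[0.5, 0.5],    # action 1 taken from state 0
               [0.3, 0.7]]])  # action 1 taken from state 1
R = np.array([[1.0, 0.0],     # rewards for (state 0, action 0/1)
              [0.0, 2.0]])    # rewards for (state 1, action 0/1)
gamma = 0.9  # discount factor

# Value iteration: repeatedly apply the Bellman optimality backup
#   V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
V = np.zeros(n_states)
for _ in range(1000):
    Q = R + gamma * np.einsum('ast,t->sa', P, V)  # Q[s,a]
    V_new = Q.max(axis=1)
    if np.abs(V_new - V).max() < 1e-8:
        break
    V = V_new

policy = Q.argmax(axis=1)  # greedy policy w.r.t. the converged values
```

Policy iteration alternates the same backup's policy-evaluation and greedy-improvement steps instead of taking the max at every sweep; both converge to the same optimal values on a finite MDP.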

For introductory material on RL and MDPs, see

For introductory material on machine learning and neural networks, see


The assignments will be provided as Jupyter (formerly called IPython) notebooks (docs) and will use NumPy (docs) with Python 2.7. You may find the following tutorial helpful (from Stanford CS231): Python/Numpy. Here’s our installation guide for the homework.

There will be four problem sets (the first divided into two parts) and a final assignment: a 1-page research proposal describing a research idea or an application of deep RL.

1: Markov Decision Processes

Homework 1 is released: Download it here. After installing dependencies, unzip the file, navigate into the hw1 directory, and type ipython notebook. It will be due Monday September 7th at 11:59PM. The URL and credentials for uploading your homework will be provided on Piazza.

2: Policy Gradient Methods

Homework 2 is released: Download it here. It will be due Sunday October 25th at 11:59pm. The URL and credentials for uploading your homework will be provided on Piazza. Problems 4 and 5 optionally use CGT, if you choose to use the reference implementations. The Atari wrapper doesn’t work on Windows, but all of the code should work on Mac and Linux.

3: Approximate Dynamic Programming Methods

4: Search + Supervised Learning

5: Research Proposal


Each of the four problem sets will be worth 20% of your grade, and the final proposal will be worth the remaining 20%.


Below is a tentative outline of the course. Slides, videos, and references will be posted as the course proceeds, and dates may change.

Course Introduction and Overview

  • Date: 8/26
  • Topics:
    • What is deep reinforcement learning?
    • Current applications of RL
    • Frontiers: where might deep RL be applied?
  • Slides
  • References and further reading
See the Powell textbook for more information on applications in operations research.
See Stephane Ross’s thesis (Introduction) for more on structured prediction as reinforcement learning and on how RL differs from supervised learning.

Markov Decision Processes

Review of Backpropagation and Numerical Optimization

  • Date: 9/2
  • For a very thorough analysis of reverse mode automatic differentiation, see Griewank and Walther’s textbook Evaluating Derivatives. Chances are, you’re computing derivatives all day, so it pays off to go into some depth on this topic!
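To make reverse-mode differentiation concrete, here is a minimal scalar sketch (a toy illustration only, not the notation of Griewank and Walther, and not how any real framework is implemented):

```python
class Var:
    """A scalar node in a computation graph, recording local derivatives."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # list of (parent_node, local_gradient)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

def backward(node, upstream=1.0):
    """Push d(output)/d(node) back along every path in the graph.

    This path-wise recursion is the simplest correct version; real
    implementations instead sweep once in reverse topological order.
    """
    node.grad += upstream
    for parent, local in node.parents:
        backward(parent, local * upstream)

x = Var(3.0)
y = Var(4.0)
z = x * y + x * x   # z = xy + x^2
backward(z)
print(x.grad)  # dz/dx = y + 2x = 10.0
print(y.grad)  # dz/dy = x = 3.0
```

The key point the textbook develops in depth: one reverse sweep yields the gradient with respect to every input at a cost proportional to a single evaluation of the function, which is exactly why backpropagation scales to networks with millions of parameters.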

Policy Gradient Methods

Approximate Dynamic Programming Methods

Search + Supervised Learning


Lecture Videos

We did not record lecture videos for the course, but I (John) gave a lecture series at MLSS, and videos are available:

  • Lecture 1: intro, derivative free optimization
  • Lecture 2: score function gradient estimation and policy gradients
  • Lecture 3: actor critic methods
  • Lecture 4: trust region and natural gradient methods, open problems
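The score-function estimator from Lecture 2 can be sketched in a few lines. The example below is a toy (not course code): it estimates the derivative of E[x^2] for x ~ N(theta, 1), whose true value is 2*theta, by weighting the "reward" x^2 with the score d/dtheta log p(x; theta) = x - theta:

```python
import numpy as np

rng = np.random.RandomState(0)
theta = 1.5
n = 200000

x = rng.randn(n) + theta   # samples from N(theta, 1)
f = x ** 2                 # the quantity whose expectation we differentiate
score = x - theta          # d/dtheta log N(x; theta, 1)
grad_est = np.mean(f * score)
print(grad_est)            # should be close to 2 * theta = 3.0
```

This is the same trick that underlies REINFORCE-style policy gradients: it needs only samples and log-probability gradients, never the derivative of f itself, at the cost of high variance (hence the baselines and actor-critic methods of Lecture 3).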




Send feedback to the instructor. Feel free to remain anonymous.