CSE4/510: Reinforcement Learning

Summer 2019, Lecture: Tue/Thu 3:00-6:15 pm, Knox 104

Description

This course is intended for students interested in artificial intelligence. Reinforcement learning is an area of machine learning in which an agent learns how to behave in an environment by performing actions and assessing the results. Reinforcement learning is how Google DeepMind created the AlphaGo system that beat a high-ranking Go player, and how AlphaStar became the first artificial intelligence to defeat a top professional player in StarCraft II. We will study the fundamentals and practical applications of reinforcement learning and cover the latest techniques used to create agents that can solve a variety of complex tasks, with applications ranging from gaming to finance to robotics.
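To make the "perform actions and assess the results" loop concrete, here is a minimal sketch of tabular Q-learning on a toy problem. The five-cell corridor environment, the `step` and `train` helpers, and all hyperparameter values are assumptions made up for illustration; they are not taken from the course projects.

```python
import random

# Hypothetical toy environment (not from the course materials): a corridor of
# 5 cells. The agent starts in cell 0 and receives reward 1 on reaching cell 4.
N_STATES = 5
ACTIONS = (-1, +1)  # move left, move right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon (or on ties); otherwise act greedily.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Assess the result: move the estimate toward
            # reward + discounted value of the best next action.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# Greedy action in each non-terminal cell after training.
greedy_policy = [ACTIONS[0 if row[0] > row[1] else 1] for row in q_table[:-1]]
print(greedy_policy)  # prints [1, 1, 1, 1]: always move right
```

The agent is never told that moving right is correct; it only sees the reward at the end of the corridor and propagates that information backward through its value estimates.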

Syllabus

Date: May 28
  • Lecture topics: Introduction to Reinforcement Learning; Course Logistics (slides); Defining RL and Markov Decision Process; Modeling Choices
  • Reading: None
  • Quiz: Quiz 0 (UBlearns > Assignments)
  • Project: None

Date: May 30
  • Lecture topics: Policies, Value Functions & Bellman Equations; Python/Google Colab overview + Gym environments basics by Nathan Margaglio; [Recitation] Return & Reward exploration
  • Reading: SB (Sutton and Barto) Ch. 3; Python Tutorial [from Stanford]
  • Quiz: Quiz 0, Due Sunday @11:59pm
  • Project: Project 1 Released (UBlearns > Assignments)

Date: June 4
  • Lecture topics: Dynamic Programming & Monte Carlo; Temporal-Difference learning methods (Q-Learning); Q-Learning Demo by Anurag Anil Saykar; [Recitation] Policy Iteration
  • Reading: SB Ch. 4, 5.1-5.4, 6.1-6.5
  • Quiz: Quiz 1 Released
  • Project: Project 1

Date: June 6
  • Lecture topics: Learning and Planning with Tabular Methods (Model Based); Temporal-Difference learning methods (TD, SARSA, Q-Learning); [Recitation] Q-learning Step-by-Step
  • Reading: SB Ch. 6.1-6.5
  • Quiz: Quiz 1, Due Sunday @11:59pm
  • Project: Project 1, Due Sunday @11:59pm

Date: June 11
  • Lecture topics: Summary of Tabular Solution Methods; RL with function approximation; Deep Q-networks (DQN)
  • Reading: SB Ch. 9.1-9.4; Human-level control through deep reinforcement learning
  • Quiz: Quiz 2
  • Project: Project 2 Release

Date: June 13
  • Lecture topics: Imitation Learning: Behavior Cloning; Imitation Learning: Inverse Reinforcement Learning; DQN: Recap; Deep Learning Demo by Nathan Margaglio; [Recitation] Q-learning and Policy Iterations
  • Reading: Dave-2 Presentation (NVIDIA self-driving car, 2016), Watch on YouTube; Dave-2 Demo (NVIDIA self-driving car, 2016), Watch on YouTube
  • Quiz: Quiz 2, Due Sunday @11:59pm
  • Project: Project 2

Date: June 18
  • Lecture topics: Policy Gradient; REINFORCE and Actor-Critic; DQN Demo by Anurag Anil Saykar
  • Reading: SB Ch. 13
  • Quiz: Quiz 3 Release
  • Project: Project 2, Due Wednesday @11:59pm

Date: June 20
  • Lecture topics: Policy Gradient Methods; TRPO, PPO
  • Reading: SB Ch. 13; Trust Region Policy Optimization by John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel (Read); Proximal Policy Optimization Algorithms by John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov (Read)
  • Quiz: Quiz 3, Due Sunday @11:59pm
  • Project: Project 3 Release

Date: June 25
  • Lecture topics: Safety in AI; Course Material Review
  • Reading: None
  • Quiz: Quiz 4 Release
  • Project: Project 3

Date: June 27
  • Lecture topics: RL Challenge Final Round (Presentations); Student Project Presentations; Q&A before Final
  • Reading: None
  • Quiz: Quiz 4, Due Sunday @11:59pm
  • Project: Project 3

Date: July 2
  • Lecture topics: Final
  • Reading: None
  • Quiz: None
  • Project: Project 3, Due Wednesday @11:59pm

Logistics

  • Instructor: Alina Vereshchaka
  • Session: May 28 - Jul 05
  • Lectures: Tue/Thu 3:00 - 6:15pm, Knox 104
  • Recitations: Tue/Thu 6:15 - 7:15pm, Knox 104
  • Office hours: Tue/Thu 1:30 - 2:30pm, Mon/Wed 12:30 - 1:30pm
How to contact me: Please use Piazza for all questions related to lectures, quizzes, and assignments. For any personal queries, email avereshc[at]buffalo.edu

Calendar

Add our schedule to your calendar here.

Reference Materials

There is no official textbook for the class, but a number of the supporting readings will come from:

Additional references that can be useful:

Useful RL Materials

Evaluation

Office hours and recitations

Projects

The course consists of three projects. Projects will be done individually.

Late Day Policy

Weekly Quizzes - How does it work?

FAQ

Will we have a class on July 4?

No. Our last day of class is July 2, and the final examination is scheduled for that day.

What do I need to do before class starts?

Where is the course syllabus?

The syllabus is here.

What programming language will be used?

We will be using Python as the programming language for the projects.

Attendance

Attendance is not required but is encouraged. We may sometimes do in-class exercises or discussions related to quizzes or projects, and these are harder to do, and to benefit from, by yourself.

I am highly interested in the course but cannot register. Can I attend?

Yes, you are welcome to audit the course.