Description
This course is intended for students interested in artificial intelligence. Reinforcement learning is an area of machine learning where an agent learns how to behave in an environment by performing actions and assessing the results. Reinforcement learning is how Google DeepMind created the AlphaGo system that beat a high-ranking Go player, and how AlphaStar became the first artificial intelligence to defeat a top professional player in StarCraft II. We will study the fundamentals and practical applications of reinforcement learning and cover the latest techniques used to create agents that can solve a variety of complex tasks, with applications ranging from gaming to finance to robotics.
Syllabus
Date | Lecture Topic | Reading | Quiz | Project |
---|---|---|---|---|
May 28 | Introduction to Reinforcement Learning<br>Course Logistics (slides)<br>Defining RL and Markov Decision Processes<br>Modeling Choices | None | Quiz 0 (UBlearns > Assignments) | None |
May 30 | Policies, Value Functions & Bellman Equations<br>Python/Google Colab overview + Gym environment basics by Nathan Margaglio<br>[Recitation] Return & Reward exploration | SB (Sutton and Barto) Ch. 3<br>Python Tutorial [from Stanford] | Quiz 0 Due Sunday @11:59pm | Project 1 Released (UBlearns > Assignments) |
June 4 | Dynamic Programming & Monte Carlo<br>Temporal-Difference learning methods (Q-Learning)<br>Q-Learning Demo by Anurag Anil Saykar<br>[Recitation] Policy Iteration | SB Ch. 4, 5.1-5.4, 6.1-6.5 | Quiz 1 Released | Project 1 |
June 6 | Learning and Planning with Tabular Methods (Model-Based)<br>Temporal-Difference learning methods (TD, SARSA, Q-Learning)<br>[Recitation] Q-learning Step-by-Step (see the sketch after the table) | SB Ch. 6.1-6.5 | Quiz 1 Due Sunday @11:59pm | Project 1 Due Sunday @11:59pm |
June 11 | Summary of Tabular Solution Methods<br>RL with function approximation<br>Deep Q-networks (DQN) | SB Ch. 9.1-9.4<br>Human-level control through deep reinforcement learning | Quiz 2 | Project 2 Released |
June 13 | Imitation Learning: Behavior Cloning<br>Imitation Learning: Inverse Reinforcement Learning<br>DQN: Recap<br>Deep Learning Demo by Nathan Margaglio<br>[Recitation] Q-learning and Policy Iteration | Dave-2 Presentation (NVIDIA self-driving car, 2016), watch on YouTube<br>Dave-2 Demo (NVIDIA self-driving car, 2016), watch on YouTube | Quiz 2 Due Sunday @11:59pm | Project 2 |
June 18 | Policy Gradient<br>REINFORCE and Actor-Critic<br>DQN Demo by Anurag Anil Saykar | SB Ch. 13 | Quiz 3 Released | Project 2 Due Wednesday @11:59pm |
June 20 | Policy Gradient Methods<br>TRPO, PPO | SB Ch. 13<br>Trust Region Policy Optimization by John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel (read)<br>Proximal Policy Optimization Algorithms by John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov (read) | Quiz 3 Due Sunday @11:59pm | Project 3 Released |
June 25 | Safety in AI<br>Course Material Review | None | Quiz 4 Released | Project 3 |
June 27 | RL Challenge Final Round (Presentations)<br>Student Project Presentations<br>Q&A before the Final | None | Quiz 4 Due Sunday @11:59pm | Project 3 |
July 2 | Final | None | None | Project 3 Due Wednesday @11:59pm |
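As a rough preview of the tabular TD material covered in the June 4-6 sessions (not part of the graded work), the sketch below runs the Q-learning update Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)] on a small hand-coded chain MDP. The environment, hyperparameters, and variable names here are illustrative assumptions only; the lectures and recitations use their own environments.

```python
# Minimal tabular Q-learning sketch on a 5-state chain MDP (illustrative only).
import random

N_STATES = 5          # states 0..4; state 4 is terminal with reward +1
ACTIONS = [-1, +1]    # move left or right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """One transition of the chain: reward 1 only when reaching the last state."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

for episode in range(500):
    state, done = 0, False
    while not done:
        # epsilon-greedy action selection
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward, done = step(state, action)
        # Q-learning (off-policy TD) update
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state

# Print the greedy policy learned for each state
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)})
```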
Logistics
- Instructor: Alina Vereshchaka
- Session: May 28 - Jul 05
- Lectures: Tue/Thu 3:00 - 6:15pm, Knox 104
- Recitations: Tue/Thu 6:15 - 7:15pm, Knox 104
- Office hours: Tue/Thu 1:30 - 2:30pm, Mon/Wed 12:30 - 1:30pm
Calendar
Add our schedule to your calendar here.
Reference Materials
There is no official textbook for the class, but a number of the supporting readings will come from:
- Richard S. Sutton and Andrew G. Barto, "Reinforcement Learning: An Introduction", Second Edition, MIT Press, 2019 - a classic book that covers all the basics
- Lecture slides and other relevant papers will be added
- Li, Yuxi. "Deep reinforcement learning." arXiv preprint arXiv:1810.06339 (2018). - an overview of the latest algorithms and applications in reinforcement learning
- David Silver's course on Reinforcement Learning
Useful RL Materials
- MDP Cheatsheet Reference (2 pages) by John Schulman (pdf)
- Neural Network tutorial Article + Sample code
Evaluation
- 50% - Projects (3 projects: 15 + 15 + 20)
- 20% - Short weekly quizzes
- 30% - Final Exam
Office hours and recitations
- Office hours and recitations start from Thursday, May 30
- Office hours are held in Davis 310
- Recitations are held in Knox 104
- Office Hours can be held in person or online. For online office hours, you will need to create a Google Hangout event and invite me (avereshc[at]buffalo.edu)
Projects
The course consists of three projects. Projects will be done individually.
- Project 1 - Building an RL environment (see the sketch after this list)
- Project 2 - DQN
- Project 3 - Policy gradient
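As a rough starting point for Project 1 (the actual specification is released on UBlearns), the sketch below outlines a custom environment following the Gym interface used in the recitations. The `GridWorldEnv` class, its 4x4 grid, and the reward scheme are placeholder assumptions, not the project's requirements, and it assumes the classic (pre-0.26) gym API in which `reset()` returns an observation and `step()` returns `(observation, reward, done, info)`.

```python
# Placeholder skeleton for a custom Gym environment (illustrative, not the Project 1 spec).
import gym
from gym import spaces
import numpy as np

class GridWorldEnv(gym.Env):
    """Toy 4x4 grid: start at (0, 0), reach (3, 3) for a reward of +1."""

    def __init__(self):
        self.size = 4
        self.action_space = spaces.Discrete(4)          # up, down, left, right
        self.observation_space = spaces.MultiDiscrete([self.size, self.size])
        self.pos = np.array([0, 0])

    def reset(self):
        self.pos = np.array([0, 0])
        return self.pos.copy()

    def step(self, action):
        moves = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}
        self.pos = np.clip(self.pos + moves[int(action)], 0, self.size - 1)
        done = bool((self.pos == [self.size - 1, self.size - 1]).all())
        reward = 1.0 if done else 0.0
        return self.pos.copy(), reward, done, {}

    def render(self, mode="human"):
        print(f"Agent at {tuple(self.pos)}")

if __name__ == "__main__":
    # Quick sanity check: roll out a random policy until the episode ends.
    env = GridWorldEnv()
    obs, done = env.reset(), False
    while not done:
        obs, reward, done, info = env.step(env.action_space.sample())
        env.render()
```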
Late Day Policy
- You can use up to 3 late days
- A late day extends the deadline by 24 hours
- If you submit more than 3 days after the deadline, a penalty of 25% per day will be applied to any work submitted after that time.
Weekly Quizzes - How do they work?
- Released every Monday 9am, due by Sunday 11:59pm
- Can be found at UBlearns > Assignments
- Each quiz will contain 4-5 problems on topics covered that week
- At the end of a submission, the system will give you your final score
- 5 quizzes in total; only the 4 with the highest scores will be counted
- Three attempts allowed, only the highest score will be kept
FAQ
Will we have a class on July 4?
No. Our last day of class is July 2, and the Final examination is scheduled for that day.
What do I need to do before class starts?
- Sign-up for Piazza (if you do not have an account already) and enroll into the CSE 4/510 Introduction to Reinforcement Learning class.
- Confirm that the class shows up in your UBLearns account.
Where is the course syllabus?
The syllabus is here.
What programming language will be used?
We will be using Python as the programming language for the projects.
Attendance
Attendance is not required but is encouraged. We may sometimes do in-class exercises or discussions related to quizzes or projects, and these are harder to complete and benefit from on your own.
I am highly interested in the course, but I cannot register, can I attend?
Yes, you are welcome to audit the course.