Description
This course is intended for students interested in artificial intelligence. Reinforcement learning is an area of machine learning where an agent learns how to behave in an environment by performing actions and assessing the results. Reinforcement learning is how Google DeepMind created the AlphaGo system that beat a high-ranking Go player, and how AlphaStar became the first artificial intelligence to defeat a top professional player in StarCraft II. We will study the fundamentals and practical applications of reinforcement learning and cover the latest techniques used to create agents that can solve a variety of complex tasks, with applications ranging from gaming to finance to robotics.
Syllabus
Date | Lecture Topic | Reading | Quiz | Project |
---|---|---|---|---|
May 28 | Introduction to Reinforcement Learning<br>Course Logistics (slides)<br>Defining RL and Markov Decision Processes<br>Modeling Choices | None | Quiz 0 (UBlearns > Assignments) | None |
May 30 | Policies, Value Functions & Bellman Equations<br>Python/Google Colab overview + Gym environment basics by Nathan Margaglio<br>[Recitation] Return & Reward exploration | SB (Sutton and Barto) Ch. 3<br>Python Tutorial [from Stanford] | Quiz 0 Due Sunday @11:59pm | Project 1 Released (UBlearns > Assignments) |
June 4 | Dynamic Programming & Monte Carlo<br>Temporal-Difference learning methods (Q-Learning)<br>Q-Learning Demo by Anurag Anil Saykar<br>[Recitation] Policy Iteration | SB Ch. 4, 5.1-5.4, 6.1-6.5 | Quiz 1 Released | Project 1 |
June 6 | Learning and Planning with Tabular Methods (Model-Based)<br>Temporal-Difference learning methods (TD, SARSA, Q-Learning)<br>[Recitation] Q-learning Step-by-Step (see the sketch after the table) | SB Ch. 6.1-6.5 | Quiz 1 Due Sunday @11:59pm | Project 1 Due Sunday @11:59pm |
June 11 | Summary of Tabular Solution Methods<br>RL with function approximation<br>Deep Q-networks (DQN) | SB Ch. 9.1-9.4<br>Human-level control through deep reinforcement learning | Quiz 2 | Project 2 Released |
June 13 | Imitation Learning: Behavior Cloning<br>Imitation Learning: Inverse Reinforcement Learning<br>DQN: Recap<br>Deep Learning Demo by Nathan Margaglio<br>[Recitation] Q-learning and Policy Iteration | Dave-2 Presentation (NVIDIA self-driving car, 2016), watch on YouTube<br>Dave-2 Demo (NVIDIA self-driving car, 2016), watch on YouTube | Quiz 2 Due Sunday @11:59pm | Project 2 |
June 18 | Policy Gradient<br>REINFORCE and Actor-Critic<br>DQN Demo by Anurag Anil Saykar | SB Ch. 13 | Quiz 3 Released | Project 2 Due Wednesday @11:59pm |
June 20 | Policy Gradient Methods<br>TRPO, PPO | SB Ch. 13<br>Trust Region Policy Optimization by John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel (read)<br>Proximal Policy Optimization Algorithms by John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov (read) | Quiz 3 Due Sunday @11:59pm | Project 3 Released |
June 25 | Safety in AI<br>Course Material Review | None | Quiz 4 Released | Project 3 |
June 27 | RL Challenge Final Round (Presentations)<br>Student Project Presentations<br>Q&A before the Final | None | Quiz 4 Due Sunday @11:59pm | Project 3 |
July 2 | Final | None | None | Project 3 Due Wednesday @11:59pm |
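As a rough preview of the tabular TD material covered in the June 4-6 sessions (not part of the graded work), the sketch below runs the Q-learning update Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)] on a small hand-coded chain MDP. The environment, hyperparameters, and variable names here are illustrative assumptions only; the lectures and recitations use their own environments.

```python
# Minimal tabular Q-learning sketch on a 5-state chain MDP (illustrative only).
import random

N_STATES = 5          # states 0..4; state 4 is terminal with reward +1
ACTIONS = [-1, +1]    # move left or right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """One transition of the chain: reward 1 only when reaching the last state."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

for episode in range(500):
    state, done = 0, False
    while not done:
        # epsilon-greedy action selection
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward, done = step(state, action)
        # Q-learning (off-policy TD) update
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state

# Print the greedy policy learned for each state
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)})
```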
Logistics
- Instructor: Alina Vereshchaka
- Session: May 28 - Jul 05
- Lectures: Tue/Thu 3:00 - 6:15pm, Knox 104
- Recitations: Tue/Thu 6:15 - 7:15pm, Knox 104
- Office hours: Tue/Thu 1:30 - 2:30pm, Mon/Wed 12:30 - 1:30pm
Calendar
Add our schedule to your calendar here.
Reference Materials
There is no official textbook for the class, but a number of the supporting readings will come from:
- Richard S. Sutton and Andrew G. Barto, "Reinforcement Learning: An Introduction", Second Edition, MIT Press, 2019 - a classic book that covers all the basics
- Lecture slides and other relevant papers will be added
- Li, Yuxi. "Deep reinforcement learning." arXiv preprint arXiv:1810.06339 (2018). - an overview of the latest algorithms and applications in reinforcement learning
- David Silver's course on Reinforcement Learning
Useful RL Materials
- MDP Cheatsheet Reference (2 pages) by John Schulman (pdf)
- Neural Network tutorial Article + Sample code
Evaluation
- 50% - Projects (3 projects: 15 + 15 + 20)
- 20% - Short weekly quizzes
- 30% - Final Exam
Office hours and recitations
- Office hours and recitations start from Thursday, May 30
- Office hours are held in Davis 310
- Recitations are held in Knox 104
- Office Hours can be held in person or online. For online office hours, you will need to create a Google Hangout event and invite me (avereshc[at]buffalo.edu)
Projects
The course consists of three projects. Projects will be done individually.
- Project 1 - Building an RL environment (see the sketch after this list)
- Project 2 - DQN
- Project 3 - Policy gradient
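As a rough starting point for Project 1 (the actual specification is released on UBlearns), the sketch below outlines a custom environment following the Gym interface used in the recitations. The `GridWorldEnv` class, its 4x4 grid, and the reward scheme are placeholder assumptions, not the project's requirements, and it assumes the classic (pre-0.26) gym API in which `reset()` returns an observation and `step()` returns `(observation, reward, done, info)`.

```python
# Placeholder skeleton for a custom Gym environment (illustrative, not the Project 1 spec).
import gym
from gym import spaces
import numpy as np

class GridWorldEnv(gym.Env):
    """Toy 4x4 grid: start at (0, 0), reach (3, 3) for a reward of +1."""

    def __init__(self):
        self.size = 4
        self.action_space = spaces.Discrete(4)          # up, down, left, right
        self.observation_space = spaces.MultiDiscrete([self.size, self.size])
        self.pos = np.array([0, 0])

    def reset(self):
        self.pos = np.array([0, 0])
        return self.pos.copy()

    def step(self, action):
        moves = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}
        self.pos = np.clip(self.pos + moves[int(action)], 0, self.size - 1)
        done = bool((self.pos == [self.size - 1, self.size - 1]).all())
        reward = 1.0 if done else 0.0
        return self.pos.copy(), reward, done, {}

    def render(self, mode="human"):
        print(f"Agent at {tuple(self.pos)}")

if __name__ == "__main__":
    # Quick sanity check: roll out a random policy until the episode ends.
    env = GridWorldEnv()
    obs, done = env.reset(), False
    while not done:
        obs, reward, done, info = env.step(env.action_space.sample())
        env.render()
```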
Late Day Policy
- You can use up to 3 late days
- A late day extends the deadline by 24 hours
- If you submit more than 3 days after the deadline, a penalty of 25% per day will be applied to any work submitted after that time.
Weekly Quizzes - How do they work?
- Released every Monday 9am, due by Sunday 11:59pm
- Can be found at UBlearns > Assignments
- Each quiz will contain 4-5 problems on topics covered that week
- At the end of a submission, the system will give you your final score
- 5 quizzes in total; only the 4 with the highest scores will be counted
- Three attempts allowed, only the highest score will be kept
FAQ
Will we have a class on July 4?
No. Our last day of class is July 2, and the Final examination is scheduled for that day.
What do I need to do before class starts?
- Sign-up for Piazza (if you do not have an account already) and enroll into the CSE 4/510 Introduction to Reinforcement Learning class.
- Confirm that the class shows up in your UBLearns account.
Where is the course syllabus?
The syllabus is here.
What programming language will be used?
We will be using Python as the programming language for the projects.
Attendance
Attendance is not required but is encouraged. We may sometimes do in-class exercises or discussions related to quizzes or projects, and these are harder to complete and benefit from on your own.
I am highly interested in the course, but I cannot register, can I attend?
Yes, you are welcome to audit the course.