Course News
- Dec 23: Happy Holidays!
- Dec 23: Ethics in AI Workshop photos have been posted here
- Dec 23: Congratulations to Luckyson Khaidem and Ankit Anand for receiving the Best Research Award for their work "Asynchronous DDQN ensemble through shared experience learning". (Details)
- Dec 23: We had three teams presenting reinforcement learning projects during the CSE Demo Days 2019. Great work! (Details)
- Nov 27: Hope everyone has a wonderful Thanksgiving and Fall break!
- Nov 27: Please complete our course evaluation at SmartEval
- Nov 21: [Project 3] Register your team for presentation here.
- Nov 8: [Project 3] Part 1 & Proposal are due Nov 13
- Oct 29: Check out details on how to take part in CSE Demo Days
- Oct 17: Project 2 is due in 10 days!
- Oct 17: A fascinating new project by OpenAI: Solving Rubik’s Cube with a Robot Hand
- Oct 15: Register your team (up to 2 people) for Project 3 here. Deadline: Nov 1
- Oct 8: Project 1 grades and the mid-semester review have been released
- Oct 1: Tetris Challenge (bonus points) is due Wednesday, 2 October at 10pm [details on Piazza]
- Sep 19: Check out the new OpenAI project, Emergent Tool Use from Multi-Agent Interaction
- Sep 19: Project 1 is due in 10 days!
- Sep 3: Quiz 0 is due on Wednesday @11:59pm
- Sep 3: Please welcome our Friends of the Course, Nathan and Anurag, who are here to give you additional help as you explore RL
- Aug 29: If you are looking for something to watch this weekend, here is a good movie about AlphaGo, which we talked about today (Trailer)
- Aug 27: Please complete the survey here
- Aug 27: Welcome to all returning & new students. We have a great semester ahead!
Our Group
Thank you everyone for a great semester! It was a great pleasure working with all of you during our challenging course. I wish you a happy and productive winter break and hope that you will continue your journey in reinforcement learning!
I also want to give special thanks to the course TA Yuhao Du and our Friends of the Course Anurag Saykar and Nathan Margaglio for their constant support. Thank you for all of your effort, for keeping students motivated, and for supporting them throughout the course!

Description
This course is intended for students interested in artificial intelligence. Reinforcement learning is an area of machine learning in which an agent learns how to behave in an environment by performing actions and assessing the results. Reinforcement learning is how Google DeepMind created the AlphaGo system that beat a high-ranking Go player and how AlphaStar became the first artificially intelligent system to defeat a top professional player in StarCraft II. We will study the fundamentals and practical applications of reinforcement learning and will cover the latest methods used to create agents that can solve a variety of complex tasks, with applications ranging from gaming to finance to robotics.
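As a minimal sketch of that act-then-assess loop (an illustration, not part of the course materials), the snippet below trains a tabular Q-learning agent on a hypothetical five-cell corridor. The `Corridor` environment, its single reward of 1 for reaching the right end, and the hyperparameters (`alpha`, `gamma`, `epsilon`, 200 episodes) are all assumptions chosen for the example.

```python
import random


class Corridor:
    """Hypothetical toy environment: cells 0..n-1, start at 0, episode ends at n-1."""

    def __init__(self, n=5):
        self.n = n
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # action 0 moves left, action 1 moves right; walls clamp the position
        move = 1 if action == 1 else -1
        self.pos = max(0, min(self.n - 1, self.pos + move))
        done = self.pos == self.n - 1
        reward = 1.0 if done else 0.0  # the only reward is for reaching the goal cell
        return self.pos, reward, done


def q_learning(env, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(env.n)]  # q[state][action]
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy: explore occasionally, and break ties at random
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = env.step(action)
            # assess the result: move Q(s, a) toward r + gamma * max_a' Q(s', a')
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = q_learning(Corridor())
print(["right" if q[s][1] > q[s][0] else "left" for s in range(4)])
```

After training, the greedy action in every non-goal cell should be "right", because only moving right eventually leads to the reward.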
Syllabus
Date | Lecture Topic | Recommended Reading | Quiz | Project |
---|---|---|---|---|
Aug 27 | Introduction to Reinforcement Learning; Course Logistics | | Quiz 0 released (UBlearns > Assignments) | None |
Aug 29 | Markov Decision Process; Modeling Choices | SB (Sutton and Barto) Ch. 3 | Quiz 0 | None |
Sep 3 | Policies, Value Functions & Bellman Equations; Python/Google Colab overview | SB Ch. 3 | Quiz 0 due Wednesday @11:59pm; Quiz 1 released | Project 1 released (UBlearns > Assignments) |
Sep 5 | Dynamic Programming; Gym environments basics | SB Ch. 4 | Quiz 1 due Sunday @11:59pm | Project 1 |
Sep 10 | Learning and Planning with Tabular Methods; Monte Carlo | SB Ch. 5 | Quiz 2 released | Project 1 |
Sep 12 | Monte Carlo Tree Search; Temporal Difference (TD(0)); TD(0) Demo by Aman Khurana | Monte-Carlo tree search and rapid action value estimation in computer Go; SB Ch. 6.1-6.3 | Quiz 2 due Sunday @11:59pm | Project 1 |
Sep 17 | Tabular Methods Review; Model-Free RL (Temporal Difference); SARSA Demo by Manan Ajit Oza | SB Ch. 6.4-6.5 | Quiz 3 released | Project 1 |
Sep 19 | Model-Free RL (Q-Learning); Q-learning Demo by Yash Nitin Mantri | SB Ch. 6.4-6.5 | Quiz 3 due Monday @11:59pm | Project 1 |
Sep 24 | Q-learning step-by-step; RL with function approximation | SB Ch. 9.1-9.4 | Quiz 4 released | Project 1 |
Sep 26 | Linear Value Function Approximation | SB Ch. 9; Gradient Descent Notes (Harvard) | Quiz 4 due Monday @11:59pm | Project 1 due Sunday @11:59pm |
Oct 1 | Linear Value Function Approximation: Step-by-step; Non-Linear Value Function Approximation I | Human-level control through deep reinforcement learning (paper) | Quiz 5 released | Project 2 released |
Oct 3 | Non-Linear Value Function Approximation II (DQN); Convolutional Neural Networks (CNN) Review | Playing Atari with Deep Reinforcement Learning (paper); CS231n (Stanford) CNN (notes) | Quiz 5 due Monday @11:59pm | Project 2 |
Oct 8 | Double Deep Q-Networks; DQN Demo by Arooshi Avasthy | Deep Reinforcement Learning with Double Q-learning (paper) | Quiz 6 | Project 2 |
Oct 10 | Deep Q-Networks (Dueling DQN, Experience Replay); Imitation Learning: Behavior Cloning | Dueling Network Architectures for Deep Reinforcement Learning (paper); Prioritized Experience Replay (paper) | Quiz 6 due Monday @11:59pm | Project 2 |
Oct 15 | Imitation Learning: Behavior Cloning + Inverse Reinforcement Learning; Double DQN Demo by Xiao Zhang | End to End Learning for Self-Driving Cars (paper covering DAVE-2); DAVE-2 Driving Lincoln (YouTube) | Quiz 7 | Project 2; Project 3 [DRAFT] released |
Oct 17 | Policy Gradient I (Policy Gradient Theorem) | SB Ch. 13 | Quiz 7 due Monday @11:59pm | Project 2 |
Oct 22 | Policy Gradient II (REINFORCE) | SB Ch. 13 | Quiz 8 | Project 2 |
Oct 24 | Tabular Methods and Deep Q-Networks Review | | Quiz 8 due Monday @11:59pm | Project 2 due Sunday @11:59pm |
Oct 29 | Policy Gradient with Baselines; REINFORCE Demo by Xiao Zhang | SB Ch. 13 | Quiz 9 | Project 3 |
Oct 31 | Actor-Critic (A2C, A3C) | SB Ch. 13; Asynchronous Methods for Deep Reinforcement Learning (paper) | Quiz 9 due Monday @11:59pm | Project 3: register your team by Nov 1 |
Nov 5 | Advanced Policy Gradient Methods (DPG, DDPG, Importance Sampling) | Deterministic Policy Gradient Algorithms (paper); Continuous control with deep reinforcement learning (paper) | Quiz 10 | Project 3 |
Nov 7 | Advanced Policy Gradient Methods (TRPO, PPO) | Post about PPO by OpenAI; Trust Region Policy Optimization (paper); Proximal Policy Optimization Algorithms (paper) | Quiz 10 due Monday @11:59pm | Project 3 Part 1 due Nov 13 |
Nov 12 | MDP + POMDP + Dec-POMDP | | Quiz 11 | Project 3 Part 1 & Proposal due Nov 13 |
Nov 14 | Multiagent Reinforcement Learning | Egorov, Maxim. "Multi-agent deep reinforcement learning." (paper); Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments (paper) | Quiz 11 due Monday @11:59pm | Project 3 |
Nov 19 | Model-Based Reinforcement Learning (pdf) | Deep Dynamics Models for Dexterous Manipulation by BAIR (UC Berkeley) (blog post); Model Based Reinforcement Learning for Atari (paper) | Quiz 12 | Project 3 |
Nov 21 | Meta Learning | One-Shot Imitation from Watching Videos by BAIR (UC Berkeley) (blog post); Meta Reinforcement Learning by Lilian Weng (blog post) | Quiz 12 due Monday @11:59pm | Project 3 |
Nov 26 | Ethics in AI | | No quiz | Project 3 |
Nov 28 | | | | |
Dec 3 | Review Session | | No quiz | Project 3 Presentations |
Dec 5 | Review Session & End of the Course | | No quiz | None |
CSE Demo Day Results
Three teams were chosen to present their work during the CSE Demo Day on December 6, 2019. More about CSE Demo Days.
- Luckyson Khaidem and Ankit Anand, "Asynchronous DDQN ensemble through shared experience learning". View poster. Best Research Award (more details)
- Vishva Nitin Patel and Leena Manohar Patil, "Continuous control with deep reinforcement learning". View poster
- Shashank Bhat and Anirudh Sridhar, "Policy Gradient Updation using Proximal Policy Optimization". View poster



Logistics
- Instructor: Alina Vereshchaka
- Lectures: Tue, Thu 3:30 - 4:50pm, Norton 213
- Office hours: Tue, Thu 2:00 - 3:00pm @ Davis Hall or online & by appointment
- How to contact me: Please use Piazza for all questions related to lectures, quizzes, and assignments. For any personal queries, email avereshc[at]buffalo.edu
Teaching Assistant
- Yuhao Du (yuhaodu[at]buffalo.edu)
- Office hours: Mon, Wed 3:30 - 6:00pm @ Davis Hall TA area or online & by appointment
Friends of the Course
- Nathan Margaglio (namargag[at]buffalo.edu): Piazza & by appointment
- Anurag Saykar (anuragan[at]buffalo.edu): Piazza & Thu 5-6pm @ Davis Hall TA area
Calendar
Add our schedule to your calendar here.
CSE Demo Day Fall 2019
You are welcome to take part in the CSE Demo Day, which the CSE Department holds at the end of every semester. See details from previous events here.
Details:
- Time: Dec 6, noon - 6pm
- Location: Davis Hall, 1-2 floors
- It is a great opportunity to meet with local companies
- A research poster template can be found here
To participate in the CSE Demo Day as part of CSE510 RL course:
- Choose a topic for Project 3
- Get preliminary results
- Send an email with the poster draft, project name and short description to avereshc[at]buffalo.edu by Nov 26
- If selected:
  - The CSE Department will help with printing the poster (up to 24" x 36")
  - It will satisfy your presentation points for Project 3
  - You may get bonus points for your efforts!
Reference Materials
There is no official textbook for the class, but a number of the supporting readings will come from:
- Richard S. Sutton and Andrew G. Barto, "Reinforcement Learning: An Introduction", Second Edition, MIT Press, 2018 - a classic book that covers all the basics
- Lecture slides, relevant papers, and other materials will be added to the table above
- Li, Yuxi. "Deep reinforcement learning." arXiv preprint arXiv:1810.06339 (2018) - an overview of the latest algorithms and applications in reinforcement learning
- David Silver's course on Reinforcement Learning
Useful RL Materials
- MDP Cheatsheet Reference (2 pages) by John Schulman (pdf)
- Neural Network tutorial Article + Sample code
Useful Tools:
- Overleaf (online LaTeX editor) - a great tool for creating reports (link, reference)
- Google Colab (online Jupyter Notebook with free GPU) (link)
Evaluation
- 50% - Projects (3 projects: 15 + 15 + 20)
- 20% - Weekly quizzes
- 30% - Final Exam
Bonus points
- Piazza Rockstar
- Demo Time
- Candy Questions
Note: Each component will receive a numerical score. The course grade will be based on the weighted total of all components and the class curve. The curve will be different for CSE410 and CSE510. The exam will be closed-book and closed-notes.
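To make the weighting concrete, here is a minimal Python sketch (not an official grade calculator) of the pre-curve weighted total. It assumes each component is entered as a fraction between 0 and 1, with the quiz fraction already reduced to the counted quizzes; `weighted_total` and `PROJECT_WEIGHTS` are illustrative names, and bonus points and the curve are left out because their details are not specified here.

```python
# Weights from the Evaluation section, in percent of the final grade.
PROJECT_WEIGHTS = (15, 15, 20)  # Project 1, 2, 3 (50% total)
QUIZ_WEIGHT = 20
EXAM_WEIGHT = 30


def weighted_total(project_scores, quiz_score, exam_score):
    """Pre-curve course total out of 100.

    Every score is a fraction in [0, 1]. Bonus points and the class curve
    are deliberately not modeled here.
    """
    if len(project_scores) != len(PROJECT_WEIGHTS):
        raise ValueError("expected one score per project")
    projects = sum(w * s for w, s in zip(PROJECT_WEIGHTS, project_scores))
    return projects + QUIZ_WEIGHT * quiz_score + EXAM_WEIGHT * exam_score


print(weighted_total((0.9, 0.8, 1.0), quiz_score=0.85, exam_score=0.7))  # 83.5
```

Because Project 3 carries 20 points rather than 15, it moves the total more than either of the first two projects.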
Office hours
- Office hours start on Thursday, Aug 29
- Office hours can be held in person at Davis 310 or online. For an online meeting, please create a Google Hangouts event and invite me (avereshc[at]buffalo.edu)
Projects
The course consists of three projects. Projects will be done individually.
- Project 1 - Defining and solving RL environment
- Project 2 - Value Function Approximation (Deep Q-Networks)
- Project 3 - Policy gradient
Late Day Policy
- You can use up to 5 late days
- A late day extends the deadline by 24 hours
- Once you have used all 5 late days, a 25% penalty per day will be applied to any work submitted late.
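The late-day arithmetic can be sketched as follows. This is an unofficial illustration under stated assumptions: the 5 late days form one pool shared across all projects, any partial day counts as a full late day, and the penalty is capped at 100%. `late_penalty` is a hypothetical helper name.

```python
import math

LATE_DAY_BUDGET = 5     # total late days (assumed to be one pool for all projects)
PENALTY_PER_DAY = 0.25  # applied for each day beyond the late-day budget


def late_penalty(hours_late, late_days_left=LATE_DAY_BUDGET):
    """Return (penalty_fraction, late_days_left_afterwards) for one submission."""
    # Assumption: any part of a 24-hour period counts as a whole late day.
    days_late = math.ceil(max(hours_late, 0) / 24)
    used = min(days_late, late_days_left)          # spend late days first
    extra_days = days_late - used                  # days not covered by late days
    penalty = min(1.0, extra_days * PENALTY_PER_DAY)
    return penalty, late_days_left - used


print(late_penalty(30))                    # (0.0, 3): two late days used, no penalty
print(late_penalty(30, late_days_left=1))  # (0.25, 0): one uncovered day
```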
Weekly Quizzes - How does it work?
- Released every Tuesday 9:00am, due by Monday 11:59pm
- Can be found at UBlearns > Assignments
- Each quiz will contain 3-4 problems on topics covered that week
- After you submit, the system will show your final score
- 13 quizzes in total, only 12 quizzes with the highest scores will be counted toward the final grade
- Three attempts allowed, only the highest score will be kept
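Here is a short sketch of how the two quiz rules combine: best of three attempts per quiz, then the best 12 of the 13 quizzes. It is illustrative only and assumes scores are fractions in [0, 1] and that a quiz you never submit counts as 0; `quiz_component` is a hypothetical name.

```python
def quiz_component(attempts_per_quiz, keep=12, max_attempts=3):
    """Average of the best `keep` quiz scores, each quiz scored by its best attempt.

    attempts_per_quiz holds one list of attempt scores (fractions in [0, 1])
    per quiz; an empty list means the quiz was not submitted and counts as 0.
    """
    # Only the first `max_attempts` attempts are considered for each quiz.
    best = [max(a[:max_attempts]) if a else 0.0 for a in attempts_per_quiz]
    # Drop the lowest quiz scores so that only `keep` of them count.
    counted = sorted(best, reverse=True)[:keep]
    return sum(counted) / keep


# Thirteen quizzes with one missed: the missed quiz is the one that gets dropped.
print(quiz_component([[0.5, 1.0]] * 12 + [[]]))  # 1.0
```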
Academic Integrity Policy
Academic integrity is a fundamental university value. No collaboration, cheating, or plagiarism is allowed in projects, quizzes, or the exam. Anyone found violating academic integrity will receive an immediate F in the course. Please refer to the Academic Integrity Policy for more details.
FAQ
When is the final?
The final exam is scheduled for Dec 12, 3:30-6:30pm. You can also check it in your MyUB Hub account.
Can this course satisfy breadth/depth requirement?
Yes, the course can be used to satisfy the depth requirement for the AI focus area.
What do I need to do before class starts?
- Sign up for Piazza (if you do not have an account already) and enroll in the "CSE 4/510 Introduction to Reinforcement Learning" class.
- Confirm that the class shows up in your UBLearns account.
Where is the course syllabus?
Syllabus is here.
What programming language will be used?
We will be using Python as the programming language for the projects. Familiarity with Keras/TensorFlow/PyTorch will also help.
Is attendance required?
Attendance is not required but is encouraged. We will sometimes do in-class exercises or discussions related to quizzes or projects, and these are harder to do, and to benefit from, on your own.
I am highly interested in the course, but I cannot register, can I attend?
Typically I welcome students interested in the topics to audit the course. Unfortunately, this Fall our scheduled room is not big enough to fit everyone who is interested. You are welcome to email me one week after classes begin, and I will let you know if any space becomes available.