CSE4/510: Reinforcement Learning

Fall 2019, Lectures: Tue/Thu 3:30 - 4:50pm, Norton 213

Course News

Our Group

Thank you everyone for a great semester! It was a great pleasure working with all of you during our challenging course. I wish you a happy and productive winter break and hope that you will continue your journey in reinforcement learning!

I also want to give special thanks to the course TA Yuhao Du and our Friends of the Course Anurag Saykar and Nathan Margaglio for their constant support. Thank you for all of your effort, for encouraging students to stay motivated, and for supporting them throughout our course!

Group photo: Reinforcement Learning Course, Fall 2019


This course is intended for students interested in artificial intelligence. Reinforcement learning is an area of machine learning where an agent learns how to behave in an environment by performing actions and assessing the results. Reinforcement learning is how Google DeepMind created the AlphaGo system that beat a high-ranking Go player and how AlphaStar became the first artificially intelligent system to defeat a top professional player in StarCraft II. We will study the fundamentals and practical applications of reinforcement learning and will cover the latest methods used to create agents that can solve a variety of complex tasks, with applications ranging from gaming to finance to robotics.
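
To make the agent/environment loop described above concrete, here is a minimal sketch of tabular Q-learning, one of the methods on the schedule. It is not taken from the course materials: the five-state corridor environment, the `step` and `train` functions, and all hyperparameter values are assumptions chosen purely for illustration.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the goal (terminal)
ACTIONS = (-1, +1)    # index 0 = move left, index 1 = move right


def step(state, action):
    """Hypothetical toy environment: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, or when the estimates are tied.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning target: reward plus discounted best next-state value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# Greedy action in each non-terminal state after training.
greedy_policy = ["left" if q[0] > q[1] else "right" for q in q_table[:-1]]
print(greedy_policy)
```

The agent only ever sees states, actions, and rewards; the preference for moving right toward the goal emerges from repeatedly acting and updating its value estimates.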


Date Lecture Topic Recommended Reading Quiz Project
Aug 27 Introduction to Reinforcement Learning Quiz 0
(UBlearns > Assignments)
Course Logistics
Aug 29 Markov Decision Process SB (Sutton and Barto) Ch. 3
Quiz 0 None
Modeling Choices
Sep 3 Policies, Value Functions & Bellman Equations SB (Sutton and Barto) Ch. 3
Quiz 0 is due on Wednesday @11:59pm Quiz 1
Project 1
UBlearns > Assignments
Python/Google Colab overview
Sep 5 Dynamic Programming SB (Sutton and Barto) Ch. 4
Quiz 1
Due Sunday @11:59pm
Project 1
Gym environments basics
Sep 10 Learning and Planning with Tabular Methods SB Ch. 5
Quiz 2
Project 1
Monte Carlo
Sep 12 Monte Carlo Tree Search - Monte-Carlo tree search and rapid action value estimation in computer Go
- SB Ch. 6.1-6.3
Quiz 2
Due Sunday @11:59pm
Project 1
Temporal Difference (TD(0))
TD(0) Demo by Aman Khurana
Sep 17 Tabular Methods Review SB Ch. 6.4-6.5
Quiz 3
Project 1
Model Free RL (Temporal Difference)
SARSA Demo by Manan Ajit Oza
Sep 19 Model Free RL (Q-Learning) SB Ch. 6.4-6.5
Quiz 3
Due Monday @11:59pm
Project 1
Q-learning Demo by Yash Nitin Mantri
Sep 24 Q-learning step-by-step SB Ch. 9.1-9.4
Quiz 4
Project 1
RL with function approximation
Sep 26 Linear Value Function Approximation - SB Ch. 9
- Gradient Descent Notes (Harvard)
Quiz 4
Due Monday @11:59pm
Project 1
Due: Sunday @ 11:59pm
Oct 1 Linear Value Function Approximation: Step-by-step - Human-level control through deep reinforcement learning (paper)
Quiz 5
Project 2
Non-Linear Value Function Approximation I
Oct 3 Non-Linear Value Function Approximation II (DQN) - Playing Atari with Deep Reinforcement Learning (paper)
- CS231n(Stanford) CNN (notes)
Quiz 5
Due Monday @11:59pm
Project 2
Convolutional Neural Networks (CNN) Review
Oct 8 Double Deep Q-Networks - Deep Reinforcement Learning with Double Q-learning (paper)
Quiz 6 Project 2
DQN Demo by Arooshi Avasthy
Oct 10 Deep Q-Networks (Dueling DQN, Experience Replay) - Dueling Network Architectures for Deep Reinforcement Learning (paper)
- Prioritized Experience Replay (paper)
Quiz 6
Due Monday @11:59pm
Project 2
Imitation Learning: Behavior Cloning
Oct 15 Imitation Learning: Behavior Cloning + Inverse Reinforcement Learning - End to End Learning for Self-Driving Cars (paper covering DAVE-2) (paper)
- DAVE-2 Driving Lincoln (YouTube)
Quiz 7 Project 2
Project 3 [DRAFT] released
Double DQN Demo by Xiao Zhang
Oct 17 Policy gradient I (Policy Gradient Theorem) SB Ch. 13
Quiz 7
Due Monday @11:59pm
Project 2
Oct 22 Policy gradient II (REINFORCE) SB Ch. 13
Quiz 8 Project 2
Oct 24 Tabular Methods and Deep Q-Networks Review
Quiz 8
Due Monday @11:59pm
Project 2
Due: Sunday @ 11:59pm

Oct 29 Policy gradient with Baselines SB Ch. 13
Quiz 9 Project 3
REINFORCE Demo by Xiao Zhang
Oct 31 Actor-Critic (A2C, A3C) - SB Ch. 13
- Asynchronous Methods for Deep Reinforcement Learning (paper)
Quiz 9
Due Monday @11:59pm
Project 3
Register your team by Nov 1
Nov 5 Advanced Policy Gradient Methods (DPG, DDPG, Importance Sampling) - Deterministic Policy Gradient Algorithms (paper)
- Continuous control with deep reinforcement learning (paper)
Quiz 10 Project 3
Nov 7 Advanced Policy Gradient Methods (TRPO, PPO) - Post about PPO by OpenAI
- Trust Region Policy Optimization (paper)
- Proximal Policy Optimization Algorithms (paper)
Quiz 10
Due Monday @11:59pm
Project 3
Part 1 is due Nov 13
Nov 12 MDP + POMDP + Dec-POMDP Quiz 11 Project 3
Part 1 & Proposal is due Nov 13
Nov 14 Multiagent Reinforcement Learning - Egorov, Maxim. "Multi-agent deep reinforcement learning." (paper)
- Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments (paper)
Quiz 11
Due Monday @11:59pm
Project 3
Nov 19 Model-Based Reinforcement Learning (pdf) - Deep Dynamics Models for Dexterous Manipulation by BAIR (UC Berkeley) (blog post)
- Model Based Reinforcement Learning for Atari (paper)
Quiz 12 Project 3
Nov 21 Meta Learning - One-Shot Imitation from Watching Videos by BAIR (UC Berkeley) (blog post)
- Meta Reinforcement Learning by Lilian Weng (blog post)
Quiz 12
Due Monday @11:59pm
Project 3
Nov 26 Ethics in AI No Quiz Project 3
Nov 28
Thanksgiving Break
Dec 3 Review Session No Quiz Project 3
Dec 5 Review Session & End of the Course No Quiz None

CSE Demo Day Results

Three teams were chosen to present their work during the CSE Demo Day on December 6, 2019. More about CSE Demo Days.
  1. Luckyson Khaidem and Ankit Anand "Asynchronous DDQN ensemble through shared experience learning". View poster
    Best Research Award
    (more details)
    Photo: Alina Vereshchaka, Luckyson Khaidem and Ankit Anand (source)
  2. Vishva Nitin Patel and Leena Manohar Patil "Continuous control with deep reinforcement learning". View poster
    Photo: Leena Manohar Patil and Vishva Nitin Patel
  3. Shashank Bhat and Anirudh Sridhar "Policy Gradient Updation using Proximal Policy Optimization". View poster
    Photo: Shashank Bhat and Anirudh Sridhar


  • Instructor: Alina Vereshchaka
  • Lectures: Tue, Thu 3:30 - 4:50pm, Norton 213
  • Office hours: Tue, Thu 2:00 - 3:00pm @ Davis Hall or online & by appointment
  • How to contact me: Please use Piazza for all questions related to lectures, quizzes, and assignments. For any personal queries, email avereshc[at]buffalo.edu

Teaching Assistant

  • Yuhao Du (yuhaodu[at]buffalo.edu)
  • Office hours: Mon, Wed 3:30 - 6:00pm @ Davis Hall TA area or online & by appointment

Friends of the Course

  • Nathan Margaglio (namargag[at]buffalo.edu): Piazza & by appointment
  • Anurag Saykar (anuragan[at]buffalo.edu): Piazza & Thu 5-6pm @ Davis Hall TA area


Add our schedule to your calendar here.

CSE Demo Day Fall 2019

You are welcome to take part in the CSE Demo Day, which is held at the end of every semester by the CSE Department. See more details from the previous events here.


  • Time: Dec 6, noon - 6pm
  • Location: Davis Hall, 1st and 2nd floors
  • It is a great opportunity to meet with local companies
  • A research poster template can be found here

To participate in the CSE Demo Day as part of CSE510 RL course:

  1. Choose a topic for Project 3
  2. Get preliminary results
  3. Send an email with the poster draft, project name and short description to avereshc[at]buffalo.edu by Nov 26
  4. If selected
    • The CSE Department will help with printing the poster (up to 24" x 36")
    • It will satisfy your presentation points for Project 3
    • You may get bonus points for your efforts!

Reference Materials

There is no official textbook for the class, but a number of the supporting readings will come from:

Additional references that can be useful:

Useful RL Materials

Useful Tools:


Bonus points

Note: Each component will receive a numerical score. The course grade will be based on the weighted total of all components and the class curve. The curve will be different for CSE410 and CSE510. The exam will be closed-book and closed-notes.

Office hours


The course consists of three projects. Projects will be done individually.

Late Day Policy

Weekly Quizzes - How does it work?

Academic Integrity Policy

Academic integrity is a fundamental university value. No collaboration, cheating, or plagiarism is allowed on projects, quizzes, or the exam. Those found violating academic integrity will get an immediate F in the course. Please refer to the Academic Integrity Policy for more details.


When is the final?

Our final exam is scheduled for Dec 12, 3:30 - 6:30pm. You can also check it in your MyUB Hub account.

Can this course satisfy breadth/depth requirement?

Yes, the course can be used to satisfy the depth requirement for the AI focus area.

What do I need to do before class starts?

Where is the course syllabus?

The syllabus is here.

What programming language will be used?

We will be using Python as the programming language for the projects; familiarity with Keras/TensorFlow/PyTorch will also help.

Is attendance required?

Attendance is not required but is encouraged. Sometimes we may do in-class exercises or discussions related to quizzes or projects, and these are harder to do, and to benefit from, on your own.

I am highly interested in the course, but I cannot register, can I attend?

Typically I welcome students interested in the topics to audit the course. Unfortunately, this Fall our scheduled room is not big enough to fit everyone interested. You are welcome to email me one week after the class begins, and I will let you know if any space is available.