CSE4/510: Reinforcement Learning

Spring 2020, Lectures: Mon/Wed 11:00am - 12:20pm

Course News

Our Group

This was a great although challenging semester! We have encountered a great challenge in transitioning from the off-line to on-line mode of learning. I am so grateful that you successfully rose to the challenge of this transition and that we were able to gain even more than expected.
RL Course Spring20 Group Photo
Reinfrocement Learning Course, Spring 2020
RL Course Spring20 Group Photo
Reinfrocement Learning Course, Applications in RL Workshop, Spring 2020

Selected Final Projects Presentations

Course updates to reflect the COVID-19 induced transition to online classes [March 19, 2020]

There will be a few changes in our course for the rest of the semester.


  • Our classes will be held online using the WebEx platform during our course hours (Monday and Wednesday from 11am - 12.20pm). Please install WebEx here and try to login to the platform around 5 mins before the beginning of the class.
  • To ensure the same iteration for the class, during the online lecture, we will have popup quizzes and polls (that are not graded) to ensure you are up to date with the material covered.


There are no changes to our assignments. Assignment 2 and Assignment 3 will follow the same due dates and should be submitted through UBlearns.

Final Project

  • Checkpoint and final materials will have to be submitted though UBlearns as planned.
  • Final Project presentations will be done online. We will allocate WebEx or Google Hangout time slots prior to it.

Midterm II

Our second Midterm will be held online on the scheduled exam date (May 11, 2020). It will be done though UBlearns using LockDown Browser.


To ensure you have some interaction with your peers, we will allow you to complete quizzes collaboratively. Every week you will be randomly matched with someone from our class, and you will have to connect online and discuss the quiz questions and answers. This interaction is not mandatory, so if you do not wish to participate in this collaboration, please let me know. We wish to ensure that you have some active social interaction with your peers so that you may learn from each other and help each other understand course materials. This is an essential part of educational life, so we want to provide this for you.

Bonus Points

  • Jupyter Demo time will be held as usual during the second half of the class though the WebEx, current topics are listed on Piazza.
  • We will have an active chat at our disposal and we will also welcome voice interactions, so we hope to keep the Candy Question Bonuses as usual (unfortunately without actual candy).
  • More bonus points opportunities will be released as the course goes.


There will not be many changes to the course syllabus and assessments. The updated version can be found here.

Office Hours

Meetings will be held online though Webex during our usual OH schedule. During this time, there will be an active link that you can use to join. The link will be available for all people, so please contact course staff directly to set up an individual meeting if you feel uncomfortable sharing your question with your peers.


We will continue to use Piazza as the main resource for your questions, course news and all other details. The course materials and syllabus are posted on our course website. If you have any personal query or want to schedule an individual meeting, send us an email:

Course Staff

We, the course staff team, wish to assure you we will do our best to ensure you get the same high quality instruction and interaction we experienced in the course. Our Friends of the Course are also here to support you. Weijin, Mudita, Nathan and Yuhao are available for you per individual request. Just send an email to schedule a meeting online.

Course Staff Contact Meet
Alina Vereshchaka (Instructor) avereshc[at]buffalo.edu Mon, Wed 12:30-2pm & by appointments
Aman Johar (TA) amanjoha[at]buffalo.edu Tue, Thu 1-3pm & by appointments
Weijin Zhu (Friend of the Course) weijinzh[at]buffalo.edu By appointments
Mudita Sanjive (Friend of the Course) muditasa[at]buffalo.edu By appointments
Nathan Margaglio (Friend of the Course) nmargag[at]buffalo.edu By appointments
Yuhao Du (Friend of the Course) yuhaodu[at]buffalo.edu By appointments


Reinforcement learning is an area of machine learning, where an agent or a system of agents learn to archive a goal by interacting with their environment. RL is often seen as the third area of machine learning, in addition to supervised and unsupervised areas, in which learning of an agent occurs as a result of its own actions and interaction with the environment.

In recent years there has been success in reinforcement learning research in both theoretical and applied fields. It was applied in a variety of fields such as robotics, pattern recognition, personalized medical treatment, drug discovery, speech recognition, computer vision, and natural language processing. This course primarily focuses on training students to frame reinforcement learning problems and to tackle algorithms from dynamic programming, Monte Carlo and temporal-difference learning. Students will progress towards larger state space environments using function approximation, deep Q-networks and state-of-the-art policy gradient algorithms. We will also go over the recent methods that are based on reinforcement learning, such as imitation learning, meta learning and more complex environment formulations.


Redeem your Candy Question bonus.

[Bonus request should be submitted within two days after the lecture, otherwise they will expire.]


Date Lecture Topic Recommended Reading Quiz Assignments
Jan 27 Introduction to Reinforcement Learning Quiz 1

UBlearns > Assignments
Course Logistics
Jan 29 Markov Decision Process SB (Sutton and Barton) Ch. 3
Quiz 1 None
Modeling Choises
Feb 3 Polices, Value Functions SB Ch. 3
Quiz 1
Quiz 2
Assignment 1

UBlearns > Assignments
Feb 5 Bellman Equations & Dynamic Programming SB Ch. 4
Quiz 1 (Wed @11:59pm)
Quiz 2 (Sun @11:59pm)
Assignment 1
Feb 10 Google Colab & Python Overview by Aman Johar Bring your laptop to follow the discussion
Quiz 3
Assignment 1
RL Env Basics by Aman Johar
Assignment 1 Q&A
Feb 12 Learning and Planning with Tabular Methods SB Ch. 5
Quiz 3
Assignment 1
Monte Carlo
Feb 17 Monte Carlo & Temporal Difference SB Ch. 6.1-6.3
Quiz 4
Assignment 1
Dynamic Programming demo by Joseph Naro
Feb 19 Temporal Difference (TD(0), SARSA, Q-learning) SB Ch. 6.4-6.5
Quiz 4
Assignment 1
TD(0) demo by Vaibhav Prakash Chhajed
Feb 24 Temporal Difference (Double Q-learning, n-step Bootstrapping) Ch. 6.7, 7
Quiz 5
Assignment 1
Q-learning demo by Raunaq Jain
Feb 26 n-step Bootstrapping & TD(λ) Ch. 7
Quiz 5
Assignment 1 (Sun @11:59pm)
SARSA Demo by Bipul Kumar
Mar 2 Tabular Methods Solutions Overview
Released Assignment 2
Midterm I Q&A
Mar 4 Midterm I
Assignment 2
Mar 9 Applications in RL Worskshop (more details) Quiz 6
Assignment 2
Mar 11 Linear Value Function Approximation I - SB Ch. 9
- Gradient Descent Notes (Harvard)
- Human-level control through deep reinforcement learning (paper)
Quiz 6
Assignment 2
Non-Linear Value Function Approximation (DQN) I
Mar 16 Spring Break
Assignment 2
Mar 18 Spring Break
Assignment 2
Mar 23 Linear Value Function Approximation II - SB Ch. 9
- Gradient Descent Notes (Harvard)
Quiz 7 Assignment 2
Mar 25 Linear Value Function Approximation: Step-by-step - Deep Reinforcement Learning with Double Q-learning (paper)
- CS231n (Stanford) CNN (notes)
Quiz 7 Assignment 2
Non-Linear Value Function Approximation II (Double DQN)
Mar 30 Deep Q-Networks (Dueling DQN, Experience Replay, Rainbow DQN) - Dueling Network Architectures for Deep Reinforcement Learning (paper)
- Prioritized Experience Replay (paper)
- Rainbow: Combining Improvements in Deep Reinforcement Learning
Quiz 8 Assignment 2
Apr 1 Imitation Learning: Behavior Cloning + Inverse Reinforcement Learning - End to End Learning for Self-Driving Cars (paper covering DAVE-2) (paper)
- DAVE-2 Driving Lincoln (YouTube)
Quiz 8 Due Assignment 2
Apr 6 Policy Gradient I SB Ch. 13
Quiz 9 Released Assignment 3
Policy Gradient Proof (for reference)
Dueling DQN Demo by Nikita Manish Sawant
Double DQN Demo by Utkarsh Shukla
Apr 8 Policy gradient II SB Ch. 13
Quiz 9 Assignment 3
Apr 13 Advanced Policy Gradient Methods (A3C, DPG, DDPG) - SB Ch. 13
- Asynchronous Methods for Deep Reinforcement Learning by (paper)
- Deterministic Policy Gradient Algorithms (paper)
Quiz 10 Assignment 3
Apr 15 Advanced Policy Gradient Methods (DDPG, Importance Sampling)
- Continuous control with deep reinforcement learning (paper)
Quiz 10 Due Assignment 3
A2C Demo by Salil Santosh Dabholkar
Apr 20 Advanced Policy Gradient Methods (TRPO, PPO) - Post about PPO by OpenAI
- Trust Region Policy Optimization (paper)
- Proximal Policy Optimization Algorithms (paper)
- Learning Dexterity (training a robot hand to manipulate objects using PPO)
Released Quiz 11 Final Project
Apr 22 Advanced RL (GORILA, A3C, IMPALA, Ape-X, R2D3, Agent57) - Distributed Prioritized Experience Replay. Horgan, et al., 2018 (Ape-X paper)
- Making Efficient Use of Demonstrations to Solve Hard Exploration Problems. Gulcehre, et al., 2018 (R2D3 paper)
- Agent57: Outperforming the Atari Human Benchmar (paper)
- Agent57: Outperforming the human Atari benchmark(blog post by DeepMind)
Due Quiz 11 Final Project
PPO Demo by Utkarsh Shukla
Apr 27 Multiagent Reinforcement Learning - Egorov, Maxim. "Multi-agent deep reinforcement learning." (paper)
- Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments (paper)
None Final Project
DDPG Demo by Bipul Kumar
Apr 29 Meta Learning - One-Shot Imitation from Watching Videos by BAIR (UC Berkely) (blog post)
- Meta Reinforcement Learning by Lilian Weng (blog post)
None Final Project
May 4 Ethics & Safety in AI None Final Project
May 6 Final Review and Q&A None Final Project Presentation
Final Project Presentations
May 11 Midterm II


Add our schedule to your calendar here.


  • Instructor: Alina Vereshchaka
  • Lectures: Mon, Wed 11:00 - 12:20pm, Webex Baldy 110
  • Office hours: Mon, Wed 12:30pm - 2:00pm via Webex @ Davis 338b & by appointments
  • How to contact me: Please use Piazza for all questions related to lectures, quizes, and assignments. For any personal quaries, email avereshc[at]buffalo.edu

Teaching Assistant

  • Aman Johar (amanjoha[at]buffalo.edu)
  • Office hours: Tue, Thu 1:00 - 3:00pm via Webex @ Davis Hall TA area and Davis 310 & by appointments

Friends of the Course

  • Mudita Sanjive (muditasa[at]buffalo.edu): Piazza & by appointments Thu 12-1pm @ Davis Hall TA area
  • Weijin Zhu (weijinzh[at]buffalo.edu): Piazza & by appointments Fri 5-6pm @ Davis Hall TA area
  • Nathan Margaglio (namargag[at]buffalo.edu): Piazza & by appointments
  • Yuhao Du (yuhaodu@buffalo.edu): By apporintments You can meet Yuhao at Davis 309.

Key Topics

Grading Rubrics

Course Component % of grade
Assignments [3 assignments: 15% + 15% + 10%] 40%
Final Project 20%
Weekly Quizzes 10%
Midterm 1 15%
Midterm 2 15%

Bonus Points

Late Day Policy

Weekly Quizes - How does it work?

Generic requirements:

Reference Materials

There is no official textbook for the class, but a number of the supporting readings will come from: Additional references, that can be useful:

Useful RL Materials

Usefull Tools:

Academic Integrity Policy

Academic integrity is a fundamental university value. No collaboration, cheating, and plagiarism is allowed in projects, quizes, and the exam. Those found violating academic integrity will get an immediate F in the course.
  1. All the submissions will be checked using SafeAssign (integrated tool into UBlearns). This tool checks the submissions for originality. After each submission it returns similarity score along with the original source. It is based on the submitted works for the past semesters as well the current submissions. If there is a high similarity between the submissions from the same class, ALL students involved will be recorded though the Academic Dishonesty Report.
  2. The submissions should include all the references. Kindly note that referencing the source does not mean you can copy/paste it fully and submit as your original work. Updating the hyperparameters or modifying the existing code is a subject to plagiarism. Your work has to be original.
  3. Submitting a material that has been previously submitted, in whole or in substantial part is a subject for further investigation.
  4. You work has to be done on your personally owned device or university provided equipment. Completing the assignment on your friend’s / neighbor's device is a subject for further investigation for both parties involved. Any cheating during the midterms will be subject to immediate 0.
  5. In any suspicious case that includes a student or a group of students will be immediately passed to the undergraduate/graduate adviser and the case will be recorded using the Academic Dishonesty Report Form.
  6. Please read more about UB Academic Integrity Policy 2019-20.
We want you to demonstrate your own achievements and showcase your own abilities during the course! From the course instructors side, we are glad to provide you all the help needed for you to succeed in the course. Here is some of the free resources provided by the University:
  1. If you need help with English, check UB Writing Center (http://www.buffalo.edu/writing/make-an-appointment-old.html)
  2. If you have issues with your device, the University provides access to computers, as well as equipment loans.
  3. Your well-being is highly important, if you have any concerns, make sure to check Counseling Service.
Academic Integrity is a very high priority not only for our Department, but the University as a whole. We are glad to provide you help to ensure you achieve great results during the course, however we are not tolerate any kind of cheating.

Accessibility Resources

If you have any disability which requires reasonable accommodations to enable you to participate in this course, please contact the Office of Accessibility Resources, 60 Capen Hall, 645-2608, and also the instructor of this course. The office will provide you with information and review appropriate arrangements for reasonable accommodations. More details.


The UB School of Engineering and Applied Sciences considers the diversity of its students, faculty, and staff to be a strength, critical to our success. We are committed to providing a safe space and a culture of mutual respect and inclusiveness for all. We believe a community of faculty, students, and staff who bring diverse life experiences and perspectives leads to a superior working environment, and we welcome differences in race, ethnicity, gender, age, religion, language, intellectual and physical ability, sexual orientation, gender identity, socioeconomic status, and veteran status.


Is there any GPU available to use for our projects?

CCR is supporting our course with accessing to powefull GPU servers. If you need access to GPU, create an account at CCR and let me know, so I will add you to the resources.

I am in the waiting list, can you help me to enrol?

Unfortunately there is nothing we can do at this time. I would suggest to keep an eye at the enrollment. Typically some students drop the course right before the drop-date deadline (Feb 4), so if your are in the waiting list, there is a high chance you will get enrolled, so I would strongly suggest to visit the lectures, before the enrolment is finilized, even if you are not registered at this time.

Can this course satisfy breadth/depth requirement?

Yes, the course can be used to satisfy the depth requirement for the AI focus area for graduate level (CSE 510).

What programming language will be used?

We will be using Python as the programming language for the projects, also familarity with Keras/Tensorfow/PyTorch will help.

Is attendance required?

Attendance is not required but is encouraged. Sometimes we may do in class exercises or discussions related to quizes or projects and these are harder to do and benefit from by yourself

I am highly interested in the course, can audit it?

Typically I welcome students interested in the topics to audit the course. Unfortunately this Spring our scheduled room is not big enough to fill all people interested. You are welcome to drop me an email one week after the class begins, I will give you updates if there is some space available.

Any suggestions or comments?

I would be glad to get a feedback from you, just send me an email.