Course News

Feb 26Follow Piazza for more recent course updates
Feb 3 Our calendar has been updated with all OH (including Friends of the Course OH)
Feb 3 We are proud that some of the brightest students from the previous semesters will join our Instructors team as Friends of Course. Please welcome - Mudita, Weijin and Nathan! They are willing to give you additional help and assist you in your success throughout the course.
Jan 27 Movie about Alpha Go, we have talked about today (Trailer)
Jan 27 Please complete the beginning of the semester survey here
Jan 27 Welcome to all returning & new students. We have a great semester ahead!

Our Group

This was a great although challenging semester! We have encountered a great challenge in transitioning from the off-line to on-line mode of learning. I am so grateful that you successfully rose to the challenge of this transition and that we were able to gain even more than expected.

RL Course Spring20 Group Photo — Reinfrocement Learning Course, Spring 2020

Selected Final Projects Presentations

Augmented Random Search (MuJoCo) Gautam Suryawanshi, Prajit Krisshna Kumar
Group Equivariant Q-networks Kiran Vaddi
Ship Docker Problem (MARL) Bhavin Jawade, Vamshi Krishna
DeepRacer Car Xingtong Li
City On Fire: Multi Agent Reinforcement Learning Srisai Karthik Neelamraju, Anantha Srinath Sedimbi
UAV Learning to Fly Rugved Srinath Hattekar, Chen Zeng, Nanditha
AWS DeepRacer Raunaq Jain, Bipul Kumar, Joseph Naro
Halite III: Multi-agent Learning Victor Lee
Cracking PHYRE with a World Model Sheng Liu
Connected Autonomous Driving (Macad-Gym) Salil Dabholkar
Building Goal Oriented Chatbots Charan Nama Arunkumar
Multi-Agent RL Hoan Tran
HTTP Streaming & Optimization Areas Bekir Oguzhan Turkkan
Multi Agent Reinforcement Learning Peter M. VanNostrand
Chase Tag Game (MARL) Gaurav Vijay Vergiya

Course updates to reflect the COVID-19 induced transition to online classes [March 19, 2020]

There will be a few changes in our course for the rest of the semester.

Classes

Our classes will be held online using the WebEx platform during our course hours (Monday and Wednesday from 11am - 12.20pm). Please install WebEx here and try to login to the platform around 5 mins before the beginning of the class.
To ensure the same iteration for the class, during the online lecture, we will have popup quizzes and polls (that are not graded) to ensure you are up to date with the material covered.

Assignments

There are no changes to our assignments. Assignment 2 and Assignment 3 will follow the same due dates and should be submitted through UBlearns.

Final Project

Checkpoint and final materials will have to be submitted though UBlearns as planned.
Final Project presentations will be done online. We will allocate WebEx or Google Hangout time slots prior to it.

Midterm II

Our second Midterm will be held online on the scheduled exam date (May 11, 2020). It will be done though UBlearns using LockDown Browser.

Quizzes

To ensure you have some interaction with your peers, we will allow you to complete quizzes collaboratively. Every week you will be randomly matched with someone from our class, and you will have to connect online and discuss the quiz questions and answers. This interaction is not mandatory, so if you do not wish to participate in this collaboration, please let me know. We wish to ensure that you have some active social interaction with your peers so that you may learn from each other and help each other understand course materials. This is an essential part of educational life, so we want to provide this for you.

Bonus Points

Jupyter Demo time will be held as usual during the second half of the class though the WebEx, current topics are listed on Piazza.
We will have an active chat at our disposal and we will also welcome voice interactions, so we hope to keep the Candy Question Bonuses as usual (unfortunately without actual candy).
More bonus points opportunities will be released as the course goes.

Syllabus

There will not be many changes to the course syllabus and assessments. The updated version can be found here.

Office Hours

Meetings will be held online though Webex during our usual OH schedule. During this time, there will be an active link that you can use to join. The link will be available for all people, so please contact course staff directly to set up an individual meeting if you feel uncomfortable sharing your question with your peers.

Communication

We will continue to use Piazza as the main resource for your questions, course news and all other details. The course materials and syllabus are posted on our course website. If you have any personal query or want to schedule an individual meeting, send us an email:

Course Staff

We, the course staff team, wish to assure you we will do our best to ensure you get the same high quality instruction and interaction we experienced in the course. Our Friends of the Course are also here to support you. Weijin, Mudita, Nathan and Yuhao are available for you per individual request. Just send an email to schedule a meeting online.

Course Staff	Contact	Meet
Alina Vereshchaka (Instructor)	avereshc[at]buffalo.edu	Mon, Wed 12:30-2pm & by appointments
Aman Johar (TA)	amanjoha[at]buffalo.edu	Tue, Thu 1-3pm & by appointments
Weijin Zhu (Friend of the Course)	weijinzh[at]buffalo.edu	By appointments
Mudita Sanjive (Friend of the Course)	muditasa[at]buffalo.edu	By appointments
Nathan Margaglio (Friend of the Course)	nmargag[at]buffalo.edu	By appointments
Yuhao Du (Friend of the Course)	yuhaodu[at]buffalo.edu	By appointments

Description

Reinforcement learning is an area of machine learning, where an agent or a system of agents learn to archive a goal by interacting with their environment. RL is often seen as the third area of machine learning, in addition to supervised and unsupervised areas, in which learning of an agent occurs as a result of its own actions and interaction with the environment.

In recent years there has been success in reinforcement learning research in both theoretical and applied fields. It was applied in a variety of fields such as robotics, pattern recognition, personalized medical treatment, drug discovery, speech recognition, computer vision, and natural language processing. This course primarily focuses on training students to frame reinforcement learning problems and to tackle algorithms from dynamic programming, Monte Carlo and temporal-difference learning. Students will progress towards larger state space environments using function approximation, deep Q-networks and state-of-the-art policy gradient algorithms. We will also go over the recent methods that are based on reinforcement learning, such as imitation learning, meta learning and more complex environment formulations.

Prerequisites

CSE 410: CSE 250, MTH 309 and one of the following: EAS 305 or MTH 411 or STA 301.
CSE 4/510: CSE4/574 or CSE4/555 or CSE4/573 or CSE4/568 is recommended to be either completed or taken during the same semester.

Redeem your Candy Question bonus.

[Bonus request should be submitted within two days after the lecture, otherwise they will expire.]

Syllabus

Date	Lecture Topic	Recommended Reading	Quiz	Assignments
Jan 27	Introduction to Reinforcement Learning		Quiz 1 UBlearns > Assignments	None
Jan 27	Course Logistics		Quiz 1 UBlearns > Assignments	None
Jan 29	Markov Decision Process	SB (Sutton and Barton) Ch. 3	Quiz 1	None
Jan 29	Modeling Choises	SB (Sutton and Barton) Ch. 3	Quiz 1	None
Feb 3	Polices, Value Functions	SB Ch. 3	Quiz 1 Quiz 2	Assignment 1 UBlearns > Assignments
Feb 5	Bellman Equations & Dynamic Programming	SB Ch. 4	Quiz 1 (Wed @11:59pm) Quiz 2 (Sun @11:59pm)	Assignment 1
Feb 10	Google Colab & Python Overview by Aman Johar	Bring your laptop to follow the discussion	Quiz 3	Assignment 1
	RL Env Basics by Aman Johar
	Assignment 1 Q&A
Feb 12	Learning and Planning with Tabular Methods	SB Ch. 5	Quiz 3	Assignment 1
Feb 12	Monte Carlo	SB Ch. 5	Quiz 3	Assignment 1
Feb 17	Monte Carlo & Temporal Difference	SB Ch. 6.1-6.3	Quiz 4	Assignment 1
Feb 17	Dynamic Programming demo by Joseph Naro	SB Ch. 6.1-6.3	Quiz 4	Assignment 1
Feb 19	Temporal Difference (TD(0), SARSA, Q-learning)	SB Ch. 6.4-6.5	Quiz 4	Assignment 1
Feb 19	TD(0) demo by Vaibhav Prakash Chhajed	SB Ch. 6.4-6.5	Quiz 4	Assignment 1
Feb 24	Temporal Difference (Double Q-learning, n-step Bootstrapping)	Ch. 6.7, 7	Quiz 5	Assignment 1
Feb 24	Q-learning demo by Raunaq Jain	Ch. 6.7, 7	Quiz 5	Assignment 1
Feb 26	n-step Bootstrapping & TD(λ)	Ch. 7	Quiz 5	Assignment 1 (Sun @11:59pm)
Feb 26	SARSA Demo by Bipul Kumar	Ch. 7	Quiz 5	Assignment 1 (Sun @11:59pm)
Mar 2	Tabular Methods Solutions Overview		None	Released Assignment 2
Mar 2	Midterm I Q&A		None	Released Assignment 2
Mar 4	Midterm I		None	Assignment 2
Mar 9	Applications in RL Worskshop (more details)		Quiz 6	Assignment 2
Mar 11	Linear Value Function Approximation I	- SB Ch. 9 - Gradient Descent Notes (Harvard) - Human-level control through deep reinforcement learning (paper)	Quiz 6	Assignment 2
Mar 11	Non-Linear Value Function Approximation (DQN) I		Quiz 6	Assignment 2
Mar 16	Spring Break		None	Assignment 2
Mar 18	Spring Break		None	Assignment 2
Mar 23	Linear Value Function Approximation II	- SB Ch. 9 - Gradient Descent Notes (Harvard)	Quiz 7	Assignment 2
Mar 25	Linear Value Function Approximation: Step-by-step	- Deep Reinforcement Learning with Double Q-learning (paper) - CS231n (Stanford) CNN (notes)	Quiz 7	Assignment 2
Mar 25	Non-Linear Value Function Approximation II (Double DQN)		Quiz 7	Assignment 2
Mar 30	Deep Q-Networks (Dueling DQN, Experience Replay, Rainbow DQN)	- Dueling Network Architectures for Deep Reinforcement Learning (paper) - Prioritized Experience Replay (paper) - Rainbow: Combining Improvements in Deep Reinforcement Learning	Quiz 8	Assignment 2
Apr 1	Imitation Learning: Behavior Cloning + Inverse Reinforcement Learning	- End to End Learning for Self-Driving Cars (paper covering DAVE-2) (paper) - DAVE-2 Driving Lincoln (YouTube)	Quiz 8	Due Assignment 2
Apr 6	Policy Gradient I	SB Ch. 13	Quiz 9	Released Assignment 3
	Policy Gradient Proof (for reference)
	Dueling DQN Demo by Nikita Manish Sawant
	Double DQN Demo by Utkarsh Shukla
Apr 8	Policy gradient II	SB Ch. 13	Quiz 9	Assignment 3
Apr 8	Actor-Critic	SB Ch. 13	Quiz 9	Assignment 3
Apr 13	Advanced Policy Gradient Methods (A3C, DPG, DDPG)	- SB Ch. 13 - Asynchronous Methods for Deep Reinforcement Learning by (paper) - Deterministic Policy Gradient Algorithms (paper)	Quiz 10	Assignment 3
Apr 15	Advanced Policy Gradient Methods (DDPG, Importance Sampling)	- Continuous control with deep reinforcement learning (paper)	Quiz 10	Due Assignment 3
Apr 15	A2C Demo by Salil Santosh Dabholkar		Quiz 10	Due Assignment 3
Apr 20	Advanced Policy Gradient Methods (TRPO, PPO)	- Post about PPO by OpenAI - Trust Region Policy Optimization (paper) - Proximal Policy Optimization Algorithms (paper) - Learning Dexterity (training a robot hand to manipulate objects using PPO)	Released Quiz 11	Final Project
Apr 22	Advanced RL (GORILA, A3C, IMPALA, Ape-X, R2D3, Agent57)	- Distributed Prioritized Experience Replay. Horgan, et al., 2018 (Ape-X paper) - Making Efficient Use of Demonstrations to Solve Hard Exploration Problems. Gulcehre, et al., 2018 (R2D3 paper) - Agent57: Outperforming the Atari Human Benchmar (paper) - Agent57: Outperforming the human Atari benchmark(blog post by DeepMind)	Due Quiz 11	Final Project
Apr 22	PPO Demo by Utkarsh Shukla		Due Quiz 11	Final Project
Apr 27	Multiagent Reinforcement Learning	- Egorov, Maxim. "Multi-agent deep reinforcement learning." (paper) - Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments (paper)	None	Final Project
Apr 27	DDPG Demo by Bipul Kumar		None	Final Project
Apr 29	Meta Learning	- One-Shot Imitation from Watching Videos by BAIR (UC Berkely) (blog post) - Meta Reinforcement Learning by Lilian Weng (blog post)	None	Final Project
May 4	Ethics & Safety in AI		None	Final Project
May 6	Final Review and Q&A		None	Final Project Presentation
May 6	Final Project Presentations		None	Final Project Presentation
May 11	Midterm II		None	None

Calendar

Add our schedule to your calendar here.

Logistics

Instructor: Alina Vereshchaka
Lectures: Mon, Wed 11:00 - 12:20pm, Webex ~~Baldy 110~~
Office hours: Mon, Wed 12:30pm - 2:00pm via Webex ~~@ Davis 338b~~ & by appointments
How to contact me: Please use Piazza for all questions related to lectures, quizes, and assignments. For any personal quaries, email avereshc[at]buffalo.edu

Teaching Assistant

Aman Johar (amanjoha[at]buffalo.edu)
Office hours: Tue, Thu 1:00 - 3:00pm via Webex ~~@ Davis Hall TA area and Davis 310~~ & by appointments

Friends of the Course

Mudita Sanjive (muditasa[at]buffalo.edu): Piazza & by appointments ~~Thu 12-1pm @ Davis Hall TA area~~
Weijin Zhu (weijinzh[at]buffalo.edu): Piazza & by appointments ~~Fri 5-6pm @ Davis Hall TA area~~
Nathan Margaglio (namargag[at]buffalo.edu): Piazza & by appointments
Yuhao Du (yuhaodu@buffalo.edu): By apporintments ~~You can meet Yuhao at Davis 309.~~

Key Topics

RL task formulation (action space, state space, environment definition)
Tabular based solutions (dynamic programming, Monte Carlo, temporal-difference)
Function approximation solutions (Deep Q-networks)
Policy gradient from basic (REINFORCE) towards advanced topics (Trust Regions Policy Optimization (TRPO), Proximal Policy Optimization (PPO), Deep Deterministic Policy Gradient (DDPG), etc.)
Model-based reinforcement learning
Imitation learning (behavioral cloning, inverse RL)
Meta-learning
Multi-agent reinforcement learning (MARL), partial observable environments

Grading Rubrics

Course Component	% of grade
Assignments [3 assignments: 15% + 15% + 10%]	40%
Final Project	20%
Weekly Quizzes	10%
Midterm 1	15%
Midterm 2	15%

Bonus Points

Piazza Rockstar
Jupyter Demo Time
Candy Questions
Poster Session Partiipation
Other activities to be released as the course goes

Late Day Policy

Students can use up to 5 free late days throughout the course that can be applied towards the assignments and/or a course project (some assignments may have a hard deadline)
A late day extends the deadline by 24 hours
If there is more than 5 days after the deadline, a penalty of 25% for one day will be applied to any work submitted after that time
For the group work [course project] max late days left among teammates can be applied

Weekly Quizes - How does it work?

Released every Monday 9:00am, due by Sunday 11:59pm
Can be found at UBlearns > Assignments
Each quiz will contain 3-4 problems on topics covered that week
The quiz can be either multiple choise questions or written
11 quizzes in total, only 10 quizzes with the highest scores will be counted toward the final grade
If the quizz is multi-choice questions - three attempts allowed, only the highest score will be kept

Generic requirements:

Experience in Python

All class assignments are composed in Python (using numpy / Tensorflow / Keras / PyTorch). If student has extensive programming experience in a different language (e.g. C/ C++/ Matlab/ Javascript) this will be acceptable.

Calculus, Linear Algebra

Student should be comfortable taking derivatives and understanding matrix vector operations and notations.

Basic Probability and Statistics

Student should know basics of probabilities, conditional probability distributions, conditional expectations, Gaussian distributions, mean, standard deviation, etc.

Foundations of Machine Learning and Deep Learning

Some knowledge of machine learning and deep learning will help to better understand the cost functions, approximation methods, NN and CNN structures.

Reference Materials

There is no official textbook for the class, but a number of the supporting readings will come from:

Richard S. Sutton and Andrew G. Barto, "Reinforcement learning: An introduction", Second Edition, MIT Press, 2019 - is a classical book and covers all the basics
Lecture slides, relevant papers, and other materials will be added in the table above

Additional references, that can be useful:

Li, Yuxi. "Deep reinforcement learning." arXiv preprint arXiv:1810.06339 (2018). - an overview of the latest algorythms and applications in reinfocment learning
David Silver's course on Reiforcement Learning

Useful RL Materials

MDP Cheatsheet Reference (2 pages) by John Schulman (pdf)
Neural Network tutorial Article + Sample code

Usefull Tools:

Overleaf (LaTex online document generator) - great tool for creating reports (referral link)
Google Colab (online Jupyter Notebook with free GPU) (link)

Academic Integrity Policy

Academic integrity is a fundamental university value. No collaboration, cheating, and plagiarism is allowed in projects, quizes, and the exam. Those found violating academic integrity will get an immediate F in the course.

All the submissions will be checked using SafeAssign (integrated tool into UBlearns). This tool checks the submissions for originality. After each submission it returns similarity score along with the original source. It is based on the submitted works for the past semesters as well the current submissions. If there is a high similarity between the submissions from the same class, ALL students involved will be recorded though the Academic Dishonesty Report.
The submissions should include all the references. Kindly note that referencing the source does not mean you can copy/paste it fully and submit as your original work. Updating the hyperparameters or modifying the existing code is a subject to plagiarism. Your work has to be original.
Submitting a material that has been previously submitted, in whole or in substantial part is a subject for further investigation.
You work has to be done on your personally owned device or university provided equipment. Completing the assignment on your friend’s / neighbor's device is a subject for further investigation for both parties involved. Any cheating during the midterms will be subject to immediate 0.
In any suspicious case that includes a student or a group of students will be immediately passed to the undergraduate/graduate adviser and the case will be recorded using the Academic Dishonesty Report Form.
Please read more about UB Academic Integrity Policy 2019-20.

We want you to demonstrate your own achievements and showcase your own abilities during the course! From the course instructors side, we are glad to provide you all the help needed for you to succeed in the course. Here is some of the free resources provided by the University:

If you need help with English, check UB Writing Center (http://www.buffalo.edu/writing/make-an-appointment-old.html)
If you have issues with your device, the University provides access to computers, as well as equipment loans.
Your well-being is highly important, if you have any concerns, make sure to check Counseling Service.

Academic Integrity is a very high priority not only for our Department, but the University as a whole. We are glad to provide you help to ensure you achieve great results during the course, however we are not tolerate any kind of cheating.

Accessibility Resources

If you have any disability which requires reasonable accommodations to enable you to participate in this course, please contact the Office of Accessibility Resources, 60 Capen Hall, 645-2608, and also the instructor of this course. The office will provide you with information and review appropriate arrangements for reasonable accommodations. More details.

Diversity

The UB School of Engineering and Applied Sciences considers the diversity of its students, faculty, and staff to be a strength, critical to our success. We are committed to providing a safe space and a culture of mutual respect and inclusiveness for all. We believe a community of faculty, students, and staff who bring diverse life experiences and perspectives leads to a superior working environment, and we welcome differences in race, ethnicity, gender, age, religion, language, intellectual and physical ability, sexual orientation, gender identity, socioeconomic status, and veteran status.

FAQ

Is there any GPU available to use for our projects?

CCR is supporting our course with accessing to powefull GPU servers. If you need access to GPU, create an account at CCR and let me know, so I will add you to the resources.

I am in the waiting list, can you help me to enrol?

Unfortunately there is nothing we can do at this time. I would suggest to keep an eye at the enrollment. Typically some students drop the course right before the drop-date deadline (Feb 4), so if your are in the waiting list, there is a high chance you will get enrolled, so I would strongly suggest to visit the lectures, before the enrolment is finilized, even if you are not registered at this time.

Can this course satisfy breadth/depth requirement?

Yes, the course can be used to satisfy the depth requirement for the AI focus area for graduate level (CSE 510).

What programming language will be used?

We will be using Python as the programming language for the projects, also familarity with Keras/Tensorfow/PyTorch will help.

Is attendance required?

Attendance is not required but is encouraged. Sometimes we may do in class exercises or discussions related to quizes or projects and these are harder to do and benefit from by yourself

I am highly interested in the course, can audit it?

Typically I welcome students interested in the topics to audit the course. Unfortunately this Spring our scheduled room is not big enough to fill all people interested. You are welcome to drop me an email one week after the class begins, I will give you updates if there is some space available.

Any suggestions or comments?

I would be glad to get a feedback from you, just send me an email.

CSE4/510: Reinforcement Learning

Spring 2020, Lectures: Mon/Wed 11:00am - 12:20pm

Course News

Our Group

Selected Final Projects Presentations

Course updates to reflect the COVID-19 induced transition to online classes [March 19, 2020]

Classes

Assignments

Final Project

Midterm II

Quizzes

Bonus Points

Syllabus

Office Hours

Communication

Course Staff

Description

Prerequisites

Redeem your Candy Question bonus.

Syllabus

Calendar

Logistics

Teaching Assistant

Friends of the Course

Key Topics

Grading Rubrics

Bonus Points

Late Day Policy

Weekly Quizes - How does it work?

Generic requirements:

Reference Materials

Useful RL Materials

Usefull Tools:

Academic Integrity Policy

Accessibility Resources

Diversity

FAQ

Is there any GPU available to use for our projects?

I am in the waiting list, can you help me to enrol?

Can this course satisfy breadth/depth requirement?

What programming language will be used?

Is attendance required?

I am highly interested in the course, can audit it?

Any suggestions or comments?