Course News

Dec 23: Happy Holidays!
Dec 23: Ethics in AI Workshop photos have been posted here
Dec 23: Congratulations to Luckyson Khaidem and Ankit Anand for receiving the Best Research Award for their work "Asynchronous DDQN ensemble through shared experience learning". (Details)
Dec 23: We had three teams presenting reinforcement learning projects during the CSE Demo Days 2019. Great work! (Details)
Nov 27: Hope everyone has a wonderful Thanksgiving and Fall break!
Nov 27: Please complete our course evaluation at SmartEval
Nov 21: [Project 3] Register your team for presentation here.
Nov 8: [Project 3] Part 1 & Proposal is due Nov 13
Oct 29: Check out details on how to take part in CSE Demo Days
Oct 17: Project 2 is due in 10 days!
Oct 17: New fascinating project by OpenAI -- Solving Rubik’s Cube with a Robot Hand
Oct 15: Register your team (up to 2 people) for Project 3 here. Deadline: Nov 1
Oct 8: Project 1 grades and mid-Semester review are released
Oct 1: Tetris Challenge (bonus points) is due Wednesday, 2 October at 10pm [details on Piazza]
Sep 19: Check out the new project by OpenAI: Emergent Tool Use from Multi-Agent Interaction
Sep 19: Project 1 is due in 10 days!
Sep 3: Quiz 0 is due on Wednesday @11:59pm
Sep 3: Welcome our Friends of the Course, Nathan and Anurag, who are here to give you additional help exploring RL
Aug 29: If you are looking for something to watch this weekend, here is a good movie about AlphaGo, which we talked about today (Trailer)
Aug 27: Please complete the survey here
Aug 27: Welcome to all returning & new students. We have a great semester ahead!

Our Group

Thank you everyone for a great semester! It was a great pleasure working with all of you during our challenging course. I wish you a happy and productive winter break and hope that you will continue your journey in reinforcement learning!

I also want to give special thanks to the course TA Yuhao Du and our Friends of the Course Anurag Saykar and Nathan Margaglio for their constant support. Thank you for all of your effort in encouraging students, keeping them motivated, and supporting them throughout our course!

RL Course Fall19 Group Photo: Reinforcement Learning Course, Fall 2019

Description

This course is intended for students interested in artificial intelligence. Reinforcement learning is an area of machine learning where an agent learns how to behave in an environment by performing actions and assessing the results. Reinforcement learning is how Google DeepMind created the AlphaGo system that beat a high-ranking Go player and how AlphaStar became the first artificially intelligent system to defeat a top professional player in StarCraft II. We will study the fundamentals and practical applications of reinforcement learning and will cover the latest methods used to create agents that can solve a variety of complex tasks, with applications ranging from gaming to finance to robotics.
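As a rough illustration of "performing actions and assessing the results", here is a minimal Python sketch of an agent interacting with an environment. It is not course material: the `TwoActionEnv` class, its reward probabilities, and the epsilon-greedy rule are all assumptions chosen to keep the example small.

```python
import random


class TwoActionEnv:
    """Hypothetical environment: action 1 is rewarded more often than action 0."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def step(self, action):
        # Reward is 1.0 with probability 0.8 for action 1 and 0.2 for action 0.
        p_reward = 0.8 if action == 1 else 0.2
        return 1.0 if self.rng.random() < p_reward else 0.0


def run_agent(env, steps=2000, epsilon=0.1, seed=1):
    """Epsilon-greedy agent that keeps a running average reward per action."""
    rng = random.Random(seed)
    values = [0.0, 0.0]  # estimated reward of each action
    counts = [0, 0]      # how often each action was tried
    for _ in range(steps):
        # Perform an action: explore at random sometimes, otherwise act greedily.
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = 1 if values[1] > values[0] else 0
        # Assess the result and update the estimate for that action.
        reward = env.step(action)
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
    return values, counts


if __name__ == "__main__":
    values, counts = run_agent(TwoActionEnv())
    print("estimated values:", values)
    print("times each action was chosen:", counts)
```

Over time the agent's estimate for the better action rises and it chooses that action more often, which is the basic idea the rest of the course builds on.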

Syllabus

Date	Lecture Topic	Recommended Reading	Quiz	Project
Aug 27	Introduction to Reinforcement Learning		Quiz 0 (UBlearns > Assignments) Released	None
Aug 27	Course Logistics		Quiz 0 (UBlearns > Assignments) Released	None
Aug 29	Markov Decision Process	SB (Sutton and Barto) Ch. 3	Quiz 0	None
Aug 29	Modeling Choices	SB (Sutton and Barto) Ch. 3	Quiz 0	None
Sep 3	Policies, Value Functions & Bellman Equations	SB (Sutton and Barto) Ch. 3	Quiz 0 is due on Wednesday @11:59pm Quiz 1 Released	Project 1 Released UBlearns > Assignments
Sep 3	Python/Google Colab overview	SB (Sutton and Barto) Ch. 3	Quiz 0 is due on Wednesday @11:59pm Quiz 1 Released	Project 1 Released UBlearns > Assignments
Sep 5	Dynamic Programming	SB (Sutton and Barto) Ch. 4	Quiz 1 Due Sunday @11:59pm	Project 1
Sep 5	Gym environments basics	SB (Sutton and Barto) Ch. 4	Quiz 1 Due Sunday @11:59pm	Project 1
Sep 10	Learning and Planning with Tabular Methods	SB Ch. 5	Quiz 2 Released	Project 1
Sep 10	Monte Carlo	SB Ch. 5	Quiz 2 Released	Project 1
Sep 12	Monte Carlo Tree Search	- Monte-Carlo tree search and rapid action value estimation in computer Go - SB Ch. 6.1-6.3	Quiz 2 Due Sunday @11:59pm	Project 1
	Temporal Difference (TD(0))
	TD(0) Demo by Aman Khurana
Sep 17	Tabular Methods Review	SB Ch. 6.4-6.5	Quiz 3 Released	Project 1
	Model Free RL (Temporal Difference)
	SARSA Demo by Manan Ajit Oza
Sep 19	Model Free RL (Q-Learning)	SB Ch. 6.4-6.5	Quiz 3 Due Monday @11:59pm	Project 1
	Q-learning Demo by Yash Nitin Mantri
Sep 24	Q-learning step-by-step	SB Ch. 9.1-9.4	Quiz 4 Released	Project 1
Sep 24	RL with function approximation	SB Ch. 9.1-9.4	Quiz 4 Released	Project 1
Sep 26	Linear Value Function Approximation	- SB Ch. 9 - Gradient Descent Notes (Harvard)	Quiz 4 Due Monday @11:59pm	Project 1 Due: Sunday @ 11:59pm
Oct 1	Linear Value Function Approximation: Step-by-step	- Human-level control through deep reinforcement learning (paper)	Quiz 5 Released	Project 2 Released
	Non-Linear Value Function Approximation I
Oct 3	Non-Linear Value Function Approximation II (DQN)	- Playing Atari with Deep Reinforcement Learning (paper) - CS231n(Stanford) CNN (notes)	Quiz 5 Due Monday @11:59pm	Project 2
Oct 3	Convolutional Neural Networks (CNN) Review		Quiz 5 Due Monday @11:59pm	Project 2
Oct 8	Double Deep Q-Networks	- Deep Reinforcement Learning with Double Q-learning (paper)	Quiz 6	Project 2
Oct 8	DQN Demo by Arooshi Avasthy		Quiz 6	Project 2
Oct 10	Deep Q-Networks (Dueling DQN, Experience Replay)	- Dueling Network Architectures for Deep Reinforcement Learning (paper) - Prioritized Experience Replay (paper)	Quiz 6 Due Monday @11:59pm	Project 2
Oct 10	Imitation Learning: Behavior Cloning		Quiz 6 Due Monday @11:59pm	Project 2
Oct 15	Imitation Learning: Behavior Cloning + Inverse Reinforcement Learning	- End to End Learning for Self-Driving Cars (paper covering DAVE-2) (paper) - DAVE-2 Driving Lincoln (YouTube)	Quiz 7	Project 2 Project 3 [DRAFT] released
Oct 15	Double DQN Demo by Xiao Zhang		Quiz 7	Project 2 Project 3 [DRAFT] released
Oct 17	Policy gradient I (Policy Gradient Theorem)	SB Ch. 13	Quiz 7 Due Monday @11:59pm	Project 2
Oct 22	Policy gradient II (REINFORCE)	SB Ch. 13	Quiz 8	Project 2
Oct 24	Tabular Methods and Deep Q-Networks Review		Quiz 8 Due Monday @11:59pm	Project 2 Due: Sunday @ 11:59pm
Oct 29	Policy gradient with Baselines	SB Ch. 13	Quiz 9	Project 3
Oct 29	REINFORCE Demo by Xiao Zhang	SB Ch. 13	Quiz 9	Project 3
Oct 31	Actor-Critic (A2C, A3C)	- SB Ch. 13 - Asynchronous Methods for Deep Reinforcement Learning (paper)	Quiz 9 Due Monday @11:59pm	Project 3 Register your team by Nov 1
Nov 5	Advanced Policy Gradient Methods (DPG, DDPG, Importance Sampling)	- Deterministic Policy Gradient Algorithms (paper) - Continuous control with deep reinforcement learning (paper)	Quiz 10	Project 3
Nov 7	Advanced Policy Gradient Methods (TRPO, PPO)	- Post about PPO by OpenAI - Trust Region Policy Optimization (paper) - Proximal Policy Optimization Algorithms (paper)	Quiz 10 Due Monday @11:59pm	Project 3 Part 1 is due Nov 13
Nov 12	MDP + POMDP + Dec-POMDP		Quiz 11	Project 3 Part 1 & Proposal is due Nov 13
Nov 14	Multiagent Reinforcement Learning	- Egorov, Maxim. "Multi-agent deep reinforcement learning." (paper) - Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments (paper)	Quiz 11 Due Monday @11:59pm	Project 3
Nov 19	Model-Based Reinforcement Learning (pdf)	- Deep Dynamics Models for Dexterous Manipulation by BAIR (UC Berkeley) (blog post) - Model Based Reinforcement Learning for Atari (paper)	Quiz 12	Project 3
Nov 21	Meta Learning	- One-Shot Imitation from Watching Videos by BAIR (UC Berkeley) (blog post) - Meta Reinforcement Learning by Lilian Weng (blog post)	Quiz 12 Due Monday @11:59pm	Project 3
Nov 26	Ethics in AI		No Quiz	Project 3
Nov 28	Thanksgiving Break
Dec 3	Review Session		No Quiz	Project 3 Presentations
Dec 5	Review Session & End of the Course		No Quiz	None

CSE Demo Day Results

Three teams were chosen to present their work during the CSE Demo Day on December 6, 2019. More about CSE Demo Days.

Luckyson Khaidem and Ankit Anand "Asynchronous DDQN ensemble through shared experience learning". View poster
Best Research Award (more details)

Alina Vereshchaka, Luckyson Khaidem and Ankit Anand (source)

Vishva Nitin Patel and Leena Manohar Patil "Continuous control with deep reinforcement learning". View poster

Leena Manohar Patil and Vishva Nitin Patel

Shashank Bhat and Anirudh Sridhar "Policy Gradient Updation using Proximal Policy Optimization". View poster

Logistics

Instructor: Alina Vereshchaka
Lectures: Tue, Thu 3:30 - 4:50pm, Norton 213
Office hours: Tue, Thu 2:00 - 3:00pm @ Davis Hall or online & by appointments
How to contact me: Please use Piazza for all questions related to lectures, quizzes, and assignments. For any personal queries, email avereshc[at]buffalo.edu

Teaching Assistant

Yuhao Du (yuhaodu[at]buffalo.edu)
Office hours: Mon, Wed 3:30 - 6:00pm @ Davis Hall TA area or online & by appointments

Friends of the Course

Nathan Margaglio (namargag[at]buffalo.edu): Piazza & by appointments
Anurag Saykar (anuragan[at]buffalo.edu): Piazza & Thu 5-6pm @ Davis Hall TA area

Calendar

Add our schedule to your calendar here.

CSE Demo Day Fall 2019

You are welcome to take part in the CSE Demo Day, which is held at the end of every semester by the CSE Department. See details from previous events here.

Details:

Time: Dec 6, noon - 6pm
Location: Davis Hall, 1st and 2nd floors
It is a great opportunity to meet with local companies
Research posters template can be found here

To participate in the CSE Demo Day as part of CSE510 RL course:

Choose a topic for Project 3
Get preliminary results
Send an email with the poster draft, project name and short description to avereshc[at]buffalo.edu by Nov 26
If selected
- The CSE Department will help with printing the poster (up to 24" x 36")
- It will satisfy your presentation points for Project 3
- You may get bonus points for your efforts!

Reference Materials

There is no official textbook for the class, but a number of the supporting readings will come from:

Richard S. Sutton and Andrew G. Barto, "Reinforcement Learning: An Introduction", Second Edition, MIT Press, 2018 - a classic book that covers all the basics
Lecture slides, relevant papers, and other materials will be added in the table above

Additional references that may be useful:

Li, Yuxi. "Deep reinforcement learning." arXiv preprint arXiv:1810.06339 (2018). - an overview of the latest algorithms and applications in reinforcement learning
David Silver's course on Reinforcement Learning

Useful RL Materials

MDP Cheatsheet Reference (2 pages) by John Schulman (pdf)
Neural Network tutorial Article + Sample code

Useful Tools:

Overleaf (LaTeX online document generator) - great tool for creating reports (link (referral))
Google Colab (online Jupyter Notebook with free GPU) (link)

Evaluation

50% - Projects (3 projects: 15 + 15 + 20)
20% - Weekly quizzes
30% - Final Exam

Bonus points

Piazza Rockstar
Demo Time
Candy Questions

Note: Each component will receive a numerical score. The course grade will be based on the weighted total of all components and the class curve. The curve will be different for CSE410 and CSE510. The exam will be closed-book and closed-notes.
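To make the weighted total concrete, here is a small Python sketch. It assumes every component is scored on a 0-100 scale and reads "15 + 15 + 20" as the percentage each project contributes to the course grade. The function and key names are illustrative only, and the class curve and bonus points are not modeled.

```python
# Weights in percent, taken from the Evaluation list above.
WEIGHTS = {
    "project1": 15,
    "project2": 15,
    "project3": 20,
    "quizzes": 20,
    "final_exam": 30,
}


def weighted_total(scores):
    """Return the weighted total (0-100) from per-component scores (0-100)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    example = {
        "project1": 90,
        "project2": 80,
        "project3": 85,
        "quizzes": 95,
        "final_exam": 70,
    }
    print(weighted_total(example))
```

The result is only the pre-curve total; the final letter grade also depends on the curve for CSE410 or CSE510.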

Office hours

Office hours start on Thursday, Aug 29
Office hours can be held in person at Davis 310 or online. For an online meeting, you will need to create a Google Hangout event and invite me (avereshc[at]buffalo.edu)

Projects

The course consists of three projects. Projects will be done individually.

Project 1 - Defining and solving RL environment
Project 2 - Value Function Approximation (Deep Q-Networks)
Project 3 - Policy gradient

Late Day Policy

You can use up to 5 late days
A late day extends the deadline by 24 hours
If you use more than 5 late days, a penalty of 25% per day will be applied to any work submitted after that time.

Weekly Quizzes - How do they work?

Released every Tuesday 9:00am, due by Monday 11:59pm
Can be found at UBlearns > Assignments
Each quiz will contain 3-4 problems on topics covered that week
At the end of a submission the system will give you the final score
13 quizzes in total, only 12 quizzes with the highest scores will be counted toward the final grade
Three attempts allowed, only the highest score will be kept
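The two scoring rules above (best of three attempts, best 12 of 13 quizzes) can be sketched in Python as follows. This is an illustration rather than the UBlearns implementation: it assumes all quizzes share the same score scale, that a quiz with no attempts counts as 0, and that the quiz component is the plain average of the counted quizzes.

```python
def quiz_component(attempts_per_quiz, counted=12):
    """Average of the best `counted` quizzes, using each quiz's best attempt.

    attempts_per_quiz: one list of attempt scores per quiz (up to three each).
    """
    # Keep only the highest attempt for each quiz; no attempts counts as 0.
    best_scores = [max(attempts) if attempts else 0.0 for attempts in attempts_per_quiz]
    # Count only the highest-scoring quizzes toward the grade.
    top_scores = sorted(best_scores, reverse=True)[:counted]
    return sum(top_scores) / counted


if __name__ == "__main__":
    # Twelve quizzes with two attempts each, plus one missed quiz.
    attempts = [[60, 90]] * 12 + [[]]
    print(quiz_component(attempts))
```

In this example the missed quiz is the one dropped, so it does not lower the quiz component.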

Academic Integrity Policy

Academic integrity is a fundamental university value. No collaboration, cheating, or plagiarism is allowed in projects, quizzes, or the exam. Those found violating academic integrity will get an immediate F in the course. Please refer to the Academic Integrity Policy for more details.

FAQ

When is the final?

The final exam is scheduled for Dec 12, 3:30-6:30pm. You can also check it in your MyUB Hub account.

Can this course satisfy breadth/depth requirement?

Yes, the course can be used to satisfy the depth requirement for the AI focus area.

What do I need to do before class starts?

Sign up for Piazza (if you do not have an account already) and enroll in the "CSE 4/510 Introduction to Reinforcement Learning" class.
Confirm that the class shows up in your UBLearns account.

Where is the course syllabus?

Syllabus is here.

What programming language will be used?

We will be using Python as the programming language for the projects. Familiarity with Keras/TensorFlow/PyTorch will also help.

Is attendance required?

Attendance is not required but is encouraged. Sometimes we may do in-class exercises or discussions related to quizzes or projects, and these are harder to do and benefit from on your own.

I am highly interested in the course, but I cannot register, can I attend?

Typically I welcome students interested in the topics to audit the course. Unfortunately, this Fall our scheduled room is not big enough to fit everyone interested. You are welcome to email me one week after the class begins, and I will let you know if space is available.

CSE4/510: Reinforcement Learning

Fall 2019, Lectures: Tue/Thu 3:30 - 4:50pm, Norton 213