- Feb 26 Follow Piazza for more recent course updates
- Feb 3 Our calendar has been updated with all OH (including Friends of the Course OH)
- Feb 3 We are proud that some of the brightest students from previous semesters are joining our instructional team as Friends of the Course. Please welcome Mudita, Weijin, and Nathan! They are here to give you additional help and support your success throughout the course.
- Jan 27 The movie about AlphaGo that we discussed today (Trailer)
- Jan 27 Please complete the beginning of the semester survey here
- Jan 27 Welcome to all returning & new students. We have a great semester ahead!
Our Group
This was a great, although challenging, semester! We faced a major challenge in transitioning from offline to online learning. I am so grateful that you successfully rose to the challenge of this transition and that we gained even more than expected.
Selected Final Projects Presentations
- Augmented Random Search (MuJoCo) Gautam Suryawanshi, Prajit Krisshna Kumar
- Group Equivariant Q-networks Kiran Vaddi
- Ship Docker Problem (MARL) Bhavin Jawade, Vamshi Krishna
- DeepRacer Car Xingtong Li
- City On Fire: Multi Agent Reinforcement Learning Srisai Karthik Neelamraju, Anantha Srinath Sedimbi
- UAV Learning to Fly Rugved Srinath Hattekar, Chen Zeng, Nanditha
- AWS DeepRacer Raunaq Jain, Bipul Kumar, Joseph Naro
- Halite III: Multi-agent Learning Victor Lee
- Cracking PHYRE with a World Model Sheng Liu
- Connected Autonomous Driving (Macad-Gym) Salil Dabholkar
- Building Goal Oriented Chatbots Charan Nama Arunkumar
- Multi-Agent RL Hoan Tran
- HTTP Streaming & Optimization Areas Bekir Oguzhan Turkkan
- Multi Agent Reinforcement Learning Peter M. VanNostrand
- Chase Tag Game (MARL) Gaurav Vijay Vergiya
Course updates to reflect the COVID-19 induced transition to online classes [March 19, 2020]
There will be a few changes in our course for the rest of the semester.
- Our classes will be held online using the WebEx platform during our regular course hours (Monday and Wednesday, 11:00am - 12:20pm). Please install WebEx here and try to log in to the platform about 5 minutes before the beginning of class.
- To maintain the same level of interaction, during the online lectures we will have pop-up quizzes and polls (not graded) to make sure you are up to date with the material covered.
There are no changes to our assignments. Assignment 2 and Assignment 3 will follow the same due dates and should be submitted through UBlearns.
- Checkpoint and final materials will have to be submitted through UBlearns as planned.
- Final Project presentations will be done online. We will allocate WebEx or Google Hangouts time slots beforehand.
Our second midterm will be held online on the scheduled exam date (May 11, 2020). It will be administered through UBlearns using LockDown Browser.
To ensure you have some interaction with your peers, we will allow you to complete quizzes collaboratively. Every week you will be randomly matched with someone from our class, and you will have to connect online and discuss the quiz questions and answers. This interaction is not mandatory, so if you do not wish to participate in this collaboration, please let me know. We wish to ensure that you have some active social interaction with your peers so that you may learn from each other and help each other understand course materials. This is an essential part of educational life, so we want to provide this for you.
- Jupyter Demo time will be held as usual during the second half of class through WebEx; current topics are listed on Piazza.
- We will have an active chat at our disposal and we will also welcome voice interactions, so we hope to keep the Candy Question Bonuses as usual (unfortunately without actual candy).
- More bonus point opportunities will be released as the course progresses.
Syllabus
There will not be many changes to the course syllabus and assessments. The updated version can be found here.
Meetings will be held online through WebEx during our usual OH schedule. During this time, there will be an active link that you can use to join. The link will be available to everyone, so please contact course staff directly to set up an individual meeting if you feel uncomfortable sharing your question with your peers.
We will continue to use Piazza as the main resource for your questions, course news and all other details. The course materials and syllabus are posted on our course website. If you have any personal query or want to schedule an individual meeting, send us an email:
We, the course staff team, wish to assure you that we will do our best to ensure you get the same high-quality instruction and interaction you have experienced in the course. Our Friends of the Course are also here to support you. Weijin, Mudita, Nathan, and Yuhao are available upon individual request. Just send an email to schedule a meeting online.
|Alina Vereshchaka (Instructor)||avereshc[at]buffalo.edu||Mon, Wed 12:30-2pm & by appointments|
|Aman Johar (TA)||amanjoha[at]buffalo.edu||Tue, Thu 1-3pm & by appointments|
|Weijin Zhu (Friend of the Course)||weijinzh[at]buffalo.edu||By appointments|
|Mudita Sanjive (Friend of the Course)||muditasa[at]buffalo.edu||By appointments|
|Nathan Margaglio (Friend of the Course)||nmargag[at]buffalo.edu||By appointments|
|Yuhao Du (Friend of the Course)||yuhaodu[at]buffalo.edu||By appointments|
Reinforcement learning is an area of machine learning in which an agent, or a system of agents, learns to achieve a goal by interacting with its environment. RL is often seen as the third area of machine learning, alongside supervised and unsupervised learning, in which an agent learns as a result of its own actions and interactions with the environment.
In recent years, reinforcement learning research has seen success in both theoretical and applied fields. It has been applied in a variety of areas such as robotics, pattern recognition, personalized medical treatment, drug discovery, speech recognition, computer vision, and natural language processing. This course primarily focuses on training students to frame reinforcement learning problems and to apply algorithms from dynamic programming, Monte Carlo, and temporal-difference learning. Students will progress toward larger state-space environments using function approximation, deep Q-networks, and state-of-the-art policy gradient algorithms. We will also cover recent methods based on reinforcement learning, such as imitation learning, meta-learning, and more complex environment formulations.
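To make the agent-environment framing concrete, here is a minimal sketch of the interaction loop the course builds on. It uses a hypothetical toy `CorridorEnv` written for illustration (not course code); the `reset`/`step` interface mirrors the one used by common RL libraries such as OpenAI Gym.

```python
import random

class CorridorEnv:
    """Toy environment: the agent starts at cell 0 and must reach cell 4.

    Actions: 0 = step left, 1 = step right. Reward is -1 per step and
    +10 on reaching the goal, so shorter episodes score higher.
    """
    GOAL = 4

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # Move, clamped at the left wall.
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        done = self.pos == self.GOAL
        reward = 10.0 if done else -1.0
        return self.pos, reward, done

def run_episode(env, policy, max_steps=100):
    """Roll out one episode: the agent acts, the environment responds."""
    state = env.reset()
    total = 0.0
    for _ in range(max_steps):
        action = policy(state)            # agent picks an action from the state
        state, reward, done = env.step(action)  # environment returns next state + reward
        total += reward
        if done:
            break
    return total

random.seed(0)
# A uniformly random policy; any callable state -> action works here.
print(run_episode(CorridorEnv(), lambda s: random.choice([0, 1])))
```

The always-go-right policy `lambda s: 1` finishes in 4 steps for a return of 7.0; learning algorithms later in the course improve the policy from experience instead of hand-coding it.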
- CSE 410: CSE 250, MTH 309 and one of the following: EAS 305 or MTH 411 or STA 301.
- CSE 4/510: CSE4/574 or CSE4/555 or CSE4/573 or CSE4/568 is recommended to be either completed or taken during the same semester.
Redeem your Candy Question bonus.
[Bonus requests should be submitted within two days after the lecture; otherwise, they will expire.]
|Date||Lecture Topic||Recommended Reading||Quiz||Assignments|
|Jan 27||Introduction to Reinforcement Learning||
UBlearns > Assignments
|Jan 29||Markov Decision Process||SB (Sutton and Barto) Ch. 3||Quiz 1||None|
|Feb 3||Policies, Value Functions||SB Ch. 3||
UBlearns > Assignments
|Feb 5||Bellman Equations & Dynamic Programming||SB Ch. 4 ||
Quiz 1 (Wed @11:59pm)
Quiz 2 (Sun @11:59pm)
|Feb 10||Google Colab & Python Overview by Aman Johar||Bring your laptop to follow the discussion||
|RL Env Basics by Aman Johar|
|Assignment 1 Q&A|
|Feb 12||Learning and Planning with Tabular Methods||SB Ch. 5 ||
|Feb 17||Monte Carlo & Temporal Difference||SB Ch. 6.1-6.3||
|Dynamic Programming demo by Joseph Naro|
|Feb 19||Temporal Difference (TD(0), SARSA, Q-learning)||SB Ch. 6.4-6.5||
|TD(0) demo by Vaibhav Prakash Chhajed|
|Feb 24||Temporal Difference (Double Q-learning, n-step Bootstrapping)||Ch. 6.7, 7||
|Q-learning demo by Raunaq Jain|
|Feb 26||n-step Bootstrapping & TD(λ)||Ch. 7||
||Assignment 1 (Sun @11:59pm)|
|SARSA Demo by Bipul Kumar|
|Mar 2||Tabular Methods Solutions Overview||None
||Released Assignment 2|
|Midterm I Q&A|
|Mar 4||Midterm I||None
|Mar 9||Applications in RL Workshop (more details)||
|Mar 11||Linear Value Function Approximation I||- SB Ch. 9|
- Gradient Descent Notes (Harvard)
- Human-level control through deep reinforcement learning (paper)
|Non-Linear Value Function Approximation (DQN) I|
|Mar 16||Spring Break||None
|Mar 18||Spring Break||None
|Mar 23||Linear Value Function Approximation II|| - SB Ch. 9|
- Gradient Descent Notes (Harvard)
Assignment 2 |
|Mar 25||Linear Value Function Approximation: Step-by-step||
- Deep Reinforcement Learning with Double Q-learning (paper)
- CS231n (Stanford) CNN (notes)
|Non-Linear Value Function Approximation II (Double DQN)|
|Mar 30||Deep Q-Networks (Dueling DQN, Experience Replay, Rainbow DQN)||
- Dueling Network Architectures for Deep Reinforcement Learning (paper)|
- Prioritized Experience Replay (paper)
- Rainbow: Combining Improvements in Deep Reinforcement Learning
|Quiz 8||Assignment 2|
|Apr 1||Imitation Learning: Behavior Cloning + Inverse Reinforcement Learning||
- End to End Learning for Self-Driving Cars (paper covering DAVE-2)
- DAVE-2 Driving Lincoln (YouTube)
|Quiz 8||Due Assignment 2|
|Apr 6||Policy Gradient I||SB Ch. 13||Quiz 9||Released Assignment 3|
|Policy Gradient Proof (for reference)|
|Dueling DQN Demo by Nikita Manish Sawant|
|Double DQN Demo by Utkarsh Shukla|
|Apr 8||Policy gradient II||SB Ch. 13||Quiz 9||Assignment 3|
|Apr 13||Advanced Policy Gradient Methods (A3C, DPG, DDPG)|| - SB Ch. 13
- Asynchronous Methods for Deep Reinforcement Learning (paper)
- Deterministic Policy Gradient Algorithms (paper)
|Quiz 10||Assignment 3|
|Apr 15||Advanced Policy Gradient Methods (DDPG, Importance Sampling)|
- Continuous control with deep reinforcement learning (paper)
|Quiz 10||Due Assignment 3|
|A2C Demo by Salil Santosh Dabholkar|
|Apr 20||Advanced Policy Gradient Methods (TRPO, PPO)||- Post about PPO by OpenAI
- Trust Region Policy Optimization (paper)
- Proximal Policy Optimization Algorithms (paper)
- Learning Dexterity (training a robot hand to manipulate objects using PPO)
|Released Quiz 11||Final Project|
|Apr 22||Advanced RL (GORILA, A3C, IMPALA, Ape-X, R2D3, Agent57)||
- Distributed Prioritized Experience Replay. Horgan, et al., 2018 (Ape-X paper) |
- Making Efficient Use of Demonstrations to Solve Hard Exploration Problems. Gulcehre, et al., 2018 (R2D3 paper)
- Agent57: Outperforming the Atari Human Benchmark (paper)
- Agent57: Outperforming the human Atari benchmark (blog post by DeepMind)
|Due Quiz 11||Final Project|
|PPO Demo by Utkarsh Shukla|
|Apr 27||Multiagent Reinforcement Learning|| - Egorov, Maxim. "Multi-agent deep reinforcement learning." (paper) |
- Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments (paper)
|DDPG Demo by Bipul Kumar|
|Apr 29||Meta Learning|| - One-Shot Imitation from Watching Videos by BAIR (UC Berkeley) (blog post) |
- Meta Reinforcement Learning by Lilian Weng (blog post)
|May 4||Ethics & Safety in AI||None||Final Project|
|May 6||Final Review and Q&A||None||Final Project Presentation|
|Final Project Presentations|
|May 11||Midterm II||None
Calendar
Add our schedule to your calendar here.
- Instructor: Alina Vereshchaka
- Lectures: Mon, Wed 11:00 - 12:20pm, Webex
- Office hours: Mon, Wed 12:30pm - 2:00pm via Webex
@ Davis 338b & by appointments
- How to contact me: Please use Piazza for all questions related to lectures, quizzes, and assignments. For any personal queries, email avereshc[at]buffalo.edu
- Aman Johar (amanjoha[at]buffalo.edu)
- Office hours: Tue, Thu 1:00 - 3:00pm via Webex
@ Davis Hall TA area and Davis 310 & by appointments
Friends of the Course
- Mudita Sanjive (muditasa[at]buffalo.edu): Piazza & by appointments
Thu 12-1pm @ Davis Hall TA area
- Weijin Zhu (weijinzh[at]buffalo.edu): Piazza & by appointments
Fri 5-6pm @ Davis Hall TA area
- Nathan Margaglio (namargag[at]buffalo.edu): Piazza & by appointments
- Yuhao Du (email@example.com): By appointments
You can meet Yuhao at Davis 309.
- RL task formulation (action space, state space, environment definition)
- Tabular based solutions (dynamic programming, Monte Carlo, temporal-difference)
- Function approximation solutions (Deep Q-networks)
- Policy gradients, from basic (REINFORCE) to advanced topics (Trust Region Policy Optimization (TRPO), Proximal Policy Optimization (PPO), Deep Deterministic Policy Gradient (DDPG), etc.)
- Model-based reinforcement learning
- Imitation learning (behavioral cloning, inverse RL)
- Multi-agent reinforcement learning (MARL), partially observable environments
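As a taste of the tabular methods listed above, here is a minimal sketch of Q-learning on a toy 1-D corridor (an illustrative example, not course material; the environment, states, and reward values are invented for this sketch). The heart of the algorithm is the temporal-difference update Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).

```python
import random

def q_learning(n_states=5, n_actions=2, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
    """Tabular Q-learning on a 1-D corridor: reach state n_states-1 from state 0.

    Action 0 steps left (clamped at 0), action 1 steps right.
    Reward is -1 per step and +10 on reaching the goal.
    """
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy action selection: explore with prob. epsilon.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])
            # Environment dynamics (deterministic moves, clamped to the corridor).
            s2 = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
            r = 10.0 if s2 == n_states - 1 else -1.0
            # TD update toward the one-step bootstrapped target.
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = q_learning()
# The greedy policy derived from Q should step right in every non-terminal state.
print([max(range(2), key=lambda a: Q[s][a]) for s in range(4)])
```

The later course topics replace the table `Q` with a function approximator (e.g. a deep Q-network) so the same idea scales to large state spaces.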
|Course Component||% of grade|
|Assignments [3 assignments: 15% + 15% + 10%]||40%|
- Piazza Rockstar
- Jupyter Demo Time
- Candy Questions
- Poster Session Participation
- Other activities to be released as the course progresses
Late Day Policy
- Students can use up to 5 free late days throughout the course that can be applied towards the assignments and/or a course project (some assignments may have a hard deadline)
- A late day extends the deadline by 24 hours
- If a submission is more than 5 days past the deadline, a penalty of 25% per day will be applied to any work submitted after that time
- For group work [course project], the maximum number of late days remaining among teammates can be applied
Weekly Quizzes - How does it work?
- Released every Monday 9:00am, due by Sunday 11:59pm
- Can be found at UBlearns > Assignments
- Each quiz will contain 3-4 problems on topics covered that week
- Each quiz may consist of multiple-choice or written questions
- 11 quizzes in total, only 10 quizzes with the highest scores will be counted toward the final grade
- If the quiz consists of multiple-choice questions, three attempts are allowed and only the highest score will be kept
- Experience in Python
- Calculus, Linear Algebra
- Basic Probability and Statistics
- Foundations of Machine Learning and Deep Learning
Students should be comfortable taking derivatives and understanding matrix-vector operations and notation.
Students should know the basics of probability: conditional probability distributions, conditional expectations, Gaussian distributions, mean, standard deviation, etc.
Some knowledge of machine learning and deep learning will help you better understand the cost functions, approximation methods, and NN and CNN architectures.
Reference Materials
There is no official textbook for the class, but a number of the supporting readings will come from:
- Richard S. Sutton and Andrew G. Barto, "Reinforcement Learning: An Introduction", Second Edition, MIT Press, 2018 - a classic book that covers all the basics
- Lecture slides, relevant papers, and other materials will be added in the table above
- Li, Yuxi. "Deep reinforcement learning." arXiv preprint arXiv:1810.06339 (2018) - an overview of the latest algorithms and applications in reinforcement learning
- David Silver's course on Reinforcement Learning
Useful RL Materials
- MDP Cheatsheet Reference (2 pages) by John Schulman (pdf)
- Neural Network tutorial Article + Sample code
- Overleaf (LaTeX online document generator) - a great tool for creating reports (referral link)
- Google Colab (online Jupyter Notebook with free GPU) (link)
Academic Integrity Policy
Academic integrity is a fundamental university value. No collaboration, cheating, or plagiarism is allowed in projects, quizzes, or exams. Anyone found violating academic integrity will receive an immediate F in the course.
- All submissions will be checked using SafeAssign (a tool integrated into UBlearns), which checks submissions for originality. After each submission, it returns a similarity score along with the original source. It draws on submitted work from past semesters as well as current submissions. If there is a high similarity between submissions from the same class, ALL students involved will be recorded through an Academic Dishonesty Report.
- Submissions should include all references. Note that referencing a source does not mean you can copy/paste it in full and submit it as your original work. Merely updating the hyperparameters or modifying existing code still constitutes plagiarism. Your work has to be original.
- Submitting material that has been previously submitted, in whole or in substantial part, is subject to further investigation.
- Your work has to be done on your personally owned device or university-provided equipment. Completing an assignment on a friend's or neighbor's device is subject to further investigation for both parties involved. Any cheating during the midterms will result in an immediate 0.
- Any suspicious case involving a student or group of students will be immediately passed to the undergraduate/graduate adviser and recorded using the Academic Dishonesty Report Form.
- Please read more about UB Academic Integrity Policy 2019-20.
- If you need help with English, check UB Writing Center (http://www.buffalo.edu/writing/make-an-appointment-old.html)
- If you have issues with your device, the University provides access to computers, as well as equipment loans.
- Your well-being is highly important. If you have any concerns, make sure to check Counseling Services.
If you have any disability which requires reasonable accommodations to enable you to participate in this course, please contact the Office of Accessibility Resources, 60 Capen Hall, 645-2608, and also the instructor of this course. The office will provide you with information and review appropriate arrangements for reasonable accommodations. More details.
The UB School of Engineering and Applied Sciences considers the diversity of its students, faculty, and staff to be a strength, critical to our success. We are committed to providing a safe space and a culture of mutual respect and inclusiveness for all. We believe a community of faculty, students, and staff who bring diverse life experiences and perspectives leads to a superior working environment, and we welcome differences in race, ethnicity, gender, age, religion, language, intellectual and physical ability, sexual orientation, gender identity, socioeconomic status, and veteran status.
Is there any GPU available to use for our projects?
CCR is supporting our course with access to powerful GPU servers. If you need access to a GPU, create an account at CCR and let me know, so I can add you to the resources.
I am on the waiting list; can you help me enroll?
Unfortunately, there is nothing we can do at this time. I would suggest keeping an eye on enrollment. Typically some students drop the course right before the drop deadline (Feb 4), so if you are on the waiting list there is a high chance you will get enrolled. I strongly suggest attending the lectures before enrollment is finalized, even if you are not registered at this time.
Can this course satisfy breadth/depth requirement?
Yes, the course can be used to satisfy the depth requirement for the AI focus area for graduate level (CSE 510).
What programming language will be used?
We will be using Python as the programming language for the projects; familiarity with Keras/TensorFlow/PyTorch will also help.
Is attendance required?
Attendance is not required but is encouraged. We may sometimes do in-class exercises or discussions related to quizzes or projects, and these are harder to do and benefit from on your own.
I am highly interested in the course; can I audit it?
Typically I welcome students interested in the topic to audit the course. Unfortunately, this spring our scheduled room is not big enough to fit everyone interested. You are welcome to drop me an email one week after classes begin, and I will let you know if space is available.
Any suggestions or comments?
I would be glad to get feedback from you; just send me an email.