Reinforcement Learning, Fall 21

Description

Reinforcement learning is an area of machine learning in which an agent or a system of agents learns to achieve a goal by interacting with its environment. RL is often seen as the third area of machine learning, alongside supervised and unsupervised learning; in RL, an agent learns as a result of its own actions and its interaction with the environment.

In recent years, reinforcement learning research has seen success in both theory and applications. It has been applied in a variety of fields such as robotics, pattern recognition, personalized medical treatment, drug discovery, speech recognition, computer vision, and natural language processing. This course primarily focuses on training students to frame reinforcement learning problems and to work with algorithms from dynamic programming, Monte Carlo, and temporal-difference learning. Students will progress towards larger state space environments using function approximation, deep Q-networks, and state-of-the-art policy gradient algorithms. We will also go over recent methods that are based on reinforcement learning, such as imitation learning and meta learning, as well as more complex environment formulations.

Course Staff	Contact	Meet
Alina Vereshchaka (Instructor)	avereshc[at]buffalo.edu	To be confirmed
To be confirmed (TA)	To be confirmed	To be confirmed

Syllabus can be found here

Logistics

Instructor: Alina Vereshchaka
Lectures: Tue, Thu 11:10 - 12:25pm, Talbert Hall 107 (campus map)
Office hours: To be confirmed
How to contact me: Please use Piazza for all questions related to lectures, quizzes, and assignments. For any personal queries, email avereshc[at]buffalo.edu

Key Topics

RL task formulation (action space, state space, environment definition). Defining RL environments
Tabular based solutions (dynamic programming, Monte Carlo, temporal-difference)
Linear value function approximation
Non-linear value function approximation (Deep Q-networks: Double DQN, Dueling DQN, PER)
Policy gradient from basic (REINFORCE) towards advanced actor-critic algorithms (proximal policy optimization, deep deterministic policy gradient, etc.)
Multi-agent reinforcement learning
Imitation learning (behavioral cloning)
Emerging topics in RL
Ethics & safety in AI

Grading Rubrics

Course Component	% of grade
Assignments [3 assignments: 15% + 15% + 10%]	40%
Final Project	20%
Weekly Quizzes	10%
Midterm I	15%
Midterm II	15%
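
The weights in the table above can be combined into a single course percentage. The sketch below is only an illustration, not an official grade calculator: the function name `weighted_grade`, the component keys, and the mapping of the 15% + 15% + 10% split onto assignments 1, 2 and 3 in that order are assumptions, and bonus points are not included.

```python
# Weights copied from the grading table. The order of the three assignment
# weights (15%, 15%, 10%) is an assumption made for illustration.
WEIGHTS = {
    "assignment_1": 0.15,
    "assignment_2": 0.15,
    "assignment_3": 0.10,
    "final_project": 0.20,
    "weekly_quizzes": 0.10,
    "midterm_1": 0.15,
    "midterm_2": 0.15,
}


def weighted_grade(scores):
    """Return the weighted course percentage.

    `scores` maps every component name in WEIGHTS to a percentage in [0, 100].
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    example = {
        "assignment_1": 90, "assignment_2": 80, "assignment_3": 70,
        "final_project": 85, "weekly_quizzes": 95,
        "midterm_1": 75, "midterm_2": 88,
    }
    print(f"Course percentage: {weighted_grade(example):.2f}")
```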

Bonus Points

Piazza Rockstar
Jupyter Demo Time
Candy Questions
Poster Session Participation
Other activities to be released as the course progresses

Late Day Policy

Students can use up to 5 free late days throughout the course that can be applied towards the assignments (some assignments may have a hard deadline)
A late day extends the deadline by 24 hours. If more than 5 days have passed after the deadline, a penalty of 25% per day will be applied to any work submitted after that time

Weekly Quizzes - How do they work?

Released every Tuesday 9:00am, due by Monday 11:59pm
Can be found at UBlearns > Assignments
Each quiz contains 3-5 problems on topics covered that week
Quizzes come in various forms, including multiple choice, multiple answer, written and coding formats
At the end of a submission, the system will give you your final score, unless the quiz is in the written or coding format
11 quizzes in total; only the 10 quizzes with the highest scores will be counted
Three attempts are allowed, unless it is in the written or coding format
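
The "best 10 of 11" rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual UBlearns computation: the function name `quiz_component` is hypothetical, every quiz is assumed to carry equal weight and to be scored as a percentage, and a quiz that was not taken is assumed to be entered as 0.

```python
def quiz_component(quiz_scores, keep=10):
    """Average the `keep` highest quiz scores.

    Assumes each score is a percentage in [0, 100] and that all quizzes
    are weighted equally; a missed quiz should be passed in as 0.
    """
    if len(quiz_scores) < keep:
        raise ValueError(f"expected at least {keep} scores, got {len(quiz_scores)}")
    best = sorted(quiz_scores, reverse=True)[:keep]  # drop the lowest scores
    return sum(best) / keep


if __name__ == "__main__":
    # Eleven quizzes, one of them missed (entered as 0).
    scores = [80, 90, 100, 70, 85, 95, 0, 60, 75, 88, 92]
    print(f"Quiz average over best 10: {quiz_component(scores):.1f}")
```

With this rule, a single missed or weak quiz does not lower the quiz component, since only the lowest of the 11 scores is dropped.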

Prerequisites

CSE4/574 or CSE4/555 or CSE4/573

Here are a few points to make sure you have the right expectations for the course, so that your classroom experience will be positive:

All of the assignments will be completed in Python and it is assumed that you have worked with it before. Due to a busy schedule, no tutorials on Python foundations will be offered.
The course requires you to have prior experience working with machine learning models. It is recommended that you have taken one of our AI courses or have completed a course equivalent.
The second and third assignments and the final project will require you to use one of the following frameworks: Keras/PyTorch/TensorFlow. These assignments will require you to build a deep learning model, so prior experience with these frameworks will be very useful.

Reference Materials

There is no official textbook for the class, but a number of the supporting readings will come from:

Richard S. Sutton and Andrew G. Barto, "Reinforcement learning: An introduction", Second Edition, MIT Press, 2019 - a classic book that covers all the basics
Lecture slides, relevant papers, and other materials will be added in the table above

Additional references that can be useful:

Li, Yuxi. "Deep reinforcement learning." arXiv preprint arXiv:1810.06339 (2018). - an overview of the latest algorithms and applications in reinforcement learning
David Silver's course on Reinforcement Learning

Useful RL Materials

MDP Cheatsheet Reference (2 pages) by John Schulman (pdf)
Neural Network tutorial Article + Sample code

Useful Tools:

Overleaf (LaTeX online document generator) - great tool for creating reports
Google Colab (online Jupyter Notebook with free GPU) (link)

Academic Integrity Policy

Academic integrity is a fundamental university value. No collaboration, cheating, or plagiarism is allowed in projects, quizzes, or the exam. Those found violating academic integrity will get an immediate F in the course.

Academic integrity is a fundamental university value.
No collaboration, cheating, or plagiarism is allowed in assignments, quizzes, or the midterms.
The catalog describes plagiarism as “Copying or receiving material from any source and submitting that material as one’s own, without acknowledging and citing the particular debts to the source (quotations, paraphrases, basic ideas), or in any other manner representing the work of another as one’s own.”
Any suspicious cases will be officially reported using the Academic Dishonesty Report form and all bonus points will be subject to removal from the student’s final evaluation.
Those found violating academic integrity more than once throughout their program will receive an immediate F in the course.
Please refer to the UB Academic Integrity Policy for more details.

Academic integrity is a very high priority not only for our Department, but for the University as a whole. We are glad to help you achieve great results during the course; however, we do not tolerate any kind of cheating.

Helpful Resources

We want you to demonstrate your own achievements and showcase your own abilities during the course! From the course instructors' side, we are glad to provide all the help you need to succeed in the course. Here are some of the free resources provided by the University:

If you need help with English, check the UB Writing Center
If you have issues with your device, the University provides access to computers, as well as equipment loans.
Your well-being is highly important; if you have any concerns, make sure to check the Counseling Service.

Accessibility Resources

If you have a disability and may require some type of instructional and/or examination accommodation, please inform me early in the semester so that we can coordinate the accommodations you may need. If you have not already done so, please contact the Office of Accessibility Services, 60 Capen Hall, 645-2608, and also the instructor of this course. The office will provide you with information and review appropriate arrangements for accommodations. More details.

Diversity

The UB School of Engineering and Applied Sciences considers the diversity of its students, faculty, and staff to be a strength, critical to our success. We are committed to providing a safe space and a culture of mutual respect and inclusiveness for all. We believe a community of faculty, students, and staff who bring diverse life experiences and perspectives leads to a superior working environment, and we welcome differences in race, ethnicity, gender, age, religion, language, intellectual and physical ability, sexual orientation, gender identity, socioeconomic status, and veteran status.

FAQ

Is there any GPU available to use for our projects?

CCR is supporting our course with access to powerful GPU servers. If you need access to a GPU, create an account at CCR and let me know, so that I can add you to the resources.

I am on the waiting list; can you help me enroll?

Unfortunately, there is nothing we can do at this time. I would suggest keeping an eye on the enrollment. Typically some students drop the course right before the drop-date deadline, so if you are on the waiting list, there is a high chance you will get enrolled. I would strongly suggest attending the lectures before the enrollment is finalized, even if you are not registered at this time.

Can this course satisfy breadth/depth requirement?

Yes, the course can be used to satisfy the depth requirement for the AI focus area for graduate level (CSE 546).

What programming language will be used?

We will be using Python (version >3.9) as the programming language for the projects; familiarity with Keras/TensorFlow/PyTorch will also help.

Is attendance required?

Attendance is not required but is encouraged. Sometimes we may do in-class exercises or discussions related to quizzes or projects, and these are harder to do, and to benefit from, by yourself.

I am highly interested in the course; can I audit it?

Typically I welcome students interested in the topics to audit the course. Unfortunately, this Fall our scheduled room is not big enough to fit everyone interested. You are welcome to send me an email one week after the class begins, and I will let you know if there is some space available.

Any suggestions or comments?

I would be glad to get feedback from you; just send me an email.

CSE4/546: Reinforcement Learning

Fall 21, Lectures: Tue/Thu 11:10am - 12:25pm, Talbert 107