CSE4/546: Reinforcement Learning

Fall 21, Lectures: Tue/Thu 11:10am - 12:25pm, Talbert 107

Description

Reinforcement learning is an area of machine learning in which an agent or a system of agents learns to achieve a goal by interacting with its environment. RL is often seen as the third area of machine learning, alongside supervised and unsupervised learning, in which an agent learns as a result of its own actions and interaction with the environment.

In recent years, reinforcement learning research has seen success in both theoretical and applied fields. It has been applied in a variety of fields such as robotics, pattern recognition, personalized medical treatment, drug discovery, speech recognition, computer vision, and natural language processing. This course primarily focuses on training students to frame reinforcement learning problems and to tackle algorithms from dynamic programming, Monte Carlo and temporal-difference learning. Students will progress towards larger state space environments using function approximation, deep Q-networks and state-of-the-art policy gradient algorithms. We will also go over recent methods that are based on reinforcement learning, such as imitation learning, meta learning and more complex environment formulations.

Course Staff Contact Meet
Alina Vereshchaka (Instructor) avereshc[at]buffalo.edu To be confirmed
To be confirmed (TA) To be confirmed To be confirmed

Syllabus can be found here

Logistics

  • Instructor: Alina Vereshchaka
  • Lectures: Tue, Thu 11:10 - 12:25pm, Talbert Hall 107 (campus map)
  • Office hours: To be confirmed
  • How to contact me: Please use Piazza for all questions related to lectures, quizzes, and assignments. For any personal queries, email avereshc[at]buffalo.edu

Key Topics

  • RL task formulation (action space, state space, environment definition). Defining RL environments
  • Tabular based solutions (dynamic programming, Monte Carlo, temporal-difference)
  • Linear value function approximation
  • Non-linear value function approximation (Deep Q-networks: Double DQN, Dueling DQN, PER)
  • Policy gradient from basic (REINFORCE) towards advanced actor-critic algorithms (proximal policy optimization, deep deterministic policy gradient, etc.)
  • Multi-agent reinforcement learning
  • Imitation learning (behavioral cloning)
  • Emerging topics in RL
  • Ethics & safety in AI
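To give a feel for the tabular temporal-difference methods listed above, here is a minimal sketch of Q-learning on a toy problem. The corridor environment, the `step` function, and all hyperparameter values are hypothetical and chosen only for illustration; they are not taken from the course assignments.

```python
import random

N_STATES = 5         # states 0..4; state 4 is the goal (hypothetical toy task)
ACTIONS = (-1, +1)   # action index 0 moves left, index 1 moves right


def step(state, action):
    """Toy corridor environment: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, and break ties at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # TD target: reward plus discounted best value of the next state.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Pick the higher-valued action in each state (right on ties)."""
    return [ACTIONS[0] if left > right else ACTIONS[1] for left, right in q]
```

Calling `greedy_policy(q_learning())` should return a policy that moves right in every non-terminal state, since that is the shortest path to the goal in this toy corridor.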

Grading Rubrics

Course Component % of grade
Assignments [3 assignments: 15% + 15% + 10%] 40%
Final Project 20%
Weekly Quizzes 10%
Midterm I 15%
Midterm II 15%
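
As a rough illustration of how the weights above combine, here is a small sketch. It assumes every component is scored as a percentage from 0 to 100 and that the 15% + 15% + 10% split applies to assignments 1, 2 and 3 in that order; the dictionary keys and the `final_grade` function are hypothetical names, and bonus points are not included.

```python
# Weights from the grading table; the component names are illustrative.
WEIGHTS = {
    "assignment_1": 0.15,
    "assignment_2": 0.15,
    "assignment_3": 0.10,   # assumed order of the 15% + 15% + 10% split
    "final_project": 0.20,
    "weekly_quizzes": 0.10,
    "midterm_1": 0.15,
    "midterm_2": 0.15,
}


def final_grade(scores):
    """Weighted sum of component scores, each given as a percentage (0-100)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


example = {name: 80.0 for name in WEIGHTS}
example["final_project"] = 100.0
print(round(final_grade(example), 2))  # 80 everywhere, plus 20 extra points on a 20% component
```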

Bonus Points

  • Piazza Rockstar
  • Jupyter Demo Time
  • Candy Questions
  • Poster Session Participation
  • Other activities to be released as the course progresses

Late Day Policy

  • Students can use up to 5 free late days throughout the course that can be applied towards the assignments (some assignments may have a hard deadline)
  • A late day extends the deadline by 24 hours. If more than 5 days have passed after the deadline, a penalty of 25% per day will be applied to any work submitted after that time

Weekly Quizzes - How do they work?

  • Released every Tuesday 9:00am, due by Monday 11:59pm
  • Can be found at UBlearns > Assignments
  • Each quiz contains 3-5 problems on topics covered that week
  • Quizzes come in various forms, including multiple choice, multiple answer, written and coding formats
  • At the end of a submission, the system will give you your final score, unless it is in the written or coding format
  • 11 quizzes in total; only the 10 quizzes with the highest scores will be counted
  • Three attempts are allowed, unless it is in the written or coding format
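
The "best 10 of 11" rule can be sketched as follows. This assumes each quiz is scored as a percentage, all quizzes carry equal weight, and a missed quiz counts as 0; the function name `quiz_component` is hypothetical.

```python
def quiz_component(quiz_scores, counted=10):
    """Average of the `counted` highest quiz scores (percentages, 0-100)."""
    best = sorted(quiz_scores, reverse=True)[:counted]
    return sum(best) / counted


# One missed quiz (scored 0) is dropped, so it does not lower the average.
scores = [90.0] * 10 + [0.0]
print(quiz_component(scores))
```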

Prerequisites

CSE4/574 or CSE4/555 or CSE4/573

A few points to make sure you have the right expectations for the course so that your classroom experience will be positive.
  • All of the assignments will be completed in Python and it is assumed that you have worked with it before. Due to a busy schedule, no tutorials on Python foundations will be offered.
  • The course requires you to have prior experience working with machine learning models. It is recommended that you have taken one of our AI courses or have completed a course equivalent.
  • The second and third assignments and the final project will require you to use one of the following frameworks: Keras/PyTorch/TensorFlow. These assignments require building a deep learning model, so prior experience with these frameworks will be very useful.

Reference Materials

There is no official textbook for the class, but a number of the supporting readings will come from:

Additional references that can be useful:

Useful RL Materials

Useful Tools:

Academic Integrity Policy

Academic integrity is a fundamental university value. No collaboration, cheating, or plagiarism is allowed in projects, quizzes, or the exam. Those found violating academic integrity will get an immediate F in the course.
  1. Academic integrity is a fundamental university value.
  2. No collaboration, cheating, or plagiarism is allowed in assignments, quizzes or the midterms.
  3. The catalog describes plagiarism as “Copying or receiving material from any source and submitting that material as one’s own, without acknowledging and citing the particular debts to the source (quotations, paraphrases, basic ideas), or in any other manner representing the work of another as one’s own.”
  4. Any suspicious cases will be officially reported using the Academic Dishonesty Report form and all bonus points will be subject to removal from the student’s final evaluation.
  5. Those found violating academic integrity more than once throughout their program will receive an immediate F in the course.
  6. Please refer to the UB Academic Integrity Policy for more details.

Academic integrity is a very high priority not only for our Department, but for the University as a whole. We are glad to help you achieve great results during the course; however, we do not tolerate any kind of cheating.

Helpful Resources

We want you to demonstrate your own achievements and showcase your own abilities during the course! From the course instructors' side, we are glad to provide all the help you need to succeed in the course. Here are some of the free resources provided by the University:

  1. If you need help with English, check the UB Writing Center.
  2. If you have issues with your device, the University provides access to computers, as well as equipment loans.
  3. Your well-being is highly important. If you have any concerns, make sure to check Counseling Service.

Accessibility Resources

If you have a disability and may require some type of instructional and/or examination accommodation, please inform me early in the semester so that we can coordinate the accommodations you may need. If you have not already done so, please contact the Office of Accessibility Services, 60 Capen Hall, 645-2608, and also the instructor of this course. The office will provide you with information and review appropriate arrangements for accommodations. More details.

Diversity

The UB School of Engineering and Applied Sciences considers the diversity of its students, faculty, and staff to be a strength, critical to our success. We are committed to providing a safe space and a culture of mutual respect and inclusiveness for all. We believe a community of faculty, students, and staff who bring diverse life experiences and perspectives leads to a superior working environment, and we welcome differences in race, ethnicity, gender, age, religion, language, intellectual and physical ability, sexual orientation, gender identity, socioeconomic status, and veteran status.

FAQ

Is there any GPU available to use for our projects?

CCR is supporting our course with access to powerful GPU servers. If you need access to a GPU, create an account at CCR and let me know, and I will add you to the resources.

I am on the waiting list, can you help me enroll?

Unfortunately, there is nothing we can do at this time. I would suggest keeping an eye on the enrollment. Typically some students drop the course right before the drop-date deadline, so if you are on the waiting list, there is a high chance you will get enrolled. I would strongly suggest attending the lectures before the enrollment is finalized, even if you are not registered at this time.

Can this course satisfy breadth/depth requirement?

Yes, the course can be used to satisfy the depth requirement for the AI focus area for graduate level (CSE 546).

What programming language will be used?

We will be using Python (version >3.9) as the programming language for the projects; familiarity with Keras/TensorFlow/PyTorch will also help.

Is attendance required?

Attendance is not required but is encouraged. Sometimes we may do in-class exercises or discussions related to quizzes or projects, and these are harder to do, and to benefit from, on your own.

I am highly interested in the course, can I audit it?

Typically I welcome students interested in the topics to audit the course. Unfortunately, this Fall our scheduled room is not big enough to fit everyone interested. You are welcome to send me an email one week after the class begins, and I will let you know if there is some space available.

Any suggestions or comments?

I would be glad to get feedback from you; just send me an email.