CSE 701: Some Recent Progresses in Machine Learning

Course Description

Machine learning (ML) and artificial intelligence (AI) are transforming society and driving innovation in computer vision, language processing, 5G networks, edge computing, autonomous systems, healthcare, and more. In this seminar, we will review recent breakthroughs and advances in the theoretical foundations, algorithms, and applications of modern machine learning. The first part of the seminar covers new optimization algorithms, such as adaptive gradient methods, bilevel optimizers, and federated optimizers, together with their applications in ML. The second part discusses generalization analysis for training overparameterized models and neural networks. The final part covers recent topics in modern ML such as meta-learning, continual learning, and contrastive learning. All students in this seminar are expected to read, discuss, present, and write summaries of selected papers on these topics.
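To make the first part more concrete, the short Python sketch below illustrates one family of optimizers discussed there (adaptive gradient methods) by running Adam-style updates on a toy quadratic objective. This is a minimal, illustrative sketch rather than course material: the function name adam_step and the toy objective are assumptions made for this example, and the update simply follows the standard Adam rule.

    # Minimal, illustrative sketch of an Adam-style adaptive gradient update
    # (assumed example for intuition; not part of the assigned readings).
    import numpy as np

    def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        """One Adam update with bias-corrected first and second moment estimates."""
        m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
        v = beta2 * v + (1 - beta2) * grad**2     # running mean of squared gradients
        m_hat = m / (1 - beta1**t)                # bias correction (step count t starts at 1)
        v_hat = v / (1 - beta2**t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
        return theta, m, v

    # Toy usage: minimize f(theta) = 0.5 * ||theta||^2, whose gradient is theta itself.
    theta = np.ones(5)
    m, v = np.zeros_like(theta), np.zeros_like(theta)
    for t in range(1, 2001):
        theta, m, v = adam_step(theta, theta, m, v, t)
    print(theta)  # entries end up near the minimizer at 0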

Logistics

Instructor: Kaiyi Ji, Assistant Professor, Department of Computer Science and Engineering (CSE), University at Buffalo (UB)

Contact: Davis Hall 338G, kaiyiji@buffalo.edu

Location and time: Talbert 103. Every Wednesday, 4:00 PM - 6:50 PM.

Office hours: by appointment. Email me to schedule.

Course syllabus: available here (PDF).

Course Piazza: submit summaries and participate in discussions via Piazza.

  • To submit your summary (PDF), post a note visible only to the instructor under the assignment/summary folder for that week.

References

There is no required textbook. Some suggested references:

  1. Z. Allen-Zhu, Y. Li, and Z. Song. “A convergence theory for deep learning via over-parameterization,” ICML, 2019.

  2. L. Bottou, F. E. Curtis, and J. Nocedal. “Optimization methods for large-scale machine learning,” SIAM Review, 2018.

  3. I. Goodfellow, Y. Bengio, and A. Courville. “Deep learning,” MIT Press, 2016.

Course Objective

This course aims to help students understand the algorithmic design and theoretical analysis (including optimization theory and statistical theory) behind modern machine learning, and to learn how to apply these tools to applications such as overparameterized models, deep learning, adversarial learning, meta-learning, continual learning, and contrastive learning. Other skills, such as giving presentations and writing paper summaries, are also practiced.

Course Requirements

  1. Finish the required reading before each lecture.

  2. Write a short summary (at most one page) of the paper(s) presented in each lecture. It should cover the problem, algorithms, technical contributions, and experiments. Each summary is due the following Tuesday at 11:59 PM (one day before the next lecture). No late submissions will be accepted.

  3. Present one of the selected papers during the semester. Each presentation should last 30-50 minutes, use 20-40 slides, and give a brief introduction to the background, motivation, problem, algorithm, theory, and experiments. You are encouraged, but not required, to share your slides before the presentation; if you do, a link to your slides will be posted on the course website.

  4. Each lecture includes at most three presentations; see the course schedule.

Grading Policy

  • 30% for class participation

  • 35% for paper summaries (14 summaries × 2.5% each = 35%)

  • 35% for presentation

The seminar is graded S/U: a total score of at least 70% earns Satisfactory (S), and a score below 70% is Unsatisfactory (U).

Course Schedule (please sign up for papers here: link)

Each week below lists the date, presenter(s), topic, assigned readings, and slides. Presenter numbers correspond to the numbered readings and slide decks for that week.

2/1/2023
Topic: Introduction
Slides: S1

2/8/2023
Presenters: 1. B. Neelima Srilakshmi; 2. Kajol; 3. Tanmay Deshmukh
Topic: Stochastic algorithms
Readings:
  1. Bottou, Léon. Stochastic gradient descent tricks. Neural Networks: Tricks of the Trade, 2012.
  2. Kingma, Diederik P., and Jimmy Ba. Adam: A method for stochastic optimization. ICLR 2015.
  3. Xie, Xingyu, Pan Zhou, Huan Li, Zhouchen Lin, and Shuicheng Yan. Adan: Adaptive Nesterov momentum algorithm for faster optimizing deep models.
Slides: S2.1, S2.2, S2.3

2/15/2023
Presenters: 1. Yan Ju; 2. Mamatha Yarramaneni; 3. Peiyao Xiao
Topic: Black-box methods
Readings:
  1. Tu, Chun-Chen, et al. AutoZOOM: Autoencoder-based zeroth order optimization method for attacking black-box neural networks. AAAI 2019.
  2. Eriksson, David, Michael Pearce, Jacob Gardner, Ryan D. Turner, and Matthias Poloczek. Scalable global optimization via local Bayesian optimization. NeurIPS 2019.
  3. Anonymous authors. Zeroth-order optimization with trajectory-informed derivative estimation. Under review.
Slides: S3.1, S3.2, S3.3

2/22/2023
Presenters: 1. Kotha Meher Preethi; 2. Anagha Vivekanand Joshi; 3. Yifan Yang
Topic: Bilevel optimization
Readings:
  1. Franceschi, Luca, et al. Bilevel programming for hyperparameter optimization and meta-learning. ICML 2018.
  2. Ji, Kaiyi, Junjie Yang, and Yingbin Liang. Bilevel optimization: Convergence analysis and enhanced design. ICML 2021.
  3. Chen, Lesi, Jing Xu, and Jingzhao Zhang. On bilevel optimization without lower-level strong convexity.
Slides: S4.1, S4.2, S4.3

3/1/2023
Presenters: 1. Umar Ahmed; 2. Divya Sharvani Kandukuri; 3. Mingxi Lei
Topic: Federated learning
Readings:
  1. McMahan, Brendan, et al. Communication-efficient learning of deep networks from decentralized data. AISTATS 2017.
  2. Zhao, Yue, Meng Li, Liangzhen Lai, Naveen Suda, Damon Civin, and Vikas Chandra. Federated learning with non-IID data.
  3. Reddi, Sashank, Zachary Charles, Manzil Zaheer, Zachary Garrett, Keith Rush, Jakub Konečný, Sanjiv Kumar, and H. Brendan McMahan. Adaptive federated optimization. ICLR 2021.
Slides: S5.1, S5.2, S5.3

3/8/2023
Presenters: 1. Yuting Hu; 2. Lipisha Chaudhary; 3. Lalitha Priya Garigapati
Topic: Minimax optimization
Readings:
  1. Liu, Mingrui, et al. Towards better understanding of adaptive gradient algorithms in generative adversarial nets. ICLR 2020.
  2. Yuan, Zhuoning, Yan Yan, Milan Sonka, and Tianbao Yang. Large-scale robust deep AUC maximization: A new surrogate loss and empirical studies on medical image classification. ICCV 2021.
  3. Madry, Aleksander, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. ICLR 2018.
Slides: S6.1, S6.2, S6.3

3/15/2023
Presenters: 1. Victor Vats; 2. Sai Saran Anamanamudi; 3. Naresh Kumar Devulapally
Topic: MAML methods
Readings:
  1. Finn, Chelsea, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. ICML 2017.
  2. Finn, Chelsea, et al. Online meta-learning. ICML 2019.
  3. Raghu, Aniruddh, et al. Rapid learning or feature reuse? Towards understanding the effectiveness of MAML. ICLR 2020.
Slides: S7.1, S7.2, S7.3

3/22/2023
Spring Recess (no class)

3/29/2023
Presenter: 1. Venkata Sai Saran Putta
Topic: Compressed methods
Readings:
  1. Bernstein, Jeremy, et al. signSGD: Compressed optimisation for non-convex problems. ICML 2018.
  2. Karimireddy, Sai Praneeth, et al. Error feedback fixes signSGD and other gradient compression schemes. ICML 2019.
Slides: S8.1

4/5/2023
Presenters: 1. Dhanush Ankata; 2. Harshavardhan Reddy Bommireddy and Pranaya Satwika Reddy Maddi
Topic: Learning two-layer NNs
Readings:
  1. Du, Simon S., Xiyu Zhai, Barnabas Poczos, and Aarti Singh. Gradient descent provably optimizes over-parameterized neural networks. ICLR 2019.
  2. Li, Yuanzhi, and Yingyu Liang. Learning overparameterized neural networks via stochastic gradient descent on structured data. NeurIPS 2018. (Two students)
Slides: S9.1, S9.2

4/12/2023
Presenters: 1. Enjamamul Hoq and Jue Guo; 2. Aishwarya Mehta and Ayush Utkarsh
Topic: Learning multi-layer NNs
Readings:
  1. Du, Simon, Jason Lee, Haochuan Li, Liwei Wang, and Xiyu Zhai. Gradient descent finds global minima of deep neural networks. ICML 2019. (Two students)
  2. Zou, Difan, Yuan Cao, Dongruo Zhou, and Quanquan Gu. Stochastic gradient descent optimizes over-parameterized deep ReLU networks. Machine Learning, 2020. (Two students)
Slides: S10.1, S10.2

4/19/2023
Presenters: 1. Rakesh Pasupuleti; 2. Eva Pradhan
Topic: Generalization analysis of linear regression
Readings:
  1. Bartlett, Peter L., Philip M. Long, Gabor Lugosi, and Alexander Tsigler. Benign overfitting in linear regression. PNAS 2020.
  2. Zou, Difan, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, and Sham Kakade. Benign overfitting of constant-stepsize SGD for linear regression.
Slides: S11.1, S11.2

4/26/2023
Presenters: 1. Harichandana Vejendla; 3. Sowmiya Murugiah
Topic: Few-shot learning
Readings:
  1. Goldblum, Micah, Steven Reich, Liam Fowl, Renkun Ni, Valeriia Cherepanova, and Tom Goldstein. Unraveling meta-learning: Understanding feature representations for few-shot tasks. ICML 2020.
  2. Rusu, Andrei A., Dushyant Rao, Jakub Sygnowski, Oriol Vinyals, Razvan Pascanu, Simon Osindero, and Raia Hadsell. Meta-learning with latent embedding optimization. ICLR 2019.
  3. Rajeswaran, Aravind, Chelsea Finn, Sham M. Kakade, and Sergey Levine. Meta-learning with implicit gradients. NeurIPS 2019.
Slides: S12.1, S12.3

5/3/2023
Topic: Continual learning
Readings:
  1. Buzzega, Pietro, Matteo Boschini, Angelo Porrello, Davide Abati, and Simone Calderara. Dark experience for general continual learning: A strong, simple baseline. NeurIPS 2020.
  2. Lopez-Paz, David, and Marc'Aurelio Ranzato. Gradient episodic memory for continual learning. NeurIPS 2017.
  3. Shin, Hanul, Jung Kwon Lee, Jaehong Kim, and Jiwon Kim. Continual learning with deep generative replay. NeurIPS 2017.
Slides: S13

5/10/2023
Topic: Contrastive learning
Readings:
  1. Chen, Xinlei, and Kaiming He. Exploring simple Siamese representation learning. CVPR 2021.
  2. Yuan, Zhuoning, Yuexin Wu, Zi-Hao Qiu, Xianzhi Du, Lijun Zhang, Denny Zhou, and Tianbao Yang. Provable stochastic optimization for global contrastive learning: Small batch does not harm performance. ICML 2022.
Slides: S14

Academic Integrity

Students are expected to write all summaries and homework independently, based on their own reading of the papers, the presentations, and in-class discussion. Directly paraphrasing others' work or solutions is regarded as plagiarism and will result in an F grade. Any reference used in your presentation must be clearly cited. Academic integrity is required throughout the course. This course follows the departmental and university policies on academic integrity, which can be found here: link.