Week | Topics + Notes | Additional Readings, Notable Events
1. Aug 30 What is COLT? Characterizing learning models. Consistency Model. PAC Model.
[PC -- Branislav] Consistency model
  • Monotone disjunctions are CM-learnable (a sketch of the standard elimination learner appears after this list)
  • k-CNF is CM-learnable
  • Separating hyperplanes are CM-learnable
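For concreteness, a minimal sketch (in Python; the function name and data format are illustrative) of the standard elimination learner for monotone disjunctions in the consistency model: keep every variable that never appears set to 1 in a negative example, then check that the surviving variables cover all positive examples.

    def learn_monotone_disjunction(examples):
        # examples: list of (x, y) pairs, x a tuple of 0/1 attribute values, y a 0/1 label
        n = len(examples[0][0])
        relevant = set(range(n))
        # A variable set to 1 in any negative example cannot appear in a consistent disjunction.
        for x, y in examples:
            if y == 0:
                relevant -= {i for i in range(n) if x[i] == 1}
        # Every positive example must still be covered by some surviving variable.
        for x, y in examples:
            if y == 1 and not any(x[i] == 1 for i in relevant):
                return None  # no consistent monotone disjunction exists
        return relevant  # indices of the variables in the output disjunction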

  • Pitt, L. and Valiant, L. G. 1988. Computational limitations on learning from examples. J. ACM 35, 4 (Oct. 1988), 965-984.
  • [PC] Aldous, D. and Vazirani, U. 1995. A Markovian extension of Valiant's learning model. Inf. Comput. 117, 2 (Mar. 1995), 181-186. [ pdf ]
  • Blum, A. and Rivest, R. L. 1989. Training a 3-node neural network is NP-complete. In Advances in Neural Information Processing Systems 1, Morgan Kaufmann Publishers, San Francisco, CA, 494-501.
  • [PC] Feldman, V. 2009. Hardness of approximate two-level logic minimization and PAC learning with membership queries. J. Comput. Syst. Sci. 75, 1 (Jan. 2009), 13-26. (Also STOC'06) [ pdf ]
  • Vitaly Feldman, Hardness of Proper Learning, The Encyclopedia of Algorithms, 2008
  • Vitaly Feldman, Statistical Query Learning, The Encyclopedia of Algorithms, 2008.
2. Sep 06 Sample complexity. Sample complexity for finite hypothesis spaces. VC-dimension. Sample complexity for infinite hypothesis spaces.
[PC -- Swapnoneel] Some hardness results
  • k-term DNF is not CM-learnable for any k ≥ 2 (finding a consistent k-term DNF formula is NP-hard)
  • Intractability of learning 3-term DNF by 3-term DNF (See Rivest's lecture 5)
Mon, Sep 06 is Labor Day. Thursday, Sep 09 is Rosh Hashanah.
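For quick reference, the standard finite-hypothesis-space bound from this week's lecture: for a consistent learner over a finite class H, m ≥ (1/ε)(ln|H| + ln(1/δ)) examples suffice to achieve error at most ε with probability at least 1 - δ. A small helper (the function name is illustrative):

    from math import ceil, log

    def pac_sample_size(h_size, eps, delta):
        # Sufficient sample size for a consistent learner over a finite class H:
        # m >= (1/eps) * (ln|H| + ln(1/delta))
        return ceil((log(h_size) + log(1.0 / delta)) / eps)

    # Example: conjunctions over n = 20 Boolean variables, |H| = 3**20:
    # pac_sample_size(3 ** 20, eps=0.1, delta=0.05)  -> 250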
  • Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, Manfred K. Warmuth: Occam's Razor. Inf. Process. Lett. 24(6): 377-380 (1987). [This is the original Occam's Razor paper]
  • [PC] Ming Li, John Tromp, Paul M. B. Vitányi: Sharpening Occam's razor. Inf. Process. Lett. 85(5): 267-274 (2003).
  • Hosking, J. R., Pednault, E. P., and Sudan, M. 1997. A statistical perspective on data mining. Future Gener. Comput. Syst. 13, 2-3 (Nov. 1997), 117-134.
  • [PC] Misha Alekhnovich, Mark Braverman, Vitaly Feldman, Adam Klivans, Toniann Pitassi, The complexity of properly learning simple concept classes, Journal of Computer and System Sciences, 74(1), 2008 (also FOCS 2004). Two people can present this paper.
3. Sep 13 [PC -- Steven] Some PAC-learning results.
  • Learning k-decision list (see Rivest's lecture 6)
  • Learning 3-term DNF by 3-CNF (Rivest's lecture 5)

4. Sep 20 [PC -- Steve Uurtamo] Three different proofs of Sauer's lemma. See also a blog post by Tim Gowers (which gives the first proof); the second proof is by induction, and the third uses the shifting technique. All proofs are short, and you'll learn nice combinatorial techniques from them.
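For reference, Sauer's lemma states that if H has VC-dimension d, then the number of distinct labelings H induces on any set of m points is at most sum_{i=0}^{d} C(m, i), which is at most (em/d)^d whenever m ≥ d.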
5. Sep 27 Dealing with noise. Inconsistent Hypothesis Model. Empirical error and generalization error. Uniform convergence theorem.
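One standard form of the uniform convergence theorem for a finite class H: with probability at least 1 - δ over a sample of size m, every h in H satisfies |err_S(h) - err_D(h)| ≤ sqrt((ln|H| + ln(2/δ)) / (2m)), by Hoeffding's inequality plus a union bound over H.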


[PC -- Daniel Megalo (Tue Sep 28)] Sample complexity lower bound. Show that Omega(d/ε) examples are necessary, where d is the VC-dimension. (Rivest's lecture 10)
[PC] Venkatesan Guruswami, Prasad Raghavendra: Hardness of Learning Halfspaces with Noise. SIAM J. Comput. 39(2): 742-765 (2009). (Also FOCS 2006). [ pdf ]
6. Oct 04 Weak and Strong PAC-learning. Boosting & AdaBoost, training error bound.
  • [Schapire's course] Scribe notes 9
  • [Blum's course] Lecture 0209
  • Robert E. Schapire. The boosting approach to machine learning: An overview. In D. D. Denison, M. H. Hansen, C. Holmes, B. Mallick, B. Yu, editors, Nonlinear Estimation and Classification. Springer, 2003. [ pdf ]

[PC -- Xiaoxing Yu] State and prove Theorem 8 (page 17) in the following paper:

Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119-139, 1997. [ Postscript ] (the original AdaBoost paper).

The theorem is short, and its proof makes use of Theorem 1 in the following paper, so please prove both theorems.

Baum, E. B. and Haussler, D. 1989. What size net gives valid generalization?. Neural Comput. 1, 1 (Mar. 1989), 151-160. [ pdf ]
  • Ron Meir and Gunnar Rätsch. An introduction to boosting and leveraging. In Advanced Lectures on Machine Learning (LNAI 2600), 2003. [ pdf ]
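For orientation, a minimal sketch of AdaBoost as covered this week (Python; the weak_learner interface is a hypothetical stand-in). The training error of the final classifier is at most prod_t 2*sqrt(eps_t(1 - eps_t)) ≤ exp(-2 sum_t gamma_t^2), where gamma_t = 1/2 - eps_t.

    import numpy as np

    def adaboost(X, y, weak_learner, T):
        # X: (m, n) array; y: labels in {-1, +1}; weak_learner(X, y, D) returns a
        # hypothesis h with h(X) in {-1, +1}^m; T: number of boosting rounds.
        m = len(y)
        D = np.full(m, 1.0 / m)                # initial distribution over examples
        ensemble = []
        for t in range(T):
            h = weak_learner(X, y, D)
            pred = h(X)
            eps = D[pred != y].sum()           # weighted training error of h
            alpha = 0.5 * np.log((1 - eps) / eps)
            D = D * np.exp(-alpha * y * pred)  # up-weight the examples h got wrong
            D = D / D.sum()
            ensemble.append((alpha, h))
        # Final classifier: weighted majority vote of the weak hypotheses.
        return lambda Xq: np.sign(sum(a * h(Xq) for a, h in ensemble))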
7. Oct 11 Generalization error bounds: naive and margins-based
  • [Schapire's course] Scribe notes 10, notes 11
  • [Blum's course] Lecture 0211
  • Robert E. Schapire, Yoav Freund, Peter Bartlett and Wee Sun Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5):1651-1686, 1998. [ pdf ]
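Roughly, the main bound of the Schapire-Freund-Bartlett-Lee paper above: with probability at least 1 - δ over a sample of size m, every convex combination f of base classifiers from a class of VC-dimension d satisfies, for every margin θ > 0, Pr_D[y f(x) ≤ 0] ≤ Pr_S[y f(x) ≤ θ] + O(sqrt(d log^2(m/d) / (m θ^2) + log(1/δ) / m)).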

  • [PC Caiming] Friedman, J. H. "Greedy Function Approximation: A Gradient Boosting Machine." (Feb. 1999). Caiming can take an entire lecture (1.5 hours) for this.
  • [PC] Robert E. Schapire. The convergence rate of AdaBoost [open problem]. In The 23rd Conference on Learning Theory, 2010. [ pdf ]
  • [PC] Thomas G. Dietterich and Ghulum Bakiri. Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2:263-286, January 1995. (They showed how to convert binary classifiers into multiclass classifiers using error-correcting codes! Two people can present this paper.)
  • [PC] Robert E. Schapire. Using output codes to boost multiclass learning problems. In Machine Learning: Proceedings of the Fourteenth International Conference, 1997. [ Postscript ]
8. Oct 18
  • [PC Praneeta] Empirical margin loss bound. Prove Theorem 1, page 129, of this paper.
  • [PC Yongding] Massart's Lemma and its corollary + the Rademacher complexity of H equals the Rademacher complexity of co(H). Please prove three things:
    • Massart's Lemma & its corollary (stated below for reference; pages 15-17 in Lecture 3 of Mehryar Mohri's class)
    • Rademacher complexity of the convex hull (page 23, Lecture 6 of Mehryar Mohri's class)
Both presentations are on Tuesday, Oct 19.
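For reference, Massart's lemma: if A is a finite subset of R^m and r = max_{a in A} ||a||_2, then E_sigma[(1/m) max_{a in A} sum_i sigma_i a_i] ≤ r sqrt(2 ln|A|) / m, where the sigma_i are independent Rademacher (±1) signs. The corollary bounds the Rademacher complexity of a finite class of [-1, +1]-valued functions by sqrt(2 ln|H| / m), and the convex-hull result holds because the sup of a linear function of the Rademacher-weighted sums over co(H) is attained at an element of H.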

9. Oct 25

10. Nov 01 Support Vector Machines, the linearly separable case
Chris Burges' SVM tutorial. [ pdf ]
Excerpt from Vapnik's The nature of statistical learning theory.
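The linearly separable (hard-margin) case solves: minimize (1/2)||w||^2 subject to y_i (w · x_i + b) ≥ 1 for all i; the resulting geometric margin is 1/||w||. A minimal sketch, assuming the cvxpy package as a generic QP solver (data and variable names are illustrative):

    import numpy as np
    import cvxpy as cp

    # Toy linearly separable data with labels in {-1, +1}.
    X = np.array([[2.0, 2.0], [1.5, 2.5], [-1.0, -1.0], [-2.0, -0.5]])
    y = np.array([1, 1, -1, -1])

    w = cp.Variable(2)
    b = cp.Variable()
    constraints = [cp.multiply(y, X @ w + b) >= 1]
    cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)), constraints).solve()

    print(w.value, b.value)                   # maximum-margin hyperplane
    print(1.0 / np.linalg.norm(w.value))      # geometric margin = 1 / ||w||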
11. Nov 08 SVM: the kernel trick
O. Bousquet, S. Boucheron, and G. Lugosi, Introduction to Statistical Learning Theory. [ pdf ]
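The kernel trick in one line: any algorithm that touches the data only through inner products can be run in an implicit feature space by replacing x · z with a kernel k(x, z). A small sketch (function name illustrative) computing a polynomial-kernel Gram matrix:

    import numpy as np

    def poly_kernel(X, Z, degree=2, c=1.0):
        # k(x, z) = (x . z + c)^degree equals the inner product of x and z under
        # the implicit feature map of all monomials of degree <= degree.
        return (X @ Z.T + c) ** degree

    X = np.random.randn(5, 3)
    K = poly_kernel(X, X)   # 5x5 Gram matrix, usable wherever only inner products
                            # of the training points appear (e.g. the SVM dual)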
12. Nov 15 Online learning. The mistake-bound model. Learning from expert advice. WMA & RWMA.
[PC] Perceptron algorithm & its analysis
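A minimal sketch of the Perceptron from this presentation topic (Python; names are illustrative). Its mistake bound: if some unit vector u separates the stream with margin γ and ||x|| ≤ R throughout, the number of mistakes is at most (R/γ)^2.

    import numpy as np

    def perceptron(examples):
        # examples: a sequence of (x, y) pairs with x a numpy vector, y in {-1, +1}.
        w = np.zeros(len(examples[0][0]))
        mistakes = 0
        for x, y in examples:
            if y * np.dot(w, x) <= 0:   # prediction sign(w . x) is wrong (or 0)
                w = w + y * x           # mistake-driven additive update
                mistakes += 1
        return w, mistakes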
13. Nov 22 Winnow
Wed Nov 24 -- Fri Nov 26: Fall Recess
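A minimal sketch of Winnow (the Winnow1 variant; names are illustrative): weights start at 1, the prediction is 1 iff sum_i w_i x_i ≥ n, a false negative doubles the weights of active attributes, and a false positive zeroes them; the mistake bound is O(k log n) for a k-literal monotone disjunction.

    import numpy as np

    def winnow(examples, n):
        # examples: (x, y) pairs with x a {0,1} vector of length n and y in {0,1}.
        w = np.ones(n)
        mistakes = 0
        for x, y in examples:
            x = np.asarray(x)
            pred = 1 if np.dot(w, x) >= n else 0
            if pred == 0 and y == 1:    # false negative: promote active attributes
                w[x == 1] *= 2
                mistakes += 1
            elif pred == 1 and y == 0:  # false positive: eliminate active attributes
                w[x == 1] = 0
                mistakes += 1
        return w, mistakes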

  • Blum, A. and Y. Mansour (2007) Learning, Regret Minimization, and Equilibria. In Algorithmic Game Theory (eds. N. Nisan, T. Roughgarden, E. Tardos, and V. Vazirani), Cambridge University Press. [ pdf ]
  • Yoav Freund and Robert E. Schapire, Adaptive Game Playing Using Multiplicative Weights, Games and Economic Behavior, 29: 79-103, 1999. [ ps ]


14. Nov 29 Linear regression
Jyrki Kivinen and Manfred K. Warmuth. Exponentiated Gradient versus Gradient Descent for Linear Predictors. Information and Computation, 132(1):1-63, January 1997. [ pdf ]
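The contrast in the Kivinen-Warmuth paper is between additive and multiplicative updates for online linear regression with square loss. A minimal sketch of the two update rules (names illustrative):

    import numpy as np

    def gd_update(w, x, y, eta):
        # Gradient descent: additive update for the loss (1/2)(w . x - y)^2.
        return w - eta * (np.dot(w, x) - y) * x

    def eg_update(w, x, y, eta):
        # Exponentiated gradient: multiplicative update on each component,
        # renormalized so that w stays on the probability simplex.
        v = w * np.exp(-eta * (np.dot(w, x) - y) * x)
        return v / v.sum()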
15. Dec 06 Maximum entropy, maximum likelihood
Fri Dec 10 is the last day of classes.