Week | Topics + Notes | Additional Readings, Notable Events |
1. Aug 30 |
What is COLT? Characterizing learning models. Consistency Model. PAC Model.
[PC -- Branislav] Consistency model
- Monotone disjunctions are CM-learnable (the elimination algorithm is sketched after this row)
- k-CNF is CM-learnable
- Separating hyperplanes are CM-learnable
|
- Pitt, L. and Valiant, L. G. 1988. Computational limitations on learning from examples. J. ACM 35, 4 (Oct. 1988), 965-984.
- [PC] Aldous, D. and Vazirani, U. 1995. A Markovian extension of Valiant's learning model. Inf. Comput. 117, 2 (Mar. 1995), 181-186. [ pdf ]
- Blum, A. and Rivest, R. L. 1989. Training a 3-node neural network is NP-complete. In Advances in Neural Information Processing Systems 1. Morgan Kaufmann Publishers, San Francisco, CA, 494-501.
- [PC] Feldman, V. 2009. Hardness of approximate two-level logic minimization and PAC learning with membership queries. J. Comput. Syst. Sci. 75, 1 (Jan. 2009), 13-26. (Also STOC'06.) [ pdf ]
- Vitaly Feldman. Hardness of Proper Learning. The Encyclopedia of Algorithms, 2008.
- Vitaly Feldman. Statistical Query Learning. The Encyclopedia of Algorithms, 2008.
|
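To accompany the monotone-disjunction item above, a minimal sketch of the standard elimination algorithm in the consistency model (my own illustration, not code from the lecture); examples are 0/1 vectors, labels are True/False:

    def learn_monotone_disjunction(examples, labels):
        n = len(examples[0])
        # Start with every variable as a candidate literal of the disjunction.
        candidates = set(range(n))
        # Any variable set to 1 in a negative example cannot be in the target.
        for x, y in zip(examples, labels):
            if not y:
                candidates -= {i for i in range(n) if x[i] == 1}
        # Check consistency on the positive examples.
        for x, y in zip(examples, labels):
            if y and not any(x[i] == 1 for i in candidates):
                return None  # no consistent monotone disjunction exists
        return candidates    # hypothesis: OR of x_i for i in candidates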
2. Sep 06 |
Sample complexity. Sample complexity for finite hypothesis spaces. VC-dimension. Sample complexity for infinite hypothesis spaces. (The standard bounds are restated after this row.)
[PC -- Swapnoneel] Some hardness results
- k-term DNF is not CM-learnable (i.e., it is NP-hard), for any k ≥ 2
- Intractability of learning 3-term DNF by 3-term DNF (see Rivest's lecture 5)
|
Mon, Sep 06 is Labor Day. Thursday, Sep 09 is Rosh Hashanah.
- Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, Manfred K. Warmuth: Occam's Razor. Inf. Process. Lett. 24(6): 377-380 (1987). [This is the original Occam's Razor paper.]
- [PC] Ming Li, John Tromp, Paul M. B. Vitányi: Sharpening Occam's razor. Inf. Process. Lett. 85(5): 267-274 (2003).
- Hosking, J. R., Pednault, E. P., and Sudan, M. 1997. A statistical perspective on data mining. Future Gener. Comput. Syst. 13, 2-3 (Nov. 1997), 117-134.
- [PC] Misha Alekhnovich, Mark Braverman, Vitaly Feldman, Adam Klivans, Toniann Pitassi. The complexity of properly learning simple concept classes. Journal of Computer and System Sciences, 74(1), 2008 (also FOCS 2004). Two people can present this.
|
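For quick reference, the standard sample-complexity bounds in the realizable case (generic textbook statements; constants vary, and they are not necessarily the exact versions proved in lecture):

    % Finite hypothesis space H: with probability at least 1 - \delta,
    % every consistent h \in H has error at most \epsilon once
    m \;\ge\; \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right).
    % Infinite H with VC-dimension d:
    m \;=\; O\!\left(\frac{1}{\epsilon}\left(d\ln\frac{1}{\epsilon} + \ln\frac{1}{\delta}\right)\right).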
3. Sep 13 |
[PC -- Steven] Some PAC-learning results.
- Learning k-decision lists (see Rivest's lecture 6); the greedy algorithm is sketched after this row
- Learning 3-term DNF by 3-CNF (Rivest's lecture 5)
|
|
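A minimal sketch of the greedy consistency algorithm for decision lists, shown for 1-decision lists (single-literal tests plus a default "true" test) to keep it short; this is my own illustration, not Rivest's pseudocode:

    def learn_decision_list(examples, labels):
        data = list(zip(examples, labels))
        n = len(examples[0])
        # A test is (index, value) meaning "x[index] == value"; None means "always true".
        tests = [(i, v) for i in range(n) for v in (0, 1)] + [None]
        matches = lambda t, x: t is None or x[t[0]] == t[1]
        rules = []
        while data:
            for t in tests:
                covered = [y for x, y in data if matches(t, x)]
                if covered and len(set(covered)) == 1:
                    rules.append((t, covered[0]))   # consistent rule found
                    data = [(x, y) for x, y in data if not matches(t, x)]
                    break
            else:
                return None  # no consistent 1-decision list on this sample
        return rules  # predict by scanning the rules top-down; first match wins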
4. Sep 20 |
[PC -- Steve Uurtamo]
Three different proofs of Sauer's lemma (stated after this row). See also a blog post by Tim Gowers (that's the first proof). Induction gives the second proof, and the proof using the shifting technique is the third. All proofs are short, and you'll learn nice combinatorial techniques from them. |
|
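For reference, Sauer's lemma in its standard form (the growth-function statement, not necessarily the exact notation used in the presentations):

    % If H has VC-dimension d, then on any m points H induces at most
    \Pi_H(m) \;\le\; \sum_{i=0}^{d} \binom{m}{i} \;=\; O(m^d)
    % distinct labelings (the last equality holds for m \ge d).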
5. Sep 27 |
Dealing with noise. Inconsistent Hypothesis Model. Empirical error and generalization error. Uniform convergence theorem (restated after this row).
[PC -- Daniel Megalo (Tue, Sep 28)]
Sample complexity lower bound. Show that Omega(d/ε) examples are necessary, where d is the VC-dimension. (Rivest's lecture 10)
|
[PC] Venkatesan Guruswami, Prasad Raghavendra: Hardness of Learning Halfspaces with Noise. SIAM J. Comput. 39(2): 742-765 (2009). (Also FOCS 2006.) [ pdf ] |
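For reference, the uniform convergence bound for a finite hypothesis space, in the standard Hoeffding-plus-union-bound form (not necessarily the exact constants used in lecture):

    % With probability at least 1 - \delta, simultaneously for all h \in H:
    \left|\operatorname{err}_D(h) - \widehat{\operatorname{err}}_S(h)\right|
      \;\le\; \sqrt{\frac{\ln|H| + \ln(2/\delta)}{2m}}.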
6. Oct 04 |
Weak and strong PAC-learning. Boosting & AdaBoost, training error bound. (A short AdaBoost sketch follows this row.)
- [Schapire's course] Scribe notes 9
- [Blum's course] Lecture 0209
- Robert E. Schapire. The boosting approach to machine learning: An overview. In D. D. Denison, M. H. Hansen, C. Holmes, B. Mallick, B. Yu, editors, Nonlinear Estimation and Classification. Springer, 2003. [ pdf ]
[PC -- Xiaoxing Yu] State and prove Theorem 8 (page 17) in the following paper:
Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119-139, 1997. [ Postscript ] (the original AdaBoost paper).
The theorem is very short, and it makes use of Theorem 1 in the following paper; thus, please prove both theorems:
Baum, E. B. and Haussler, D. 1989. What size net gives valid generalization? Neural Comput. 1, 1 (Mar. 1989), 151-160. [ pdf ]
|
- Ron Meir and Gunnar Rätsch. An introduction to boosting and leveraging. In Advanced Lectures on Machine Learning (LNAI 2600), 2003. [ pdf ]
|
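To accompany the training-error discussion, a minimal AdaBoost sketch (my own illustration, not code from the lecture or the papers). Here weak_learner is a hypothetical callable that returns a classifier h with h(x) in {-1, +1}, and labels are assumed to be in {-1, +1}:

    import math

    def adaboost(examples, labels, weak_learner, rounds):
        m = len(examples)
        D = [1.0 / m] * m                  # distribution over training examples
        ensemble = []                      # list of (alpha_t, h_t)
        for _ in range(rounds):
            h = weak_learner(examples, labels, D)
            eps = sum(D[i] for i in range(m) if h(examples[i]) != labels[i])
            if eps >= 0.5:                 # not a weak learner on this distribution
                break
            alpha = 0.5 * math.log((1 - eps) / max(eps, 1e-12))
            ensemble.append((alpha, h))
            # Reweight: increase weight on mistakes, decrease on correct predictions.
            D = [D[i] * math.exp(-alpha * labels[i] * h(examples[i])) for i in range(m)]
            Z = sum(D)
            D = [d / Z for d in D]
        return lambda x: 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1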
7. Oct 11 |
Generalization error bounds: naive and margin-based (the margin bound is restated after this row)
- [Schapire's course] Scribe notes 10, notes 11
- [Blum's course] Lecture 0211
- Robert E. Schapire, Yoav Freund, Peter Bartlett and Wee Sun Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5):1651-1686, 1998. [ pdf ]
- [PC Caiming] Friedman, J. H. "Greedy Function Approximation: A Gradient Boosting Machine." (Feb. 1999a). Caiming can take an entire lecture (1.5 hours) for this.
|
- [PC] Robert E. Schapire. The convergence rate of AdaBoost [open problem]. In The 23rd Conference on Learning Theory, 2010. [ pdf ]
- [PC] Thomas G. Dietterich and Ghulum Bakiri. Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2:263-286, January 1995. (They showed how to convert binary classifiers into multiclass classifiers using error-correcting codes!) Two people can present this.
- [PC] Robert E. Schapire. Using output codes to boost multiclass learning problems. In Machine Learning: Proceedings of the Fourteenth International Conference, 1997. [ Postscript ]
|
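For reference, the Schapire-Freund-Bartlett-Lee margin bound has, up to constants, the following standard form (not necessarily the exact version stated in class):

    % For all f in the convex hull of a base class H of VC-dimension d and all
    % \theta > 0, with probability at least 1 - \delta over a sample S of size m:
    \Pr_D\bigl[y f(x) \le 0\bigr] \;\le\; \Pr_S\bigl[y f(x) \le \theta\bigr]
      + O\!\left(\sqrt{\frac{d \log^2(m/d)}{m\,\theta^2} + \frac{\log(1/\delta)}{m}}\right).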
8. Oct 18 |
- [PC Praneeta] Empirical margin loss bound. Prove Theorem 1, page 129, of this paper.
- [PC Yongding] Massart's Lemma and its corollary + Rademacher complexity of H is equal to the Rademacher complexity of co(H). Please prove three things:
  - Massart's Lemma & its corollary (pages 15-17 in Lecture 3 of Mehryar Mohri's class); the lemma is restated after this row
  - Rademacher complexity of the convex hull (page 23, Lecture 6 of Mehryar Mohri's class)
Both presentations are on Tuesday, Oct 19.
|
|
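For reference, Massart's finite-class lemma in its standard form (the normalization may differ slightly from Mohri's slides):

    % For a finite set A \subset \mathbb{R}^m with r = \max_{a \in A} \|a\|_2,
    % and \sigma_1,\dots,\sigma_m i.i.d. Rademacher variables:
    \mathbb{E}_{\sigma}\!\left[\frac{1}{m}\sup_{a \in A}\sum_{i=1}^{m}\sigma_i a_i\right]
      \;\le\; \frac{r\sqrt{2\ln|A|}}{m}.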
9. Oct 25 |
|
|
10. Nov 01 |
Support Vector Machines: the linearly separable case (the primal problem is restated after this row)
|
Chris Burges' SVM tutorial. [ pdf ]
Excerpt from Vapnik's The Nature of Statistical Learning Theory.
|
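For reference, the hard-margin primal problem in its standard form:

    % Linearly separable training set (x_i, y_i), y_i \in \{-1, +1\}:
    \min_{w, b}\ \tfrac{1}{2}\|w\|^2
      \quad\text{s.t.}\quad y_i\,(w \cdot x_i + b) \ge 1 \ \ \forall i,
    % and the resulting geometric margin is 2 / \|w\|.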
11. Nov 08 |
SVM: the kernel trick (illustrated by the sketch after this row)
|
O. Bousquet, S. Boucheron, and G. Lugosi, Introduction to Statistical Learning Theory. [ pdf ] |
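To illustrate the kernel trick, a small kernelized-perceptron sketch (my own illustration, not from the SVM readings): the learner only ever evaluates the kernel K(x, x'), never an explicit feature map. Labels are assumed to be in {-1, +1}:

    import math

    def rbf_kernel(x, z, gamma=1.0):
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

    def kernel_perceptron(examples, labels, kernel=rbf_kernel, epochs=10):
        alpha = [0] * len(examples)        # mistake counts play the role of weights
        for _ in range(epochs):
            for i, (x, y) in enumerate(zip(examples, labels)):
                score = sum(alpha[j] * labels[j] * kernel(examples[j], x)
                            for j in range(len(examples)))
                if y * score <= 0:         # mistake: remember this example
                    alpha[i] += 1
        return lambda x: 1 if sum(alpha[j] * labels[j] * kernel(examples[j], x)
                                  for j in range(len(examples))) >= 0 else -1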
12. Nov 15 |
Online learning. The mistake-bound model. Learning from expert advice. WMA & RWMA (a short RWMA sketch follows this row).
[PC] Perceptron algorithm & its analysis |
- Avrim Blum. "On-line algorithms in machine learning." In Dagstuhl Workshop on On-Line Algorithms, June 1996. [ ps ]
- Shai Shalev-Shwartz and Yoram Singer, Tutorial on Theory and Applications of Online Learning, ICML 2008.
- A Mind Reader Game (You should definitely try to play this game!)
- Cesa-Bianchi, N., Freund, Y., Haussler, D., Helmbold, D. P., Schapire, R. E., and Warmuth, M. K. 1997. How to use expert advice. J. ACM 44, 3 (May 1997), 427-485.
- Frans M. J. Willems, Yuri M. Shtarkov, Tjalling J. Tjalkens: The context-tree weighting method: basic properties. IEEE Transactions on Information Theory 41(3): 653-664 (1995). [1996 Paper Award of the IEEE Information Theory Society]
- Littlestone, N. 1988. Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm. Mach. Learn. 2, 4 (Apr. 1988), 285-318. [The Winnow paper, pdf ]
|
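To accompany the expert-advice topic, a minimal Randomized Weighted Majority sketch (my own illustration, not course code). Here losses[t][i] is the loss of expert i in round t, assumed to lie in [0, 1]:

    import random

    def randomized_weighted_majority(losses, eta=0.5):
        n = len(losses[0])
        w = [1.0] * n                          # one weight per expert
        total_loss = 0.0
        for round_losses in losses:
            W = sum(w)
            probs = [wi / W for wi in w]       # follow expert i with prob w_i / W
            i = random.choices(range(n), weights=probs)[0]
            total_loss += round_losses[i]
            # Multiplicative update: shrink the weights of experts that did badly.
            w = [wi * (1 - eta) ** li for wi, li in zip(w, round_losses)]
        return total_loss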
13. Nov 22 |
Winnow (sketched after this row)
|
Wed, Nov 24 -- Fri, Nov 26: Fall Recess
- Blum, A. and Y. Mansour (2007). Learning, Regret Minimization, and Equilibria. In Algorithmic Game Theory (eds. N. Nisan, T. Roughgarden, E. Tardos, and V. Vazirani), Cambridge University Press. [ pdf ]
- Yoav Freund and Robert E. Schapire, Adaptive Game Playing Using Multiplicative Weights, Games and Economic Behavior, 29: 79-103, 1999. [ ps ]
|
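A minimal Winnow sketch (my own illustration, not Littlestone's original pseudocode; it uses the divide-by-2 demotion variant rather than elimination). Features are 0/1 and labels are {0, 1}:

    def winnow(examples, labels, epochs=1):
        n = len(examples[0])
        w = [1.0] * n
        theta = float(n)                       # standard threshold choice
        for _ in range(epochs):
            for x, y in zip(examples, labels):
                pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0
                if pred == 0 and y == 1:       # false negative: promote active features
                    w = [wi * 2 if xi == 1 else wi for wi, xi in zip(w, x)]
                elif pred == 1 and y == 0:     # false positive: demote active features
                    w = [wi / 2 if xi == 1 else wi for wi, xi in zip(w, x)]
        return w, theta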
14. Nov 29 |
Linear regression (the two competing updates are sketched after this row) |
Jyrki Kivinen and Manfred K. Warmuth. Exponentiated Gradient versus Gradient Descent for Linear Predictors. Information and Computation, 132(1):1-63, January 1997. [ pdf ]
|
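To accompany the Kivinen-Warmuth reading, a small sketch contrasting the two update rules for online linear regression with square loss (my own illustration; for EG the weight vector is assumed to be a probability vector, i.e., nonnegative and summing to 1):

    import math

    def gd_update(w, x, y, eta):
        # Gradient descent (GD): additive update.
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        return [wi - eta * err * xi for wi, xi in zip(w, x)]

    def eg_update(w, x, y, eta):
        # Exponentiated gradient (EG): multiplicative update, then renormalize
        # so the weights remain a probability distribution.
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        w = [wi * math.exp(-eta * err * xi) for wi, xi in zip(w, x)]
        Z = sum(w)
        return [wi / Z for wi in w]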
15. Dec 06 |
Maximum entropy, maximum likelihood (the duality is restated after this row)
|
Fri, Dec 10 is the last day of classes. |
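For reference, the maximum-entropy / maximum-likelihood duality in its generic form (standard statement, not necessarily the notation used in lecture):

    % Among all distributions p matching the empirical feature expectations,
    % E_p[f_j] = E_{\hat p}[f_j] for every feature f_j, the maximum-entropy
    % solution has the Gibbs / exponential-family form
    p_\lambda(x) \;=\; \frac{1}{Z(\lambda)} \exp\Bigl(\sum_j \lambda_j f_j(x)\Bigr),
    % and choosing \lambda to maximize the likelihood of the data within this
    % family yields the same distribution (the two problems are convex duals).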