CONTACT 201 Bell Hall - Buffalo, New York 14260-2000 - (716) 645-3180
© 2008, University at Buffalo, All rights reserved. | Privacy | Accessibility
Saturday, September 6, 4:00 pm- 5:00pm
ABSTRACT
There has been substantial work in many different fields on the problem of clustering data. Theoretical work has generally been of two types: either on algorithms for (approximately) optimizing various distance-based objectives such as k-median, k-means, and min-sum, or on clustering under probabilistic "generative model" assumptions such as mixtures of Gaussian or related distributions.
In this work we propose a new approach to analyzing the problem of clustering. Building on models used in learning theory, we consider the goal of approximately recovering an unknown target clustering from a given similarity matrix (weighted graph), given only the assumption of certain natural properties that the similarity or weight function satisfies with respect to the desired clustering. We find that if we are willing to relax our goals a bit (for example, allow the algorithm to produce a hierarchical clustering that we will call successful if it contains a pruning that is close to the correct answer) then this leads to a number of interesting graph-theoretic and game-theoretic properties that are sufficient to cluster well. We can also produce accurate clusterings under implicit assumptions made when considering approximation algorithms for distance-based objectives, even at values for which no efficient approximations exist. This work can be viewed as defining a kind of PAC model for clustering.
This talk is based on work joint with Maria-Florina Balcan, Anupam Gupta, and Santosh Vempala.
Speaker Bio
Avrim Blum is Professor of Computer Science at Carnegie Mellon University. His main research interests are in Machine Learning Theory, Approximation Algorithms, and Algorithmic Game Theory, and is also known for his work in AI Planning. He has served as Program Chair for the IEEE Symposium on Foundations of Computer Science (FOCS) and the Conference on Learning Theory (COLT), and on the organizing committee for the National Academy of Sciences U.S. Frontiers of Science Symposium. He was recepient of the Sloan Fellowship and NSF National Young Investigator Awards, and is a Fellow of the ACM.