UB -
University at Buffalo, The State University of New York Computer Science and Engineering

Eastern Great Lakes Theory Workshop Talk

A Computational Theory of Clustering

Avrim Blum, Carnegie Mellon University

Saturday, September 6, 4:00 pm- 5:00pm

ABSTRACT

There has been substantial work in many different fields on the problem of clustering data. Theoretical work has generally been of two types: either on algorithms for (approximately) optimizing various distance-based objectives such as k-median, k-means, and min-sum, or on clustering under probabilistic "generative model" assumptions such as mixtures of Gaussian or related distributions.

In this work we propose a new approach to analyzing the problem of clustering. Building on models used in learning theory, we consider the goal of approximately recovering an unknown target clustering from a given similarity matrix (weighted graph), given only the assumption of certain natural properties that the similarity or weight function satisfies with respect to the desired clustering. We find that if we are willing to relax our goals a bit (for example, allow the algorithm to produce a hierarchical clustering that we will call successful if it contains a pruning that is close to the correct answer) then this leads to a number of interesting graph-theoretic and game-theoretic properties that are sufficient to cluster well. We can also produce accurate clusterings under implicit assumptions made when considering approximation algorithms for distance-based objectives, even at values for which no efficient approximations exist. This work can be viewed as defining a kind of PAC model for clustering.

This talk is based on work joint with Maria-Florina Balcan, Anupam Gupta, and Santosh Vempala.

Slides

Speaker Bio

Avrim Blum is Professor of Computer Science at Carnegie Mellon University. His main research interests are in Machine Learning Theory, Approximation Algorithms, and Algorithmic Game Theory, and is also known for his work in AI Planning. He has served as Program Chair for the IEEE Symposium on Foundations of Computer Science (FOCS) and the Conference on Learning Theory (COLT), and on the organizing committee for the National Academy of Sciences U.S. Frontiers of Science Symposium. He was recepient of the Sloan Fellowship and NSF National Young Investigator Awards, and is a Fellow of the ACM.

< Schedule < EaGL homepage