June 5th, 2012 @ 11:30am Brian Kulis (OSU)

Revisiting k-means via Bayesian Nonparametrics

Abstract: Bayesian models offer great flexibility for clustering applications—Bayesian nonparametrics can be used for modeling infinite mixtures, and hierarchical Bayesian models can be utilized for sharing clusters across multiple data sets. For the most part, such flexibility is lacking in classical clustering methods such as k-means. Inspired by the asymptotic connection between k-means and mixtures of Gaussians, I will discuss a class of algorithms based on asymptotics of Dirichlet processes that combine the benefits of Bayesian nonparametric models with the simplicity and scalability of classical clustering methods. In particular, I will focus on two novel algorithms that arise from these asymptotics: i) a k-means-like algorithm that does not fix the number of clusters, and ii) a scalable topic modeling algorithm based on the hierarchical Dirichlet process. I will discuss further extensions as well as some of our recent empirical results.
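As a rough illustration of the first algorithm mentioned above, here is a minimal sketch of a hard-clustering procedure in the style of DP-means, where a penalty parameter (here called `lam`) replaces a fixed cluster count: a point opens a new cluster whenever its squared distance to every existing center exceeds `lam`. The initialization, stopping rule, and empty-cluster handling below are assumptions for illustration, not details taken from the talk.

```python
import numpy as np

def dp_means(X, lam, max_iter=100):
    """Sketch of a DP-means-style algorithm: like Lloyd's k-means,
    but a new cluster is opened when a point's squared distance to
    all current centers exceeds the penalty lam."""
    centers = [X.mean(axis=0)]  # start with one cluster at the global mean
    assignments = [0] * len(X)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center,
        # or becomes a new center if everything is farther than lam.
        assignments = []
        for x in X:
            d2 = [np.sum((x - c) ** 2) for c in centers]
            j = int(np.argmin(d2))
            if d2[j] > lam:
                centers.append(x.copy())
                j = len(centers) - 1
            assignments.append(j)
        # Update step: recompute each center as the mean of its members
        # (a center with no members is simply kept where it is).
        new_centers = []
        for j in range(len(centers)):
            members = X[np.array(assignments) == j]
            new_centers.append(members.mean(axis=0) if len(members) else centers[j])
        if all(np.allclose(c, n) for c, n in zip(centers, new_centers)):
            centers = new_centers
            break
        centers = new_centers
    # Drop clusters that ended up empty and renumber the assignments.
    used = sorted(set(assignments))
    remap = {j: i for i, j in enumerate(used)}
    return (np.array([centers[j] for j in used]),
            np.array([remap[a] for a in assignments]))
```

With a small `lam`, well-separated groups each get their own cluster; with a large `lam`, everything collapses into a single cluster—so `lam` plays the role that k plays in ordinary k-means.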

Bio: Brian Kulis is an assistant professor of computer science and engineering at Ohio State University. Previously, he was a postdoctoral fellow at UC Berkeley EECS and the International Computer Science Institute. He obtained his PhD in computer science from the University of Texas in 2008, and his BA in computer science and mathematics from Cornell University in 2003. His research focuses on machine learning, data mining, and large-scale optimization. For his research, he has won three best student paper awards at top-tier conferences—two at the International Conference on Machine Learning (in 2005 and 2007) and one at the IEEE Conference on Computer Vision and Pattern Recognition (in 2008). He is also the recipient of an MCD graduate fellowship from the University of Texas (2003-2007) and an Award of Excellence from the College of Natural Sciences at the University of Texas.

seminars/seminaritems/2012-06-05.txt · Last modified: 2012/05/29 15:20 by silberman