In network data, relationships can be observed directly (e.g., in a social network) or
constructed from a similarity function (e.g., between data points in Euclidean space). To find clusters
in such data, spectral clustering utilizes the eigenvectors of a similarity matrix, where the
(i, j)th element measures the similarity between points i and j. Unfortunately, the standard
spectral clustering algorithm fails when the similarity matrix is sparse (a common regime).
This talk will first discuss how a regularized spectral clustering algorithm can correct for
the problems created by this failure. The statistical improvements from regularization
are apparent in practice. The talk will theoretically characterize the improvement from
regularization under the degree-corrected stochastic blockmodel. The talk will also discuss
contextualized spectral clustering in which the actors in the network have attributes that
correlate with the communities in the social network. We study the misclustering rate
of our proposed algorithm under a joint mixture model on the network and the node
covariates; this characterizes the algorithm as a statistical estimator. Applications with a
1,000,000-node DTI neuroconnectome and a 4,000,000-node online social network motivate
Cookies served at 3:45pm, 1st floor Lounge.
Karl Rohe, Department of Statistics, University of Wisconsin-Madison
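
For readers unfamiliar with the method, the following is a minimal sketch of one common form of regularized spectral clustering. It is not taken from the talk. It assumes the regularizer adds a constant tau (here defaulting to the average degree) to every node degree before normalizing the adjacency matrix. The function name, the parameter names, and the small k-means routine are all illustrative choices.

```python
import numpy as np


def regularized_spectral_clustering(A, k, tau=None, n_iter=50):
    """Cluster the nodes of a graph with symmetric adjacency matrix A into k groups.

    Assumption (not from the abstract): regularization inflates each degree by
    tau before forming the normalized matrix D_tau^{-1/2} A D_tau^{-1/2}.
    """
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    if tau is None:
        tau = deg.mean()  # assumed default: the average degree

    # Regularized, symmetrically normalized similarity matrix.
    d_inv_sqrt = 1.0 / np.sqrt(deg + tau)
    L = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

    # Keep the k eigenvectors whose eigenvalues are largest in absolute value.
    vals, vecs = np.linalg.eigh(L)
    top = np.argsort(np.abs(vals))[-k:]
    X = vecs[:, top]

    # Project each row onto the unit sphere to reduce the effect of degree.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.maximum(norms, 1e-12)

    # Plain k-means with a deterministic farthest-point initialization.
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)

    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Toy input: two 4-node cliques joined by a single edge between nodes 3 and 4.
A_demo = np.zeros((8, 8))
A_demo[:4, :4] = 1.0
A_demo[4:, 4:] = 1.0
np.fill_diagonal(A_demo, 0.0)
A_demo[3, 4] = A_demo[4, 3] = 1.0

labels_demo = regularized_spectral_clustering(A_demo, k=2)
```

On this toy graph the two cliques should land in different clusters. The dense eigendecomposition is only practical for small graphs; networks on the scale mentioned in the abstract would need a sparse eigensolver.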