Real-valued word vectors have transformed NLP applications; popular examples are word2vec and GloVe, recognized for their ability to capture linguistic regularities via simple geometrical operations. In this talk, we demonstrate further striking geometrical properties of word vectors. First, we show that a very simple, yet counter-intuitive, postprocessing technique, which makes the vectors "more isotropic", renders off-the-shelf vectors even stronger. Second, we show that a sentence containing a target word is well represented by a low-rank subspace; subspaces associated with a particular sense of the target word tend to intersect in a line (a one-dimensional subspace). We harness this Grassmannian geometry to disambiguate, in an unsupervised way, the multiple senses of words, in particular those of the most promiscuously polysemous of all words: prepositions. A surprising finding is that rare senses, including idiomatic, sarcastic, and metaphorical usages, are efficiently captured. Our algorithms are all unsupervised and rely on no linguistic resources; we validate them by presenting new state-of-the-art results on a variety of multilingual benchmark datasets.
1. Geometry of Compositionality, AAAI '17, https://arxiv.org/abs/1611.09799
2. Geometry of Polysemy, ICLR '17, https://arxiv.org/abs/1610.07569
3. Representing Sentences as Low-Rank Subspaces, ACL '17, https://arxiv.org/abs/1704.05358
4. Prepositions in Context, preprint, https://arxiv.org/abs/1702.01466
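The two geometric ideas in the abstract can be illustrated with a minimal NumPy sketch. It rests on assumptions, since the abstract does not spell out the procedures: "more isotropic" is taken here to mean subtracting the mean vector and projecting out a few top principal directions, and the sentence subspace is taken to be the span of the top singular directions of the sentence's (uncentered) word vectors. The function names, the parameters n_remove and rank, and the random stand-in data are all hypothetical; the papers listed above give the actual methods.

```python
import numpy as np


def make_more_isotropic(vectors, n_remove=2):
    """Center the vectors, then project out the top principal directions.

    Assumption: this is one plausible reading of "more isotropic";
    n_remove is an illustrative parameter, not a value from the talk.
    """
    centered = vectors - vectors.mean(axis=0)
    # Rows of vt are the principal directions of the centered vectors.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top = vt[:n_remove]
    # Subtract each vector's component along the top directions.
    return centered - centered @ top.T @ top


def sentence_subspace(word_vectors, rank=2):
    """Return an orthonormal basis (rank x dim) fitted to a sentence.

    Assumption: the subspace is spanned by the top right singular
    vectors of the uncentered matrix of word vectors.
    """
    _, _, vt = np.linalg.svd(word_vectors, full_matrices=False)
    return vt[:rank]


def distance_to_subspace(vector, basis):
    """Euclidean distance from a vector to the span of the basis rows."""
    residual = vector - basis.T @ (basis @ vector)
    return float(np.linalg.norm(residual))


# Random stand-in data: 50 "word vectors" of dimension 8 with a shared offset.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(50, 8)) + 3.0
processed = make_more_isotropic(embeddings, n_remove=2)

# Treat the first five processed vectors as the words of one "sentence".
basis = sentence_subspace(processed[:5], rank=2)
print(distance_to_subspace(processed[0], basis))
```

With real embeddings, the rows of the input matrix would be vectors looked up from word2vec or GloVe. A small distance_to_subspace value indicates that a word lies close to the sentence's subspace.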
The Information Systems Laboratory Colloquium (ISLC) is typically held in Packard 101 every Thursday at 4:15 pm during the academic year. Refreshments are usually served after the talk.
The Colloquium is organized by graduate students Martin Zhang, Farzan Farnia, Reza Takapoui, and Zhengyuan Zhou. To suggest speakers, please contact any of the students.
Pramod Viswanath received the Ph.D. degree in electrical engineering and computer science from the University of California, Berkeley, in 2000. From 2000 to 2001, he was a member of research staff at Flarion Technologies, NJ. Since 2001, he has been on the faculty of the University of Illinois at Urbana-Champaign in Electrical and Computer Engineering, where he is currently a professor.
Pramod has worked extensively on information theory and its applications, specifically as applied to wireless communication. His book on wireless communication, coauthored with David Tse, is used in more than 60 institutes around the world. His interest in natural language processing is recent, although he has long been interested in natural languages.