Toward Transdisciplinary Machine Learning: Scalable Text Mining and Social Influence Modeling

Date and time: 
Thursday, February 15, 2018 - 15:30
Location: 
220 Deschutes
Author(s):
Moontae Lee
Cornell University
Host/Committee: 
  • Dejing Dou

Machine Learning has shown remarkable progress in understanding massive data and making data-driven decisions. In this talk, I will first clarify the functions of Machine Learning as exploration, prediction, and explanation, distinguishing it from other closely related disciplines. Then I will present state-of-the-art spectral topic modeling for transparent and scalable exploration of multiple modalities, such as large text corpora and various user preferences. Next, I will introduce my recent work, the Chinese Voting Process, which models social influence and biases in online evaluations of product reviews and question answers. I will demonstrate how to predict the intrinsic quality of the evaluations and how to explain different behavioral dynamics across Amazon and 82 StackExchange forums. Finally, after proposing various future collaborations, I will conclude that Machine Learning can be not only a multidisciplinary but also a transdisciplinary tool for effectively tackling diverse and complex real-world problems.

 

Biography

Moontae Lee is a Ph.D. student in the Computer Science Department at Cornell University, working with Profs. David Mimno and David Bindel. He studied Computer Science, Mathematics, and Psychology before joining Cornell. His research focuses on probabilistic modeling of large discrete data, where he incorporates spectral theory for a new paradigm of statistical inference. He has also collaborated extensively with various researchers at Microsoft Research in Redmond, developing intelligent and interpretable computational models for industrial applications in language and linguistics. His research combines his expertise in "modeling the world" with domain-specific intuitions like "modeling the biases", so that Machine Learning can not only leverage "large" amounts of data but also understand "context" in the data, toward providing proper solutions.
