Mining Latent Entity Structures
Synthesis Lectures on Data Mining and Knowledge Discovery
March 2015, 159 pages, (doi:10.2200/S00625ED1V01Y201502DMK010)
Chi Wang
Microsoft Research
Jiawei Han
University of Illinois at Urbana-Champaign
Abstract
The “big data” era is characterized by an explosion of information in the form of digital data collections, ranging from scientific knowledge to social media, news, and records of everyday life. Examples of such collections include scientific publications, enterprise logs, news articles, social media, and general web pages. Valuable knowledge about multi-typed entities is often hidden in this unstructured or loosely structured, interconnected data. Mining latent structures around entities uncovers hidden knowledge such as implicit topics, phrases, entity roles, and relationships. In this monograph, we investigate the principles and methodologies of mining latent entity structures from massive unstructured and interconnected data. We propose a text-rich information network model for modeling data in many different domains. This leads to a series of new principles and powerful methodologies for mining latent structures, including (1) latent topical hierarchies, (2) quality topical phrases, (3) entity roles in hierarchical topical communities, and (4) entity relations. This book also introduces applications enabled by the mined structures and points out some promising research directions.
Table of Contents: Acknowledgments / Introduction / Hierarchical Topic and Community Discovery / Topical Phrase Mining / Entity Topical Role Analysis / Mining Entity Relations / Scalable and Robust Topic Discovery / Application and Research Frontier / Bibliography / Authors’ Biographies