Mr.LDA: Scalable Topic Modeling Using Variational Inference in MapReduce
Mr.LDA is a package for flexible, scalable, multilingual topic modeling using variational inference in MapReduce.
Latent Dirichlet Allocation (LDA) and related topic modeling techniques are useful for exploring document collections. As large datasets become more prevalent, the inference procedure for LDA needs to scale with them. Unlike approaches based on Gibbs sampling, Mr.LDA uses variational inference, which fits naturally into a distributed environment: given the current topics, each document's variational parameters can be updated independently, so documents can be processed in parallel and their expected counts summed afterwards. More importantly, this variational implementation is easy to extend, unlike highly tuned and specialized Gibbs sampling implementations. Examples include informed priors that guide topic discovery and topic extraction from multilingual corpora.
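To make the MapReduce fit more concrete, here is a minimal single-process sketch of variational EM for LDA (in the style of Blei, Ng and Jordan), arranged as a map phase (a per-document E-step) and a reduce phase (summing expected word-topic counts, then renormalizing the topics). This is not Mr.LDA's code or API: the function names `map_document`, `reduce_and_update` and `run_lda`, the fixed `alpha`, the smoothing pseudo-count `eta`, and the iteration counts are all illustrative assumptions. The `digamma` helper is a hand-rolled approximation because Python's standard library does not provide one.

```python
import math
import random
from collections import defaultdict


def digamma(x):
    """Approximate digamma(x) for x > 0 via recurrence plus an asymptotic series."""
    result = 0.0
    while x < 6.0:  # shift x upward using psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))


def map_document(doc, beta, alpha, num_topics, iterations=20):
    """Hypothetical 'map' step: variational E-step for one document.

    doc maps word_id -> count. Returns the document's gamma and a list of
    ((topic, word), expected_count) pairs for the reducer to sum.
    """
    total = sum(doc.values())
    gamma = [alpha + total / num_topics] * num_topics
    phi = {}
    for _ in range(iterations):
        # phi[w][k] is proportional to beta[k][w] * exp(digamma(gamma[k]));
        # the digamma(sum(gamma)) term is constant in k and cancels on normalizing.
        weights = [math.exp(digamma(g)) for g in gamma]
        new_gamma = [alpha] * num_topics
        for w, c in doc.items():
            row = [beta[k][w] * weights[k] for k in range(num_topics)]
            norm = sum(row)
            phi[w] = [r / norm for r in row]
            for k in range(num_topics):
                new_gamma[k] += c * phi[w][k]
        gamma = new_gamma
    emitted = [((k, w), c * phi[w][k]) for w, c in doc.items() for k in range(num_topics)]
    return gamma, emitted


def reduce_and_update(emitted_pairs, num_topics, vocab_size, eta=0.01):
    """Hypothetical 'reduce' step plus M-step: sum counts per (topic, word), renormalize."""
    counts = defaultdict(float)
    for key, value in emitted_pairs:
        counts[key] += value
    beta = []
    for k in range(num_topics):
        row = [counts[(k, w)] + eta for w in range(vocab_size)]  # eta avoids zero probabilities
        norm = sum(row)
        beta.append([r / norm for r in row])
    return beta


def run_lda(corpus, num_topics, vocab_size, alpha=0.1, em_iterations=30, seed=0):
    """Driver loop: alternate map (per document) and reduce (global topics); alpha stays fixed."""
    rng = random.Random(seed)
    beta = []
    for _ in range(num_topics):
        row = [rng.random() + 0.1 for _ in range(vocab_size)]
        norm = sum(row)
        beta.append([r / norm for r in row])
    gammas = []
    for _ in range(em_iterations):
        emitted, gammas = [], []
        for doc in corpus:  # in a MapReduce job, documents would be split across mappers
            gamma, pairs = map_document(doc, beta, alpha, num_topics)
            gammas.append(gamma)
            emitted.extend(pairs)
        beta = reduce_and_update(emitted, num_topics, vocab_size)
    return beta, gammas


# Toy usage: four documents over a four-word vocabulary.
corpus = [{0: 3, 1: 2}, {0: 2, 1: 3}, {2: 3, 3: 2}, {2: 2, 3: 3}]
beta, gammas = run_lda(corpus, num_topics=2, vocab_size=4)
```

The key design point is that `map_document` reads only the shared topics `beta` and one document, so the per-document work has no cross-document dependencies; only the summation in `reduce_and_update` needs to combine results. In a distributed setting, the driver would then hand the new topics to the next round of mappers.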