Mr.LDA: Scalable Topic Modeling Using Variational Inference in MapReduce

by Roberto Zicari · Published July 26, 2014 · Updated July 26, 2014

Mr.LDA is a package for flexible, scalable, multilingual topic modeling using variational inference in MapReduce.

Latent Dirichlet Allocation (LDA) and related topic modeling techniques are useful for exploring document collections. As large datasets become more common, inference for LDA needs to scale. Unlike other approaches that use Gibbs sampling, Mr.LDA uses variational inference, which fits easily into a distributed environment. More importantly, this variational implementation, unlike highly tuned and specialized implementations based on Gibbs sampling, is easily extensible: examples include informed priors to guide topic discovery and extracting topics from a multilingual corpus.
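To illustrate why variational inference fits a map/reduce pattern, here is a minimal single-process sketch in Python. It is not Mr.LDA's own code, and every function name, parameter, and the toy corpus are assumptions made for illustration. The "mapper" runs the standard per-document variational updates independently for each document, the "reducer" sums the emitted topic-word statistics, and a driver step normalizes them into new topic distributions. Hyperparameter updates and convergence checks are left out.

```python
import math
from collections import defaultdict

def digamma(x):
    """Digamma for x > 0, via recurrence plus an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return (result + math.log(x) - 0.5 * inv
            - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))

def map_document(doc, beta, alpha, n_iter=20):
    """Mapper (hypothetical): variational E-step for one document.

    doc is {word_id: count}; beta[k][w] is the current probability of
    word w under topic k; alpha is a symmetric Dirichlet prior.
    """
    K = len(beta)
    total = sum(doc.values())
    gamma = [alpha + total / K] * K
    phi = {}
    for _ in range(n_iter):
        exp_elog = [math.exp(digamma(g)) for g in gamma]
        new_gamma = [alpha] * K
        for w, c in doc.items():
            weights = [beta[k][w] * exp_elog[k] for k in range(K)]
            norm = sum(weights)
            phi[w] = [x / norm for x in weights]
            for k in range(K):
                new_gamma[k] += c * phi[w][k]
        gamma = new_gamma
    # Emit ((topic, word), expected count) pairs for the reducer.
    emitted = [((k, w), c * phi[w][k])
               for w, c in doc.items() for k in range(K)]
    return gamma, emitted

def reduce_counts(pairs):
    """Reducer (hypothetical): sum expected counts per (topic, word) key."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return totals

def update_beta(totals, K, V, eta=0.01):
    """Driver step: smooth with eta and normalize each topic row."""
    beta = []
    for k in range(K):
        row = [totals.get((k, w), 0.0) + eta for w in range(V)]
        s = sum(row)
        beta.append([x / s for x in row])
    return beta

def run_iteration(corpus, beta, alpha):
    """One full pass: map every document, reduce, then update topics."""
    pairs, gammas = [], []
    for doc in corpus:  # each call is independent, so it could run in parallel
        gamma, emitted = map_document(doc, beta, alpha)
        gammas.append(gamma)
        pairs.extend(emitted)
    totals = reduce_counts(pairs)
    return update_beta(totals, len(beta), len(beta[0])), gammas

# Toy data: 3 documents, 4 word ids, 2 topics (all values are made up).
corpus = [{0: 2, 1: 1}, {2: 3, 3: 1}, {0: 1, 3: 2}]
beta0 = [[0.4, 0.3, 0.2, 0.1], [0.1, 0.2, 0.3, 0.4]]
new_beta, gammas = run_iteration(corpus, beta0, alpha=0.1)
```

The point of the sketch is that `map_document` touches only one document plus the shared topic matrix, so documents can be processed on separate machines, and the only global communication is the sum in `reduce_counts`. A Gibbs sampler, by contrast, updates shared counts after every token assignment, which is harder to distribute without approximation.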

LINK to Project Page

View the Project on GitHub

