HoloClean: A Machine Learning System for Data Enrichment

by Roberto Zicari · June 4, 2018

HoloClean is a statistical inference engine for imputing, cleaning, and enriching data. As a weakly supervised machine learning system, HoloClean combines available quality rules, value correlations, reference data, and other signals to build a probabilistic model that captures the data generation process, and then uses that model in a variety of data curation tasks. HoloClean spares data practitioners and scientists the considerable time they would otherwise spend building piecemeal cleaning solutions. Instead, they express their domain knowledge declaratively, which enables accurate analytics, predictions, and insights from noisy, incomplete, and erroneous data.
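To make the idea of using value correlations for imputation concrete, here is a minimal toy sketch in plain Python. It is not HoloClean's API or its actual model: it swaps in a simple naive-Bayes-style estimate with add-one smoothing, and the function name impute_distribution, the column names, and the sample rows are all hypothetical, chosen only for illustration.

```python
from collections import Counter


def impute_distribution(rows, target, incomplete_row):
    """Return a probability distribution over candidate values for a missing cell.

    Toy illustration only (not HoloClean): candidates are scored by their prior
    frequency times smoothed co-occurrence frequencies with the other observed
    attributes of the incomplete row.
    """
    # Rows where the target attribute is known serve as (weak) evidence.
    observed = [r for r in rows if r.get(target) is not None]
    prior = Counter(r[target] for r in observed)

    # Attributes of the incomplete row that we can condition on.
    evidence = {
        attr: value
        for attr, value in incomplete_row.items()
        if attr != target and value is not None
    }

    scores = {}
    for candidate, n_candidate in prior.items():
        score = n_candidate / len(observed)
        for attr, value in evidence.items():
            domain_size = len({r[attr] for r in observed if r.get(attr) is not None})
            joint = sum(
                1 for r in observed
                if r[target] == candidate and r.get(attr) == value
            )
            # Add-one smoothing keeps unseen combinations from zeroing the score.
            score *= (joint + 1) / (n_candidate + domain_size)
        scores[candidate] = score

    total = sum(scores.values())
    if total == 0:
        return {}
    return {candidate: s / total for candidate, s in scores.items()}


# Hypothetical sample data: the last row is missing its city.
rows = [
    {"zip": "60608", "city": "Chicago", "state": "IL"},
    {"zip": "60608", "city": "Chicago", "state": "IL"},
    {"zip": "60609", "city": "Chicago", "state": "IL"},
    {"zip": "53703", "city": "Madison", "state": "WI"},
    {"zip": "53703", "city": "Madison", "state": "WI"},
    {"zip": "60608", "city": None, "state": "IL"},
]

distribution = impute_distribution(rows, "city", rows[-1])
best = max(distribution, key=distribution.get)
print(best, distribution)
```

The sketch returns a distribution rather than a single value, which loosely mirrors the probabilistic framing described above: a downstream task can take the most likely repair or keep the uncertainty. A system like HoloClean additionally folds in quality rules and reference data, which this toy example leaves out.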

Download

Released under License Apache 2.0
© 2018 HoloClean
Contact: contact@holoclean.io

Resources

Christopher De Sa, Ihab F. Ilyas, Benny Kimelfeld, Christopher Ré, and Theodoros Rekatsinas, A formal framework for probabilistic unclean databases, Manuscript, 2018. [PDF]

Theodoros Rekatsinas, Xu Chu, Ihab F. Ilyas, and Christopher Ré, HoloClean: Holistic data repairs with probabilistic inference, PVLDB 10 (2017), no. 11, 1190-1201. [PDF]

Theodoros Rekatsinas, Manas Joglekar, Hector Garcia-Molina, Aditya Parameswaran, and Christopher Ré, SLiMFast: Guaranteed results for data fusion and source reliability, SIGMOD 2017. [PDF]

Ihab F. Ilyas and Xu Chu, Trends in Cleaning Relational Data: Consistency and Deduplication, Foundations and Trends in Databases 2015. [PDF]
