Survey of Apache Big Data Stack
Survey of Apache Big Data Stack. Supun Kamburugamuve. For the PhD Qualifying Exam, 12/16/2013. Advisory Committee: Prof. Geoffrey Fox, Prof. David Leake, Prof. Judy Qiu. 1. Introduction: Over the last decade there has been...
Operational Database Management Systems
BigDataBench As a multi-discipline research effort, BigDataBench is an open-source big data benchmark suite. The current version is BigDataBench 3.0. It includes 6 real-world and 2 synthetic data sets, and 32 big data workloads, covering micro...
PoliTwi: Early Detection of Emerging Political Topics on Twitter and the Impact on Concept-Level Sentiment Analysis. Sven Rill, Dirk Reinel, Jörg Scheidt, Institute of Information Systems, University of Applied Sciences Hof, Alfons-Goppel-Platz 1, Hof, Germany...
MapReduce-MPI Library MapReduce-MPI (MR-MPI) library is an open-source implementation of MapReduce written for distributed-memory parallel machines on top of standard MPI message passing. The MR-MPI library was developed at Sandia National Laboratories, a US...
AsterixDB Big Data Management System (BDMS) Overview (last updated October 2014). The AsterixDB BDMS is the result of over four years of R&D involving researchers at UC Irvine, UC Riverside, and UC San Diego....
Fine-grained Partitioning for Aggressive Data Skipping Modern query engines are increasingly being required to process enormous datasets in near real-time. While much can be done to speed up the data access, a promising technique...
A Sample-and-Clean Framework for Fast and Accurate Query Processing on Dirty Data In emerging Big Data scenarios, obtaining timely, high-quality answers to aggregate queries is difficult due to the challenges of processing and cleaning...
BIG DATA: SEIZING OPPORTUNITIES, PRESERVING VALUES. Executive Office of the President, The White House, Washington, May 2014. May 1, 2014. DEAR MR. PRESIDENT: We are living in the midst of a social, economic,...
SQL-on-Hadoop without compromise IBM Software Group Thought Leadership White Paper How Big SQL 3.0 from IBM represents an important leap forward for speed, portability and robust functionality in SQL-on-Hadoop solutions By Scott C. Gray, Fatma...
Isis2: A new Open Platform for Data Replication in the Cloud. by Ken Birman, N. Rama Rao Professor of Computer Science at Cornell. “My target is to be the MapReduce solution for the world’s...