Time-series R package
Aaron Benz, Data Scientist at Accenture, publicly released a time-series R package in January 2015. It includes a tutorial/walkthrough of why some might use it and what it offers (being able to plot time-series data –...
Operational Database Management Systems
BIG is an archive format that was designed to store millions of files. The format is quite simple. For each archive, two files are used. One file is where we store all the binary...
Mr.LDA is a package for flexible, scalable, multilingual topic modeling using variational inference in MapReduce. Latent Dirichlet Allocation (LDA) and related topic modeling techniques are useful for exploring document collections. Because of the increasing...
BG is a benchmark to evaluate the performance of a data store for interactive social networking actions and sessions. These actions and sessions either read or update a very small amount of the entire data set....
PoliTwi: Early Detection of Emerging Political Topics on Twitter and the Impact on Concept-Level Sentiment Analysis
PoliTwi is an online service that detects emerging political topics (Top Topics) on Twitter sooner than other standard information...
BigDataBench
As a multi-discipline research effort, BigDataBench is an open-source big data benchmark suite. The current version is BigDataBench 3.0. It includes 6 real-world and 2 synthetic data sets, and 32 big data workloads, covering micro...
MapReduce-MPI Library
The MapReduce-MPI (MR-MPI) library is an open-source implementation of MapReduce written for distributed-memory parallel machines on top of standard MPI message passing. The MR-MPI library was developed at Sandia National Laboratories, a US...
AsterixDB Big Data Management System (BDMS) Overview (last updated October 2014)
The AsterixDB BDMS is the result of over four years of R&D involving researchers at UC Irvine, UC Riverside, and UC San Diego....
Isis2: A New Open Platform for Data Replication in the Cloud, by Ken Birman, N. Rama Rao Professor of Computer Science at Cornell. “My target is to be the MapReduce solution for the world’s...
Statistical Workload Injector for MapReduce (SWIM)
Yanpei Chen, Sara Alspaugh, Archana Ganapathi, Rean Griffith, Randy Katz
MapReduce systems face enormous challenges due to increasing growth, diversity, and consolidation of the data and computation involved....