Cloud9 is a collection of Hadoop tools that tries to make working with big data a bit easier.

This software was designed with two goals in mind: First, to serve as a teaching tool for MapReduce and MapReduce algorithm design. Second, to provide a collection of useful tools on which to build other “big data” systems. Here are just a few features:

  • API for working with various text collections, including Wikipedia, TREC document collections for information retrieval research, and the ClueWeb09 web crawl.
  • Reference implementations of a few common MapReduce algorithms, including PageRank, breadth-first search, and co-occurrence matrix computation.
  • Implementations of various useful Hadoop data types.
  • Efficient implementations of Java maps for primitive types, along with fastutil integration.
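
As a rough illustration of the co-occurrence matrix computation mentioned in the feature list, here is a minimal plain-Java sketch of the idea: emit a (term, neighbor) pair for every neighbor within a window, then sum the counts per pair. It runs in memory and uses neither Hadoop nor Cloud9. The class name CooccurrenceSketch, the method countPairs, and the window parameter are all hypothetical names chosen for this example, not part of the library's API.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; NOT part of Cloud9's API.
class CooccurrenceSketch {

    // Counts how often each ordered (term, neighbor) pair occurs within
    // `window` positions of each other on the same line.
    // The inner loops play the role of a mapper emitting pairs; the
    // merge() call plays the role of a reducer summing counts per key.
    static Map<String, Integer> countPairs(List<String> lines, int window) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] terms = trimmed.split("\\s+");
            for (int i = 0; i < terms.length; i++) {
                int lo = Math.max(0, i - window);
                int hi = Math.min(terms.length - 1, i + window);
                for (int j = lo; j <= hi; j++) {
                    if (j == i) {
                        continue; // a term is not its own neighbor
                    }
                    // Key is "term<TAB>neighbor"; the tab separator is an
                    // assumption made for this sketch.
                    String key = terms[i] + "\t" + terms[j];
                    counts.merge(key, 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

Calling CooccurrenceSketch.countPairs(List.of("a b c", "a b"), 1) returns a sorted map keyed by tab-separated pairs. In a real MapReduce job the pair emission and the summation would run in separate map and reduce phases over distributed input; this sketch only shows the counting logic.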

