Cloud9 is a collection of Hadoop tools that tries to make working with big data a bit easier.
This software was designed with two goals in mind: first, to serve as a teaching tool for MapReduce and MapReduce algorithm design; second, to provide a collection of useful tools on which to build other “big data” systems. Here are just a few of its features:
- API for working with various text collections, including Wikipedia, TREC document collections for information retrieval research, and the ClueWeb09 web crawl.
- Reference implementations of a few common MapReduce algorithms, including PageRank, breadth-first search, and co-occurrence matrix computation.
- Implementations of various useful Hadoop data types.
- Efficient implementations of Java maps for primitive types, along with integration with fastutil.
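
To give a feel for the kind of algorithm listed above, here is a minimal sketch of co-occurrence counting written in plain Java. It is not Cloud9's implementation and uses no Hadoop or Cloud9 APIs; the class name `CooccurrenceSketch`, the method `count`, and the `window` parameter are hypothetical names chosen for illustration. The inner loop plays the role of a mapper emitting (word pair, 1) records, and `Map.merge` stands in for the reducer that sums them.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CooccurrenceSketch {
    // Counts ordered word pairs that occur within `window` tokens of each other.
    // Assumption: tokens are separated by whitespace; no other normalization is done.
    public static Map<String, Integer> count(List<String> sentences, int window) {
        // TreeMap keeps the keys sorted so the printed result is deterministic.
        Map<String, Integer> counts = new TreeMap<>();
        for (String sentence : sentences) {
            String[] tokens = sentence.trim().split("\\s+");
            for (int i = 0; i < tokens.length; i++) {
                int lo = Math.max(0, i - window);
                int hi = Math.min(tokens.length - 1, i + window);
                for (int j = lo; j <= hi; j++) {
                    if (j == i) {
                        continue; // a token does not co-occur with itself
                    }
                    // "Map" emits (pair, 1); "reduce" sums the counts per pair.
                    counts.merge(tokens[i] + " " + tokens[j], 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("a b c", "a b");
        // prints {a b=2, b a=2, b c=1, c b=1}
        System.out.println(count(docs, 1));
    }
}
```

In a real MapReduce job the pairs would be emitted by mappers running on separate input splits and summed by reducers after the shuffle; the single in-memory map here only illustrates the logic.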