Data-Intensive Computing with MapReduce

Time: Thursday, 6:00-8:45pm
Location: HBK 2119
Instructor: Jimmy Lin 

Our world is being revolutionized by “big data”: petabyte-scale data stores are popping up everywhere, opening up exciting opportunities for computing applications and scientific discovery. Data-intensive computing requires programming models that allow us to easily distribute computations across large clusters, and this is where MapReduce comes in. MapReduce, especially the Hadoop open-source implementation, has recently emerged as a popular framework for data-intensive computing. Its advantages include the ability to scale horizontally to petabytes of data on thousands of commodity servers, easy-to-understand programming semantics, and a high degree of fault tolerance. Hadoop lies at the core of an application stack that is gaining widespread adoption in both industry and academia.
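To give a flavor of those programming semantics, here is a minimal sketch of the canonical word count example written against Hadoop's Java MapReduce API (this is an illustration only, not part of the course material; the class and method names other than the Hadoop library types are hypothetical). The mapper emits a count of one for every word it sees; the framework groups the counts by word, and the reducer sums them.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

  // Map phase: for every input line, emit (word, 1) for each token.
  public static class TokenizerMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: the framework delivers all counts for one word to a
  // single reduce call; summing them yields the word's total frequency.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }
}

The programmer writes only these two short functions; partitioning the input, distributing work across the cluster, grouping intermediate keys, and recovering from machine failures are all handled by the framework.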

This course will provide an introduction to Hadoop, focusing specifically on algorithm design and “thinking at scale”, applied to a variety of domains: text, graphs, relational data, etc. We will also cover other components in the Hadoop ecosystem and alternative programming models.

Link to Course Material (on GitHub)

Link to slides (on GitHub)