Big Data Fest: Data, tools & practice

by Roberto Zicari · December 5, 2014


by
Genoveva Vargas Solar, Senior Scientist, CNRS
Genoveva.Vargas@imag.fr
Regina Motz, Prof. Universidad de la República
regina.motz@gmail.com
Javier Alfonso Espinosa Oviedo, Postdoctoral fellow, LAFMIA
javiera.espinosa@imag.fr
Technical support

Plácido Antonio de Souza Neto, Postdoctoral fellow, LIG
Juan Carlos Castrejón, PhD student, University of Grenoble, LIG-LAFMIA

2014

CONTENT

1. Big Data Challenges [PDF]

1.1. Volume: Huge data collections

1.2. Velocity: Continuous on-line data streams

1.3. Variety: Big data models

2. Applications and tools

2.1. Data replication and sharding [pdf]

2.1.1 NoSQL Systems: experience with CouchDB [pdf], CouchWrapup [pdf]

2.2 Life cycle I: data sanitizing, an experience with Pig [pdf]

2.3 Life cycle II: data gathering techniques: Web scraping, data services, crowdsourcing [pdf]

2.3.1. Open data / data journalism [see examples in the project definition]

3. Big Data Processing Platforms

3.1. Parallel processing for analytics: Hadoop platforms [pdf]

3.2 Some elements of data analytics [pdf]

3.3. Big Data Management Systems [see slides section 1 & the references section]

4. Big and smart data applications: examples

4.1 Elections [pdf]

4.2 Other applications [pdf]
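Item 2.1 names data replication and sharding. As a rough illustration only (not taken from the course slides), the Python sketch below places each key on a shard with hash sharding and keeps extra copies on the following shards in sequence. The function names `shard_for` and `placement` and the successor-replica scheme are hypothetical choices; real NoSQL stores use their own, usually more elaborate, placement schemes.

```python
import hashlib

def shard_for(key, num_shards):
    """Hash sharding: a stable hash of the key selects the shard."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def placement(key, num_shards, replicas=2):
    """Primary shard plus (replicas - 1) copies on the next shards (hypothetical scheme)."""
    primary = shard_for(key, num_shards)
    # Never place more copies than there are shards.
    return [(primary + i) % num_shards for i in range(min(replicas, num_shards))]

for key in ["user:1", "user:2", "user:42"]:
    print(key, placement(key, num_shards=4, replicas=2))
```

`hashlib` is used instead of Python's built-in `hash()`, because `hash()` of a string is randomized per process and would send the same key to different shards across runs.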

HANDS ON

NoSQL data stores: expressing queries using MapReduce

Downloading CouchDB: http://couchdb.apache.org
1. Building a document database: using CouchDB [Ex-1] [Ex1-answers]
2. Querying a document database [Ex-2] [answers on request]
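CouchDB queries are written as views: a map function (JavaScript in CouchDB) that is called on every document and emits key/value pairs, plus an optional reduce function over the emitted values. The Python sketch below only emulates that model in memory to show what a view computes; it does not contact a CouchDB server, and the document fields (`type`, `author`) and helper names are hypothetical, not those used in the exercises.

```python
from collections import defaultdict

# Hypothetical documents; in CouchDB each would be a JSON document.
docs = [
    {"_id": "1", "type": "post", "author": "ana"},
    {"_id": "2", "type": "post", "author": "luis"},
    {"_id": "3", "type": "comment", "author": "ana"},
    {"_id": "4", "type": "post", "author": "ana"},
]

def map_fn(doc):
    """Like a CouchDB map function: emit (author, 1) for every post."""
    if doc.get("type") == "post":
        yield doc["author"], 1

def reduce_fn(keys, values, rereduce):
    """Like a sum reduce (CouchDB also ships a built-in _sum).
    Simplified: CouchDB passes [key, doc_id] pairs as keys; unused here."""
    return sum(values)

def query_view(docs, map_fn, reduce_fn=None, group=True):
    # Map phase: run map_fn on every document; rows are sorted by key.
    rows = sorted(kv for doc in docs for kv in map_fn(doc))
    if reduce_fn is None:
        return rows
    if not group:
        # Reduce over all rows at once, like querying the view without group=true.
        return reduce_fn([k for k, _ in rows], [v for _, v in rows], False)
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: reduce_fn([key] * len(vals), vals, False) for key, vals in groups.items()}

print(query_view(docs, map_fn))             # [('ana', 1), ('ana', 1), ('luis', 1)]
print(query_view(docs, map_fn, reduce_fn))  # {'ana': 2, 'luis': 1}
```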

Data sanitation with Pig

Installing Pig
1. Hortonworks [pdf]
2. Testing your installation: [data] [PigScript]
Dealing with network behavior data collections [pdf] [data: distributed in class, ask for it!]
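The Pig exercises revolve around cleaning raw records before analysis. A minimal Python sketch of the same steps follows, assuming a hypothetical tab-separated log with timestamp, source IP and byte-count fields (the actual class data may look different). In Pig Latin these steps would roughly correspond to `LOAD ... USING PigStorage('\t')`, `FILTER`, `FOREACH ... GENERATE` with a cast, and `DISTINCT`.

```python
# Hypothetical raw network log: timestamp, source IP, bytes (tab-separated).
RAW = "\n".join([
    "2014-11-02T10:00:01\t10.0.0.1\t512",
    "2014-11-02T10:00:02\t 10.0.0.2 \t1024",
    "2014-11-02T10:00:02\t 10.0.0.2 \t1024",   # exact duplicate after trimming
    "2014-11-02T10:00:03\t\t300",              # missing source IP
    "2014-11-02T10:00:04\t10.0.0.3\tn/a",      # non-numeric byte count
    "broken line",                             # wrong number of fields
])

def sanitize(text):
    """Drop malformed rows, trim fields, cast bytes to int, remove duplicates."""
    seen = set()
    clean = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:                         # wrong arity: drop
            continue
        ts, ip, nbytes = (f.strip() for f in fields) # trim stray whitespace
        if not ip or not nbytes.isdigit():           # missing IP or bad number
            continue
        record = (ts, ip, int(nbytes))               # cast, like (int) in GENERATE
        if record not in seen:                       # like DISTINCT
            seen.add(record)
            clean.append(record)
    return clean

for row in sanitize(RAW):
    print(row)
```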

Data analytics with Hadoop

Environment: Hadoop on Hortonworks
Counting words and other summarization challenges [AllData]
1. Counting words: first approach [pdf] [WordCount Example]
2. Counting with some optimizations using combiners: understanding some principles of the MapReduce model [pdf] [MapReduce-book-final] [code examples]
Some interesting MapReduce patterns: see the challenges section [patterns reference]
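The word-count exercises contrast a plain MapReduce job with one that uses a combiner. The Python sketch below simulates both in a single process (no Hadoop involved; function names are illustrative) and counts how many intermediate pairs each variant would send through the shuffle. The combiner is safe here because summing is associative and commutative, so partial sums can be merged in any order.

```python
import re
from collections import Counter, defaultdict

def mapper(line):
    """Map: emit (word, 1) for every word in the line."""
    for word in re.findall(r"[a-z']+", line.lower()):
        yield word, 1

def combiner(pairs):
    """Combine: pre-aggregate one map task's output locally, before the shuffle."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())

def reducer(word, counts):
    """Reduce: sum all partial counts for one word."""
    return word, sum(counts)

def word_count(splits, use_combiner=True):
    shuffled = defaultdict(list)   # simulated shuffle: values grouped by key
    emitted = 0                    # pairs that would cross the network
    for split in splits:           # one split per simulated map task
        pairs = [kv for line in split for kv in mapper(line)]
        if use_combiner:
            pairs = combiner(pairs)
        emitted += len(pairs)
        for word, count in pairs:
            shuffled[word].append(count)
    result = dict(reducer(w, c) for w, c in sorted(shuffled.items()))
    return result, emitted

splits = [["the cat sat", "the cat ran"], ["the dog sat"]]
print(*word_count(splits, use_combiner=False))  # {'cat': 2, 'dog': 1, 'ran': 1, 'sat': 2, 'the': 3} 9
print(*word_count(splits, use_combiner=True))   # {'cat': 2, 'dog': 1, 'ran': 1, 'sat': 2, 'the': 3} 7
```

The counts are identical either way; the combiner only reduces the shuffled pairs, here from 9 to 7.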

CHALLENGES

challenge: “a test of one’s abilities or resources in a demanding but stimulating undertaking”, The Free English Dictionary

CH-1: Polyglot meets Xperanto [here]
CH-2: More intensive summarization: choose one of the following
1. Median and standard deviation [pdf]
2. Inverted index summarizations [pdf]
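Both CH-2 options can be prototyped without a cluster. The sketch below simulates the shuffle with a dictionary; the input shapes (hypothetical `(hour, comment_length)` pairs and a `{doc_id: text}` mapping) are assumptions for illustration. Unlike counting, a median cannot be assembled from per-mapper medians, so this version sends every value to the reducer; `statistics.pstdev` gives the population standard deviation.

```python
import statistics
from collections import defaultdict

def group_by_key(pairs):
    """Simulated shuffle: collect every value emitted for a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def median_stddev(records):
    """records: (key, value) pairs, i.e. the map output.
    The reducer needs all values of a key, so no sum-style combiner applies."""
    groups = group_by_key(records)
    return {key: (statistics.median(vals), statistics.pstdev(vals))
            for key, vals in sorted(groups.items())}

def inverted_index(docs):
    """Map emits (word, doc_id); reduce keeps the sorted set of doc ids."""
    pairs = ((word, doc_id) for doc_id, text in docs.items()
             for word in set(text.lower().split()))
    return {word: sorted(set(ids)) for word, ids in sorted(group_by_key(pairs).items())}

print(median_stddev([(10, 4), (10, 8), (10, 6), (11, 5)]))
print(inverted_index({"d1": "big data tools", "d2": "Big data practice"}))
```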
CH-3: Filtering patterns: choose one of the following
1. Bloom filter [pdf]
2. Top ten [pdf]
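For CH-3, a compact Python sketch of both filtering patterns follows. The Bloom filter derives its k bit positions from one SHA-256 digest by double hashing, an implementation choice of this sketch rather than necessarily the one in the referenced material; `might_contain` can return false positives but never false negatives. The top-ten function mimics the usual structure: each mapper keeps its local top n and a single reducer merges those short lists. All names are hypothetical.

```python
import hashlib
import heapq

class BloomFilter:
    """Bit array of m bits; each item sets k positions (double hashing)."""
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd: k positions differ when m is a power of two
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item):
        # False means definitely absent; True may be a false positive.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

def bloom_filter_map(records, bloom):
    """Filtering pattern: keep records whose key may be in the filter."""
    return [r for r in records if bloom.might_contain(r[0])]

def top_n(splits, n=10):
    """Top-ten pattern: local top n per mapper, then one reducer merges them."""
    local = [heapq.nlargest(n, split, key=lambda r: r[1]) for split in splits]
    return heapq.nlargest(n, (r for part in local for r in part), key=lambda r: r[1])

bloom = BloomFilter()
for user in ["alice", "bob"]:
    bloom.add(user)
print(bloom_filter_map([("alice", 1), ("bob", 2)], bloom))
print(top_n([[("a", 5), ("b", 9), ("c", 1)], [("d", 7), ("e", 3)]], n=3))
```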
CH-4: Join patterns: choose two of the following
1. Reduce-side join, classic and with a Bloom filter [pdf]
2. Replicated join [pdf]
3. Composite join [pdf]
4. Cartesian product [pdf]
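Three of the CH-4 join strategies reduce to a few lines when simulated in memory. In the sketch below, the two datasets and the function names are made up for illustration. The reduce-side join groups both inputs by key (as a shuffle would) and pairs records sharing a key, giving an inner join; the replicated join assumes the small input fits in every mapper's memory and needs no reduce phase; the Cartesian product pairs every record with every other. A composite join, which relies on inputs already sorted and partitioned identically, is not shown.

```python
from collections import defaultdict
from itertools import product

# Hypothetical datasets: users(user_id, name) and comments(user_id, text).
users = [(1, "ana"), (2, "luis"), (3, "mei")]
comments = [(1, "great"), (1, "thanks"), (2, "nice"), (4, "orphan")]

def reduce_side_join(left, right):
    """Map tags records by source; reduce pairs records sharing a key (inner join)."""
    groups = defaultdict(lambda: {"L": [], "R": []})
    for key, value in left:
        groups[key]["L"].append(value)   # map output: (key, ("L", value))
    for key, value in right:
        groups[key]["R"].append(value)   # map output: (key, ("R", value))
    return [(key, l, r) for key in sorted(groups)
            for l, r in product(groups[key]["L"], groups[key]["R"])]

def replicated_join(small, large):
    """Map-only join: the small dataset is held in memory by every mapper."""
    lookup = defaultdict(list)
    for key, value in small:
        lookup[key].append(value)
    return [(key, s, v) for key, v in large for s in lookup.get(key, [])]

def cartesian_product(left, right):
    """Every record of one input paired with every record of the other."""
    return list(product(left, right))

print(reduce_side_join(users, comments))
print(replicated_join(users, comments))
```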
