Big Data Fest: Data, tools & practice


Genoveva Vargas Solar, Senior Scientist, CNRS
Regina Motz, Prof. Universidad de la República
Javier Alfonso Espinosa Oviedo, Postdoctoral fellow, LAFMIA
Technical support

Plácido Antonio de Souza Neto, Postdoctoral fellow, LIG
Juan Carlos Castrejón, PhD student, University of Grenoble, LIG-LAFMIA



1. Big Data Challenges [pdf]

1.1. Volume: Huge data collections

1.2. Velocity: Continuous on-line data streams

1.3. Variety: Big data models

2. Applications and tools

2.1. Data replication and sharding [pdf]

2.1.1. NoSQL systems: experience with CouchDB [pdf], CouchWrapup [pdf]

2.2. Life cycle I: data sanitization experience with Pig [pdf]

2.3. Life cycle II: data gathering techniques (Web scraping, data services, crowdsourcing) [pdf]

2.3.1. Open data / data journalism [see examples in the project definition]

3. Big Data Processing Platforms

3.1. Parallel processing for analytics: Hadoop platforms [pdf]

3.2. Some elements of data analytics [pdf]

3.3. Big Data Management Systems [see slides section 1 & the references section]

4. Big and smart data applications: examples

4.1. Elections [pdf]

4.2. Other applications [pdf]


  • NoSQL data stores: expressing queries using MapReduce
  1. Downloading CouchDB:
    1. Building a document database: using CouchDB [Ex-1] [Ex1-answers]
    2. Querying a document database [Ex-2] [answers on explicit demand]
  •  Data sanitation with Pig
  1. Installing Pig
    1. Hortonworks [pdf]
    2. Testing your installation: [data] [PigScript]
  2. Dealing with network behavior data collections [pdf] [data: distributed in class, ask for it!]
  •  Data analytics with Hadoop
  1. Environment: Hadoop on Hortonworks
  2. Counting words and other summarization challenges [AllData]
    1. Counting words: first approach [pdf] [WordCount Example]
    2. Counting with some optimizations using combiners: understanding some principles of the MapReduce model [pdf] [MapReduce-book-final] [code examples]
  3. Some interesting MapReduce patterns: see the challenges section [patterns reference]
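
As a warm-up for the word count exercises, here is a minimal sketch in plain Python of how a combiner reduces the number of pairs sent to the reducers. It does not use Hadoop: the function names (map_words, combine, shuffle, reduce_counts, word_count) and the idea of passing each mapper's input as a list of lines are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def map_words(line):
    """Map phase: emit a (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Combiner: pre-aggregate the counts produced by a single mapper."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def shuffle(all_pairs):
    """Shuffle phase: group the values of every key across all mappers."""
    groups = defaultdict(list)
    for word, count in all_pairs:
        groups[word].append(count)
    return groups


def reduce_counts(word, counts):
    """Reduce phase: sum the partial counts of one word."""
    return word, sum(counts)


def word_count(splits, use_combiner=True):
    """Run the simulated job; each split is the list of lines of one mapper.

    Returns the final counts and the number of pairs sent to the shuffle.
    """
    shipped = []
    for split in splits:
        pairs = [pair for line in split for pair in map_words(line)]
        if use_combiner:
            pairs = combine(pairs)
        shipped.extend(pairs)
    groups = shuffle(shipped)
    counts = dict(reduce_counts(word, values) for word, values in groups.items())
    return counts, len(shipped)


if __name__ == "__main__":
    splits = [["big data big"], ["data tools"]]
    print(word_count(splits, use_combiner=False))
    print(word_count(splits, use_combiner=True))
```

Both runs return the same counts. Only the number of pairs that cross the shuffle changes, which is the point of the optimization: the combiner works here because addition is associative and commutative.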


Challenge: “a test of one’s abilities or resources in a demanding but stimulating undertaking”, The free English dictionary

  • CH-1: Polyglot meets Xperanto [here]
  • CH-2: More intensive summarization: choose one of the following
    1. Median and standard deviation [pdf]
    2. Inverted index summarizations [pdf]
  • CH-3: Filtering patterns: choose one of the following
    1. Bloom filter [pdf]
    2. Top ten [pdf]
  • CH-4: Join patterns: choose two of the following
    1. Reduce-side join, classic and with Bloom filter [pdf]
    2. Replicated join [pdf]
    3. Composite join [pdf]
    4. Cartesian product [pdf]
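
To give an idea of what the Bloom filtering challenge involves, here is a minimal sketch in plain Python, independent of Hadoop. The BloomFilter class, its default sizes, the SHA-256 based hashing and the filter_records helper are all assumptions chosen for illustration; the challenge handout may use a different construction.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch readable (not memory-optimal).
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        """Derive num_hashes bit positions by salting a SHA-256 digest."""
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        """Set every bit position associated with the item."""
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos] for pos in self._positions(item))


def filter_records(records, bloom, key):
    """Map-only filtering: keep records whose key may be in the filter."""
    return [record for record in records if bloom.might_contain(record[key])]


if __name__ == "__main__":
    # Hypothetical data: keep only the records of a few users of interest.
    hot_users = BloomFilter()
    for user in ("alice", "bob"):
        hot_users.add(user)
    records = [
        {"user": "alice", "bytes": 120},
        {"user": "carol", "bytes": 300},
        {"user": "bob", "bytes": 45},
    ]
    print(filter_records(records, hot_users, "user"))
```

Records of users added to the filter are always kept. A record of another user is usually dropped but may slip through as a false positive, so a job that needs exact results has to re-check the survivors, for instance on the reduce side of a join.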
