Models for Big Data
This paper explores data models used for Big Data processing and argues that the preferred technology is one that can move flexibly between models.
Table of Contents
Models for Big Data
Structured Data
Text (and HTML)
Semi-Structured Data
Bridging the Gap – The Key-Value Pair
XML – Structured Text?
RDF
Data Model Summary
Data Abstraction – An Alternative Approach
Structured Data
Text
Semi-Structured Data
Key-Value Pairs
XML
RDF
Model Flexibility in Practice
Conclusion
Models for Big Data
The principal performance driver of a Big Data application is the data model in which the Big Data resides. Unfortunately, most extant Big Data tools impose a data model upon a problem and thereby cripple their performance in some applications¹. The aim of this paper is to discuss some of the principal data models that exist and are imposed, and then to argue that an industrial-strength Big Data solution needs to be able to move between these models with a minimum of effort.
As each data model is discussed, various products that focus upon that data model will be described, and generalized pros and cons will be detailed. It should be understood that many commercial products, when utilized fully, have tricks, features and tweaks designed to mitigate some of the worst of the cons. Notwithstanding, this paper will attempt to show that those embellishments are a weak substitute for basing the application upon the correct data model.