Positioning CortexDB in Hadoop Data Architectures

Extending the Data Warehouse and beyond

An analysis by the WOLFGANG MARTIN TEAM, October 2014.

© 2014 S.A.R.L. Martin

Contents

1. Management Summary

2. Big Data Challenges to Enterprise Data Management

3. Hadoop and the Data Warehouse

4. Hadoop and NoSQL Technologies

5. CortexDB – a Complementary NoSQL Technology on top of Hadoop

6. Appendix

1. Management Summary

Until now, data architectures for structured data have consisted of two parts: an architecture for operational (transactional) data and an architecture for analytical data, plus the data management processes linking the two. Both architectures were built on relational database technologies. In addition, organizations used content management systems to manage unstructured data.

Big Data is changing this traditional world completely. Big Data has to be managed, and this is where the Apache open source solution Hadoop plays a major role. Hadoop is designed to store and analyze both structured and unstructured data (“data at rest”). It can also manage data streams (“data in motion”). Because new analytical workloads can run on Hadoop, this new platform needs to be integrated into traditional data architectures.

Big Data is driving a second movement in data management: the emerging NoSQL database technologies. Their success rests on a simple fact: many of the new analytics and information processing challenges in Big Data analytics can no longer be addressed by traditional relational database technologies.

In traditional data architectures, the data warehouse is still considered the single point of truth, i.e. the central platform for all BI systems. But Big Data challenges make Hadoop the new platform needed to support today’s analytical workloads that go beyond those a data warehouse supports. In addition, more and more BI vendors provide direct access to Hadoop, so that traditional BI tools can now query it via SQL on Hadoop. Will the data warehouse and its relational technology become obsolete and be replaced by Hadoop?

The marketplace has not yet found a definitive answer to this question. In this research note, we discuss the issue and conclude that the data warehouse will remain the center and single point of truth for tactical and strategic performance management, at least for the foreseeable future. But ETL processing and some analytical workloads will increasingly move to Hadoop, particularly if they involve analyzing semi-structured and unstructured data.

Furthermore, NoSQL databases are becoming an increasingly important part of this new database landscape. They complement Hadoop and offer various advantages over relational database technologies. Hadoop complemented by NoSQL technologies offers the potential to handle Big Data analytics and to develop innovative new apps that were not possible with relational technology.

CortexDB is one of the NoSQL database technologies that complement Hadoop when sitting on top of it. In this research note, we take a closer look at CortexDB and discuss what makes it stand out: CortexDB is a temporal, multi-model NoSQL database technology that differs from all known databases in its index structure and its content orientation. We will demonstrate how, and we will show that networked systems, data dependencies, complex configurations, very large data volumes, and constant change can be handled well by the schema-less, multi-model CortexDB. Requirements like these cannot be managed by Hadoop alone. In other words, CortexDB is the solution for managing large quantities of complex poly-structured data in Hadoop data architectures.

 

DOWNLOAD FULL REPORT (.PDF): Cortex_ResearchNote_2014
