Multi-Model Database: The New Normal
by John Biedebach, MarkLogic.
May 17, 2016
I distinctly remember the first Oracle admin class I took. It was in the late 90s and it was Oracle 7.3.4, aka “the first version that really worked.” I logged in with id Scott and pw Tiger, selected * from the emp table, and I was hooked. Harry, Sally and George sprang to life and I enjoyed figuring out who worked in Finance and who worked in Sales.
The other thing I liked about it was that it made sense and I was good at it. That was not true for everyone. Later I worked for Informatica when they were developing the analytic applications. I still have the paper copies of the data models. Just like I have held onto those data models, our industry has been holding on to some things that may be outdated.
Trying to Work Around the Limitations of the RDBMS
In fact, we have created an entire economy to compensate for the limits of relational technology: Indexes too slow? Let’s put them in memory. Joins too resource intensive? Let’s send the joins down to the storage layer and spread them out. Partitioning, engineered systems, ETL, abstraction layers, etc. were all designed to counteract shortfalls in our approach to working with data.
When Al Gore invented the Internet,* he introduced a variety, volume and velocity of data that had not been seen before. Relational DBs actually do OK with velocity, but variety is a killer. The notion of a schema requires a fixed structure — and every change in the source data requires an adjustment or a compromise.
Likewise, normalizing the data to drive out redundancy was important when compute and storage were expensive, but it also forced unnatural behavior. Say, for example, you walk into a store and buy a few items. Somewhere in the ether, a relational data modeler takes that transaction and breaks it into a bunch of pieces: customer goes over here, products go over there, location in another place. For 30 years this has been the standard, and suggesting any other way is blasphemy! But a new normal is emerging.
For most of our lives structured data has dominated the landscape. The information that defined the digital you was contained in tables and rows. But just like the real you is more than just a collection of bits and bytes, the digital you now is as well. Social media, blog posts, video, pictures all exist online as a reflection of who you are.
Think about how many pictures you have on your phone. I have over a thousand. I am not sure I took a thousand pictures total in my first 40 years and yet now that I have a camera on my phone, there are a thousand new images that capture the significant to the mundane aspects of my life.
A rows-and-columns database will no longer do it. We need a multi-model database, one that can capture structured data alongside all of the other information that goes with it. Did you purchase something? What about linking your review (unstructured data) to your purchase on Amazon? Did you post your new purchase to Facebook (more unstructured data)? Let’s capture that too, so that we can build a (truly) complete picture of the transaction.
Even for purely structured data, a multi-model document database makes sense. In our store example, the database designers had to split out the line items from the rest of the purchase information because they could not predict how many items each person would buy. The transaction itself would go into a header table, while each product in the transaction would become its own row in a transaction detail table.
Using a NoSQL approach, the transaction could be captured as a whole, in a JSON or XML document. The beauty of the document approach is that we wouldn’t have to predict the length in advance. We could take things as they came, not bothered by variety or volume.
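To make the contrast concrete, here is a minimal sketch in Python of the same purchase in both shapes. The field names (txn_id, sku, qty and so on) and the to_document helper are made up for illustration; they are not taken from any particular product or schema.

```python
import json

# Relational shape: one header row plus one detail row per item purchased.
# All field names here are hypothetical.
header_row = {"txn_id": 1001, "customer_id": 42, "store_id": 7}
detail_rows = [
    {"txn_id": 1001, "line_no": 1, "sku": "MILK-1L", "qty": 2, "unit_price": 1.50},
    {"txn_id": 1001, "line_no": 2, "sku": "BREAD", "qty": 1, "unit_price": 2.25},
]


def to_document(header, details):
    """Fold a header row and its matching detail rows into one nested document."""
    items = [
        {key: value for key, value in row.items() if key != "txn_id"}
        for row in details
        if row["txn_id"] == header["txn_id"]
    ]
    return {**header, "items": items}


# Document shape: the whole transaction travels together, and the items
# list can be any length without a second table.
doc = to_document(header_row, detail_rows)
total = sum(item["qty"] * item["unit_price"] for item in doc["items"])

print(json.dumps(doc, indent=2))
print("total:", total)
```

The items list simply grows with the basket, which is the point of the paragraph above: nothing about the document has to be sized or split in advance.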
This is now how some of the world’s largest applications run, because the old way imposed too many restrictions. The benefit of the multi-model database is time to value. I am not saying NoSQL applications do not have to be modeled; they just tend to have simpler models because they have a more elegant way of dealing with exceptions. Each document (or record) tells us what is inside, so knowing the structure of the data up front is not as necessary.
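As a rough illustration of documents that "tell us what is inside," the Python sketch below keeps a purchase, a review and a second purchase with an extra field in the same collection. The documents, their field names and the two helper functions are assumptions invented for this example, not the API of any database.

```python
# A small in-memory "collection" of documents with different shapes.
# Every field name below is hypothetical.
documents = [
    {"type": "purchase", "txn_id": 1001, "items": [{"sku": "MILK-1L", "qty": 2}]},
    {"type": "review", "txn_id": 1001, "rating": 4, "text": "Fresh and cold."},
    {"type": "purchase", "txn_id": 1002, "items": [], "coupon": "SPRING10"},
]


def describe(doc):
    """Return the sorted field names a document carries."""
    return sorted(doc.keys())


def related_to(txn_id, docs):
    """Return every document, of any shape, that mentions the transaction."""
    return [d for d in docs if d.get("txn_id") == txn_id]


# Each document reports its own structure; no shared schema was declared.
shapes = {tuple(describe(d)) for d in documents}

# The review sits next to the purchase it refers to.
linked = related_to(1001, documents)

for d in documents:
    print(d["type"], describe(d))
```

The exception (the coupon field on one purchase) needed no schema change; it is just one more key on the document that carries it.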
If you are reading this you might be a Big Data convert, or perhaps you are a skeptic, or better yet, you are a seeker. Regardless, I am not looking to convert you. A lot of the stuff I read on Big Data is about tactics: which product, which approach, which technology stack. But in order to make real progress, we need to have higher-level conversations. Architects and technology executives need to be able to identify good candidates for NoSQL, and business executives need to understand the real economic benefits.
The world around us has changed. It is time we looked at a next-generation database to go with it.
*Disclaimer: Al Gore did not really invent the Internet.
Sponsored by MarkLogic