Personally, I felt the confusion undermined the category as a whole, because it obscured the premise that a single database could indeed ingest multiple data models, with schemas of varying structures and organizational strategies. Under the “NoSQL” moniker, a database that could do this was lumped in with everything from the latest cutely named data store to the contents of my basement and everyone’s green data swamp.
We certainly struggled with it here. MarkLogic had been the darling of media companies and national security organizations for a dozen years before analysts acknowledged it. But, oh, the confusion. It can’t be transactional. Well, yes, it can. It can’t handle SQL. Actually, it can.
Fortunately, a better name is on the horizon: multi-model is a far more accurate term for data stores that truly welcome all types of data. It is good to finally have a fitting category!
My colleague Damon Feldman, solutions director at MarkLogic, is often called into architectures that were cobbled together on the strength of vendor promises but just don’t solve the target problems. “People were told they could take lots of data types and put them in different places, which you can,” he told me. “The hard part is making them persist.”
The conventional understanding is that to achieve polyglot persistence you store each discrete data type in its own discrete technology. But polyglot means “able to speak many languages,” not “able to integrate many components.”
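To see why that distinction matters, here is a minimal sketch, in Python, of what the one-technology-per-data-type approach often looks like from the application’s side. The client objects and data shapes are hypothetical placeholders, not any particular product’s API; the point is that the application, not the database, ends up doing the integration.

```python
# A sketch of classic polyglot persistence: each data model lives in its own
# system, and the application stitches the answers together. The store clients
# passed in below are hypothetical stand-ins for a document store, a
# triple/graph store, and a full-text search engine.

from dataclasses import dataclass, field


@dataclass
class ArticleView:
    doc: dict                      # JSON document from the document store
    related: list = field(default_factory=list)   # neighbors from the graph store
    snippets: list = field(default_factory=list)  # hits from the search engine


def load_article(article_id: str, doc_store, graph_store, search_engine) -> ArticleView:
    # Three round trips to three systems, each with its own query language,
    # its own security model, and its own backup schedule.
    doc = doc_store.get(article_id)
    related = graph_store.query(f"neighbors of article:{article_id}")
    snippets = search_engine.search(doc.get("title", ""))

    # Consistency is the application's problem: if one system was restored
    # from an older backup than the others, these three answers can disagree.
    return ArticleView(doc=doc, related=related, snippets=snippets)
```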
The solution, Damon says, is a database that natively stores two or more data models. “Some examples are: document models, and those sort of divide primarily into JSON and XML; RDF or triple models, that’s in the semantic world; or graph models; and I think of text as being another model because it’s usually a very different way of indexing and managing data, and people have a tendency to put that in a separate system. So I would say also text models.”
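As a rough illustration of “natively stores two or more data models,” the sketch below shows one store accepting both a JSON document and RDF-style triples about it, so a single transaction and a single backup cover both models. MultiModelDB and the sample data are made up for this example; they are not any vendor’s actual driver.

```python
# Hypothetical sketch: one engine holds JSON documents, RDF-style triples, and
# the text they contain, instead of spreading them across separate systems.

class MultiModelDB:
    def __init__(self):
        self.documents = {}   # document model: JSON (or XML), keyed by URI
        self.triples = []     # semantic/graph model: (subject, predicate, object)
        # In a real engine, full-text indexes would be built over these same
        # documents rather than maintained in a separate search system.

    def insert_document(self, uri: str, doc: dict) -> None:
        self.documents[uri] = doc

    def insert_triples(self, facts: list) -> None:
        self.triples.extend(facts)


db = MultiModelDB()

# Document model: the article itself, as JSON.
db.insert_document("/articles/123.json", {
    "title": "Multi-model databases",
    "body": "One engine, many data models...",
})

# Semantic model: triples describing the same article, stored alongside it.
db.insert_triples([
    ("/articles/123.json", "dc:creator", "Damon Feldman"),
    ("/articles/123.json", "dc:subject", "multi-model"),
])
```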
Damon adds one caveat: look out for vendors that are just bolting on another database. “The issue that I’ve seen with individual projects trying to bundle together many different databases into one system, the so-called polyglot persistence model,” he began, “is that the enterprise components of all these systems don’t work together well. So even though you might be able to query some text and some documents together, when you try to do your backups all at exactly the same time and restore at exactly the same time, each system backs up at a 5-second difference from one another, and you have to figure out what got out of sync.”
A true multi-model database must be able to store multiple types of data in a single system with unified data governance, management, and access. And if you are storing it, you must be able to search it. Composability of search is crucial: the database has to handle all the different data models and index them so you can run combined queries, mixing text search, SPARQL, XQuery, and more, against the same data.
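Here is a minimal sketch of what such a combined query could look like, again with made-up names and plain in-memory data standing in for real indexes: a text condition and a graph condition are answered from the same store and joined by URI, rather than reconciled across separately managed systems.

```python
# Composable search, sketched over the same shape of data as the hypothetical
# MultiModelDB example above (repeated inline so this snippet runs on its own).

documents = {
    "/articles/123.json": {"title": "Multi-model databases",
                           "body": "One engine, many data models..."},
    "/articles/456.json": {"title": "Relational basics",
                           "body": "Rows, columns, and joins."},
}
triples = [
    ("/articles/123.json", "dc:subject", "multi-model"),
    ("/articles/456.json", "dc:subject", "relational"),
]


def combined_search(text_term: str, predicate: str, value: str):
    # Text-model condition: documents whose body mentions the term.
    text_hits = {uri for uri, doc in documents.items()
                 if text_term.lower() in doc["body"].lower()}
    # Graph-model condition: subjects that have a matching triple.
    graph_hits = {s for (s, p, o) in triples if p == predicate and o == value}
    # One system, one set of indexes: combining models is an intersection of
    # URIs, not a cross-system reconciliation exercise.
    return sorted(text_hits & graph_hits)


print(combined_search("data models", "dc:subject", "multi-model"))
# -> ['/articles/123.json']
```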
Analysts are starting to see the distinctions too, with one confiding, “It’s no longer about relational vs. non, we’re in the multi-model database generation now.”
But if we are in a multi-model database generation, parsing the facts from the marketing fluff is tough without some basic understanding. In early December, O’Reilly Media featured Damon on a live webcast where he looked at the two paths to multi-model database engineering: a single platform that supports many models on one core, versus complex integrations where many separate systems are pre-packaged together. Of course, multi-model isn’t for every purpose. Damon explored the when, the how, and the skills needed to manage this new darling of a database.
Sponsored by MarkLogic