Two cons against NoSQL. Part II.
This post is the second part of a series of feedback I received from various experts, with obviously different points of view, on:
Two cons against NoSQL data stores:
Cons1. “It’s very hard to move data out from one NoSQL to some other system, even other NoSQL. There is a very hard lock-in when it comes to NoSQL. If you ever have to move to another database, you basically have to re-implement a lot of your applications from scratch.”
Cons2. “There is no standard way to access a NoSQL data store. All tools that already exist for SQL have to be recreated for each of the NoSQL databases. This means that it will always be harder to access data in NoSQL than from SQL. For example, how many NoSQL databases can export their data to Excel? (Something every CEO wants to get sooner or later).”
You can also read Part I here.
J Chris Anderson, Couchbase, cofounder and Mobile Architect:
On Cons1: JSON is the de facto format for APIs these days. I’ve found moving between NoSQL stores to be very simple, just a matter of swapping out a driver. It is also usually quite simple to write a small script to migrate the data between databases. Now, there aren’t prepackaged tools for this yet, but it’s typically one page of code to do. There is some truth that if you’ve tightly bound your application to a particular query capability, there might be some more work involved, but at least you don’t have to redo your stored data structures.
On Cons2: I’m from the “it’s a simple matter of programming” school of thought, e.g. just write the query you need, and a little script to turn it into CSV. If you want to do all of this without writing code, then of course the industry isn’t as mature as RDBMS. It’s only a few years old, not decades. But this isn’t a permanent structural issue, it’s just an artifact of the relative freshness of NoSQL.
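To give a rough idea of what such a “little script” might look like, here is a minimal Python sketch. It assumes the query result is already available as a list of JSON-like dictionaries; the sample documents and the helper names (flatten, documents_to_csv) are made up for illustration and do not come from any particular database driver.

```python
import csv
import io


def flatten(doc, prefix=""):
    """Flatten nested dicts into one level, joining keys with dots."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def documents_to_csv(docs):
    """Render an iterable of JSON-like dicts as CSV text."""
    rows = [flatten(doc) for doc in docs]
    # Documents need not share a schema, so use the union of all keys.
    columns = sorted({key for row in rows for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical query result; a real script would fetch this from the store.
docs = [
    {"name": "Ada", "address": {"city": "London"}},
    {"name": "Linus", "email": "linus@example.org"},
]
csv_text = documents_to_csv(docs)
print(csv_text)
```

The resulting text can be saved to a .csv file and opened in Excel. Fields missing from a document simply become empty cells, which is one simple way to cope with schemaless data; lists and other nested values would need a policy of their own.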
Marko Rodriguez, Aurelius, cofounder:
On Cons1: NoSQL is not a mirror image of SQL. SQL databases such as MySQL, Oracle, PostgreSQL all share the same data model (table) and query language (SQL).
As such, the vendor lock-in comment should be directed to a particular type of NoSQL system, not to NoSQL in general. Next, in the graph space, there are efforts to mitigate vendor lock-in. TinkerPop provides a common set of interfaces that allow the various graph computing technologies to work together and it allows developers to “plug and play” the underlying technologies as needed. In this way, for instance, TinkerPop’s Gremlin traversal language works for graph systems such as OrientDB, Neo4j, Titan, InfiniteGraph, RDF Sail stores, and DEX, to name several.
To stress the point, with TinkerPop and graphs, there is no need to re-implement an application as any graph vendor’s technology is interoperable and swappable.
On Cons2: Much of the argument above holds for this comment as well. Again, one should not see NoSQL as the space, but the particular subset of NoSQL (by data model) as the space to compare against SQL (table). In support of SQL, SQL and the respective databases have been around for much longer than most of the technologies in the NoSQL space. This means they have had longer to integrate into popular data workflows (e.g. Ruby on Rails). However, it does not mean “that it will always be harder to access data in NoSQL than from SQL.” New technologies emerge, they find their footing within the new generation of technologies (e.g. Hadoop) and novel ways of processing/exporting/understanding data emerge. If SQL was the end of the story, it would have been the end of the story.
David Webber, Oracle, Information Technologist:
On Cons1: Well, it makes sense. Of course, it depends what you are using the NoSQL store for – if it is a niche application – or innovative solution – then a “one off” may not be an issue for you. Do you really see people using NoSQL as their primary data store? As with any technology – knowing when to apply it successfully is always the key. And these aspects of portability help inform when NoSQL is appropriate. There are obviously more criteria as well that people should reference to understand when NoSQL would be suitable for their particular application. The good news is that there are solid and stable choices available should they decide NoSQL is their appropriate option. BTW – in the early days of SQL too – even with the ANSI standard – it was a devil to port across SQL implementations – not just syntax, but performance and technique issues – I know – I did three such projects!
Wiqar Chaudry, NuoDB, Tech Evangelist:
On Cons1: The answer to the first scenario is relatively straightforward. There are many APIs like REST or third-party ETL tools that now support popular NoSQL databases. The right way to think about this is to put yourself in the shoes of multiple different users. If you are a developer, then it should be relatively simple, and if you are a non-developer, then it comes down to what third-party tools you have access to and those with which you are familiar. However, re-educating yourself to migrate can be time-consuming if you have never used these tools. In terms of migrating applications from one NoSQL technology to another, this is largely dependent on how well the data access layer has been abstracted from the physical database. Unfortunately, since there is limited or no support for ORM technologies, this can indeed be a daunting task.
On Cons2: This is a fair assessment of NoSQL. It is limited when it comes to third-party tools and integration. So you will be spending time doing custom design.
However, it’s also important to note that the NoSQL movement was really born out of necessity. For example, technologies such as Cassandra were designed by private companies to solve a specific problem that a particular company was facing. Then the industry saw what NoSQL can do and everyone tried to adopt the technology as a general purpose database. With that said, what many NoSQL companies have ignored is the tremendous opportunity to take from SQL-based technologies the goodness that is applicable to 21st century database needs.
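The point about abstracting the data access layer from the physical database can be illustrated with a small Python sketch. Everything here is hypothetical: UserStore, the two in-memory classes that merely stand in for a document store and a key-value store, and the rename_user and migrate helpers. No real driver API is shown. The idea is only that application code written against the interface does not change when the backing store does.

```python
import json
from abc import ABC, abstractmethod


class UserStore(ABC):
    """Hypothetical storage interface the application codes against."""

    @abstractmethod
    def save(self, user_id, profile):
        """Persist a profile dict under the given id."""

    @abstractmethod
    def load(self, user_id):
        """Return the profile dict stored under the given id."""


class DocumentUserStore(UserStore):
    """In-memory stand-in for a document database."""

    def __init__(self):
        self._docs = {}

    def save(self, user_id, profile):
        self._docs[user_id] = dict(profile)

    def load(self, user_id):
        return dict(self._docs[user_id])


class KeyValueUserStore(UserStore):
    """In-memory stand-in for a key-value store holding JSON strings."""

    def __init__(self):
        self._kv = {}

    def save(self, user_id, profile):
        self._kv[f"user:{user_id}"] = json.dumps(profile)

    def load(self, user_id):
        return json.loads(self._kv[f"user:{user_id}"])


def rename_user(store, user_id, new_name):
    """Application logic: depends only on the UserStore interface."""
    profile = store.load(user_id)
    profile["name"] = new_name
    store.save(user_id, profile)
    return store.load(user_id)


def migrate(source, target, user_ids):
    """Copy the given users from one store to another."""
    for user_id in user_ids:
        target.save(user_id, source.load(user_id))


old_store = DocumentUserStore()
old_store.save("u1", {"name": "Ada", "plan": "free"})

new_store = KeyValueUserStore()
migrate(old_store, new_store, ["u1"])
print(rename_user(new_store, "u1", "Ada L."))
```

In this sketch the migration is a short loop because both stores accept the same JSON-like profile. An application that relies on store-specific query features would not port this easily, which is the daunting case described above.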
Robert Greene, Versant, Vice President, Technology:
On Cons1: Yes, I agree that this is difficult with most NoSQL solutions and that is a problem for adopters.
Versant has taken the position of trying to be first to deliver enterprise connectivity and standards into the NoSQL community. Of course, we can only take that so far, because many of the concepts that make NoSQL attractive to adopters simply do not have an equivalent in the relational database world. For example, horizontal scale-out capabilities are only loosely defined for relational technologies, but certainly not standardized. Specifically in terms of moving data in/out of other systems, Versant has developed a connector for the Talend Open Studio which has connectivity to over 400 relational and non-relational data sources, making it easy to move data in and out of Versant depending on your needs. For the case of Excel, while it is certainly not our fastest interface, having recognized the needs of data access from accepted tools, Versant has developed ODBC/JDBC capabilities which can be used to get data from Versant databases into things like Excel, Toad, etc.
On Cons2: Yes, I also agree that this is a problem for most NoSQL solutions and again Versant is moving to bring better standards-based programming APIs to the NoSQL community. For example, in our Java language interface, we support JPA (Java Persistence API), which is the interface application developers get whenever they download the Java SDK. They can create an application using JPA and execute against a Versant NoSQL database without implementing any relational mapping annotations or XML.
Versant thinks this is a great low-risk way for enterprise developers to test out the benefits of NoSQL. For example, if Versant does not perform much faster than the relational databases, run on much less hardware, and scale out effectively to multiple commodity servers, then they can simply take Hibernate, OpenJPA, EclipseLink, etc. and drop it into place, do the mapping exercise and then point it at their relational database with nothing lost in productivity.
In the .NET world, we have an internal implementation that supports LINQ and will be made available in the near future to interested developers. We are also supporting other standards in the area of production management, having SNMP capabilities so we can be integrated into tools like OpenView and others where IT folks can get a unified view of all their production systems.
I think we as an engineering discipline should not forget our lessons learned in the early 2000s. Some pretty smart people helped many realize that what is important is the life cycle of your application objects, some of which are persistent, and that what is important is providing the appropriate abstraction for things like transaction demarcation, caching activation, state tracking (new, changed, deleted), etc. These are all features common to any application and developers can easily abstract them away to be database implementation independent, just like we did in the ORM days. It’s what we do as good software engineers: find the right abstractions and refine and reuse them over time. It is important that the NoSQL vendors embrace such an approach to ease the development burden of the practitioners that will use the technology.
Jason Hunter, MarkLogic, Chief Architect:
On Cons1: When choosing a database, being future-proof is definitely something to consider. You never know where requirements will take you or what future technologies you’ll want to leverage. You don’t want your data locked into a proprietary format that’s going to paint you into a corner and reduce your options. That’s why MarkLogic chose XML (and now JSON also) as its internal data format. It’s an international standard. It’s plain-text, human readable, fully internationalized, widely deployed, and supported by thousands upon thousands of products. Customers choose MarkLogic for several reasons, but a key reason is that the underlying XML data format will still be understood and supported by vendors decades in the future. Furthermore, I think the first sentence above could be restated, “It’s very hard to move the data out from one SQL to some other system, even other SQL.” Ask anyone who’s tried!
On Cons2: People aren’t choosing NoSQL databases because they’re unhappy with the SQL language. They’re picking them because NoSQL databases provide a combination of feature, performance, cost, and flexibility advantages. Customers don’t pick MarkLogic to run away from SQL, they pick MarkLogic because they want the advantages of a document store, the power of integrated text search, the easy scaling and cost savings of a shared-nothing architecture, and the enterprise reliability of a mature product. Yes, there’s a use case for exporting data to Excel. That’s why MarkLogic has a SQL interface as well as REST and Java interfaces. The SQL interface isn’t the only interface, nor is it the most powerful (it limits MarkLogic to the subset of functionality expressible in SQL), but it provides an integration path.