Tuesday, January 12, 2010

Rick Cattell on "Relational Databases, Object Databases, Key-Value Stores, Document Stores, and Extensible Record Stores: A Comparison."

What's new at ODBMS.ORG in 2010?

I have extended the focus of ODBMS.ORG to include, besides object database technologies, new developments in data management, such as the linkage to service platforms, operation within scalable (cloud) platforms, object-relational bindings, NoSQL databases and new approaches to concurrency control.

In 2010, ODBMS.ORG will offer educational resources in all of these areas.

I have just published a new expert article by Rick Cattell on this topic:
"Relational Databases, Object Databases, Key-Value Stores, Document Stores, and Extensible Record Stores: A Comparison".

Rick Cattell, formerly of Sun Microsystems, co-creator of JDBC, and chair of the Object Data Management Group (ODMG), explains: "Traditionally, the obvious platform for most database applications has been a relational DBMS. You might use a specialized parallel relational DBMS if you required high throughput for “data warehousing”, or an object database system if your application had unusual functionality or performance requirements, e.g. for in-memory caching or fast relationship traversal. However, an RDBMS like Oracle or MySQL has usually been the answer. This has changed somewhat recently.

There is now recognition in database research that “one size does not fit all”, for example in the widely-referenced paper by Stonebraker and colleagues.

And in the Web 2.0 industry, many companies have abandoned traditional RDBMSs for so-called “NoSQL” data stores that provide much higher scalability, or they have built a distributed caching layer on top of RDBMSs. More scalable RDBMSs are also coming to market," says Cattell.

Rick's article "Relational Databases, Object Databases, Key-Value Stores, Document Stores, and Extensible Record Stores: A Comparison" is available for free download (PDF).
It is well worth reading.

In fact, we already started looking at new developments in data management, such as NoSQL databases and cloud stores, last year.
An example is the article "On NoSQL technologies. Part I" (PDF) which presents interviews on new data stores with Patrick Linskey, Robert Charles Greene, Kaj Arno and Giuseppe Maxia.

RVZ


Monday, November 16, 2009

Patrick Linskey on "cloud store"

I asked Patrick Linskey for his opinion on the new wave of "data stores", such as "document stores" and "nosql databases".

You can read the interview below.

Roberto V. Zicari

RVZ:
Patrick, there has recently been a proliferation of "data stores", such as "document stores" and "nosql databases".
Systems such as CouchDB, MongoDB, SimpleDB, Voldemort, Scalaris, etc. provide less functionality than OODBs, but offer a distributed "object" cache across multiple machines.

See for example: wiki/Nosql,
wiki/wiki/Document-oriented_database,
and the article: NoSQL: Distributed and Scalable Non-Relational Database Systems.

What do you think about it?

Patrick Linskey:
I think that the "cloud store" subset of them are pretty fascinating. Of course, as with so much in the software industry, much of what these projects are doing is old hat. But I think that they're relatively unique in
(a) successfully combining compelling complementary sets of features together,
(b) building solutions for known and needed use cases, rather than the more ivory-tower approach that's all too typical of commercial products, and
(c) designing and implementing in a manner oriented to cloud-scale deployment from the very start (i.e., lots of data; geographically diverse data centers; high load requirements).

I expect that all the successful cloud store projects will end up with support for declarative queries and declarative secondary keys. I really don't like the "nosql" term -- I think that Geir Magnusson does a good job of pointing out that the cloud store community is more focused on "alongside SQL". That is, there's nothing wrong with using a relational database in the situations where it's the best tool for the job. The new cloud stores are focused on filling the gaps where most RDB alternatives fall flat.

The way they do it, of course, is by getting rid of problematic features. I think that some of the hype has mis-identified these problematic features, though. Declarative queries (and full metamodel introspection) and secondary key support are really cool and critical features of all the popular relational databases. The cloud store users out there are doing a lot of extra work because of the absence of these features -- essentially re-implementing them in their application code. Imagine how horrible it'd be if you told a modern DB team that they needed to change their app to tune their database!
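
To illustrate the kind of extra work described here, the following is a minimal sketch of a secondary key maintained by hand in application code. It assumes a hypothetical store that can only get and put values by primary key; the names KeyValueStore, UserRepository, the "city" field and the "idx:city:" key prefix are invented for this illustration and do not correspond to any particular product.

```python
class KeyValueStore:
    """Hypothetical in-memory stand-in for a store with primary-key access only."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class UserRepository:
    """Application code that keeps a city -> user ids index up to date by hand."""

    def __init__(self, store):
        self._store = store

    @staticmethod
    def _index_key(city):
        # Index entries are ordinary keys in the same store (assumed naming scheme).
        return "idx:city:" + city

    def _read_index(self, city):
        return set(self._store.get(self._index_key(city)) or ())

    def save(self, user_id, user):
        old = self._store.get("user:" + user_id)
        if old is not None and old["city"] != user["city"]:
            # The indexed field changed, so remove the stale index entry first.
            stale = self._read_index(old["city"])
            stale.discard(user_id)
            self._store.put(self._index_key(old["city"]), sorted(stale))
        self._store.put("user:" + user_id, dict(user))
        ids = self._read_index(user["city"])
        ids.add(user_id)
        self._store.put(self._index_key(user["city"]), sorted(ids))

    def find_by_city(self, city):
        # The "query" is just a lookup of the hand-maintained index entry.
        return sorted(self._read_index(city))


repo = UserRepository(KeyValueStore())
repo.save("u1", {"name": "Ann", "city": "Boston"})
repo.save("u2", {"name": "Bob", "city": "Boston"})
repo.save("u1", {"name": "Ann", "city": "Austin"})  # u1 moves; index must follow
boston_ids = repo.find_by_city("Boston")
austin_ids = repo.find_by_city("Austin")
```

Note that each save issues several separate writes. In a relational database, a declared secondary index would be kept consistent by the database itself; here the application carries that responsibility.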

So: what are cloud stores omitting that enables them to scale so well?
There are two answers:
- cloud stores are intentionally designed to scale. No single points of failure, built-in support for consensus-based decisions, partitioning / replication as basic primitives, etc. Taking a codebase designed for a single server and evolving it to a multi-server solution is difficult, since single-machine assumptions often calcify into the implementation.

- more importantly, cloud stores aren't fully ACID, in the traditional sense of the term. By re-casting the data storage problem in more amenable terms (eventual consistency, atomic operations (but not atomic sequences of operations), etc.), the different products can make acceptable trade-offs that traditional single-server ACID stores are simply designed to forbid.
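
The distinction between atomic operations and atomic sequences of operations can be sketched as follows. This is an illustration only, under the assumption of a hypothetical single-process store whose individual calls are atomic; AtomicStore, add and transfer are invented names, and a real cloud store would provide its own primitives.

```python
import threading


class AtomicStore:
    """Hypothetical store: each call is atomic, but there are no multi-key transactions."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data.get(key, 0)  # missing keys read as 0

    def compare_and_set(self, key, expected, new):
        """Atomically set key to new, but only if it currently holds expected."""
        with self._lock:
            if self._data.get(key, 0) != expected:
                return False
            self._data[key] = new
            return True


def add(store, key, delta):
    """One atomic read-modify-write, built from a compare-and-set retry loop."""
    while True:
        current = store.get(key)
        if store.compare_and_set(key, current, current + delta):
            return current + delta


def transfer(store, src, dst, amount, observe=None):
    """Two atomic operations in a row; the sequence as a whole is NOT atomic."""
    add(store, src, -amount)
    if observe is not None:
        observe()  # a concurrent reader could run at this point
    add(store, dst, amount)


store = AtomicStore()
add(store, "a", 100)
seen = []
transfer(store, "a", "b", 30,
         observe=lambda: seen.append(store.get("a") + store.get("b")))
total_after = store.get("a") + store.get("b")
```

In this sketch the observer records a combined balance of 70 while the transfer is in flight, although the total is 100 before and after. A traditional ACID transaction would hide that intermediate state; a store offering only per-operation atomicity leaves it to the application to tolerate or work around it.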

I'd love to see a comparison of established products like Teradata and Coherence to the various new cloud store projects. Teradata, in particular, does an interesting job of re-using the familiar SQL/JDBC model while making a lot of the same compromises and architectural decisions as the new set of cloud stores.

(I'm less interested in -- and educated about -- the single-server nosql projects. These days, I believe that all single-server databases are basically equivalent, since if you are using a single server, your application is sufficiently simple that you should be able to be successful with any of a number of data storage models.)

-Patrick

Patrick Linskey has been involved in object/relational mapping and databases for the last decade. As the founder and CTO of SolarMetric, Patrick drove the technical direction of the company and oversaw the development of Kodo, through its acquisition by BEA. At BEA, Patrick led the EJB team in designing and implementing the WebLogic Server EJB 3.0 solution, and represented BEA on the JDO and EJB3 expert groups. He is a contributor to the Apache OpenJPA project.

Since leaving Oracle, Patrick has worked on a number of projects, ranging from traditional three-tier web and mobile applications to C# peer-to-peer client applications with custom-designed distributed storage solutions.


Saturday, October 4, 2008

More Impedance mismatch: Cloud Computing

I noticed a news item about an additional source of impedance mismatch: cloud computing.

Geir Magnusson, vice president of engineering and co-founder of 10gen, gave a talk at the Web 2.0 Expo conference: "The Sequel to SQL: Why You Won't Find Your RDBMS in the Clouds."

Magnusson said "an RDBMS is what you need, but not in the cloud."
Magnusson seems to support O/R mapping: "O/R mapping blends the power of an RDBMS with the programming simplicity of an ODBMS [object database management system]," Magnusson said, noting that there is support for O/R mapping in Java, Python, Ruby, .NET and Groovy. "O/R mapping is everywhere."

However, the series of interviews with users indicates that O/R mapping is only one way (and not the simplest one) of getting around the impedance mismatch between object-oriented languages and data stored in a relational system.
