What Standards for Object Databases?

by Roberto V. Zicari on December 19, 2007

I thought it would be interesting to give you some insight into the discussion currently going on at ODBMS.ORG. The question is: what standards for object databases?
Here are two notes, one from William Cook and one from Mike Card.

A copy of the OMG white paper on Next-Generation Object Database Standardization, written by the OMG’s Object Database Technology Working Group, can be downloaded here: Next-Generation Object Database Standardization

Roberto V. Zicari
————–

Hi everybody.
I’m sorry that I was not able to attend the meeting on Dec 12. I hope that someone can post some information on it. I think it is great that these topics are being discussed, but I also have some significant disagreements with points being made here.

My biggest issue is that I don’t agree with the premise of the OMG RFI and Prof. Subieta’s response. The premise is that the problem is “the underlying lack of a set of precise definitions and semantics that has plagued ODMSs for years” [mpcard]. The assumption here is that people didn’t use object databases because OODBs didn’t have a solid theory like relational algebra. I do not believe that was the reason. I think the reason was that (1) most of the original OODB systems didn’t support query optimization or transactions, (2) they had difficulty externalizing their data in a way that could be evolved and used by other tools, and (3) when they did introduce query languages, they were subject to the same impedance mismatch as relational systems.

I think that impedance mismatch is a language problem, not a data problem. Relational data maps very well to traditional data structures in C, Pascal, or any other programming language: just create an array of records. Relational data maps fairly well to objects too, especially since you can represent relationships easily. The impedance mismatch comes from the need to partition a program into two parts: a query that is sent to the database, and a client program that uses the query results. Previously this partitioning was done by putting the query into a string, which causes all sorts of problems. Native Queries and LINQ are two more modern and effective ways to partition a program into a query and a client, so that the semantic connections between them are preserved. Prof. Subieta’s proposal does not address this problem, as far as I can tell.
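The two ways of partitioning a program described above can be sketched in Python. This is only an illustration under stated assumptions: the `Student` class, the `student` table, and the helper names are hypothetical, SQLite stands in for "a database", and the `native_query` helper simply filters in memory, whereas a real Native Query or LINQ implementation would translate the predicate into an optimized database query.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Student:  # hypothetical example type
    name: str
    age: int


students = [Student("Ann", 17), Student("Bob", 22), Student("Cy", 30)]

# Style 1: the query lives in a string. The host language sees only text,
# so a misspelled column name is discovered when the query is executed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [(s.name, s.age) for s in students],
)


def adults_via_string(conn, min_age):
    rows = conn.execute(
        "SELECT name FROM student WHERE age >= ? ORDER BY name", (min_age,)
    )
    return [row[0] for row in rows]


def misspelled_query_fails(conn):
    """Return True if the typo 'agee' is only caught by the database at run time."""
    try:
        conn.execute("SELECT name FROM student WHERE agee >= 18")
    except sqlite3.OperationalError:
        return True
    return False


# Style 2: the query is an ordinary predicate in the host language, so
# editors, refactoring tools, and type checkers can see into it.
def native_query(extent, predicate):
    # Assumption: in-memory filtering stands in for a real query engine.
    return [obj for obj in extent if predicate(obj)]


def adults_via_predicate(extent, min_age):
    return sorted(s.name for s in native_query(extent, lambda s: s.age >= min_age))
```

Both functions return the same names for the same data. The difference is where mistakes surface: in the string version only the database can detect them, while the predicate version is visible to the language tooling (a compiler in a statically typed language, or a type checker in Python).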

As for data models, I think that Entity-Relationship models, UML class diagrams, and Subieta’s models are all essentially equivalent. They have the concept of records of attributes connected by relationships. The relational model also has sets of records, but the relationships are not explicit in the data model; they must be specified on each join operation. You can argue over fine points of inheritance and such things, but these are small points compared to the basic similarities of the models. It is not fair to compare any of these models to the network model, which as far as I can tell was a hack on top of the hierarchical data model. It is amusing that hierarchical data models have had a resurgence under the name XML; these are very useful for data transmission but are not a suitable foundation for a database.
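The difference between relationships that are restated on each join and relationships that are explicit in the model can be shown with a small Python sketch. The department and employee data, and all function names, are made up for illustration; plain dictionaries stand in for relational records.

```python
from dataclasses import dataclass, field

# Relational style: flat records. The relationship exists only as matching
# key values and must be spelled out again in every query that joins them.
departments = [
    {"dept_id": 1, "name": "Databases"},
    {"dept_id": 2, "name": "Languages"},
]
employees = [
    {"name": "Ann", "dept_id": 1},
    {"name": "Bob", "dept_id": 2},
    {"name": "Cy", "dept_id": 1},
]


def employees_in_relational(dept_name):
    return sorted(
        e["name"]
        for e in employees
        for d in departments
        if e["dept_id"] == d["dept_id"]  # join condition restated here
        and d["name"] == dept_name
    )


# ER / object style: the relationship is part of the model itself.
@dataclass
class Employee:
    name: str


@dataclass
class Department:
    name: str
    members: list = field(default_factory=list)  # explicit relationship


def build_object_model():
    by_id = {d["dept_id"]: Department(d["name"]) for d in departments}
    for e in employees:
        by_id[e["dept_id"]].members.append(Employee(e["name"]))
    return {dept.name: dept for dept in by_id.values()}


def employees_in_object_model(model, dept_name):
    # No join condition: the query just follows the relationship.
    return sorted(member.name for member in model[dept_name].members)
```

The same information is present in both representations, which is the sense in which the models are close; what changes is whether the query or the schema carries the relationship.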

As for query languages, I don’t think that the stack-based query language has anything fundamental to offer over OQL. It is like saying that an HP calculator with postfix notation has a more solid theoretical model than a standard calculator that uses infix. I also want to point out that the core of OQL is not really object-oriented, because it does not deal with methods. It is just a great query language for ER data models. The key point is “entities and relationships” and that is what OQL was designed for and is good at. I do not agree that OQL is inconsistent. Suad pointed out some difficulties with the Java binding, and perhaps there are some other small problems with the way the standard was defined. But rather than fix these small issues, he claimed that the entire system is inconsistent.
See here for an alternative and more balanced view. I think that Prof. Subieta’s query syntax is perfectly reasonable as well. But it is not a fundamental advance, as far as I can tell.
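The calculator analogy can be made concrete with a short Python sketch: a postfix expression evaluated on a stack and the equivalent infix expression, mechanically converted to postfix, give the same result. This illustrates the analogy only and does not model Prof. Subieta’s stack-based query language; the function names and the space-separated token format are assumptions made for the example.

```python
import operator

# Operator table: symbol -> (precedence, implementation)
OPS = {
    "+": (1, operator.add),
    "-": (1, operator.sub),
    "*": (2, operator.mul),
    "/": (2, operator.truediv),
}


def eval_postfix(tokens):
    """Evaluate postfix tokens with an explicit stack, as an RPN calculator does."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[tok][1](left, right))
        else:
            stack.append(float(tok))
    return stack.pop()


def infix_to_postfix(tokens):
    """Convert infix tokens to postfix (shunting-yard, left-associative operators)."""
    output, pending = [], []
    for tok in tokens:
        if tok in OPS:
            # Emit pending operators of equal or higher precedence first.
            while pending and pending[-1] in OPS and OPS[pending[-1]][0] >= OPS[tok][0]:
                output.append(pending.pop())
            pending.append(tok)
        elif tok == "(":
            pending.append(tok)
        elif tok == ")":
            while pending[-1] != "(":
                output.append(pending.pop())
            pending.pop()  # discard the "("
        else:
            output.append(tok)
    while pending:
        output.append(pending.pop())
    return output
```

For example, `infix_to_postfix("( 3 + 4 ) * 2".split())` yields the tokens `3 4 + 2 *`, and evaluating them with `eval_postfix` gives 14.0. The notation changes; the expression being computed does not.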

NOTE: Native Queries are not proprietary; they were described by one of my students and me in an ECOOP paper and then implemented by db4objects. They have been implemented by others as well, although not in any commercial systems. They are also similar to Microsoft’s LINQ in some ways.

So, to summarize: I think that OMG is again trying to solve the wrong problem. I sent in a response to the RFI; and yes, it wasn’t what you wanted to hear. But I’m going to keep saying it.

The problem is not a lack of a grand unifying theory. There is plenty of theory to cover ER models, OQL, and other traditional ideas. The discussions you are having don’t deal with impedance mismatch, which can happen even with an object-oriented language accessing an object-oriented database using OQL! If you put OQL into a string, then you are going to have impedance, and nothing about the formality of the data model or query language is going to fix it. The real problems are impedance mismatch, good query optimization, solid transaction support, evolution of data, and scalability to multiple servers. These are things that OODB vendors didn’t address until it was too late. They thought that objects alone would magically make everything work well. But... they don’t.

I’m sorry to be so negative about this, but I really think that there is an opportunity to improve the DB/PL interface.

William Cook
Assistant Professor
Department of Computer Sciences
University of Texas at Austin

————–
Hello Prof. Cook-

You wrote:

“My biggest issue is that I don’t agree with the premise of the OMG RFI and Prof. Subieta’s response. The premise is that the problem is “the underlying lack of a set of precise definitions and semantics that has plagued ODMSs for years” [mpcard]. The assumption here is that people didn’t use object databases because OODBs didn’t have a solid theory like relational algebra. I do not believe that was the reason. I think the reason was that (1) most of the original OODB systems didn’t support query optimization or transactions, (2) they had difficulty externalizing their data in a way that could be evolved and used by other tools, and (3) when they did introduce query languages, they were subject to the same impedance mismatch as relational systems.”

I don’t think the RFI itself had a “premise,” at least that I am aware of. Regarding your 3 reasons why ODBMSs were not widely adopted, I would argue that you could trace all 3 of these issues to the lack of a good underlying object model and set of definitions and semantics. I cannot see how you think the “impedance mismatch” or DB/PL interface issue will be solved without laying a good theoretical foundation.

“The problem is not a lack of a grand unifying theory. There is plenty of theory to cover ER models, OQL, and other traditional ideas. The discussions you are having don’t deal with impedance mismatch, which can happen even with an object-oriented language accessing an object-oriented database using OQL! If you put OQL into a string, then you are going to have impedance, and nothing about the formality of the data model or query language is going to fix it.”

Sure, but no one has ever tried to tie object definition/store models all the way up to a QL, defined with an abstract query processor, like Prof. Subieta has (at least as far as I have read). It is true that the formality of the data model won’t solve the “impedance mismatch” between a query string and a native PL, but again this falls into the area of further work we have to do. Everyone thinks they have the best way to do this: everyone in ODMG thought their APIs were best and their way was best, and that a formal set of definitions, semantics, and object models was unnecessary because in the end developers just need to write code. That’s why ODMG chapter 2 was so weak and why there were so many “holes” in the ODMG specification: we were trying to write something that would cover several existing products without requiring anyone to make significant code changes. Users didn’t care about the standard because it did not guarantee application code (or even data) portability, so what did it matter? There was no conformance test suite, so you couldn’t even say for sure who was conformant to what.

“The real problems are impedance mismatch, good query optimization, solid transaction support, evolution of data, and scalability to multiple servers. These are things that OODB vendors didn’t address until it was too late. They thought that objects alone would magically make everything work well. But... they don’t.”

Yes, these are real problems, but I would argue that solving them will require a common theoretical foundation from which to build. I guess we’ll see if there is consensus on that view or not at next month’s ODBTWG telecon.

Mike Card
Syracuse Research Corporation (SRC)


One Comment
  1. In reply to the original comments:

    I don’t think it’s true that most of the original OODB systems didn’t
    support query optimization or transactions. It depends a bit on what
    you mean by “original”, but if you mean the generation of companies
    that appeared in 1988, I don’t think it’s true. ObjectStore
    absolutely always supported ACID transactions. It also had a query
    optimizer, albeit a simple one, without any impedance mismatch.

    You say that relational data maps fairly well to objects, but there
    are a lot of big problems, which have been written about at great
    length. For example, it is difficult to model inheritance in
    relational databases. This is not a “fine point”; it’s crucial.
