Tuesday, May 20, 2008

Object Database Systems: Quo vadis?

I wanted an expert opinion on some critical questions about object databases:

Where are object database systems going? Are relational database systems becoming object databases? Do we need a standard for object databases? Why did ODMG not succeed?


I therefore interviewed one of our experts, Mike Card, on his view of the current state of the union of object database systems.
Mike works with Syracuse Research Corporation (SRC) and is involved in object databases and their application to challenging problems, including pattern recognition. He chairs the Object Database Technology Working Group (ODBTWG) in OMG, which works to advance object database standardization.

Question 1:
It has been said (see Java Panel II) that, to be a suitable solution to the object persistence problem, an object database system needs to support not only a richer object model but also set-oriented, scalable, cost-based-optimized query processing and high-throughput transactions.
Do current ODBMSs offer these features?


Mike Card:
In my opinion, no, though support for true transactional processing varies between vendors. Some products use “optimistic” concurrency control, which is suitable only for environments where there is very little concurrent access to the database, such as single-threaded embedded applications. In my opinion, a database engine is not “scalable” (at least in the enterprise sense of the word) if it is based on optimistic concurrency control. This is because most truly large-scale applications require optimal performance with many concurrent transactions, and this cannot be achieved when updates have to be rolled back at transaction commit time and re-attempted because of access conflicts.
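The commit-time rollback described above can be sketched in a few lines of Java. This is a minimal, single-key illustration of optimistic concurrency control, not any vendor's engine: `VersionedStore` and `OptimisticDemo` are hypothetical names, and the competing transaction is simulated by a callback so that the conflict happens deterministically.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-node store used only to illustrate optimistic
// concurrency control: each key carries a version number that is checked at commit.
class VersionedStore {
    private final Map<String, Integer> values = new HashMap<>();
    private final Map<String, Long> versions = new HashMap<>();

    synchronized int read(String key) {
        return values.getOrDefault(key, 0);
    }

    synchronized long version(String key) {
        return versions.getOrDefault(key, 0L);
    }

    // Validation happens only at commit: if another transaction committed a
    // newer version since we read, our update is rejected.
    synchronized boolean commit(String key, long readVersion, int newValue) {
        if (version(key) != readVersion) {
            return false;
        }
        values.put(key, newValue);
        versions.put(key, readVersion + 1);
        return true;
    }
}

class OptimisticDemo {
    // Adds 'delta' to 'key', retrying until the commit succeeds; returns the number
    // of attempts. 'interference' (may be null) runs between the read and the first
    // commit, standing in for a concurrent transaction.
    static int updateWithRetry(VersionedStore store, String key, int delta, Runnable interference) {
        int attempts = 0;
        while (true) {
            attempts++;
            long seen = store.version(key);
            int current = store.read(key);
            if (attempts == 1 && interference != null) {
                interference.run();
            }
            if (store.commit(key, seen, current + delta)) {
                return attempts;
            }
            // Conflict detected at commit time: the work is discarded and redone.
        }
    }

    public static void main(String[] args) {
        VersionedStore store = new VersionedStore();
        int attempts = updateWithRetry(store, "x", 1,
                () -> store.commit("x", store.version("x"), store.read("x") + 10));
        System.out.println("attempts=" + attempts + " value=" + store.read("x"));
        // prints "attempts=2 value=11"
    }
}
```

Each failed commit throws away work that was already done, so under heavy contention the retry loop can run many times; that wasted work is the cost this argument turns on. A lock-based (pessimistic) engine would instead make the competing transaction wait before doing its work.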

Question 2:
Relational systems are rapidly becoming object database systems (see Java Panel II). Do you agree or disagree with this statement? Why?


Mike Card:
I would disagree, because relational databases still fundamentally access objects as rows of tables and do not offer seamless integration into a host programming language’s type system. It is true that there are some good object-relational mappers (ORMs) out there, but these will never offer the performance or seamlessness available with a good ODBMS. I would agree that ORMs are getting better, but relational databases themselves are not becoming object databases.

Question 3:
A lot of the world's systems are built on relational technology, and those systems need to be extended and integrated.
That job is always difficult. An ODBMS should be able to participate fully in the enterprise data ecosystem, as well as any other DBMS, both for new development and for enhancing existing applications. How can this be achieved?
What is your opinion on this issue?


Mike Card:
As many vendors have noted, this is to some extent a marketing problem: enterprise customers need to be made aware of what object databases can do. It is also a technology issue, however, as engines based on “small-scale” concepts like optimistic concurrency control are not suitable for many enterprise environments.

Question 4:
Object databases vary greatly from vendor to vendor. Is a standard for object databases (still) needed? If yes, what needs to be standardized in your opinion?


Mike Card:
Yes, I believe it is. The APIs for creating, opening, deleting, and synchronizing/replicating databases, as well as the native query APIs, should be standardized to allow application portability. Any APIs needed to insert objects into the database, remove them from the database, or create an index on them should also be standardized, again for the sake of application portability. I would also like to see a standard XML format for exporting object database contents to allow for data portability. I am not sure our current OMG effort can achieve all of these standardization goals, but I would like it to.
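To illustrate what application portability through standardized APIs could mean, here is a hedged Java sketch. The interfaces `ObjectDatabaseProvider` and `ObjectDatabase` and all of their method names are hypothetical, invented for this example rather than taken from ODMG or any OMG draft, and the in-memory classes stand in for one vendor's product. Indexing, replication and XML export are left out to keep it short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical standardized lifecycle API: create, open and delete databases.
interface ObjectDatabaseProvider {
    ObjectDatabase create(String name);
    ObjectDatabase open(String name);
    void delete(String name);
}

// Hypothetical standardized object API: insert, remove and native query.
interface ObjectDatabase {
    void insert(Object obj);
    void remove(Object obj);
    <T> List<T> query(Class<T> type, Predicate<T> match);
}

// One illustrative "vendor" implementation that keeps everything in memory.
class InMemoryProvider implements ObjectDatabaseProvider {
    private final Map<String, InMemoryDatabase> databases = new HashMap<>();

    public ObjectDatabase create(String name) {
        InMemoryDatabase db = new InMemoryDatabase();
        databases.put(name, db);
        return db;
    }

    public ObjectDatabase open(String name) {
        InMemoryDatabase db = databases.get(name);
        if (db == null) {
            throw new IllegalArgumentException("no database named " + name);
        }
        return db;
    }

    public void delete(String name) {
        databases.remove(name);
    }
}

class InMemoryDatabase implements ObjectDatabase {
    private final List<Object> objects = new ArrayList<>();

    public void insert(Object obj) { objects.add(obj); }

    public void remove(Object obj) { objects.remove(obj); }

    // Returns all stored instances of 'type' that satisfy 'match' (a full scan).
    public <T> List<T> query(Class<T> type, Predicate<T> match) {
        List<T> result = new ArrayList<>();
        for (Object o : objects) {
            if (type.isInstance(o) && match.test(type.cast(o))) {
                result.add(type.cast(o));
            }
        }
        return result;
    }
}

// Application code written only against the interfaces; in principle it would
// run unchanged on any product that conformed to such a standard.
class PortableApp {
    static int countLongStrings(ObjectDatabaseProvider provider) {
        ObjectDatabase db = provider.create("demo");
        db.insert("short");
        db.insert("a much longer string");
        db.insert(42);
        return provider.open("demo").query(String.class, s -> s.length() > 10).size();
    }

    public static void main(String[] args) {
        System.out.println(countLongStrings(new InMemoryProvider())); // prints 1
    }
}
```

The design point is that `PortableApp` never names `InMemoryProvider` except when choosing a provider, which is exactly the dependency a portability standard would try to confine to one line.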

Question 5:
How would this new standard differ from the previous ODMG effort? And what relationship would it have with standards such as SQL?


Mike Card:
Unlike the previous ODMG standard, the new standard should have a conformance test suite that anyone can download and run against a candidate product. The standard itself should also be unambiguous and use precise language as is done in ISO standards for things like programming languages, e.g. ISO/IEC 8652 (Ada programming language standard).

The primary focus of an object database standard should be its support of a native programming language, so I would expect an object database standard to be more closely tied to an ISO standard for an object programming language (Ada, C++, or other ISO-standardized languages that may appear) than to SQL. However, if a LINQ-like native query capability were included in the object database standard, it might also reference the SQL standard, because LINQ uses SQL-like verbs and semantics.

Question 6:
LINQ is leading in database API innovation, providing native language data access. Is this a suitable standard for ODBMS? Why?


Mike Card:
LINQ looks like it has a lot of promise in this area. We (the Object Database Technology Working Group in OMG) are currently evaluating LINQ vs. the stack-based query language (SBQL) developed at the Polish-Japanese Institute for Information Technology to see how these technologies compare for handling complex queries. SBQL has proven to be very good for complex queries and is being deployed in several EU projects, though it is unknown to most American developers. We are doing this evaluation to ensure LINQ is a good foundation for developers of applications that require complex queries, and is not too “small-scale” in its current form. We also want to hear from the LINQ community on plans (if any) to include update capability in LINQ and we need to be sure there are no surprises for parallel transaction execution.

Question 7:
When are object databases a suitable solution for an enterprise, and when are they not?


Mike Card:
They are not suitable when the engine is intended primarily for use in single-threaded embedded systems (optimistic concurrency control is a good indicator of this, as I mentioned earlier).

An object database would be suitable for use in an enterprise system if it were really good at large-scale data management, i.e. if the engine were designed to handle large volumes of data and many parallel transactions. Some object databases are not built like this; they are designed primarily for single-threaded embedded applications with fairly small data volumes, and as such they would not be good candidates for enterprise applications.

Besides the technology used in the database engine itself, a good enterprise object database would need database maintenance tools, e.g. for taking database A offline and replacing it with database B, updating or fiddling with database A and then bringing it back online, scheduling backups, and replicating databases between sites.

Question 8:
Future direction of object databases: where are they going?


Mike Card:
The answer to this question depends on where object programming languages themselves go. Up to this point, programming languages have not included the concept of persistence; it has always been treated as a “foreign” thing to be dealt with using APIs for things like file I/O. This is a very 1960s view of persistence, in which programs lived in core memory and persistent things were data files written out to tape or disk.

The closest thing to true integration of persistence I have seen is in Ruby, with its “PStore” class. I would like to see persistence integrated even more fully, so that objects can be declared persistent or made persistent, à la

public class myClass {

    persistent Integer[] myInts = new Integer[5];
    Integer[] myOtherInts = new Integer[2];

    public void aMethod() {
        myOtherInts.makePersistent();
    }

}

and the programming language itself would take care of maintaining them in files and loading them in at program start-up etc. without any additional work from the programmer.

There are obviously challenges with this, as this small example shows. What does it mean to initialize a persistent object in a class declaration? Is the object re-initialized when the program starts up? Or is the persisted value retained, rendering the initialization clause meaningless on subsequent runs of the program? Should persistent objects be allowed to have initialization clauses like this? What are the rules about inter-object access? Must persistence by reachability be used to ensure referential integrity? Can a “stack” variable (i.e. a variable declared in a method) be declared or made persistent, or must persistent variables be at the class level or even “global” (static)? Are these questions different for interpreted languages like Ruby, which do not have the same notion of class as languages like Java? These are computer science/discrete math questions that will be answered during the language design process, which will in turn determine how much “database” functionality ends up in the language itself.

If persistence were fully integrated into an object programming language in this way, then the role of an object database for that language might be just to provide an efficient way to organize and search the program’s persistent variables. This would reduce the scope of what an object database has to do, since today an object database not only has to provide efficient organization and search (index and query) capability but also has to make objects persistent as seamlessly as possible. Of course, this “reduction in scope” would only be possible if the default persistence mechanism for the programming language were implemented in a way that was efficient and fast for large numbers of objects.
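For contrast with the proposed `persistent` keyword, here is roughly what today's library-based approach looks like: a hedged Java sketch loosely modeled on the idea behind Ruby's PStore (named roots saved to a file inside a transaction block). `SimplePStore` and `PStoreDemo` are hypothetical classes built on standard Java serialization; this is not a port of PStore's actual implementation. Because it rewrites the whole file on every commit, it also illustrates why a default persistence mechanism would have to be far more efficient before it could handle large numbers of objects.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.function.Consumer;

// Hypothetical PStore-like store: a map of named roots kept in one file.
class SimplePStore {
    private final Path file;

    SimplePStore(Path file) {
        this.file = file;
    }

    @SuppressWarnings("unchecked")
    private HashMap<String, Serializable> load() throws IOException, ClassNotFoundException {
        if (!Files.exists(file)) {
            return new HashMap<>();
        }
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (HashMap<String, Serializable>) in.readObject();
        }
    }

    // Loads all roots, lets 'body' modify them, then writes them back.
    // If 'body' throws, nothing is written, so the earlier contents are kept.
    void transaction(Consumer<HashMap<String, Serializable>> body)
            throws IOException, ClassNotFoundException {
        HashMap<String, Serializable> roots = load();
        body.accept(roots);
        // Write a complete new copy first, then replace the old file with it.
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(tmp))) {
            out.writeObject(roots);
        }
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}

class PStoreDemo {
    public static void main(String[] args) throws Exception {
        Path path = Files.createTempFile("pstore", ".ser");
        Files.delete(path); // start with no store file
        new SimplePStore(path).transaction(roots -> roots.put("myOtherInts", new Integer[] {1, 2}));
        // A new SimplePStore instance (as in a later program run) sees the saved root.
        new SimplePStore(path).transaction(roots ->
                System.out.println(((Integer[]) roots.get("myOtherInts")).length)); // prints 2
    }
}
```

Every persistent value here must be explicitly named, serializable, and moved in and out through an API call, which is the "foreign thing" treatment of persistence that a language-level `persistent` declaration would aim to remove.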


Wednesday, February 6, 2008

News from the OMG Object Database Technology Users and Vendors Roundtable

I have received some information from Mrs. Charlotte W. Wales (The MITRE Corporation) about the OMG Object Database Technology Users and Vendors Roundtable, which took place on 11 December 2007 at the OMG meeting in Burlingame, CA. I reproduce it below as I received it.

--------------------------------------------------------------------------------

News from OMG Object Database Technology Users and Vendors Roundtable, 11 December 2007

All the hard work that went into preparing the Next Generation Object Database Standardization White Paper, augmented by the publicity it received here at the ODBMS.ORG Portal (in the Forum), resulted in a successful Users and Vendors Roundtable at the OMG meeting last December in Burlingame, CA. The 14 attendees were a healthy mixture of users and vendors, representing Objectivity, Versant, Gemstone, db4Objects, Fujitsu (which used to market Jasmine), Tibco, Progeny, Boeing, TU Munich, Kangwon University (Korea), PJIT, Syracuse Research, and MITRE.

After a welcome and introductions conducted by Char Wales (MITRE), Mike Card (Syracuse Research), calling in from his sickbed in New York, introduced the Next Generation Object Database Standardization effort, providing important historical and technical background including his role in the ODMG.

Prof. K. Subieta (PJIT) then gave a presentation on his Stack-Based Approach to object databases. Anat Ghafni (db4Objects) presented and summarized the high points of the sometimes lively discussions that appeared in the ODBMS Forum in response to the White Paper. These presentations laid an excellent groundwork for the ensuing Roundtable discussions, moderated by Mike Card and Char Wales, which fulfilled the Roundtable’s “Objectives”: a completely open forum, with nothing off limits.

The conclusion of the Roundtable was an agreement to work on a Roadmap for achieving the goal of an adopted Next Generation Object Database Standard with vendor implementations by 2009. Working through teleconferences, the plan is to have an initial version of this Roadmap ready in time to present at the ICOODB 2008 conference in Berlin and at the OMG Technical Committee meeting in Washington, DC, both scheduled for the same week in March 2008. If things proceed well, it is hoped that an RFP will be ready for issuance by June 2008 and, with luck, that initial submissions will be ready for review by the end of this year.

For the benefit of those who have not been part of this “from the beginning”, a recap of a few of the significant events within OMG leading to the Roundtable last December is in order:

- Sep 03: First Object Database Working Group meeting; idea of improving the existing ODMG 3.0 standard introduced.

- Nov 03, Apr 04: “Socialization” of this idea within OMG.

- May 04: Morgan Kaufmann grants OMG the right “to publish, revise, disseminate and use original and revised versions of the Standard as an OMG specification (the “Specification”)”, subject to limitations detailed in a letter to OMG.

- Sep 05: ODBMS.ORG portal launched.

- Dec 05: Decision to expand scope to Object Database Technology (including modeling and mappings between object and relational).

- Feb 06: Object Database Technology Request for Information (RFI) issued.

- Jun 06: Report summarizing 11 RFI responses identified three ways forward.

- Sep 07: Next-Generation Object Database Standardization White Paper issued.

Charlotte W. Wales


Wednesday, December 19, 2007

What Standards for Object Databases?

I thought it would be interesting to give you some insight into the discussion currently going on at ODBMS.ORG. The issue is: what standards for object databases?
Here are two notes, one from William Cook and one from Mike Card.
For more details on the discussion, please visit the Forum at ODBMS.ORG.

A copy of the OMG white paper on Next-Generation Object Database Standardization, written by the OMG's Object Database Technology Working Group, can be downloaded here: Next-Generation Object Database Standardization

Roberto V. Zicari
--------------

Hi everybody.
I'm sorry that I was not able to attend the meeting on Dec 12. I hope that someone can post some information on it. I think it is great that these topics are being discussed, but I also have some significant disagreements with points being made here.

My biggest issue is that I don't agree with the premise of the OMG RFI and Prof. Subieta's response. The premise is that the problem is "the underlying lack of a set of precise definitions and semantics that has plagued ODMSs for years" [mpcard]. The assumption here is that people didn't use object databases because OODBs didn't have a solid theory like relational algebra. I do not believe that was the reason. I think the reason was that (1) most of the original OODB systems didn't support query optimization or transactions, (2) they had difficulty externalizing their data in a way that could be evolved and used by other tools, and (3) when they did introduce query languages, they were subject to the same impedance mismatch as relational systems.

I think that impedance mismatch is a language problem, not a data problem. Relational data maps very well to traditional data structures in C, Pascal, or any other programming language: just create an array of records. Relational data maps fairly well to objects too, especially since you can represent relationships easily. The impedance mismatch comes from the need to partition a program into two parts: a query that is sent to the database, and a client program that uses the query results. Previously this partitioning was done by putting the query into a string, which causes all sorts of problems. Native Queries and LINQ are two more modern and effective ways to partition a program into a query and a client, so that the semantic connections between them are preserved. Prof. Subieta's proposal does not address this problem, as far as I can tell.
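The partitioning problem described above can be illustrated with a small Java sketch. The `Person`, `PersonStore` and `NativeQueryDemo` classes are hypothetical, and `query` uses the standard `java.util.function.Predicate` rather than db4objects' actual Native Query API or LINQ; the only point is that the query becomes ordinary, compiler-checked code instead of a string the compiler cannot see into.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical persistent class.
class Person {
    final String name;
    final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

// Hypothetical store; a real engine could analyze and optimize the predicate,
// but this sketch simply scans every object.
class PersonStore {
    private final List<Person> people = new ArrayList<>();

    void store(Person p) {
        people.add(p);
    }

    List<Person> query(Predicate<Person> match) {
        List<Person> result = new ArrayList<>();
        for (Person p : people) {
            if (match.test(p)) {
                result.add(p);
            }
        }
        return result;
    }
}

class NativeQueryDemo {
    public static void main(String[] args) {
        PersonStore db = new PersonStore();
        db.store(new Person("Ada", 36));
        db.store(new Person("Bob", 25));

        // String style: the compiler sees only text, so a misspelled field
        // such as "p.agee" would surface only when the database parses it.
        String oql = "select p from Person p where p.age > 30";

        // Native style: 'p.age' is type-checked by the Java compiler, and the
        // result is already a List<Person>, with no conversion of query results.
        List<Person> older = db.query(p -> p.age > 30);
        System.out.println(older.size() + " " + older.get(0).name); // prints "1 Ada"
    }
}
```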

As for data models, I think that Entity-Relationship models, UML class diagrams, and Subieta's models are all essentially equivalent. They have the concept of records of attributes connected by relationships. The relational model also has sets of records, but the relationships are not explicit in the data model; they must be specified in each join operation. You can argue over fine points of inheritance and such things, but these are small points compared to the basic similarities of the models. It is not fair to compare any of these models to the network model, which as far as I can tell was a hack on top of the hierarchical data model. It is amusing that hierarchical data models have had a resurgence under the name XML; these are very useful for data transmission but are not a suitable foundation for a database.
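The contrast between explicit relationships and join-time relationships can be sketched in Java as follows. All class names are hypothetical: `Employee` holds a direct reference to its `Department`, while `EmpRow` holds only a foreign-key value that every query must match up again with a join condition (here a naive nested-loop join).

```java
import java.util.ArrayList;
import java.util.List;

// Object / ER style: the relationship is stored as an explicit reference.
class Department {
    final String name;
    Department(String name) { this.name = name; }
}

class Employee {
    final String name;
    final Department dept; // explicit relationship
    Employee(String name, Department dept) { this.name = name; this.dept = dept; }
}

// Relational style: rows carry key values; the relationship lives in the join.
class DeptRow {
    final int id;
    final String name;
    DeptRow(int id, String name) { this.id = id; this.name = name; }
}

class EmpRow {
    final String name;
    final int deptId; // foreign key value only
    EmpRow(String name, int deptId) { this.name = name; this.deptId = deptId; }
}

class RelationshipDemo {
    // Nested-loop join: the condition e.deptId == d.id restates the relationship.
    static List<String> employeeDepartments(List<EmpRow> emps, List<DeptRow> depts) {
        List<String> out = new ArrayList<>();
        for (EmpRow e : emps) {
            for (DeptRow d : depts) {
                if (e.deptId == d.id) {
                    out.add(e.name + "/" + d.name);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Employee ann = new Employee("Ann", new Department("Sales"));
        System.out.println(ann.name + "/" + ann.dept.name); // navigation: prints "Ann/Sales"

        List<DeptRow> depts = List.of(new DeptRow(1, "Sales"));
        List<EmpRow> emps = List.of(new EmpRow("Ann", 1));
        System.out.println(employeeDepartments(emps, depts)); // join: prints "[Ann/Sales]"
    }
}
```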

As for query languages, I don't think that the stack-based query language has anything fundamental to offer over OQL. It is like saying that an HP calculator with postfix notation has a more solid theoretical model than a standard calculator that uses infix. I also want to point out that the core of OQL is not really object-oriented, because it does not deal with methods. It is just a great query language for ER data models. The key point is "entities and relationships", and that is what OQL was designed for and is good at. I do not agree that OQL is inconsistent. Suad pointed out some difficulties with the Java binding, and perhaps there are some other small problems with the way the standard was defined. But rather than fix these small issues, he claimed that the entire system is inconsistent.
See here for an alternative and more balanced view. I think that Prof. Subieta's query syntax is perfectly reasonable as well. But it is not a fundamental advance, as far as I can tell.

NOTE: Native Queries are not proprietary; they were described by one of my students and me in an ECOOP paper and then implemented by db4objects. They have been implemented by others as well, although not in any commercial systems. They are also similar to Microsoft's LINQ in some ways.

So, to summarize: I think that OMG is again trying to solve the wrong problem. I sent in a response to the RFI, and yes, it wasn't what you wanted to hear. But I'm going to keep saying it.

The problem is not a lack of a grand unifying theory. There is plenty of theory to cover ER models, OQL, and other traditional ideas. The discussions you are having don't deal with impedance mismatch, which can happen even with an object-oriented language accessing an object-oriented database using OQL! If you put OQL into a string, then you are going to have impedance, and nothing about the formality of the data model or query language is going to fix it. The real problems are impedance mismatch, good query optimization, solid transaction support, evolution of data, and scalability to multiple servers. These are things that OODB vendors didn't address until it was too late. They thought that objects alone would magically make everything work well. But they don't.

I'm sorry to be so negative about this, but I really think that there is an opportunity to improve the DB/PL interface.

William Cook
Assistant Professor
Department of Computer Sciences
University of Texas at Austin

--------------
Hello Prof. Cook-

You wrote:

"My biggest issue is that I don't agree with the premise of the OMG RFI and Prof. Subieta's response. The premise is that the problem is "the underlying lack of a set of precise definitions and semantics that has plagued ODMSs for years" [mpcard]. The assumption here is that people didn't use object databases because OODBs didn't have a solid theory like relational algebra. I do not believe that was the reason. I think the reason was that (1) most of the original OODB systems didn't support query optimization or transactions, (2) they had difficulty externalizing their data in a way that could be evolved and used by other tools, and (3) when they did introduce query languages, they were subject to the same impedance mismatch as relational systems."

I don't think the RFI itself had a "premise," at least not that I am aware of. Regarding your three reasons why ODBMSs were not widely adopted, I would argue that you could trace all three of these issues to the lack of a good underlying object model and set of definitions and semantics. I cannot see how you think the "impedance mismatch" or DB/PL interface issue will be solved without laying a good theoretical foundation.

"The problem is not a lack of a grand unifying theory. There is plenty of theory to cover ER models, OQL, and other traditional ideas. The discussions you are having don't deal with impedance mismatch, which can happen even with an object-oriented language accessing an object-oriented database using OQL! If you put OQL into a string, then you are going to have impedance, and nothing about the formality of the data model or query language is going to fix it."

Sure, but no one has ever tried to tie object definition/store models all the way up to a query language (QL), defined with an abstract query processor, like Prof. Subieta has (at least as far as I have read). It is true that the formality of the data model won't solve the "impedance mismatch" between a query string and a native PL, but again this falls into the area of further work we have to do. Everyone thinks they have the best way to do this: everyone in ODMG thought their APIs were best and their way was best, and that a formal set of definitions, semantics, and object models was unnecessary because in the end developers just need to write code. That's why ODMG chapter 2 was so weak and why there were so many "holes" in the ODMG specification: we were trying to write something that would cover several existing products without requiring anyone to make significant code changes. Users didn't care about the standard because it did not guarantee application code (or even data) portability, so what did it matter? There was no conformance test suite, so you couldn't even say for sure who was conformant to what.

"The real problems are impedance mismatch, good query optimization, solid transaction support, evolution of data, and scalability to multiple servers. These are things that OODB vendors didn't address until it was too late. They thought that objects alone would magically make everything work well. But they don't."

Yes these are real problems but I would argue that solving them will require a common theoretical foundation from which to build. I guess we'll see if there is consensus on that view or not at next month's ODBTWG telecon.

-Mike Card
Syracuse Research Corporation (SRC)


Monday, December 3, 2007

Tuesday, December 11: OMG Object Database Technology Users and Vendors Roundtable

For those of you interested in standards, here is an event you may want to consider attending:

Object Database Technology Users and Vendors Roundtable
on Tuesday, December 11, 2007, 8:00 am to 12:00 noon, part of the
OMG Technical Meeting in Burlingame, CA

The meeting is sponsored by the OMG Middleware and Related Services Platform Task Force (MARS).

The committee is composed of Michael Card, Char Wales, Anat Ghafni, and Kazimierz Subieta.

Here is the objective of the meeting, as written by the above committee:

"Gather together vendors in and users of Object Database Technology in order to learn and share their opinions on the work performed so far by the OMG Object Database Technology Working Group (ODBTWG).

Working from the responses to the Request for Information (RFI) issued in February 2006, the WG has been investigating the research done by Prof. Kazimierz Subieta of the Polish Japanese Institute for Information Technology (PJIT) in Warsaw, Poland. Prof. Subieta’s team has developed an approach called “Stack-Based Architecture (SBA)” for defining the contents of an object database, the semantics of an abstract stack-based query processor, and its associated query language (SBQL). The WG considers this work to represent the object equivalent of the relational calculus in that it provides a precisely-defined, semantically complete set of definitions of what objects are, how they are stored, and how they can be queried.

Looking ahead, we would like to consider basing any future object database standard on the SBA object model so that the language bindings, query languages, etc. that follow are well-defined, self-consistent, and complete. Doing this would address many of the criticisms leveled at the earlier ODMG standards (e.g., ODMG 3.0).

The objective of this meeting is not only to explain how we think the principles of the SBA could be incorporated into a future object database standard but also to listen to the opinions of object database vendors and users regarding this idea. To that end, nothing will be off limits. Let this be a forum for open discussions on what future object database standards should or should not look like, open-source collaborative projects such as reference implementations or conformance test suites, trends in the object database marketplace, level of user interest in object database technology, etc. "

And here is the agenda:

- 8:00-8:15 Call to Order: Introductions and Agenda (Char Wales)

- 8:15-8:30 Introduction to the Next Generation Object Database Standardization Effort (Mike Card)

- 8:30-9:45 Keynote: “Object database semantics: the stack-based architecture” (Prof. Kazimierz Subieta)

- 9:45-10:00 Break

- 10:00-10:30 ODBMS Forum: Summary of Initial Reactions to White Paper (Anat Ghafni)

- 10:30-11:30 Roundtable: Users and Vendors reactions, comments, discussions (Mike Card, Facilitator)

- 11:30-12:00 Moving Forward: Plan of Action (Char Wales, Facilitator)

If you are interested in attending, you need to register here


Tuesday, November 20, 2007

How the OMG technology process works

A number of people have asked me how the OMG standardization process works.

I have found a PowerPoint presentation that explains the essence of how the OMG technology process works, and it is the official OMG word rather than just my interpretation of it.

Here is the link to the presentation (as a PDF), which does not require an OMG username/password to access: OMG Process

Char Wales explained to me that the work they are doing in the Object Database Technology WG fits into that structure.
