
LINQ is the best option for a future Java query API

by Roberto V. Zicari on August 27, 2008

A conversation with Mike Card.

I have interviewed Mike Card on the latest developments in the OMG working group that aims to define a new standard for object database systems.

Mike works with Syracuse Research Corporation (SRC) and is involved in object databases and their application to challenging problems, including pattern recognition. He chairs the ODBT group in the OMG, which aims to advance object database standardization.

R. Zicari: Mike, you recently chaired an OMG ODBTWG meeting on June 24, 2008. What kind of synergy do you see outside the OMG in relation to your work?

Mike Card: We think it is likely that the OMG would need to participate in the Java Community Process (JCP) in order to write a Java Specification Request (JSR) to add LINQ functionality to Java.

R. Zicari: There has been a lot of discussion lately on the merits of SBQL vs. LINQ as a possible query API standard for object databases. Did you discuss this issue at the meeting?

M. Card: I began the technical part of our meeting by reviewing Professor Subieta’s comparison of SBQL and LINQ. It was my understanding from this comparison that LINQ was technically capable of performing any query that could be performed by SBQL, and I wanted to know if the participants saw this the same way. They agreed in general, and believed that even if LINQ could do only 90% of what SBQL could do in terms of data retrieval, it would still be the way to go.

R. Zicari: Could you please go into a bit more detail on this?

M. Card: Sure. At the meeting it was pointed out that Prof. Subieta had noted in his comparison that he had not shown queries using features that are not a part of LINQ, such as fixed-point arithmetic, numeric ranges, etc.

These are language features that would be familiar to users of Ada but are not found in languages like C++, C#, and Java, so they would likely not be missed and would be considered esoteric.

It was also pointed out that the query examples chosen by Prof. Subieta in his comparison were all “projections” (a relational term for a query or operation whose output table keeps only some of the input table’s columns).

A query like this relies, by definition, on iteration, and this shows off the inherent expressive power of SBQL: its abstract machine contains a stack that can drive the iteration, avoiding the loops, variables, etc. needed by SQL/LINQ.
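Java has no LINQ, but the Streams API (Java 8+) is the closest standard analogue and makes the contrast concrete. The following is a minimal sketch, assuming a hypothetical `Employee` record (records need Java 16+): the same projection written first with an explicit loop and accumulator variable, then declaratively, with the iteration left implicit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class ProjectionSketch {
    // Hypothetical stored object type, for illustration only.
    record Employee(String name, String dept, double salary) {}

    // Projection with an explicit loop and an accumulator variable.
    static List<String> namesByLoop(List<Employee> emps) {
        List<String> names = new ArrayList<>();
        for (Employee e : emps) {
            names.add(e.name());
        }
        return names;
    }

    // The same projection expressed declaratively; the iteration is implicit.
    static List<String> namesByQuery(List<Employee> emps) {
        return emps.stream().map(Employee::name).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Employee> emps = List.of(
                new Employee("Ann", "Toys", 5000),
                new Employee("Bob", "Shoes", 4000));
        System.out.println(namesByLoop(emps));  // [Ann, Bob]
        System.out.println(namesByQuery(emps)); // [Ann, Bob]
    }
}
```

Both methods return the same list; the declarative form is closer in spirit to what a LINQ-style or SBQL-style query lets the programmer write.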

R. Zicari: Did you agree on a common direction for your work in the group?

M. Card: The consensus at this meeting and at the ICOODB conference in Berlin was that LINQ was the best option for a future Java query API, since it already had broad support in the .NET community. We will have to choose a new name for the OMG-Java effort, however, as LINQ is trademarked by Microsoft.

It was also agreed that the query language need not include object update capability, as object updates were generally handled by object method invocations and not from within query expressions.

Now, since LINQ allows method invocations as part of navigation (e.g. “my_object.getBoss().getName()”) it is entirely possible that these method calls could have side effects that update the target objects, perhaps in such a way that the changes would not get saved to the database.

This was recognized as a problem; ideas kicked around for solving it included source code analysis tools.
This is something we will need a good answer for, as it is a potential “open manhole cover” if we intend the LINQ API to be read-only and not capable of updating the database (especially unintentionally!).
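The “open manhole cover” is easy to reproduce in plain Java. In this sketch, which assumes a hypothetical `Employee` class whose `getBoss()` getter quietly updates a counter, a stream pipeline that looks read-only still changes object state; in a persistent setting such a change might or might not be written back, depending on the engine.

```java
import java.util.List;
import java.util.stream.Collectors;

public class SideEffectSketch {
    // Hypothetical persistent class; its getter mutates state, which is the hazard.
    static class Employee {
        private final String name;
        private final Employee boss;
        private int accessCount = 0; // mutated by a "read-only" navigation

        Employee(String name, Employee boss) {
            this.name = name;
            this.boss = boss;
        }

        Employee getBoss() {
            accessCount++; // side effect hidden inside a navigation call
            return boss;
        }

        String getName() { return name; }

        int getAccessCount() { return accessCount; }
    }

    // Looks like a read-only query: project each employee to the boss's name.
    static List<String> bossNames(List<Employee> staff) {
        return staff.stream()
                .filter(e -> e.getBoss() != null)
                .map(e -> e.getBoss().getName())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Employee carol = new Employee("Carol", null);
        Employee ann = new Employee("Ann", carol);
        Employee bob = new Employee("Bob", carol);
        System.out.println(bossNames(List.of(ann, bob))); // [Carol, Carol]
        // The query changed object state: getBoss() ran once in filter and once in map.
        System.out.println(ann.getAccessCount());         // 2
    }
}
```

Nothing in the query syntax reveals the mutation; detecting it requires looking inside the called methods, which is the kind of job source code analysis tools could take on.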

R. Zicari: What else did you address at the meeting?

Mike Card: The discussion then moved on to a list of items included in Carl Rosenberger’s ICOODB presentation.
Other items were also reviewed from an e-mail thread in the ODBMS.ORG forum that included comments from both Prof. Subieta and Prof. William Cook.

The areas discussed were broken down into 3 groups:
i) those things there was consensus on for standardization,
ii) those things that needed more discussion/participation by a larger group, and
iii) those things that there was consensus on for exclusion from standardization.

R. Zicari: What are the areas you agree to standardize?

Mike Card: The areas we agree to standardize are:

1. object lifecycle (in memory): What happens at object creation/deletion, “attached” and “detached” objects, what happens during a database transaction (activation and deactivation), etc. It is desirable that we base our efforts in this area on what has already been done in existing standards for Java such as JDO, JPA, OMG, et al. This interacts with the concurrency control mechanism of the database engine; we may need to refer to Bernstein et al. for serialization theory and CC algorithms.

2. object identification: A participant raised a concern regarding re-use of OIDs: where the OID is implemented as a physical pointer and memory is recycled, an OID can be re-used, which can corrupt some applications. He favored a standard requiring all OIDs to be unique and never re-used.

3. session: what are the definition and semantics of a session?
a. Concurrency control: again, we should refer to Bernstein et al. for proven algorithms and mathematical definitions, in lieu of ACID criteria, for characterizing transaction execution sequences (ACA: Avoidance of Cascading Aborts, ST: Strict, SR: Serializable, RC: Recoverable).
b. Transactions: semantics/behavior and span/scope

4. Object model: what object model (OM) will we base our work upon?

5. Native language APIs: how will we define these? Will they be based on the Java APIs in ODMG 3.0, or will they be different? Will they be interfaces?

6. Conformance test suite: we will need one of these for each OO language we intend to define a standard for. The test suite, however, is not the definition of the standard; the definition must exist in the specification.

7. Error behavior: exception definitions etc.
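The OID concern raised under item 2 can be illustrated with a minimal sketch of logical, never-recycled identifiers. The class and method names here are hypothetical; a real engine would also have to persist the counter across restarts and handle distribution, which this in-memory version ignores.

```java
import java.util.HashMap;
import java.util.Map;

public class OidSketch {
    // Next OID to hand out; it only ever increases, so OIDs are never recycled.
    private long nextOid = 1;
    // Live objects keyed by their logical OID (not by memory address).
    private final Map<Long, Object> live = new HashMap<>();

    synchronized long store(Object obj) {
        long oid = nextOid++;
        live.put(oid, obj);
        return oid;
    }

    synchronized void delete(long oid) {
        // Retire the OID: it leaves the map but is never handed out again.
        live.remove(oid);
    }

    synchronized Object lookup(long oid) {
        // A stale OID resolves to null rather than to some newer object.
        return live.get(oid);
    }

    public static void main(String[] args) {
        OidSketch db = new OidSketch();
        long first = db.store("first");
        db.delete(first);
        long second = db.store("second");
        System.out.println(first != second);  // true
        System.out.println(db.lookup(first)); // null
    }
}
```

With a physical-pointer OID, `second` could have landed at the address `first` used to occupy, and a stale reference would silently resolve to the wrong object.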

R. Zicari: What are the areas where no agreement was (yet) found?

Mike Card: Areas we need to find agreement on are:

1. keys and indices: how do you sort objects? How do you define compound keys or spatial keys? Uniqueness constraints? Can this be handled by annotation, with the annotation being standardized but the implementation being vendor-specific? This interacts with the query mechanism, e.g. availability of an index could be checked for by the query optimizer.

2. referential integrity: do we want to enforce this? This covers avoidance of dangling pointers and interacts with object lifecycle/GC considerations.

3. cascaded delete: when you delete an object, do you also delete all objects that it references? It was pointed out that this has issues for a client/server model ODBMS like Versant because it may have to “push” out to clients that objects on the server have been deleted, so you have a distributed cache consistency problem to solve.

4. replication/synchronization: how much should we standardize the ability to keep a synchronized copy of part or all of an object database? Should the replication mechanism be interoperable with relational databases? Part or all of this capability could be included in an optional portion of the standard.

a. Backup: this is a specialized form of replication; how much should it be standardized? Is the answer to this question dependent upon the kind of environment (DBA or DBA-less/embedded) that the ODBMS is operating in?

5. events/triggers: do we want to standardize certain kinds of activity (callbacks, etc.) when certain database operations occur?

6. update within query facility: this is a recognition of the limitations of LINQ, which does not support object update; it is “read-only.” Generally, object updates and deletes are performed by method invocations in a program and not by query statements.
The question is, since LINQ allows method invocations as part of navigation, e.g. “my_employee_obj.getBoss().getName(),” is it possible in cases like this that such method calls could have side effects which update the object(s) in the navigation statement? If so, what should be done?

7. extents: do we expose APIs for extents to the user?

8. support for C++: how will we support C++ and other legacy languages for which a LINQ-like facility is not available? We could investigate a string-based query language like OQL and/or use a facility similar to Cook/db4o “native queries”.
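For readers unfamiliar with native queries, the idea associated with Prof. Cook and db4o is that a query is an ordinary, type-checked predicate written in the host language rather than a string. The Java sketch below illustrates the concept only; `NativePredicate`, `query`, and `Student` are hypothetical names, not db4o's actual API, and a C++ version would use function objects or lambdas instead.

```java
import java.util.ArrayList;
import java.util.List;

public class NativeQuerySketch {
    // Hypothetical query interface: the query is plain, compiler-checked host code.
    interface NativePredicate<T> {
        boolean match(T candidate);
    }

    // Hypothetical stored class.
    record Student(String name, int age) {}

    // Naive evaluator: tests every candidate in the extent against the predicate.
    static <T> List<T> query(List<T> extent, NativePredicate<T> predicate) {
        List<T> result = new ArrayList<>();
        for (T candidate : extent) {
            if (predicate.match(candidate)) {
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Student> students = List.of(new Student("Ann", 19), new Student("Bob", 23));
        List<Student> young = query(students, s -> s.age() < 20);
        System.out.println(young); // [Student[name=Ann, age=19]]
    }
}
```

The naive evaluator scans every object; the appeal of the approach is that an engine can, in principle, analyze the predicate and run it against indexes instead, while the programmer's code stays ordinary Java.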

R. Zicari: And what are the areas you definitely do not want to standardize?

Mike Card: Areas we do not want to standardize are:

1. garbage collection: the issue here is the behavioral differences between “embedded” (linked-in) OODBMS vs. client/server OODBMS

2. stored procedures/functions/views: these are relational/SQL concepts that are not necessarily applicable to object-oriented programming languages which are the purview of object databases.

R. Zicari: How will you ensure that the vendor community will support this proposal?

Mike Card: We plan to discuss this list and verify that others not present agree with the grouping of these items. We should also figure out what we want to do with the items in the “middle” group and then begin prioritizing these things. It appears likely that a next-generation ODBMS standard will follow a “dual-track” model in that the query mechanism (at least for Java) will be developed as a JSR within the JCP, while all of the other items will be developed within the OMG process.

For C# (assuming C# is a language we will want an ODBMS standard for, and I think it is), the query API will be built into the language via LINQ and we will need to address all of the “other” issues within our OMG effort just as with Java. In the case of C# and Java, most of these issues can probably be dealt with in the same manner.

How much interest there is in a C++ standardization effort is unclear; this is an area we will need to discuss further.
A LINQ-like facility for C++ is not an option since unlike C# and Java there is no central maintenance point for C++ compilers.

There is an ISO WG that maintains the C++ standard, but C++ “culture” accepts non-conformant compilers so there are many C++ compilers out there that only conform to part of the ISO standard.

The developers present who work with C++ mentioned that their C++ code base must be “tweaked” to work with various compilers as a given set of C++ code might compile fine with 7 compilers but fail with the compiler from vendor number 8.
In general, the maintenance of C++ is more difficult than for Java and C# due to inconsistency in compiler implementation and this complicates anything we want to do with something as complex as object persistence.

Some Useful Resources:
- Panel Discussion “ODBMS: Quo Vadis?”

- Java Object Persistence: State of the Union PART II

- Java Object Persistence: State of the Union PART I


  1. Nina-

    I don’t know about your English, but what I meant by a “bridge” was that the abstract store model described by Prof. Subieta is not limited to traditional persisted object stores. SBA is a higher-level abstraction that allows any kind of storage mechanism to be used for objects so long as the abstract store criteria are met. This includes relational tables, so that SBQL could be used to query objects stored in tables just like an ORM. It also can be used to access persisted objects (in Java, POJOs).

    Prof. Subieta’s work is the first I have seen that addresses database semantics for objects with an abstract machine model that goes all the way from storage to query. He has demonstrated it to us where a simple text or XML file was used as an M0 store that could be fully queried in SBQL.

    Because SBQL can access any object regardless of its storage mechanism, it could indeed be used for object queries of both object and relational databases if one so desired.

    -Mike

  2. By the way, if any of you would like to be added to the OMG’s mailing list for this standard work, please email me at mcard@syrres.com and I will get you added.

    Also, please feel free to join us at the next OMG Technical Meeting, which will be held in San Jose CA (see omg.org home page, follow links). We may set up a telecom link so interested parties can attend by voice using Skype or similar tools.

    -Mike

  3. And Jim-

    You are right: properly speaking, LINQ is not an API but a language extension, so I should probably have titled this thread LINQ as a Java query *mechanism* rather than a Java query API.

    -Mike

  4. Thanks, Mike, for your very clear and helpful discussion of the issues at the ODTWG. I think you picked exactly the right goal: extending Java with a standard query interface that does not require application programmers to put queries into strings. This interface should not be tied to any particular back-end query language or database standard. Each vendor can implement their own internal query format, and we have no need for any “war” at all. I am working on ways to improve LINQ, and having an interface for bulk updates is one such topic. Are you interested in trying to push for this kind of innovation?

    I think that this would meet Nina’s goals too, since she says that LINQ+SBQL is her desired solution. A JavaLINQ can be designed to enable multiple back ends, just as C# LINQ does.
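Prof. Cook's point that a query interface need not be tied to one back end can be sketched in Java by representing the query as a typed object that different back ends interpret. Everything here is hypothetical: `FieldEquals` is an illustrative query node, one back end evaluates it over in-memory objects, and another translates it into an OQL-like string whose syntax is only indicative.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class BackendSketch {
    // Hypothetical query node "field == value", kept as data rather than as a string.
    record FieldEquals<T, V>(String fieldName, Function<T, V> field, V value) {}

    // Hypothetical stored class.
    record Book(String title, String author) {}

    // Back end 1: evaluate the query object directly against in-memory objects.
    static <T, V> List<T> evaluate(List<T> extent, FieldEquals<T, V> q) {
        return extent.stream()
                .filter(x -> q.field().apply(x).equals(q.value()))
                .collect(Collectors.toList());
    }

    // Back end 2: translate the same query object into a vendor's own query text.
    static <T, V> String translate(String extentName, FieldEquals<T, V> q) {
        return "select x from " + extentName + " x where x." + q.fieldName()
                + " = '" + q.value() + "'";
    }

    public static void main(String[] args) {
        FieldEquals<Book, String> byAuthor = new FieldEquals<>("author", Book::author, "Cook");
        List<Book> books = List.of(new Book("A", "Cook"), new Book("B", "Card"));
        System.out.println(evaluate(books, byAuthor));    // [Book[title=A, author=Cook]]
        System.out.println(translate("Books", byAuthor)); // select x from Books x where x.author = 'Cook'
    }
}
```

The `fieldName` string duplicates the accessor because Java lambdas cannot be inspected at run time; C# LINQ avoids this by compiling lambdas into expression trees that a query provider can walk.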

  5. Prof. Cook-

    Can you attend the December ODBTWG meeting in Santa Clara? (I mistakenly typed San Jose above). That would be an excellent forum for you to present this work. I have not read anything on update capability in LINQ. I think it would be appealing if it did not break the existing read syntax for LINQ.

    -Mike

  6. Tegiri Nenashi said…
    http://vadimtropashko.wordpress.com/object-relational-impedance-mismatch/

    IMO, the author presents totally wrong and misleading perception of the impedance mismatch problem. It is based on some mathematical divagations, fully irrelevant in this context. If you are interested in true explanation, see http://www.sbql.pl/Topics/ImpedanceMismatch.html

  7. Dear Tegiri,
    Do you think a functional/relational language can be cleanly embedded in or invoked from a procedural language? I think so. More and more people are realizing that programming is best done by a combination of specialized languages. Different languages are good for UIs, security models, data models, queries, makefiles, grammars, analytics, workflow, etc. Queries can be defined and optimized in a monoid setting, then converted to more rigid sequences for procedural processing. The goal is to make a clean embedding, not one based on command strings. I think we are finally getting closer to this goal.

    I think that Prof. Subieta’s page gives a much more realistic and useful discussion of impedance mismatch. I disagree with some of his conclusions, for example in the section “Impedance mismatch and native queries”, but his description of the problem is insightful.

    My reading is that his conclusion, like yours, is that we should have one integrated language for everything. This is a nice goal, but just as PL people don’t tend to appreciate DB issues, DB people don’t tend to appreciate all the different requirements on PLs. I think that PL and DB work should be strongly connected, but not require one global language.

  8. Mike, Thanks for the invite. I have been to a previous OMG meeting. Currently we are working on the problem of prefetching, or structuring query results, that can apply to LINQ. We haven’t finished an update model yet, but we are thinking about it. I’ll contact you off line.

  9. To: Mike- all
    in fact, it would be quite good if this discussion could be useful for your work within OMG.

    This would then mean that the community behind ODBMS.ORG can work together.

  10. I published two more papers on ODBMS.ORG that are relevant to this discussion:

    -Michael Blaha, Bill Huth and Peter Cheung, “Object-Oriented Design of Database Stored Procedures”
    Link: http://www.odbms.org/experts.html#article20

    and

    -Miguel Garcia and Rakesh Prithiviraj,
    “Rethinking the Architecture of OR Mapping for EMF in terms of LINQ”
    Link: http://www.odbms.org/downloads.html#oop_ap

  11. Kazimierz’s website is kind of remarkable, full with “executable UML” and “data model independence” nonsense. Quote of the day:

    “However, the theses that SQL is a syntactic variant of the relational algebra (or the mathematical logic) are worthless. Approximately, the relational algebra covers not more than 5% of the functionality of SQL. The rest is not founded on any theories. “

  12. Tegiri Nenashi said…
    Kazimierz’s website is kind of remarkable, full with “executable UML” and “data model independence” nonsense.

    “executable UML”: see Wikipedia:
    http://en.wikipedia.org/wiki/Executable_UML
    Google reports 92 900 pages that contain “executable UML”.
    In the European project VIDE (together with partners such as SAP, Fraunhofer Institute, Softeam) we have implemented executable UML together with another OMG standard known as OCL.

    I don’t want to comment on Tegiri Nenashi’s other aggressive statements. I am very sorry that he/she is disappointed by some of my theses. I see no nonsense within them; they are based on more than 30 years of experience in databases and software engineering.

  13. For those who do not know Japanese, Tegiri Nenashi is a joke name. So is Mikito Harakiri.

  14. Tegiri Nenashi said…
    Kazimierz’s website is kind of remarkable, full with “executable UML” and “data model independence” nonsense. Quote of the day:

    “However, the theses that SQL is a syntactic variant of the relational algebra (or the mathematical logic) are worthless. Approximately, the relational algebra covers not more than 5% of the functionality of SQL. The rest is not founded on any theories. ”

    5% concerns the SQL-89 standard, if you take all syntactic constructs of SQL and try to determine which of them can be covered by the relational algebra. In the case of SQL-92 this is probably much less, because SQL-92 introduces a lot of features that are close to programming languages, obviously not covered by the relational algebra. In the case of SQL-99 this is 0%, because SQL-99 is a full programming language, and the data structures it addresses are no longer flat tables and contain a lot of options fully incompatible with the relational algebra.

  15. To Tegiri Nenashi :

    Out of courtesy to others it would be appropriate if you could

    i) identify yourself (give us a little background on who you are)

    ii) keep the discussion to a level of courtesy, even if you may not agree on some technical points.

    There is no point in being unnecessarily rude.
    We are all trying to help finding a good solution…

  16. My apologies for inappropriate tone of the message. This kind of arrogance is typical for a relational zealot (who unfortunately I am:-), especially in discussion about “impedance mismatch”. Therefore, the right action is just not to be here.

    Few farewell comments. Nina mentioned that object query language optimization is nonexistent, and let me defend this position. First, there is strong algebraic foundation for any kind of optimization. In procedural programming, when optimizer moves a statement outside of the loop, it essentially rewrites an expression in Kleene algebra. When a subquery is unnested in SQL it is also an algebraic transformation. Likewise, System R style evaluation of the cost of different join orders leverages join associativity of the relational algebra. Take a look at http://en.wikipedia.org/wiki/Relational_algebra#Use_of_algebraic__properties_for_query_optimization

    Why is object query language optimization a myth? Because the foundation algebra is too complex. Sure, someone can write a PhD thesis finding a few query transformations here and there, but the whole system would fall short of the simplicity and clarity of the System R method (which each and every database vendor has copied ever since). Having come across a couple of such theses in the past, I would suggest that nobody except the author understands them, and this is why we don’t see any implementations.

    The same applies to SQL, which had grown to monstrous proportions. However, nobody really cares about all this junk (my apologies again) that accumulated there in the past decades. Most people rarely step beyond basic select-project-join query — and this one has firm foundation.

  17. No, this kind of tone is characteristic of an incompetent troll. I didn’t say that object query optimisation is nonexistent. You have no idea what we are talking about, sorry.

  18. nina said…
    No, this kind of tone is characteristic of an uncompetent troll. I didnt say that object query optimisation is nonexistent. You have no idea what we are talking about, sorry.

    I think we should stop this tone of polemics. We are all incompetent concerning a lot of matters. Let our discussion partners learn a bit within this discussion.

  19. Tegiri Nenashi said…
    … Why object query language optimization is a myth? Because the foundation algebra is too complex….

    I disagree that object query language optimization is a myth. In the SBA/SBQL research we have developed and implemented several optimization methods that are quite powerful:

    a) factoring independent subqueries out of loops implied by non-algebraic operators. See http://www.sbql.pl/phds/PhD%20Jacek%20Plodzien.pdf.
    This method is known from SQL in a less general variant. For instance, in the query:

    select * from Employee where salary > (select avg(salary) from Employee)

    the subquery

    select avg(salary) from Employee

    can be calculated in advance, to avoid recalculating it within each loop of the where operator. The method that is used in the mentioned PhD cannot be expressed in any algebra; it is based on analysis of scoping and binding names.

    b) Exploiting the distributivity property of query operators. In SQL this method is known as pushing selections before joins. For example, the query

    select * from Employee, Department
    where Employee.D# = Department.D# and Department.dname = 'Toys'

    can be rewritten to:

    select * from Employee, (select * from Department where dname = 'Toys') Department
    where Employee.D# = Department.D#

    We have greatly generalized it for OODB, but again, not on the basis of some algebra, but on analysis of scoping and binding rules.

    c) Removing dead subqueries. They mostly appear during processing of views through the query modification technique. Usually a view delivers more than is required in a particular query, hence the unnecessary part can be cut off. This method is also known from SQL, but we have greatly generalized it for object databases. The algorithm is rather complex. So far it is published only in my book (in Polish) http://www.sbql.pl/various/SBA_SBQL_book/Theory%20and%20Construction%20of%20OOQLs.html

    d) Optimization by indices. We can optimize queries by indices organized according to different techniques. This is the subject of a PhD that will be completed soon. Transparent indices are fully implemented in ODRA and work in a way similar to SQL.

    e) Optimization by query caching. This is the subject of another PhD, the result will be probably ready in a year.

    f) Optimization by pipelining. The method is known from SQL, but we have generalized it for OO databases. It is the subject of another PhD. The method is developed mostly in the context of distributed databases.

    g) Methods based on tuning of physical database structures and buffering. The best-known method from this group is pointer swizzling. It is implemented in Objectivity/DB. We implemented it through so-called memory-mapped files.

    h) One more PhD concerns the method of optimization in distributed object-oriented databases that is known from relational databases as a method based on semi-joins. We generalized it to a method based on so-called coloured query syntax trees, where "colours" denote different distributed servers.

    There are more methods, in particular, based on choosing an optimal query execution plan. I have at least two more great ideas concerning query optimization in OODB and am looking for talented people who want to investigate them.
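The first two optimizations in the list, factoring an independent subquery out of a loop and pushing a selection before a join, have direct analogues in plain Java. The sketch below uses `java.util.stream` on in-memory lists with hypothetical `Employee` and `Department` records mirroring the SQL examples; it only illustrates that each rewrite preserves the result while doing less work, and says nothing about how SBQL's scoping-and-binding analysis decides when a rewrite is legal.

```java
import java.util.List;
import java.util.stream.Collectors;

public class RewriteSketch {
    record Employee(String name, int deptNo, double salary) {}
    record Department(int deptNo, String dname) {}
    record Pair(Employee emp, Department dept) {}

    // Naive: the independent subquery avg(salary) is re-evaluated for every employee.
    static List<Employee> aboveAvgNaive(List<Employee> emps) {
        return emps.stream()
                .filter(e -> e.salary() > emps.stream().mapToDouble(Employee::salary).average().orElse(0))
                .collect(Collectors.toList());
    }

    // Factored: the subquery does not depend on e, so it is computed once, outside the loop.
    static List<Employee> aboveAvgFactored(List<Employee> emps) {
        double avg = emps.stream().mapToDouble(Employee::salary).average().orElse(0);
        return emps.stream().filter(e -> e.salary() > avg).collect(Collectors.toList());
    }

    // Naive join: build every pair, then apply both the join condition and the selection.
    static List<Pair> toysJoinNaive(List<Employee> emps, List<Department> depts) {
        return emps.stream()
                .flatMap(e -> depts.stream().map(d -> new Pair(e, d)))
                .filter(p -> p.emp().deptNo() == p.dept().deptNo() && p.dept().dname().equals("Toys"))
                .collect(Collectors.toList());
    }

    // Pushed-down selection: filter departments first, so the join sees fewer rows.
    static List<Pair> toysJoinPushed(List<Employee> emps, List<Department> depts) {
        List<Department> toys = depts.stream()
                .filter(d -> d.dname().equals("Toys"))
                .collect(Collectors.toList());
        return emps.stream()
                .flatMap(e -> toys.stream().filter(d -> d.deptNo() == e.deptNo()).map(d -> new Pair(e, d)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Employee> emps = List.of(
                new Employee("Ann", 1, 5000), new Employee("Bob", 2, 3000), new Employee("Cy", 1, 4000));
        List<Department> depts = List.of(new Department(1, "Toys"), new Department(2, "Shoes"));
        System.out.println(aboveAvgFactored(emps).equals(aboveAvgNaive(emps)));             // true
        System.out.println(toysJoinPushed(emps, depts).equals(toysJoinNaive(emps, depts))); // true
    }
}
```

Both rewrites are safe here for the same reason the list gives: the factored expression and the pushed-down predicate refer only to names that do not depend on the enclosing loop variable.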

    I agree with Tegiri Nenashi that algebraic optimization methods in OODB are inefficient thus I am not following such ideas. SBA and SBQL have established an own theoretical school that is self-contained – it does not require object algebras, object calculi, monoid comprehensions calculus, F-logic and other mathematical concepts that people invented so far to cope with object-oriented queries.

    Sorry for this long post, I hope it helps…

  20. For an API approach to LINQ for Java consider using Querydsl : http://source.mysema.com/display/querydsl/Querydsl

  21. SBQL4J (http://code.google.com/p/sbql4j/) is extension of Java language similar to LINQ. It allows to query Java objects.
    But it has advantages over LINQ in many respects:
    1. It’s type-safe at compile time, even more than LINQ, because the result is a proper Java type instead of the anonymous ‘var’ type returned by LINQ queries.
    2. Queried objects can be ANY Java type, instead of IEnumerable as in LINQ.
    3. SBQL4J has full expression power of SBQL language, many SBQL4J queries cannot be expressed in LINQ (see executable examples on project page)
    4. It is expressed by clear, precise semantics without needless, obscure syntactic sugar.
    5. According to Wikipedia “Some benchmark on simple use cases tend to show that LINQ to Objects performance has a large overhead compared to normal operation”. This problem doesn’t apply to SBQL4J, because its queries are finally translated to pure, fast Java code without any reflection usage.
    6. SBQL4J semantics is well-defined, which allows the use of many unique query optimization techniques (mentioned by Prof. Subieta) that give better results than in any other query language.
    7. SBQL is not bound to any data model; it deals with data structures in a more abstract way, so it works perfectly both with the simple object data model in Java and with the more sophisticated model implemented in the ODRA system.

    I would like to encourage you to get acquainted with SBQL4J and rethink promoting LINQ as the standard Java API for object databases.

    Emil
