Comments on: LINQ is the best option for a future Java query API

By: Emil Wcislo

Emil Wcislo — Fri, 02 Apr 2010 08:48:41 +0000

SBQL4J (http://code.google.com/p/sbql4j/) is an extension of the Java language similar to LINQ. It allows querying Java objects.
But it has advantages over LINQ in many respects:
1. It’s type-safe at compile time, even more so than LINQ, because the result is a proper Java type instead of the anonymous ‘var’ type returned by LINQ queries.
2. Queried objects can be of ANY Java type, rather than only IEnumerable as in LINQ.
3. SBQL4J has the full expressive power of the SBQL language; many SBQL4J queries cannot be expressed in LINQ (see the executable examples on the project page).
4. It is expressed with clear, precise semantics, without needless, obscure syntactic sugar.
5. According to Wikipedia “Some benchmark on simple use cases tend to show that LINQ to Objects performance has a large overhead compared to normal operation”. This problem doesn’t apply to SBQL4J, because its queries are ultimately translated to pure, fast Java code without any use of reflection.
6. SBQL4J semantics is well-defined, so it allows the use of many unique query optimization techniques (mentioned by Prof. Subieta), which give better results than in any other query language.
7. SBQL is not bound to any data model; it deals with data structures in a more abstract way, so it works perfectly both with the simple object data model of Java and with the more sophisticated model implemented in the ODRA system.

I would like to encourage you to get acquainted with SBQL4J and to rethink promoting LINQ as the standard Java API for object databases.

Emil

By: Timo Westkämper

Timo Westkämper — Mon, 11 Jan 2010 11:09:18 +0000

For an API approach to LINQ for Java, consider using Querydsl: http://source.mysema.com/display/querydsl/Querydsl

By: Kazimierz

Kazimierz — Thu, 30 Oct 2008 12:21:33 +0000

Tegiri Nenashi said...
... Why is object query language optimization a myth? Because the foundation algebra is too complex. ...
I disagree that object query language optimization is a myth. In the SBA/SBQL research we have developed and implemented several optimization methods that are quite powerful:

a) Factoring independent subqueries out of loops implied by non-algebraic operators. See http://www.sbql.pl/phds/PhD%20Jacek%20Plodzien.pdf.
This method is known from SQL in a less general variant. For instance, in the query:

select * from Employee where salary > (select avg(salary) from Employee)

the subquery

select avg(salary) from Employee

can be calculated in advance, to avoid recalculating it within each iteration of the where operator. The method used in the PhD thesis mentioned above cannot be expressed in any algebra; it is based on an analysis of scoping and name binding.
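The factoring idea can be sketched in plain Java. This is only an illustration of the general principle (computing a loop-independent subquery once), not the scoping-and-binding analysis from the thesis; the `Employee` class, the method names, and the `avgCalls` counter are all hypothetical and exist only to make the saved work visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class HoistedSubquery {
    // Hypothetical stand-in for the Employee objects in the query above.
    static final class Employee {
        final String name;
        final double salary;
        Employee(String name, double salary) { this.name = name; this.salary = salary; }
    }

    // Counts how often the inner "subquery" is evaluated.
    static int avgCalls = 0;

    // Plays the role of: select avg(salary) from Employee
    static double avgSalary(List<Employee> emps) {
        avgCalls++;
        double sum = 0;
        for (Employee e : emps) sum += e.salary;
        return emps.isEmpty() ? 0 : sum / emps.size();
    }

    // Naive evaluation: the average is recomputed for every employee tested.
    static List<String> aboveAverageNaive(List<Employee> emps) {
        List<String> out = new ArrayList<>();
        for (Employee e : emps) {
            if (e.salary > avgSalary(emps)) out.add(e.name);
        }
        return out;
    }

    // Factored evaluation: the subquery does not depend on the loop variable,
    // so it is computed once, before the loop starts.
    static List<String> aboveAverageHoisted(List<Employee> emps) {
        double avg = avgSalary(emps);
        List<String> out = new ArrayList<>();
        for (Employee e : emps) {
            if (e.salary > avg) out.add(e.name);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Employee> emps = Arrays.asList(
            new Employee("Ann", 1000), new Employee("Bob", 2000), new Employee("Carol", 6000));
        System.out.println(aboveAverageNaive(emps) + ", average computed " + avgCalls + " time(s)");
        avgCalls = 0;
        System.out.println(aboveAverageHoisted(emps) + ", average computed " + avgCalls + " time(s)");
    }
}
```

Both methods return the same names; the naive one evaluates the average once per employee, the factored one only once.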

b) Exploiting the distributivity property of query operators. In SQL this method is known as pushing selections before joins. For example, the query

select * from Employee, Department
where Employee.D# = Department.D# and Department.dname = "Toys"

can be rewritten to:

select * from Employee, (select * from Department where Department.dname = "Toys") Department
where Employee.D# = Department.D#

We have generalized it considerably for OODBs, again not on the basis of some algebra but on an analysis of scoping and binding rules.
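The effect of pushing the selection before the join can be sketched with nested loops in Java. This is a minimal illustration of the SQL-level rewriting only, not of the generalized OODB method; the `Employee` and `Department` classes, the `deptNo` field (standing in for the D# column), and the `comparisons` counter are assumptions made for the example, and the methods return employee names rather than full joined rows to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SelectionPushdown {
    // Hypothetical stand-ins for the Employee and Department tables.
    static final class Employee {
        final String name;
        final int deptNo;
        Employee(String name, int deptNo) { this.name = name; this.deptNo = deptNo; }
    }

    static final class Department {
        final int deptNo;
        final String dname;
        Department(int deptNo, String dname) { this.deptNo = deptNo; this.dname = dname; }
    }

    // Counts how many (employee, department) pairs the join loop inspects.
    static int comparisons = 0;

    // Original form: every pair is inspected, the selection is tested inside the join.
    static List<String> joinThenSelect(List<Employee> emps, List<Department> depts, String dname) {
        List<String> out = new ArrayList<>();
        for (Employee e : emps) {
            for (Department d : depts) {
                comparisons++;
                if (e.deptNo == d.deptNo && d.dname.equals(dname)) out.add(e.name);
            }
        }
        return out;
    }

    // Rewritten form: departments are filtered first, then joined.
    static List<String> selectThenJoin(List<Employee> emps, List<Department> depts, String dname) {
        List<Department> selected = new ArrayList<>();
        for (Department d : depts) {
            if (d.dname.equals(dname)) selected.add(d);
        }
        List<String> out = new ArrayList<>();
        for (Employee e : emps) {
            for (Department d : selected) {
                comparisons++;
                if (e.deptNo == d.deptNo) out.add(e.name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Employee> emps = Arrays.asList(
            new Employee("Ann", 1), new Employee("Bob", 2),
            new Employee("Carol", 1), new Employee("Dave", 3));
        List<Department> depts = Arrays.asList(
            new Department(1, "Toys"), new Department(2, "Books"), new Department(3, "Tools"));
        System.out.println(joinThenSelect(emps, depts, "Toys") + ", pairs inspected: " + comparisons);
        comparisons = 0;
        System.out.println(selectThenJoin(emps, depts, "Toys") + ", pairs inspected: " + comparisons);
    }
}
```

The two forms return the same employees, but the rewritten one joins against the single selected department instead of all three.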

c) Removing dead subqueries. They mostly appear when views are processed through the query modification technique. Usually a view delivers more than is required in a particular query, hence the unnecessary part can be cut off. This method is also known from SQL, but we have generalized it considerably for object databases. The algorithm is rather complex. So far it has been published only in my book (in Polish): http://www.sbql.pl/various/SBA_SBQL_book/Theory%20and%20Construction%20of%20OOQLs.html

d) Optimization by indices. We can optimize queries by indices organized according to different techniques. This is the subject of a PhD that will be completed soon. Transparent indices are fully implemented in ODRA and work in a way similar to SQL.

e) Optimization by query caching. This is the subject of another PhD; the results will probably be ready in a year.

f) Optimization by pipelining. The method is known from SQL, but we have generalized it for OO databases. It is the subject of another PhD. The method is developed mostly in the context of distributed databases.

g) Methods based on tuning of physical database structures and buffering. The best-known method from this group is pointer swizzling. It is implemented in Objectivity/DB. We implemented it as so-called memory-mapped files.

h) One more PhD concerns a method of optimization in distributed object-oriented databases that is known from relational databases as a method based on semi-joins. We generalized it to a method based on so-called coloured query syntax trees, where "colours" denote different distributed servers.

There are more methods, in particular ones based on choosing an optimal query execution plan. I have at least two more great ideas concerning query optimization in OODBs and am looking for talented people who want to investigate them.

I agree with Tegiri Nenashi that algebraic optimization methods in OODBs are inefficient, so I am not pursuing such ideas. SBA and SBQL have established their own self-contained theoretical school: it does not require object algebras, object calculi, monoid comprehension calculus, F-logic, or other mathematical concepts that people have invented so far to cope with object-oriented queries.

Sorry for this long post, I hope it helps…

By: Kazimierz

Kazimierz — Thu, 30 Oct 2008 10:41:03 +0000

nina said...
No, this kind of tone is characteristic of an incompetent troll. I didn't say that object query optimisation is nonexistent. You have no idea what we are talking about, sorry.

I think we should stop this tone of polemics. We are all incompetent in a lot of matters. Let our discussion partners learn a bit from this discussion.

By: nina

nina — Thu, 30 Oct 2008 08:32:13 +0000

No, this kind of tone is characteristic of an incompetent troll. I didn't say that object query optimisation is nonexistent. You have no idea what we are talking about, sorry.

By: Tegiri Nenashi

Tegiri Nenashi — Thu, 30 Oct 2008 08:15:46 +0000

My apologies for the inappropriate tone of my message. This kind of arrogance is typical of a relational zealot (which, unfortunately, I am :-), especially in discussions about “impedance mismatch”. Therefore, the right action is just not to be here.

A few farewell comments. Nina mentioned that object query language optimization is nonexistent, and let me defend this position. First, there is a strong algebraic foundation for any kind of optimization. In procedural programming, when an optimizer moves a statement outside of a loop, it essentially rewrites an expression in Kleene algebra. When a subquery is unnested in SQL, it is also an algebraic transformation. Likewise, System R style evaluation of the cost of different join orders leverages the join associativity of the relational algebra. Take a look at http://en.wikipedia.org/wiki/Relational_algebra#Use_of_algebraic__properties_for_query_optimization
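To make the join associativity point concrete, here is a small Java sketch. It is only an illustration under simplifying assumptions: relations are lists of integer rows, and the hypothetical `join` helper matches the last column of the left relation against the first column of the right one. It shows that both join orders give the same result while producing intermediate results of different sizes, which is what a cost-based choice of join order exploits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

class JoinOrder {
    // Joins two relations on "last column of left = first column of right".
    // The shared column appears only once in each output row.
    static List<List<Integer>> join(List<List<Integer>> left, List<List<Integer>> right) {
        List<List<Integer>> out = new ArrayList<>();
        for (List<Integer> l : left) {
            for (List<Integer> r : right) {
                if (l.get(l.size() - 1).equals(r.get(0))) {
                    List<Integer> row = new ArrayList<>(l);
                    row.addAll(r.subList(1, r.size()));
                    out.add(row);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Three toy relations R(a, b), S(b, c), T(c, d).
        List<List<Integer>> r = Arrays.asList(
            Arrays.asList(1, 10), Arrays.asList(2, 10), Arrays.asList(3, 20));
        List<List<Integer>> s = Arrays.asList(
            Arrays.asList(10, 100), Arrays.asList(20, 200));
        List<List<Integer>> t = Arrays.asList(Arrays.asList(100, 7));

        List<List<Integer>> rs = join(r, s); // intermediate result of (R join S)
        List<List<Integer>> st = join(s, t); // intermediate result of (S join T)

        List<List<Integer>> leftFirst = join(rs, t);
        List<List<Integer>> rightFirst = join(r, st);

        System.out.println("same result: "
            + new HashSet<>(leftFirst).equals(new HashSet<>(rightFirst)));
        System.out.println("intermediate sizes: " + rs.size() + " vs " + st.size());
    }
}
```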

Why is object query language optimization a myth? Because the foundation algebra is too complex. Sure, someone can write a PhD thesis finding a few query transformations here and there, but the whole system would fall short of the simplicity and clarity of the System R method (which each and every database vendor has copied ever since). Having come across a couple of such theses in the past, I would suggest that nobody except the author understands them, and this is why we don’t see any implementations.

The same applies to SQL, which has grown to monstrous proportions. However, nobody really cares about all this junk (my apologies again) that has accumulated there over the past decades. Most people rarely step beyond the basic select-project-join query, and this one has a firm foundation.

By: Roberto V. Zicari

Roberto V. Zicari — Thu, 30 Oct 2008 00:34:40 +0000

To Tegiri Nenashi:

Out of courtesy to others it would be appropriate if you could

i) identify yourself (give us a little background on who you are)

ii) keep the discussion at a level of courtesy, even if you may not agree on some technical points.

There is no point in being unnecessarily rude.
We are all trying to help find a good solution...

By: Kazimierz

Kazimierz — Wed, 29 Oct 2008 21:36:21 +0000

Tegiri Nenashi said...
Kazimierz's website is kind of remarkable, full with "executable UML" and "data model independence" nonsense. Quote of the day:

"However, the theses that SQL is a syntactic variant of the relational algebra (or the mathematical logic) are worthless. Approximately, the relational algebra covers not more than 5% of the functionality of SQL. The rest is not founded on any theories. "

The 5% figure concerns the SQL-89 standard: take all the syntactic constructs of SQL and work out which of them can be covered by the relational algebra. For SQL-92 the share is probably much lower, because SQL-92 introduces a lot of features that are close to programming languages and obviously not covered by the relational algebra. For SQL-99 it is 0%, because SQL-99 is a full programming language, and the data structures it addresses are no longer flat tables and contain a lot of options fully incompatible with the relational algebra.

By: Kazimierz

Kazimierz — Wed, 29 Oct 2008 20:21:46 +0000

For those who do not know Japanese, Tegiri Nenashi is a joke name. So is Mikito Harakiri.

By: Kazimierz

Kazimierz — Wed, 29 Oct 2008 20:13:43 +0000

Tegiri Nenashi said…
Kazimierz’s website is kind of remarkable, full with “executable UML” and “data model independence” nonsense.
“executable UML”: see Wikipedia:
http://en.wikipedia.org/wiki/Executable_UML
Google reports 92 900 pages that contain “executable UML”.
In the European project VIDE (together with partners such as SAP, Fraunhofer Institute, Softeam) we have implemented executable UML together with another OMG standard known as OCL.

I don’t want to comment on Tegiri Nenashi’s other aggressive statements. I am very sorry that he/she is disappointed by some of my theses. I see no nonsense in them; they are based on more than 30 years of experience in databases and software engineering.