standards – ODBMS Industry Watch

On Responsible AI. Interview with Kay Firth-Butterfield,World Economic Forum.

Roberto V. Zicari — Mon, 20 Sep 2021 09:19:12 +0000

“I think that many companies need to understand that their customers are worried about the use of AI and then act accordingly. I believe they should set up ethics advisory boards and then follow the advice or internal teams to advise on what they should do and take that advise.”
–Kay Firth-Butterfield

I have interviewed Kay Firth-Butterfield, Head of Artificial Intelligence and member of the Executive Committee at the World Economic Forum. We talked about Artificial Intelligence (AI) and in particular, we discussed responsible AI, trustworthy AI and AI ethics.

RVZ

Q1. You are the Head of Artificial Intelligence and a member of the Executive Committee at the World Economic Forum. What is your mission at the World Economic Forum?

Kay Firth-Butterfield: We are committed to improving the state of the world.

Q2. Could you summarize for us what are in your opinion the key aspects of the beneficial and challenging technical, economic and social changes arising from the use of AI?

Kay Firth-Butterfield: The potential benefits of AI being used across government, business and society are huge. For example using AI to help find ways of educating the uneducated, giving healthcare to those without it and helping to find solutions to climate change. Both embodied in robots and in our computers it can help keep the elderly in their homes and create adaptive energy plans for air conditioning so that we use less energy and help keep people safe. Apparently some 8800 people died of heat in US last year but only around 450 from hurricanes. Also, it helps with cyber security and corruption. On the other side, we only need to look at the fact that over 190 organisations have created AI principles and the EU is aiming to regulate use of AI and the OHCHR has called for a ban on AI which affects human rights to know that there are serious problems with the way we use the tech, even when we are careful.

Q3. The idea of responsible AI is now mainstream. But why when it comes to operationalizing this in the business, companies are lagging behind?

Kay Firth-Butterfield: I think they are worried about what regulations will come and the R&D which they might lose from entering the market too soon. Also, many companies don’t know enough about the reasons why they need AI. CEOs are not envisaging the future of the company with AI which, if available is often left to a CTO. It is still hard to buy the right AI for you and know whether it is going to work in the way it is intended or leave an organisation with an adverse impact on its brand. Boards often don’t have technologists and so they can help the CEO think through the use of AI for good or ill. Finally, its is hard to find people with the right skills. I think this may be helped by remote working when people don’t have to locate to a country which is reluctant to issue visas.

Q4. What is trustworthy AI?

Kay Firth-Butterfield: The design, development and use of AI tools which do more good for society than they do harm.

Q5. The Forum has developed a board tool kit to help board member on how to operationalize AI ethics. What is it? Do you have any feedback on how useful is it in practice?

Kay Firth-Butterfield: It provides Boards with information which allows they to understand how their role changes when their company uses AI and therefore gives them the tools to develop their governance and other roles to advise on this complex topic. Many Boards have indicated that they have found it useful and it has been downloaded more than 50,000 times.

Q6. Let´s talk about standards for AI. Does it really make sense to standardize an AI system? What is your take on this?

Kay Firth-Butterfield: I have been working with the IEEE on standards for AI since 2015, I am still the Vice-Chair. I think that we need to use all types of governance for AI from norms to regulation depending on risk. Standards provide us with an excellent tool in this regard.

Q7. There are some initiatives for Certification of AI. Who has the authority to define what a certification of AI is about?

Kay Firth-Butterfield: At the moment there are many who are thinking about certification. There is not regulation and no way of being certified to certify! This needs to be done or there will be a proliferation and no-one will be able to understand which is good and which is bad. Governments have a role here, for example Singapore’s work on certifying people to use their Model AI Governance Framework.

Q8. What kind of incentives are necessary in your opinion for helping companies to follow responsible AI practices?

Kay Firth-Butterfield: I think that many companies need to understand that their customers are worried about the use of AI and then act accordingly. I believe they should set up ethics advisory boards and then follow the advice or internal teams to advise on what they should do and take that advise. In our Responsible Use of Technology work we have considered this in detail.

Q9. Do you think that soft government mechanisms would be sufficient to regulate the use of AI or would it be better to have hard government mechanisms?

Kay Firth-Butterfield: both

Q10. Assuming all goes well, what do you think a world with advanced AI would look like?

Kay Firth-Butterfield: I think we have to decide what trade offs of privacy we want to allow for humans to develop harnessing AI. I believe that it should be up to each of us but sadly one person deciding to use surveillance via a doorbell surveills many. I believe that we will work with robots and AI so that we can do our jobs better. Our work on positive futures with AI is designed to help us better answer this question. Report out next month! Meanwhile here is an agenda.

…………………………………………………………

Kay Firth-Butterfield is a lawyer, professor, and author specializing in the intersection of business, policy, artificial intelligence, international relations, and AI ethics.

Since 2017, she has been the Head of Artificial Intelligence and a member of the Executive Committee at the World Economic Forum and is one of the foremost experts in the world on the governance of AI. She is a barrister, former judge and professor, technologist and entrepreneur and vice-Chair of The IEEE Global Initiative for Ethical Considerations in Artificial Intelligence and Autonomous Systems. She was part of the group which met at Asilomar to create the Asilomar AI Ethical Principles, is a member of the Polaris Council for the Government Accountability Office (USA), the Advisory Board for UNESCO International Research Centre on AI and AI4All.

She regularly speaks to international audiences addressing many aspects of the beneficial and challenging technical, economic and social changes arising from the use of AI.

Resources

Related Posts

On Digital Transformation and Ethics. Interview with Eberhard Schnebel. ODBMS Industry Watch. November 23, 2020

On the new Tortoise Global AI Index. Interview with Alexandra Mousavizadeh. ODBMS Industry Watch, April 7, 2021

Follow us on Twitter: @odbmsorg

On big data analytics. Interview with Ajay Anand

Roberto V. Zicari — Wed, 16 Sep 2015 18:08:27 +0000

“Traditional OLAP tools run into problems when trying to deal with massive data sets and high cardinality.”–Ajay Anand

I have interviewed Ajay Anand, VP Product Management and Marketing, Kyvos Insights. Main topic of the interview is big data analytics.

RVZ

Q1. In your opinion, what are the current main challenges in obtaining relevant insights from corporate data, both structured and unstructured, regardless of size and granularity?

Ajay Anand: We focus on making big data accessible to the business user, so he/she can explore it and decide what’s relevant. One of the big inhibitors to the adoption of Hadoop is that it is a complex environment and daunting for a business user to work with. Our customers are looking for self-service analytics on data, regardless of the size or granularity. A business user should be able to explore the data without having to write code, look at different aspects of the data, and follow a train of thought to answer a business question, with instant, interactive response times.

Q2. What is your opinion about using SQL on Hadoop?

Ajay Anand: SQL is not the most efficient or intuitive way to explore your data on Hadoop. While Hive, Impala and others have made SQL queries more efficient, it can still take tens of minutes to get a response when you are combining multiple data sets and dealing with billions of rows.

Q3. Kyvos Insights emerged a couple of months ago from Stealth mode. What is your mission?

Ajay Anand: Our mission is to make big data analytics simple, interactive, enjoyable, massively scalable and affordable. It should not be just the domain of the data scientist. A business user should be able to tap into the wealth of information and use it to make better business decisions or wait for reports to be generated.

Q4. There are many diverse tools for big data analytics available today. How do you position your new company in the already quite full market for big data analytics?

Ajay Anand: While there are a number of big data analytics solutions available in the market, most customers we have talked to still had significant pain points. For example, a number of them are Tableau and Excel users. But when they try to connect these tools to large data sets on Hadoop, there is a significant performance impact. We eliminate that performance bottleneck, so that users can continue to use their visualization tool of choice, but now with response time in seconds.

Q5. You offer “cubes on Hadoop.” Could you please explain what are such cubes and what are the useful for?

Ajay Anand: OLAP cubes are not a new concept. In most enterprises, OLAP tools are the preferred way to do fast, interactive analytics.
However, traditional OLAP tools run into problems when trying to deal with massive data sets and high cardinality.
That is where Kyvos comes in. With our “cubes on Hadoop” technology, we can build linearly scalable, multi-dimensional OLAP cubes and store them in a distributed manner on multiple servers in the Hadoop cluster. We have built cubes with hundreds of billions of rows, including dimensions with over 300 million cardinality. Think of a cube where you can include every person in the U.S., and drill down to the granularity of an individual. Once the cube is built, now you can query it with instant response time, either from our front end or from traditional tools such as Excel, Tableau and others.

Q6. How do you convert raw data into insights?

Ajay Anand: We can deal with all kinds of data that has been loaded on Hadoop. Users can browse this data, look at different data sets, combine them and process them with a simple drag and drop interface, with no coding required. They can specify the dimensions and measures they are interested in exploring, and we create Hadoop jobs to process the data and build cubes. Now they can interactively explore the data and get the business insights they are looking for.

Q7. A good analytical process can result in poor results if the data is bad. How do you ensure the quality of data?

Ajay Anand: We provide a simple interface to view your data on Hadoop, decide the rules for dropping bad data, set filters to process the data, combine it with lookup tables and do ETL processing to ensure that the data fits within your parameters of quality. All of this is done without having to write code or SQL queries on Hadoop.

Q8. How do you ensure that the insights you obtained with your tool are relevant?

Ajay Anand: The relevance of the insights really depends on your use case. Hadoop is a flexible and cost-effective environment, so you are not bound by the constraints of an expensive data warehouse where any change is strictly controlled. Here you have the flexibility to change your view, bring in different dimensions and measures and build cubes as you see fit to get the insights you need.

Q9. Why do technical and/or business users want to develop multi-dimensional data models from big data, work with those models interactively in Hadoop, and use slice-and-dice methods? Could you give us some concrete examples?

Ajay Anand: An example of a customer that is using us in production to get insights on customer behavior for marketing campaigns is a media and entertainment company addressing the Latino market. Before using big data, they used to rely on surveys and customer diaries to track viewing behavior. Now they can analyze empirical viewing data from more than 20 million customers, combine it with demographic information, transactional information, geographic information and many other dimensions. Once all of this data has been built into the cube, they can look at different aspects of their customer base with instant response times, and their advertisers can use this to focus marketing campaigns in a much more efficient and targeted manner, and measure the ROI.

Q10. Could you share with us some performance numbers for Kyvos Insights?

Ajay Anand: We are constantly testing our product with increasing data volumes (over 50 TB in one use case) and high cardinality. One telecommunications customer is testing with subscriber information that is expected to grow to several trillion rows of data. We are also testing with industry standard benchmarks such as TPC-DS and the Star Schema Benchmark. We find that we are getting response times of under two seconds for queries where Impala and Hive take multiple minutes.

Q11. Anything else you wish to add?

Ajay Anand: As big data adoption enters the mainstream, we are finding that customers are demanding that analytics in this environment be simple, responsive and interactive. It must be usable by a business person who is looking for insights to aid his/her decisions without having to wait for hours for a report to run, or be dependent on an expert who can write map-reduce jobs or Hive queries. We are moving to a truly democratized environment for big data analytics, and that’s where we have focused our efforts with Kyvos.

———-
Ajay Anand is vice president of products and marketing at Kyvos Insights, delivering multi-dimensional OLAP solutions that run natively on Hadoop. Ajay has more than 20 years of experience in marketing, product management and development in the areas of big data analytics, storage and high availability clustered systems.

Prior to Kyvos Insights, he was founder and vice president of products at Datameer, delivering the first commercial analytics product on Hadoop. Before that he was director of product management at Yahoo, driving adoption of the Hadoop based data analytics infrastructure across all Yahoo properties. Previously, Ajay was director of product management and marketing for SGI’s Storage Division. Ajay has also held a number of marketing and product management roles at Sun, managing teams and products in the areas of high availability clustered systems, systems management and middleware.

Ajay earned an M.B.A. and an M.S. in computer engineering from the University of Texas at Austin, and a BSEE from the Indian Institute of Technology.

Resources

–Announcing the public review of the TPCx-V benchmark. BY Reza Taheri, Principal Engineer at VMware.ODBMs.org

–Thirst for Advanced Analytics Driving Increased Need for Collective Intelligence By John K. Thompson – General Manager, Advanced Analytics, Dell Software. ODBMS.org

–Evolving Analytics by Carlos Andre Reis Pinheiro, Data Scientist, Teradata. ODBMS.org

–From Classical Analytics to Big Data Analytics by Peter Weidl, IT-Architect, Zürcher Kantonalbank. ODBMS.org

Follow ODBMS.org on Twitter: @odbmsorg

On Versant`s technology. Interview with Vishal Bagga.

Roberto V. Zicari — Wed, 17 Aug 2011 13:53:03 +0000

“We believe that data only becomes useful once it becomes structured.” — Vishal Bagga

There is a lot of discussion on NoSQL databases nowdays. But what about object databases?
I asked a few questions to Vishal Bagga, Senior Product Manager at Versant.

RVZ

Q1. How has Versant’s technology evolved over the past three years?

Vishal Bagga: Versant is a customer driven company. We work closely with our customers trying to understand how we can evolve our technology to meet their challenges – whether it’s regarding complexity, data size or demanding workloads.

In the last 3 years we have seen 2 very clear trends from our interaction with our new and existing customers – growing data sizes and increasingly parallel workloads. This is very much in-line with what the general database market is seeing. In addition there was request for simplified database management and monitoring.

Our state of the art Versant Object Database 8 released last year was designed for exactly these scenarios. We have added increased scalability and performance on multi-core architectures, faster and better defragmentation tools, Eclipse based management and monitoring tools to name a few. We are also re-architecting our database server technology to automatically scale when possible without manual DBA intervention and allow online tuning (reconfigure the database instance online without impacting applications).

Q2. On December 1, 2008 Versant acquired the assets of the database software business of Servo Software, Inc. (formerly db4objects, Inc.). What happened to db4objects since then? How does db4objects fit into Versant technology strategy?

Vishal Bagga: The db4o community is doing well and is an integral part of Versant. In fact, when we first acquired db4o at the end of 2008, there were just short of 50,000 registered members.
Today, the db4o community boasts nearly 110,000 members having more than doubled in size in the last 2+ years.
In addition, db4o has had 2 major releases with some significant advances in enterprise type features allowing things like online defragmentation support. In our latest major release, we announced a new data replication capability between db4o and the large scale enterprise class Versant database.
Versant sees a great need in the mobile markets for technology like db4o which can play well in the lightweight handheld, mobile computing and machine-to-machine space while leveraging big data aggregation servers like Versant which can handle the huge number of events coming off of these intelligent edge devices.
In the coming year, even greater synergies are being developed and our communities are merging into one single group dedicated to next generation NoSQL 2.0 technology development.

Q3. Versant database and NoSQL databases: what are the similarities and what are the differences?

Vishal Bagga: The Not Only SQL databases are essentially systems that have evolved out of a certain business need – The need was essentially to have horizontally scalable systems running on commodity hardware with a simple “soft-schema” model for example social networking, offline data crunching, distributed logging system, event processing systems etc.

Relational databases were considered to be too slow, expensive and difficult to manage and administrate, expensive and difficult to adapt to quick changing models.

If I look at similarities between Versant and NoSQL, I would say that:

Both systems have designed around the inefficiency of JOINs. This is the biggest problem with relational databases. If you think about it, in most operational systems relations don’t change e.g. Blog:Article, Order:OrderItem, so why recalculate those relations each time they are accessed using a methodology which gets slower and slower as the amount of data gets larger. JOINs have a use case, but for some 20%, not 100% of the use cases.

Both systems leverage an architectural shift to a “soft-schema” which allows scale-out capability – the ability to partition information across many physical nodes and treat those nodes as 1 ubiquitous database.

When it comes to differences:

The biggest in my opinion is the complexity of the data. Versant allows to you to model very complicated data models seamlessly with ease whereas doing so with a NoSQL solution would be much more effort and you would need to write a lot of code in the application to represent the data model.
In this respect, Versant prefers to use the term “soft-schema” –vs- the term “schemaless”, terms which are often interchanged in discussion.
We believe that data only becomes useful once it becomes structured, in fact that is the whole point of technologies like Hadoop, to churn unstructured data looking for a way to structure it into something useful.
NoSQL technologies that bill themselves as “schema-less” are in denial of the fact that they are leaving the application developer the burden of defining the structure and mapping the data into that structure in the language of the application space. In many ways, it is the mapping problem all over again. Plus, that kind of data management is very hard to change it over time, leading to a brittle solution difficult to optimize for more than 1 use case. The use of “soft-schema” lends itself to a more enterprise manageable and extensible system where the database still retains important elements of structure, while still being able to store and manipulate unstructured types.

Another is the difference in the consistency model. Versant is ACID centric and Versant’s customers depend on this for their mission critical systems – it would be nearly impossible for these systems to use NoSQL given the relaxed constraints. Versant can do a CAP mode, but that is not our only mode of operation. You use it where it is really needed; you are not forced into using it unilaterally.

NoSQL systems make you store your data in a way that you can lookup efficiently by a key. But what if want to lookup something differently; it is likely to be terribly inefficient. This may be okay for the design but a lot of people do not realize that this is a big change in mindset. Versant offers a more balanced approach where you can navigate between related objects using references; you can for example define a root object and then navigate your tree from that object. At the same time you can run ad-hoc queries whenever you want to.

Q4. Big Data: Can Versant database be useful when dealing with petabytes of user data? How?

Vishal Bagga: I don’t see why not. Versant was always designed to work on a network of databases from the very start. Dealing with a Petabyte is really about designing a system with the right architecture. Versant has that architecture just as intact as anyone in the database space saying they can handle a Petabyte. Make no mistake, no matter how you do it, it is a non-trivial task. Today, our largest customer databases are in the 100’s of terrabyte range, so getting to a Petabyte is really a matter of needing that much data.

Q5. Hadoop is designed to process large batches of data quickly. Do you plan to use Hadoop and leverage components of the Hadoop ecosystem like HBase, Pig, and Hive?

Vishal Bagga: Yes, and some of our customers already do that today. A question for you: “Why are those layers in existence?” I would say the answer is that most of these early NoSQL 1.0 technologies do not handle real world complexity in information models. So, these layers are built to try and compensate for that fact. That is the exact point where Versant’s NoSQL 2.0 technology fits into the picture, we help people deal with complexity of information models, something that 1st generation NoSQL has not managed to accomplish.

Q6. Do you think that projects such as JSON (JavaScript Object Notation) and MessagePack (binary-based efficient object serialization library ) play a role in the odbms market?

Vishal Bagga: Absolutely. We believe in open standards. Fortunately, you can store any type in an ODBMS. These specific libraries are particularly important for current most popular client frameworks like Ajax. Finding ways to deliver a soft-schema into a client friendly format is essential to help ease the development burden.

Q7. Looking at three elements: Data, Platform, Analysis, where is Versant heading up?

Vishal Bagga: It is a difficult question as database and data management is increasingly a cross cutting concern. It used to be perfectly fine to keep your Analysis as part of your off-line OLAP systems, but these days there is an increasing push to get Analytics to the real time business.
So, you play with Data, you play with Analytics whether you do it directly or in concert with other technologies through partnership. Certainly, as Versant embraces Platform as a Service, we will do so through eco system partners who are paving the way with new development and deployment methodologies.

– Benchmarking ORM tools and Object Databases. (March 14, 2011)

– Robert Greene on “New and Old Data stores” . (December 2, 2010)

– Object Database Technologies and Data Management in the Cloud. (September 27, 2010)

ODBMS.ORG Useful Links

Roberto V. Zicari — Mon, 09 Feb 2009 04:14:00 +0000

Since we started up in September 2005, ODBMS.ORG has grown quite a bit. A lot of free resources have been added in the course of the years.

I thought it could be useful to give you a few links to easy your search for useful resources….

Here we are:

If you are interested in Lecture Notes:
Object Databases – Lecture Notes

OO Programming – Lecture Notes

Database in General Lecture notes

If you are interested in testing some vendors software and/or download some free software:
Object Databases – Free Software

OO Programming – Free Software

If you are interested in standards, and in the Object Data Management Group -Past Resources in particular:
Object Data Management Group -Past Resources (ODMG Version 1-3)

If you would like to read user reports on how persistent objects are handled in various domains.

If you are interested in dedicated articles from ODBMS.ORG’s Panel of Experts

And plenty more of Articles and Papers on Object Databases

If you are looking to know more about Commercial and Open Source Object Database Vendors

Last but least if you are looking for books

Hope it helps….

RVZ

OMG ODBTWG next steps

Roberto V. Zicari — Tue, 16 Dec 2008 08:23:00 +0000

This is a short note related to the OMG ODBTWG meeting, on December 9, 2008.

During the meeting there was a consensus that the OMG’s Semantic Meta Object Facility (“semantic MOF” or “S-MOF”) would be a good place to start for the object model in the Object Database Standard RFP.

Mike Card is planning to publish a rough draft of an OMG RFP for the new database standard in advance of the March 2009 OMG meeting in Washington DC.

RFP stands for Request for Proposals; the OMG technology adoptions revolve around the RFP.
More info on the OMG Technology Adoption Process.

OMG is hosting an Object Database Standard Definition Scope meeting in Santa Clara

Roberto V. Zicari — Fri, 05 Dec 2008 06:18:00 +0000

I have received a note from Mike Card that I would like to share with you.

“The OMG is hosting an Object Database Standard Definition Scope meeting in Santa Clara, CA at the Hyatt Regency on Tuesday afternoon, December 9th.

The purpose of this meeting will be to define what the scope of the new object database standard should be.

We have already done some work in this area but more remains to be done.
Our goal is to complete the definition of what will and will not be included in the scope of the new standard at this meeting. Once we have defined what will and will not be included, I can begin work on a draft OMG Request For Proposal (RFP).
The RFP is important because this is the mechanism by which the OMG generates standards – an RFP is put out there and a group of vendors who intend to implement the final standard responds to the RFP with a standard.
So, we cannot get the ball rolling until we get the RFP out there, and we are getting close. Once the RFP is put out by the OMG, then the “real work” begins where object database vendors intending to submit and other interested parties begin working together to develop a response to the RFP.
It is this response that will become the successor to ODMG 3.0.

The agenda for this meeting will be as follows:

1300-1310 Welcome and introductory comments (Mike Card)
1310-1330 Review of scoping consensus thus far and db4o comments from last meeting (Mike Card)
1330-1630 Discussion of scope areas to be included or excluded (all participants)
1630-1700 Wrap-up and discussion of next steps (Mike Card)

We got some excellent feedback from db4o at our last meeting on these topics and we would like input from other vendors as well.

We very much hope to see you there! There is a $150 registration fee for this event, to register please visit the registration page

There should be a link there soon to register for this event. Thanks!

Michael P. Card
Syracuse Research Corporation “

For a summary of the work done until now by the OMG on the definition of a new object database standard, pls see my interview to Mike Card

O/R Impedance Mismatch? Users Speak Up! Third Series of User Reports published.

Roberto V. Zicari — Thu, 23 Oct 2008 02:12:00 +0000

I have published the third series of user reports on using technologies for storing and handling persistent objects.
I have defined “users” in a very broad sense, including: CTOs, Technical Directors, Software Architects, Consultants, Developers, and Researchers.

The third series includes 7 new user reports from the following users:

– Peter Train, Architect, Standard Bank Group Limited, South Africa.
– Biren Gandhi, IT Architect and Technical Consultant, IBM Global Business Services, Germany.
– Sven Pecher, Senior Consultant, IBM Global Business Services, Germany.
– Frank Stuch, Managing Consultant, IBM Global Business Services, Germany.
– Hiroshi Miyazaki, Software Architect, Fujitsu, Japan.
– Robert Huber, Managing Director, 7r gmbh, Switzerland.
– Thomas Amberg, Software Engineer, Oberon microsystems, Switzerland.

I asked each users a number of equal questions, among them what experience do they have in using the various options available for persistence for new projects and what are the lessons learned in using such solution(s).

“Some of our newer systems have been developed in-house using an object oriented paradigm. Most (if not all) of these use Relational Database systems to store data and the “impedance mismatch” problem does apply” says Peter Train from Standard Bank.

The lessons learned using Object Relational mapping tools confirm the complexity of such technologies.

Peter Train explains: “The most common problems that we have experienced with object Relational mapping tools are:
i) The effort required to define mappings between the object and the relational models; ii) Difficulty in understanding how the mapping will be implemented at runtime and how this might impact performance and memory utilization. In some cases, a great deal of effort is spent tweaking configurations to achieve satisfactory performance.”

Frank Stuch from IBM Global Business Services has used Hibernate, EJB 2 and EJB 3 Entity Beans in several projects.
Talking about his experience with such tools he says: “EJB 2 is too heavy weight and outdated by EJB 3. EJB 3 is not supported well by development environments like Rational Application Developer and not mature enough. In general all of these solutions give the developer 90% of the comfort of an OODBMS with well established RDBMS.
The problem is that this comfort needs a good understanding of the impedance mismatch and the consequences on performance (e.g. “select n+1 problem”). Many junior developers don’t understand the impact and therefore the performance of the generated/created data queries are often very poor. Senior developers can work very efficient with e.g. Hibernate. “

In some special cases custom solutions have been built, like in the case of Thomas Amberg who works in mobile and embedded software and explains “We use a custom object persistence solution based on sequential serialized update operations appended to a binary file”.

The new 7 reports and the complete series of user reports are available for free download.

I plan to continue to publish users reports on a regular base.

LINQ: the best option for a future Java query API?

Roberto V. Zicari — Tue, 07 Oct 2008 04:49:00 +0000

My interview to Mike Card has triggered an intense discussion (still ongoing), on the pros and cons of considering LINQ as the best option for a future Java query API.

There is a consensus that a common query mechanism for odbms is needed.

However, there is quite a disagreement on how this should be done. In particular, some see LINQ as a solution, provided that LINQ is also available for Java. Others on the contrary do not like LINQ, but would rather prefer a vendor neutral solution, for example based on SBQL.

You can follow the discussion here.

I have listed here some useful resources I published in ODBMS.ORG – related to this discussion:

Erik Meijer, José Blakeley
The Microsoft perspective on ORM
An Interview in ACM Queue Magazine with Erik Meijer and José Blakeley. With LINQ (language-integrated query) and the Entity Framework, Microsoft divided its traditional ORM technology into two parts: one part that handles querying (LINQ) and one part that handles mapping (Entity Framework).| September 2008 |

Panel Discussion “ODBMS: Quo Vadis?
Panel discussion with Mike Card, Jim Paterson, and Kazimierz Subieta, on their views on on some critical questions related to Object Databases: Where are Object Database Systems going? Are Relational database systems becoming Object Databases?
Do we need a standard for Object Databases? Why ODMG did not succeed?

Java Object Persistence: State of the Union PART II
Panel discussion with Jose Blakeley (Microsoft), Rick Cattell (Sun Microsystems), William Cook (University of Texas at Austin), Robert Greene (Versant), and Alan Santos (Progress). The panel addressed the ever open issue of the impedance mismatch.

Java Object Persistence: State of the Union PART I
Panel discussion with Mike Keith: EJB co-spec lead, main architect of Oracle Toplink ORM, Ted Neward: Independent consultant, often blogging on ORM and persistence topics, Carl Rosenberger: lead architect of db4objects, open source embeddable object database. Craig Russell: Spec lead of Java Data Objects (JDO) JSR, architect of entity bean engine in Sun’s appservers prior to Glassfish, on their views on the current State of the Union of object persistence with respect to Java.

Stack-Based Approach (SBA) and Stack-Based Query Language (SBQL)
Kazimierz Subieta, Polish-Japanese Institute of Information Technology
Introduction to object-oriented concepts in programming languages and databases, SBA and SBQL

The Object-Relational Impedance Mismatch
Scott Ambler, IBM. Scott explores the technical and the cultural impedance mismatch between the relational and the object world.

ORM Smackdown – Transcript
Ted Neward, Oren “Ayende” Eini. Transcripts of the Panel discussion “ORM Smackdown” on different viewpoints on Object-Relational Mapping (ORM) systems, courtesy of FranklinsNet.

OOPSLA Panel Objects and Databases
William Cook et.al. Transcript of a high ranking panel on objects and databases at the OOPSLA conference 2006, with representatives from BEA, db4objects, GemStone, Microsoft, Progress, Sun, and Versant.

Do you have an impedance mismatch problem? Users speak up! Second series of user reports published.

Roberto V. Zicari — Thu, 04 Sep 2008 05:03:00 +0000

I have started a new series of interviews with users of technologies for storing and handling persistent objects, around the globe.

6 additional user reports (12-17/08) have been published, from the following users:

Ajay Deshpande, Persistent
Horst Braeuner, City of Schwaebisch Hall
Tore Risch, Uppsala University
Michael Blaha, OMT Associates
Stefan Keller, HSR Rapperswil
Mohammed Zaki, Rensselaer

The complete initial series of user reports is available as always for free download.

Here I define “users” in a very broad sense, including: CTOs, Technical Directors, Software Architects, Consultants, Developers, Researchers.

I have asked 5 questions:

Q1. Please explain briefly what are your application domains and your role in the enterprise.

Q2. When the data models used to persistently store data (whether file systems or database management systems) and the data models used to write programs against the data (C++, Smalltalk, Visual Basic, Java, C#) are different, this is referred to as the “impedance mismatch” problem. Do you have an “impedance mismatch” problem?

Q3. What solution(s) do you use for storing and managing persistence objects? What experience do you have in using the various options available for persistence for new projects? What are the lessons learned in using such solution(s)?

Q4. Do you believe that Object Database systems are a suitable solution to the “object persistence” problem? If yes why? If not, why?

Q5. What would you wish as new research/development in the area of Object Persistence in the next 12-24 months?

More information here.

LINQ is the best option for a future Java query API

Roberto V. Zicari — Wed, 27 Aug 2008 12:26:00 +0000

A conversation with Mike Card.

I have interviewed Mike Card on the latest development of the OMG working group which aims at defining a new standards for Object Database Systems.

Mike works with Syracuse Research Corporation (SRC) and is involved in object databases and their application to challenging problems, including pattern recognition. He chairs the ODBT group in OMG to advance object database standardization.

R. Zicari: Mike, you recently chaired an OMG ODBTWG meeting, on June 24, 2008 What kind of synergy do you see outside OMG in relation to your work?

Mike Card: We think it is likely that the OMG would need to participate in the Java Community Process (JCP) in order to write a Java Specification Request (JSR) to add LINQ functionality to Java.

R. Zicari: There has been a lot of discussion lately on the merit of SBQL vs. LINQ as a possible query API standard for object databases . Did you discuss this issue at the meeting?

M. Card: I began the technical part of our meeting by reviewing Professor Subieta’s comparison of SBQL and LINQ. It was my understanding from this comparison that LINQ was technically capable of performing any query that could be performed by SBQL, and I wanted to know if the participants saw this the same way. They agreed in general, and believed that even if LINQ were only able to do 90% of what SBQL could do in terms of data retrieval that it would still be the way to go.

R. Zicari: Could you please go a bit more in detail on this?

M. Card: Sure. At the meeting it was pointed out that Prof. Subieta had noted in his comparison that he had not shown queries using features that are not a part of LINQ, such as fixed-point arithmetic, numeric ranges, etc.

These are language features that would be familiar to users of Ada but which are not found in languages like C++, C#, and Java so they would likely not be missed and would be considered esoteric.

It was also pointed out that the query examples chosen by Prof. Subieta in his comparison were all “projections” (relational term meaning a query or operation that produces as its output table a subset of the input table, usually containing only some of the input table’s columns).

A query like this by definition will rely on iteration, and this will show the inherent expressive power of SBQL since the abstract machine contains a stack that can be used to do the iteration processing and thus avoid the loops, variables, etc. needed by SQL/LINQ.

R. Zicari: Did you agree on a common direction for your work in the group?

M. Card: The consensus at this meeting and at ICOODB conference in Berlin was that LINQ was the best option for a future Java query API since it already had broad support in the .Net community. We will have to choose a new name for the OMG-Java effort, however, as LINQ is trademarked by Microsoft.

It was also agreed that the query language need not include object update capability, as object updates were generally handled by object method invocations and not from within query expressions.

Now, since LINQ allows method invocations as part of navigation (e.g. “my_object.getBoss().getName()”) it is entirely possible that these method calls could have side effects that update the target objects, perhaps in such a way that the changes would not get saved to the database.

This was recognized as a problem, ideas kicked around for how to solve it included source code analysis tools.
This is something we will need a good answer for as it is a potential “open manhole cover” if we intend the LINQ API to be read-only and not capable of updating the database (especially unintentionally!)

R. Zicari: What else did you address at the meeting?

Mike Card: The discussion then moved on to a list of items included Carl Rosenberger’s ICOODB presentation.
Other items were also reviewed from an e-mail thread in the ODBMS.ORG forumthat included comments from both Prof. Subieta and Prof. William Cook.

The areas discussed were broken down into 3 groups:
i) those things there was consensus on for standardization,
ii) those things that needed more discussion/participation by a larger group, and
iii) those things that there was consensus on for exclusion from standardization.

R. Zicari: What are the areas you agree to standardize?

Mike Card: The areas we agree to standardize are:

1. object lifecycle (in memory): What happens at object creation/deletion, “attached” and “detached” objects, what happens during a database transaction (activation and de-activation), etc. It is desirable that we base our efforts in this area on what has already been done in existing standards for Java such as JDO, JPA, OMG, et. al. This interacts with the concurrency control mechanism for the database engine, may need to refer to Bernstein et. al. for serialization theory / CC algorithms.

2. object identification: A participant raised a concern here RE: re-use of OID where the OID is implemented as a physical pointer and memory is re-cycled resulting in re-use of an OID, which can corrupt some applications. He favored a standard requiring all OIDs to be unique and not re-used

3. session:: what are the definition and semantics of a session?
a. Concurrency control: again, we should refer to Bernstein et. al. for proven algorithms and mathematical definitions in lieu of ACID criteria (ACA: Avoidance of Cascading Aborts, ST: Strict, SR: Serializable, RC: Recoverable for characterizing transaction execution sequences)
b. Transactions: semantics/behavior and span/scope

4. Object model: what OM will we base our work upon?

5. Native language APIs: how will we define these? Will they be based on the Java APIs in ODMG 3.0, or will they be different? Will they be interfaces?

6. Conformance test suite: we will need one of these for each OO language we intend to define a standard for. The test suite, however, is not the definition of the standard; the definition must exist in the specification.

7. Error behavior: exception definitions etc.

R. Zicari: What are the areas where no agreement was (yet) found?

Mike Card: Areas we need to find agreement on are:

1. keys and indices: how do you sort objects? How do you define compound keys or spatial keys? Uniqueness constraints? Can this be handled by annotation, with the annotation being standardized but the implementation being vendor-specific? This interacts with the query mechanism, e.g. availability of an index could be checked for by the query optimizer.

2. referential integrity: do we want to enforce this? Avoidance of dangling pointers, this interacts with object lifecycle/GC considerations.

3. cascaded delete: when you delete an object, do you also delete all objects that it references? It was pointed out that this has issues for a client/server model ODBMS like Versant because it may have to “push” out to clients that objects on the server have been deleted, so you have a distributed cache consistency problem to solve.

4. replication/synchronization: how much should we standardize the ability to keep a synchronized copy of part or all of an object database? Should the replication mechanism be interoperable with relational databases? Part or all of this capability could be included in an optional portion of the standard.

a. Backup:
this is a specialized form of replication, how much should this be standardized? Is the answer to this
question dependent upon the kind of environment (DBA or DBA-less/embedded) that the ODBMS is operating in?

5. events/triggers: do we want to standardize certain kinds of activity (callbacks et. al.) when certain database operations occur?

6. update within query facility: this is a recognition of the limitations of LINQ, which does not support object update it is “read-only.” Generally, object updates and deletes are performed by method invocations in a program and not by query statements.
The question is, since LINQ allows method invocations as part of navigation, e.g. “my_employee_obj.getBoss().getName(),” is it possible in cases like this that such method calls could have side effects which update the object(s) in the navigation statement? If so, what should be done?

7. extents: do we expose APIs for extents to the user?

8. support for C++: how will we support C++/legacy languages for which a LINQ-like facility is not available? We could investigate string-based QL like OQL and/or we could use a facility similar to Cook/db4o “native queries”

R. Zicari: And what are the areas you definitely do not want to standardize?

Mike Card: Areas we do not want to standardize are:

1. garbage collection: issue here is behavioral differences between “embedded” (linked-in) OODBMS vs. client/server OODBMS

2. stored procedures/functions/views: these are relational/SQL concepts that are not necessarily applicable to object-oriented programming languages which are the purview of object databases.

R. Zicari: How will you ensure that the vendor community will support this proposal?

Mike Card: We plan on discussing this list and verify that others not present agree with the grouping of these items. We should also figure out what we want to do with the items in the “middle” group and then begin prioritizing these things. It appears likely that a next-generation ODBMS standard will follow a “dual-track” model in that the query mechanism (at least for Java) will be developed as a JSR within the JCP, while all of the other items will be developed within the OMG process.

For C# (assuming C# is a language we will want an ODBMS standard for, and I think it is), the query API will be built into the language via LINQ and we will need to address all of the “other” issues within our OMG effort just as with Java. In the case of C# and Java, most of these issues can probably be dealt with in the same manner.

How much interest there is in a C++ standardization effort is unclear, this is an area we will need to discuss further.
A LINQ-like facility for C++ is not an option since unlike C# and Java there is no central maintenance point for C++ compilers.

There is an ISO WG that maintains the C++ standard, but C++ “culture” accepts non-conformant compilers so there are many C++ compilers out there that only conform to part of the ISO standard.

The developers present who work with C++ mentioned that their C++ code base must be “tweaked” to work with various compilers as a given set of C++ code might compile fine with 7 compilers but fail with the compiler from vendor number 8.
In general, the maintenance of C++ is more difficult than for Java and C# due to inconsistency in compiler implementation and this complicates anything we want to do with something as complex as object persistence.
##

Some Useful Resources:
– Panel Discussion “ODBMS: Quo Vadis?

– Java Object Persistence: State of the Union PART II

– Java Object Persistence: State of the Union PART I