ODBMS Industry Watch » Java Object Persistence
http://www.odbms.org/blog
Trends and Information on Big Data, New Data Management Technologies, Data Science and Innovation.

Big Data from Space: the “Herschel” telescope.
http://www.odbms.org/blog/2013/08/big-data-from-space-the-herschel-telescope/
Fri, 02 Aug 2013 12:45:02 +0000

“One of the biggest challenges with any project of such a long duration is coping with change. There are many aspects to coping with change, including changes in requirements, changes in technology, vendor stability, changes in staffing and so on.” – Jon Brumfitt.

On May 14, 2009, the European Space Agency launched an Ariane 5 rocket carrying the largest telescope ever flown: the “Herschel” telescope, 3.5 meters in diameter.

I first did an interview with Dr. Jon Brumfitt, System Architect & System Engineer of Herschel Scientific Ground Segment, at the European Space Agency in March 2011. You can read that interview here.

Two years later, I wanted to know the status of the project. This is a follow-up interview.


Q1. What is the status of the mission?

Jon Brumfitt: The operational phase of the Herschel mission came to an end on 29th April 2013, when the super-fluid helium used to cool the instruments was finally exhausted. By operating in the far infra-red, Herschel has been able to see cold objects that are invisible to normal telescopes.
However, this requires that the detectors are cooled to an even lower temperature. The helium cools the instruments down to 1.7K (about -271 Celsius). Individual detectors are then cooled down further to about 0.3K. This is very close to absolute zero, which is the coldest possible temperature. The exhaustion of the helium marks the end of new observations, but it is by no means the end of the mission.
We still have a lot of work to do in getting the best results from the data processing to give astronomers a final legacy archive of high-quality data to work with for years to come.

The spacecraft has been in orbit around a point known as the second Lagrangian point “L2”, which is about 1.5 million kilometres from Earth (around four times as far away as the Moon). This location provided a good thermal environment and a relatively unrestricted view of the sky. The spacecraft cannot be left in this orbit because regular correction manoeuvres would be needed. Consequently, it is being transferred into a “parking” orbit around the Sun.

Q2. What are the main results obtained so far by using the “Herschel” telescope?

Jon Brumfitt: That is a difficult one to answer in a few sentences. Just to take a few examples, Herschel has given us new insights into the way that stars form and the history of star formation and galaxy evolution since the Big Bang.
It has discovered large quantities of cold water vapour in the dusty disk surrounding a young star, which suggests the possibility of other water-covered planets. It has also given us new evidence for the origins of water on Earth.
The following are some links giving more detailed highlights from the mission:

– Press
– Results
– Press Releases
– Latest news

With its 3.5 metre diameter mirror, Herschel is the largest space telescope ever launched. The large mirror not only gives it a high sensitivity but also allows us to observe the sky with a high spatial resolution. So in a sense every observation we make is showing us something we have never seen before. We have performed around 35,000 science observations, which have already resulted in over 600 papers being published in scientific journals. There are many years of work ahead for astronomers in interpreting the results, which will undoubtedly lead to many new discoveries.

Q3. How much data did you receive and process so far? Could you give us some up to date information?

Jon Brumfitt: We have about 3 TB of data in the Versant database, most of which is raw data from the spacecraft. The data received each day is processed by our data processing pipeline and the resulting data products, such as images and spectra, are placed in an archive for access by astronomers.
Each time we make a major new release of the software (roughly every six months at this stage), with improvements to the data processing, we reprocess everything.
The data processing runs on a grid with around 35 nodes, each with typically 8 cores and between 16 and 256 GB of memory. This is able to process around 40 days’ worth of data per day, so it is possible to reprocess everything in a few weeks. The data in the archive is stored as FITS files (a standard format for astronomical data).
The archive uses a relational (PostgreSQL) database to catalogue the data and allow queries to find relevant data. This relational database is only about 60 GB, whereas the product files account for about 60 TB.
This may reduce somewhat for the final archive, once we have cleaned it up by removing the results of earlier processing runs.
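
As a rough check on the “few weeks” figure, the arithmetic can be sketched as follows. The launch date, the end of operations and the rate of 40 days of data per day come from the interview; treating every mission day as one day of data to reprocess is a simplifying assumption, and the class and method names are invented for illustration.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Back-of-envelope estimate only; assumes every mission day yields one day of data.
class ReprocessingEstimate {

    // Number of calendar days between launch and the end of observations.
    static long missionDays(LocalDate launch, LocalDate end) {
        return ChronoUnit.DAYS.between(launch, end);
    }

    // Wall-clock days needed when the grid processes dataDaysPerDay days of data per day.
    static double daysToReprocess(long daysOfData, double dataDaysPerDay) {
        return daysOfData / dataDaysPerDay;
    }

    public static void main(String[] args) {
        long days = missionDays(LocalDate.of(2009, 5, 14), LocalDate.of(2013, 4, 29));
        System.out.printf("%d days of data, about %.0f days to reprocess%n",
                days, daysToReprocess(days, 40.0));
    }
}
```

Under these assumptions the estimate comes to about 36 days, or roughly five weeks, which is consistent with the figure quoted above.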

Q4. What are the main technical challenges in the data management part of this mission and how did you solve them?

Jon Brumfitt: One of the biggest challenges with any project of such a long duration is coping with change. There are many aspects to coping with change, including changes in requirements, changes in technology, vendor stability, changes in staffing and so on.

The lifetime of Herschel will have been 18 years from the start of software development to the end of the post-operations phase.
We designed a single system to meet the needs of all mission phases, from early instrument development, through routine in-flight operations to the end of the post-operations phase. Although the spacecraft was not launched until 2009, the database was in regular use from 2002 for developing and testing the instruments in the laboratory. By using the same software to control the instruments in the laboratory as we used to control them in flight, we ended up with a very robust and well-tested system. We call this approach “smooth transition”.

The development approach we adopted is probably best classified as an Agile iterative and incremental one. Object orientation helps a lot because changes in the problem domain, resulting from changing requirements, tend to result in localised changes in the data model.
Other important factors in managing change are separation of concerns and minimization of dependencies, for example using component-based architectures.

When we decided to use an object database, it was a new technology and it would have been unwise to rely on any database vendor or product surviving for such a long time. Although work was under way on the ODMG and JDO standards, these were quite immature and the only suitable object databases used proprietary interfaces.
We therefore chose to implement our own abstraction layer around the database. This was similar in concept to JDO, with a factory providing a pluggable implementation of a persistence manager. This abstraction provided a route to change to a different object database, or even a relational database with an object-relational mapping layer, should it have proved necessary.
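
To illustrate the idea, here is a minimal sketch of a factory that supplies a pluggable persistence manager. It is not the Herschel code: the interface, the method names and the in-memory back end are all hypothetical, chosen only to show how applications can be kept independent of the database product.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical storage-neutral interface; applications depend only on this.
interface PersistenceManager {
    void makePersistent(String id, Object obj);
    Object getObjectById(String id);
}

// Stand-in back end that keeps objects in memory.
// A real implementation would wrap an object database or an ORM layer.
class InMemoryPersistenceManager implements PersistenceManager {
    private final Map<String, Object> store = new HashMap<>();

    @Override
    public void makePersistent(String id, Object obj) {
        store.put(id, obj);
    }

    @Override
    public Object getObjectById(String id) {
        return store.get(id);
    }
}

// Factory that hands out whichever implementation has been plugged in under a name.
final class PersistenceManagerFactory {
    private static final Map<String, Supplier<PersistenceManager>> PROVIDERS = new HashMap<>();

    private PersistenceManagerFactory() {}

    static void register(String name, Supplier<PersistenceManager> provider) {
        PROVIDERS.put(name, provider);
    }

    static PersistenceManager create(String name) {
        Supplier<PersistenceManager> provider = PROVIDERS.get(name);
        if (provider == null) {
            throw new IllegalArgumentException("No persistence provider registered as: " + name);
        }
        return provider.get();
    }
}
```

An application would call `PersistenceManagerFactory.register("memory", InMemoryPersistenceManager::new)` once at start-up and then `PersistenceManagerFactory.create("memory")` wherever it needs a manager. Switching to another database then means registering a different provider rather than changing application code.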

One aspect that is difficult to abstract is the use of queries, because query languages differ. In principle, an object database could be used without any queries, by navigating to everything from a global root object. However, in practice navigation and queries both have their role. For example, to find all the observation requests that have not yet been scheduled, it is much faster to perform a query than to iterate by navigation to find them. However, once an observation request is in memory it is much easier and faster to navigate to all the associated objects needed to process it. We have used a variety of techniques for encapsulating queries. One is to implement them as methods of an extent class that acts as a query factory.
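
The extent-as-query-factory technique might look something like the sketch below. The class names and the in-memory list are assumptions made for illustration; in a real system the factory method would build a query in the database’s own query language instead of filtering a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical persistent class, reduced to the fields the example needs.
class ObservationRequest {
    private final String id;
    private final boolean scheduled;

    ObservationRequest(String id, boolean scheduled) {
        this.id = id;
        this.scheduled = scheduled;
    }

    String getId() { return id; }
    boolean isScheduled() { return scheduled; }
}

// Storage-neutral query object; a real back end would translate it to its own query language.
interface Query<T> {
    List<T> execute();
}

// Extent acting as a query factory: callers name the query and never write query text.
class ObservationRequestExtent {
    private final List<ObservationRequest> instances = new ArrayList<>();

    void add(ObservationRequest request) {
        instances.add(request);
    }

    // Query for all observation requests that have not yet been scheduled.
    Query<ObservationRequest> unscheduled() {
        return filter(r -> !r.isScheduled());
    }

    private Query<ObservationRequest> filter(Predicate<ObservationRequest> condition) {
        return () -> instances.stream().filter(condition).collect(Collectors.toList());
    }
}
```

Because the query text never leaves the extent class, moving to a database with a different query language would only affect the body of `filter` and the factory methods.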

Another challenge was designing a robust data model that would serve all phases of the mission from instrument development in the laboratory, through pre-flight tests and routine operations to the end of post-operations. We approached this by starting with a model of the problem domain and then analysing use-cases to see what data needed to be persistent and where we needed associations. It was important to avoid the temptation to store too much just because transitive persistence made it so easy.

One criticism that is sometimes raised against object databases is that the associations tend to encode business logic in the object schema, whereas relational databases just store data in a neutral form that can outlive the software that created it; if you subsequently decide that you need a new use-case, such as report generation, the associations may not be there to support it. This is true to some extent, but consideration of use cases for the entire project lifetime helped a lot. It is of course possible to use queries to work around missing associations.

Examples are sometimes given of how easy an object database is to use by directly persisting your business objects. This may be fine for a simple application with an embedded database, but for a complex system you still need to cleanly decouple your business logic from the data storage. This is true whether you are using a relational or an object database. With an object database, the persistent classes should only be responsible for persistence and referential integrity and so typically just have getter and setter methods.
We have encapsulated our persistent classes in a package called the Core Class Model (CCM) that has a factory to create instances. This complements the pluggable persistence manager. Hence, the application sees the persistence manager and CCM factories and interfaces, but the implementations are hidden.
Applications define their own business classes which can work like decorators for the persistent classes.
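
The layering described above could be sketched roughly as follows. This is an illustration under assumptions, not the actual Core Class Model: the `Observation` interface, its attributes and the `fitsInSlot` rule are all invented for the example.

```java
// Hypothetical persistent interface: accessors only, no business logic.
interface Observation {
    String getTarget();
    void setTarget(String target);
    double getDurationSeconds();
    void setDurationSeconds(double seconds);
}

// Factory that hides the implementation class from applications.
final class CoreClassFactory {
    private CoreClassFactory() {}

    static Observation createObservation(String target, double durationSeconds) {
        Observation observation = new ObservationImpl();
        observation.setTarget(target);
        observation.setDurationSeconds(durationSeconds);
        return observation;
    }

    // The persistent implementation: just state with getters and setters.
    private static final class ObservationImpl implements Observation {
        private String target;
        private double durationSeconds;

        @Override public String getTarget() { return target; }
        @Override public void setTarget(String target) { this.target = target; }
        @Override public double getDurationSeconds() { return durationSeconds; }
        @Override public void setDurationSeconds(double seconds) { this.durationSeconds = seconds; }
    }
}

// Application-side business class that decorates the persistent object:
// it exposes the same interface by delegation and adds planning logic on top.
class PlannedObservation implements Observation {
    private final Observation data;

    PlannedObservation(Observation data) {
        this.data = data;
    }

    @Override public String getTarget() { return data.getTarget(); }
    @Override public void setTarget(String target) { data.setTarget(target); }
    @Override public double getDurationSeconds() { return data.getDurationSeconds(); }
    @Override public void setDurationSeconds(double seconds) { data.setDurationSeconds(seconds); }

    // Business rule kept out of the persistent class.
    boolean fitsInSlot(double slotSeconds) {
        return data.getDurationSeconds() <= slotSeconds;
    }
}
```

The application only ever sees `Observation` and `CoreClassFactory`, so the stored class can change without touching business code such as `PlannedObservation`.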

Q5. What is your experience in having two separate database systems for Herschel? A relational database for storing and managing processed data products and an object database for storing and managing proposal data, mission planning data, telecommands and raw (unprocessed) telemetry?

Jon Brumfitt: There are essentially two parts to the ground segment for a space observatory.
One is the “uplink” which is used for controlling the spacecraft and instruments. This includes submission of observing proposals, observation planning, scheduling, flight dynamics and commanding.
The other is the “downlink”, which involves ingesting and processing the data received from the spacecraft.

On some missions the data processing is carried out by a data centre, which is separate from spacecraft operations. In that case there is a very clear separation.
On Herschel, the original concept was to build a completely integrated system around an object database that would hold all uplink and downlink data, including processed data products. However, after further analysis it became clear that it was better to integrate our product archive with those from other missions. This also means that the Herschel data will remain available long after the project has finished. The role of the object database is essentially for operating the spacecraft and storing the raw data.

The Herschel archive is part of a common infrastructure shared by many of our ESA science projects. This provides a uniform way of accessing data from multiple missions.
The following is a nice example of how data from Herschel and our XMM-Newton X-ray telescope have been combined to make a multi-spectral image of the Andromeda Galaxy.

Our archive, in turn, forms part of a larger international archive known as the “Virtual Observatory” (VO), which includes both space and ground-based observatories from all over the world.

I think that using separate databases for operations and product archiving has worked well. In fact, it is more the norm rather than the exception. The two databases serve very different roles.
The uplink database manages the day-to-day operations of the spacecraft and is constantly being updated. The uplink data forms a complex object graph which is accessed by navigation, so an object database is well suited.
The product archive is essentially a write-once-read-many repository. The data is not modified, but new versions of products may be added as a result of reprocessing. There are a large number of clients accessing it via the Internet. The archive database is a catalogue containing the product meta-data, which can be queried to find the relevant product files. This is better suited to a relational database.

The motivation for the original idea of using a single object database for everything was that it allowed direct association between uplink and downlink data. For example, processed products could be associated with their observation requests. However, using separate databases does not prevent one database being queried with an observation identifier obtained from the other.
One complication is that processing an observation requires both downlink data and the associated uplink data.
We solved this by creating “uplink products” from the relevant uplink data and placing them in the archive. This has the advantage that external users, who do not have access to the Versant database, have everything they need to process the data themselves.

Q6. What are the main lessons learned so far in using Versant object database for managing telemetry data and information on steering and calibrating scientific on-board instruments?

Jon Brumfitt: Object databases can be very effective for certain kinds of application, but may have less benefit for others. A complex system typically has a mixture of application types, so the advantages are not always clear cut. Object databases can give a high performance for applications that need to navigate through a complex object graph, particularly if used with fairly long transactions where a significant part of the object graph remains in memory. Web (JavaEE) applications lose some of the benefit because they typically perform many short transactions with each one performing a query. They also use additional access layers that result in a system which loses the simplicity of the transparent persistence of an object database.

In our case, the object database was best suited for the uplink. It simplified the uplink development by avoiding object-relational mapping and the complexity of a design based on JDBC or EJB 2. Nowadays with JPA, relational databases are much easier to use for object persistence, so the rationale for using an object database is largely determined by whether the application can benefit from fast navigational access and how much effort is saved in mapping. There are now at least two object database vendors that support both JDO and JPA, so the distinction is becoming somewhat blurred.

For telemetry access we query the database instead of using navigation, as the packets don’t fit neatly into a single containment hierarchy. Queries allow packets to be accessed by many different criteria, such as time, instrument, type, source and so on.
Processing calibration observations does not introduce any special considerations as far as the database is concerned.
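
The following sketch shows why criteria-based access suits telemetry packets better than a single hierarchy: independent criteria can be combined freely. The packet fields and helper names are hypothetical, and the list filtering stands in for a real database query.

```java
import java.time.Instant;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical telemetry packet carrying a few of the attributes mentioned above.
class TelemetryPacket {
    final Instant time;
    final String instrument;
    final int type;

    TelemetryPacket(Instant time, String instrument, int type) {
        this.time = time;
        this.instrument = instrument;
        this.type = type;
    }
}

// Criteria that can be combined with and()/or(), unlike a fixed containment hierarchy.
final class PacketCriteria {
    private PacketCriteria() {}

    static Predicate<TelemetryPacket> instrument(String name) {
        return p -> p.instrument.equals(name);
    }

    static Predicate<TelemetryPacket> type(int type) {
        return p -> p.type == type;
    }

    // Half-open time range: from (inclusive) to (exclusive).
    static Predicate<TelemetryPacket> between(Instant from, Instant to) {
        return p -> !p.time.isBefore(from) && p.time.isBefore(to);
    }

    // Stand-in for executing the query against the database.
    static List<TelemetryPacket> select(List<TelemetryPacket> packets, Predicate<TelemetryPacket> where) {
        return packets.stream().filter(where).collect(Collectors.toList());
    }
}
```

For example, `PacketCriteria.instrument("PACS").and(PacketCriteria.between(start, end))` selects one instrument’s packets in a time window, and the same packets remain reachable by type or any other criterion.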

Q7. Did you have any scalability and/or availability issues during the project? If yes, how did you solve them?

Jon Brumfitt: Scalability would have been an important issue if we had kept to the original concept of storing everything including products in a single database. However, using the object database for just uplink and telemetry meant that this was not a big issue.

The data processing grid retrieves the raw telemetry data from the object database server, which is a 16-core Linux machine with 64 GB of memory. The average load on the server is quite low, but occasionally there have been high peak loads from the grid that have saturated the server disk I/O and slowed down other users of the database. Interactive applications such as mission planning need a rapid response, whereas batch data processing is less critical. We solved this by implementing a mechanism to spread out the grid load by treating the database as a resource.
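
The interview does not describe the mechanism itself, so the sketch below shows only one plausible way to treat a database as a limited resource: a counting semaphore caps the number of batch jobs that may use it at once, while interactive work is not throttled. The class name, the slot count and the split into batch and interactive calls are all assumptions.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;

// Treats the database as a resource with a fixed number of slots for batch jobs.
class DatabaseResource {
    private final Semaphore batchSlots;

    DatabaseResource(int maxConcurrentBatchJobs) {
        // Fair semaphore: waiting grid jobs are served in arrival order.
        this.batchSlots = new Semaphore(maxConcurrentBatchJobs, true);
    }

    // Batch work waits for a free slot, so load peaks are spread out over time.
    <T> T runBatch(Callable<T> work) throws Exception {
        batchSlots.acquire();
        try {
            return work.call();
        } finally {
            batchSlots.release();
        }
    }

    // Interactive work bypasses the limit so that it keeps a rapid response.
    <T> T runInteractive(Callable<T> work) throws Exception {
        return work.call();
    }

    int freeBatchSlots() {
        return batchSlots.availablePermits();
    }
}
```

With `new DatabaseResource(2)`, a third grid job calling `runBatch` simply blocks until one of the first two finishes, whereas a mission planning request passed to `runInteractive` proceeds immediately.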

Once a year, we have made an “Announcement of Opportunity” for astronomers to propose observations that they would like to perform with Herschel. It is only human nature that many people leave it until the last minute and we get a very high peak load on the server in the last hour or two before the deadline! We have used a separate server for this purpose, rather than ingesting proposals directly into our operational database. This has avoided any risk of interfering with routine operations. After the deadline, we have copied the objects into the operational database.

Q8. What about the overall performance of the two databases? What are the lessons learned?

Jon Brumfitt: The databases are good at different things.
As mentioned before, an object database can give a high performance for applications involving a complex object graph which you navigate around. An example is our mission planning system. Object persistence makes application design very simple, although in a real system you still need to introduce layers to decouple the business logic from the persistence.

For the archive, on the other hand, a relational database is more appropriate. We are querying the archive to find data that matches a set of criteria. The data is stored in files rather than as objects in the database.

Q9. What are the next steps planned for the project and the main technical challenges ahead?

Jon Brumfitt: As I mentioned earlier, the coming post-operations phase will concentrate on further improving the data processing software to generate a top-quality legacy archive, and on provision of high-quality support documentation and continued interactive support for the community of astronomers that forms our “customer base”. The system was designed from the outset to support all phases of the mission, from early instrument development tests in the laboratory, through routine operations to the end of the post-operations phase of the mission. The main difference moving into post-operations is that we will stop uplink activities and ingesting new telemetry. We will continue to reprocess all the data regularly as improvements are made to the data processing software.

We are currently in the process of upgrading from Versant 7 to Versant 8.
We have been using Versant 7 since launch and the system has been running well, so there has been little urgency to upgrade.
However, with routine operations coming to an end, we are doing some “technology refresh”, including upgrading to Java 7 and Versant 8.

Q10. Anything else you wish to add?

Jon Brumfitt: These are just some personal thoughts on the way the database market has evolved over the lifetime of Herschel. Thirteen years ago, when we started development of our system, there were expectations that object databases would really take off in line with the growing use of object orientation, but this did not happen. Object databases still represent rather a niche market. It is a pity there is no open-source object-database equivalent of MySQL. This would have encouraged more people to try object databases.

JDO has developed into a mature standard over the years. One of its key features is that it is “architecture neutral”, but in fact there are very few implementations for relational databases. However, it seems to be finding a new role for some NoSQL databases, such as the Google AppEngine datastore.
NoSQL appears to be taking off far quicker than object databases did, although it is an umbrella term that covers quite a few kinds of datastore. Horizontal scaling is likely to be an important feature for many systems in the future. The relational model is still dominant, but there is a growing appreciation of alternatives. There is even talk of “Polyglot Persistence” using different kinds of databases within a system; in a sense we are doing this with our object database and relational archive.

More recently, JPA has created considerable interest in object persistence for relational databases and appears to be rapidly overtaking JDO.
This is partly because it is being adopted by developers of enterprise applications who previously used EJB 2.
If you look at the APIs of JDO and JPA they are actually quite similar apart from the locking modes. However, there is an enormous difference in the way they are typically used in practice. This is more to do with the fact that JPA is often used for enterprise applications. The distinction is getting blurred by some object database vendors who now support JPA with an object database. This could expand the market for object databases by attracting some traditional relational type applications.

So, I wonder what the next 13 years will bring! I am certainly watching developments with interest.

Dr Jon Brumfitt, System Architect & System Engineer of Herschel Scientific Ground Segment, European Space Agency.

Jon Brumfitt has a background in Electronics with Physics and Mathematics and has worked on several of ESA’s astrophysics missions, including IUE, Hipparcos, ISO, XMM and currently Herschel. After completing his PhD and a post-doctoral fellowship in image processing, Jon worked on data reduction for the IUE satellite before joining Logica Space and Defence in 1980. In 1984 he moved to Logica’s research centre in Cambridge and then in 1993 to ESTEC in the Netherlands to work on the scientific ground segments for ISO and XMM. In January 2000, he joined the newly formed Herschel team as science ground segment System Architect. As Herschel approached launch, he moved down to the European Space Astronomy Centre in Madrid to become part of the Herschel Science Operations Team, where he is currently System Engineer and System Architect.

Related Posts

The Gaia mission, one year later. Interview with William O’Mullane. January 16, 2013

Objects in Space: “Herschel” the largest telescope ever flown. March 18, 2011


Introduction to ODBMS By Rick Grehan

ODBMS.org Resources on Object Database Vendors.

On Impedance Mismatch. Interview with Reinhold Thurner
http://www.odbms.org/blog/2012/08/on-impedance-mismatch-interview-with-reinhold-thurner/
Mon, 27 Aug 2012 16:34:21 +0000

“Many enterprises sidestep applications with “Shadow IT” to solve planning, reporting and analysis problems” – Reinhold Thurner.

I am coming back to the topic of “Impedance Mismatch”.
I have interviewed one of our experts, Dr. Reinhold Thurner, founder of Metasafe GmbH in Switzerland.


Q1. In a recent interview, José A. Blakeley and Rowan Miller of Microsoft said that “the impedance mismatch problem has been significantly reduced, but not entirely eliminated”. Do you agree?

Thurner: Yes, I agree, with some reservations, and only for the special case of the impedance mismatch between a conceptual model, a relational database and an OO program. However, even an advanced ORM is not really a solution for the more general case of complex data, which affects any programmer (including non-OO programmers) and especially end users.

Q2. Could you please explain better what you mean here?

Thurner: My reservations concern the tools and the development process: several standalone tools (model designer, mapper, code generator, schema loader) are connected by intermediate files. It is difficult, if not impossible, to develop a transparent model transformation which relieves the developer from the necessity to “think” on both levels – the original model and the transformed model – at the same time. The conceptual models can be “painted” easily, but they cannot be “executed” and tested with test data.
They are practically disjoint from the instance data. It takes a lot of discipline to prevent changes in the data structures from being applied directly to the final database, with the consequence that the conceptual model is lost.
I rephrase from a document about ADO.net: “Most significant applications involve a conceptual design phase early in the application development lifecycle. Unfortunately, however, the conceptual data model is captured inside a database design tool that has little or no connection with the code and the relational schema used to implement the application. The database design diagrams created in the early phases of the application life cycle usually stay “pinned to a wall” growing increasingly disjoint from the reality of the application implementation with time.”

Q3. You are criticizing the process and the tools – what is the alternative?

Thurner: I compare this tool architecture with the idea of an “integrated view of conceptual modeling, databases and CASE” (actually the title of one of your books). The basic ideas already existed in the early 1990s but were not realized because the means to implement a “CASE database” were missing: modeling concepts (OMG), languages (Java, C#), frameworks (Eclipse), big cheap memory, powerful CPUs, big screens etc. Today we are in a much better position and it is now feasible to create a data platform (i.e. a database for CASE) for tool integration. As José A. Blakeley argues, ‘(…) modern applications and data services need to target a higher-level conceptual model based on entities and relationships (…)’. A modern data platform is a prerequisite to support such a concept.

Q4. Could you give us some examples of typical (impedance mismatch) problems still existing in the enterprise? How are they practically handled in the enterprise?

Thurner: As a consequence of the problems with the impedance mismatch some applications don’t use database technology at all or develop a thick layer of proprietary services which are in fact a sort of private DBMS.
Many enterprises sidestep applications with “Shadow IT” to solve planning, reporting and analysis problems, i.e. spreadsheets instead of databases, mail for multi-user access and data exchange, security by obscurity and a lot of manual copy and paste.
Another important area is development tools: development tools deal with a large number of highly interconnected artifacts which must be managed in configurations and versions. These artifacts are still stored in files, libraries and some in relational databases with a thick layer on top. A proper repository would provide better services for a tool developer and help to create products which are more flexible and easier to use.
Data management and information dictionary: IT governance (COBIT) stipulates that a company should maintain an “information dictionary” which contains all “information assets”, their logical structure, the physical implementations and the responsibilities (RACI charts, data steward). The common warehouse model (OMG) describes the model of the most common types of data stores, which is a good start. But companies with several DBMSs, hundreds of databases, servers and programs accessing thousands of tables and IMS segments need a proper database to store the instances to make the “information model real”. Users of such a dictionary (designers, programmers, testers, integration services, operations, problem management etc.) need an easy-to-use query language to access these data in an explorative manner.

Q5. If ORM technology cannot solve this kind of problem, what are the alternatives?

Thurner: The essence of ORM technology is to create a bridge between the “existing eco-system of databases based on the relational model and the conceptual model”. The “relational model” is not the one and only possible approach to persist data. Data storage technology has moved up the ladder from sequential files to index-sequential, to multi-index, to CODASYL, to hierarchical (IMS) and to today’s market leader, the RDBMS. This is certainly not the end and the future seems to become very colorful. As Michael Stonebraker explains: “In summary, there may be a substantial number of domain-specific database engines with differing capabilities off into the future.” See his paper “One Size fits all – an Idea whose time has come and gone”.
ADO.net has been described as “a part of the broader Microsoft Data Access vision” and covers a specific subset of applications. Is the “other part” the “executable conceptual model” that Peter Chen mentioned in a discussion with José Blakeley about “The future of Database Systems”?
I am convinced that an executable conceptual model will play an important role for the aforementioned problems: a DBMS with an entity-relationship model implements the conceptual model without an impedance mismatch. To succeed, however, it needs all the properties José mentioned, such as queries, transactions, access rights and integrated tools.

Q6. You started a company which developed a system called Metasafe-Repository. What is it?

Thurner: It started long ago with a first version developed in C, which was used, for example, in a reengineering project to manage a few hundred types, one million instances and about five million bidirectional relationships. In 2006 we decided to redevelop the system from scratch in Java and to build the necessary tools with the Eclipse framework. We started with the basic elements – multi-level architecture based on the entity-relationship model, integration of models and instance data, ACID transactions, versioning and user access rights. During development the system morphed from the initial idea of a repository service to a complete DBMS. We developed a set of entirely model-driven tools – modeler, browser, import/export utilities, Excel interface, ADO-Driver for BIRT etc.
Metasafe has a multilevel structure: an internal metamodel, the global data model, external models as subsets (views) of the global data model and the instance data – in OMG-Terms it stretches from M3 to M0. All types (M2, M1) are described by meta-attributes as defined in the metamodel. User access rights to models and instance data are tied to the external models. Entity instances (M0) may exist in several manifestations (Catalog, Version, Variant). An extension of the data model e.g. by additional data types, entity types, relationship types or submodels can be defined using the metaModeler tool (or via the API by a program). From the moment the model changes are committed, the database is ready to accept instance data for the new types without unload/reload of the database.

Q7. Is the Metasafe repository the solution to the impedance mismatch problem?

Thurner: It is certainly a substantial step forward because we made the conceptual model and the database definition one and the same. We take the conceptual model literally: if an ‘Order has Orderdetails’, we tell the database to create two entity types ‘Order’ and ‘Orderdetails’ and the relation ‘has’ between them. This way Metasafe implements an executable conceptual model with all the required properties of a real database management system: an open API, versioning, “declarative, model related queries and transactions” etc. Our own set of tools and especially the integration of BIRT (the Eclipse Business Intelligence and Reporting Tools) demonstrate how it can be done. Our graphical erSQL query builder is even integrated into the BIRT designer. The erSQL queries are translated on the fly and BIRT accesses our database without any intermediate files.
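
To make the ‘Order has Orderdetails’ example concrete, here is a toy in-memory sketch of a schema in which entity types and relation types are declared directly. It is emphatically not the Metasafe API, which is not shown in this interview; the class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy conceptual model: entity types plus named relation types between them.
// Hypothetical illustration only; this is NOT the Metasafe API.
class ConceptualModel {
    private final Set<String> entityTypes = new LinkedHashSet<>();
    private final List<String[]> relationTypes = new ArrayList<>();

    void defineEntityType(String name) {
        entityTypes.add(name);
    }

    // A relation type may only connect entity types that already exist in the model.
    void defineRelationType(String from, String name, String to) {
        if (!entityTypes.contains(from) || !entityTypes.contains(to)) {
            throw new IllegalArgumentException(
                    "Unknown entity type in: " + from + " " + name + " " + to);
        }
        relationTypes.add(new String[] {from, name, to});
    }

    // Reads the schema back as conceptual statements such as "Order has Orderdetails".
    List<String> statements() {
        List<String> out = new ArrayList<>();
        for (String[] r : relationTypes) {
            out.add(r[0] + " " + r[1] + " " + r[2]);
        }
        return out;
    }
}
```

Declaring `Order`, `Orderdetails` and the relation `has` and then calling `statements()` returns the original sentence, which is the sense in which the model and the schema are one and the same.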

Q8. What is the API of the Metasafe repository?

Thurner: Metasafe provides an object-oriented Java API with all the methods required to search, create, read, update and delete the elements of the database – i.e. schemas, user groups/users, entities, relationships and their attributes – both on type level and on instance level. All the tools of Metasafe (modeler, browser, import/export, query builder etc.) are built with this public API. This approach has led to an elaborate set of methods to support an application programmer. The erSQL query builder and also the query translator and processor were implemented with this API. An erSQL query can be embedded in a Java program to retrieve a result set (including its metadata) or to export the result set.
In early versions we had a C#-version in parallel but we discontinued this branch when we started with the development of the tools based on Eclipse RCP. The reimplementation of the core in C# would be relatively easy. I think that also the tools could be reimplemented because they are entirely model-driven.

Q9. How does Metasafe’s query language differ from the Microsoft Entity Framework built-in query capabilities (i.e. Language Integrated Query (LINQ))?

Thurner: It is difficult to compare because Metasafe’s erSQL query language was designed with respect to the special nature of an Entity-Relationship model with heavily cross-linked information. So the erSQL query language maps directly to the conceptual model. “End users” can also create queries with the graphical query builder by pointing and clicking on the graphical representation of the conceptual model to identify the path through the model and to collect the information of interest.

The queries are translated on the fly and processed by the query processor. The validation and translation of a query into a command structure of the query processor is a matter of milliseconds. The query processor returns result sets of metadata and typed instance data. The query result can also be exported as an Excel table or as an XML file. In “read-mode” the result of each retrieval step (instance objects and their attributes) is returned to the invoking program instead of building the complete result set. A query represents a sort of “user” model and is also documented graphically. “End users” can easily create queries and retrieve data from the database. erSQL and the graphical query builder are fully integrated in BIRT to create reports on the fly.
The present version supports only information retrieval. We plan to extend it by a “… for update” feature which locks all selected entity instances for further operation.
For example, an update query for {an order and all its order items and products} would lock “the order” until backout or commit.

Q10. There are concerns about the performance and the overhead generated by ORM technology. Is performance an issue for Metasafe?

Thurner: Performance is always an issue when the number of concurrent users and the size and complexity of the data grow. The system works quite well for medium-size systems with a few hundred types, a few million instances and a few GBs. The performance depends on the translation of the logical requests into physical access commands and on the execution of the physical access to the persistence. Metasafe uses a very limited functionality of an RDBMS (currently SQL Server, Derby, Oracle) for persistence. Locking, transactions and multi-user management are handled by Metasafe; the locking tables are kept in memory. After a commit it writes all changes in one burst to the database. We could of course use an in-memory DBMS to gain performance. For example, VoltDB with its direct transaction access could be integrated easily and would certainly lead to superior performance.
We also have another kind of performance in mind – user performance. For many applications the number of milliseconds needed to execute a transaction is less important than the ability to quickly create or change a database and to create and launch queries in a matter of minutes. Metasafe is especially helpful for this kind of application.

Q11. What problems is Metasafe designed to solve?

Thurner: Metasafe is designed as a generic data platform for medium-sized (XX GB) model-driven applications. Its primary purpose is to support applications with large, complex and volatile data structures, such as tools, models, catalogs or process managers. Metasafe could be used to replace some legacy repositories.
Metasafe is certainly the best data platform (repository) for the construction of an integrated development environment. Metasafe can also serve as a DBMS for business applications.
We are also evaluating the possibility of using the Metasafe DBMS as a data platform for portable devices such as phones and tablet computers. This could be a real killer application for application developers.

Q12. How do you position Metasafe in the market?

Thurner: I had the vision of an entity-relationship based database system as a future data platform and decided to develop Metasafe to a really useful level without the pressure of the market (namely the first-time users). Now we have our product at the necessary level of quality and we are planning the next steps. It could be the “open source approach” for a limited version or the integration into a larger organization.
We have a number of applications and POCs but we have no substantial customer base yet, which would require an adequate support and sales organization. But we do not have the intention of converting a successful development setup into a mediocre service and sales organization. We are not under time pressure and are looking at a number of possibilities.

Q13. How can the developer community test your system?

Thurner: We provide an evaluation version upon request.

Related Posts

Do we still have an impedance mismatch problem? – Interview with José A. Blakeley and Rowan Miller. by Roberto V. Zicari on May 21, 2012


“Implementing the Executable Conceptual Model (ECM)” (download as .pdf),
by Dr. Reinhold Thurner, Metasafe.

ODBMS.org Free Resources on:
Entity Framework (EF) Resources
ORM Technology
Object-Relational Impedance Mismatch


In-memory database systems. Interview with Steve Graves, McObject.
http://www.odbms.org/blog/2012/03/in-memory-database-systems-interview-with-steve-graves-mcobject/
Fri, 16 Mar 2012 07:43:44 +0000

“Application types that benefit from an in-memory database system are those for which eliminating latency is a key design goal, and those that run on systems that simply have no persistent storage, like network routers and low-end set-top boxes” – Steve Graves.

On the topic of in-memory database systems, I interviewed one of our experts, Steve Graves, co-founder and CEO of McObject.


Q1. What is an in-memory database system (IMDS)?

Steve Graves: An in-memory database system (IMDS) is a database management system (DBMS) that uses main memory as its primary storage medium.
A “pure” in-memory database system is one that requires no disk or file I/O, whatsoever.
In contrast, a conventional DBMS is designed around the assumption that records will ultimately be written to persistent storage (usually hard disk or flash memory).
Obviously, disk or flash I/O is expensive, in performance terms, and therefore retrieving data from RAM is faster than fetching it from disk or flash, so IMDSs are very fast.
An IMDS also offers a more streamlined design. Because it is not built around the assumption of storage on hard disk or flash memory, the IMDS can eliminate the various DBMS sub-systems required for persistent storage, including cache management, file management and others. For this reason, an in-memory database is also faster than a conventional database that is either fully-cached or stored on a RAM-disk.

In other areas (not related to persistent storage) an IMDS can offer the same features as a traditional DBMS. These include SQL and/or native language (C/C++, Java, C#, etc.) programming interfaces; formal data definition language (DDL) and database schemas; support for relational, object-oriented, network or combination data designs; transaction logging; database indexes; client/server or in-process system architectures; security features, etc. The list could go on and on. In-memory database systems are a sub-category of DBMSs, and should be able to do everything that entails.

Q2. What are the significant differences between an in-memory database and a database that happens to be in memory (e.g. deployed on a RAM-disk)?

Steve Graves: We use the comparison to illustrate IMDSs’ contribution to performance beyond the obvious elimination of disk I/O. If IMDSs’ sole benefit stemmed from getting rid of physical I/O, then we could get the same performance by deploying a traditional DBMS entirely in memory – for example, using a RAM-disk in place of a hard drive.

We tested an application performing the same tasks with three storage scenarios: using an on-disk DBMS with a hard drive; the same on-disk DBMS with a RAM-disk; and an IMDS (McObject’s eXtremeDB). Moving the on-disk database to a RAM drive resulted in nearly 4x improvement in database reads, and more than 3x improvement in writes. But the IMDS (using main memory for storage) outperformed the RAM-disk database by 4x for reads and 420x for writes.
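
Chaining the two reported ratios gives a rough sense of the gap between the hard-drive configuration and the IMDS. The short calculation below is only a back-of-the-envelope sketch: it treats “nearly 4x” and “more than 3x” as exactly 4 and 3, which is an assumption, and the combined figures it prints were not reported in the interview.

```python
# Speedups as quoted in the answer above (rounded; an assumption for illustration).
ram_disk_vs_hard_drive = {"reads": 4, "writes": 3}
imds_vs_ram_disk = {"reads": 4, "writes": 420}

# Speedup ratios measured on the same workload compose multiplicatively.
imds_vs_hard_drive = {
    op: ram_disk_vs_hard_drive[op] * imds_vs_ram_disk[op]
    for op in ram_disk_vs_hard_drive
}

for op, factor in imds_vs_hard_drive.items():
    print(f"{op}: roughly {factor}x faster than the on-disk DBMS on a hard drive")
```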

Clearly, factors other than eliminating disk I/O contribute to the IMDS’s performance – otherwise, the DBMS-on-RAM-disk would have matched it. The explanation is that even when using a RAM-disk, the traditional DBMS is still performing many persistent storage-related tasks.
For example, it is still managing a database cache – even though the cache is now entirely redundant, because the data is already in RAM. And the DBMS on a RAM-disk is transferring data to and from various locations, such as a file system, the file system cache, the database cache and the client application, compared to an IMDS, which stores data in main memory and transfers it only to the application. These sources of processing overhead are hard-wired into on-disk DBMS design, and persist even when the DBMS uses a RAM-disk.

An in-memory database system also uses the storage space (memory) more efficiently.
A conventional DBMS can use extra storage space in a trade-off to minimize disk I/O (the assumption being that disk I/O is expensive, and storage space is abundant, so it’s a reasonable trade-off). Conversely, an IMDS needs to maximize storage efficiency because memory is not abundant in the way that disk space is. So a 10 gigabyte traditional database might only be 2 gigabytes when stored in an in-memory database.

Q3. What is in your opinion the current status of the in-memory database technology market?

Steve Graves: The best word for the IMDS market right now is “confusing.” “In-memory database” has become a hot buzzword, with seemingly every DBMS vendor now claiming to have one. Often these purported IMDSs are simply the providers’ existing disk-based DBMS products, which have been tweaked to keep all records in memory – and they more closely resemble a 100% cached database (or a DBMS that is using a RAM-disk for storage) than a true IMDS. The underlying design of these products has not changed, and they are still burdened with DBMS overhead such as caching, data transfer, etc. (McObject has published a white paper, Will the Real IMDS Please Stand Up?, about this proliferation of claims to IMDS status.)

Only a handful of vendors offer IMDSs that are built from scratch as in-memory databases. If you consider these to comprise the in-memory database technology market, then the status of the market is mature. The products are stable, have existed for a decade or more and are deployed in a variety of real-time software applications, ranging from embedded systems to real-time enterprise systems.

Q4. What are the application types that benefit from the use of an in-memory database system?

Steve Graves: Application types that benefit from an IMDS are those for which eliminating latency is a key design goal, and those that run on systems that simply have no persistent storage, like network routers and low-end set-top boxes. Sometimes these types overlap, as in the case of a network router that needs to be fast, and has no persistent storage. Embedded systems often fall into the latter category, in fields such as telco and networking gear, avionics, industrial control, consumer electronics, and medical technology. What we call the real-time enterprise sector is represented in the first category, encompassing uses such as analytics, capital markets (algorithmic trading, order matching engines, etc.), real-time cache for e-commerce and other Web-based systems, and more.

Software that must run with minimal hardware resources (RAM and CPU) can also benefit.
As discussed above, IMDSs eliminate sub-systems that are part-and-parcel of on-disk DBMS processing. This streamlined design results in a smaller database system code size and reduced demand for CPU cycles. When it comes to hardware, IMDSs can “do more with less.” This means that the manufacturer of, say, a set-top box that requires a database system for its electronic programming guide, may be able to use a less powerful CPU and/or less memory in each box when it opts for an IMDS instead of an on-disk DBMS. These manufacturing cost savings are particularly desirable in embedded systems products targeting the mass market.

Q5. McObject offers an in-memory database system called eXtremeDB, and an open source embedded DBMS, called Perst. What is the difference between the two? Is there any synergy between the two products?

Steve Graves: Perst is an object-oriented embedded database system.
It is open source and available in Java (including Java ME) and C# (.NET) editions. The design goal for Perst is to provide persistence for Java and C# objects that is as nearly transparent as is practically possible within the normal Java and .NET frameworks. In other words, no special tools, byte codes, or virtual machine are needed. Perst should provide persistence to Java and C# objects while changing the way a programmer uses those objects as little as possible.

eXtremeDB is not an object-oriented database system, though it does have attributes that give it an object-oriented “flavor.” The design goals of eXtremeDB were to provide a full-featured, in-memory DBMS that could be used right across the computing spectrum: from resource-constrained embedded systems to high-end servers used in systems that strive to squeeze out every possible microsecond of latency. McObject’s eXtremeDB in-memory database system product family has features including support for multiple APIs (SQL ODBC/JDBC & native C/C++, Java and C#), varied database indexes (hash, B-tree, R-tree, KD-tree, and Patricia Trie), ACID transactions, multi-user concurrency (via both locking and “optimistic” transaction managers), and more. The core technology is embodied in the eXtremeDB IMDS edition. The product family includes specialized editions, built on this core IMDS, with capabilities including clustering, high availability, transaction logging, hybrid (in-memory and on-disk) storage, 64-bit support, and even kernel mode deployment. eXtremeDB is not open source, although McObject does license the source code.

The two products do not overlap. There is no shared code, and there is no mechanism for them to share or exchange data. Perst for Java is written in Java, Perst for .NET is written in C#, and eXtremeDB is written in C, with optional APIs for Java and .NET. Perst is a candidate for Java and .NET developers that want an object-oriented embedded database system, have no need for the more advanced features of eXtremeDB, do not need to access their database from C/C++ or from multiple programming languages (a Perst database is compatible with Java or C#), and/or prefer the open source model. Perst has been popular for smartphone apps, thanks to its small footprint and smart engineering that enables Perst to run on mobile platforms such as Windows Phone 7 and Java ME.
eXtremeDB will be a candidate when eliminating latency is a key concern (Perst is quite fast, but not positioned for real-time applications), when the target system doesn’t have a JVM (or sufficient resources for one), when the system needs to support multiple programming languages, and/or when any of eXtremeDB’s advanced features are required.

Q6. What are the current main technological developments for in-memory database systems?

Steve Graves: At McObject, we’re excited about the potential of IMDS technology to scale horizontally, across multiple hardware nodes, to deliver greater scalability and fault-tolerance while enabling more cost-effective system expansion through the use of low-cost (i.e. “commodity”) servers. This enthusiasm is embodied in our new eXtremeDB Cluster edition, which manages data stores across distributed nodes. Among eXtremeDB Cluster’s advantages is that it eliminates any performance ceiling from being CPU-bound on a single server.

Scaling across multiple hardware nodes is receiving a lot of attention these days with the emergence of NoSQL solutions. But database system clustering actually has much deeper roots. One of the application areas where it is used most widely is in telecommunications and networking infrastructure, where eXtremeDB has always been a strong player. And many emerging application categories – ranging from software-as-a-service (SaaS) platforms to e-commerce and social networking applications – can benefit from a technology that marries IMDSs’ performance and “real” DBMS features, with a distributed system model.

Q7. What are the similarities and differences between current various database clustering solutions? In particular, let’s look at dimensions such as scalability, ACID vs. CAP, intended/applicable problem domains, structured vs. unstructured, and complexity of implementation.

Steve Graves: ACID support vs. “eventual consistency” is a good place to start looking at the differences between clustering database solutions (including some cluster-like NoSQL products). ACID-compliant transactions will be Atomic, Consistent, Isolated and Durable; consistency implies the transaction will bring the database from one valid state to another and that every process will have a consistent view of the database. ACID-compliance enables an on-line bookstore to ensure that a purchase transaction updates the Customers, Orders and Inventory tables of its DBMS. All other things being equal, this is desirable: updating Customers and Orders while failing to change Inventory could potentially result in other orders being taken for items that are no longer available.
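
The bookstore example can be made concrete with a small sketch using Python's standard `sqlite3` module and an in-memory database. The table layout, column names and the `place_order` helper are assumptions made for illustration; the point is only that all three updates are applied together or not at all.

```python
import sqlite3


def place_order(conn, customer_id, item, quantity):
    """Record an order and decrement stock atomically; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE customers SET order_count = order_count + 1 WHERE id = ?",
                (customer_id,),
            )
            conn.execute(
                "INSERT INTO orders (customer_id, item, quantity) VALUES (?, ?, ?)",
                (customer_id, item, quantity),
            )
            # The CHECK constraint on inventory.stock rejects negative stock,
            # which causes the whole transaction to be rolled back.
            conn.execute(
                "UPDATE inventory SET stock = stock - ? WHERE item = ?",
                (quantity, item),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, order_count INTEGER NOT NULL);
    CREATE TABLE orders (customer_id INTEGER, item TEXT, quantity INTEGER);
    CREATE TABLE inventory (item TEXT PRIMARY KEY,
                            stock INTEGER NOT NULL CHECK (stock >= 0));
    INSERT INTO customers VALUES (1, 0);
    INSERT INTO inventory VALUES ('book', 1);
    """
)

first = place_order(conn, 1, "book", 1)    # succeeds: stock goes from 1 to 0
second = place_order(conn, 1, "book", 1)   # fails: stock would become negative
```

After the second call the Customers and Orders changes made inside the failed transaction are gone as well, which is exactly the half-applied state that atomicity rules out.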

However, enforcing the ACID properties becomes more of a challenge with distributed solutions, such as database clusters, because the node initiating a transaction has to wait for acknowledgement from the other nodes that the transaction can be successfully committed (i.e. there are no conflicts with concurrent transactions on other nodes). To speed up transactions, some solutions have relaxed their enforcement of these rules in favor of an “eventual consistency” that allows portions of the database (typically on different nodes) to become temporarily out-of-synch (inconsistent).
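
The waiting described here can be sketched as a generic prepare-then-commit exchange. This is a simplified illustration, not the protocol of eXtremeDB Cluster or of any other product; the `Node` class, `run_transaction` function and the key names are hypothetical, and network calls are replaced by direct method calls.

```python
class Node:
    """One cluster member holding a full copy of the data (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.locked = set()   # keys reserved by a transaction in progress

    def prepare(self, writes):
        # Refuse if a concurrent transaction already holds any of the keys.
        if self.locked.intersection(writes):
            return False
        self.locked.update(writes)
        return True

    def commit(self, writes):
        self.data.update(writes)
        self.locked.difference_update(writes)

    def abort(self, writes):
        self.locked.difference_update(writes)


def run_transaction(nodes, writes):
    """Commit on every node only if every node acknowledges the prepare step."""
    prepared = []
    for node in nodes:
        if node.prepare(writes):
            prepared.append(node)
        else:
            # A single refusal aborts the transaction wherever it was prepared.
            for p in prepared:
                p.abort(writes)
            return False
    for node in nodes:
        node.commit(writes)
    return True


nodes = [Node("a"), Node("b"), Node("c")]
ok = run_transaction(nodes, {"stock:book": 0})

# Simulate a concurrent transaction on node "b" that already holds a key.
nodes[1].locked.add("stock:pen")
blocked = run_transaction(nodes, {"stock:pen": 5})
```

Every transaction pays for a round of acknowledgements from all nodes before it can commit, which is the cost that relaxed-consistency designs try to avoid.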

Systems embracing eventual consistency will be able to scale horizontally better than ACID solutions – it boils down to their asynchronous rather than synchronous nature.

Eventual consistency is, obviously, a weaker consistency model, and implies some process for resolving consistency problems that will arise when multiple asynchronous transactions give rise to conflicts. Resolving such conflicts increases complexity.
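
One common (and deliberately simple) way to resolve such conflicts is “last writer wins”. The sketch below assumes that policy and uses hypothetical names (`Replica`, `merge_from`) and hand-assigned timestamps; real systems may use quite different reconciliation schemes.

```python
class Replica:
    """A replica that accepts writes locally and reconciles later (hypothetical)."""

    def __init__(self):
        self.store = {}   # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        self.store[key] = (timestamp, value)

    def read(self, key):
        return self.store[key][1]

    def merge_from(self, other):
        # Last-writer-wins: keep whichever version carries the newer timestamp.
        for key, (ts, value) in other.store.items():
            if key not in self.store or ts > self.store[key][0]:
                self.store[key] = (ts, value)


east, west = Replica(), Replica()
east.write("stock:book", 4, timestamp=1)
west.write("stock:book", 7, timestamp=2)   # concurrent write on another node

# Until the replicas exchange state, readers see different answers.
diverged = east.read("stock:book") != west.read("stock:book")

# Reconciliation pass: exchange state in both directions.
east.merge_from(west)
west.merge_from(east)
```

Note that the write made on `east` is silently discarded. Deciding whether that is acceptable, or replacing it with an application-specific merge, is part of the added complexity mentioned above.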

Another area where clustering solutions differ is along the lines of shared-nothing vs. shared-everything approaches. In a shared-nothing cluster, each node has its own set of data.
In a shared-everything cluster, each node works on a common copy of database tables and rows, usually stored in a fast storage area network (SAN). Shared-nothing architecture is naturally more complex: if the data in such a system is partitioned (each node has only a subset of the data) and a query requests data that “lives” on another node, there must be code to locate and fetch it. If the data is not partitioned (each node has its own copy) then there must be code to replicate changes to all nodes when any node commits a transaction that modifies data.
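
The “locate and fetch” code needed in a partitioned shared-nothing cluster can be sketched with simple hash partitioning. The `PartitionedCluster` class is hypothetical, and plain dictionaries stand in for the nodes; a real cluster would forward the request over the network.

```python
import zlib


class PartitionedCluster:
    """Shared-nothing sketch: each node stores only its own slice of the keys."""

    def __init__(self, node_count):
        self.nodes = [{} for _ in range(node_count)]

    def owner(self, key):
        # A stable hash, so every node computes the same owner for a key.
        return zlib.crc32(key.encode("utf-8")) % len(self.nodes)

    def put(self, key, value):
        self.nodes[self.owner(key)][key] = value

    def get(self, key, asked_node):
        """Answer a query arriving at `asked_node`; report whether it was forwarded."""
        home = self.owner(key)
        forwarded = home != asked_node
        return self.nodes[home].get(key), forwarded


cluster = PartitionedCluster(3)
cluster.put("customer:42", {"name": "Ada"})

home = cluster.owner("customer:42")
other = (home + 1) % 3

value_local, fwd_local = cluster.get("customer:42", asked_node=home)
value_remote, fwd_remote = cluster.get("customer:42", asked_node=other)
```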

NoSQL solutions emerged in the past several years to address challenges that occur when scaling the traditional RDBMS. To achieve scale, these solutions generally embrace eventual consistency (thus validating the CAP Theorem, which holds that a system cannot simultaneously provide Consistency, Availability and Partition tolerance). And this choice defines the intended/applicable problem domains. Specifically, it eliminates systems that must have consistency. However, many systems don’t have this strict consistency requirement – an on-line retailer such as the bookstore mentioned above may accept the occasional order for a non-existent inventory item as a small price to pay for being able to meet its scalability goals. Conversely, transaction processing systems typically demand absolute consistency.

NoSQL is often described as a better choice for so-called unstructured data. Whereas RDBMSs have a data definition language that describes a database schema and becomes recorded in a database dictionary, NoSQL databases are often schema-less, storing opaque “documents” that are keyed by one or more attributes for subsequent retrieval. Proponents argue that schema-less solutions free us from the rigidity imposed by the relational model and make it easier to adapt to real-world changes. Opponents argue that schema-less systems are for lazy programmers, create a maintenance nightmare, and that there is no equivalent to relational calculus or the ANSI standard for SQL. But the entire structured or unstructured discussion is tangential to database cluster solutions.

Q8. Are in-memory database systems an alternative to classical disk-based relational database systems?

Steve Graves: In-memory database systems are an ideal alternative to disk-based DBMSs when performance and efficiency are priorities. However, this explanation is a bit fuzzy, because what programmer would not claim speed and efficiency as goals? To nail down the answer, it’s useful to ask, “When is an IMDS not an alternative to a disk-based database system?”

Volatility is pointed to as a weak point for IMDSs. If someone pulls the plug on a system, all the data in memory can be lost. In some cases, this is not a terrible outcome. For example, if a set-top box programming guide database goes down, it will be re-provisioned from the satellite transponder or cable head-end. In cases where volatility is more of a problem, IMDSs can mitigate the risk. For example, an IMDS can incorporate transaction logging to provide recoverability. In fact, transaction logging is unavoidable with some products, such as Oracle’s TimesTen (it is optional in eXtremeDB). Database clustering and other distributed approaches (such as master/slave replication) contribute to database durability, as does use of non-volatile RAM (NVRAM, or battery-backed RAM) as storage instead of standard DRAM. Hybrid IMDS technology enables the developer to specify persistent storage for selected record types (presumably those for which the “pain” of loss is highest) while all other records are managed in memory.
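
Transaction logging for recoverability can be sketched as an append-only log that is replayed at start-up. This is a minimal illustration with hypothetical names (`LoggedStore`, the JSON-lines log format), not a description of how eXtremeDB or TimesTen implement logging.

```python
import json
import os
import tempfile


class LoggedStore:
    """In-memory key-value store with an append-only log for recovery (hypothetical)."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()
        self.log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        # Rebuild the in-memory state from whatever the log already contains.
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as f:
                for line in f:
                    record = json.loads(line)
                    self.data[record["key"]] = record["value"]

    def put(self, key, value):
        # Log first, then apply, so an acknowledged write survives a crash.
        self.log.write(json.dumps({"key": key, "value": value}) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        self.data[key] = value

    def close(self):
        self.log.close()


log_path = os.path.join(tempfile.mkdtemp(), "store.log")

store = LoggedStore(log_path)
store.put("channel:1", "News")
store.put("channel:1", "Sports")
store.close()   # stands in for a power loss: the in-memory dict is discarded

recovered = LoggedStore(log_path)   # state is rebuilt from the log
```

The `fsync` on every write is where the latency cost comes from, which is why making logging optional matters for applications that can tolerate loss.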

However, all of these strategies require some effort to plan and implement. The easiest way to reduce volatility is to use a database system that implements persistent storage for all records by default – and that’s a traditional DBMS. So, the IMDS use-case occurs when the need to eliminate latency outweighs the risk of data loss or the cost of the effort to mitigate volatility.

It is also the case that flash and, especially, spinning memory are much less expensive than DRAM, which puts an economic lid on very large in-memory databases for all but the richest users. And, riches notwithstanding, it is not yet possible to build a system with hundreds of terabytes, let alone petabytes or exabytes, of memory, whereas spinning memory has no such limitation.

By continuing to use traditional databases for most applications, developers and end-users are signaling that DBMSs’ built-in persistence is worth its cost in latency. But the growing role of IMDSs in real-time technology ranging from financial trading to e-commerce, avionics, telecom/Netcom, analytics, industrial control and more shows that the need for speed and efficiency often outweighs the convenience of a traditional DBMS.

Steve Graves is co-founder and CEO of McObject, a company specializing in embedded Database Management System (DBMS) software. Prior to McObject, Steve was president and chairman of Centura Solutions Corporation and vice president of worldwide consulting for Centura Software Corporation.

Related Posts

A super-set of MySQL for Big Data. Interview with John Busch, Schooner.

Re-thinking Relational Database Technology. Interview with Barry Morris, Founder & CEO NuoDB.

On Data Management: Interview with Kristof Kloeckner, GM IBM Rational Software.

vFabric SQLFire: Better then RDBMS and NoSQL?

Related Resources

ODBMS.ORG: Free Downloads and Links:
Object Databases
NoSQL Data Stores
Graphs and Data Stores
Cloud Data Stores
Object-Oriented Programming
Entity Framework (EF) Resources
ORM Technology
Object-Relational Impedance Mismatch
Databases in general
Big Data and Analytical Data Platforms


Interview with Iran Hutchinson, Globals.
http://www.odbms.org/blog/2011/06/interview-with-iran-hutchinson-globals/
Mon, 13 Jun 2011 22:06:22 +0000

“The newly launched Globals initiative is not about creating a new database. It is, however, about exposing the core multi-dimensional arrays directly to developers.” – Iran Hutchinson.


InterSystems recently launched a new initiative: Globals.
I wanted to know more about Globals. I have therefore interviewed Iran Hutchinson, software/systems architect at InterSystems and one of the people behind the Globals project.


Q1. InterSystems recently launched a new database product: Globals. Why a new database? What is Globals?

Iran Hutchinson: InterSystems has continually provided innovative database technology to its technology partners for over 30 years. Understanding customer needs to build rich, high-performance, and scalable applications resulted in a database implementation with a proven track record. The core of the database technology is multi-dimensional arrays (aka globals).
The newly launched Globals initiative is not about creating a new database. It is, however, about exposing the core multi-dimensional arrays directly to developers. By closely integrating access into development technologies like Java and JavaScript, developers can take full advantage of high-performance access to our core database components.

We undertook this project to build much broader awareness of the technology that lies at the heart of all of our products. In doing so, we hope to build a thriving developer community conversant in the Globals technology, and aware of the benefits to this approach of building applications.

Q2. You classify Globals as a NoSQL-database. Is this correct? What are the differences and similarities of Globals with respect to other NoSQL databases in the market?

Iran Hutchinson: While Globals can be classified as a NoSQL database, it goes beyond the definition of other NoSQL databases. As you know, there are many different offerings in NoSQL and no key comparison matrices or feature lists. Below we list some comparisons and differences, in the hope of later expanding the available information on the globalsdb.org website.

• Globals differs from other NoSQL databases in a number of ways.

o It is not limited to one of the known paradigms in NoSQL (Column/Wide Column, Key-Value, Graph, Document, etc.). You can build your own paradigm on top of the core engine. This is an approach we took as we evolved Caché to support objects, xml, and relational, to name a few.
o Globals still offers optional transactions and locking. Though efficient in implementation we wanted to make sure that locking and transactions were at the discretion of the developer.
o MVCC is built into the database.
o Globals runs in-memory and writes data to disk.
o There is currently no sharding or replication available in Globals. We are discussing options for these features.
o Globals builds on the over 33 years of success of Caché. It is well proven. It is the exact same database technology. Globals will continue to evolve, and receive the innovations going into the core of Caché.
o Our goal with Globals is to be a very good steward of the project and technology. The Globals initiative will also start to drive contests and events to further promote adoption of the technology, as well as innovative approaches to building applications. We see this stewardship as a key differentiator, along with the underlying flexible core technology.

• Globals shares similar traits with other NoSQL databases in the market.

o It is free for development and deployment.
o The data model can optionally use a schema. We mitigate the impact of using schemas by using the same infrastructure we use to store the data. The schema information and the data are both stored in globals.
o Developers can index their data.
o The document paradigm enabled by the Globals Document Store (GDS) API enables a query language for data stored using the GDS API. GDS is also an example of how to build a storage paradigm in Globals. Globals APIs are open source and available on the github link.
o Globals is fast and efficient at storing data. We know performance is one of many hallmarks of NoSQL. Globals can store data at rates exceeding 100,000 objects/records per second.
o Different technology APIs are available for use with Globals. We’ve released 2 Java APIs and the JavaScript API is imminent.

Q3. How do you position Globals with respect to Caché? Who should use Globals and who should use Caché?

Iran Hutchinson: Today, Globals offers multi-dimensional array storage, whereas Caché offers a much richer set of features. Caché (and the InterSystems technology it powers including Ensemble, DeepSee, HealthShare, and TrakCare) offers a core underlying object technology, native web services, distributed communication via ECP (Enterprise Cache Protocol), strategies for high availability, interactive development environment, industry standard data access (JDBC, ODBC, SQL, XML, etc.) and a host of other enterprise ready features.

Anyone can use Globals or Caché to tackle challenges with large data volumes (terabytes, petabytes, etc.), high transaction rates (100,000+ per second), and complex data (healthcare, financial, aerospace, etc.). However, Caché provides much of the needed out-of-box tooling and technology to get started rapidly building solutions in our core technology, as well as a variety of languages. Currently provided as Java APIs, Globals is a toolkit to build the infrastructure already provided by Caché. Use Caché if you want to get started today; use Globals if you have a keen interest in building the infrastructure of your data management system.

Q4. Globals offers multi-dimensional array storage. Can you please briefly explain this feature, and how this can be beneficial for developers?

Iran Hutchinson: It is beneficial to go here. I grabbed the following paragraphs directly from this page:

Summary Definition: A global is a persistent sparse multi-dimensional array, which consists of one or more storage elements or “nodes”. Each node is identified by a node reference (which is, essentially, its logical address). Each node consists of a name (the name of the global to which this node belongs) and zero or more subscripts.

Subscripts may be of any of the types String, int, long, or double. Subscripts of any of these types can be mixed among the nodes of the same global, at the same or different levels.
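
The definition above can be pictured with a small in-memory sketch. It is not the Globals API (the real product is persistent and is accessed through the Java APIs mentioned in this interview); the `Global` class, its method names and the sample data are hypothetical and serve only to show nodes addressed by a name plus zero or more subscripts of mixed types.

```python
class Global:
    """Sparse multi-dimensional array: only nodes that were set take up space."""

    def __init__(self, name):
        self.name = name
        self.nodes = {}   # tuple of subscripts -> value

    def put(self, value, *subscripts):
        self.nodes[subscripts] = value

    def get(self, *subscripts):
        return self.nodes.get(subscripts)

    def children(self, *prefix):
        """Distinct next-level subscripts found directly below `prefix`."""
        depth = len(prefix)
        found = {
            subs[depth]
            for subs in self.nodes
            if len(subs) > depth and subs[:depth] == prefix
        }
        return sorted(found, key=str)


patients = Global("patients")
patients.put("Ada", 1001, "name")             # int and String subscripts
patients.put(36.6, 1001, "readings", 0.5)     # a double subscript at a deeper level
patients.put("Alan", 2002, "name")

print(patients.get(1001, "name"))
print(patients.children(1001))
```

Because the array is sparse, the three nodes above are all that is stored; there is no cost for the subscript values between 1001 and 2002 that were never used.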

Benefits for developers: Globals does not limit developers to using objects, key-value, or any other type of storage paradigm. Developers are free to think of the optimal storage paradigm for what they are working on. With this flexibility, and the history of successful applications powered by globals, we think developers can begin building applications with confidence.

Q5. Globals does not include Objects. Is it possible to use Globals if my data is made of Java objects? If yes, how?

Iran Hutchinson: Globals exposes a multi-dimensional sparse array directly to Java and other languages. While Globals itself does not include direct Java object storage technology like JPA or JDO, one can easily store and retrieve data in Java objects using the APIs documented here. Anyone can also extend Globals to support popular Java object storage and retrieval interfaces.

One of the core concepts in Globals is that it is not limited to a paradigm, like objects, but can be used in many paradigms. As an example, the new GDS (Globals Document Store) API enables developers to use the NoSQL document paradigm to store their objects in Globals. GDS is available here (more docs to come).
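One way to picture storing Java objects in a multi-dimensional array is to write each field of an object to its own node, using the object's id and the field name as subscripts. The sketch below assumes exactly that layout. It uses a plain `HashMap` as a stand-in for a persistent global, and `ObjectToNodesSketch`, `Person`, `save`, and `load` are hypothetical names rather than part of the Globals or GDS APIs.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical field-per-node mapping; the map stands in for a persistent global.
public class ObjectToNodesSketch {

    // A simple Java object to be stored.
    static final class Person {
        final long id;
        final String name;
        final int age;

        Person(long id, String name, int age) {
            this.id = id;
            this.name = name;
            this.age = age;
        }
    }

    // Subscripts (id, field name) -> field value.
    private final Map<List<Object>, Object> personGlobal = new HashMap<>();

    // Writes each field of the object to its own node.
    void save(Person p) {
        personGlobal.put(Arrays.<Object>asList(p.id, "name"), p.name);
        personGlobal.put(Arrays.<Object>asList(p.id, "age"), p.age);
    }

    // Rebuilds the object from its nodes, or returns null if it was never saved.
    Person load(long id) {
        Object name = personGlobal.get(Arrays.<Object>asList(id, "name"));
        Object age = personGlobal.get(Arrays.<Object>asList(id, "age"));
        if (name == null || age == null) {
            return null;
        }
        return new Person(id, (String) name, (Integer) age);
    }

    public static void main(String[] args) {
        ObjectToNodesSketch store = new ObjectToNodesSketch();
        store.save(new Person(1L, "Ada", 36));
        Person loaded = store.load(1L);
        System.out.println(loaded.name + ", " + loaded.age);
    }
}
```

A document-style layer such as the one described above could equally serialize the whole object into a single node; the field-per-node layout is just one of several possible choices.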

Q6. Is Globals open source?

Iran Hutchinson: Globals itself is not open source. However, the Globals APIs hosted at the github location are open source.

Q7. Do you plan to create a Globals Community? And if yes, what will you offer to the community and what do you expect back from the community?

Iran Hutchinson: We created a community for Globals from the beginning. One of the main goals of the Globals initiative is to create a thriving community around the technology, and applications built on the technology.
We offer the community:
• Proven core data management technology
• An enthusiastic technology partner that will continue to evolve and support the project
  ◦ Marketing the project globally
  ◦ Continual underlying technology evolution
  ◦ Involvement in the forums and open source technology development
  ◦ Participation in or hosting events and contests around Globals
• A venue to not only express ideas, but take a direct role in bringing those ideas to life in technology
• For those who want to build a business around Globals, 30+ years of experience in supplying software developers with the technology to build successful breakthrough applications.


Iran Hutchinson serves as product manager and software/systems architect at InterSystems. He is one of the people behind the Globals project. He has held architecture and development positions at startups and Fortune 50 companies. He focuses on language platforms, data management technologies, distributed/cloud computing, and high performance computing. When not on trail talking with fellow geeks or behind the computer you can find him eating (just look for the nearest steak house).


Globals is a free database from InterSystems. Globals offers multi-dimensional storage. The first version is for Java. Software | Intermediate | English | LINK | May 2011

Globals APIs
The Globals APIs are open source and available at the github location.

Related Posts

Interview with Jonathan Ellis, project chair of Apache Cassandra.

The evolving market for NoSQL Databases: Interview with James Phillips.

“Marrying objects with graphs”: Interview with Darren Wood.

“Distributed joins are hard to scale”: Interview with Dwight Merriman.

On Graph Databases: Interview with Daniel Kirstenpfad.

Interview with Rick Cattell: There is no “one size fits all” solution.

O/R Impedance Mismatch? Users Speak Up! Third Series of User Reports published. http://www.odbms.org/blog/2008/10/or-impedance-mismatch-users-speak-up-2/ http://www.odbms.org/blog/2008/10/or-impedance-mismatch-users-speak-up-2/#comments Thu, 23 Oct 2008 02:12:00 +0000 http://www.odbms.org/odbmsblog/2008/10/23/or-impedance-mismatch-users-speak-up-third-series-of-user-reports-published/

I have published the third series of user reports on using technologies for storing and handling persistent objects.
I have defined “users” in a very broad sense, including: CTOs, Technical Directors, Software Architects, Consultants, Developers, and Researchers.

The third series includes 7 new user reports from the following users:

– Peter Train, Architect, Standard Bank Group Limited, South Africa.
– Biren Gandhi, IT Architect and Technical Consultant, IBM Global Business Services, Germany.
– Sven Pecher, Senior Consultant, IBM Global Business Services, Germany.
– Frank Stuch, Managing Consultant, IBM Global Business Services, Germany.
– Hiroshi Miyazaki, Software Architect, Fujitsu, Japan.
– Robert Huber, Managing Director, 7r gmbh, Switzerland.
– Thomas Amberg, Software Engineer, Oberon microsystems, Switzerland.

I asked each user the same set of questions, among them what experience they have with the various persistence options available for new projects, and what lessons they learned from using such solutions.

“Some of our newer systems have been developed in-house using an object oriented paradigm. Most (if not all) of these use Relational Database systems to store data and the “impedance mismatch” problem does apply” says Peter Train from Standard Bank.

The lessons learned using Object Relational mapping tools confirm the complexity of such technologies.

Peter Train explains: “The most common problems that we have experienced with object Relational mapping tools are:
i) The effort required to define mappings between the object and the relational models; ii) Difficulty in understanding how the mapping will be implemented at runtime and how this might impact performance and memory utilization. In some cases, a great deal of effort is spent tweaking configurations to achieve satisfactory performance.”

Frank Stuch from IBM Global Business Services has used Hibernate, EJB 2 and EJB 3 Entity Beans in several projects.
Talking about his experience with such tools he says: “EJB 2 is too heavy weight and outdated by EJB 3. EJB 3 is not supported well by development environments like Rational Application Developer and not mature enough. In general all of these solutions give the developer 90% of the comfort of an OODBMS with well established RDBMS.
The problem is that this comfort needs a good understanding of the impedance mismatch and the consequences on performance (e.g. “select n+1 problem”). Many junior developers don’t understand the impact and therefore the performance of the generated/created data queries are often very poor. Senior developers can work very efficient with e.g. Hibernate.”
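For readers unfamiliar with the “select n+1 problem” mentioned in the quote, the following sketch simulates it with an in-memory stand-in for a database that simply counts queries. It does not use Hibernate or any real ORM; `SelectNPlusOneSketch` and all of its methods are hypothetical names chosen for illustration. Loading the orders first and then fetching the items of each order separately costs one query plus one per order, whereas a single join-style query costs one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simulation only: a map plays the database, queryCount counts round trips.
public class SelectNPlusOneSketch {
    static final Map<Integer, List<String>> ITEMS_BY_ORDER = new LinkedHashMap<>();
    static int queryCount = 0;

    static {
        ITEMS_BY_ORDER.put(1, Arrays.asList("book", "pen"));
        ITEMS_BY_ORDER.put(2, Arrays.asList("lamp"));
        ITEMS_BY_ORDER.put(3, Arrays.asList("desk", "chair"));
    }

    // One query returning all order ids.
    static List<Integer> selectOrderIds() {
        queryCount++;
        return new ArrayList<>(ITEMS_BY_ORDER.keySet());
    }

    // One query per order returning that order's items.
    static List<String> selectItemsForOrder(int orderId) {
        queryCount++;
        return ITEMS_BY_ORDER.get(orderId);
    }

    // One join-style query returning every order together with its items.
    static Map<Integer, List<String>> selectOrdersJoinedWithItems() {
        queryCount++;
        return new LinkedHashMap<>(ITEMS_BY_ORDER);
    }

    // Naive access pattern: 1 query for the orders + n queries for their items.
    static int loadLazily() {
        queryCount = 0;
        for (int orderId : selectOrderIds()) {
            selectItemsForOrder(orderId);
        }
        return queryCount;
    }

    // Join-style access pattern: a single query.
    static int loadWithJoin() {
        queryCount = 0;
        selectOrdersJoinedWithItems();
        return queryCount;
    }

    public static void main(String[] args) {
        System.out.println("lazy loading: " + loadLazily() + " queries");   // 4 queries for 3 orders
        System.out.println("join fetch:   " + loadWithJoin() + " queries"); // 1 query
    }
}
```

With three orders the difference is small, but the number of queries in the naive pattern grows linearly with the number of parent rows, which is why it tends to surface only under realistic data volumes.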

In some special cases custom solutions have been built. Thomas Amberg, who works in mobile and embedded software, explains: “We use a custom object persistence solution based on sequential serialized update operations appended to a binary file”.
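The report does not describe the implementation, so the following is only a rough sketch of the general technique of appending serialized update operations to a binary file and replaying them to rebuild state. The class `AppendOnlyLogSketch`, its record format (one operation byte followed by UTF-encoded strings), and its method names are all assumptions made for illustration.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical append-only log of update operations; the format is invented here.
public class AppendOnlyLogSketch {
    private static final byte PUT = 1;
    private static final byte REMOVE = 2;
    private final Path file;

    AppendOnlyLogSketch(Path file) {
        this.file = file;
    }

    // Opens the file so that every write lands at its end.
    private DataOutputStream openForAppend() throws IOException {
        return new DataOutputStream(Files.newOutputStream(
                file, StandardOpenOption.CREATE, StandardOpenOption.APPEND));
    }

    // Appends a "put key=value" operation.
    void appendPut(String key, String value) throws IOException {
        try (DataOutputStream out = openForAppend()) {
            out.writeByte(PUT);
            out.writeUTF(key);
            out.writeUTF(value);
        }
    }

    // Appends a "remove key" operation.
    void appendRemove(String key) throws IOException {
        try (DataOutputStream out = openForAppend()) {
            out.writeByte(REMOVE);
            out.writeUTF(key);
        }
    }

    // Rebuilds the current state by replaying all operations in order.
    Map<String, String> replay() throws IOException {
        Map<String, String> state = new HashMap<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            int op;
            while ((op = in.read()) != -1) {          // -1 signals end of file
                String key = in.readUTF();
                if (op == PUT) {
                    state.put(key, in.readUTF());
                } else if (op == REMOVE) {
                    state.remove(key);
                } else {
                    throw new IOException("unknown operation: " + op);
                }
            }
        }
        return state;
    }

    // Writes a few operations to a temporary file and replays them.
    static Map<String, String> roundTrip() {
        try {
            Path file = Files.createTempFile("updates", ".log");
            try {
                AppendOnlyLogSketch log = new AppendOnlyLogSketch(file);
                log.appendPut("a", "1");
                log.appendPut("b", "1");
                log.appendPut("a", "2");   // a later update wins on replay
                log.appendRemove("b");
                return log.replay();
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip());   // prints {a=2}
    }
}
```

Existing data is never rewritten in this scheme, which keeps writes simple and sequential; the trade-off is that the file grows with every update and start-up time grows with the length of the log unless it is periodically compacted.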

The 7 new reports and the complete series of user reports are available for free download.

I plan to continue publishing user reports on a regular basis.

LINQ: the best option for a future Java query API? http://www.odbms.org/blog/2008/10/is-really-linq-best-option-for-future/ http://www.odbms.org/blog/2008/10/is-really-linq-best-option-for-future/#comments Tue, 07 Oct 2008 04:49:00 +0000 http://www.odbms.org/odbmsblog/2008/10/07/linq-the-best-option-for-a-future-java-query-api/

My interview with Mike Card has triggered an intense, still ongoing discussion on the pros and cons of considering LINQ as the best option for a future Java query API.

There is a consensus that a common query mechanism for ODBMSs is needed.

However, there is considerable disagreement on how this should be done. In particular, some see LINQ as a solution, provided that LINQ is also available for Java. Others, on the contrary, do not like LINQ and would prefer a vendor-neutral solution, for example one based on SBQL.

You can follow the discussion here.

Here are some useful resources I published on ODBMS.ORG that relate to this discussion:

Erik Meijer, José Blakeley
The Microsoft perspective on ORM
An interview in ACM Queue Magazine with Erik Meijer and José Blakeley. With LINQ (language-integrated query) and the Entity Framework, Microsoft divided its traditional ORM technology into two parts: one part that handles querying (LINQ) and one part that handles mapping (Entity Framework). | September 2008 |

Panel Discussion “ODBMS: Quo Vadis?”
Panel discussion with Mike Card, Jim Paterson, and Kazimierz Subieta, on their views on some critical questions related to Object Databases: Where are Object Database Systems going? Are Relational database systems becoming Object Databases?
Do we need a standard for Object Databases? Why did ODMG not succeed?

Java Object Persistence: State of the Union PART II
Panel discussion with Jose Blakeley (Microsoft), Rick Cattell (Sun Microsystems), William Cook (University of Texas at Austin), Robert Greene (Versant), and Alan Santos (Progress). The panel addressed the ever open issue of the impedance mismatch.

Java Object Persistence: State of the Union PART I
Panel discussion with Mike Keith (EJB co-spec lead, main architect of Oracle Toplink ORM), Ted Neward (independent consultant, often blogging on ORM and persistence topics), Carl Rosenberger (lead architect of db4objects, open source embeddable object database), and Craig Russell (spec lead of Java Data Objects (JDO) JSR, architect of entity bean engine in Sun’s appservers prior to Glassfish), on their views on the current State of the Union of object persistence with respect to Java.

Stack-Based Approach (SBA) and Stack-Based Query Language (SBQL)
Kazimierz Subieta, Polish-Japanese Institute of Information Technology
Introduction to object-oriented concepts in programming languages and databases, SBA and SBQL

The Object-Relational Impedance Mismatch
Scott Ambler, IBM. Scott explores the technical and the cultural impedance mismatch between the relational and the object world.

ORM Smackdown – Transcript
Ted Neward, Oren “Ayende” Eini. Transcripts of the Panel discussion “ORM Smackdown” on different viewpoints on Object-Relational Mapping (ORM) systems, courtesy of FranklinsNet.

OOPSLA Panel Objects and Databases
William Cook et al. Transcript of a high-ranking panel on objects and databases at the OOPSLA conference 2006, with representatives from BEA, db4objects, GemStone, Microsoft, Progress, Sun, and Versant.

Java Object Persistence: State of the Union PART II Published http://www.odbms.org/blog/2008/05/java-object-persistence-state-of-union/ http://www.odbms.org/blog/2008/05/java-object-persistence-state-of-union/#comments Thu, 15 May 2008 22:09:00 +0000 http://www.odbms.org/odbmsblog/2008/05/15/java-object-persistence-state-of-the-union-part-ii-published/

More on the topic of Java Object Persistence …
This time I have interviewed the following ODBMS.ORG experts: Jose Blakeley (Microsoft), Rick Cattell (Sun Microsystems), William Cook (University of Texas at Austin), Robert Greene (Versant), and Alan Santos (Progress).

The panel addressed the ever open issue of the impedance mismatch, a problem which has existed ever since computers were used to persistently store data (in file systems or database management systems), and for which no fully satisfactory solution has yet been found.

The complete panel transcript is available for free download (PDF)

“Today, I see two types of impedance mismatch problems,” says Jose Blakeley, a Partner Architect in the SQL Server Division at Microsoft. “(1) the application’s impedance mismatch problem, and (2) the impedance mismatch in data services.”

Alan Santos from data integration specialist Progress Software takes a different view: “Historically impedance mismatch has referred to the issues encountered when mapping data from a relational store into an object oriented data model. For some people, in some very practical ways, impedance mismatch is not an issue and has been solved with improvements in O/R mapping libraries and performance improvements in the runtime environments, as well as hardware itself.”

Rick Cattell, formerly Distinguished Engineer at Sun Microsystems, who was instrumental in the foundation of J2EE, SQL Access/ODBC and JDBC, sees three solutions to overcome the mismatch: “The top three options for Java are JDBC, O/R mapping, and an ODBMS.”

But panelists differed when asked about their views on whether object-relational mappers, relational databases and object databases were a suitable solution to the “object persistence” problem.

The panel also attempted to define new areas of research and development in object persistence.

Microsoft’s Blakeley: “I would like to see technologies like the EDM, EntitySQL, and EF be absorbed natively by relational database systems.”

UT Austin’s William Cook, a father of AppleScript and of Safe and Native Queries, agreed and wished that “major database vendors implement OQL (or some variant, like HQL) as a native database interface to their databases.”

I recommend it; it is a very informative read!

Here are the questions at a glance:

Question 1: Do we still have an “impedance mismatch problem”?

Question 2: In terms of what you’re seeing used in the industry, how would you position the various options available for persistence for new projects?

Question 3: What are in your opinion the pros and cons of these existing solutions?

Question 4: Do you believe that Object Relational Mappers are a suitable solution to the “object persistence” problem? If yes why? If not, why?

Question 5: Do you believe that Relational Database systems are a suitable solution to the “object persistence” problem? If yes why? If not, why?

Question 6: Do you believe that Object Database systems are a suitable solution to the “object persistence” problem? If yes why? If not, why?

Question 7: What would you wish as a new research/development in the area of Object Persistence in the next 12 months?

Question 8: If you were all powerful and could have influenced technology adoption in the last 10 years, what would today’s typical project use as a persistence mechanism and why?

Question 9: Any parting words about this topic?


Java Object Persistence: State of the Union Published http://www.odbms.org/blog/2008/04/java-object-persistence-state-of-union-2/ http://www.odbms.org/blog/2008/04/java-object-persistence-state-of-union-2/#comments Wed, 02 Apr 2008 08:32:00 +0000 http://www.odbms.org/odbmsblog/2008/04/02/java-object-persistence-state-of-the-union-published/

The topic of Java Object Persistence is as topical as ever…

Together with InfoQ.com’s Floyd Marinescu, I have therefore interviewed the following group of leading persistence solution architects on their views on the current State of the Union of object persistence with respect to Java:

Mike Keith: EJB co-spec lead, main architect of Oracle Toplink ORM

Ted Neward: Independent consultant, often blogging on ORM and persistence topics

Carl Rosenberger: lead architect of db4objects, open source embeddable object database

Craig Russell: Spec lead of Java Data Objects (JDO) JSR, architect of entity bean engine in Sun’s appservers prior to Glassfish

Here are the questions at a glance:

Question 1: Do we still have an “impedance mismatch problem”?

Question 2: In terms of what you’re seeing used in the industry, how would you position the various options available for persistence for new projects?

Question 3: What are in your opinion the pros and cons of these existing solutions?

Question 4: Do you believe that Object Relational Mappers are a suitable solution to the “object persistence” problem? If yes why? If not, why?

Question 5: Do you believe that Relational Database systems are a suitable solution to the “object persistence” problem? If yes why? If not, why?

Question 6: Do you believe that Object Database systems are a suitable solution to the “object persistence” problem? If yes why? If not, why?

Question 7: What would you wish as a new research/development in the area of Object Persistence in the next 12 months?

Question 8: If you were all powerful and could have influenced technology adoption in the last 10 years, what would today’s typical project use as a persistence mechanism and why?

Question 9: Any parting words about this topic?

The answers we got differ, but I believe all panelists agree that there is no silver bullet…

The complete panel transcript is available for free download (PDF)


Java Object Persistence: State of the Union http://www.odbms.org/blog/2008/03/java-object-persistence-state-of-the-union/ http://www.odbms.org/blog/2008/03/java-object-persistence-state-of-the-union/#comments Tue, 04 Mar 2008 00:55:00 +0000 http://www.odbms.org/odbmsblog/2008/03/04/java-object-persistence-state-of-the-union/

I have worked with Floyd Marinescu, editor of InfoQ.com, to produce a virtual panel asking a group of leading persistence solution architects for their views on the current state of the union of persistence in the Java community.

The panelists we interviewed are:

Mike Keith: EJB co-spec lead, main architect of Oracle Toplink ORM

Ted Neward: Independent consultant, often blogging on ORM and persistence topics

Carl Rosenberger: lead architect of db4objects, open source embeddable object database

Craig Russell: Formerly the spec lead of Java Data Objects (JDO) JSR, architect of entity bean engine in Sun’s appservers prior to Glassfish

The complete panel transcript is also available for free download (PDF). It is an interesting read…

Roberto V. Zicari
