Agile Software Development with Object Databases
by Tilmann Zäschke, January 2015.
With the rise of agile software development methodologies, object databases are gaining new momentum outside their traditional markets. Personally, I gained two years of experience with object databases (first ONTOS, then Versant) in a project for the European Space Agency’s ESOC facility in Germany. After that I worked on a second project (using Versant’s ODBMS) for ESA’s research center (ESTEC) in the Netherlands. The second project involved an application with several TB of data and about 250 persistent classes. This second project in particular taught me many pitfalls of agile development with databases and how some of them can be alleviated by using an ODBMS. Much of this experience, plus additional research in the form of a PhD thesis, can be found here (http://dx.doi.org/10.3929/ethz-a-010346761).
One central feature of agile development practices is close collaboration with the end-users, ideally through frequent releases of the software. While such frequent releases to users are not necessarily common in agile projects, they should be employed whenever the project setting allows it, because they let users get familiar with the software, contribute new ideas and identify bad requirements. If the user base is small, direct communication between developers and end-users can also be a great motivation and allows more ‘agile’ development through more direct feedback and quicker reaction to changing requirements.
Frequent releases (every few weeks) imply that the users’ databases need to be evolved frequently to stay compatible with the evolving data model of the application. Unfortunately, database evolution, which consists of schema evolution and data evolution, is hard to automate and easy to get wrong.
For such frequent evolution, ODBMS have a number of advantages over traditional RDBMS. First, ODBMS have only one data model that needs to be understood (essential for the semantics of data evolution) and evolved, whereas RDBMS have a conceptual model, a separate logical model and a mapping layer. While tools exist to automate schema and data evolution, they usually support only very simple schema changes, such as adding attributes, and basic data evolution, such as changing an attribute’s type from ‘float’ to ‘int’. Anything more complex, such as externalizing an attribute into a separate class (‘String address’ -> ‘class Address’), splitting attributes (‘name’ -> ‘firstName’ + ‘lastName’) or semantic changes (‘int lengthInFeet’ -> ‘int lengthInMeter’), can rarely be automated and needs to be implemented manually.
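To illustrate why such changes need hand-written code, here is a minimal sketch of the three examples above in plain Java. The classes EntryV1, EntryV2, Address and EntryEvolution are hypothetical and exist only for this illustration; no particular ODBMS API is assumed, and the rule for splitting a name (cut at the last space) is itself an assumption that a real project would have to agree on with its users.

```java
// Hypothetical old and new versions of one persistent class.
// Plain Java only; no specific ODBMS API is assumed.
final class EntryV1 {
    final String name;        // e.g. "Ada Lovelace"
    final String address;     // free-text address
    final int lengthInFeet;

    EntryV1(String name, String address, int lengthInFeet) {
        this.name = name;
        this.address = address;
        this.lengthInFeet = lengthInFeet;
    }
}

// The externalized attribute: 'String address' becomes 'class Address'.
final class Address {
    final String text;

    Address(String text) {
        this.text = text;
    }
}

final class EntryV2 {
    final String firstName;
    final String lastName;
    final Address address;
    final int lengthInMeter;

    EntryV2(String firstName, String lastName, Address address, int lengthInMeter) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.address = address;
        this.lengthInMeter = lengthInMeter;
    }
}

final class EntryEvolution {
    // Converts one old object into its new form.
    static EntryV2 evolve(EntryV1 old) {
        // Split attribute. Assumption: the last space separates first and last name.
        String name = old.name.trim();
        int cut = name.lastIndexOf(' ');
        String firstName = cut < 0 ? "" : name.substring(0, cut);
        String lastName = cut < 0 ? name : name.substring(cut + 1);

        // Externalize attribute: wrap the old string in its own object.
        Address address = new Address(old.address);

        // Semantic change: 1 ft = 0.3048 m. Rounding to int loses precision,
        // which is exactly the kind of subtle data loss that needs review.
        int lengthInMeter = (int) Math.round(old.lengthInFeet * 0.3048);

        return new EntryV2(firstName, lastName, address, lengthInMeter);
    }
}
```

Each of the three steps embeds a decision about the meaning of the data, which is why a generic tool cannot derive them from the schema difference alone.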
What makes things even worse is that the development of database evolution ‘scripts’ is never ‘agile’, even in agile projects. The usual incremental approach cannot be applied to database evolution scripts because they need to be developed from scratch every few weeks. Furthermore, the quality requirements are higher than for other code, because loss of data (especially when it is subtle and noticed only much later) cannot be tolerated. In my experience, while ODBMS cannot solve these problems, they simplify the evolution task and let the developer focus on getting evolution right.
Another advantage of object databases is the ‘navigation’ feature that lets an application traverse references between objects. ‘Navigation’ reduces the need for queries and, because navigation code can be refactored mostly automatically by modern IDEs, it also reduces the need for manual query refactoring. Anything that cannot be refactored automatically will result in compile-time errors (which is much better than runtime errors from incompatible queries).
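The difference between navigation and a string-based query can be sketched as follows. The classes Order, Customer and City and the query text are hypothetical and only serve as an illustration; the query string is not tied to any particular database or query language dialect.

```java
// Hypothetical persistent classes connected by references.
final class City {
    private final String name;

    City(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

final class Customer {
    private final City city;

    Customer(City city) {
        this.city = city;
    }

    City getCity() {
        return city;
    }
}

final class Order {
    private final Customer customer;

    Order(Customer customer) {
        this.customer = customer;
    }

    Customer getCustomer() {
        return customer;
    }
}

final class NavigationDemo {
    // Navigation: ordinary method calls along object references.
    // An IDE rename of getCity() updates this line automatically, and a
    // rename that misses it is reported by the compiler.
    static String cityOf(Order order) {
        return order.getCustomer().getCity().getName();
    }

    // Query: to the compiler this is just a string. If the 'city' attribute
    // is renamed, nothing fails until the query is executed at runtime.
    static final String CITY_QUERY =
            "SELECT c.city FROM orders o JOIN customers c ON o.customer_id = c.id";
}
```

For example, NavigationDemo.cityOf(new Order(new Customer(new City("Darmstadt")))) returns "Darmstadt" without any query being involved.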
Furthermore, the tendency of ODBMS to not support UPDATE or WRITE in queries can be seen as an advantage, because updates need to be written in an OO-language, which means they will be encapsulated in classes where they are difficult to miss. This should prevent inconsistent databases resulting from UPDATE queries which we may have forgotten to evolve. As with ‘navigation’, these read-only queries force functionality into the programming language where refactoring is partly automated and thus less likely to be done wrong.
Finally, it seems that profiling and performance-related refactoring of the conceptual data model is much less harmful than intuition may tell us, provided we apply some constraints. First, we limit refactorings to split-class, merge-class, externalize-attribute, remove-middle-man and similar operations. Then we ensure that we profile only navigation paths (sequences of accessed class fields, including references) within transaction boundaries. As a result, performance-related refactorings often coincide with semantically and conceptually meaningful refactorings. In simple terms this can be explained by the fact that data which is accessed in a single navigation path, especially inside a single transaction, often has a semantic connection. The strength of this semantic connection is directly related to the frequency with which the data is accessed together. Therefore, refactorings that optimise and shorten access paths to data appear likely to make semantic sense on the conceptual level of an application.
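The idea of profiling navigation paths within transaction boundaries can be sketched with a small counter. This is an illustration under stated assumptions, not the profiler used in the projects described here: the class PathProfiler and its methods begin, access, commit, count and hottestPath are hypothetical, and a real profiler would hook into the database's transaction and field-access mechanisms rather than being called by hand.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical profiler: records which fields are accessed, in order,
// inside each transaction and counts how often each path occurs.
final class PathProfiler {
    private final Map<String, Integer> pathCounts = new HashMap<>();
    private List<String> current; // accesses of the open transaction, or null

    void begin() {
        current = new ArrayList<>();
    }

    // Records one field access, written as "Class.field".
    void access(String classDotField) {
        if (current == null) {
            throw new IllegalStateException("no open transaction");
        }
        current.add(classDotField);
    }

    // Closes the transaction and counts its navigation path once.
    void commit() {
        if (current == null) {
            throw new IllegalStateException("no open transaction");
        }
        String path = String.join(" -> ", current);
        pathCounts.merge(path, 1, Integer::sum);
        current = null;
    }

    int count(String path) {
        return pathCounts.getOrDefault(path, 0);
    }

    // The most frequently observed path, if any transaction was recorded.
    Optional<String> hottestPath() {
        return pathCounts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }
}
```

A path that shows up with a high count, such as a hypothetical "Order.customer -> Customer.address", marks data that is frequently accessed together and is therefore a candidate for one of the refactorings listed above, for example merging classes or removing a middle man.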
In summary, my experience shows that ODBMS can have numerous advantages over RDBMS, especially in agile projects with frequent user releases and complex data models. Even in cases where ODBMSs would not be the ‘technically’ best choice for a given project, I would still consider them because they can avoid headaches by reducing complexity and thus effectively reduce development time.