How good is UML for Database Design? Interview with Michael Blaha.
“The tools are not good at taking changes to a model and generating incremental design code to alter a populated database.” – Michael Blaha
The Unified Modeling Language™ – UML – is OMG’s most-used specification.
UML is a de facto standard for object modeling, and it is often used for database design as well. But how good is UML really for the task of database conceptual modeling?
I asked a few questions to Dr. Michael Blaha, one of the leading authorities on databases and data modeling.
Q1. Why use UML for database design?
Blaha: Often the most difficult aspect of software development is abstracting a problem and thinking about it clearly — that is the purpose of conceptual data modeling.
A conceptual model lets developers think deeply about a system, understand its core essence, and then choose a proper representation. A sound data model is extensible, understandable, ready to implement, less prone to errors, and usually performs well without special tuning effort.
The UML is a good notation for conceptual data modeling. The representation stands apart from implementation choices, be it a relational database, object oriented database, files, or some other mechanism.
Q2. What are the main weaknesses of UML for database design? And how do you cope with them in practice?
Blaha: First consider object databases. The design of object database code is similar to the design of OO programming code. The UML class model specifies the static data structure. The most difficult implementation issue is the weak support in many object database engines for associations. The workaround depends on object database features and the application architecture.
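As an illustration of the kind of workaround involved, here is a minimal sketch of a one-to-many association maintained by hand in application code. It uses plain Python objects as stand-ins for persistent objects; the `Department` and `Employee` classes are hypothetical and do not correspond to any particular object database's API.

```python
class Department:
    def __init__(self, name):
        self.name = name
        self._employees = []  # "many" end of the association

    @property
    def employees(self):
        # Read-only view, so callers cannot update one end only.
        return tuple(self._employees)


class Employee:
    def __init__(self, name):
        self.name = name
        self._department = None  # "one" end of the association

    @property
    def department(self):
        return self._department

    @department.setter
    def department(self, new_dept):
        # Update both ends together so they never disagree.
        if self._department is not None:
            self._department._employees.remove(self)
        self._department = new_dept
        if new_dept is not None:
            new_dept._employees.append(self)


sales = Department("Sales")
support = Department("Support")
ann = Employee("Ann")
ann.department = sales
ann.department = support  # moving Ann also removes her from Sales
```

The design choice here is to funnel every change through a single setter, so the two ends of the association cannot drift apart.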
Now consider relational databases. Relational database tools do not support the UML. There is no technical reason for this, but there are several cultural reasons. One is that there is a divide between the programming and database communities; each has its own jargon, style, and history and pays little attention to the other.
Also the UML creators focused on unifying programming notation, but spent little time talking to the database community.
The bottom line is that the relational database tools do not support the UML and the UML tools do not support relational databases. In practice, I usually construct a conceptual model with a UML tool (so that I can think deeply and abstractly).
Then I rekey the model into a database tool (so that I can generate schema).
Q3. Even if you have a sound high-level UML design, what else can go wrong?
Blaha: I do lots of database reverse engineering for my consulting clients, mostly for relational database applications because that’s what’s used most often in practice. I start with the database schema and work backwards to a conceptual model. I published a paper 10 years ago with statistics for what does go wrong.
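The first step of working backwards from a schema can be sketched as follows. This is only an illustration, assuming an SQLite database with declared foreign keys; the `department` and `employee` tables and the `recover_model` helper are hypothetical. Schemas that do not declare their foreign keys would need more inference than this sketch does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (
        department_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        department_id INTEGER REFERENCES department(department_id)
    );
""")


def recover_model(conn):
    """Return ({table: [column names]}, [(child, fk_column, parent)])."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    entities, associations = {}, []
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        entities[table] = [
            row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            associations.append((table, fk[3], fk[2]))
    return entities, associations


entities, associations = recover_model(conn)
```

Each table becomes a candidate entity and each foreign key a candidate association; judging whether those candidates form a sound conceptual model remains a manual task.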
In practice, I would say that about 25% of applications have a solid conceptual model, 50% have a mediocre conceptual model, and 25% are just downright awful. Given that a conceptual model is the foundation for an application, you can see why many applications go awry.
In practice, about 50% of applications have a professional database design and 50% are substantially flawed. It’s odd to see so many database design mistakes, given the wide availability of database design tools. It’s relatively easy to take a conceptual model and generate a database design. This illustrates that the importance of software engineering has not reached many developers.
Of course, there can always be flaws in programming logic and user interface code, but these kinds of flaws are easier to correct if there is a sound conceptual model underlying the application and if the model is implemented well with a database schema.
Q4. And specifically for object databases?
Blaha: An object database is nothing special when it comes to the benefits of a sound model and software engineering practice. A carefully considered conceptual model gives you a free hand to choose the most appropriate development platform.
One of my past books (Object-Oriented Modeling and Design for Database Applications) vividly illustrated this point by driving object-oriented data models into different implementation targets, specifically relational databases, object databases, and flat files.
Q5. What are most common pitfalls?
Blaha: It is difficult to construct a robust conceptual model. A skilled modeler must quickly learn the nuances of a problem domain and be able to meld problem content with data abstractions and data patterns.
Another pitfall is failing to practice agile development. Developers must work quickly, deliver often, obtain feedback, and build on prior results to evolve an application. I have seen too many developers not take the principles of agile development to heart and become bogged down by ponderous development of interminable scope.
Another pitfall is that some developers are sloppy with database design. Nowadays there really is no excuse for that, as tools can generate database code. Object-oriented CASE tools can generate programming stubs that can seed an object database.
For relational database projects, I first construct an object-oriented model, then re-enter the design into a relational database tool, and finally generate the database schema. (The UML data modeling notation is nearly isomorphic with the modeling language in most relational database design tools.)
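To make the model-to-schema step concrete, here is a minimal sketch of generating relational DDL from a tiny class model. It is not how any particular design tool works. The `ModelClass` and `Association` types, the `to_ddl` function, and the naming rule (`<class>_id` keys) are all assumptions made for illustration, and only one-to-many associations are handled.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class ModelClass:
    name: str
    attributes: dict  # attribute name -> SQL type


@dataclass
class Association:
    many_end: str  # class on the "many" side
    one_end: str   # class on the "one" side


def to_ddl(classes, associations):
    """Map each class to a table and each one-to-many association to a foreign key."""
    statements = []
    for cls in classes:
        columns = [f"{cls.name}_id INTEGER PRIMARY KEY"]
        columns += [f"{attr} {sql_type}"
                    for attr, sql_type in cls.attributes.items()]
        # The foreign key is buried in the table on the "many" side.
        for assoc in associations:
            if assoc.many_end == cls.name:
                columns.append(
                    f"{assoc.one_end}_id INTEGER "
                    f"REFERENCES {assoc.one_end}({assoc.one_end}_id)")
        body = ",\n    ".join(columns)
        statements.append(f"CREATE TABLE {cls.name} (\n    {body}\n);")
    return statements


classes = [
    ModelClass("department", {"name": "TEXT NOT NULL"}),
    ModelClass("employee", {"name": "TEXT NOT NULL"}),
]
associations = [Association(many_end="employee", one_end="department")]

ddl = to_ddl(classes, associations)
conn = sqlite3.connect(":memory:")
conn.executescript("\n".join(ddl))  # the generated schema is valid SQLite DDL
```

Because the mapping is this mechanical for the simple cases, the quality of the result depends mostly on the conceptual model that feeds it.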
Q6. In your experience, how do you handle the situation where a UML conceptual database design is done and a database is implemented from it, but later updates to the implementation are made without considering the original conceptual design? What should be done in such cases?
Blaha: The more common situation is that an application gradually evolves and the software engineering documentation (such as the conceptual model) is not kept up to date.
With a lack of clarity for its intellectual focus, an application gradually degrades. Eventually there has to be a major effort to revamp the application and clean it up, or replace the application with a new one.
The database design tools are good at taking a model and generating the initial database design.
The tools are not good at taking changes to a model and generating incremental design code to alter a populated database.
Thus much manual effort is needed to make changes as an application evolves and keep documentation up to date. However, the alternative of not doing so is an application that eventually becomes a mess and is unmaintainable.
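The difference between initial generation and incremental change can be sketched with a toy example. This is an illustration under stated assumptions, not a description of any tool: the models are plain dictionaries, `added_column_ddl` is a hypothetical helper, and the target is SQLite. It covers only the easiest case, adding a nullable column. Renames, splits, type changes, and data backfills are not handled and are where manual effort is typically needed.

```python
import sqlite3

# Two versions of a (hypothetical) model: table -> {column: SQL type}
old_model = {"employee": {"employee_id": "INTEGER PRIMARY KEY",
                          "name": "TEXT NOT NULL"}}
new_model = {"employee": {"employee_id": "INTEGER PRIMARY KEY",
                          "name": "TEXT NOT NULL",
                          "email": "TEXT"}}


def added_column_ddl(old, new):
    """Emit ALTER TABLE statements for columns added to existing tables."""
    statements = []
    for table, columns in new.items():
        if table not in old:
            continue  # brand-new tables need CREATE TABLE; out of scope here
        for column, sql_type in columns.items():
            if column not in old[table]:
                statements.append(
                    f"ALTER TABLE {table} ADD COLUMN {column} {sql_type}")
    return statements


# A populated database built from the old model.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO employee (name) VALUES ('Ann')")

migration = added_column_ddl(old_model, new_model)
for statement in migration:
    conn.execute(statement)

# Existing rows survive; the new column is NULL for them.
rows = conn.execute("SELECT employee_id, name, email FROM employee").fetchall()
```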
Michael Blaha is a partner at Modelsoft Consulting Corporation.
Dr. Blaha is recognized as one of the world’s leading authorities on databases and data modeling. He has more than 25 years of experience as a consultant and trainer in conceiving, architecting, modeling, designing, and tuning databases for dozens of major organizations around the world. He has authored six U.S. patents, six books, and many papers. Dr. Blaha received his doctorate from Washington University in St. Louis and is an alumnus of GE Global Research in Schenectady, New York.
– Object-Oriented Design of Database Stored Procedures, By Michael Blaha, Bill Huth, Peter Cheung
– Models, By Michael Blaha
– Universal Antipatterns, By Michael Blaha
– Patterns of Data Modeling (Database Systems and Applications), By Michael Blaha, CRC Press, May 2010, ISBN 1439819890
Regardless of the hype, UML is just a modeling notation with some semantics behind it. As I showed at http://www.agiledata.org/essays/umlDataModelingProfile.html it’s pretty straightforward to extend UML for physical data modeling as well as conceptual modeling.
UML is an excellent modeling notation for the design and development of database-based applications. We have been using it for years now, with our own tool that lets us transform a UML conceptual model into code for an application. This way we do not lose the power of UML and do not introduce errors by “translating” this model into a notation for yet another tool. The model has to stay the base from which to start and to which all changes must be applied to keep it consistent and up to date through the application life cycle.
I definitely agree with this article. I recognize the importance of conceptual models expressed as UML class diagrams, which can be compatible with the “Entity-Relationship” model. Furthermore, CWM (Common Warehouse Metamodel), which is also an OMG specification, provides an ER model representation in a UML profile.