Graphs vs. SQL. Interview with Michael Blaha
“For traditional business applications, the schema is known in advance, so there is no need to use a graph database which has weaker enforcement of integrity. If instead, you’re dealing with at best a generic model to which it conforms, then a schema-oriented approach does not provide much. Instead a graph-oriented approach is more natural and easier to develop against.”— Michael Blaha
Graphs, SQL and Databases. On this topic I have interviewed our expert Michael Blaha.
Q1. A lot of today’s data can be modeled as a heterogeneous set of “vertices” connected by a heterogeneous set of “edges”, people, events, items, etc. related by knowing, attending, purchasing, etc. This world view is not new as the object-oriented community has a similar perspective on data. What is in your opinion the main difference with respect to a graph-centric data world?
Michael Blaha: This world view is also not new because this is the approach Charlie Bachman took with network databases many years ago. I can think of at least two major distinguishing aspects of graph-centric databases relative to relational databases.
(1) Graph-centric databases are occurrence-oriented while relational databases are schema-oriented. If you know the schema in advance and must ensure that data conforms to it, then a schema-oriented approach is best. Examples include traditional business applications, such as flight reservations, payroll, and order processing.
(2) Graph-centric databases emphasize navigation. You start with a root object and pull together a meaningful group of related objects. Relational databases permit navigation via joins, but such navigation is more cumbersome and less natural. Many relational database developers are not adept at performing such navigation.
Q2. The development of scalable graph applications such for example for Facebook, and Twitter require different kind of databases than SQL. Most of these large Web companies have built their own internal graph databases. But what about other enterprise applications?
Michael Blaha: The key is the distinction between being occurrence-oriented and schema-oriented. For traditional business applications, the schema is known in advance, so there is no need to use a graph database which has weaker enforcement of integrity. If instead, you’re dealing with at best a generic model to which it conforms, then a schema-oriented approach does not provide much. Instead a graph-oriented approach is more natural and easier to develop against.
Q3: Marko Rodriguez and Peter Neubauer in an interview say that “the benefit of the graph comes from being able to rapidly traverse structures to an arbitrary depth (e.g., tree structures, cyclic structures) and with an arbitrary path description (e.g. friends that work together, roads below a certain congestion threshold). We call this data processing pattern, the graph traversal pattern. This mind set is much different from the set theoretic notions of the relational database world. In the world of graphs, everything is seen as a walk’s traversal”. What is your take on this?
Michael Blaha: That’s a great point and one that I should have mentioned in my answer to Q1. Relational databases have poor handling of recursion. I will note that the vendor products have extensions for this but they aren’t natural and are an awkward graft onto SQL. Graph databases, in contrast, are great with handling recursion. This is a big advantage of graph databases for applications where recursion arises.
Q4. Is there any synergy between graphs and conventional relational databases?
Michael Blaha: Graphs are also important for relational databases, and more so than some persons may realize…
— Graphs are clearly relevant for data modeling. An Entity-Relationship data model portrays the database structure as a graph.
— Graphs are also important for expressing database constraints. The OMG’s Object Constraint Language (OCL) expresses database constraints using graph traversal. The OCL is a textual language so it can be tedious to use, but it is powerful. The Common Warehouse Metamodel (CWM) specifies many fine constraints with the OCL and is a superb example of proper OCL usage.
— Even though the standard does not emphasize it, the OCL is also an excellent language for database traversal as a starting point for database queries. Bill Premerlani and I explained this in a past book (Object-Oriented Modeling and Design for Database Applications).
— Graphs are also helpful for characterizing the complexity of a relational database design. Robert Hilliard presents an excellent technique for doing this in his book (Information-Driven Business).
Q5. You say that graphs are important for data modeling, but at the end you do not store graphs in a relational database but tables, and you need joins to link them together… Graph databases in contrast cache what is on disk into memory and vendors claim that this makes for a highly reusable in-memory cache. What is your take on this?
Michael Blaha: Relational databases play many optimization games behind the covers. So in general, possible performance differences are often not obvious. I would say that the difference in expressiveness is what determines suitable applications for graph and relational databases and performance is a secondary issue, except for very specialized applications.
Q6: What are advantages of SQL relative to graph databases?
Michael Blaha: Here are some advantages of SQL:
— SQL has a widely-accepted standard.
— SQL is a set-oriented language. This is good for mass-processing of set-oriented data.
— SQL databases have powerful query optimizers for handling set-oriented queries, such as for data warehouses.
— The transaction processing behavior of relational databases (the ACID properties) are robust, powerful, and sound.
— SQL has extensive support for controlling data access.
Q7: What are disadvantages of SQL relative to graph databases?
Michael Blaha: Here are some disadvantages of SQL:
— SQL is awkward for processing the explosion of data that can result from starting with an object and traversing a graph.
SQL, at best, awkwardly handles recursion.
— SQL has lots of overhead for multi-user locking that can make it difficult to access individual objects and their data.
— Advanced and specialty applications often require less rigorous transaction processing with reduced overhead and higher throughput.
Q8: For which applications is SQL best? For which applications are graph databases best?
— SQL is schema based. Define the structure in advance and then store the data. This is a good approach for conventional data processing such as many business and financial systems.
— Graph databases are occurrence based. Store data and relationships as they are encountered. Do not presume that there is an encompassing structure. This is a good approach for some scientific and engineering applications as well as data that is acquired from Web browsers and search engines.
Q9. What about RDF quad/triple stores?
Michael Blaha: I have not paid much attention to this. RDF is an entity-attribute-value approach. From what I can tell, it seems occurrence based and not schema based and my earlier comments apply.
Michael Blaha is a partner at Modelsoft Consulting Corporation.
Blaha received his doctorate from Washington University in St. Louis, Missouri in Chemical Engineering with his dissertation being about databases. Both his academic background and working experience involve engineering and computer science. He is an alumnus of the GE R&D Center in Schenectady, New York, working there for eight years. Since 1993, Blaha has been a consultant and trainer in the areas of modeling, software architecture, database design, and reverse engineering. Blaha has authored six U.S. patents, four books, and many papers. Blaha is an editor for IEEE Computer as well as a member of the IEEE-CS publications board. He has also been active in the IEEE Working Conferences on Reverse Engineering.
Follow ODBMS.org on Twitter: @odbmsorg