Embracing the evolution of Graphs
by Stephen Dillon, Data Architect, Schneider Electric. January 2015.
Graph database technologies are poised to become even more influential in both the Big Data and Analytics communities.
There is a growing demand across many industries, and more recently in the Energy sector, for a deeper understanding of data than contemporary Row and Column store technologies provide. As our problems become increasingly complex and specialized, such as seeking semantic and associative results against static, dynamic, or kinetic data, we require more specialized solutions. Graphs are one such answer. However, tomorrow’s data storage and analytics problems will require such technologies to develop greater scalability, better-optimized search paths, and a better overall experience.
It is fair to say that, despite the renewed popularity of Graph stores in recent years, they are still widely perceived to reside in a dark corner of the NoSQL industry. Users simply have not adopted them en masse. I expect this adoption rate will change in the near future, as such technologies evolve and as data and database professionals are forced to become both knowledgeable and skilled in the Graph domain.
There are some key reasons Graph data stores have not been adopted as widely as other NoSQL data stores.
First, the Big Data community has generally lacked the knowledge and skills to work with them. We’ve simply lacked an understanding of the applications of Graphs themselves. Graph theory and Graph data stores are often rather mysterious, and one property graph database tends to look like another. It does not help that most vendors will tell you “yes, our solution can do that” regardless of what you are trying to do. So we, as data professionals, need to be better educated in order to best assess our needs.
Second, many people hold a great misunderstanding regarding data storage and data structures in general. Some will attest that you can store any type of data in any database and it shouldn’t matter. I suppose we may also contend that you can store soup in a shoebox, but we all know it’s going to get messy. It is rather surprising that it is not more obvious that a one-size-fits-all data mentality is inherently flawed.
Although we certainly can use non-graph databases, accompanied by some middleware to perform graph operations, as a bridge for the near term, we ultimately will need to embrace proper graph technologies. We must additionally develop a fundamental understanding of the Graph domain. Otherwise we cannot assess or create such technologies, or know when we need to find alternative solutions to fill the gaps.
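To make the "non-graph database plus middleware" bridge concrete, here is a minimal sketch in Python. It assumes an in-memory SQLite database stands in for the row store, and the `edge` table and the `build_edge_table` and `reachable` helpers are hypothetical names invented for this illustration. It is not a description of any particular product. The traversal is expressed as a recursive query, so every hop becomes another self-join on the edge table.

```python
import sqlite3


def build_edge_table(edges):
    """Load (src, dst) pairs into a plain relational table (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src TEXT NOT NULL, dst TEXT NOT NULL)")
    conn.executemany("INSERT INTO edge (src, dst) VALUES (?, ?)", edges)
    # Without this index every hop would scan the whole table.
    conn.execute("CREATE INDEX idx_edge_src ON edge (src)")
    return conn


def reachable(conn, start, max_hops):
    """Return (node, fewest_hops) for every node within max_hops of start.

    The recursive CTE plays the role of the middleware: each level of
    recursion is one more join against the edge table. The hop limit
    also guarantees termination when the graph contains cycles.
    """
    sql = """
        WITH RECURSIVE walk(node, hops) AS (
            SELECT ?, 0
            UNION
            SELECT e.dst, w.hops + 1
            FROM walk AS w
            JOIN edge AS e ON e.src = w.node
            WHERE w.hops < ?
        )
        SELECT node, MIN(hops)
        FROM walk
        GROUP BY node
        ORDER BY MIN(hops), node
    """
    return conn.execute(sql, (start, max_hops)).fetchall()


if __name__ == "__main__":
    # A small directed graph with a cycle: a -> b -> c -> a, plus c -> d.
    conn = build_edge_table([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])
    for node, hops in reachable(conn, "a", 3):
        print(node, hops)
```

This works, but the relational engine has no notion of adjacency. The cost of each additional hop is a join, which is one reason such an approach is better seen as a bridge than as a destination.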
I believe Graph database technology will evolve and will do so relatively soon. Regardless of any advancements in Graph theory from the academic community, I foresee such databases adopting many of the same practices their NoSQL and NewSQL brethren have implemented over recent years. These include in-memory computing and horizontal data distribution in a shared-nothing architecture. I particularly emphasize the shared-nothing aspect so as not to confuse it with horizontal distribution across a single server. After all, you can only store so much data in a scale-up solution! I firmly believe the benefits of main-memory computing transcend all types of databases, and Graph data stores are no exception.
We’ve already witnessed the significant improvements main-memory solutions such as VoltDB and Vertica have afforded Row and Column stores. There are also examples, albeit few circa 2014, of Graph DB vendors beginning to offer main-memory solutions and horizontally distributed solutions. The reality is that such evolved Graph data stores need maturing. Databases leveraging main memory should be designed memory-first, not built as an add-on to an existing architecture.
This is where Graph DB vendors would be wise to follow the VoltDB design model. Also, not all so-called horizontal data patterns are equal, and we can reasonably perceive the limits of such architectures.
There are valid arguments against applying Graphs in a horizontally scaled architecture. There are, however, many use cases that may benefit from it, much more so in the property graph domain. Data simply is not going to stop growing. However, applying property graphs to such a data pattern will incur overhead when running breadth-based traversals (or non-targeted queries) across many nodes. Nobody said it would be easy, but the storage and performance benefits for extremely large Graphs can be enormous. The good news is that such issues are already understood in the academic world and in some cases are being resolved in the distributed RDBMS domain, to which we may look for guidance.
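The traversal overhead described above can be illustrated with a small Python sketch. It assumes each vertex is owned by exactly one partition of a shared-nothing cluster and simply counts how many edges a breadth-first traversal follows within a partition versus across partitions. The `bfs_edge_costs` function and the `owner` callback are hypothetical names for this illustration, and a real system would pay for each remote edge with network traffic rather than a counter increment.

```python
from collections import deque


def bfs_edge_costs(adjacency, start, owner):
    """Breadth-first traversal that tallies local vs. cross-partition edges.

    adjacency: dict mapping a vertex to an iterable of neighbour vertices.
    owner:     callable returning the partition id that stores a vertex
               (an assumption standing in for the cluster's placement rule).
    Returns (visited_vertices, local_edges, remote_edges).
    """
    visited = {start}
    queue = deque([start])
    local_edges = 0
    remote_edges = 0
    while queue:
        vertex = queue.popleft()
        for neighbour in adjacency.get(vertex, ()):
            # Every edge followed is either served from the same partition
            # or requires a hop to whichever partition owns the neighbour.
            if owner(vertex) == owner(neighbour):
                local_edges += 1
            else:
                remote_edges += 1
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return visited, local_edges, remote_edges


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
    placement = {"a": 0, "b": 0, "c": 1, "d": 1}
    print(bfs_edge_costs(graph, "a", placement.get))
```

With the placement shown, half of the edges followed from `a` cross a partition boundary. Placing all four vertices on one partition removes the remote hops entirely, which is the trade-off between scale-up locality and scale-out capacity that the paragraph above describes.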
In conclusion, I believe that, despite the need for some evolution, Graph databases hold a lot of potential for many data analysis problems and that more people will adopt them. The hope is that they are not merely adopted but adopted in an educated way. It is important that we understand these technologies now so as to better solve the next generation of data analysis problems.
I encourage you to explore Graph theory and the supporting database technologies.