On Graph Databases. Q&A with Jim Webber
Q1. What are your current projects at Neo4j?
> I’m the “bookends” person at Neo4j. Part of my job is working with academia, where we’re doing intensive research on transaction processing, query languages, and so on. The other part of my job is working with customers who are using the product right now. The challenge is no less than the long-term research work, but the immediacy is incredible. The mixture of the two streams helps shape products that are fit for purpose now and in the future.
Q2. You recently released a book Graph Databases For Dummies. For whom is this book relevant?
> Graph Databases For Dummies was written for curious technical or managerial readers who want to get started with, or want to help others get started with, graph database technology.
My co-author (Rik Van Bruggen) and I thought hard about what we would want to see in a beginner’s book, based on our own experiences. There were plenty of useful information sources around for technically experienced folks (including our previous books), but there was a yawning gap for people completely new to the area. We wanted to fix that with a book that you could read in an evening.
Q3. What are the benefits in gaining some graph knowledge in business?
> Graphs are a really natural way of thinking about data. “This thing connects to that thing” is widely applicable and easy to understand. Multiply that up, and you can create some very sophisticated models. The increase in today’s data volume and connectedness presents an excellent opportunity for sustained competitive advantage for businesses.
Historically, you’d have to take a graph representation and twist it into other forms like tables or documents. But now we have graph databases, purpose-built to handle connected data so that impedance mismatch has gone, and along with it – I think – any excuse for not learning graphs.
Having the ability to store and query graphs allows businesses to exploit relevant and timely data and cut through complexity. What’s more, graphs are inherently horizontal. Any effort invested in them in one vertical can be ported over to another – from healthcare to finance and energy to disaster response. At Neo4j, we work with customers like Comcast, eBay, NASA, UBS, and Volvo to solve the most challenging and valuable data problems.
Q4. If somebody is just diving into graphs, what are the main challenges in mastering graph databases?
> I fell into the graph universe over a decade ago. Back then, the biggest challenge was to unlearn what we’d learned about relational databases and accept that graphs are different. That unlearning curve plagued many of my early projects with a sense of anxiety – am I doing this right? What’s the equivalent of normal forms? Where do I go to learn more? Others who’ve been around the block expressed the same.
Those challenges are less of a problem now. We have great books, blog posts, whitepapers, conferences, tutorials, and videos that help folks at all stages of their mastery. But the scope of graphs has expanded from databases to visualizations, analytics, and machine learning, which, taken as a whole, can be daunting.
I’d suggest taking it a step at a time. Pick a transactional graph problem that you understand and that is valuable to you, and solve that. You’re still building an app with your standard toolkit; only the database and data model are different, so you can work with confidence. Once you’re happy with modeling and querying, then you can take steps to feed your graph into analytics and machine learning pipelines. But you don’t have to bite it all off at once.
Q5. What is the difference between a graph and a graph database?
> A graph comes from math. It’s a way of structuring and reasoning about values (data) and how they relate. As it happens, we’ve been informally using these ideas in software for ages. Whenever we design systems, we go to the whiteboard and sketch circles and arrows. That’s a graph, and it’s one reason software folks are so comfortable with graphs (and often less comfortable with other parts of math!).
A graph database, then, safely stores and efficiently queries graphs. That’s Neo4j’s job – to take the nodes and relationships and properties in your domain model and make sure they’re redundantly stored and conveniently queried to power your applications and systems.
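To make the "nodes and relationships and properties" vocabulary concrete, here is a minimal in-memory sketch in Python. It is an illustration only and says nothing about how Neo4j stores or queries data; the class names (`Node`, `Relationship`, `PropertyGraph`), the method names, and the sample people and company are all assumptions invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node: an identity, some labels, and arbitrary key/value properties."""
    node_id: int
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed link between two nodes, with its own properties."""
    start: int
    end: int
    rel_type: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical toy container; a real graph database adds durability,
    transactions, indexing, and a query language on top of this shape."""

    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, node_id, labels, **properties):
        node = Node(node_id, set(labels), dict(properties))
        self.nodes[node_id] = node
        return node

    def relate(self, start, end, rel_type, **properties):
        # Refuse dangling relationships: both endpoints must already exist.
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both endpoints must exist before relating them")
        rel = Relationship(start, end, rel_type, dict(properties))
        self.relationships.append(rel)
        return rel

    def neighbours(self, node_id, rel_type):
        # Follow outgoing relationships of one type from the given node.
        return [
            self.nodes[rel.end]
            for rel in self.relationships
            if rel.start == node_id and rel.rel_type == rel_type
        ]


graph = PropertyGraph()
graph.add_node(1, ["Person"], name="Alice")
graph.add_node(2, ["Person"], name="Bob")
graph.add_node(3, ["Company"], name="Acme")
graph.relate(1, 2, "KNOWS", since=2015)
graph.relate(1, 3, "WORKS_AT")

known_names = [node.properties["name"] for node in graph.neighbours(1, "KNOWS")]
print(known_names)
```

The whiteboard sketch of circles and arrows maps directly onto this shape: each circle is a `Node`, each arrow a `Relationship`, and the annotations written beside them are properties.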
Q6. What are the pros and cons of using a graph database?
> Graphs are an expressive data type and can deal with the complexities, irregularities, and contradictions of modern business. Users of graphs experience excellent performance (particularly compared to other forms of databases) and find that their results are high quality. Moreover, graphs are the natural underlay for ML systems. By using the two together, businesses can expect accurate, high-value transactional, analytical, and learned results.
The cons are relatively few. There is a modest intellectual cost to learning graphs and the supporting technology stack, and you might not have a graph problem in the first place. I don’t think these are huge drawbacks since so many data problems are graphs. But if you have one that isn’t, it’s probably best to use a different technology.
Q7. When is it appropriate to use a graph database to solve a specific problem, and when is it not? Any guidelines you could give us?
> The key to understanding when to use a graph database is the value of links. If your data is connected, then a graph is a good choice. This is often expressed in the language of the domain, where words like “path,” “pathway,” “link,” “relationship,” or “connection” are prevalent.
On the other hand, if your data is bulk storage, blob storage, time series, or logs, then a graph may not be the best choice because there aren’t many links between the data to exploit. Graphs are a general-purpose data model, but they aren’t the only useful one.
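One way to picture the "value of links" test is to ask whether your questions sound like "how is this connected to that?". The sketch below answers such a question with a plain breadth-first search over a small adjacency mapping in Python. It is a generic textbook technique, not a description of how any graph database executes queries, and the function name `shortest_path` and the sample names are assumptions made up for the example.

```python
from collections import deque


def shortest_path(links, start, goal):
    """Return a path with the fewest hops from start to goal, or None.

    `links` maps each item to the items it points at (directed links).
    """
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk the back-pointers to rebuild the path, then reverse it.
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbour in links.get(current, []):
            if neighbour not in previous:
                previous[neighbour] = current
                queue.append(neighbour)
    return None


# Hypothetical "who can introduce me to whom" data.
links = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Dave"],
    "Carol": ["Dave"],
    "Dave": ["Erin"],
}

path = shortest_path(links, "Alice", "Erin")
print(path)
```

If a question like this is central to your domain, the links carry value and a graph is likely a good fit. If every record in `links` would map to an empty list, as with isolated log lines or blobs, there is nothing for a traversal to exploit.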
Dr. Jim Webber is Neo4j’s Chief Scientist and Visiting Professor at Newcastle University. At Neo4j, Jim works on fault-tolerant graph databases and co-wrote Graph Databases (1st and 2nd editions, O’Reilly) and Graph Databases For Dummies (Wiley).
Prior to Neo4j, Jim worked on fault-tolerant distributed systems, first at Newcastle University startup Arjuna and then for a variety of clients at global consulting firm ThoughtWorks. Along the way, Jim co-authored the books REST in Practice (O’Reilly) and Developing Enterprise Web Services – An Architect’s Guide (Prentice Hall).
Jim is active in the software development and database research communities, presenting regularly around the world. His blog is located at https://jimwebber.org and he tweets sometimes at @jimwebber.