Graph Databases for Beginners: Why a Database Query Language Matters
By Bryce Merkl Sasaki, Aspiring Graphista, Neo Technology | August 21, 2015
Finding the best database for your application or development stack is about more than just features, scalability and performance. While all of those are essential, there’s another element of a graph database too many architects overlook: the database query language.
Most relational databases (RDBMS) use a variant of SQL (Structured Query Language), making SQL the de facto database query language amongst most data professionals. But with the advent of graph databases that are more efficient than relational databases, it’s time for a corresponding shift to a more powerful query language.
In this “Graph Databases for Beginners” blog series, I’ll take you through the basics of graph technology assuming you have little (or no) background in the space. In past weeks, we’ve tackled why graphs are the future, why data relationships matter, the basics of data modeling and how to avoid the most common (and fatal) data modeling mistakes.
This week, we’ll discuss why a database query language matters – even if you’re not a developer.
Why We Need Query Languages
Up to this point in our beginner’s series, all of our database models have been in the form of diagrams like the one below.
Graph diagrams like this one are perfect for describing a graph database outside of any technology context. However, when it comes to actually using a database, every developer, architect and business stakeholder needs a concrete mechanism for creating, manipulating and querying data. That is, we need a query language.
Up until now, the query language used by developers and data architects (i.e., SQL) was too arcane and esoteric to be understood by business decision makers. But just as graph databases have made the modeling process more understandable for the uninitiated, so has a graph database query language made it easier than ever for the common person to understand and create their own queries.
Why Linguistic Efficiency Matters
If you’re not a techie, you might be wondering why a database query language matters at all. After all, if query languages are anything like natural human languages, then shouldn’t they all be able to ultimately communicate the same point with just a few differences in phrasing? The answer is both yes and no.
Let’s consider a natural language example. In English, you might say, “I used to enjoy after-dinner conversation” while reminiscing about your childhood. In Spanish, this same phrase is written as, “Disfrutaba sobremesa.” Both languages express the same idea, but one is far more efficient at communicating it.
When it comes to a query language, the linguistics of efficiency are similar. A single query in SQL can be many lines longer than the same query in a graph database query language like Cypher. (Here’s one great example of efficient mapping from a natural language to Cypher.)
Lengthy queries not only tend to take more time to write and run, but they are also more likely to include human coding mistakes because of their complexity. In addition, shorter queries are easier for your whole team of developers to understand and maintain. For example, imagine an outside developer picking through a complicated query to figure out the original developer’s intent: trouble would certainly ensue.
But what level of efficiency gains are we talking about between SQL queries and graph queries? How much more efficient is one versus another? The answer: Efficient enough to make a significant difference to your business.
The efficiency of graph queries means they run in real time, and in an economy that runs at the speed of a single tweet, that’s a bottom-line difference you can’t afford to ignore.
The Intimate Relationship between Modeling and Querying
Before diving into the mechanics of a graph database query language below, it’s worth noting that a query language isn’t just about asking (a.k.a. querying) the database for a particular set of results; it’s also about modeling that data in the first place.
We know from previous posts that data modeling for a graph database is as easy as connecting circles and lines on a whiteboard. What you sketch on the whiteboard is what you store in the database.
On its own, this ease of modeling has many business benefits, the most obvious of which is that you can understand what the hell your database developers are actually creating. But there’s more to it: An intuitive model built with the right query language ensures there’s no mismatch between how you built the data and how you analyze it.
A query language represents its model closely. That’s why SQL is all about tables and joins while Cypher is about relationships between entities. As much as the graph model is more natural to work with, so is Cypher as it borrows from the pictorial representation of circles connected with arrows which even a child can understand.
In a relational database, the data modeling process is so far abstracted from actual day-to-day SQL queries that there’s a major disparity between analysis and implementation. In other words, the process of building a relational database model isn’t fit for asking (and answering) questions efficiently from that same model.
Graph database models, on the other hand, not only communicate how your data is related, but they also help you clearly communicate the kinds of questions you want to ask of your data model. Graph models and graph queries are just two sides of the same coin.
The right database query language helps us traverse both sides.
An Introduction to Cypher, the Graph Database Query Language
It’s time to dive into specifics. While most relational databases use a form of SQL as their query language, the graph database world is more varied so we’ll look specifically at a single graph database query language: Cypher.
Although currently specific to Neo4j, Cypher’s close affinity with our habit of representing graphs as diagrams makes it ideal for describing graphs. Cypher is arguably the easiest graph query language to learn, and once you understand Cypher, it becomes very easy to branch out and learn other graph query languages.
This introduction isn’t a reference document for Cypher but merely a high-level overview.
Cypher is designed to be easily read and understood by developers, database professionals and business stakeholders alike. It’s easy to use because it matches the way we intuitively describe graphs using diagrams.
The basic notion of Cypher is that it allows you to ask the database to find data that matches a specific pattern. Colloquially, we might ask the database to “find things like this,” and the way we describe what “things like this” look like is to draw them using ASCII art.
Consider the simple pattern in the figure below.
This graph diagram describes three mutual friends.
If we want to express the pattern of this basic graph in Cypher, we would write:
(emil)<-[:KNOWS]-(jim)-[:KNOWS]->(ian)-[:KNOWS]->(emil)
This Cypher statement describes a path which forms a triangle that connects a node we call jim to the two nodes we call ian and emil, and which also connects the ian node to the emil node. As you can see, Cypher naturally follows the way we draw graphs on the whiteboard.
Now, while this Cypher pattern describes a simple graph structure it doesn’t yet refer to any particular data in the database. To bind the pattern to specific nodes and relationships in an existing dataset we first need to specify some property values and node labels that help locate the relevant elements in the dataset.
Here’s our more fleshed-out query:
(emil:Person {name:'Emil'})
  <-[:KNOWS]-(jim:Person {name:'Jim'})
  -[:KNOWS]->(ian:Person {name:'Ian'})
  -[:KNOWS]->(emil)
Here we’ve bound each node to its identifier using its name property and Person label. The emil identifier, for example, is bound to a node in the dataset with a label Person and a name property whose value is Emil. Anchoring parts of the pattern to real data in this way is normal Cypher practice.
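To try patterns like this yourself, you first need some data to match against. Here’s a minimal sketch of how the three-person graph could be created, assuming you’re starting from an empty Neo4j database. The labels, names and relationships come straight from the pattern above; nothing else is assumed.

```cypher
// Create three Person nodes, then connect them with KNOWS relationships.
// Identifiers (emil, jim, ian) only live for the duration of this statement.
CREATE (emil:Person {name: 'Emil'}),
       (jim:Person {name: 'Jim'}),
       (ian:Person {name: 'Ian'}),
       (jim)-[:KNOWS]->(emil),
       (jim)-[:KNOWS]->(ian),
       (ian)-[:KNOWS]->(emil)
```

Keep in mind that CREATE always adds new nodes and relationships, so running this statement twice would leave you with duplicates.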
The Beginner’s Guide to Cypher Clauses
(Disclaimer: This section is still for beginners, but it’s definitely developer-oriented. If you’re just curious about database query languages in general, skip to the “Other Query Languages” section below for a nice wrap-up.)
Like most query languages, Cypher is composed of clauses.
The simplest queries consist of a MATCH clause followed by a RETURN clause. Here’s an example of a Cypher query that uses these two clauses to find the mutual friends of a user named Jim:
MATCH (a:Person {name:'Jim'})-[:KNOWS]->(b)-[:KNOWS]->(c),
      (a)-[:KNOWS]->(c)
RETURN b, c
Let’s look at each clause in further detail:
MATCH
The MATCH clause is at the heart of most Cypher queries.
Using ASCII characters to represent nodes and relationships, we draw the data we’re interested in. We draw nodes with parentheses, just like in these examples from the query above:
(a:Person {name:'Jim'}) (b) (c) (a)
We draw relationships using pairs of dashes with greater-than or less-than signs (--> and <--), where the < and > signs indicate relationship direction. Between the dashes, relationship names are enclosed by square brackets and prefixed by a colon, like in this example from the query above:
-[:KNOWS]->
Node labels are also prefixed by a colon. As you see in the first node of the query, Person is the applicable label.
(a:Person … )
Node (and relationship) property key-value pairs are then specified within curly braces, like in this example:
( … {name:'Jim'})
In our original example query, we’re looking for a node labeled Person with a name property whose value is Jim. The return value from this lookup is bound to the identifier a. This identifier allows us to refer to the node that represents Jim throughout the rest of the query.
It’s worth noting that this pattern
(a)-[:KNOWS]->(b)-[:KNOWS]->(c), (a)-[:KNOWS]->(c)
could, in theory, occur many times throughout our graph data, especially in a large user set.
To confine the query, we need to anchor some part of it to one or more places in the graph. In specifying that we’re looking for a node labeled Person whose name property value is Jim, we’ve bound the pattern to a specific node in the graph: the node representing Jim.
Cypher then matches the remainder of the pattern to the graph immediately surrounding this anchor point based on the provided information on relationships and neighboring nodes. As it does so, it discovers nodes to bind to the other identifiers. While a will always be anchored to Jim, b and c will be bound to a sequence of nodes as the query executes.
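The anchor doesn’t have to live inside the node pattern itself. As a sketch, the same lookup can be written with the anchor moved into a WHERE clause, which should match the same nodes as the inline {name:'Jim'} form:

```cypher
// Describe the triangle shape first, without naming anyone.
MATCH (a:Person)-[:KNOWS]->(b)-[:KNOWS]->(c),
      (a)-[:KNOWS]->(c)
// Then anchor the pattern to the node representing Jim.
WHERE a.name = 'Jim'
RETURN b, c
```

Which form you pick is mostly a matter of readability: inline properties keep short patterns compact, while WHERE leaves room for more complex conditions.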
RETURN
This clause specifies which expressions, relationships and properties in the matched data should be returned to the client. In our example query, we’re interested in returning the nodes bound to the b and c identifiers.
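RETURN can also hand back individual properties rather than whole nodes. The sketch below returns only names and uses AS to label the result columns; the column names friend and mutualFriend are made up for illustration.

```cypher
MATCH (a:Person {name:'Jim'})-[:KNOWS]->(b)-[:KNOWS]->(c),
      (a)-[:KNOWS]->(c)
// Return one property from each matched node, with readable column names.
RETURN b.name AS friend, c.name AS mutualFriend
```

Assuming the graph holds only the three KNOWS relationships from the triangle pattern earlier (Jim knows Emil, Jim knows Ian, Ian knows Emil), this should yield a single row: Ian as friend and Emil as mutualFriend.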
Other Cypher Clauses
Other clauses you can use in a Cypher query include:
WHERE
Provides criteria for filtering pattern matching results.
CREATE and CREATE UNIQUE
Create nodes and relationships.
MERGE
Ensures that the supplied pattern exists in the graph, either by reusing existing nodes and relationships that match the supplied predicates, or by creating new nodes and relationships.
DELETE/REMOVE
Removes nodes, relationships, and properties.
SET
Sets property values and labels.
ORDER BY
Sorts results as part of a RETURN.
SKIP and LIMIT
Skip results at the top and limit the number of results.
FOREACH
Performs an updating action for each element in a list.
UNION
Merges results from two or more queries.
WITH
Chains subsequent query parts and forwards results from one to the next. Similar to piping commands in Unix.
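To give a feel for how a few of these clauses combine, here are two short sketches. Both assume Person nodes connected by KNOWS relationships, as in the examples above; the friendCount alias and the title property are hypothetical names chosen for illustration.

```cypher
// Reading: WITH pipes an aggregate into a second WHERE, then the results
// are sorted and trimmed.
MATCH (p:Person)-[:KNOWS]->(friend)
WITH p, count(friend) AS friendCount
WHERE friendCount > 1
RETURN p.name AS name, friendCount
ORDER BY friendCount DESC
LIMIT 5;

// Updating: MERGE reuses the Person node named Jim if it already exists,
// or creates it if it doesn't. SET then adds or overwrites a property.
MERGE (jim:Person {name: 'Jim'})
SET jim.title = 'Developer'
RETURN jim;
```

The semicolons separate the two statements so they can be pasted together into a tool that accepts multiple statements; each one also runs fine on its own.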
If these clauses look familiar – especially if you’re a SQL developer – that’s great! Cypher is intended to be easy-to-learn for SQL veterans while also being easy for beginners. (Click here for the most up-to-date Cypher Refcard to take a deeper dive into the Cypher query language.)
At the same time, Cypher is different enough to emphasize that we’re dealing with graphs, not relational sets.
Other Query Languages
Cypher isn’t the only graph database query language; other graph databases have their own means of querying data as well. Many, including Neo4j, support the RDF query language SPARQL and the imperative, path-based query language Gremlin.
Conclusion
Not everyone gets hands-on with their database query language on the day-to-day level; however, your down-in-the-weeds development team needs a practical way of modeling and querying data, especially if they’re tackling a graph-based problem.
If your team comes from an SQL background, a query language like Cypher will be easy to learn and even easier to execute. And when it comes to your enterprise-level application, you’ll be glad that the language underpinning it all is built for speed and efficiency.
Catch up with the rest of the Graph Databases for Beginners series:
- Graph Databases for Beginners: Why Graphs Are the Future
- Graph Databases for Beginners: Why Data Relationships Matter
- Graph Databases for Beginners: The Basics of Data Modeling
- Graph Databases for Beginners: Data Modeling Pitfalls to Avoid
About the Author
Bryce Merkl Sasaki, Aspiring Graphista
Bryce Merkl Sasaki is the Content Marketing Manager for Neo Technology. He studied professional and creative writing for undergrad and has been freelancing for 7 years. Recently, he worked at an inbound marketing agency in Philadelphia as a copywriter before moving to California. When not working, he likes to spend his time working on his novel, looking for pickup soccer games and reading voraciously.
Sponsored by Neo Technology