Is openCypher The Main Portal For Graph Prime Time?
Emil Eifrem of Neo Technology explains how graph databases no longer just offer developers great database design – they also have their own “SQL”
Graph databases, one branch of the NoSQL family, are making a major contribution to business because they are so good at managing the burgeoning amounts of unstructured, connected data that brands are collecting. And as the volume, velocity and variety of data increase, the network of relationships connecting that data is growing even faster.
That’s a problem for older ways of working with data in commercial data processing, as relational (SQL) databases are designed for tabular data, with a consistent structure and a fixed schema. In fact, RDBMS works best for problems that are well defined at the outset: trying to answer queries about data relationships (e.g. a product recommendations engine, a social graph or the connections involved in uncovering fraud) with a relational database involves numerous JOINs between database tables.
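To make the JOIN problem concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The schema (a `person` table and a `friend` join table) and the data are hypothetical. Finding friends-of-friends needs one extra JOIN on the join table per hop. In Cypher the same question could be written as a single pattern, roughly `MATCH (p:Person {name: 'Alice'})-[:FRIEND]->()-[:FRIEND]->(fof) RETURN DISTINCT fof.name`, assuming `Person` nodes connected by `FRIEND` relationships.

```python
import sqlite3

# Hypothetical schema: a table of people plus a join table of friendships.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE friend (person_id INTEGER NOT NULL, friend_id INTEGER NOT NULL);
    INSERT INTO person VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol'), (4, 'Dave');
    INSERT INTO friend VALUES (1, 2), (2, 3), (2, 4);
""")

# Two hops of "friend" need two joins on the friend table, plus a join back
# to person to turn ids into names.
FOF_SQL = """
    SELECT DISTINCT fof.name
    FROM person AS p
    JOIN friend AS f1 ON f1.person_id = p.id
    JOIN friend AS f2 ON f2.person_id = f1.friend_id
    JOIN person AS fof ON fof.id = f2.friend_id
    WHERE p.name = ?
    ORDER BY fof.name
"""

def friends_of_friends(name: str) -> list:
    """Return the names of people reachable in exactly two friendship hops."""
    return [row[0] for row in conn.execute(FOF_SQL, (name,))]

print(friends_of_friends("Alice"))  # ['Carol', 'Dave']
```

Each additional hop adds another self-join. This is where relational queries over connected data become harder to write and, typically, more expensive to run.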
The reality is that, despite their name, relational databases do not store relationships between data elements, so they are not well suited to today’s highly connected data challenges. However, relational software holds one clear advantage over its NoSQL peers: SQL. It is the universal, widely adopted query language, and it remains the pre-eminent way for IT, analysts and business users to access database information.
What’s more, SQL and relational software grew up together and have had a successful working partnership for more than 35 years. The accessibility of SQL was probably one of the biggest factors in the rise of relational databases over their then-dominant network-model rivals.
Recall that in the 1980s, during the “database wars”, the original relational database players each had their own query language. But as relational databases approached mainstream adoption, the major industry players all coalesced around SQL – and that’s exactly where we are with graphs today.
Until now, graph databases have offered great database design but no universal query language, and such a language would be the key to broader market adoption. After all, when it comes to actually using a database, every developer, architect and business stakeholder needs a concrete mechanism for creating, manipulating and querying the data it holds.
There may be a seismic shift about to happen that will alter this picture radically. Step forward the openCypher project, a shared graph query language – agnostic of vendor or platform – that many of us believe will be a huge benefit to both vendors and users.
A high-quality query language that has broad adoption, Cypher is the best hope we have in the graph space of a SQL-like common language to help grow the space, as well as encouraging healthy graph supplier competition (another advantage to users).
Setting aside a few small but smart players such as DataStax, Neo4j has led the graph database sector largely unchallenged. That is set to change as graph technologies go mainstream. Three things are driving this: a better understanding of real-world use cases, the development of fully native graph databases (with a sharded database in development), and wider graph adoption across major verticals such as healthcare, media and government.
What’s more, Forrester Research estimates that one in four enterprises will be using such technology by 2017, and Gartner reports 70% of leading companies will pilot a graph database project of some significant kind by 2018 – so we know there’s enormous market potential here.
Big technology vendors such as Oracle, IBM and Amazon are using graph technology. The rise of openCypher, a standard and open declarative query language for graphs, can only be a huge accelerant.
All these factors explain why we launched the openCypher project. We knew we had struck gold when we released Neo4j 2.0 with Cypher, because that was the day demand for graph databases increased dramatically. As an objective measure of that success, the DB-Engines ranking for graph databases soared at the same time.
Meanwhile, Cypher has been extensively validated by real-world users. On the technical level, Cypher is particularly well suited to querying connected data because it uses symbols to express patterns that correspond to our visual, intuitive representation of data. And as a declarative query language, Cypher lets users focus on their domain and express what data to retrieve, instead of getting lost in the mechanics of data access.
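The declarative idea can be illustrated with a toy matcher in Python. This is a hypothetical sketch, not how any real Cypher engine works. The graph, the `Step` type and the `match` function are invented for illustration. The caller describes only the shape of the path it wants (node labels and relationship types), and the function takes care of walking the edges.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical in-memory graph: node id -> label, plus typed, directed edges.
NODES = {"alice": "Person", "bob": "Person", "acme": "Company", "widget": "Product"}
EDGES = [
    ("alice", "WORKS_AT", "acme"),
    ("bob", "WORKS_AT", "acme"),
    ("acme", "MAKES", "widget"),
]

@dataclass(frozen=True)
class Step:
    rel_type: str  # relationship type to follow, e.g. "WORKS_AT"
    label: str     # label required on the node reached by this hop

def match(start_label: str, steps: List[Step]) -> List[Tuple[str, ...]]:
    """Return every path whose nodes and relationships fit the pattern.

    The caller states *what* shape to find; how the edges are walked is
    left to this function, as a declarative engine would do.
    """
    paths = [(node,) for node, label in NODES.items() if label == start_label]
    for step in steps:
        # Extend every partial path by one hop that satisfies this step.
        paths = [
            path + (dst,)
            for path in paths
            for src, rel, dst in EDGES
            if src == path[-1] and rel == step.rel_type and NODES[dst] == step.label
        ]
    return sorted(paths)

# Toy counterpart of the Cypher pattern
#   MATCH (p:Person)-[:WORKS_AT]->(c:Company)-[:MAKES]->(x:Product) RETURN p, c, x
print(match("Person", [Step("WORKS_AT", "Company"), Step("MAKES", "Product")]))
# [('alice', 'acme', 'widget'), ('bob', 'acme', 'widget')]
```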
Designed to be a human-readable query language, Cypher is suitable for both developers and operations professionals. Its expressive querying draws on a number of established practices. Most of its keywords, such as WHERE and ORDER BY, are inspired by SQL, while its pattern matching borrows from SPARQL. In addition, some of its collection semantics have evolved from languages such as Haskell and Python.
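As an illustration of those collection semantics, Cypher’s list comprehension syntax `[x IN list WHERE predicate | expression]` reads much like a Python comprehension, or a Haskell one such as `[x * 10 | x <- [0..10], even x]`. The Cypher query below is shown only as a string for comparison. The Python line computes the same result.

```python
# A Cypher list comprehension, kept as a string for comparison (not executed here).
CYPHER = "RETURN [x IN range(0, 10) WHERE x % 2 = 0 | x * 10] AS result"

# The same filter-then-map shape in Python. Cypher's range() includes its
# upper bound, so Cypher's range(0, 10) corresponds to Python's range(0, 11).
result = [x * 10 for x in range(0, 11) if x % 2 == 0]
print(result)  # [0, 20, 40, 60, 80, 100]
```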
All in all, Cypher is the closest thing to drawing on a whiteboard with a keyboard. Put another way, graph databases are whiteboard-friendly; Cypher makes them keyboard-friendly.
Join the growing openCypher community
The openCypher project already has the support of a wide community of graph technology players, including Oracle, Databricks (the company behind Apache Spark), Neo Technology, Tableau, Structr and a host of others.
However, we don’t just take suggestions from big players – we welcome suggestions from everyone in the developer world. It’s our aim to make the process of specifying and evolving the Cypher query language as open as possible. You can help by reading through and commenting on published language proposals, or if you want to go all in, write your own proposal with an implementation.
openCypher is a continual work in progress. Over the next few months we will move more of the language artifacts to GitHub to make them available to everyone. In the meantime, join our Google Group to stay involved in the evolution of the new SQL for graphs.
Cypher has opened up a world of opportunity for today’s graph developers; openCypher aims to open up more. SQL may still be ubiquitous, but it is a language that only techies can understand, as it is too arcane and esoteric for business decision makers to use. Just as graph databases have made the data modelling process more understandable, a graph query language makes it easier than ever for non-technical users to understand and create their own queries.
That makes Cypher a worthy successor to SQL, and that is why we’d love the programming world’s support for the openCypher project.
The author is co-founder and CEO of Neo Technology, the company behind Neo4j, the world’s leading graph database
Find out more about openCypher and its significance and momentum by watching Emil’s keynote at GraphConnect San Francisco
What does this mean in practice?
The openCypher project makes Cypher available to everyone – every data store, every tooling provider, every application developer. It promises to be as instrumental in the growth of graph processing and analysis as SQL was in accelerating the adoption of RDBMS.
openCypher is an open source project that delivers four key artifacts released under permissive licenses:
Extensive Cypher reference documentation
Comprehensive user documentation describing use of the Cypher query language, with examples and tutorials
A technology compatibility kit (TCK)
The TCK consists of a number of tests that a software supplier would run in order to self-certify support for a given version of the query language.
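To show the self-certification idea, here is a hypothetical sketch of a TCK-style runner in Python. It does not reproduce the actual openCypher TCK’s format or harness. `Scenario`, `run_suite` and the canned `fake_execute` engine are invented names. Each scenario pairs a query with its expected result rows, and rows are compared as a multiset unless the query imposes an order.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]  # one result row: column name -> value

@dataclass
class Scenario:
    name: str
    query: str
    expected: List[Row]
    ordered: bool = False  # compare row order only when the query imposes one

def _row_key(row: Row) -> tuple:
    # Column names are unique strings, so sorting the items gives a stable,
    # hashable key (values must be hashable for this toy comparison).
    return tuple(sorted(row.items()))

def rows_equal(actual: List[Row], expected: List[Row], ordered: bool) -> bool:
    if ordered:
        return actual == expected
    # Unordered results are compared as multisets: same rows, same counts.
    return Counter(map(_row_key, actual)) == Counter(map(_row_key, expected))

def run_suite(execute: Callable[[str], List[Row]], scenarios: List[Scenario]) -> Dict[str, bool]:
    """Run every scenario against the engine under test; record pass/fail."""
    results = {}
    for s in scenarios:
        try:
            results[s.name] = rows_equal(execute(s.query), s.expected, s.ordered)
        except Exception:
            results[s.name] = False  # an error or unsupported query is a failure
    return results

# A stand-in "engine" that only knows two canned queries.
def fake_execute(query: str) -> List[Row]:
    canned = {
        "RETURN 1 AS x": [{"x": 1}],
        "UNWIND [3, 1, 2] AS n RETURN n ORDER BY n": [{"n": 1}, {"n": 2}, {"n": 3}],
    }
    return canned[query]  # unknown queries raise KeyError

SCENARIOS = [
    Scenario("return literal", "RETURN 1 AS x", [{"x": 1}]),
    Scenario("order by", "UNWIND [3, 1, 2] AS n RETURN n ORDER BY n",
             [{"n": 1}, {"n": 2}, {"n": 3}], ordered=True),
    Scenario("unsupported", "RETURN 2 AS y", [{"y": 2}]),
]

print(run_suite(fake_execute, SCENARIOS))
# {'return literal': True, 'order by': True, 'unsupported': False}
```

A supplier would swap `fake_execute` for a call into its own engine. It would treat the engine as conforming to a given language version only if every scenario for that version passes.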
Reference implementation
Distributed under the Apache 2.0 license, the reference implementation is a fully functional implementation of key parts of the stack needed to support Cypher inside a data platform or tool. The first planned deliverable is a parser that will take a Cypher statement and parse it into an AST (abstract syntax tree) representation. The reference implementation complements the documentation and tests by providing working, permissively licensed implementations of Cypher that can be used as examples or as a foundation for one’s own implementation.
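The first planned deliverable, a parser from Cypher text to an AST, can be sketched for a tiny subset of the language. This is a hypothetical, hand-written recursive-descent parser, not the openCypher reference implementation. The grammar subset (a single `MATCH` chain of outgoing, typed relationships followed by `RETURN var.prop`) and the AST class names are assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

class CypherSyntaxError(ValueError):
    """Raised when the input falls outside the toy grammar."""

# Tokens: the arrow "->", single punctuation characters, or identifiers/keywords.
TOKEN_RE = re.compile(r"\s*(->|[()\[\]:.\-]|[A-Za-z_]\w*)")

def tokenize(text: str) -> List[str]:
    text, tokens, pos = text.strip(), [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise CypherSyntaxError(f"unexpected input at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

@dataclass(frozen=True)
class NodePattern:
    variable: str
    label: Optional[str]

@dataclass(frozen=True)
class RelPattern:
    rel_type: str

@dataclass(frozen=True)
class Query:
    pattern: Tuple[Union[NodePattern, RelPattern], ...]  # node, rel, node, ...
    return_var: str
    return_prop: str

class Parser:
    """Recursive descent for: MATCH node (-[:TYPE]-> node)* RETURN var.prop"""

    def __init__(self, text: str) -> None:
        self.tokens = tokenize(text)
        self.i = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def expect(self, tok: str) -> None:
        if self.peek() != tok:
            raise CypherSyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.i += 1

    def keyword(self, kw: str) -> None:
        tok = self.peek()  # Cypher keywords are case-insensitive
        if tok is None or tok.upper() != kw:
            raise CypherSyntaxError(f"expected {kw}, got {tok!r}")
        self.i += 1

    def ident(self) -> str:
        tok = self.peek()
        if tok is None or not (tok[0].isalpha() or tok[0] == "_"):
            raise CypherSyntaxError(f"expected identifier, got {tok!r}")
        self.i += 1
        return tok

    def parse_node(self) -> NodePattern:
        self.expect("(")
        var = self.ident()
        label = None
        if self.peek() == ":":
            self.expect(":")
            label = self.ident()
        self.expect(")")
        return NodePattern(var, label)

    def parse_query(self) -> Query:
        self.keyword("MATCH")
        pattern: list = [self.parse_node()]
        while self.peek() == "-":  # each hop: -[:TYPE]-> (node)
            for tok in ("-", "[", ":"):
                self.expect(tok)
            rel_type = self.ident()
            self.expect("]")
            self.expect("->")
            pattern += [RelPattern(rel_type), self.parse_node()]
        self.keyword("RETURN")
        var = self.ident()
        self.expect(".")
        prop = self.ident()
        if self.peek() is not None:
            raise CypherSyntaxError(f"unexpected trailing token {self.peek()!r}")
        return Query(tuple(pattern), var, prop)

ast = Parser("MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN b.name").parse_query()
print(ast)
```

A real parser would cover far more of the language (undirected and variable-length relationships, properties, multiple clauses). It could also be generated from a formal grammar rather than written by hand.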
Cypher language specification
Licensed under a Creative Commons license, the Cypher language specification is a technical expression of the language syntax from which parsers for Cypher queries can be auto-generated. A full semantic specification is also planned as part of the openCypher project.
It’s important to note that the openCypher philosophy is one of practicality. We explicitly structured openCypher around working systems. This isn’t a theoretical, academic discussion in a committee; we want openCypher to be a substance-oriented initiative that collaborates around actual working code. Even better, your code!