On RedisGraph. Q&A with Daniel Howard
Q1. Why graphs are so important?
Daniel Howard: Organisations increasingly want to run heavily relationship-oriented queries on (very) large datasets. Graphs are very, very good at modelling relationships, and can dramatically increase the speed of the aforementioned queries, in some cases making them viable to run at all. Moreover, graphs are a very natural and intuitive way of viewing and understanding a heavily related system – to my mind, much more so than, for example, relational databases.
Q2. You recently reviewed RedisGraph. What is special about RedisGraph?
Daniel Howard: RedisGraph stores and represents graphs as sparse adjacency matrices, instead of adjacency lists, which are the industry standard. It then leverages the GraphBLAS processing engine to query them. This has a number of effects, but what it ultimately means is that querying in RedisGraph is exceptionally fast.
Q3. What are the advantages of representing and storing graphs as sparse adjacent matrices?
Daniel Howard: The big advantage of adjacency matrices (sparse or not) is that they reduce many queries to a matter of matrix multiplication, which is very fast. The classic disadvantage is that a full-sized adjacency matrix for a large or even medium-sized graph takes up an immense amount of space, to the point that storage becomes impractically expensive. However, since sparse adjacency matrices only store nonzero values – of which most real-world graphs have relatively few – they take up much less space. Hence, they retain the advantages in speed while being practical to store.
Q4. Can you please explain the main differences between sparse adjacent matrices vs. adjacency lists?
Daniel Howard: Adjacency lists are just that – lists. Each node in the graph has one, containing a list of all the other nodes it’s connected to. The collection of all these lists forms the representation of the graph. On the other hand, an adjacency matrix is a matrix (effectively a table) containing one row and column for each node in your graph, with any given entry containing a nonzero value if (and only if) the row and column it represents are connected. Therefore, any given graph will only have one adjacency matrix. Sparse adjacency matrices are conceptually similar to regular adjacency matrices, but only store nonzero values.
In RedisGraph’s case, they store these values in a single list, along with another two lists that together store the position each value would have in an equivalent full-sized adjacency matrix.
Q5. Historically, matrix representation scales poorly. What about RedisGraph?
Daniel Howard: As mentioned above, historically matrix representations have scaled poorly because they have taken up far too much space for large graphs. Consider: adding a new node to a non-sparse adjacency matrix means adding a new entry for every node that currently exists in your graph – you can imagine how expensive this would get in terms of storage when your graph has over a million nodes! Conversely, adding a new node to a sparse adjacency matrix only requires storing an additional three values (in RedisGraph’s implementation) for each existing node your new node is connected to. Since in most cases this will only be a handful of nodes, this means that the matrix grows in size far more slowly, to the point that it is quite comfortably manageable.
Q6. Don’t you think that writing queries in a subset of the Cypher query language is a limitation?
Daniel Howard: In this specific context the fact that it is a subset of Cypher is, of course, limiting, but that is likely to be temporary as we expect Redis to extend its support for Cypher over time. More generally, obviously the ideal would be to allow users to write queries in whichever language they want. However, this isn’t what we see happening in the market – most vendors currently support a single query language – and as languages go, openCypher is certainly a good pick. It’s well known and widely used, owing to it being open source and, of course, the fact that its parent language is used by Neo4j, which is undeniably the most popular product in the market. And frankly, those are the most useful properties that a language can have. Moreover, since Redis Labs is participating in the GQL project – a project for creating a dedicated, standardised graph query language – we fully expect them to support GQL when (if) it’s released.
Q7. Why not extending SQL for graphs, instead of creating a graphs query engine such as GraphBLAS?
Daniel Howard: SQL is a query language; GraphBLAS is, as you say, an engine, so they’re apples and oranges to begin with. Moreover, GraphBLAS is an open effort created specifically to facilitate applying graph algorithms to matrices, and particularly sparse matrices. Accordingly, it is much more specialised to RedisGraph’s purposes than any SQL query engine. That said, there are efforts underway to agree graph extensions to SQL, though it is likely that we are some years away from any standards emerging.
Q8. Why should we care of RedisGraph?
Daniel Howard: Because it’s one of fastest graphs available on the market, built on technology that a lot of organisations are already using. A recent benchmark (carried out by Redis Labs) put them roughly on par with TigerGraph (one of the fastest graphs currently available, without getting into extreme cases like Cray) and far faster than everything else assessed. Moreover, RedisGraph is still in its infancy as a solution. It’s entirely possible – in fact, from my position it seems quite likely – that RedisGraph will improve substantially in the months and years to come.
Daniel Howard, IT Industry Analyst and Senior Researcher at Bloor Research.
Daniel started in the IT industry relatively recently, in only 2014. Following the completion of his Masters in Mathematics at the University of Bath, he started working as a developer and tester at IPL (now part of Civica Group). His work there included all manner of software and web development and testing, usually in an Agile environment and usually to a high standard, including a stint working at an ‘innovation lab’ at Nationwide.
In the summer of 2016, Daniel’s father, Philip Howard, approached him with a piece of work that he thought would be enriched by the development and testing experience that Daniel could bring to the table. Shortly afterward, Daniel left IPL to work for Bloor Research as a researcher and the rest (so far, at least) is history.
Daniel primarily (although by no means exclusively) works alongside his father, providing technical expertise, insight and the ‘on-the-ground’ perspective of a (former) developer, in the form of both verbal explanation and written articles. His area of research is principally DevOps, where his previous experience can be put to the most use, but he is increasingly branching into related areas.
Outside of work, Daniel enjoys latin and ballroom dancing, skiing, cooking and playing the guitar.