Data Science and Graphs. Q&A with Ben Squire (Meredith)
“The value of graph data lies in the story told by the relationships between data points.”
Q1. Can you talk about the major differences in the data science projects you work on before using graph databases and after?
Two major differences in data science projects before and after working with graph databases are how we frame the problem statement and the new approaches available for solving it.
One clear example is our work with recommendation engines. Look-alike modelling and customer segmentation are a major research and development area at Meredith. Previously, our work focused on web logs at a cookie level: given a particular action in our web logs, such as signing up for a newsletter or clicking on an advertisement, we would analyze the traits of those cookies to find others with similar interests or reading patterns, then present similar offers or ads to achieve a higher click-through rate than randomly selected audiences.
This solution was effective and showed positive results; however, drawbacks in activation and in the average number of visits per cookie left room for improvement. Graph databases reframed the problem in two ways. First, we were able to visualize our web logs to find cookies that were interconnected over time and transform them into profiles, which led to a higher average number of visits per profile and to more accurate models.
Further, these profiles were seen over longer spans of time, which gave them a higher potential to reactivate when added to a customer segmentation audience. Second, the ability to perform simple calculations based on profiles and reading patterns provides a new way to build both URL recommendation engines and look-alike models, using collaborative filtering that is inherently built into counting the relationships observed in graph format. Hence, graph databases reframed a problem in which our data was more interconnected than we could see using the RDBMS we had before, and they gave us new approaches to previously solved problems in customer and article recommendation.
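To make the cookie-to-profile idea concrete, here is a minimal sketch that treats each profile as a connected component of linked cookies. It is an illustration only, not Meredith's implementation: the `links` pairs, the `visits` counts, and the function name `stitch_profiles` are all hypothetical, and the sketch assumes some upstream process has already decided which cookies are linked.

```python
from collections import defaultdict


def stitch_profiles(links):
    """Group cookies into profiles; each connected component is one profile."""
    parent = {}

    def find(x):
        # Walk up to the representative cookie, shortening the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for cookie_a, cookie_b in links:
        root_a, root_b = find(cookie_a), find(cookie_b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    profiles = defaultdict(set)
    for cookie in list(parent):
        profiles[find(cookie)].add(cookie)
    return sorted(sorted(members) for members in profiles.values())


# Hypothetical data: pairs of cookies assumed to belong to the same visitor,
# and a visit count per cookie.
links = [("c1", "c2"), ("c2", "c3"), ("c4", "c5")]
visits = {"c1": 2, "c2": 1, "c3": 3, "c4": 1, "c5": 1}

profiles = stitch_profiles(links)
visits_per_profile = [sum(visits[c] for c in profile) for profile in profiles]
print(profiles)
print(visits_per_profile)
```

In this toy data the five cookies collapse into two profiles, so the visits that were spread thinly across cookies accumulate per profile. Cookies that never appear in a link are not returned by this sketch and would need to be added as single-cookie profiles.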
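The relationship counting described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the method used at Meredith: the `reads` mapping, the URLs, and the function name `recommend_urls` are hypothetical. Each candidate URL is scored by the number of paths of the form profile, shared URL, other profile, candidate URL, which is the count a graph traversal would return.

```python
from collections import Counter


def recommend_urls(reads, profile, top_n=3):
    """Rank unread URLs by how many shared-read paths lead to them."""
    seen = reads[profile]
    counts = Counter()
    for other, urls in reads.items():
        if other == profile:
            continue
        overlap = len(seen & urls)  # URLs both profiles have read
        if overlap == 0:
            continue
        for url in urls - seen:
            # Each shared URL contributes one path to this candidate.
            counts[url] += overlap
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [url for url, _ in ranked[:top_n]]


# Hypothetical reading history: profile id -> set of URLs read.
reads = {
    "p1": {"/recipes/pasta", "/recipes/soup"},
    "p2": {"/recipes/pasta", "/recipes/soup", "/recipes/bread"},
    "p3": {"/recipes/pasta", "/garden/roses"},
    "p4": {"/home/paint"},
}

print(recommend_urls(reads, "p1"))
```

Swapping the roles of profiles and URLs in the same counting gives a rough look-alike score between profiles instead of a URL recommendation.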
Q2. What’s the value of using graph data versus data from a traditional relational database?
The value of graph data lies in the story told by the relationships between data points. Data cubes stored in traditional relational databases are optimized for a variety of applications; however, they are constrained in ways that graph data is not with regard to how data points are interrelated. Graph data inherently tells a story by itself, visually, which can provide immediate insights that may not be possible in a row and column format.
Furthermore, graph data makes it easy to attach annotations and metadata to its various constructs, which allows for quick comparisons and references. There is an enormous number of problems where relationships are key to understanding the insights that data can provide. Traditional relational databases require joining different entities together in order to discover these insights, whereas a properly modelled graph database has them inherently built in. Graph data can draw on the power of graph algorithms and analytics to solve such problems with much more efficiency than modeling the same problem in an RDBMS.
Q3. What have been some of the business benefits you’ve seen?
Business benefits I have seen with respect to graph databases are in data discovery, data lineage, and data visualization, among others. On data discovery: by exploring different ways to model as a graph the data that was previously held in row and column format, new insights can be discovered immediately from the topological structure, i.e. how the data is connected overall. I worked with web traffic logs for over 2 years in an RDBMS before toying with graph database models, and their impact was profound for our work in identity and analytics.
Data I had taken for granted as knowing front and back had relationships that were not evident before, and those relationships allowed us to build our own in-house identity graph database. Another business impact I have seen from graph databases is analyzing data lineage across systems. Previously we had a variety of data systems stitched together across multiple analytics platforms, data marts, and data warehouses. The key engineers who fully understood the data flow through that system had left, leaving only highly connected data dictionaries and ETL scripts to describe how data arrived from different sources, was processed, and was eventually transformed into inputs for analytical models downstream across a variety of platforms.
Building a graph database from all of these documents, which detail the ingestion, transformation, and processing from sources to consumption, gave us a new perspective on how data arrived, what its dependencies were, and what downstream processes were affected by it. Lastly, data visualization is critical to why graph databases can be such effective tools in business. Decisions are most often made based on visualizations of the data, not numbers in a row and column matrix in Excel. Having the ability to view the data as a connected system that highlights both macro and micro relationships can drive key decisions in business.
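The data lineage question raised above, namely which downstream processes are affected by a given source, amounts to a reachability query on a directed graph. The sketch below shows the idea with a breadth-first traversal. The node names and edges are invented for illustration and do not describe Meredith's actual systems; a graph database would express the same query as a variable-length path match.

```python
from collections import defaultdict, deque


def downstream(edges, start):
    """Return every node reachable from `start` by following edges forward."""
    children = defaultdict(list)
    for source, target in edges:
        children[source].append(target)

    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen - {start}


# Hypothetical lineage edges: (upstream, downstream).
edges = [
    ("web_logs", "sessions_etl"),
    ("sessions_etl", "profile_mart"),
    ("crm_export", "profile_mart"),
    ("profile_mart", "lookalike_model"),
]

print(sorted(downstream(edges, "web_logs")))
print(sorted(downstream(edges, "crm_export")))
```

Reversing each edge before calling the same function answers the opposite question: which sources a given model depends on.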
Ben Squire. Ben is a Senior Data Scientist at Meredith Corporation. His focus is digital advertising and profiling where he applies graph databases and algorithms to consolidate anonymous user profiles across various digital properties owned by Meredith.