On Graphs, AI and ML. Q&A with Alicia Frame
Dr. Alicia Frame, Senior Data Scientist at Neo4j
Q1. Co-founder and CEO of Neo4j Emil Eifrem delivered a keynote at your recent NODES online developer summit. What are the latest trends in graphs? And Neo4j?
Our NODES (Neo4j Online Developer Expo and Summit) online conference went really well, with over 1600 developers and data scientists in attendance. Our CEO, Emil Eifrem, spoke to the Neo4j roadmap in which he laid out plans for future Neo4j product offerings and preparing the core Neo4j architecture for the performance demands of critical applications. Artificial intelligence is emerging as a use case for graph databases, with knowledge graphs often being the first step towards learning applications and predictive analytics.
Q2. The conference had more than 50 different talks by Neo4j employees and community members. What are the main highlights of the NODES program?
We had 53 talks with 60 speakers from 15 countries across five tracks covering topics from spatial, GraphQL/GRANDstack to software analytics and construction applications. At the highest level, it was a clear mark of the maturity and vibrancy of the Neo4j developer community, and their trust in Neo4j for being a ‘developer first’ organization.
Highlights included PayPal talking about using Neo4j for recommendations in a unified data catalog, Rice University talking about teaching graph databases as part of an introduction to databases course, Under Armour talking about using Neo4j to unite disparate systems that are involved in bringing a product to market, and Autodesk talking about using Neo4j to gain collaboration insights from data access analytics.
Q3. How do Graphs relate to AI and ML? Can you share some examples?
Our customers are leading the way in directing us on the many ways in which Neo4j graph databases can underpin artificial intelligence and machine learning. One broad area is network-science-based algorithm “recipes.” Customers use specific algorithms to answer questions about their data or to pre-process their graph. Examples of algorithms used include:
- Weakly Connected Components to identify connected subgraphs to break a graph up in a pre-processing step
- Louvain to identify clusters, whose results are then reviewed manually
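As an illustration of the first recipe, here is a minimal sketch of Weakly Connected Components in plain Python. It is not the Neo4j implementation; the function name and the toy edge list are assumptions made for this example. It shows the idea of ignoring edge direction and collecting everything reachable from each unvisited node.

```python
from collections import defaultdict, deque

def weakly_connected_components(edges):
    """Group the nodes of a directed graph into weakly connected components.

    Edge direction is ignored: two nodes share a component when a path
    connects them in the undirected view of the graph.
    """
    neighbors = defaultdict(set)
    for source, target in edges:
        neighbors[source].add(target)
        neighbors[target].add(source)  # treat every edge as undirected

    seen = set()
    components = []
    for start in neighbors:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for other in neighbors[node]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        components.append(component)
    return components

# Hypothetical toy data: two groups of nodes with no edge between them.
edges = [("a", "b"), ("c", "b"), ("d", "e")]
components = weakly_connected_components(edges)
```

Each resulting component can then be processed on its own, which is the pre-processing use described above.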
A second area is feature engineering for traditional model building approaches. Here, graphs are used to generate features that describe the topology and connectivity of a graph, and these often turn out to be highly predictive. Examples of feature engineering could include:
- Using graph based queries to extract relationship driven features that a domain expert thinks are important. This could include something like calculating “how many accounts within three hops of this individual have been labelled as fraudulent.”
- Using graph algorithms like PageRank or label propagation to calculate new properties for each node (PageRank score, label propagation community) that describe a specific graph-based measurement of each node’s connectivity or importance. Similarly, a node embedding could be used to encode specific aspects of connectivity in a machine-readable way, which could be passed to an ML pipeline.
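Both kinds of feature can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names, the account identifiers, and the toy graphs are hypothetical, and the PageRank here is a basic power iteration rather than any particular library's implementation.

```python
from collections import deque

def fraud_count_within_hops(neighbors, start, fraudulent, max_hops=3):
    """Count flagged accounts reachable from `start` in at most `max_hops` hops."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for other in neighbors.get(node, ()):
            if other not in distance:
                distance[other] = distance[node] + 1
                queue.append(other)
    return sum(1 for node in distance if node != start and node in fraudulent)

def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of node -> list of out-neighbors."""
    nodes = set(out_links) | {t for targets in out_links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links.get(node))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for node, targets in out_links.items():
            for target in targets:
                new_rank[target] += damping * rank[node] / len(targets)
        rank = new_rank
    return rank

# Hypothetical chain of accounts: p - a1 - a2 - a3 - a4 (undirected).
neighbors = {
    "p": ["a1"], "a1": ["p", "a2"], "a2": ["a1", "a3"],
    "a3": ["a2", "a4"], "a4": ["a3"],
}
fraudulent = {"a2", "a4"}
fraud_feature = fraud_count_within_hops(neighbors, "p", fraudulent)  # a4 is 4 hops away

# Hypothetical directed graph for the PageRank feature.
out_links = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
rank = pagerank(out_links)
```

The hop count and the PageRank score would each become one column in the feature table handed to a conventional model.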
Another broad area is applying graphs to deep learning, in the form of graph neural networks or graph convolutional neural networks. By representing data as a graph, instead of as a grid or linearly, our customers are finding that they have much greater flexibility in learned representations and transformations. This makes their deep learning models more interpretable and considerably more accurate.
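To give a feel for what operating on a graph instead of a grid means, the sketch below performs one round of neighborhood averaging, the aggregation step at the heart of graph convolutional layers. It is a deliberate simplification: a real layer would also apply learned weights and a nonlinearity, and the function name and toy data are assumptions for this example.

```python
def mean_aggregate(features, neighbors):
    """One round of neighborhood averaging.

    Each node's new vector is the mean of its own vector and the vectors
    of its neighbors. No learned weights or activation are applied.
    """
    updated = {}
    for node, vector in features.items():
        group = [vector] + [features[n] for n in neighbors.get(node, ())]
        updated[node] = [sum(values) / len(group) for values in zip(*group)]
    return updated

# Hypothetical three-node path graph a - b - c with 2-dimensional features.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
smoothed = mean_aggregate(features, neighbors)
```

Because the update follows whatever edges exist, nodes may have any number of neighbors, which is where the extra flexibility over grid or sequence inputs comes from.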
Q4. Can ML be used to build a Knowledge Graph? What are the benefits?
Yes, machine learning can be used at various stages, including:
- Creation: To build the initial knowledge graph, NLP techniques can be used to extract concepts and their relationships from documents. For example, Named Entity Recognition (NER) can be used to extract Syntactically Linked Pairs (SLPs). Machine learning can be used to standardize key terms against an ontology, and to automate the construction of a knowledge graph from a body of documents. NER can leverage everything from traditional dictionary-based approaches to domain-specific word embeddings, and SLPs can be extracted based on grammatical relationships or deep-learning-based sentence embeddings.
- Post data extraction: Once the raw data is extracted, you can add weights to the relationships based on, for example, how many times a pair of terms co-occurred, or you can use TF-IDF scores to identify which terms are relevant based on their relative frequency.
- Completion: If you have a knowledge graph, machine learning can be used for completion. Link prediction techniques can be used to predict new links that will form in the future, or links that should exist but were not observed in the input data. This is a popular area of research, especially in the biomedical space.
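The post-extraction and completion stages above can be sketched with the standard library alone. This is a hedged illustration, not a recommended pipeline: the function names and the toy term lists are hypothetical, and Jaccard overlap of neighbor sets stands in for the many link prediction techniques that exist.

```python
import math
from collections import Counter
from itertools import combinations

def cooccurrence_weights(documents):
    """Weight each term pair by the number of documents containing both terms."""
    weights = Counter()
    for terms in documents:
        for pair in combinations(sorted(set(terms)), 2):
            weights[pair] += 1
    return weights

def tf_idf(documents):
    """Return one dict per document mapping term -> TF-IDF score."""
    doc_count = len(documents)
    doc_freq = Counter(term for terms in documents for term in set(terms))
    scores = []
    for terms in documents:
        counts = Counter(terms)
        scores.append({
            term: (count / len(terms)) * math.log(doc_count / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

def jaccard_link_scores(edges):
    """Score unlinked node pairs by the overlap of their neighbor sets."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    scores = {}
    for a, b in combinations(sorted(neighbors), 2):
        if b in neighbors[a]:
            continue  # already linked, nothing to predict
        shared = neighbors[a] & neighbors[b]
        if shared:
            scores[(a, b)] = len(shared) / len(neighbors[a] | neighbors[b])
    return scores

# Hypothetical extracted terms, one list per document.
documents = [
    ["aspirin", "headache", "pain"],
    ["aspirin", "pain"],
    ["ibuprofen", "pain", "headache"],
]
weights = cooccurrence_weights(documents)   # relationship weights
scores = tf_idf(documents)                  # term relevance per document
candidates = jaccard_link_scores(weights)   # the Counter's keys are the term pairs
```

In this toy corpus "pain" appears in every document, so its TF-IDF score is zero, and the only suggested new link is between "aspirin" and "ibuprofen", which never co-occur but share all of their neighbors.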
Qx. Anything else you wish to add?
The overall benefits of building a knowledge graph using machine learning are sizeable at every stage. If applied correctly, you can automate the creation of a knowledge graph from a large corpus of data, such as documents and records, to get started quickly. These automated techniques are less biased than manual review, and the process is easy to scale and repeat. They also give you defined metrics for the quality of the data being generated. If your knowledge graph is nearly complete, machine learning helps you find new patterns in the data you already have.
Sponsored by Neo4j