On Designing and Building Enterprise Knowledge Graphs. Interview with Ora Lassila and Juan Sequeda
“The limits of my language mean the limits of my world.” – Ludwig Wittgenstein
I have interviewed Ora Lassila, Principal Graph Technologist in the Amazon Neptune team at AWS, and Juan Sequeda, Principal Scientist at data.world. We talked about knowledge graphs and their new book.
Q1. You wrote a book titled “Designing and Building Enterprise Knowledge Graphs”. What was the main motivation for writing such a book?
Ora Lassila and Juan Sequeda: We wanted to tackle the topic of knowledge graphs more broadly than just from the technology standpoint. There is more than just technology (e.g., graph databases) when it comes to successfully building a knowledge graph.
Time and time again we see people thinking about knowledge graphs and jumping to the conclusion that they just need a graph database and start there. Not only is there more technology you need, but there are issues with people, processes, organizations, etc.
Q2. What are knowledge graphs and what are they useful for?
Ora Lassila and Juan Sequeda: We see knowledge graphs as a vehicle for data integration and to make data accessible within an organization. Note that when we say “accessible data”, we really mean this: accessible data = physical bits + semantics. The semantics part is really important, since no data is truly accessible unless you also understand what the data means and how to interpret it. We call this issue the “knowledge/data gap”; Chapter 1 of our book gets deep into this.
You could say that knowledge graphs are a way to “democratize” data: make data more accessible and understandable to people who are not technology experts.
Q3. Why connect relational databases with knowledge graphs?
Ora Lassila and Juan Sequeda: Frankly, the majority of enterprise data is in relational databases, so this seemed like a very good way to scope the problem. At the beginning of our book we show examples of how data is connected today, and it’s a pain. And it’s not just a technical pain: there are important social and organizational aspects to this.
Juan Sequeda: Understanding the relationship between relational databases and the semantic web/knowledge graphs has been my quest since my undergraduate years. The title of my PhD dissertation is “Integrating Relational Databases with the Semantic Web”. Therefore I can say that this is a passion of mine.
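To make the idea of connecting a relational database to a graph concrete, here is a minimal sketch of turning table rows into subject/predicate/object triples. It is only an illustration and not the method described in the book or any W3C standard; the base IRI, the `customer` table, and the function name `rows_to_triples` are all hypothetical.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical base IRI, chosen for illustration


def rows_to_triples(conn, table, key):
    """Turn each row of `table` into (subject, predicate, object) triples.

    The table name is interpolated directly into the SQL text, which is
    acceptable only because this is a toy example with a trusted name.
    """
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        record = dict(zip(columns, row))
        # One node per row, identified by the table name and key value.
        subject = f"{BASE}{table}/{record[key]}"
        triples.append((subject, "rdf:type", f"{BASE}{table}"))
        # One edge per non-NULL column value.
        for column, value in record.items():
            if value is not None:
                triples.append((subject, f"{BASE}{table}#{column}", value))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Acme', NULL)")
triples = rows_to_triples(conn, "customer", "id")
```

Note what the sketch leaves out: the triples carry the column names, but nothing says what a "customer" means to the business. That missing layer is the semantics the interview refers to.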
Q4. Does it make more sense to use a native graph database or a NoSQL database instead?
Ora Lassila and Juan Sequeda: There is always the question “why use X instead of Y?”… and the answer almost always is “it depends”. We even bring this up in the foreword: As computer scientists we understand that there are many technologies that can be used to solve any particular problem. Some are easier, more convenient, and others are not. Just because you can write software in assembly language does not mean you shouldn’t seek to use a high-level programming language. Same with databases: find one that suits your purpose best.
Q5. What are the typical roles within an organization responsible for the knowledge graph?
Ora Lassila and Juan Sequeda: Organizations really need to get into the mindset of treating data as a product. When you acknowledge this, you realize you need the roles for designing, implementing and managing products, in this case data products. We see upcoming roles such as data product managers and knowledge scientists (i.e. Knowledge Engineers 2.0). We get into this in Chapter 4 of our book.
Q6. Data and knowledge are often in silos. Sharing knowledge and data is sometimes hard in an enterprise. What are the technical and non-technical reasons for that?
Ora Lassila and Juan Sequeda: Technical problems are solvable, and many solutions exist. That said, we think knowledge graphs are really addressing this issue nicely.
The non-technical issues are an interesting challenge, and in many ways more difficult: people and process, organizational structure, centralization vs decentralization, etc. One specific issue that shows up all the time is this: If you want to share knowledge within a broader organization, you have to cross organizational boundaries, and that lands you on someone else’s “turf”. There is a great deal of diplomacy that is needed to tackle these kinds of issues.
Q7. When is it more appropriate to use RDF graph technologies instead of native property graph technologies?
Ora Lassila and Juan Sequeda: First, we object to the notion of “native” when it comes to property graphs; they are no more native than RDF graphs.
These are two slightly different approaches to building graphs. Ultimately, the question is not all that interesting. A more interesting question is: When should you use a graph as opposed to something else? If you do decide to use a graph, there are a lot of considerations and modeling decisions before you even come to the question of RDF vs. property graphs.
Of course, RDF is better suited to some situations (e.g., when you use external data, or have to merge graphs from different sources). Try using property graphs there and you merely end up re-inventing mechanisms that are already part of RDF. On the other hand, property graphs often appeal more to software developers, thanks to available access mechanisms and programming language support (e.g., Gremlin).
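The point about merging graphs from different sources can be illustrated with a small sketch. It models triples as plain Python tuples rather than using an RDF library, and the namespace, entity names, and the `describe` helper are hypothetical. The assumption it relies on is that both sources already use the same global identifier for the same thing; under that assumption, merging is just a set union. Real RDF merging also has to deal with details such as blank nodes, which are ignored here.

```python
EX = "http://example.org/"  # hypothetical namespace shared by both sources

# Two graphs from different systems, each a set of (subject, predicate, object).
crm = {
    (EX + "acme", EX + "name", "Acme Corp"),
    (EX + "acme", EX + "locatedIn", EX + "austin"),
}
billing = {
    (EX + "acme", EX + "name", "Acme Corp"),
    (EX + "acme", EX + "owes", "1200"),
}

# Because both graphs identify the company by the same IRI, the merge is a
# set union: identical triples collapse and the rest accumulate.
merged = crm | billing


def describe(graph, subject):
    """Return all (predicate, object) pairs known about `subject`, sorted."""
    return sorted((p, o) for s, p, o in graph if s == subject)


facts = describe(merged, EX + "acme")
```

With locally scoped identifiers, the same merge would first require agreeing on which node in one graph corresponds to which node in the other.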
Q8. How can enterprises successfully adopt knowledge graphs to integrate data and knowledge, without boiling the ocean?
Ora Lassila and Juan Sequeda: First of all, you can’t build enterprise knowledge graphs in a “boil the ocean” approach. No chance in hell. You first need to break the problem into smaller pieces, by business units and use cases. This ultimately is a people and process problem. The tech is already here.
That said, there is a certain “build it and they will come” aspect to knowledge graphs. You should think of them more as a platform rather than as an application. Start by knowing some use cases, and gradually generalize and widen your scope. But you need to be solving some pressing problems for the business. Spend time understanding the problems, the limitations of their current solutions (assuming they are somewhat viable) and finding a champion (i.e. “if you can solve this problem better/faster/etc, I’m all ears!”). Also try to avoid educating on the technology: Business units don’t care if their problem is solved with technology A, B or C… all they want is for their problem to be solved.
Q9. Knowledge graphs and AI. Are there any relationships between them?
Ora Lassila and Juan Sequeda: Yes. Knowledge Graphs are a modern solution to a long-time (and in some ways, “ultimate”) goal in computer science: to integrate data and knowledge at scale. For at least the past half century, we’ve seen independent and integrated contributions coming from the AI community (namely knowledge representation, a subfield of classical AI) and the data management community. See section 1.3 of the book.
Qx. Anything else you wish to add?
Ora Lassila and Juan Sequeda: We see a lot of what Albert Einstein gave as the definition of insanity: Doing the same thing over and over, and expecting different results. We need to do something truly different. But this is challenging for many reasons, not least because of this:
“The limits of my language mean the limits of my world.” – Ludwig Wittgenstein
For example, if SQL is your language, it may be very hard for you to see that there are some completely different ways of solving problems (case in point: graphs and graph databases).
Another challenge is that there are hard people and process issues, but as technologists we are wired to focus on technology, and to seek how to scale and automate.
Finally, we think the “graph industry” needs to evolve past the RDF vs. property graphs issue. Most people do not care. We need graphs. Period.
Dr. Ora Lassila, Principal Graph Technologist in the Amazon Neptune team at AWS, mostly focusing on knowledge graphs. Earlier, he was a Managing Director at State Street, heading their efforts to adopt ontologies and graph databases. Before that, he worked as a technology architect at Pegasystems, as an architect and technology strategist at Nokia Location & Commerce (aka HERE), and prior to that he was a Research Fellow at the Nokia Research Center Cambridge. He was an elected member of the Advisory Board of the World Wide Web Consortium (W3C) in 1998-2013, and represented Nokia in the W3C Advisory Committee in 1998-2002. In 1996-1997 he was a Visiting Scientist at MIT Laboratory for Computer Science, working with W3C and launching the Resource Description Framework (RDF) standard; he served as a co-editor of the RDF Model and Syntax specification.
Juan Sequeda, Principal Scientist at data.world. He holds a PhD in Computer Science from The University of Texas at Austin. Juan’s goal is to reliably create knowledge from inscrutable data. His research and industry work has been on designing and building Knowledge Graphs for enterprise data integration. Juan has researched and developed technology on semantic data virtualization, graph data modeling, schema mapping and data integration methodologies. He pioneered technology to construct knowledge graphs from relational databases, resulting in W3C standards, research awards, patents, software and his startup Capsenta (acquired by data.world). Juan strives to build bridges between academia and industry as the current co-chair of the LDBC Property Graph Schema Working Group, past member of the LDBC Graph Query Languages task force, standards editor at the World Wide Web Consortium (W3C) and member of organizing committees of scientific conferences, including being the general chair of The Web Conference 2023. Juan is also the co-host of Catalog and Cocktails, an honest, no-bs, non-salesy podcast about enterprise data.
Designing and Building Enterprise Knowledge Graphs Synthesis Lectures on Data, Semantics, and Knowledge August 2021, 165 pages, (https://doi.org/10.2200/S01105ED1V01Y202105DSK020) Juan Sequeda, data.world; Ora Lassila, Amazon