On Graph Databases: Interview with Daniel Kirstenpfad.
I wanted to know more about Sones, a relatively new graph database company based in Germany.
I therefore interviewed Daniel Kirstenpfad, their CTO.
Q1. Sones was founded in 2007 in Germany, with the aim of developing an object oriented database with its own file system technology. What motivated you to start the company?
Daniel Kirstenpfad: Building a product and a new technology with a great team was surely the main motivation to start sones. What started as a private project took off in 2007 when a business angel believed in our idea, and it started to fly with the substantial investments by T-Venture, TGFS and KfW. A great team and great opportunities are the things that keep that motivation high for everyone at sones.
Q2. Your database, ‘GraphDB’, is a graph database. Why did you develop a graph database? How does it differ from an object oriented database? Is GraphDB a NoSQL database?
Daniel Kirstenpfad: Well, once upon a time sones got its name from the words “SOcial NEtwork Systems”. At that time we were mainly focused on tools and technologies for social networks. We started to develop those technologies and found out that a graph-theoretical approach is the most natural one for typical problems in social networks.
When we drilled deeper into graph theory we found out that there are a bazillion real problems people are having which can be solved faster and more elegantly with a graph-theory based database management system – not to mention the new opportunities such a graph-theory database brings.
Taking the knowledge about the tools and use cases we had worked on previously, we started the work on the multi-purpose graph DBMS that is today the open source “sones GraphDB”.
The sones GraphDB is a graph database management system which includes many important features object oriented databases typically have. Typically, object oriented databases lack specific graph features like graph algorithms and easy to use property graph data structures (vertices, edges (directed/undirected), hyper edges, …). More importantly, we think that an easy to learn and use query language, derived from things the users already know like SQL, needs to be included in a database these days – in sones GraphDB we have those features and more.
We think of NoSQL as “not only SQL”. When we think of SQL as relational databases, we think that this RDBMS technology is still very important for the field of use cases that RDBMSs are tailored to solve. It’s just that people found a huge number of problems that cannot be solved in a scalable and performant way with the SQL approach.
When we think of SQL as a query language, we think that having easy to use functionality for ad-hoc queries on a large set of structured data is probably one of the things that made those RDBMSs so popular. It’s the reason we at sones believe that it’s very important to have an easy to use query language which you can use to do ad-hoc queries on large sets of semi-structured and unstructured data. It’s the reason why the query language you use to query the sones GraphDB is solely based on the syntax of SQL: it’s easy to use and easy to learn. A typical user can write and run rather complex queries within minutes.
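The property graph structures named in this answer (vertices, directed and undirected edges, hyper edges) can be sketched in a few lines of Python. This is an illustrative model only, not sones GraphDB code: every class and method name below is hypothetical, and a hyper edge is assumed here to connect one source vertex to a set of target vertices.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, FrozenSet, Iterable, List, Optional, Set


@dataclass
class Vertex:
    """A vertex carrying free-form properties."""
    vid: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    """A binary edge; directed=False means it can be walked both ways."""
    source: str
    target: str
    label: str
    directed: bool = True


@dataclass
class HyperEdge:
    """Assumption of this sketch: one source linked to many targets."""
    source: str
    targets: FrozenSet[str]
    label: str


class PropertyGraph:
    def __init__(self) -> None:
        self.vertices: Dict[str, Vertex] = {}
        self.edges: List[Edge] = []
        self.hyper_edges: List[HyperEdge] = []

    def add_vertex(self, vid: str, **properties: Any) -> None:
        self.vertices[vid] = Vertex(vid, dict(properties))

    def add_edge(self, source: str, target: str, label: str,
                 directed: bool = True) -> None:
        self.edges.append(Edge(source, target, label, directed))

    def add_hyper_edge(self, source: str, targets: Iterable[str],
                       label: str) -> None:
        self.hyper_edges.append(HyperEdge(source, frozenset(targets), label))

    def neighbors(self, vid: str, label: Optional[str] = None) -> Set[str]:
        """Vertices reachable from vid in one step, optionally by label."""
        found: Set[str] = set()
        for edge in self.edges:
            if label is not None and edge.label != label:
                continue
            if edge.source == vid:
                found.add(edge.target)
            elif not edge.directed and edge.target == vid:
                found.add(edge.source)
        for hyper in self.hyper_edges:
            if label is not None and hyper.label != label:
                continue
            if hyper.source == vid:
                found |= hyper.targets
        return found


graph = PropertyGraph()
for person in ("alice", "bob", "carol"):
    graph.add_vertex(person, name=person.capitalize())
graph.add_edge("alice", "bob", "follows")               # directed
graph.add_edge("bob", "carol", "friend", directed=False)  # undirected
graph.add_hyper_edge("alice", ["bob", "carol"], "invited")
```

A directed edge is only visible from its source, while the undirected "friend" edge is visible from both bob and carol.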
Q3. One feature of a graph database is to allow linking data from various SQL databases. How does it work in practice? And who needs such a feature?
Daniel Kirstenpfad: Data in most cases does not stand alone like a list. In most cases data comes with links and relations between the data. SQL is highly limited when working with complex linked data sets. Even worse, with RDBMSs query performance is significantly impacted when schemas get more complex. To overcome those performance and scalability limitations it can be helpful to establish a graph database layer which handles those complex linking needs. So in practice the user keeps his RDBMS databases and establishes what we call a “metadata repository” across his different RDBMS databases. This metadata repository is basically a large graph of all the edges that link data sets, giving the user the ability to do ad-hoc graph queries globally on a virtually unlimited number of different RDBMS databases (or data silos, as we call them).
That also answers the question of which users benefit the most from such a feature: everyone who has multiple relational databases, each storing another aspect of the data, can benefit by establishing a globally available graph metadata repository.
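The idea of a graph layer that only stores links between records living in separate relational databases can be sketched as follows. This is a toy illustration under stated assumptions, not how sones GraphDB implements it: the two silos are simulated with in-memory SQLite databases, and the names `SILOS`, `link`, `reachable` and `fetch` are hypothetical.

```python
import sqlite3
from collections import defaultdict, deque

# Two independent "data silos", simulated with in-memory SQLite databases.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
crm.execute("INSERT INTO customers VALUES (1, 'Alice')")

shop = sqlite3.connect(":memory:")
shop.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
shop.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(10, "book"), (11, "lamp")])

SILOS = {"crm": crm, "shop": shop}

# The metadata graph holds only references of the form (silo, table, key)
# and the links between them; the rows themselves stay in their silos.
links = defaultdict(set)


def link(a, b):
    """Record an undirected link between two record references."""
    links[a].add(b)
    links[b].add(a)


def reachable(start, max_hops):
    """Breadth-first search over the link graph, up to max_hops steps."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        ref, hops = queue.popleft()
        if hops == max_hops:
            continue
        for nxt in links.get(ref, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    seen.discard(start)
    return seen


def fetch(ref):
    """Load the actual row for a reference from the silo that owns it."""
    silo, table, key = ref
    # Table names come from trusted references in this sketch; real code
    # must never interpolate untrusted input into SQL.
    query = f"SELECT * FROM {table} WHERE id = ?"
    return SILOS[silo].execute(query, (key,)).fetchone()


alice = ("crm", "customers", 1)
link(alice, ("shop", "orders", 10))
link(alice, ("shop", "orders", 11))

# One graph hop from the customer crosses into the other database.
orders = sorted(fetch(ref) for ref in reachable(alice, 1))
```

The traversal never joins across databases; it walks the link graph first and only then asks each silo for the rows it owns.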
Q4. How do you handle semistructured and unstructured data?
Daniel Kirstenpfad: Semistructured data is a compromise between structured and unstructured data without sacrificing important things. To handle semistructured data the sones GraphDB comes with a dynamic data schema and consistency criteria. This data schema is an extension of the OOP data model allowing inheritance, abstract types, data streams (binary – unstructured data), undefined Attributes, Editions and Revisions on Object Namespaces and Object Instances. Having a dynamic schema means that the user can change the data schema anytime without performance penalties. Unstructured data is handled either using undefined attributes on edges and vertices or using the binary data streams, which can be handled by the sones GraphDB without any mapping to external storage – basically binary data is stored with data objects.
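To make the combination of a dynamic schema, inheritance, abstract types and undefined attributes more concrete, here is a minimal Python sketch. It does not reflect the sones GraphDB data model or API: `VertexType` and `Instance` are hypothetical names, and editions, revisions and binary streams are left out.

```python
from typing import Any, Dict, Optional


class VertexType:
    """A schema type whose attribute declarations can change at runtime."""

    def __init__(self, name: str, attributes: Dict[str, type],
                 parent: Optional["VertexType"] = None,
                 abstract: bool = False) -> None:
        self.name = name
        self.attributes = dict(attributes)
        self.parent = parent
        self.abstract = abstract

    def all_attributes(self) -> Dict[str, type]:
        """Declared attributes, including those inherited from parents."""
        merged = self.parent.all_attributes() if self.parent else {}
        merged.update(self.attributes)
        return merged


class Instance:
    """An object of a type; unknown attributes are kept, not rejected."""

    def __init__(self, vtype: VertexType, **values: Any) -> None:
        if vtype.abstract:
            raise TypeError(f"{vtype.name} is abstract")
        self.vtype = vtype
        self.defined: Dict[str, Any] = {}
        self.undefined: Dict[str, Any] = {}
        for name, value in values.items():
            self.set(name, value)

    def set(self, name: str, value: Any) -> None:
        declared = self.vtype.all_attributes()
        if name not in declared:
            # Not part of the schema: store it as an undefined attribute.
            self.undefined[name] = value
        elif isinstance(value, declared[name]):
            self.defined[name] = value
        else:
            raise TypeError(f"{name} must be {declared[name].__name__}")


entity = VertexType("Entity", {"name": str}, abstract=True)
user = VertexType("User", {"email": str}, parent=entity)

alice = Instance(user, name="Alice", email="alice@example.org",
                 nickname="al")  # "nickname" is not declared anywhere

# The schema is changed while instances already exist.
user.attributes["age"] = int
alice.set("age", 30)
```

Declared attributes are type-checked against the (inherited) schema, while anything undeclared lands in `undefined`, which is one simple way to hold semistructured data next to structured data.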
Q5. How do you ensure scalability and performance?
Daniel Kirstenpfad: Ensuring scalability and performance was a major design goal through all development steps of the sones GraphDB. Because there are so many possible ways of enhancing the scalability and performance aspects of a graph database, we have been busy implementing some of those ways and we will be busy implementing more in the future. Currently Master-Slave replication is one key factor to scale the query performance of a data set. By replicating a data set onto multiple machines and running queries on read slaves, the number of possible queries per second scales nearly linearly. In the future the sones GraphDB will have an easy to use and extend framework to partition a graph onto multiple machines – coupled with the Master-Slave replication this allows virtually unlimited sizes of data sets and a virtually unlimited number of simultaneous queries.
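The read-scaling argument can be illustrated with a small in-process simulation. It is a sketch under simplifying assumptions, not the sones replication mechanism: writes are copied to every replica synchronously, reads are spread round-robin, and the names `Node` and `ReplicatedStore` are hypothetical.

```python
import itertools
from typing import Any, Dict, List


class Node:
    """One machine holding a full copy of the data set."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, Any] = {}
        self.reads_served = 0


class ReplicatedStore:
    """All writes go to the master; reads are spread over the replicas."""

    def __init__(self, replica_count: int) -> None:
        if replica_count < 1:
            raise ValueError("at least one read replica is required")
        self.master = Node("master")
        self.replicas: List[Node] = [
            Node(f"replica-{i}") for i in range(replica_count)
        ]
        self._round_robin = itertools.cycle(self.replicas)

    def write(self, key: str, value: Any) -> None:
        self.master.data[key] = value
        # Simplification: replication is synchronous and never fails.
        for replica in self.replicas:
            replica.data[key] = value

    def read(self, key: str) -> Any:
        replica = next(self._round_robin)
        replica.reads_served += 1
        return replica.data.get(key)


store = ReplicatedStore(replica_count=3)
store.write("alice", {"friends": ["bob", "carol"]})

results = [store.read("alice") for _ in range(9)]
load = [replica.reads_served for replica in store.replicas]
```

With three replicas each one serves a third of the reads, so adding replicas raises read throughput roughly in proportion, while write capacity stays bound to the single master. That is why the answer pairs replication with partitioning for larger data sets.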
Q6. How do you see the market for Cloud computing and open source?
Daniel Kirstenpfad: We see a very diverse cloud computing market with some bigger and some smaller cloud platform providers on the one hand and many customers who want to utilize these cloud platforms, either as a software vendor or a platform user, on the other hand.
Basically the cloud for us is another opportunity to host the services the sones GraphDB provides. It’s important for us to support the major cloud platforms like Microsoft’s Windows Azure and Amazon’s EC2. There are some cool technologies and frameworks that can be utilized by a database to store and access data in a fast and very scalable way.
What most of those cloud platforms lack these days is an actually usable system to allow software vendors like sones to publish their applications and services. Currently many users need to take the not-so-scenic route until they get their instance of something hosted on a cloud platform. It would be great to have better software vendor integration and therefore allow the users to take the scenic route to service.
Q7. sones offers its product using a dual licensing scheme: open source software under AGPLv3, and a full enterprise version. Why did you choose a dual licensing scheme and how do you handle the two license models?
Daniel Kirstenpfad: We started as a closed source company – mainly because we wanted to have something to share before going open source. Making the whole sones GraphDB open source was always an option – which we finally chose to take in mid-2010. Since then there has been an Open Source Edition of the sones GraphDB available, which shares its code with the enterprise version.
What differentiates the Open Source Edition and the Enterprise Edition are feature plugins which extend the functionality in certain ways. There are, however, plans to gradually publish the previously closed source plugins under open source licenses. We chose this dual licensing scheme because first of all it gives us and the software the most flexibility. We can combine closed source technology – from partners for example – and open source technology using dual licensing. And we think that this will suit most of our users’ needs best.
Q8. Sones has recently concluded a round of financing. What is your strategy for the next months?
Daniel Kirstenpfad: First of all we want to broaden our presence. One key factor to enable a great community is to start an open dialogue with the community. We want to bring our team to the next level – that means expanding the development, customer support, press activities and management team. Furthermore, we are going to expand our cooperation with universities and partners. Above all, partners are another key factor which will lead to the success of the sones GraphDB.