{"id":2743,"date":"2013-10-28T09:21:59","date_gmt":"2013-10-28T09:21:59","guid":{"rendered":"http:\/\/www.odbms.org\/blog\/?p=2743"},"modified":"2013-10-28T09:21:59","modified_gmt":"2013-10-28T09:21:59","slug":"on-multi-model-databases-interview-with-martin-schonert-and-frank-celler","status":"publish","type":"post","link":"https:\/\/www.odbms.org\/blog\/2013\/10\/on-multi-model-databases-interview-with-martin-schonert-and-frank-celler\/","title":{"rendered":"On multi-model databases. Interview with Martin Sch\u00f6nert and Frank Celler."},"content":{"rendered":"<blockquote><p><strong> <em>\u201cWe want to prevent a deadlock where the team is forced to switch the technology in the middle of the project because it doesn\u2019t meet the requirements any longer.\u201d<\/em>&#8211;Martin Sch\u00f6nert and Frank Celler.<\/strong>\n<\/p><\/blockquote>\n<p>On &#8220;multi-model&#8221; databases, I have interviewed <strong>Martin Sch\u00f6nert<\/strong> and <strong>Frank Celler<\/strong>, founders and creators of the open source <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.arangodb.org');\"  href=\"http:\/\/www.arangodb.org\">ArangoDB<\/a>.<\/p>\n<p>RVZ<\/p>\n<p><strong>Q1. What is ArangoDB and for what kind of applications is it designed for?<\/strong><\/p>\n<p><strong>Frank Celler:<\/strong> ArangoDB is a multi-model mostly-memory database with a flexible data model for documents and graphs. It is designed as a \u201cgeneral purpose database\u201d, offering all the features you typically need for modern web applications. <\/p>\n<p>ArangoDB is supposed to grow with the application\u2014the project may start as a simple single-server prototype, nothing you couldn\u2019t do with a relational database equally well. After some time, some geo-location features are needed and a shopping cart requires transactions. ArangoDB\u2019s graph data model is useful for the recommendation system. 
The smartphone app needs a lean API to the back-end\u2014this is where Foxx, ArangoDB\u2019s integrated Javascript application framework, comes into play.<br \/>\nThe overall idea is: \u201cWe want to prevent a deadlock where the team is forced to switch the technology in the middle of the project because it doesn\u2019t meet the requirements any longer.\u201d<\/p>\n<p>ArangoDB is open source (Apache 2 licence)\u2014you can get the source code at GitHub or download the precompiled binaries from our  <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.arangodb.org');\"  href=\"http:\/\/www.arangodb.org\">website<\/a>. <\/p>\n<p>Though ArangoDB takes a universal approach, there are edge cases where we don\u2019t recommend ArangoDB. In particular, ArangoDB doesn\u2019t compete with massively distributed systems like <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/en.wikipedia.org\/wiki\/Apache_Cassandra');\"  href=\"http:\/\/en.wikipedia.org\/wiki\/Apache_Cassandra\">Cassandra<\/a> with thousands of nodes and many terabytes of data. <\/p>\n<p><strong>Q2. What\u2019s so special about the ArangoDB data model?<\/strong><\/p>\n<p><strong>Martin Sch\u00f6nert:<\/strong> ArangoDB is a multi-model database. It stores documents in collections. A specialized binary data file format is used for disk storage. Documents that have a similar structure (i.e., that have the same attribute names and attribute types) can share their structural information. The structure (called \u201cshape\u201d) is saved just once, and multiple documents can re-use it by storing just a pointer to their \u201cshape\u201d.<br \/>\nIn practice, documents in a collection are likely to be homogeneous, and sharing the structure data between multiple documents can greatly reduce disk storage space and memory usage for documents.<\/p>\n<p><strong>Q3. Who is currently using ArangoDB for what?<\/strong><\/p>\n<p><strong>Frank Celler: <\/strong> ArangoDB is open source. 
You don\u2019t have to register to download the source code or precompiled binaries. As a user, you can get support via the Google Group, GitHub\u2019s issue tracker and even via Twitter. We are very approachable, which is an essential part of the project. The drawback is that we don\u2019t really know what people are doing with ArangoDB in detail. We have noticed an exponentially increasing number of downloads over the last few months.<br \/>\nWe are aware of a broad range of use cases: a CMS, a high-performance logging component, a geo-coding tool, an annotation system for animations, just to name a few. Other interesting use cases are single page apps or mobile apps via Foxx, ArangoDB\u2019s application framework. Many of our users have in-production experience with other NoSQL databases, especially the leading document stores. <\/p>\n<p><strong>Q4. Could you motivate your design decision to use Google\u2019s V8 JavaScript engine?<\/strong><\/p>\n<p><strong>Martin Sch\u00f6nert:<\/strong> ArangoDB uses <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/code.google.com\/p\/v8\/');\"  href=\"http:\/\/code.google.com\/p\/v8\/\">Google\u2019s V8 engine<\/a> to execute server-side JavaScript functions. Users can write server-side business logic in JavaScript and deploy it in ArangoDB. These so-called \u201cactions\u201d are much like stored procedures living close to the data.<br \/>\nFor example, with actions it is possible to perform cascading deletes\/updates, assign permissions, and do additional calculations and modifications to the data.<br \/>\nArangoDB also allows users to map URLs to custom actions, making it usable as an application server that handles client HTTP requests with user-defined business logic.<br \/>\nWe opted for Javascript as it meets our requirements for an \u201cembedded language\u201d in the database context:<br \/>\n\u2022\tJavascript is widely used. 
Regardless of which \u201cback-end language\u201d web developers write their code in, almost everybody can also code in Javascript.<br \/>\n\u2022\tJavascript is effective and still modern.<br \/>\nLikewise, we chose Google V8, as it is the fastest, most stable Javascript interpreter available at present.<\/p>\n<p><strong>Q5. How do you query ArangoDB if you don\u2019t want to use JavaScript? <\/strong><\/p>\n<p><strong>Frank Celler: <\/strong> ArangoDB offers a couple of options for getting data out of the database. It has a REST interface for CRUD operations and also allows \u201cquerying by example\u201d. \u201cQuerying by example\u201d means that you create a JSON document with the attributes you are looking for. The database returns all documents which look like the \u201cexample document\u201d.<br \/>\nExpressing complex queries as JSON documents can become a tedious task\u2014and it\u2019s almost impossible to support joins following this approach. We wanted a convenient and easy-to-learn way to execute even complex queries, not involving any programming as in an approach based on map\/reduce. As ArangoDB supports multiple data models including graphs, it was sufficient neither to stick to SQL nor simply to implement UNQL. We ended up with the \u201cArangoDB query language\u201d (AQL), a declarative language similar to SQL and <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.jsoniq.org');\"  href=\"http:\/\/www.jsoniq.org\">JSONiq<\/a>. AQL supports joins, graph queries, list iteration, result filtering, result projection, sorting, variables, grouping, aggregate functions, unions, and intersections.<br \/>\nOf course, ArangoDB also offers drivers for all major programming languages. 
The drivers wrap the query options mentioned above, following the paradigm of the respective programming language and\/or frameworks like <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/rubyonrails.org');\"  href=\"http:\/\/rubyonrails.org\">Ruby on Rails<\/a>.<\/p>\n<p><strong>Q6. How do you perform graph queries? How does this differ from systems such as Neo4J?<\/strong><\/p>\n<p><strong>Frank Celler: <\/strong> SQL lacks the semantics required to express the relationships between graph nodes, so graph databases have to provide other ways to access the data.<br \/>\nThe first option is to write small programs, so-called \u201cpath traversals.\u201d In ArangoDB, you use Javascript; in <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.neo4j.org');\"  href=\"http:\/\/www.neo4j.org\">neo4j<\/a>, Java. The general approach is very similar.<br \/>\nProgramming gives you all the freedom to do whatever comes to your mind. That\u2019s good. For standard use cases, programming might be too much effort. So, both ArangoDB and neo4j offer a declarative language\u2014neo4j has \u201cCypher,\u201d ArangoDB the \u201cArangoDB Query Language.\u201d Both also implement the Blueprints standard so that you can use \u201c<a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/github.com\/tinkerpop\/gremlin\/wiki');\"  href=\"https:\/\/github.com\/tinkerpop\/gremlin\/wiki\">Gremlin<\/a>\u201d as a query language inside Java. 
We already mentioned that ArangoDB is a multi-model database: AQL covers documents and graphs, provides support for joins, lists, and variables, and does much more.<\/p>\n<p>The following example is taken from the neo4j website: <\/p>\n<p>\u201cFor example, here is a query which finds a user called John in an index and then traverses the graph looking for friends of John\u2019s friends (though not his direct friends) before returning both John and any friends-of-friends that are found.<\/p>\n<p>START john=node:node_auto_index(name = &#8216;John&#8217;)<br \/>\nMATCH john-[:friend]->()-[:friend]->fof<br \/>\nRETURN john, fof \u201d<\/p>\n<p>The same query looks like this in AQL:<\/p>\n<p>FOR t IN TRAVERSAL(users, friends, &#8220;users\/john&#8221;, &#8220;outbound&#8221;,<br \/>\n{minDepth: 2}) RETURN t.vertex._key <\/p>\n<p>The result is:<br \/>\n[ &#8220;maria&#8221;, &#8220;steve&#8221; ]<\/p>\n<p>You see that Cypher describes patterns while AQL describes joins. Internally, ArangoDB has a library of graph functions\u2014those functions return collections of paths and paths or use those collections in a join.<\/p>\n<p><strong>Q7. How did you design ArangoDB to scale out and\/or scale up? Please give us some detail.<\/strong><\/p>\n<p><strong>Martin Sch\u00f6nert:<\/strong> Solid state disks are increasingly becoming commodity hardware. ArangoDB\u2019s append-only design is a perfect fit for such SSDs, allowing for data-sets which are much bigger than the main memory but still fit onto a solid state disk.<br \/>\nArangoDB supports master\/slave replication in version 1.4, which will be released in the next few days (a beta has been available for some time). On the one hand this provides easy fail-over setups. On the other hand it provides a simple way to scale the read-performance.<br \/>\nSharding is implemented in version 2.0. This enables you to store even bigger data-sets and increase the write-performance. 
As noted before, however, we see our main application in scaling to a small number of nodes. We don\u2019t plan to optimize ArangoDB for massive scaling with hundreds of nodes. Plain key\/value stores are much more suitable in such scenarios.<\/p>\n<p><strong>Q8. What is ArangoDB\u2019s update and delete strategy?<\/strong><\/p>\n<p><strong>Martin Sch\u00f6nert:<\/strong> ArangoDB versions prior to 1.3 store all revisions of documents in an append-only fashion; the objects will never be overwritten. The latest version of a document is available to the end user. <\/p>\n<p>With the current version 1.3, ArangoDB introduces transactions and lays the technical foundation for replication and sharding. Along with those much-requested features comes \u201creal\u201d MVCC with concurrent writes.<\/p>\n<p>In databases implementing an append-only strategy, obsolete versions of a document have to be removed to save space. As we already mentioned, ArangoDB is multi-threaded: The so-called compaction is automatically done in the background in a different thread without blocking reads and writes.<\/p>\n<p><strong>Q9. How does ArangoDB differ from other NoSQL data stores such as Couchbase and MongoDB and graph data stores such as Neo4j, to name a few?<\/strong><\/p>\n<p><strong>Frank Celler: <\/strong> ArangoDB\u2019s feature scope is driven by the idea of giving developers everything they need to master typical tasks in a web application\u2014in a way that is both convenient and technically sophisticated.<br \/>\nFrom our point of view it\u2019s the combination of features and quality of the product which sets ArangoDB apart: ArangoDB not only handles documents but also graphs.<br \/>\nArangoDB is extensible via Javascript and Ruby. Bundled with ArangoDB you get \u201cFoxx\u201d. 
Foxx is an integrated application framework ideal for lean back-ends and single-page Javascript applications (SPAs).<br \/>\nMulti-collection transactions are useful not only for online banking and e-commerce; they become crucial in any web app in a distributed architecture. Here again, we offer the developers many choices. If transactions are needed, developers can use them.<br \/>\nIf, on the other hand, the problem requires higher performance and less transaction-safety, developers are free to ignore multi-collection transactions and to use the standard single-document transactions implemented by most NoSQL databases.<br \/>\nAnother unique feature is ArangoDB\u2019s query language AQL\u2014it makes querying powerful and convenient. For simple queries, we offer a simple query-by-example interface. Then again, AQL enables you to describe complex filter conditions and joins in a readable format.<\/p>\n<p><strong>Q10. Could you summarize the main results of your benchmarking tests?<\/strong><\/p>\n<p><strong>Frank Celler: <\/strong> To quote <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/twitter.com\/janl');\"  href=\"https:\/\/twitter.com\/janl\">Jan Lehnardt<\/a> from <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/couchdb.apache.org');\"  href=\"http:\/\/couchdb.apache.org\">CouchDB<\/a>: \u201cNosql is not about performance, scaling, dropping ACID or hating SQL\u2014it is about choice. As nosql databases are somewhat different it does not help very much to compare the databases by their throughput and choose the one which is fastest. Instead\u2014the user should carefully think about his overall requirements and weigh the different aspects. Massively scalable key\/value stores or memory-only system[s] can achieve much higher benchmarks. 
But your aim is [to] provide a much more convenient system for a broader range of use-cases\u2014which is fast enough for almost all cases.\u201d<br \/>\nAnyway, we have done a lot of performance tests and are more than happy with the results. ArangoDB 1.3 inserts up to 140,000 documents per second. We are going to publish the whole test suite including a test runner soon, so everybody can try it out on his own hardware.<\/p>\n<p>We have also tested the space usage: Storing 3.5 million AQL search queries takes about 200 MB in <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.mongodb.org');\"  href=\"http:\/\/www.mongodb.org\">MongoDB<\/a> with pre-allocation compared to 55 MB in ArangoDB. This is the benefit of implementing the concept of shapes. <\/p>\n<p><strong>Q11. ArangoDB is open source. How do you motivate and involve the open source development community to contribute to your projects rather than any other open source NoSQL?<\/strong><\/p>\n<p><strong>Frank Celler: <\/strong> To be honest: The contributors come of their own volition and until now we haven\u2019t had to \u201cpush\u201d interested parties. Obviously, ArangoDB is fascinating enough, even though there are more than 150 NoSQL databases available to choose from. <\/p>\n<p>It all started when Ruby inventor <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/en.wikipedia.org\/wiki\/Yukihiro_Matsumoto');\"  href=\"http:\/\/en.wikipedia.org\/wiki\/Yukihiro_Matsumoto\">Yukihiro \u201cMatz\u201d Matsumoto<\/a> tweeted about ArangoDB and recommended it to the community. Following this tweet, ArangoDB\u2019s first local fan base was established in Japan\u2014and we learned a lot about the limits of automatic translation from Japanese tweets to English and the other way around ;-).<\/p>\n<p>In our daily \u201cwork\u201d with our community, we try to be as open and supportive as possible. 
The core developers communicate directly, and with short response times, with people who have ideas or need help, through Google Groups or GitHub. We maintain a community, especially for contributors, where we discuss future features and announce upcoming changes early so that API contributors can keep their implementations up to date. <\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>\n<strong>Martin Sch\u00f6nert<\/strong><br \/>\n<em>Martin is the origin of many fancy ideas in ArangoDB. As chief architect he is responsible for the overall architecture of the system, bringing in his experience from more than 20 years in IT as a developer, architect, project manager and entrepreneur.<br \/>\nMartin started his career as a scientist at the Technical University of Aachen after earning his degree in Mathematics. Later he worked as head of product development (Team4 Systemhaus), Director IT (OnVista Technologies) and head of division at Deutsche Post.<br \/>\nMartin has been working with relational and non-relational databases (e.g. a torrid love-hate relationship with the granddaddy of all non-relational databases: Lotus Notes) for most of his professional life.<br \/>\nWhen no database did what he needed, he wrote his own: one for extremely high update rates and another for distributed caching<\/em>.<\/p>\n<p><strong>Frank Celler<\/strong><br \/>\n<em>Frank is both an entrepreneur and a backend developer who has been developing mostly-memory databases for two decades. He is the lead developer of ArangoDB and co-founder of triAGENS. He also organizes Cologne\u2019s NoSQL user group and NoSQL conferences, and speaks at developer conferences.<br \/>\nFrank studied in Aachen and London and received a PhD in Mathematics. 
Prior to founding triAGENS, the company behind ArangoDB, he worked for several German tech companies as consultant, team lead and developer.<br \/>\nHis technical focus is C and C++, recently he gained some experience with Ruby when integrating Mruby into ArangoDB.<\/em><\/p>\n<p><strong>Resources<\/strong><\/p>\n<p>&#8211; The stable version (1.3 branch) of ArangoDB can be downloaded <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.arangodb.org\/download');\"  href=\"http:\/\/www.arangodb.org\/download\">here<\/a>.<br \/>\n&#8211; <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/twitter.com\/arangodb');\"  href=\"https:\/\/twitter.com\/arangodb\">ArangoDB on Twitter<\/a><br \/>\n&#8211; <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/groups.google.com\/forum\/#!forum\/arangodb');\"  href=\"https:\/\/groups.google.com\/forum\/#!forum\/arangodb\">ArangoDB Google Group<\/a><br \/>\n&#8211; <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/stackoverflow.com\/questions\/tagged\/arangodb');\"  href=\"http:\/\/stackoverflow.com\/questions\/tagged\/arangodb\">ArangoDB questions on StackOverflow<\/a><br \/>\n&#8211; <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/github.com\/triagens\/arangodb\/issues?state=open');\"  href=\"https:\/\/github.com\/triagens\/arangodb\/issues?state=open\">Issue Tracker at Github<\/a><\/p>\n<p><strong>Related Posts<\/strong><\/p>\n<p>&#8211; <strong><a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.odbms.org\/blog\/2013\/10\/on-geo-distributed-data-management-interview-with-adam-abrevaya\/');\"  href=\"http:\/\/www.odbms.org\/blog\/2013\/10\/on-geo-distributed-data-management-interview-with-adam-abrevaya\/\">On geo-distributed data management \u2014 Interview with Adam Abrevaya. 
October 19, 2013<\/a><\/strong><\/p>\n<p>&#8211; <strong><a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.odbms.org\/blog\/2013\/10\/on-big-data-and-nosql-interview-with-renat-khasanshyn\/');\"  href=\"http:\/\/www.odbms.org\/blog\/2013\/10\/on-big-data-and-nosql-interview-with-renat-khasanshyn\/\">On Big Data and NoSQL. Interview with Renat Khasanshyn. October 7, 2013<\/a><\/strong><\/p>\n<p>&#8211; <strong><a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.odbms.org\/blog\/2013\/08\/on-nosql-interview-with-rick-cattell\/');\"  href=\"http:\/\/www.odbms.org\/blog\/2013\/08\/on-nosql-interview-with-rick-cattell\/\">On NoSQL. Interview with Rick Cattell. August 19, 2013<\/a><\/strong><\/p>\n<p>&#8211; <strong><a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.odbms.org\/blog\/2012\/08\/on-big-graph-data\/');\"  href=\"http:\/\/www.odbms.org\/blog\/2012\/08\/on-big-graph-data\/\">On Big Graph Data. August 6, 2012<\/a><\/strong><\/p>\n<p><strong>Follow ODBMS.org on Twitter: <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/twitter.com\/odbmsorg');\"  href=\"https:\/\/twitter.com\/odbmsorg\">@odbmsorg<\/a><\/strong><\/p>\n<p>##<\/p>\n","protected":false},"excerpt":{"rendered":"<p>\u201cWe want to prevent a deadlock where the team is forced to switch the technology in the middle of the project because it doesn\u2019t meet the requirements any longer.\u201d&#8211;Martin Sch\u00f6nert and Frank Celler. On &#8220;multi-model&#8221; databases, I have interviewed Martin Sch\u00f6nert and Frank Celler, founders and creators of the open source ArangoDB. RVZ Q1. 
What [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[46,92,155,203,224,298,365,391,402,412,413,446,549],"_links":{"self":[{"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/posts\/2743"}],"collection":[{"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/comments?post=2743"}],"version-history":[{"count":0,"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/posts\/2743\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/media?parent=2743"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/categories?post=2743"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/tags?post=2743"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}