On Eventual Consistency– An interview with Justin Sheehy.
“I would most certainly include updates to my bank account as applications for which eventual consistency is a good design choice. In fact, bankers have understood and used eventual consistency for far longer than there have been computers in the modern sense” –Justin Sheehy.
On the subject of new data models and eventual consistency I did interview Justin Sheehy Chief Technology Officer, Basho Technologies.
Q1. What are in your opinion the main differences and similarities of a key-value store (ala Dynamo), a document store (ala MongoDB), and an “extensible record” store (ala Big Table) when using them in practice?
Justin Sheehy: Describing the kv-store, doc store, and column family data models in general is not the same as describing specific systems like Dynamo, MongoDB, and BigTable. I’ll do the former here as I am guessing that is the intention of the question. Since the following couple of questions ask for differences, I’ll emphasize the similarity here.
All three of these data models have two major things in common: values stored in them are not rigidly structured, and are organized mainly by primary key. The details beyond those similarities, and how given systems expose those details, certainly vary. But the flexibility of semi-structured data and the efficiency of primary-key access generally apply to most such systems.
Q2. When is a key-value store particularly well suited and when is a a document store instead preferable? For which kind of applications and for what kind of data management requirements?
Justin Sheehy: The interesting issue with this question is that “document store” is not well-established as having a specific meaning. Certainly it seems to apply to both MongoDB and CouchDB, but those two systems have very different data access semantics. The closest definition I can come up with quickly that covers the prominent systems known as doc stores might be something like “a key-value store which also has queryability of some kind based on the content of the values.”
If we accept that definition then you can happily use a document store anywhere that a key-value store would work, but would find it most worthwhile when your querying needs are richer than simply primary key direct access.
Q3. What is Riak? A key-value store or a document store? What are the main features of the current version of Riak?
Justin Sheehy: Riak began as being called a key-value store before the current popularity of the term “document store” term, but it is certainly a document store by any reasonable definition that I know — such as the one I gave above. In addition to access by primary key, values in Riak can be queried by secondary key, range query, link walking, full text search, or map/reduce.
Riak has many features, but the core reasons that people come to Riak over other systems are Availability, Scalability, and Predictability. For people whose business demands extremely high availability, easy and linear scalability, or predictable performance over time, Riak is worth a look.
Q4. How do you achieve horizontal scalability? Do you use a “shared nothing” horizontal scaling – replicating and partitioning data over many servers?
What performance metrics do you have for that?
Justin Sheehy: We use a number of techniques to achieve horizontal scalability. Among them is consistent hashing, an approach invented at Akamai and successfully used by many distributed systems since then. This allows for constant time routing to replicas of data based on the hash of the data’s primary key.
Data is partitioned to servers in the cluster based on consistent hashing, and replicated to a configurable number of of those servers. By partitioning the data to many “virtual nodes” per host, growth is relatively easy as new hosts simply (and automatically) take over some of the virtual nodes that has previously owned by existing cluster hosts.
Yes, in terms of data location Riak is a “shared nothing” system.
One (of many) demonstrations of this scalability was performed by Joyent here.
That benchmark is approximately 2 years old, so various specific numbers are quite outdated, but the important lesson in it remains and is summed up by this graph late in this post.
It shows that as servers were added, the throughput (as well as the capacity) of the overall system increased linearly.
Q5. How do you handle updates if you do not support ACID transactions? For which applications this is sufficient, and when this is not?
Justin Sheehy: Riak takes more of the “BASE” approach, which has become accepted over the past several years as a sensible tradeoff for high-availability data systems. By allowing consistency guarantees to be a bit flexible during failure conditions, a Riak cluster is able to provide much more extreme availability guarantees than a strictly ACID system.
Q6. You said that Riak takes more of the “BASE” approach. Did you use the definition of eventual consistency by Werner Vogels?
Reproduced here: “Eventual consistency: The storage system guarantees that if no new updates are made to the object, eventually (after the inconsistency window closes) all accesses will return the last updated value”. You would not wish to have an “eventual consistency” update to your bank account. For which class of applications is eventual consistency a good system design choice?
Justin Sheehy: That definition of Eventual Consistency certainly does apply to Riak, yes.
I would most certainly include updates to my bank account as applications for which eventual consistency is a good design choice. In fact, bankers have understood and used eventual consistency for far longer than there have been computers in the modern sense. Traditional accounting is done in an eventually-consistent way and if you send me a payment from your bank to mine then that transaction will be resolved in an eventually consistent way. That is, your bank account and mine will not have a jointly-atomic change in value, but instead yours will have a debit and mine will have a credit, each of which will be applied to our respective accounts.
This question contains a very commonly held misconception. The use of eventual consistency in well-designed systems does not lead to inconsistency. Instead, such systems may allow brief (but shortly resolved) discrepancies at precisely the moments when the other alternative would be to simply fail.
To rephrase your statement, you would not wish your bank to fail to accept a deposit due to an insistence on strict global consistency.
It is precisely the cases where you care about very high availability of a distributed system that eventual consistency might be a worthwhile tradeoff.
Q7. Why is Riak written in Erlang? What are the implications for the application developers of this choice?
Justin Sheehy: Erlang’s original design goals included making it easy to build systems with soft real-time guarantees and very robust fault-tolerance properties. That is perfectly aligned with our central goals with Riak, and so Erlang was a natural fit for us. Over the past few years, that choice has proven many times to have been a great choice with a huge payoff for Riak’s developers. Application developers using Riak are not required to care about this choice any more than they need to care what language PostgreSQL is written in. The implications for those developers are simply that the database they are using has very predictable performance and excellent resilience.
Q8. Riak is open source. How do you engage the open source community and how do you make sure that no inconsistent versions are generated?
Justin Sheehy: We engage the open source community everywhere that it exists. We do our development in the open on github, and have lively conversations with a wider community via email lists, IRC, Twitter, many in-person venues, and more.
Mark Phillips and others at Basho are dedicated full-time to ensuring that we continue to engage honestly and openly with developer communities, but all of us consider it an essential part of what we do. We do not try to prevent forks. Instead, we are part of the community in such a way that people generally want to contribute their changes back to the central repository. The only barrier we have to merging such code is about maintaining a standard of quality.
Q9. How do you optimize access to non-key attributes?
Justin Sheehy: Riak stores index content in addition to values, encoded by type and in sorted order on disk. A query by index certainly is more expensive than simply accessing a single value directly by key, as the indices are distributed around the cluster — but this also means that the size of the index is not constrained by a single host.
Q10. How do you optimize access to non-key attributes if you do not support indexes in Riak?
Justin Sheehy: We do support indexes in Riak.
Q11 How does Riak compare with a new generation of scalable relational systems (NewSQL)?
Justin Sheehy: The “NewSQL” term is, much like “NoSQL”, a marketing term that doesn’t usefully define a technical category. The primary argument made by NewSQL proponents is that some NoSQL systems have made unnecessary tradeoffs. I personally consider these NewSQL systems to be a part of the greater movement generally dubbed NoSQL despite the seemingly contradictory names, as the core of that movement has nothing to do with SQL — it is about escaping the architectural monoculture that has gripped the commercial database market for the past few decades. In terms of technical comparison, some systems placing themselves under the NewSQL banner are excellent at scalability and performance, but I know of none whose availability and predictability can rival Riak.
Q12 Pls give some examples of use cases where Riak is currently in use. Is Riak in use for analyzing Big Data as well?
Justin Sheehy: A few examples of companies relying on Riak in their business can be found here.
While Riak is primarily about highly-available systems with predictable low-latency performance, it does have analytical capabilities as well and many users make use of map/reduce and other such programming models in Riak. By most definitions of “Big Data”, many of Riak’s users certainly fall into that category.
Q Anything you wish to add?
Justin Sheehy: Thank you for your interest. We’re not done making Riak great!
Chief Technology Officer, Basho Technologies
As Chief Technology Officer, Justin Sheehy directs Basho’s technical strategy, roadmap, and new research into storage and distributed systems.
Justin came to Basho from the MITRE Corporation, where as a principal scientist he managed large research projects for the U.S. Intelligence Community including such efforts as high assurance platforms, automated defensive cyber response, and cryptographic protocol analysis.
He was central to MITRE’s development of research for mission assurance against sophisticated threats, the flagship program of which successfully proposed and created methods for building resilient networks of web services.
Before working for MITRE, Justin worked at a series of technology companies including five years at Akamai Technologies, where he was a senior architect for systems infrastructure giving Justin a broad as well as deep background in distributed systems.
Justin was a key contributor to the technology that enabled fast growth of Akamai’s networks and services while allowing support costs to stay low. Justin performed both undergraduate and postgraduate studies in Computer Science at Northeastern University.
ODBMS.org — Free Downloads and Links
In this section you can download resources covering the following topics:
Big Data and Analytical Data Platforms
Cloud Data Stores
NoSQL Data Stores
Graphs and Data Stores
Entity Framework (EF) Resources
Object-Relational Impedance Mismatch
XML, RDF Data Stores,