On SQL and NoSQL. Interview with Dave Rosenthal
“Despite the obvious shared word ‘transaction’ and the canonical example of a database transaction which modifies multiple bank accounts, I don’t think that database transactions are particularly relevant to financial applications.”–Dave Rosenthal.
On SQL and NoSQL, I have interviewed Dave Rosenthal CEO of FoundationDB.
Q1. What are the suggested criteria for users when they need to choose between durability for lower latency, higher throughput and write availability?
Dave Rosenthal: There is a tradeoff in available between commit latency and durability–especially in distributed databases. At one extreme a database client can just report success immediately (without even talking to the database server) and buffer the writes in the background. Obviously, that hides latency well, but you could lose a suffix of transactions. At the other extreme, you can replicate writes across multiple machines, fsync them on each of the machines, and only then report success to the client.
FoundationDB is optimized to provide good performance in its default setting, which is the safest end of that tradeoff.
Usually, if you want some reasonable amount of durability guarantee, you are talking about a commit latency of small constant factor times the network latency. So, the real latency issues come with databases spanning multiple data centers. In that case FoundationDB users are able to choose whether or not they want durability guarantees in all data centers before commit (increasing commitment latencies), which is our default setting, or whether they would like to relax durability guarantees by returning a commit when the data is fsync’d to disk in just one datacenter.
All that said, in general, we think that the application is usually a more appropriate place to try to hide latency than the database.
Q2. Justin Sheehy of Basho in an interview said  “I would most certainly include updates to my bank account as applications for which eventual consistency is a good design choice. In fact, bankers have understood and used eventual consistency for far longer than there have been computers in the modern sense”. What is your opinion on this?
Dave Rosenthal: Yes, we totally agree with Justin. Despite the obvious shared word ‘transaction’ and the canonical example of a database transaction which modifies multiple bank accounts, I don’t think that database transactions are particularly relevant to financial applications. In fact, true ACID transactions are way more broadly important than that. They give you the ability to build abstractions and systems that you can provide guarantees about.
As Michael Cahill says in his thesis which became the SIGMOD paper of the year: “Serializable isolation enables the development of a complex system by composing modules, after verifying that each module maintains consistency in isolation.” It’s this incredibly important ability to compose that makes a system with transactions special.
Q3. FoundationDB claim to provide full ACID transactions. How do you do that?
Dave Rosenthal: In the same basic way as many other transactional databases do. We use a few strategies that tend to work well in distributed system such as optimistic concurrency and MVCC. We also, of course, have had to solve some of the fundamental challenges associated with distributed systems and all of the crazy things that can happen in them. Honestly, it’s not very hard to build a distributed transactional database. The hard part is making it work gracefully through failure scenarios and to run fast.
Q4. Is this similar to Oracle NoSQL?
Dave Rosenthal: Not really. Both Oracle NoSQL and FoundationDB provide an automatically-partitioned key-value store with fault tolerance. Both also have a concept of ordering keys (for efficient range operations) though Oracle NoSQL only provides ordering “within a Major Key set”. So, those are the similarities, but there are a bunch of other NoSQL systems with all those properties. The huge difference is that FoundationDB provides for ACID transactions over arbitrary keys and ranges, while Oracle NoSQL does not.
Q5. How would you compare your product offering with respect to NoSQL data stores, such as CouchDB, MongoDB, Cassandra and Riak, and NewSQL such as NuoDB and VoltDB?
Dave Rosenthal: The most obvious response for the NoSQL data stores would be “we have ACID transactions, they don’t”, but the more important difference is in philosophy and strategy.
Each of those products expose a single data model and interface. Maybe two. We are pursuing a fundamentally different strategy.
We are building a storage substrate that can be adapted, via layers, to provide a variety of data models, APIs, and true flexibility.
We can do that because of our transactional capabilities. CouchDB, MongoDB, Cassandra and Riak all have different APIs and we talk to companies that run all of those products side-by-side. The NewSQL database players are also offering a single data model, albeit a very popular one, SQL. FoundationDB is offering an ever increasing number of data models through its “layers”, currently including several popular NoSQL data models and with SQL being the next big one to hit. Our philosophy is that you shouldn’t have to increase the complexity of your architecture by adopting a new NoSQL database each time your engineers need access to a new data model.
Q6. Cloud computing and open source: How does it relate to FoundationDB?
Dave Rosenthal: Cloud computing: FoundationDB has been designed from the beginning to run well in cloud environments that make use of large numbers of commodity machines connected through a network. Probably the most important aspect of a distributed database designed for cloud deployment is exceptional fault tolerance under very harsh and strange failure conditions – the kind of exceptionally unlikely things that can only happen when you have many machines working together with components failing unpredictably. We have put a huge amount of effort into testing FoundationDB in these grueling scenarios, and feel very confident in our ability to perform well in these types of environments. In particular, we have users running FoundationDB successfully on many different cloud providers, and we’ve seen the system keep its guarantees under real-world hardware and network failure conditions experienced by our users.
Open source: Although FoundationDB’s core data storage engine is closed source, our layer ecosystem is open source. Although the core data storage engine has a very simple feature set, and is very difficult to properly modify while maintaining correctness, layers are very feature rich and because they are stateless, are much easier to create and modify which makes them well suited to third-party contributions.
Q7 Pls give some examples of use cases where FoundationDB is currently in use. Is FoundationDB in use for analyzing Big Data as well?
Dave Rosenthal: Some examples: User data, meta data, user social graphs, geo data, via ORMs using the SQL layer, metrics collection, etc.
We’ve mostly focused on operational systems, but a few of our customers have built what I would call “big data” applications, which I think of as analytics-focused. The most common use case has been for collecting and analyzing time-series data. FoundationDB is strongest in big data applications that call for lots of random reads and writes, not just big table scans—which many systems can do well.
Q8. Rick Cattel said in an recent interview  “there aren’t enough open source contributors to keep projects competitive in features and performance, and the companies supporting the open source offerings will have trouble making enough money to keep the products competitive themselves”. What is your opinion on this?
Dave Rosenthal: People have great ideas for databases all the time. New data models, new query languages, etc.
If nothing else, this NoSQL experiment that we’ve all been a part of the past few years has shown us all the appetite for data models suited to specific problems. They would love to be able to build these tools, open source them, etc.
The problem is that the checklist of practical considerations for a database is huge: Fault tolerance, scalability, a backup solution, management and monitoring, ACID transactions, etc. Add those together and even the simplest concept sounds like a huge project.
Our vision at FoundationDB is that we have done the hard work to build a storage substrate that simultaneously solves all those tricky practical problems. Our engine can be used to quickly build a database layer for any particular application that inherits all of those solutions and their benefits, like scalability, fault tolerance and ACID compliance.
Q9. Nick Heudecker of Gartner, predicts that  “going forward, we see the bifurcation between relational and NoSQL DBMS markets diminishing over time” . What is your take on this?
Dave Rosenthal: I do think that the lines between SQL and NoSQL will start to blur and I believe that we are leading that charge.We acquired another database startup last year called Akiban that builds an amazing SQL database engine.
In 2014 we’ll be bringing that engine to market as a layer running on top of FoundationDB. That will be a true ANSI SQL database operating as a module directly on top of a transactional “NoSQL” engine, inheriting the operational benefits of our core storage engine – scalability, fault tolerance, ease of operation.
When you run multiple of the SQL layer modules, you can point many of them at the same key-space in FoundationDB and it’s as if they are all part of the same database, with ACID transactions enforced across the separate SQL layer processes.
It’s very cool. Of course, you can even run the SQL layer on a FoundationDB cluster that’s also supporting other data models, like graph or document. That’s about as blurry as it gets.
Dave Rosenthal is CEO of FoundationDB. Dave started his career in games, building a 3D real-time strategy game with a team of high-school friends that won the 1st annual Independent Games Festival. Previously, Dave was CTO at Visual Sciences, a pioneering web-analytics company that is now part of Adobe. Dave has a degree in theoretical computer science from MIT.
Follow ODBMS.org on Twitter: @odbmsorg