Re-thinking Relational Database Technology. Interview with Barry Morris, Founder & CEO NuoDB.
The landscape for data management products is rapidly evolving. NuoDB is a new entrant in the market. I put a few questions to Barry Morris, Founder & Chief Executive Officer of NuoDB.
Q1. What is NuoDB?
Barry Morris: NuoDB combines the power of standard SQL with cloud elasticity. People want SQL and ACID transactions, but these have been expensive to scale up or down. NuoDB changes that by introducing a breakthrough “emergent” internal architecture.
NuoDB scales elastically as you add machines and is self-healing if you take machines away. It also allows you to share machines between databases, vastly increases business continuity guarantees, and runs in geo-distributed active/active configurations. It does all this with very little database administration.
Q2. When did you start the company?
Barry Morris: The technology has been developed over a period of several years, but the company was funded in 2010 and has been operating out of Cambridge, MA since then.
Q3. Big data, Web-scale concurrency and Cloud: how can NuoDB help here?
Barry Morris: These are some of the main themes of modern system software.
The 30-year-old database architecture that is universally used by traditional SQL database systems is great for traditional applications and in traditional datacenters, but the old architecture is a liability in web-facing applications running on private, public or hybrid clouds.
NuoDB is a general purpose SQL database designed specifically for these modern application requirements.
Massive concurrency with NuoDB is easy to understand: if the system is overloaded with users, you can just add as many diskless “transaction managers” as you need. NuoDB handles big data by using redundant Key-value stores for its storage architecture.
Cloud support boils down to the five key cloud requirements: elastic scalability, sharing machines between databases, extreme availability, geo-distribution and “zero” DBA costs. You get these for free from a database with an emergent architecture, such as NuoDB.
Q4. What kind of data (structured, unstructured) and what volumes of data is NuoDB able to handle?
Barry Morris: NuoDB can handle any kind of data, in a set-based relational model.
Naturally we support all standard SQL types. Additionally we have rich BLOB support that allows us to store anything in an opaque fashion. A forthcoming release of the product will also support user-defined types.
NuoDB extends the traditional SQL type model in some interesting ways. We store arbitrary-length strings and numbers because we store everything by value not by type. Schemas can change easily and dynamically because they are not tightly coupled to the data.
Additionally NuoDB supports table inheritance, a powerful feature for applications that traditionally have wide, sparse tables.
Q5. How do you ensure data availability?
Barry Morris: There is no single point of failure in NuoDB. In fact it is quite hard to stop a NuoDB database from running, or to lose data.
There are three tiers of processes in a NuoDB solution (Brokers, Transaction Managers, and Archive Managers) and each tier can be arbitrarily redundant. All NuoDB Brokers, Transaction Managers and Archive Managers run as true peers of others in their respective tiers. To stop a database you have to stop all processes on at least one tier.
Data is stored redundantly, in as many Archive Managers as you want to deploy. All Archive Managers are peers and if you lose one the system just keeps going with the remaining Archive Managers.
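Editor's illustration: the redundancy described above can be sketched as writing each blob to every reachable store and reading from the first one that answers. This is a minimal sketch under stated assumptions, not NuoDB's implementation. `ArchiveStore`, `replicated_put` and `replicated_get` are hypothetical names, and a plain in-memory dictionary stands in for each Archive Manager.

```python
class ArchiveStore:
    """Hypothetical in-memory key-value store standing in for one Archive Manager."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True  # flip to False to simulate losing this store

    def put(self, key, blob):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.data[key] = blob

    def get(self, key):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.data[key]


def replicated_put(stores, key, blob):
    """Write the blob to every reachable store; return how many copies were written."""
    written = 0
    for store in stores:
        try:
            store.put(key, blob)
            written += 1
        except ConnectionError:
            continue  # skip a lost peer, the remaining peers still hold the data
    if written == 0:
        raise RuntimeError("no archive store is reachable")
    return written


def replicated_get(stores, key):
    """Return the blob from the first reachable store that has it."""
    for store in stores:
        try:
            return store.get(key)
        except (ConnectionError, KeyError):
            continue
    raise KeyError(key)


stores = [ArchiveStore("archive-a"), ArchiveStore("archive-b")]
copies = replicated_put(stores, "record-1", b"payload")
stores[0].alive = False  # simulate losing one Archive Manager
survivor_value = replicated_get(stores, "record-1")
```

In this sketch the read still succeeds after one store is lost, because the second store holds a full copy.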
Additionally the system can run across multiple datacenters. In the case of losing a datacenter a NuoDB database will keep running in the other datacenters.
Q6. How do you orchestrate and guarantee ACID transactions in a distributed environment?
Barry Morris: We do not use a network lock manager because we need to be asynchronous in order to scale elastically.
Instead concurrency and consistency are managed using an advanced form of MVCC (multi-version concurrency control). We never actually delete or update a record in the system directly. Instead we always create new versions of records pointing at the old versions. It is our job to do all the bookkeeping about who sees which versions of which records and at what time.
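Editor's illustration: the version-chain idea can be sketched in a few lines. This is a simplified, single-process sketch and not NuoDB's actual MVCC. It assumes transaction ids increase monotonically and that every writer has already committed. `Version`, `Record` and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    value: Optional[str]       # None marks a deleted record
    txn_id: int                # transaction that created this version
    prev: Optional["Version"]  # pointer to the older version, if any


class Record:
    """A record is never updated in place; each write adds a new version."""

    def __init__(self):
        self.latest: Optional[Version] = None

    def write(self, value, txn_id):
        # The new version points back at the old one instead of replacing it.
        self.latest = Version(value, txn_id, self.latest)

    def read(self, snapshot_txn_id):
        # Walk back until we reach a version visible to this snapshot.
        version = self.latest
        while version is not None and version.txn_id > snapshot_txn_id:
            version = version.prev
        return version.value if version is not None else None


rec = Record()
rec.write("alice@old.example", txn_id=1)
rec.write("alice@new.example", txn_id=5)

old_view = rec.read(snapshot_txn_id=3)  # reader that started before txn 5
new_view = rec.read(snapshot_txn_id=5)  # reader that sees txn 5
```

A reader whose snapshot predates transaction 5 keeps seeing the older value, which is the bookkeeping about "who sees which versions" in miniature.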
The durability model is based on redundant storage in Key-value stores we call Archive Managers.
Q7. You say you use a distributed non-blocking atomic commit protocol. What is it? What is it useful for?
Barry Morris: Until changes to the state of a transactional database are committed they are not part of the state of the database. This is true for any ACID database system.
The Commit Protocol in NuoDB is complex because at any time we can have thousands of concurrent applications reading, writing, updating and deleting data in an asynchronous distributed system with multiple live storage nodes. A naïve design would “lock” the system in order to commit a change to the durable state, but that is a good way of ensuring the system does not perform or scale. Instead we have a distributed, asynchronous commit protocol that allows a transaction to commit without requiring network locks.
Q8. What is special about NuoDB Architecture?
Barry Morris: NuoDB has an emergent architecture. It is like a flock of birds that can fly in an organized formation without having a central “brain”. Each bird follows some simple rules and the overall effect is to organize the group.
In NuoDB no one is in charge. There is no supervisor, no master, and no central authority on anything. Everything that would normally be centralized is distributed. Any Transaction Manager or Archive Manager with the right security credentials can dynamically opt in to or opt out of a particular database. All properties of the system emerge from the interactions of peers participating on a discretionary basis rather than from a monolithic central coordinator.
Q9. Do you support SQL?
Barry Morris: Yes. NuoDB is a relational database, and SQL is the primary query language. We have very broad and very standard SQL support.
Q10. How do you reduce the overhead when storing data to a disk? Do you use in-memory cache?
Barry Morris: Our storage model is that we have multiple redundant Archive Nodes for any given database. Archive Nodes are Key-value stores that know how to create and retrieve blobs of data. Archive Nodes can be implemented on top of any Key-value store (we currently have a file-system implementation and an Amazon S3 implementation). Any database can use multiple types of Archive Nodes at the same time (e.g. SSD-based alongside disk-based).
Archive Nodes allow trade-offs to be made on disk write latency. For an ultra-conservative strategy you can run the system with fully journaled, flushed disk writes on multiple Archive Nodes in multiple datacenters; on the other end of the spectrum you can rely on redundant memory copies as your committed data, with disk writes happening as a background activity. And there are options in between. In all cases there are caching strategies in place that are internal to the Archive Nodes.
Q11. How do you achieve that query processing scales with the number of available nodes?
Barry Morris: Queries running on multiple nodes get the obvious benefit of node parallelism at the query-processing level. They additionally get parallelism at the disk-read level because there are multiple Archive Nodes. In most applications the most significant scaling benefit is that the Transaction Nodes (which do the query processing) can load data from each other's memory if it is available there. This is orders of magnitude faster than loading data from disk.
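Editor's illustration: the lookup order implied above (local memory, then a peer's memory, then disk) can be sketched as follows. This is only a sketch of the general technique, with dictionaries standing in for node memory and for the archive. `load_data` and all names are hypothetical and say nothing about how NuoDB locates data internally.

```python
def load_data(key, local_cache, peer_caches, archive):
    """Return (value, source), trying the cheapest location first."""
    if key in local_cache:
        return local_cache[key], "local"
    for peer in peer_caches:
        if key in peer:
            local_cache[key] = peer[key]  # keep a copy for later reads
            return peer[key], "peer"
    value = archive[key]  # slowest path: read from the archive (disk)
    local_cache[key] = value
    return value, "archive"


local = {}
peers = [{"orders:42": "shipped"}]
archive = {"orders:42": "shipped", "orders:43": "pending"}

first = load_data("orders:42", local, peers, archive)   # found in a peer's memory
second = load_data("orders:42", local, peers, archive)  # now cached locally
third = load_data("orders:43", local, peers, archive)   # only the archive has it
```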
Q12. What about storage? How does it scale?
Barry Morris: You can have arbitrarily large databases, bounded only by the storage capacity of your underlying Key-value store. The NuoDB system itself is agnostic about data set size.
Working set size is very important for performance. If you have a fairly stable working set that fits in the distributed memory of your Transaction Nodes then you effectively have a distributed in-memory database, and you will see extremely high performance numbers. With the low and decreasing cost of memory this is not an unusual circumstance.
Scalability and Elasticity
Q13. How do you obtain scalability and elasticity?
Barry Morris: These are simple consequences of the emergent architecture.
Transaction Managers are diskless nodes. When a Transaction Manager is added to a database it starts getting allocated work and will consequently increase the throughput of the database system. If you remove a transaction manager you might have some failed transactions, which the application is necessarily designed to handle, but you won’t lose any data and the system will continue regardless.
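Editor's illustration: handling a transaction that fails because its Transaction Manager went away usually means retrying it. The following is a generic retry sketch, not NuoDB client code. `TransientTransactionError`, `run_with_retry` and `flaky_transfer` are hypothetical names; a real driver would raise its own exception type.

```python
import time


class TransientTransactionError(Exception):
    """Hypothetical error for a transaction that failed and may be retried."""


def run_with_retry(txn, attempts=3, backoff_s=0.0):
    """Call txn() until it succeeds or the attempts are used up."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return txn()
        except TransientTransactionError as err:
            last_error = err
            time.sleep(backoff_s * (attempt + 1))  # linear backoff between tries
    raise last_error


calls = {"count": 0}


def flaky_transfer():
    # Fails twice, as if the node serving it was removed, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientTransactionError("transaction manager went away")
    return "committed"


result = run_with_retry(flaky_transfer, attempts=3)
```

The retried work must be safe to run again from the start, which is why the whole transaction is passed in as a callable.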
Q14. How complex is it for a developer to write applications using NuoDB?
Barry Morris: It’s no different from traditional relational databases. NuoDB offers standard SQL with JDBC, ODBC, Hibernate, Ruby Active Record, etc.
Q15. NuoDB is in a restricted Beta program. When will it be available in production? Is there a way to try the system now?
Barry Morris: NuoDB is in its Beta 4 release. We expect it to ship by the end of the year, or very early in 2012. You can try the system by going to our download page.