NuoDB: A Comparison of Distributed, ACID-Compliant Databases
Several years after publishing papers on Spanner and F1, Google last week announced the Beta availability of what they’re calling Cloud Spanner. Essentially, this is opening the core transactional, relational service they’ve been using internally to users on the Google Cloud Platform. My colleague Barry wrote a great piece on what this means, so I won’t go into detail about the history and why this is exciting news. But, you know, it’s pretty big news.
One of the more interesting aspects of this announcement is exactly that they’re making available something they’ve used for years as core infrastructure. Spanner is fundamental to much of what Google does. For users of this complex, distributed system, this is great because it means that Spanner has been battle-tested. It also, however, means that the service was designed for a specific set of application needs. As with any distributed system, understanding the trade-offs they’ve made in their design is critical to understanding where this new database can be used most effectively.
In many ways, Cloud Spanner is similar to NuoDB. Both are elastic SQL databases designed for scale and cloud adoption. Both provide consistency and ACID transactions. Yet there are also some substantial differences, which highlight the distinct goals of our approaches and the expectations of how our respective software will be used.
WHAT’S SIMILAR OR THE SAME BETWEEN NUODB AND CLOUD SPANNER?
Like I said above, the most obvious similarity is that both services are elastic. Essentially, this means that they can scale on-demand with no impact on the application logic. More specifically, both databases achieve this by scaling compute capacity on demand separately from storage locations & replicas: with NuoDB, there are separate peer processes dedicated to replicated durability whereas Cloud Spanner builds on a replicated storage service (called Colossus). Being elastic also means that both databases can sustain node failure or upgrade without affecting availability of the service as a whole to an application.
The next common element is that both NuoDB and Cloud Spanner are transactional. They provide ACID semantics and support standard SQL Isolation levels. In both cases, some form of versioning is relied upon: Cloud Spanner uses global timestamps via TrueTime while NuoDB builds on logical ordering and Multi-Version Concurrency Control (MVCC). Both systems mediate update conflicts via a chosen leader, although as we’ll see below, the mechanisms are different. Elastic scale with transactional consistency is critical for any existing applications migrating from single-instance databases to distributed, virtualized deployments.
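To illustrate the general idea of versioning, here is a minimal sketch of a multi-version read in Python. It is a generic MVCC illustration, not NuoDB’s or Cloud Spanner’s implementation: the `VersionedCell` class and its integer timestamps are assumptions made up for this example. Each write appends a new version rather than overwriting the old one, so a reader at a given snapshot sees the latest version committed at or before that point.

```python
from bisect import bisect_right


class VersionedCell:
    """Hypothetical single value that keeps every committed version."""

    def __init__(self):
        self._timestamps = []  # commit timestamps, kept in ascending order
        self._values = []      # value committed at the matching timestamp

    def write(self, commit_ts, value):
        # Assumption for this sketch: commits arrive in timestamp order.
        if self._timestamps and commit_ts <= self._timestamps[-1]:
            raise ValueError("commit timestamps must increase")
        self._timestamps.append(commit_ts)
        self._values.append(value)

    def read(self, snapshot_ts):
        # Find the newest version committed at or before the snapshot.
        i = bisect_right(self._timestamps, snapshot_ts)
        return self._values[i - 1] if i else None


cell = VersionedCell()
cell.write(10, "draft")
cell.write(20, "final")
old_view = cell.read(15)  # a reader at snapshot 15 still sees the first version
new_view = cell.read(25)  # a later reader sees the second version
```

Readers never block writers in this model because an older snapshot simply keeps reading the older version.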
Both NuoDB and Cloud Spanner build around SQL schema and DDL commands for defining structure. This in turn supports indexing, rich query capability, JOINs, etc. Both databases have implemented syntax and features directly from the SQL standard, making it simpler to support standard query tools out of the box. As we’ll see below, however, there are significant differences in DML choices. In both cases, DBAs can use schema structure to optimize for query and storage patterns: in NuoDB this is supported via standard Table Partitioning, and in Cloud Spanner a new concept called Interleaving is introduced.
For many users of traditional RDBMSs, there is a desired progression from single instance to failover to active-active deployment. Distributed data management systems tend to have this progression in mind, as well as the logical next step of supporting global-scale deployment so that dispersed users can all experience low-latency access to their data. At that scale, latency, failure semantics, and recovery models get complicated. Both Cloud Spanner and NuoDB have been designing for these kinds of deployments, giving users options about how to react to failure and where data should be stored to optimize for a given application’s needs around latency, recoverability, or consistency. Google will be revealing more details of their approach later this year, so we’ll have to revisit this topic.
Of course, it’s hard to talk about global deployment without discussing the CAP Theorem. Indeed, this has been a pretty hot topic with the Cloud Spanner release, and Eric Brewer, originator of the theorem and current Google employee, has written about his views on why Cloud Spanner is in something of a new category. Without going into the details, his argument is that the high availability and reliability of Google’s infrastructure means that partitions are unlikely, and even when they occur, the service as a whole will continue to be available to running clients, even if specific queries may have to get re-run. Translating, I’d say this means that CAP is an observation about the reality of systems, and what you should focus on is understanding the tradeoffs that matter to you on failure, which is exactly the same way we look at the design of NuoDB.
Finally, it’s worth looking at the similarities of how storage is managed in both systems. Beyond what I said above about separating storage and service, both databases take the approach of mapping data into collections that affect not just storage but also coordination efficiency. Table Partitioning in NuoDB or Interleaving with Stored Indices in Cloud Spanner effectively group data that is likely to be used together. That grouping in both cases can be used to optimize locality, caching, and contention management. In NuoDB, this clustering of data is mapped into a concept called a Storage Group; in Cloud Spanner, the analogous concept is called a Split. These similar abstractions between structure and storage make a lot of sense in elastic systems. As we’ll see below, there are also some substantial differences in the way these two concepts are applied.
Rounding this out, both systems implement standard identity mechanisms, encrypt data on the wire by default, provide logical backup & recovery models and do many of the other things that are baseline requirements in an enterprise environment. So who do all these similarities appeal to? I believe it’s traditional SQL users migrating to modern architectures, startups that need distribution & robust operation out of the box in something easy to manage, and software vendors moving to hosting or SaaS offerings.
That’s a pretty broad group, so next let’s look at some key differences in design goals to better understand where these two databases should be applied.
CUSTOM INFRASTRUCTURE VERSUS COMMODITY DEPLOYMENT
Let’s start with perhaps the most obvious difference. Cloud Spanner is tied deeply to Google’s infrastructure, relying on atomic clocks for TrueTime support and Google Cloud Platform services for storage, compute, etc. It’s this tight relationship that empowers the scaling capabilities of Cloud Spanner, but it also means that users of the service must themselves be running on Google’s cloud. NuoDB must do more in its implementation to support scale, consistency, recovery, and snapshot, but by doing so it enables scale on commodity infrastructure. This allows deployments to run on-prem, in a cloud (including Google’s), and on a wide range of heterogeneous infrastructure.
Reliance on Google’s infrastructure has other effects. For instance, by using a global time service for ordering and consistency, Google has chosen to place a minimum latency (typically only a few milliseconds) on all transactions. There’s a great discussion over at Quizlet that quantifies early experiences here. The powerful benefits of this choice are serializability, linear throughput scaling, and a simple model for point-in-time snapshot. For NuoDB and our customer requirements, we’ve chosen to focus first on low latency by exploiting on-demand caches and protocols optimized around an assumption of good locality. This means that the system scales throughput very well when locality is maintained. Locality of reference is also important in Cloud Spanner but for different reasons. We’ll come back to that in a minute.
PRIMARY-KEY APIS VERSUS STANDARD SQL (DML)
Another important difference is also driven (in part) by application requirements. Google has implemented a system that uses standard SQL syntax for reading data but requires INSERTs or UPDATEs be done via a custom API, and be associated with a primary key. This API is one they have standardized with other offerings in their cloud. It makes sense as an interface to web-style applications that are used to KV or Document stores but want SQL for analysis and discovery. Indeed, transactions can be declared either read-write or read-only, and the Cloud Spanner team has provided really good documentation on how to think about the trade-offs and implications to your application. NuoDB, by contrast, supports standard DML syntax for INSERTs and UPDATEs and allows arbitrary interaction within a transaction. To a developer inexperienced with SQL, this may seem more complex than a simple KV API, but it’s a critical capability for anyone with existing applications and tools migrating to cloud models.
CO-LOCATION OF EMBEDDED STRUCTURE VERSUS CO-LOCATION OF CACHED DATA
Besides simplicity, there’s another reason that Google has chosen to make writes to the system simple, primary-key-based operations. There must be a primary key on each table, among other reasons, to define key-ranges for Splits. Each Root Table (and any of its Interleaved children) is broken into consecutive key-range spans that define the contents of each Split. Note this means that a secondary index will span Splits non-consecutively, so direct advice is offered about how to use secondary indices effectively. NuoDB also has a physical mapping of data to internal structure, where each row is mapped into one object containing a collection of other rows. That object is called an Atom, which, unlike a Split, tends to be small (<50k). Separately, there are also Catalog Atoms that maintain the registry about how to find each row. While there is overhead in maintaining a Catalog as opposed to hashing shards, the benefit is that data can be replicated and brought into cache, on-demand, at any processing node because the Catalog and not a rigid hash function is the lookup and coordination mechanism. In other words, NuoDB trades off some complexity for flexibility about how and where data is used.
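The two lookup styles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the split boundaries, the `split_for_key` function, and the `Catalog` class are hypothetical names invented here, and neither system’s real data structures look like this. The first half shows why consecutive primary keys stay inside one key-range span while rows reached through a secondary index can be scattered; the second half shows a registry-style lookup, where moving a row only requires updating its registry entry.

```python
from bisect import bisect_right

# Hypothetical boundaries: split 0 holds keys < 1000, split 1 holds
# 1000-1999, split 2 holds 2000-2999, and split 3 holds keys >= 3000.
SPLIT_BOUNDS = [1000, 2000, 3000]


def split_for_key(primary_key):
    """Return the index of the key-range span that owns primary_key."""
    return bisect_right(SPLIT_BOUNDS, primary_key)


def splits_for_keys(primary_keys):
    """Return the set of spans touched by a collection of primary keys."""
    return {split_for_key(k) for k in primary_keys}


# A scan over consecutive primary keys stays within one span...
consecutive = splits_for_keys(range(1200, 1210))
# ...but rows matched via a secondary index may have unrelated primary keys.
scattered = splits_for_keys([15, 1200, 2750, 3999])


class Catalog:
    """Hypothetical registry mapping each row to the object that holds it."""

    def __init__(self):
        self._row_to_object = {}

    def place(self, row_id, object_id):
        # Placement is recorded explicitly, not derived from the key.
        self._row_to_object[row_id] = object_id

    def locate(self, row_id):
        return self._row_to_object[row_id]


catalog = Catalog()
catalog.place("row-42", "atom-7")
catalog.place("row-42", "atom-9")  # relocating a row is just a registry update
```

The registry costs an extra lookup and must itself be maintained, which is the overhead mentioned above, but placement is no longer fixed by a function of the key.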
COORDINATION VIA PHYSICAL LAYOUT VERSUS ON-DEMAND CACHES
Here’s where the topic of locality gets interesting. For each Split in Cloud Spanner, there’s a Leader (chosen via Paxos) and several followers that all have access to the data. If all the work of a given transaction stays within the data set of a given Split then coordination is done by a single (possibly local) Leader. If the transaction spans Splits then a coordination protocol must be run to coordinate between Leaders. This is why Interleaving is offered as a feature and why there is careful advice about how to use indices in practice. In NuoDB, data is brought into cache as needed so you don’t define co-location in your schema. Amongst the NuoDB database processes that have a given object in-cache, agreement is made about who the “leader” for that object is. When an object is in-cache at only one location, all coordination is guaranteed to be local. As a given object is replicated to many caches, there is overhead in coordinating updates, so the NuoDB focus is on minimizing cache replication.
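As a rough sketch of the two coordination models described above, the Python below counts how many parties a write has to involve. All names here (`split_leaders_touched`, `peers_to_coordinate_with`, the row and node labels) are hypothetical and chosen for illustration; the real protocols in both systems are far more involved.

```python
def split_leaders_touched(row_to_split, rows):
    """Return the set of Splits (and so Leaders) a transaction touches."""
    return {row_to_split[row] for row in rows}


def needs_cross_leader_protocol(row_to_split, rows):
    """True when the transaction spans more than one Split."""
    return len(split_leaders_touched(row_to_split, rows)) > 1


def peers_to_coordinate_with(cache_holders, updater):
    """Other processes caching the object that must hear about an update."""
    return set(cache_holders) - {updater}


# Physical-layout model: a child row stored with its parent shares a Split.
interleaved = {"customer/1": "split-A", "customer/1/order/7": "split-A"}
# Without co-location, the same logical rows may live in different Splits.
separate = {"customer/1": "split-A", "order/7": "split-C"}

same_split = needs_cross_leader_protocol(interleaved, list(interleaved))
cross_split = needs_cross_leader_protocol(separate, list(separate))

# On-demand cache model: cost depends on how many caches hold the object.
local_only = peers_to_coordinate_with({"node-1"}, "node-1")
replicated = peers_to_coordinate_with({"node-1", "node-2", "node-3"}, "node-1")
```

In the first model the schema decides whether a transaction stays with one Leader; in the second, the set of caches that currently hold the object decides it.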
AUTOMATED VERSUS MANUAL DATA PARTITIONING
Interestingly, Cloud Spanner gives you no control over the size or number of Splits in your system, only the ability to co-locate data into the same Split. As a workload runs, heuristic decisions are made about how to separate, join, or otherwise re-align Splits. In essence, there is an optimizer applied to data storage. NuoDB takes the opposite approach, placing little meaning in which Atoms are used to store your data but giving you complete control over how Tables are Partitioned and then mapped to Storage Groups. This makes it very simple to think about placement of storage based on locality or resource allocation but requires more effort to automatically scale as data set sizes grow. These trade-offs (in my opinion) make sense given that Cloud Spanner is designed as a service that abstracts resources like storage, whereas NuoDB is software designed to give operators flexibility for their deployment and resource management requirements.
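The contrast can be sketched as follows. Both halves are assumptions for illustration only: Google has not published the heuristics behind Split management, so the size-threshold rule in `auto_split` is a made-up stand-in, and the region-to-Storage-Group table is a hypothetical operator configuration rather than NuoDB syntax.

```python
def auto_split(splits, max_rows):
    """Halve any split larger than max_rows (one pass, invented heuristic)."""
    result = []
    for keys in splits:
        if len(keys) > max_rows:
            mid = len(keys) // 2
            result.extend([keys[:mid], keys[mid:]])
        else:
            result.append(keys)
    return result


# Automated model: the system decides when and where to divide the data.
rebalanced = auto_split([[1, 2, 3, 4, 5, 6], [7, 8]], max_rows=4)

# Manual model: the operator declares which partition maps to which
# Storage Group (hypothetical names), and the mapping stays put until changed.
PARTITION_TO_STORAGE_GROUP = {
    "emea": "sg_frankfurt",
    "amer": "sg_boston",
}


def storage_group_for(row):
    """Look up the operator-chosen Storage Group for a row's partition."""
    return PARTITION_TO_STORAGE_GROUP[row["region"]]


placement = storage_group_for({"id": 42, "region": "emea"})
```

The manual table is trivial to reason about, but nothing in it reacts when a partition outgrows its storage, which is the extra effort noted above.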
WHAT’S THE PUNCHLINE?
By opening Cloud Spanner to their customers, Google is validating a lot of what I believe the industry is trending towards. Looking at the similarities above, both NuoDB and Cloud Spanner are focused on elastic SQL, transactional consistency, availability, and Hybrid Transactions. The architectural similarities, like separating service from storage, are well-suited for on-demand scaling and transparent failure-handling. The data abstractions in both are a good fit to traditional partitioning uses as well as modern global operational models. These are all common motivators for application migration into the cloud.
On the other hand, the applications that motivate these two systems express different perspectives. Cloud Spanner has chosen to optimize for their in-cloud services, focused on primary-key application models which scale well by defining schema that maps to localized access patterns. In practice, this maps well (for example) to sharded applications looking for a logical management model or Key-Value/Document users that need consistency. NuoDB assumes no specific infrastructure and therefore builds on an on-demand caching model to optimize for commodity resources. It scales well for applications that can cache working sets and localize conflict based on active access patterns. This is designed for the requirements of traditional OLTP applications that need standard SQL in modern architectures.
In either case, these are both examples of the transformation that’s happening in the database space: distribution with strong performance and transactional consistency. I’m thrilled to see the amount of discussion that’s happened this past week around Cloud Spanner and all the questions that it’s making the community ask. I encourage you to go try both systems and understand for yourself the options that are out there, then let me know what you think in the comments below!