Oracle on NoSQL Architecture and Implementation.
Robert Greene – NoSQL Evangelist / Product Manager at Oracle
If you’ve been paying attention to the NoSQL market, you’ve likely realized that there is no “one size fits all” implementation. Nearly every vendor in the NoSQL space has a different API, architectural approach, and fundamental data storage format. These variations have a tremendous impact on the programmability, performance, scale, and manageability of the respective products, and new users of the technology often do not understand them well.
This variability is quite different from the relational database market, where conventional wisdom holds that programmability is standardized on SQL, performance falls within a fairly narrow band, and management protocols (e.g. SNMP) exist for unified tooling. In the relational world, one can expect nearly identical programmability, and while one product might be 10% or 15% faster or slower than another, it would not differ by an order of magnitude. Of course, there is the occasional leapfrog, but the other vendors soon catch up to parity, and decision making then becomes much more about product stability, vertical scalability, the surrounding ecosystem of tools, consolidation cost savings, and vendor alliances.
The choice of a NoSQL product, on the other hand, can involve large differences in programmatic complexity, performance, scale, and manageability. Given that a NoSQL choice is usually already tied to an expectation of extreme performance and scale, it is especially important for new users to take the time to understand product architecture and design choices when starting an evaluation.
NoSQL features will converge on a common baseline over time, but some architectural choices are much harder to change and will hinder product evolution far more than basic feature alignment. This is especially true of fundamental architectural decisions such as Peer-to-Peer vs Paxos Groups, but it also applies to lower-level decisions: replacing a single lock implementation, for example, can take years of effort. Some aspects of product variation are also easier to converge than others. APIs, for example, are already starting to converge, ironically, towards SQL-like languages.
This is apparent when taking a look at both the Key-Value and Document NoSQL categories.
Nearly all of the major “Key-Value” store vendors have converged over the last 24 months on a Table data model, with an increasing number also introducing SQL-style access on top of the tables. More will follow. Even the “Document” category, which is hyper-focused on JSON storage, is consolidating towards an SQL API. A few years ago, there were precious few vendors with native JSON storage and access. Fast forward to today: the SQL standard has taken on a new nested JSON query syntax, and many NoSQL and relational vendors now use SQL or SQL-like languages to natively store and access nested JSON data.
Oracle’s NoSQL Database, an extreme scale Key-Value store, has now been used in production for 4 going on 5 years and, as with all NoSQL products, its evolution and runtime characteristics have been heavily influenced by early design choices. In the early days of development, some fundamental design choices were made to support not only eventually consistent operation but also proper transactional operations (transactional write-ahead logging).
In addition, architectural choices were made to support an availability model (via Paxos Groups) that would guard against system-wide cascading failures. A concurrency model (topology-aware client drivers with asynchronous operational ordering in the server) was devised to provide massively concurrent insertion and update, with flexible choices for consistency on read: time-based (sync time), version-based (data versioning), and absolute consistency. Finally, storage layer choices were made (multi-part hierarchical key spacing) that could optimize both network and storage I/O while taking advantage of the future direction of storage technologies towards solid state and persistent memory.
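The multi-part hierarchical key idea can be illustrated with a minimal sketch. It assumes a key split into a "major" path and a "minor" path, with only the major path hashed to choose a shard, so that all records sharing a major path are stored together and can be fetched in one request. The names `ShardedStore`, `shard_for`, and `NUM_SHARDS` are hypothetical and are not the Oracle NoSQL Database API.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical cluster size, for illustration only


def shard_for(major_path):
    """Pick a shard by hashing only the major key components."""
    digest = hashlib.md5("/".join(major_path).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS


class ShardedStore:
    """Toy in-memory store; each shard is a dict keyed by (major, minor)."""

    def __init__(self):
        self.shards = defaultdict(dict)

    def put(self, major_path, minor_path, value):
        shard = self.shards[shard_for(major_path)]
        shard[(tuple(major_path), tuple(minor_path))] = value

    def get_all(self, major_path):
        """Return every record under one major path from a single shard."""
        shard = self.shards[shard_for(major_path)]
        major = tuple(major_path)
        return {minor: v for (maj, minor), v in shard.items() if maj == major}


store = ShardedStore()
store.put(["user", "42"], ["profile"], {"name": "Ada"})
store.put(["user", "42"], ["orders", "1"], {"total": 10})
store.put(["user", "7"], ["profile"], {"name": "Bob"})

# Both records for user 42 come back from one shard lookup.
records = store.get_all(["user", "42"])
```

Because the minor path never influences placement, related records stay on one shard, which is the property that lets a store of this shape limit both network round trips and storage I/O for a single logical entity.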
Oracle did not take these design choices lightly. They strike a balance between the needs of a new generation of applications and workloads and the lessons learned from decades of real-world, mission-critical database system implementation. Today we can see others in the NoSQL market moving towards those same choices, as early pure-play CAP vendors move to support transactions, an approach now validated for NoSQL systems of scale (e.g. Google Spanner/F1).
NoSQL vendors that are acutely aware of the perils of a peer-to-peer (master-less) architecture, with its cluster-wide crashes, are moving to more isolation via Paxos Grouping (multi-master) for availability and concurrency control (e.g. Facebook Apollo). Nearly all vendor products now offer a “read what you’ve written” capability for applications that must have a consistent read.
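A “read what you’ve written” guarantee is commonly built on version tokens. The following is a minimal sketch under stated assumptions: each write returns a version number, replicas apply writes asynchronously, and a read that carries a minimum version is served only by a replica that has caught up, falling back to the master otherwise. The classes `Replica` and `ReplicationGroup` are hypothetical illustrations, not any vendor's implementation.

```python
class Replica:
    """One copy of the data plus the last version it has applied."""

    def __init__(self):
        self.data = {}
        self.version = 0

    def apply(self, version, key, value):
        self.data[key] = value
        self.version = version


class ReplicationGroup:
    """Toy group: one master, lagging replicas, and a replication log."""

    def __init__(self, num_replicas=2):
        self.master = Replica()
        self.replicas = [Replica() for _ in range(num_replicas)]
        self.log = []  # list of (version, key, value)

    def write(self, key, value):
        """Apply the write on the master and return its version token."""
        version = self.master.version + 1
        self.master.apply(version, key, value)
        self.log.append((version, key, value))
        return version

    def replicate(self, replica_index):
        """Simulate asynchronous catch-up of a single replica."""
        replica = self.replicas[replica_index]
        for version, key, value in self.log:
            if version > replica.version:
                replica.apply(version, key, value)

    def read(self, key, min_version=0, absolute=False):
        """Serve from a sufficiently fresh replica, else from the master."""
        if not absolute:
            for replica in self.replicas:
                if replica.version >= min_version:
                    return replica.data.get(key)
        return self.master.data.get(key)


group = ReplicationGroup()
token = group.write("cart", ["book"])

stale = group.read("cart")                      # may hit a lagging replica
fresh = group.read("cart", min_version=token)   # read what you've written
strict = group.read("cart", absolute=True)      # always served by the master
```

Passing the token back on read lets the application pay for stronger consistency only on the requests that need it, while other reads can still be spread across replicas.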
Vendors (e.g. Amazon DynamoDB) are also talking about storage layer innovations that provide a granularity of data operation that maps well to next-generation storage such as SSD and persistent memory.
Future users of NoSQL technology should take the time to understand these tradeoffs and look for a product that both meets their current requirements and has a foundation that simplifies application development, avoiding the pitfalls inherent in early era NoSQL design choices.