
On Databases and Non-Volatile Memory technologies. Interview with Joy Arulraj and Andrew Pavlo

by Roberto V. Zicari on May 31, 2019

“When we started this project in 2013, it was a moonshot. We were not sure if NVM technologies would ever see the light of day, but Intel has finally started shipping NVM devices in 2019. We are excited about the impact of NVM on next-generation database systems.” — Joy Arulraj and Andrew Pavlo.

I have interviewed Joy Arulraj, Assistant Professor of Computer Science at Georgia Institute of Technology, and Andrew Pavlo, Assistant Professor of Computer Science at Carnegie Mellon University. They have just published a new book, “Non-Volatile Memory Database Management Systems”. We talked about non-volatile memory technologies (NVM) and how NVM is going to impact next-generation database systems.

RVZ

Q1. What are emerging non-volatile memory technologies?

Arulraj, Pavlo: Non-volatile memory (NVM) is a broad class of technologies, including phase-change memory and memristors, that provide low latency reads and writes on the same order of magnitude as DRAM, but with persistent writes and large storage capacity like an SSD. For instance, Intel recently started shipping its Optane DC NVM modules based on 3D XPoint technology [1].

Q2. How do they potentially change the dichotomy between volatile memory and durable storage in database management systems?

Arulraj, Pavlo: Existing database management systems (DBMSs) can be classified into two types based on the primary storage location of the database: (1) disk-oriented and (2) memory-oriented DBMSs. Disk-oriented DBMSs are based on the same hardware assumptions that were made in the first relational DBMSs from the 1970s, such as IBM’s System R. The design of these systems targets a two-level storage hierarchy comprising a fast but volatile byte-addressable memory for caching (i.e., DRAM) and a slow, non-volatile block-addressable device for permanent storage (i.e., SSD). These systems make the pessimistic assumption that a transaction could access data that is not in memory, and thus will incur a long delay to retrieve the needed data from disk. They employ legacy techniques, such as heavyweight concurrency-control schemes, to overcome these limitations.

Recent advances in manufacturing technologies have greatly increased the capacity of DRAM available on a single computer.
But disk-oriented systems were not designed for the case where most, if not all, of the data resides entirely in memory.
The result is that many of their legacy components have been shown to impede their scalability for transaction processing workloads. In contrast, the architecture of memory-oriented DBMSs assumes that all data fits in main memory, and it therefore does away with the slower, disk-oriented components of the system. As such, these memory-oriented DBMSs have been shown to outperform disk-oriented DBMSs. But, they still have to employ heavyweight components that can recover the database after a system crash because DRAM is volatile. The design assumptions underlying both disk-oriented and memory-oriented DBMSs are poised to be upended by the advent of NVM technologies.

Q3. Why are existing DBMSs unable to take full advantage of NVM technology?

Arulraj, Pavlo: NVM differs from other storage technologies in the following ways:

  • Byte-Addressability: NVM supports byte-addressable loads and stores, unlike other non-volatile devices that only support slow, bulk data transfers as blocks (see the sketch after this list).
  • High Write Throughput: NVM delivers more than an order of magnitude higher write throughput compared to SSD. More importantly, the gap between sequential and random write throughput is much smaller on NVM than on other durable storage technologies.
  • Read-Write Asymmetry: In certain NVM technologies, writes take longer to complete compared to reads. Further, excessive writes to a single memory cell can destroy it.
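
To make the byte-addressability point concrete, here is a minimal C++ sketch (our illustration, not code from the book). It assumes the NVM device is mapped directly into the process address space (e.g., via a DAX-mounted file) and that the CPU and compiler support the CLWB instruction; the Account struct and persist_balance function are hypothetical.

#include <cstdint>
#include <immintrin.h>

// Hypothetical fixed-size tuple that lives directly on mapped NVM.
struct alignas(64) Account {
    uint64_t id;
    uint64_t balance;
};

// Update a single 8-byte field durably: store it, write back the dirty
// cache line, and fence so the store reaches NVM before we proceed.
// On a block device, the same logical update would force a rewrite of
// an entire 4 KB page.
void persist_balance(Account* acct, uint64_t new_balance) {
    acct->balance = new_balance;   // byte-addressable store
    _mm_clwb(&acct->balance);      // write back the containing cache line
    _mm_sfence();                  // order the write-back before later stores
}

Compiled with CLWB support (e.g., -mclwb on gcc/clang), this is the entire durability path; no block I/O is involved.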

Although the advantages of NVM are obvious, making full use of them in a DBMS is non-trivial. Our evaluation of state-of-the-art disk-oriented and memory-oriented DBMSs on NVM shows that the two architectures achieve almost the same performance when using NVM. This is because current DBMSs assume that memory is volatile, and thus their architectures are predicated on making redundant copies of changes on durable storage. This illustrates the need for a complete rewrite of the database system to leverage the unique properties of NVM.

Q4. With NVM, which components of legacy DBMSs are unnecessary?

Arulraj, Pavlo: NVM requires us to revisit the design of several key components of the DBMS, including the (1) logging and recovery protocol, (2) storage and buffer management, and (3) indexing data structures.

We will illustrate it using the logging and recovery protocol. A DBMS must guarantee the integrity of a database against application, operating system, and device failures. It ensures the durability of updates made by a transaction by writing them out to durable storage, such as SSD, before returning an acknowledgment to the application. Such storage devices, however, are much slower than DRAM, especially for random writes, and only support bulk data transfers as blocks.

During transaction processing, if the DBMS were to overwrite the contents of the database before committing the transaction, then it must perform random writes to the database at multiple locations on disk. DBMSs try to minimize random writes to disk by flushing the transaction’s changes to a separate log on disk with only sequential writes on the critical path of the transaction. This method is referred to as write-ahead logging (WAL).
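
As a reference point, here is a highly simplified sketch of that commit path (illustrative C++; the types and class below are ours, not from any particular DBMS). The only durable write on the critical path is the sequential flush of the log; the random writes to the data pages themselves are deferred.

#include <cstdint>
#include <vector>

// A WAL record carries *how* the data changed: before- and after-images
// that can later be used to undo or redo the modification.
struct LogRecord {
    uint64_t txn_id;
    uint64_t page_id;
    uint32_t offset;
    std::vector<uint8_t> before_image;   // for undo
    std::vector<uint8_t> after_image;    // for redo
};

class WriteAheadLog {
public:
    void append(const LogRecord& rec) { buffer_.push_back(rec); }  // sequential append
    void flush() { /* fsync() of the log file would go here */ }
private:
    std::vector<LogRecord> buffer_;      // stand-in for the on-disk log tail
};

// Commit: append the transaction's records plus a commit marker, then force
// the log. Dirty data pages are written back later by a background process.
void wal_commit(WriteAheadLog& wal, uint64_t txn_id,
                const std::vector<LogRecord>& changes) {
    for (const auto& rec : changes) wal.append(rec);
    wal.append(LogRecord{txn_id, 0, 0, {}, {}});   // commit marker
    wal.flush();   // transaction is durable once the log reaches disk
}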

NVM upends the key design assumption underlying the WAL protocol since it supports fast random writes. Thus, we need to tailor the protocol for NVM. We designed such a protocol that we call write-behind logging (WBL). WBL not only improves the runtime performance of the DBMS, but also enables it to recover nearly instantaneously from failures. The way that WBL achieves this is by tracking what parts of the database have changed rather than how they were changed. Using this logging method, the DBMS can directly flush the changes made by transactions to the database instead of recording them in the log. By ordering writes to NVM correctly, the DBMS can guarantee that all transactions are durable and atomic. This allows the DBMS to write less data per transaction, thereby extending an NVM device’s lifetime.
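
The sketch below contrasts this with the WAL example above. It is a simplification of the write-behind idea as described here, not the authors' implementation: the modified tuples are flushed to NVM first, and the durable log entry records only which transaction committed, not the images of its changes. The persist helper and the layout of the log region are assumptions.

#include <cstddef>
#include <cstdint>
#include <immintrin.h>
#include <vector>

struct Tuple { uint64_t key; uint64_t value; };

// Flush a range of NVM-resident bytes: write back every dirty cache line,
// then fence so the writes are ordered before whatever comes next.
void persist(const void* addr, std::size_t len) {
    const char* p = static_cast<const char*>(addr);
    for (std::size_t off = 0; off < len; off += 64) _mm_clwb(p + off);
    _mm_sfence();
}

// Commit under a write-behind discipline: the changes reach the database on
// NVM first, and only then does a small log entry (here, just the commit
// timestamp) become durable. The write ordering makes the commit atomic.
void wbl_commit(uint64_t commit_ts,
                const std::vector<Tuple*>& dirty_tuples,  // updated in place on NVM
                uint64_t* log_tail) {                      // small log region on NVM
    for (Tuple* t : dirty_tuples) persist(t, sizeof(Tuple));
    *log_tail = commit_ts;                 // record *what* committed, not how
    persist(log_tail, sizeof(uint64_t));
}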

Q5. You have designed and implemented DBMS storage engine architectures that are explicitly tailored for NVM. What are the key elements?

Arulraj, Pavlo: The design of all of the storage engines in existing DBMSs is predicated on a two-tier storage hierarchy comprising volatile DRAM and a non-volatile SSD. These devices have distinct hardware constraints and performance properties. The traditional engines were designed to account for and reduce the impact of these differences.
For example, they maintain two layouts of tuples depending on the storage device. Tuples stored in memory can contain non-inlined fields because DRAM is byte-addressable and handles random accesses efficiently. In contrast, fields in tuples stored on durable storage are inlined to avoid random accesses, because random accesses are more expensive on those devices. To amortize the overhead of accessing durable storage, these engines batch writes and flush them in a deferred manner. Many of these techniques, however, are unnecessary in a system with an NVM-only storage hierarchy. We adapted the storage and recovery mechanisms of these traditional engines to exploit NVM’s characteristics.
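
The two layouts can be pictured with a pair of illustrative structs (ours, not the book's code): the in-memory version keeps a variable-length field out of line behind a pointer, while the block-storage version inlines it into a fixed-width slot so that reading the tuple costs a single sequential access.

#include <cstdint>
#include <string>

// DRAM layout: the variable-length field lives in a separately allocated
// buffer; following the pointer is cheap because DRAM handles random
// accesses efficiently.
struct InMemoryTuple {
    uint64_t     id;
    std::string* name;   // non-inlined, points to an out-of-line buffer
};

// Block-storage layout: the field is inlined and padded to a fixed width so
// that fetching the tuple does not require a second (random) I/O.
struct OnDiskTuple {
    uint64_t id;
    char     name[64];   // inlined
};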

For instance, consider an NVM-aware storage engine that performs in-place updates. When a transaction inserts a tuple, rather than copying the tuple to the WAL, the engine only records a non-volatile pointer to the tuple in the WAL. This is sufficient because both the pointer and the tuple referred to by the pointer are stored on NVM. Thus, the engine can use the pointer to access the tuple after the system restarts without needing to re-apply changes in the WAL. It also stores indexes as non-volatile B+trees that can be accessed immediately when the system restarts without rebuilding.
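
A minimal sketch of that insert path follows, under the assumption that both the tuple heap and the log live on mapped NVM; the nvm_heap array and the persist helper are stand-ins for a real NVM allocator and flush primitive.

#include <cstddef>
#include <cstdint>
#include <immintrin.h>

struct Tuple { uint64_t key; uint64_t value; };

// Write back and fence a range of NVM-resident bytes.
void persist(const void* addr, std::size_t len) {
    const char* p = static_cast<const char*>(addr);
    for (std::size_t off = 0; off < len; off += 64) _mm_clwb(p + off);
    _mm_sfence();
}

// Stand-in for an NVM allocator: in a real engine this array would be a
// region of the memory-mapped NVM device.
Tuple nvm_heap[1024];
std::size_t next_slot = 0;

// Insert: the tuple itself becomes durable on NVM, and the log entry is only
// a non-volatile pointer to it; no copy of the tuple's contents is logged.
void nvm_insert(uint64_t key, uint64_t value, Tuple** log_slot /* WAL entry on NVM */) {
    Tuple* t = &nvm_heap[next_slot++];
    t->key = key;
    t->value = value;
    persist(t, sizeof(Tuple));            // tuple is durable in place
    *log_slot = t;                        // log records *where* it is, not what it contains
    persist(log_slot, sizeof(Tuple*));    // after restart the pointer is still valid
}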

The effects of committed transactions are durable after the system restarts because the engine immediately persists the changes made by a transaction when it commits. So, the engine does not need to replay the log during recovery. But the changes of uncommitted transactions may be present in the database because the memory controller can evict cache lines containing those changes to NVM at any time. The engine therefore needs to undo those transactions using the WAL. As this recovery protocol does not include a redo process, the engine has a much shorter recovery latency compared to a traditional engine.
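
Sketched as code (again with illustrative structures, and assuming the log provides, for each uncommitted change, the NVM location that was modified and the value to restore), the recovery pass reduces to a single undo loop with no redo phase:

#include <cstdint>
#include <unordered_set>
#include <vector>

// One undo entry: where an uncommitted change landed on NVM and the value
// that should be restored if its transaction never committed.
struct UndoEntry {
    uint64_t  txn_id;
    uint64_t* field;          // NVM location that was overwritten
    uint64_t  before_image;
};

void recover(const std::vector<UndoEntry>& wal_on_nvm,
             const std::unordered_set<uint64_t>& committed_txns) {
    for (const UndoEntry& e : wal_on_nvm) {
        if (committed_txns.count(e.txn_id) == 0) {
            *e.field = e.before_image;   // roll back the uncommitted change in place
            // (followed by a cache-line write-back and fence, omitted here)
        }
    }
    // No redo pass: committed changes were already persisted at commit time,
    // so the database is immediately ready to accept new transactions.
}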

Q6. What is the key takeaway from the book?

Arulraj, Pavlo: Altogether, the work described in this book illustrates that rethinking the key algorithms and data structures employed in a DBMS for NVM not only improves performance and reduces operational cost, but also simplifies development and enables the DBMS to support near-instantaneous recovery from failures. When we started this project in 2013, it was a moonshot. We were not sure if NVM technologies would ever see the light of day, but Intel has finally started shipping NVM devices in 2019. We are excited about the impact of NVM on next-generation database systems.

————————–


Joy Arulraj is an Assistant Professor of Computer Science at Georgia Institute of Technology. He received his Ph.D. from Carnegie Mellon University in 2018, advised by Andy Pavlo. His doctoral research focused on the design and implementation of non-volatile memory database management systems. This work was conducted in collaboration with the Intel Science & Technology Center for Big Data, Microsoft Research, and Samsung Research.


Andrew Pavlo is an Assistant Professor of Databaseology in the Computer Science Department at Carnegie Mellon University. At CMU, he is a member of the Database Group and the Parallel Data Laboratory. His work is also in collaboration with the Intel Science and Technology Center for Big Data.

—————-

Resources

[1] Intel Announces Optane DIMMs, Aims to Disrupt Memory/Storage Hierarchy

– Non-Volatile Memory Database Management Systems, by Joy Arulraj (Georgia Institute of Technology) and Andrew Pavlo (Carnegie Mellon University). Morgan & Claypool Publishers, 2019, 191 pages.
ISBN: 9781681734842 | PDF ISBN: 9781681734859 | Hardcover ISBN: 9781681734866
DOI: 10.2200/S00891ED1V01Y201812DTM055

– How to Build a Non-Volatile Memory Database Management System (.PDF), by Joy Arulraj and Andrew Pavlo

Related Posts

– On Learned Index Structures. Interview with Alex Beutel. ODBMS Industry Watch, December 24, 2018

– On in-database machine learning. Interview with Waqas Dhillon. ODBMS Industry Watch, November 17, 2018

Follow us on Twitter: @odbmsorg


Comments
  1. It is unfortunate that the authors did not include eXtremeDB in their research. If they had, they would know that eXtremeDB has had these capabilities since 2006, when we began supporting Curtiss-Wright boards with battery-backed RAM. This was proven again in 2013, with published papers demonstrating the superior performance and durability of NVDIMMs from Agigatech and Micron (they replace the battery with a super capacitor) versus PCIe SSDs, and again in 2017 with testing of eXtremeDB and Optane.

    While the terminology is different, the approach described in this article as ‘write-behind logging’ is similar to eXtremeDB’s, in which data is updated in place and only the information needed to undo the transaction, in case of an explicit rollback or an application/system crash, is kept. Once the transaction is committed, the undo information is no longer needed and is overwritten by the next transaction. The philosophy of eXtremeDB has always been that commits are the norm and should be fast, while aborts are exceptional and can be slightly more time consuming (but still blazingly fast).

    As the authors describe in their case, the index structures used in eXtremeDB are modified for the in-memory use case to be faster and smaller. (eXtremeDB was originally conceived, designed and implemented as an in-memory database in 2001.)

    The authors are absolutely correct that a DBMS must be designed for this; not just any ol’ in-memory database can get maximum leverage from persistent memory. That said, there’s absolutely no difference from the DBMS’ perspective between 2006’s battery-backed RAM, 2013’s NVDIMM, and modern Optane or other persistent memory technology.

    Steven Graves, CEO, McObject

    • Joy Arulraj

      Steven — Thanks for sharing the architecture of eXtremeDB!

      The write-behind logging protocol is indeed inspired by several prior research efforts in both memory-centric and disk-centric database systems. We recently came to know that a paper advocating a no-redo/undo recovery protocol was published in SIGMOD ’75 [1]. The author states that they were developing a database system for the Puerto Rican DOT and that this design decision was geared towards handling frequent power outages during the summer. Write-behind logging differs from prior no-redo/undo recovery protocols in the manner in which it is tightly integrated with the multi-versioned concurrency control protocol.

      [1] File structure design to facilitate on-line instantaneous updating, Robert L. Rappaport
