On Predictable Real-Time Data Management Over Persistent Storage Media. Q&A with Andrei Gorine and Steve Graves

Q1. As real-time systems grow in complexity, they require managing increasingly large and intricate data. What are the main challenges in managing the predictability aspects of real-time databases?

Several challenges must be addressed.

In a real-time system, overall predictability hinges on the predictability of each system component. Even simple embedded systems now consist of multiple hardware and software components, all contributing to response time unpredictability. For instance, when a database’s storage media is RAM, various factors—such as CPU cache subsystems, Translation Look-aside Buffers (TLBs), and, in some cases, Non-Uniform Memory Access (NUMA)—influence memory access times depending on database access patterns. The situation becomes even more complex with persistent storage. Hard disks introduce mechanical delays, and flash media management adds further unpredictability due to the use of complex, indeterministic algorithms, which can be exacerbated by specialized flash controllers.

Q2. The volume of data or regulatory requirements for retention may necessitate the use of persistent storage. However, storage technologies such as hard disks, NAND flash, and SSDs introduce unpredictable latency. How can timing constraints be met in such cases?

As discussed earlier, achieving real-time guarantees requires all system components, including storage, to exhibit deterministic behavior. To ensure predictability, the entire storage stack—from the application’s database access interface, through the database kernel, to the OS-level components such as the file system (if applicable), high-level storage drivers, low-level media drivers, and finally the storage medium—must provide strict upper bounds on access times.

Many components in this stack are inherently nondeterministic. For example, hard disk drives (HDDs) are mechanical devices where access times depend on the positioning of the read/write heads, platters, and actuators. Widely used, fully managed NAND flash devices, such as embedded MultiMediaCards (eMMC), Universal Flash Storage (UFS), and Solid State Drives (SSDs), incorporate built-in controllers that optimize performance and extend media lifespan. However, these controllers introduce variability in access times, particularly through software layers like flash translation layers (FTLs), which manage flash-specific operations—such as garbage collection and bad block management—in often unpredictable ways. File systems further compound this complexity by leveraging caching and buffering strategies that, while improving performance, reduce the predictability of access timing. These combined factors make determining the worst-case execution time (WCET) highly challenging, and even when it is quantifiable, the WCET may still be unacceptably high.

Since full control over the entire stack is rarely feasible, a practical approach involves selecting storage devices designed without nondeterministic components and minimizing the number of uncontrolled variables in the stack. Unmanaged NAND flash devices, combined with a custom flash translation layer (FTL) that is finely tuned to database system access patterns, have proven effective, especially in real-time embedded applications.

By leveraging a copy-on-write (COW) based transaction management system alongside an FTL that incorporates deterministic garbage collection, bit error correction, bad block management, and read disturbance mitigation, real-time deadlines can be met. Low-level NAND operations (read, program, erase) typically have well-defined upper limits on execution times, specified by the manufacturer. These constraints can be factored into both design-time planning and runtime deadline management.
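To illustrate, here is a minimal sketch in C of how such manufacturer-specified worst-case timings could be folded into a runtime deadline check. The constants, names, and functions are hypothetical assumptions for illustration, not part of any particular product or datasheet:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical worst-case NAND timings (microseconds), as would be taken
 * from a manufacturer's datasheet for a given SLC NAND part. */
#define T_READ_MAX_US    50u    /* max page read (tR)       */
#define T_PROG_MAX_US   700u    /* max page program (tPROG) */
#define T_ERASE_MAX_US 5000u    /* max block erase (tBERS)  */

/* Conservative upper bound on the flash time a commit will need,
 * given the number of pages to program and blocks to erase. */
static uint32_t commit_flash_bound_us(uint32_t pages_to_program,
                                      uint32_t blocks_to_erase)
{
    return pages_to_program * T_PROG_MAX_US +
           blocks_to_erase  * T_ERASE_MAX_US;
}

/* Decide at runtime whether the commit can still meet its deadline. */
static bool commit_fits_deadline(uint32_t budget_left_us,
                                 uint32_t pages_to_program,
                                 uint32_t blocks_to_erase)
{
    return commit_flash_bound_us(pages_to_program, blocks_to_erase)
               <= budget_left_us;
}
```

Because every constant is a datasheet upper bound, the computed budget is conservative by construction, which is exactly what design-time planning and runtime deadline management need.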

Q3. What are the limitations of Log-Based Transaction Processing Policies in Achieving Predictable Transaction Execution?

Traditional log-based transaction processing policies, such as Immediate Modification Logging and Write-Ahead Logging (WAL), are designed to enforce database integrity but fall short in delivering predictable transaction execution times. This arises from several inherent design limitations.

A fundamental property of database management systems (DBMS) is their ability to restore the database to a consistent state following a failure. Log-based recovery algorithms maintain comprehensive records of all transaction modifications, preserving two states: the “last consistent state” before the current transaction and the “current state,” which reflects the modifications made by that transaction.

In WAL, the system maintains the previous state in the database while concurrently logging modifications. Conversely, Immediate Modification Logging updates the database state immediately, allowing changes to be rolled back through UNDO operations. However, both approaches introduce significant CPU and I/O overhead that cannot be known in advance, adversely affecting the predictability of transaction timing. Operations such as transaction commit and rollback require transferring data between the log and the database, and the volume of data involved can vary widely based on transaction complexity. This variability leads to unpredictable I/O processing times.

An alternative is a copy-on-write (CoW) approach, which can significantly reduce transaction rollback times and provide the predictability required in real-time systems. Although CoW may not outperform log-based methods in terms of raw performance, it aligns more effectively with the predictability requirements of real-time systems. This method allows a transition from the current transaction state to the last consistent state by writing a fixed number of bytes, regardless of transaction complexity. As a result, it enables transaction commit and transaction recovery to be completed within an anticipated deadline.
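To make the contrast concrete, here is a hedged back-of-the-envelope cost model in C. The sizes and function names are illustrative assumptions, not measured values:

```c
#include <stdint.h>

/* Illustrative I/O cost model: bytes that must be written to reach a
 * consistent state after an abort. */

/* Log-based rollback: every modified record must be restored from UNDO
 * log entries, so the cost scales with transaction complexity and is
 * not known in advance. */
static uint64_t rollback_bytes_logged(uint64_t records_modified,
                                      uint64_t avg_undo_entry_bytes)
{
    return records_modified * avg_undo_entry_bytes;
}

/* CoW rollback/commit: switch between the "current" and "last consistent"
 * state by persisting one fixed-size root structure (e.g. a page-map root),
 * independent of how many records the transaction touched. */
#define COW_ROOT_BYTES 512u   /* hypothetical fixed root size */

static uint64_t rollback_bytes_cow(void)
{
    return COW_ROOT_BYTES;    /* constant, known at design time */
}
```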

In summary, while traditional log-based policies maintain database integrity and are optimized for higher average transactions per second (TPS), their inability to guarantee predictable transaction execution times stems from inherent, indeterminate I/O and processing overheads. In contrast, CoW algorithms present a viable alternative, offering the reliability and predictability necessary for real-time applications, while enforcing the requisite database integrity.

Q4. What are the fundamentals of a Shadow Copy (Copy-on-Write) Based Transaction Manager Implementation?

The core principle of the shadow copy (copy-on-write) technique is that the database kernel tracks data modifications at the physical page level rather than the logical record level. This approach relies on mapping logical blocks to physical addresses, allowing the database kernel to reference logical elements such as tables, records, and indexes through logical page addresses. These logical addresses are then mapped to physical storage (e.g. RAM or NAND flash).

The transactional guarantees are achieved by maintaining two “page maps” – a “current” page map that reflects the modifications made by the active transaction and a “shadow” page map that represents the state of the database before the transaction began.

When a logical database element is modified, the changes are written to a new physical page rather than overwriting the original page. The new physical address is updated in the “current” page map. Any subsequent modifications to the same logical page during the transaction are written to this new physical page, ensuring the original page remains unchanged.

At the point of transaction commit or rollback, the database kernel performs a simple, atomic page map swap. This operation either commits the changes by switching to the new page map or rolls back by retaining the shadow map, making the process efficient and predictable while minimizing complexity.
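A minimal sketch of the mechanism in C, assuming hypothetical types and sizes: two page maps translate logical page numbers to physical pages, modified pages are redirected in the “current” map, and commit or rollback reduces to selecting which map is authoritative:

```c
#include <stdint.h>
#include <string.h>

#define MAX_LOGICAL_PAGES 1024u          /* illustrative database size */

typedef struct {
    uint32_t phys[MAX_LOGICAL_PAGES];    /* logical page -> physical page */
} page_map_t;

typedef struct {
    page_map_t maps[2];                  /* "current" and "shadow" page maps     */
    uint32_t   shadow;                   /* index of the shadow (consistent) map */
} cow_db_t;

/* Begin a transaction: the current map starts out identical to the shadow map.
 * (A real implementation would avoid this full copy by tracking only the
 * entries that change.) */
static void tx_begin(cow_db_t *db)
{
    uint32_t cur = db->shadow ^ 1u;
    memcpy(&db->maps[cur], &db->maps[db->shadow], sizeof(page_map_t));
}

/* Redirect a modified logical page to a freshly allocated physical page,
 * leaving the original physical page untouched. */
static void tx_write_page(cow_db_t *db, uint32_t logical, uint32_t new_phys)
{
    db->maps[db->shadow ^ 1u].phys[logical] = new_phys;
}

/* Commit: atomically make the current map the new shadow map. */
static void tx_commit(cow_db_t *db)   { db->shadow ^= 1u; }

/* Rollback: simply keep the shadow map; there is nothing to undo. */
static void tx_rollback(cow_db_t *db) { (void)db; }
```

The essential property is that both `tx_commit` and `tx_rollback` boil down to choosing an index, so their cost does not depend on how much the transaction modified.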

Q5. Why do you believe that CoW-based transaction managers are better suited for embedded real-time applications?

Transaction managers in real-time systems differ from traditional transaction managers by requiring transactions to be completed within strict deadlines. One approach to achieving this is continuous progress monitoring by the transaction manager, which can initiate a rollback if it determines the transaction will not meet the specified time constraint. The transaction manager anticipates the latest point at which the current transaction can still be safely aborted, thereby ensuring that the system’s overall timing guarantees are maintained.

The timing guarantee of the rollback process is determined by two key factors: the volume of data to be copied or moved and the time required to write that data to storage. Unlike log-based algorithms, where the amount of rollback data can vary due to higher-level transaction logic, the lower-level nature of copy-on-write (CoW) algorithms offers a deterministic approach. At any given point during the transaction, the exact amount of data to be copied (the number of “physical” pages) is known, making the rollback process time-bound, an essential requirement for real-time systems.
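As a sketch of how such a bound might be used at runtime, assuming a hypothetical worst-case per-page write time, the abort decision reduces to a simple comparison (all names and constants here are illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical worst-case time to persist one physical page (microseconds). */
#define PAGE_WRITE_WC_US 700u

typedef struct {
    uint32_t pages_touched;   /* physical pages written so far (known exactly) */
    uint32_t deadline_us;     /* absolute transaction deadline                 */
} tx_state_t;

/* Upper bound on the time needed to roll back right now. */
static uint32_t rollback_bound_us(const tx_state_t *tx)
{
    return tx->pages_touched * PAGE_WRITE_WC_US;
}

/* Called as the transaction progresses: abort while a timely rollback
 * is still guaranteed to fit within the deadline. */
static bool must_abort_now(const tx_state_t *tx, uint32_t now_us)
{
    return now_us + rollback_bound_us(tx) >= tx->deadline_us;
}
```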

Embedded systems primarily utilize two types of storage media: RAM and NAND flash devices. When the database storage resides in RAM, and the database kernel tracks the number of in-memory pages required for a rollback, it becomes possible to guarantee the rollback time. This is achieved by accounting for the time required to move or copy each page, ensuring a precise rollback duration. When the storage media is a NAND flash device, copy-on-write algorithms become even more critical. Access to flash media is managed by three key components in the storage stack: low-level device drivers, flash translation layers (FTLs), and, in some cases, flash file systems. For predictable transaction execution, each component in this storage stack must be capable of providing real-time guarantees.

To reduce uncertainties and simplify the storage stack, flash file systems can and should be omitted. File systems introduce variability in worst-case latencies by caching reads, buffering writes, and generating additional I/O to manage physical layout metadata.

While low-level flash media drivers provided by manufacturers typically offer predictable latency specifications, flash translation layers (FTLs) still introduce several sources of latency. Key flash management processes—such as background garbage collection, wear leveling, and bad block management—are significant contributors. Flash memory has a limited number of write cycles per sector, and wear-leveling mechanisms are employed to extend the device’s lifespan by evenly distributing writes across memory sectors. Garbage collection reclaims unused or stale data, freeing up space, while bad block management identifies and marks defective blocks, ensuring data is not written to compromised areas.

Copy-on-write (CoW) techniques employed by some FTLs help reduce unpredictability, enabling higher-level transaction processing to meet real-time guarantees.
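One common way to make such background work predictable (not necessarily how any specific FTL implements it) is to amortize garbage collection incrementally, relocating at most a fixed number of valid pages per database write so that each operation carries a bounded share of the cost. A hedged sketch in C, with hypothetical helper functions and limits:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bound: at most this many valid pages are relocated per
 * database write, so the GC cost per operation has a fixed upper limit. */
#define GC_PAGES_PER_WRITE 2u

/* Hypothetical low-level helpers assumed to have datasheet-bounded latency. */
extern bool     ftl_block_needs_reclaim(uint32_t *victim_block);
extern uint32_t ftl_copy_one_valid_page(uint32_t victim_block); /* valid pages left */
extern void     ftl_erase_block(uint32_t victim_block);

/* Incremental, bounded garbage-collection step invoked after each write. */
void ftl_gc_step(void)
{
    uint32_t victim;
    if (!ftl_block_needs_reclaim(&victim))
        return;                              /* nothing to reclaim right now */

    for (uint32_t i = 0; i < GC_PAGES_PER_WRITE; ++i) {
        if (ftl_copy_one_valid_page(victim) == 0u) {
            ftl_erase_block(victim);         /* block fully drained: erase it */
            break;
        }
    }
}
```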

Q6. Do you have any experiment results to share?

We compared the performance and predictability of log-based and Copy-on-Write (CoW)-based transaction policies. As expected, factors such as storage media (RAM vs. NAND flash), workload sizes, and data access patterns led to differing results. Two key differences emerged between log-based (“logical”) and CoW-based (“physical”) transaction policies: guaranteed rollback timeframes and the stability of database storage access.

When evaluating these measurements, it is essential to remember that in real-time systems, stable latencies take precedence over traditional performance metrics, which are treated as secondary considerations. Ensuring that database transactions consistently meet their deadlines is critical, even if this requires aborting a transaction.

With that in mind and given equivalent data record and transaction sizes (i.e., the number of records modified per transaction), log-based policies were, on average, about 10% faster during transaction workload phases. However, copy-on-write (CoW) policies significantly outperformed during the rollback phase—sometimes by an order of magnitude—but more importantly, they offer predictability. This makes CoW policies particularly valuable in real-time systems, where rollback times must be both predictable and known in advance.

Database transactions typically commit, so the database kernel rarely identifies a transaction at risk of missing its deadline; when an abort does become necessary, however, it must be deterministic. CoW policies guarantee this. Furthermore, as transaction sizes increase, the performance benefits and deterministic nature of CoW-based rollbacks become even more pronounced.

When using flash storage, the most notable difference is the performance variability introduced by log-based policies, with unpredictable dips in performance. While I/O variability plays a role, this is primarily due to the nature of log-based algorithms, where the amount of data that needs to be committed to storage can be uncertain and unbounded. Although CoW-based transaction managers may be slower on average, their performance remains consistent, without the variability seen in log-based systems. CoW-based policies also limit the data required to be committed to storage during transaction workload, commit, and rollback phases, making them particularly suited for real-time applications.

Q7. Due to the inherent characteristics of flash media, achieving hard real-time capabilities is challenging. What do you see as the path forward?

Flash memory is primarily available in two types: NOR and NAND. Both store data using memory cells based on floating gate transistors. While NOR flash can effectively serve as a medium for read-only real-time databases, NAND flash, with its higher storage density, smaller cell size, and typically larger capacity, is the preferred choice for data storage in many applications. However, certain fundamental issues with NAND flash pose significant obstacles to real-time processing.

Key challenges include wear-leveling, garbage collection, and bad block management. Without proper handling, these characteristics can render flash devices unsuitable for real-time operations. Typically, these issues are addressed through a Flash Translation Layer (FTL) and/or are incorporated into a file system.

To overcome these challenges, it is beneficial to integrate an FTL designed with real-time guarantees directly into the real-time database kernel. This approach allows the database kernel to manage both real-time transaction requirements and flash memory handling. By doing so, the database kernel oversees transaction properties, monitors transaction deadlines, and manages logical mappings of flash blocks and pages—eliminating the need for an intermediate software layer, such as a file system or standalone FTL. This direct control enables better deadline management for real-time database transactions.
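As an illustration of what this integration can look like (a hypothetical structure, not the product's actual data layout), the same table that backs the CoW page redirection can also carry the FTL's flash-management metadata, so a single lookup serves both layers:

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_PHYS_PAGES 4096u   /* illustrative device size */

/* Hypothetical combined per-page metadata shared by the CoW transaction
 * manager and the integrated FTL. */
typedef struct {
    uint16_t erase_count;   /* wear-leveling input                     */
    uint8_t  bad    : 1;    /* bad block management flag               */
    uint8_t  in_use : 1;    /* referenced by the current or shadow map */
} phys_page_meta_t;

static phys_page_meta_t meta[NUM_PHYS_PAGES];

/* Pick the least-worn free, good physical page for the next CoW redirect.
 * Because one table drives both layers, no separate FTL lookup is needed.
 * (A linear scan is used here only for clarity.) */
static bool pick_target_page(uint32_t *out)
{
    bool found = false;
    uint32_t best = 0;
    for (uint32_t p = 0; p < NUM_PHYS_PAGES; ++p) {
        if (meta[p].bad || meta[p].in_use)
            continue;
        if (!found || meta[p].erase_count < meta[best].erase_count) {
            best  = p;
            found = true;
        }
    }
    if (found) {
        meta[best].in_use = 1;
        *out = best;
    }
    return found;
}
```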

Granted, this solution requires the database kernel to interact directly with low-level manufacturer-specific flash interfaces, which is not trivial. However, the benefits for applications requiring predictable real-time data management on NAND flash are significant. 

Q8. Where did you implement and deploy this new product / technique? 

We implemented the copy-on-write based transaction managers in our eXtremeDB/rt product line. The RAM-based solution has been deployed across various hardware platforms and real-time operating systems, including STM32 and NXP hardware, FreeRTOS, Deos (DDC-I), Microsoft Azure ThreadX, Segger’s embOS RTOS, INTEGRITY OS (GreenHills Software), and WindRiver’s VxWorks.

Support for NAND storage is currently in the Beta stage, and the product is under review by our partners. Our first release is expected to include support for STM32Fx ARM Cortex boards equipped with standard SLC NAND components running FreeRTOS, with additional hardware setups to follow soon.

The software has been adopted by companies across multiple industries, including a tier-1 automotive company for an ADAS pilot project, a motorcycle manufacturer for integration into their Telematics Control Unit (TCU), and a supply chain company for a track-and-trace solution to monitor real-time inventory locations.

Q9. Anything else you wish to add?

In conclusion, our approach to real-time data management over persistent flash media integrates a custom flash translation layer directly into the database kernel. This integration enables the real-time database kernel to internally manage processes such as wear-leveling and garbage collection. By tracking transaction deadlines and streamlining flash-specific functions, the system reduces redundancies and overhead by utilizing the same copy-on-write mapping for both FTL and transaction management. This specialized, flash-optimized real-time database kernel ensures that real-time deadlines are consistently met while efficiently managing flash storage.

The reality is that developers of existing FTLs haven’t prioritized real-time requirements because there has been no need. However, as real-time systems are increasingly required to handle more complex operations, the demand for advanced real-time data solutions in sectors like Aerospace & Defense, Automotive, Industrial Automation, Medical, and Robotics can no longer be ignored. “Real fast” is no longer enough—true real-time data management has become essential.

……………………………………

Andrei Gorine, Chief Technology Officer at McObject

McObject co-founder Andrei Gorine leads the company’s product engineering. As CTO, he has driven the growth of the eXtremeDB embedded database system, from the product’s conception to its current wide usage in virtually all embedded systems market segments. Mr. Gorine’s strong background includes senior positions with leading embedded systems and database software companies; his experience in providing embedded storage solutions in such fields as industrial control, industrial preventative maintenance, satellite and cable television, and telecommunications equipment is highly recognized in the industry. Mr. Gorine has published articles and spoken at many conferences on topics including real-time database systems, high availability, and memory management. Over the course of his career he has participated in both academic and industry research projects in the area of real-time database systems. Mr. Gorine holds a Master’s degree in Computer Science from the Moscow Institute of Electronics and Mathematics, and is a member of IEEE and ACM.

Steve Graves, CEO McObject

Mr. Graves co-founded McObject in 2001. As the company’s president and CEO, he has both spearheaded McObject’s growth and helped the company attain its goal of providing embedded database technology that makes embedded systems smarter, more reliable and more cost-effective to develop and maintain. Prior to McObject, Mr. Graves was president and chairman of Centura Solutions Corporation, and vice president of worldwide consulting for Centura Software Corporation (NASDAQ: CNTR); he also served as president and chief operating officer of Raima Corporation.  Mr. Graves is a member of the advisory board for the University of Washington’s certificate program in Embedded and Real-time Systems Programming. For Steve’s updates on McObject, embedded software and the business of technology, follow him on Twitter or LinkedIn.

Sponsored by McObject
