On Real-time Databases. Q&A with Andrei Gorine and Steve Graves
“Another rapidly evolving concept is real-time mixed criticality systems that integrate components with different levels of criticality onto a single computing platform.”
Q1. Why is there a need for real-time databases?
Data has become the greatest asset in today’s world. Real-time systems have become more complex and need to deal with a high volume of more complex data. Although not exclusively, these systems are often characterized as sensor data fusion. Sensor data fusion is the process of combining sensor data or data derived from disparate sources such that the resulting information is more accurate, more complete, or more dependable than would be possible when these sources were used individually. The data sources for a fusion process don’t have to originate from identical sensors.
Consider a modern automobile. It is typically equipped with LIDAR, RADAR and optical cameras, GPS, maps and other data inputs to functions such as adaptive cruise control, lane keeping, collision avoidance, advanced driver assistance systems (ADAS) and more. These data inputs from the different sensors must be collected in a central location (a database), correlated, analyzed, and acted upon within real-time constraints.
A similar premise drives avionic and navigation systems: if data collected from different sources and fed into the avionic application can’t be stored and processed in time, it is far better to discard it than to make control decisions based on outdated values. Again, any complex system that falls under the broad heading of sensor data fusion is a ripe candidate for a real-time database management system.
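To make the "discard rather than act on stale data" rule concrete, here is a minimal Python sketch. The function name `fresh_readings`, the layout of a reading, and the 50 ms age limit are illustrative assumptions, not part of any avionics system or of eXtremeDB/rt.

```python
def fresh_readings(readings, now_ms, max_age_ms):
    """Return only the readings that are recent enough to act on.

    readings   -- dict: source name -> (timestamp_ms, value); hypothetical layout
    now_ms     -- current time in milliseconds
    max_age_ms -- oldest acceptable age; anything older is dropped
    """
    return {
        source: (ts, value)
        for source, (ts, value) in readings.items()
        if now_ms - ts <= max_age_ms
    }


# Illustrative data: the GPS fix is 120 ms old and is discarded.
readings = {"radar": (1000, 42.0), "gps": (900, (47.6, -122.3))}
usable = fresh_readings(readings, now_ms=1020, max_age_ms=50)
```

A control loop built this way would decide using `usable` only, and treat a missing source as "no data" rather than falling back to the outdated value.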
Another rapidly evolving concept is real-time mixed criticality systems that integrate components with different levels of criticality onto a single computing platform. The term “criticality” often refers to functional safety, which is built upon concepts of enforcing tasks’ spatial and temporal independence. To that end, safety-critical tasks impose timing deadline requirements on systems’ critical functions. Meanwhile, mixed criticality systems’ components are routinely data-driven and share data repositories between tasks of different criticality. A database management system integrated with such components must conform to tasks’ criticality levels and guarantee integrity and temporal and logical consistency of shared data.
Q2. What are the considerations of real-time applications working with databases?
First, let’s briefly clear up a common misconception. The term “real-time” is quite often equated with “fast” or “high-throughput”. Let’s not confuse the two. Real-time systems draw a clear line between an “on-time” and a late response, with consequences, sometimes severe, if the line is crossed. A late response is an invalid response. For high-throughput systems speed is a figure of merit, but there is no hard separation between “acceptable” and “unacceptable”, and a late answer is still a valid answer (whatever late means in the absence of deadlines).
Both real-time and non-real-time database systems are data repositories that provide services for data storage and retrieval. There are several key differences between the two, but for the purposes of this discussion let’s emphasize three:
The first difference lies in the consistency models. Non-real-time databases use transactions to enforce logical consistency, which presents a view of the data consistent with some predefined constraints of the database. These guarantees are often referred to as the ACID properties of transactions. In addition to logical consistency, real-time transactions are assigned timing deadlines to make sure that the data they use reflects the current physical environment. These deadlines can be satisfied or missed (in which case the transaction is aborted). Either way, a real-time transaction must never run past its assigned deadline: the database kernel must return control to the calling application within the timeframe assigned to the transaction.
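The idea of a transaction that either commits on time or is aborted can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how any real-time kernel is implemented: the names `run_with_deadline` and `DeadlineMissed` are hypothetical, the "database" is a plain dict, and the deadline is only checked between steps, so each step is assumed to be short and bounded.

```python
import time


class DeadlineMissed(Exception):
    """Raised when a transaction runs past its assigned deadline."""


def run_with_deadline(state, steps, budget_s, clock=time.monotonic):
    """Apply steps to a private copy of state; commit only if on time.

    state    -- dict standing in for the database (hypothetical)
    steps    -- callables that each modify the working copy
    budget_s -- time allowed for the whole transaction, in seconds
    clock    -- injectable time source, which keeps the sketch testable
    """
    deadline = clock() + budget_s
    working = dict(state)          # changes stay private until commit
    for step in steps:
        step(working)
        if clock() > deadline:     # cooperative check between steps
            # Abort: the caller regains control and state is untouched.
            raise DeadlineMissed("transaction aborted; state unchanged")
    state.clear()
    state.update(working)          # commit
```

Because modifications are applied to a copy, a missed deadline leaves the original data logically consistent, and the caller learns about the miss through the exception.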
Secondly, the performance goals are different. Non-real-time database systems strive to optimize for the number of transactions per second or the average query time. A real-time DBMS’s performance is judged by its worst-case transaction time. This is a key differentiator between real-time and non-real-time databases. For instance, the worst-case query time for Oracle is something that most folks don’t care about (and frankly Oracle doesn’t even specify). On the other hand, the average number of queries per second is a banner spec for any traditional DBMS.
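A small Python sketch shows why the two metrics can rank the same systems differently. The latency figures and the 200 microsecond deadline below are invented for illustration only; they are not measurements of any product.

```python
from statistics import mean


def summarize(latencies_us):
    """Return (average, worst_case) for a list of transaction latencies."""
    return mean(latencies_us), max(latencies_us)


def meets_deadline(latencies_us, deadline_us):
    """Real-time view: every single transaction must finish on time."""
    return max(latencies_us) <= deadline_us


# Hypothetical measurements, in microseconds.
system_a = [40] * 9 + [900]   # fast on average, one slow outlier
system_b = [150] * 10         # slower on average, but steady

avg_a, worst_a = summarize(system_a)
avg_b, worst_b = summarize(system_b)
```

With these numbers, system A wins on average latency (126 versus 150 microseconds), yet only system B stays under a 200 microsecond deadline in every case.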
Another factor to consider is the overall complexity of a database management system. Traditional, non-real-time DBMSs routinely employ complex algorithms, use background processing, and schedule tasks to promote the average database access performance. A trivial example is a simple B-tree index with its logarithmic complexity; one or another implementation of a B-tree index is present in all database management systems.
In general, it is often impossible to know in advance the complexity of various database operations. Consequently, making an upfront accurate estimate of a database query execution is not possible either. Even when possible, calculating worst-case scenarios could be “too pessimistic” to be practical. Therefore, a real-time database system must prioritize simpler deterministic algorithms and scheduling policies that enable deadline management.
Q3. Is it possible for a real-time database to use non-volatile media?
As discussed, the key differentiator for real-time database systems is their deterministic, predictable nature. Even volatile memory architectures can be non-deterministic. Examples include NUMA architectures or L1/2/3 cache subsystems that can have a profound impact on the worst-case main memory access time. Persistent storage devices routinely specify maximum read/write latencies. Yet, in reality, media access is orders of magnitude less deterministic in nature than any RAM-based architectures. Persistent media I/O is algorithmically complex, often highly uncontrollable, and full of subtle differences. Moreover, I/O optimizations are spread across many hardware implementations and software storage stacks.
Despite all the complexity of the storage stack, a real-time database management system can offer deadline management algorithms that promote transactions’ predictability. While it is not possible to guarantee transaction deadlines, it is quite possible to detect broken deadlines, and to preserve logical integrity. What’s more, minimizing the frequency of broken deadlines is an absolutely realistic objective.
Using a real-time DBMS for transient storage does not automatically guarantee that real-time tasks will never miss their deadlines. It can guarantee deterministic scheduling if the storage media access is deterministic. Persistent storage media is by design less deterministic than any volatile media. When the storage media is non-volatile the application requirements must account for the possibility of a missed database deadline. A DBMS will catch the missed deadline, alert the application and preserve data integrity. When used properly a real-time database can help developers fulfill the real-time requirements of their systems.
Q4. Is it possible to make use of a normal “Linux-like” file system?
Firstly, it helps to remember that, by definition, a real-time application must complete real-time tasks within specific time constraints. In practice, real-time applications allow different degrees of leniency for breaking these constraints — anywhere from never (the “hard real time”) to tolerable to a degree (the “firm” or “soft real-time”). However, if any real-time requirements exist, these applications run in the context of a real-time operating system (RTOS) that provides assurance that no single operation exceeds certain timing constraints.
Linux is not a real-time operating system, and Linux alone cannot serve as a hard real-time platform. A “Linux-like file system” is in fact a combination of a particular file system architecture and a set of lower-level media drivers and services, which can vary quite widely. Ext4, for instance, can have over 50 unique configuration parameters (e.g., block size, inode size, journal options, etc.), each combination of which can have a significant impact on I/O latency. XFS, another popular choice on Linux, has over 35 parameters. In combination with the non-real-time Linux scheduler and the variety of storage devices, it is impractical to expect any sort of timing guarantees.
For a real-time DBMS that rests its deadline management on the underlying storage stack timing guarantees, any use of these file systems seems too far-fetched.
Q5. Why is flash storage unable to provide real-time guarantees?
Flash memory has many benefits and is a primary choice for embedded systems that need high-speed data transfer, reliability, and low power consumption. That said, real-time processing is all about stable latencies. Most modern flash devices each have a distinctive (and often peculiar) design. However, all of them must address intrinsic flash issues. To mention just a few: garbage collection mechanisms provide a means of recycling invalid space; cache subsystems (buffering) are designed to maximize the lifetime as well as the performance of flash devices; and wear leveling mechanisms balance the wear on flash memory blocks. Without astute handling of these matters, flash devices can’t be used reliably (if at all). Normally, these operations are managed in the background, presenting a constant threat to latency predictability.
Q6. Version 2.0 of eXtremeDB/rt will support persistent storage as well as transient databases. How do you achieve real-time performance for persistent databases when file I/O is inherently non-deterministic?
Firstly, and we have already touched on this topic, the database kernel is able to guarantee the transactions’ timing deadlines only if the underlying storage stack (mostly low-level I/O) adheres to its posted latencies. However, the kernel can be trapped in a blocking I/O call. To compensate, the eXtremeDB/rt deadline management is built with the understanding that the I/O call can be “late”. eXtremeDB/rt then ensures the integrity of the database and notifies the application of the “broken” real-time deadline, allowing the application to adjust execution.
Then, support for real-time transactions requires transaction recovery algorithms to be deterministic, in other words, to fit into timeframes known in advance. Specifically, the eXtremeDB/rt real-time kernel is based on the assumption that the time a transaction needs to reverse its modifications to the database, up to any point in the transaction, does not exceed the time required to apply those modifications. The deadline management algorithms are built to maintain this strategy.
Traditional databases that store data on persistent media often make use of the log-based recovery mechanisms, such as the “immediate modification” technique (UNDO LOG) or the “write ahead logging” (REDO LOG). Both algorithms maintain a log file that keeps data used to restore the database to a consistent state in the event the transaction needs to be rolled back. The use of log-based techniques is justified by their advanced conventional performance and space optimizations.
However, both policies must synchronize (transfer) the content of the internal operating system buffer and/or file system cache to the persistent media (this operation is referred to as “flushing”). The time and the number of transfer operations depend on the size of the data being written to the database by the transaction, which makes it difficult to evaluate the transactions’ rollback time upfront.
eXtremeDB/rt uses a recovery algorithm based on copy-on-write (CoW), sometimes referred to as “Shadow Copy”. In contrast to the log-based recovery approach, the alternative CoW-based method of rollback uses media write access patterns that are short and predictable. That allows the database kernel to enforce transactions’ deadlines.
Q7. How does the copy-on-write algorithm influence the ability to provide soft or hard real-time determinism?
Copy-on-write algorithms have many advantages, particularly considering flash-based storage access patterns. However, our rationale for employing CoW-based recovery algorithms is primarily the predictability aspect. eXtremeDB/rt’s storage layout is page-based. Database “storage devices” (memory segments for transient databases, raw media partitions or files for persistent databases) are logically divided into fixed-size segments called “pages”. Persistent devices’ pages are typically large (1024 bytes, 4K, etc.), while eXtremeDB/rt transient pages are typically small (100 bytes to 1K). Database logical layout entities (objects, indexes, etc.) are built with either large or small pages.
While log-based recovery algorithms are concerned with logical database entities, the CoW-based algorithms work on the page level. With the CoW-based approach, a “page” is saved prior to any modifications and restored when necessary, without regard to any logical database structure. When a transaction modifies any data that belongs to a page, the original page is copied to a new page. All modifications are then applied to this newly created page, while the original page remains “untouched”. The kernel also maintains a “map” that associates the physical page offsets with the logical representation of those offsets presented to the rest of the database runtime. The crux of the CoW method lies in the simplicity and predictability of the commit and rollback processes: during transaction commit the kernel simply “declares” a new set of pages “current”, forgetting the “old” set; in the case of rollback, the “old” set of pages is declared “current” and the “new” one is discarded. Unlike the log-based policies, until the time of the “declaration”, the database has two consistent sets of data and it is always possible to tell how much time is necessary to roll back the transaction’s modifications. The switchover time between the “old” and the “new” set of pages is also predictable (and short).
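The page-level mechanics described above can be illustrated with a toy Python model. It is a sketch under simplifying assumptions and does not reflect eXtremeDB/rt internals: the class `CowPageStore` is hypothetical, pages live in memory, only one transaction is open at a time, and the slots of discarded pages are never reclaimed.

```python
class CowPageStore:
    """Toy page store with copy-on-write commit and rollback."""

    def __init__(self, pages):
        self._pages = [bytearray(p) for p in pages]    # physical page slots
        self._current = list(range(len(self._pages)))  # logical -> physical map
        self._shadow = None                            # map of the open transaction

    def begin(self):
        self._shadow = list(self._current)

    def write(self, logical, offset, data):
        # Requires an open transaction (begin() must have been called).
        # First write to this page in the transaction: copy it to a new slot.
        if self._shadow[logical] == self._current[logical]:
            original = self._pages[self._current[logical]]
            self._pages.append(bytearray(original))
            self._shadow[logical] = len(self._pages) - 1
        page = self._pages[self._shadow[logical]]
        page[offset:offset + len(data)] = data         # original stays untouched

    def read(self, logical):
        page_map = self._shadow if self._shadow is not None else self._current
        return bytes(self._pages[page_map[logical]])

    def commit(self):
        # Declare the new set of pages current: a single map switch.
        self._current, self._shadow = self._shadow, None

    def rollback(self):
        # The old set is still current; just forget the new map.
        self._shadow = None
```

In this model both commit and rollback are a single reference assignment whose cost does not depend on how much data the transaction wrote, which is the property that makes rollback time easy to bound.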
Q8. Who are your typical customers and what do they use eXtremeDB/rt for?
As we mentioned at the outset, eXtremeDB/rt is well-suited to any embedded/real-time system that is grappling with sensor data fusion. In such systems, the ability to enforce temporal consistency (i.e., that the content of the database reflects the real world within set timeframes) is crucial. Returning to the example of advanced driver assistance systems (ADAS), mayhem would ensue if the ML/AI component were retrieving GPS data from one timeframe and LIDAR and optical data from another timeframe.
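One simple way to picture the temporal consistency check is to compare the sampling times of the inputs before they are fused. The Python sketch below assumes millisecond timestamps and a 50 ms tolerance; the function name `temporally_consistent` and the sample values are illustrative only.

```python
def temporally_consistent(timestamps_ms, max_skew_ms):
    """True if all sources were sampled within max_skew_ms of each other.

    timestamps_ms -- dict: source name -> sample time in milliseconds
    """
    stamps = list(timestamps_ms.values())
    return max(stamps) - min(stamps) <= max_skew_ms


# Hypothetical frames: in `mixed`, the GPS fix is a full second older.
frame = {"gps": 10_000, "lidar": 10_020, "camera": 10_030}
mixed = {"gps": 9_000, "lidar": 10_020, "camera": 10_030}
```

Here `frame` would pass a 50 ms tolerance, while `mixed` would be rejected because its GPS sample comes from a different timeframe than the LIDAR and camera samples.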
Accordingly, eXtremeDB/rt has been licensed by a tier 1 automotive company for an ADAS pilot project. A major electric motorcycle company has licensed it to be embedded in the telematics control unit (TCU) to ingest and act upon data captured by the TCU. The TCU collects telemetry data from the motorcycle, such as GPS position (GNSS data), speed, engine data, and connectivity information through interfacing with various sub-systems over data and control buses on the motorcycle. A leader in the supply chain industry has adopted eXtremeDB/rt for a track and trace solution that provides the ability to identify the past and present locations of all product inventory, as well as a history of product custody. We are also fielding interest from power generation/distribution system vendors, rail/locomotive system vendors, and more.
Q9. Anything else you wish to add?
Since the introduction of eXtremeDB/rt in 2020, meaningful technical improvements to the product have been made. Aside from the upcoming support for persistent storage, the transaction scheduling has evolved from a simple Earliest Deadline First (EDF) algorithm that favors transactions which are closer to their deadlines to a more sophisticated (and closer to reality) algorithm that takes into account transaction priority (the explicit measure of the transaction’s “importance”). Preemption has been added to the real-time database kernel: it now allows a “more important” transaction to kick a “less important” transaction out and perhaps reschedule it.
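The difference between plain EDF and a priority-aware policy can be sketched in Python. The interview does not say how eXtremeDB/rt combines priority and deadline, so the second ordering below (priority first, deadline as tie-breaker) is only one plausible assumption; the `Txn` type and the transaction names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    name: str
    deadline: float   # absolute deadline; smaller means sooner
    priority: int     # larger means more important (assumed convention)


def edf_order(txns):
    """Earliest Deadline First: the closest deadline runs first."""
    return sorted(txns, key=lambda t: t.deadline)


def priority_then_deadline_order(txns):
    """Assumed policy: highest priority first, EDF among equals."""
    return sorted(txns, key=lambda t: (-t.priority, t.deadline))


txns = [Txn("log", 5.0, 1), Txn("brake", 8.0, 9), Txn("hvac", 6.0, 1)]
edf_names = [t.name for t in edf_order(txns)]
prio_names = [t.name for t in priority_then_deadline_order(txns)]
```

Under EDF the order is log, hvac, brake; under the priority-aware ordering the "brake" transaction moves to the front even though its deadline is the latest.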
Yet more practical improvements are called for. Use cases call for real-time scheduling improvements, as the current transaction scheduling is not very efficient in overload conditions. It is also true that sometimes having transactions reflect real-world events is more important than the database’s internal consistency, so consideration ought to be given to adding external consistency provisions at the expense of relaxing the ACID properties.
Andrei Gorine, Chief Technology Officer McObject
Andrei Gorine leads the company’s product engineering. As CTO, he has driven the growth of the eXtremeDB embedded database system, from the product’s conception to its current wide usage in virtually all embedded systems market segments. Mr. Gorine’s strong background includes senior positions with leading embedded systems and database software companies; his experience in providing embedded storage solutions in such fields as industrial control, industrial preventative maintenance, satellite and cable television, and telecommunications equipment is highly recognized in the industry. Mr. Gorine has published articles and spoken at many conferences on topics including real-time database systems, high availability, and memory management. Over the course of his career he has participated in both academic and industry research projects in the area of real-time database systems. Mr. Gorine holds a Master’s degree in Computer Science from the Moscow Institute of Electronics and Mathematics, and is a member of IEEE and ACM.
Steven Graves, Chief Executive Officer, McObject
Mr. Graves co-founded McObject in 2001. As the company’s president and CEO, he has both spearheaded McObject’s growth and helped the company attain its goal of providing embedded database technology that makes embedded systems smarter, more reliable and more cost-effective to develop and maintain. Prior to McObject, Mr. Graves was president and chairman of Centura Solutions Corporation, and vice president of worldwide consulting for Centura Software Corporation (NASDAQ: CNTR); he also served as president and chief operating officer of Raima Corporation. Mr. Graves is a member of the advisory board for the University of Washington’s certificate program in Embedded and Real-time Systems Programming. For Steve’s updates on McObject, embedded software and the business of technology, follow him on Twitter.
Sponsored by McObject