On In-memory Databases and Raima Database. Q&A with Jeff Parsons
Q1. What are the advantages and disadvantages of in-memory databases?
In-memory databases, by definition, handle the data they are processing in main memory. There is no need to deal with secondary storage, which can be orders of magnitude slower than accessing data held in main memory. Eliminating the requirement of accessing the slower secondary storage allows an in-memory database to use algorithms that would not be feasible for a disk-based database. As an example, a disk-based database commonly uses a B-tree based index to limit the number of disk accesses required to locate a row. An in-memory database can use an AVL tree instead of a B-tree, which reduces (or eliminates) the need to duplicate data but increases the number of rows accessed during traversal.
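To make the AVL trade-off concrete, here is a minimal Python sketch of an index whose nodes hold only a reference to the row and read the key from the row itself. It is an illustration under assumptions, not Raima's implementation: the names `AvlIndex`, `Node`, and the `key` callback are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Node:
    row: Any                          # reference to the row, not a copy of its key
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    height: int = 1


def _height(node: Optional[Node]) -> int:
    return node.height if node else 0


def _update(node: Node) -> None:
    node.height = 1 + max(_height(node.left), _height(node.right))


def _rotate_right(y: Node) -> Node:
    x = y.left
    y.left, x.right = x.right, y
    _update(y)
    _update(x)
    return x


def _rotate_left(x: Node) -> Node:
    y = x.right
    x.right, y.left = y.left, x
    _update(x)
    _update(y)
    return y


def _rebalance(node: Node) -> Node:
    _update(node)
    balance = _height(node.left) - _height(node.right)
    if balance > 1:                   # left-heavy
        if _height(node.left.left) < _height(node.left.right):
            node.left = _rotate_left(node.left)
        return _rotate_right(node)
    if balance < -1:                  # right-heavy
        if _height(node.right.right) < _height(node.right.left):
            node.right = _rotate_right(node.right)
        return _rotate_left(node)
    return node


class AvlIndex:
    """Hypothetical in-memory index: keys are read from the rows on demand."""

    def __init__(self, key: Callable[[Any], Any]):
        self.key = key
        self.root: Optional[Node] = None

    def insert(self, row: Any) -> None:
        self.root = self._insert(self.root, row)

    def _insert(self, node: Optional[Node], row: Any) -> Node:
        if node is None:
            return Node(row)
        if self.key(row) < self.key(node.row):
            node.left = self._insert(node.left, row)
        else:
            node.right = self._insert(node.right, row)
        return _rebalance(node)

    def find(self, k: Any):
        """Return (row, rows_visited); every step touches one row to read its key."""
        node, visited = self.root, 0
        while node:
            visited += 1
            node_key = self.key(node.row)
            if k == node_key:
                return node.row, visited
            node = node.left if k < node_key else node.right
        return None, visited
```

Each comparison in `find` dereferences a row, which is cheap in main memory but would mean one disk access per step on secondary storage. That is why a disk-based engine prefers a wide B-tree node holding many copied keys.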
Because it does not require secondary storage, an in-memory database can be used in systems that have none.
This enables the use of a database engine in many embedded systems that could not support a traditional disk-based engine.
Typically, the type of memory used with an in-memory database system is not persistent. When an application closes, whether cleanly or unexpectedly, the data stored in an in-memory database will be lost. Certain application domains do not require data persistence between runs, but others may. Applications that require a high degree of persistence may not be suitable for an in-memory database.
An in-memory database stores all data in main memory, which can severely limit the amount of data that can be stored. Most database systems can either allocate memory ad hoc for storing database objects or be given a chunk of memory to use as storage. Either way, the volume of data that can be stored in an in-memory database tends to be much smaller than that stored in a disk-based system.
Q2. What is the difference between an in-memory database and simply storing data in shared memory segments?
The biggest difference between in-memory databases and storing data in shared memory segments is the structured approach to data access. In-memory databases maintain components of the “ACID” attributes of data storage engines. The ACID properties are Atomicity, Consistency, Isolation and Durability. While in-memory databases may relax the durability property because their storage is non-persistent, they commonly support the other properties:
• Atomicity – multiple changes are committed to the database as an all or nothing operation
• Consistency – structural rules and relationships are maintained for all users
• Isolation – transactional changes cannot be seen by other users until they are committed
It can be time-consuming and error-prone to implement this functionality on top of shared memory segments.
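As a rough illustration of what would have to be built by hand, the following Python sketch buffers a transaction's changes privately and applies them to the shared data in one locked step. It is a simplified assumption-laden example, not any engine's actual design: `InMemoryStore` and `Transaction` are hypothetical names, and it omits consistency rules and conflict detection between concurrent writers.

```python
import threading

_DELETED = object()  # marker for a pending delete


class InMemoryStore:
    """Hypothetical shared store; committed data lives in a plain dict."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, store):
        self._store = store
        self._pending = {}  # uncommitted changes, visible only to this transaction

    def put(self, key, value):
        self._pending[key] = value

    def delete(self, key):
        self._pending[key] = _DELETED

    def get(self, key, default=None):
        # A transaction sees its own pending changes first.
        if key in self._pending:
            value = self._pending[key]
            return default if value is _DELETED else value
        return self._store.get(key, default)

    def commit(self):
        # Atomicity: all pending changes become visible under one lock.
        with self._store._lock:
            for key, value in self._pending.items():
                if value is _DELETED:
                    self._store._data.pop(key, None)
                else:
                    self._store._data[key] = value
        self._pending.clear()

    def rollback(self):
        # Nothing was written to the shared data, so discarding is enough.
        self._pending.clear()
```

Even this small sketch needs a lock, a private change buffer, and a delete marker. Handling multiple processes attached to a shared memory segment, crashes in the middle of a commit, and relationship checks adds considerably more work.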
In addition to the transaction properties of in-memory databases, there are many other out-of-the-box advantages, including:
• Use of common well-defined data definition languages (SQL DDL statements)
• Use of common well-defined data query languages (SQL DML statements)
• Remote access to data via a network communication protocol
• Ability to store data in a platform independent format
• Tools available for import/export through CSV, XML, JSON, etc.
• Ability to persist in-memory data to disk
Q3. An in-memory database can be subject to data loss if something stops working: How do you cope with this?
An application developer needs to understand that data loss is a possibility when working with an in-memory database. There are several approaches, such as persistence and replication, used to mitigate data loss scenarios, but data loss is still a possibility.
Raima supports two modes for opening a database in-memory.
• Volatile – the database is empty when first opened and all contents are discarded when the database is closed
• Persistent – on open, the database is populated from content on secondary storage; on close, changed data (inserts, updates, deletes) is written out to secondary storage
If a database is opened using the persistent in-memory mode, changed content will automatically be written to secondary storage when the database is closed. In addition, the developer can persist changes to secondary storage on demand. Using persistence can limit data loss to what has happened since the last persistence operation.
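The two modes and on-demand persistence can be sketched in a few lines of Python. This is only a conceptual model and does not use the RDM API: the class `InMemoryDb`, its `path` argument, and the JSON file format are assumptions made for illustration, and unlike the behavior described above it rewrites the whole data set rather than only the changed rows.

```python
import json
import os
import tempfile


class InMemoryDb:
    """Hypothetical model: path=None is volatile, a path makes it persistent."""

    def __init__(self, path=None):
        self.path = path
        self.rows = {}
        # Persistent mode: populate memory from secondary storage on open.
        if path is not None and os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.rows = json.load(f)

    def persist(self):
        """On-demand persistence; a no-op in volatile mode."""
        if self.path is None:
            return
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self.rows, f)
        # Replace the old file in one step so a crash cannot leave it half-written.
        os.replace(tmp_path, self.path)

    def close(self):
        self.persist()   # persistent mode writes on close
        self.rows = {}   # in-memory contents are discarded either way
```

If the process dies between two `persist()` calls, reopening the file recovers everything up to the last call and loses only the changes made after it.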
Many database systems support replicating changes to another database instance (or another database system). Using replication allows data that may have been lost from the in-memory copy to be recovered from the replicated copy.
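The idea can be shown with a deliberately small Python sketch in which every change applied to the primary copy is also forwarded to a replica. The names `Primary`, `Replica`, and `recover_from_replica` are hypothetical, and the sketch assumes synchronous, in-process replication; real systems typically replicate over a network and often asynchronously.

```python
class Replica:
    """Hypothetical second copy that replays the changes it receives."""

    def __init__(self):
        self.data = {}

    def apply(self, op, key, value=None):
        if op == "put":
            self.data[key] = value
        elif op == "delete":
            self.data.pop(key, None)


class Primary:
    """Hypothetical in-memory copy that forwards each change to a replica."""

    def __init__(self, replica):
        self.data = {}
        self.replica = replica

    def put(self, key, value):
        self.data[key] = value
        self.replica.apply("put", key, value)

    def delete(self, key):
        self.data.pop(key, None)
        self.replica.apply("delete", key)

    def recover_from_replica(self):
        # After losing the in-memory copy, rebuild it from the replica.
        self.data = dict(self.replica.data)
```

With asynchronous replication, changes that had not yet reached the replica at the time of failure would still be lost, which is why data loss remains a possibility.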
Q4. Are all embedded databases also in-memory databases?
An embedded database can be defined as a database engine that runs within an application and does not require other processes to be installed, configured, or accessed. The storage media for the data managed by the engine is implementation dependent.
Originally most database engines used a hard disk for data storage, as the amount of memory available would not allow for sufficient data volume for useful data sets. As the size of memory increased, many vendors began adding in-memory capabilities. Today many database engines, embedded or traditional, have some support for in-memory operation.
Q5. How does Raima implement an in-memory database system in RDM?
The Raima RDM database engine is divided into two components: the runtime engine and the storage engine.
The runtime engine is responsible for transaction management, rule enforcement, query processing, and maintaining local, uncommitted changes to the database. Whenever the runtime engine needs a database object, it requests the object from the storage engine. In addition, when local changes in the runtime engine are committed, they are given to the storage engine to be stored in a location shared between all users of the database. The runtime engine does not know or care if the storage engine is using main memory or secondary storage.
The storage engine can be thought of as a key/value pair repository. The keys are database object identifiers and the values are the locations of the database objects. For disk-based databases the values will be a file identifier, file offset, and size; for in-memory databases the values will be the memory location of the database object.
The objects themselves may be encoded, compressed, and encrypted, but the runtime engine will know how to interpret the contents.
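The separation described above can be modeled with a small Python sketch: one interface, two interchangeable storage engines, and runtime-side code that is identical for both. All names here (`StorageEngine`, `MemoryStorage`, `DiskStorage`, `DiskLocation`, `round_trip`) are hypothetical and are not part of the RDM API; the sketch also skips encoding, compression, and encryption.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from typing import NamedTuple


class DiskLocation(NamedTuple):
    file_id: int
    offset: int
    size: int


class StorageEngine(ABC):
    """What the runtime sees: object identifier in, object bytes out."""

    @abstractmethod
    def put(self, object_id: int, payload: bytes) -> None: ...

    @abstractmethod
    def get(self, object_id: int) -> bytes: ...


class MemoryStorage(StorageEngine):
    def __init__(self):
        self._locations = {}  # object id -> the in-memory object itself

    def put(self, object_id, payload):
        self._locations[object_id] = bytes(payload)

    def get(self, object_id):
        return self._locations[object_id]


class DiskStorage(StorageEngine):
    def __init__(self, path):
        self._files = {0: path}   # file id -> path (a single file in this sketch)
        self._locations = {}      # object id -> DiskLocation
        open(path, "ab").close()  # make sure the file exists

    def put(self, object_id, payload):
        with open(self._files[0], "ab") as f:
            offset = f.seek(0, os.SEEK_END)
            f.write(payload)
        self._locations[object_id] = DiskLocation(0, offset, len(payload))

    def get(self, object_id):
        loc = self._locations[object_id]
        with open(self._files[loc.file_id], "rb") as f:
            f.seek(loc.offset)
            return f.read(loc.size)


def round_trip(engine: StorageEngine) -> bytes:
    """Runtime-side code: the same calls regardless of where objects live."""
    engine.put(42, b"row bytes")
    return engine.get(42)


if __name__ == "__main__":
    print(round_trip(MemoryStorage()))
    with tempfile.TemporaryDirectory() as d:
        print(round_trip(DiskStorage(os.path.join(d, "objects.dat"))))
```

Only the value stored against each object identifier differs between the two engines, which is what allows the choice of storage to be made by configuration rather than by changing application code.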
By maintaining this abstraction, an application developer doesn’t need to do anything other than specify a pre-open configuration parameter to use an in-memory RDM database. The application, runtime engine, and storage engine can all run in the same memory space, providing very low latency access. If a developer knows that a database will be used primarily in-memory, they can utilize methods such as AVL indexing to optimize for in-memory access.
Jeff Parsons, Software Engineer, Raima