Inaccurate perceptions about in-memory database systems
By Steve Graves, CEO McObject LLC
We have been providing in-memory database system (IMDS) technology to the market for over 16 years (since 2001) and have witnessed the massive growth in popularity of various forms of in-memory computing. Even so, we continue to be surprised by misperceptions about the advantages and, especially, the disadvantages of in-memory databases.
There are only two downsides to in-memory database systems: 1) memory (DRAM) costs substantially more than persistent storage (SSD or HDD), and 2) cost notwithstanding, a server cannot be configured with as much DRAM as persistent storage, which limits the size of an in-memory database on a single server. An in-memory database can be distributed (sharded) across many servers to overcome the size limitation, but that further widens the cost gap between in-memory and persistent DBMS.
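To illustrate what sharding means in practice, here is a minimal Python sketch that routes each key to one of several in-memory partitions by hashing it. The names (shard_for, put, get) and the plain dictionaries standing in for servers are hypothetical; a real distributed IMDS would also have to handle networking, rebalancing, and failures.

```python
import hashlib

SHARD_COUNT = 4  # hypothetical number of servers

# Each dict stands in for the in-memory database on one server.
shards = [dict() for _ in range(SHARD_COUNT)]


def shard_for(key: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def put(key: str, value) -> None:
    shards[shard_for(key)][key] = value


def get(key: str):
    return shards[shard_for(key)].get(key)


for i in range(1000):
    put(f"sensor-{i}", i)

print([len(s) for s in shards])  # how many keys landed on each shard
```

Each shard holds only part of the data, so no single server needs enough DRAM for the whole database, but the total DRAM purchased is at least as large as before.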
Here are some of the more recent inaccurate perceptions about in-memory database systems that we’ve come across.
Very few workloads require an all-in-memory solution.
While ‘very few’ is subjective, it is certainly true that, as a percentage of all DBMS workloads, those requiring an in-memory database system are a minority. That said, on an absolute basis, there are ample use-cases that mandate an all-in-memory database solution. Our customers’ use-cases are mostly (but not exclusively) in network/telecom gear, consumer electronics, industrial control, and aerospace and defense systems.
A 500GB persistent database will need 500GB of memory for an in-memory database.
That may be true for a solution that simply substitutes memory for disk as the storage medium (or if you just create a RAM-drive and store the database there). However, a true in-memory database system that is written and optimized from the outset to use memory as its storage medium will use far less “storage” (memory) for any given amount of raw data ingested. For example, a disk-based database system might need between 2MB and 10MB of storage space to ingest 1MB of raw data, whereas a true in-memory database system might need only 1.15MB to 1.5MB to store the same 1MB of raw data. The ranges are wide and somewhat arbitrary because the overhead depends largely on the number, type, and complexity of the indexes defined. The more indexes defined, the greater the gap in storage space requirements between in-memory and persistent storage.
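The arithmetic behind this point can be made concrete. The short Python sketch below applies the illustrative ratios quoted above to the 500GB example. The function name is ours, and the assumption that the ratios scale linearly with database size is a simplification, not a measurement of any particular product.

```python
def in_memory_estimate_gb(persistent_db_gb,
                          disk_overhead=(2.0, 10.0),
                          imds_overhead=(1.15, 1.5)):
    """Estimate the (low, high) in-memory footprint for a persistent database.

    Overheads are units of storage needed per unit of raw data, taken from
    the illustrative ranges in the text. Linear scaling is an assumption.
    """
    disk_low, disk_high = disk_overhead
    imds_low, imds_high = imds_overhead
    # The raw data is smallest when the disk-based overhead is highest.
    raw_low = persistent_db_gb / disk_high
    raw_high = persistent_db_gb / disk_low
    return raw_low * imds_low, raw_high * imds_high


low, high = in_memory_estimate_gb(500)
print(f"Estimated in-memory footprint: {low:.1f}GB to {high:.1f}GB")
```

Under these assumptions, a 500GB persistent database holds between 50GB and 250GB of raw data, which a true IMDS might store in roughly 57.5GB to 375GB of memory rather than the full 500GB.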
In the event of a crash, you lose all the data, or chances of corruption are greater.
With respect to losing all the data, most in-memory database systems have measures to mitigate or eliminate this risk (albeit at the cost of some of the performance advantage of an IMDS). I’ll give you four examples.
1. Most IMDS provide the ability to back up the in-memory database, e.g. take a snapshot at a point in time. For some solutions, this includes a “hot backup” feature that can create the backup while the database is active. For others, the database must be forced to be quiescent.
2. An in-memory database can be created in shared memory so that an application software failure does not cause the loss of any data (N.B. this does not protect against a kernel panic or hardware failure).
3. An in-memory database can be replicated to a hot-standby instance (i.e. high availability).
4. A transaction logging mechanism can be employed to recover an in-memory database after a kernel panic or hardware crash (while transaction logging relies on persistent media, an IMDS with transaction logging is still substantially faster than a database stored on persistent media).
With respect to the chances of database corruption being greater, this is simply wrong. An in-memory database is no more and no less susceptible to database corruption than a persistent one. Database corruption happens because of defects in the application or defects in the database system run-time, not because of the particular media used to store the database contents.
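To show how the first and fourth measures fit together, here is a toy Python sketch of an in-memory key-value store that combines point-in-time snapshots with an append-only transaction log. The class TinyStore, its file names, and the JSON record format are all hypothetical and chosen for illustration; this is not how any particular IMDS implements recovery.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy in-memory store with a snapshot file and an append-only log."""

    def __init__(self, directory):
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.log_path = os.path.join(directory, "txn.log")
        self.data = {}
        self._recover()
        self._log = open(self.log_path, "a", encoding="utf-8")

    def _recover(self):
        # Start from the last snapshot, if one exists.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path, encoding="utf-8") as f:
                self.data = json.load(f)
        # Replay every change logged since that snapshot.
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as f:
                for line in f:
                    try:
                        record = json.loads(line)
                    except json.JSONDecodeError:
                        break  # torn final record from a crash mid-write
                    self.data[record["key"]] = record["value"]

    def put(self, key, value):
        # Log first and force it to persistent media, then update memory.
        self._log.write(json.dumps({"key": key, "value": value}) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        self.data[key] = value

    def snapshot(self):
        # Write the full image to a temp file, then atomically swap it in.
        tmp_path = self.snapshot_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.snapshot_path)
        # The snapshot now covers everything in the log, so truncate it.
        self._log.close()
        self._log = open(self.log_path, "w", encoding="utf-8")

    def close(self):
        self._log.close()


with tempfile.TemporaryDirectory() as directory:
    store = TinyStore(directory)
    store.put("a", 1)
    store.snapshot()
    store.put("b", 2)   # logged, but not yet in any snapshot
    store.close()       # stand-in for a crash: the in-memory dict is discarded

    recovered = TinyStore(directory)
    recovered_data = dict(recovered.data)
    recovered.close()

print(recovered_data)
```

All reads are served from memory; only writes touch persistent media, and they do so as sequential appends. That is the intuition behind the claim that an IMDS with transaction logging can remain substantially faster than a database stored on persistent media.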
In summary, in-memory database systems have grown in popularity precisely because there are workloads for which IMDS technology is the best answer (sometimes, it is the only answer). True IMDS implementations understand that the storage is, in fact, DRAM, and are optimized accordingly. And, lastly, in-memory databases can be made to be just as resilient (AKA durable) as persistent databases.
Sponsored by McObject LLC