Storing Binary Data in kdb+

6 Feb 2020 | kdb+q language

by Steven Rathborne, Kx

kdb+ is a proven leader for processing time-series structured data, but why stop there? Its integrated programming language, q, has many useful capabilities for dealing with unstructured data too, making it possible to store data of many different formats including images, text documents, audio and video in a kdb+ database. This blog will demonstrate how kdb+ can be used to read and store binary data, outline a use case and discuss the pros and cons versus a regular file system approach.

Let’s start by demonstrating how kdb+ can handle this binary data.

Reading data

In this example we will read files of different formats and store them in a table on disk in a kdb+ database. To read the data in as a list of bytes we can use read1 with a single parameter of the file path.

With the help of a basic data loader which loads files of different formats from a directory we can construct a table as below:

{[file] 
    `filename`extension`data!(`$1_first fn;`$last fn:"." vs string file;read1 file)
 } 
filename extension data
-------------------------------------------------------------------------------------------------..
audio    oga       0x4f676753000200000000000000003e1c0000000000006c9ac021014066697368656164000300..
dummy    pdf       0x255044462d312e340a25c3a4c3bcc3b6c39f0a322030206f626a0a3c3c2f4c656e6774682033..
escaltor webm      0x1a45dfa3010000000000001f4286810142f7810142f2810442f381084282847765626d428781..
gtr      wav       0x52494646245f030057415645666d742010000000010001002256000044ac0000020010006461..
kx       png       0x89504e470d0a1a0a0000000d49484452000000c8000000c80806000000ad58ae9e00000eea49..
legoVid  webm      0x1a45dfa3010000000000001f4286810142f7810142f2810442f381084282847765626d428781..
tab      csv       0xfffe43006f006c0075006d006e00200031002c0043006f006c0075006d006e00200032000a00..
text     txt       0x48656c6c6f20776f726c6420746578742066696c650a                                ..
twitter  jpg       0xffd8ffe000104a46494600010101006000600000ffe100684578696600004d4d002a00000008..Copy

Saving data 

There is nothing out of the ordinary for saving to disk so we can write the data using set or any other preferred method.

Viewing the data

To demonstrate that the above files have been read and then saved successfully into a kdb+ database, we can query the data with the help of some HTML5. 

When .z.ph (http get message handler) is called, by default a web page is returned to the browser which contains a list of available variables along with a snippet view of the currently selected one. In this case our table will be printed on the web page similar to above.

To display the data in a viewable format we can update this message handler to return a web page with the table including embedded viewers for all the different file formats. In this case the data URI (uniform resource identifier) scheme is used to include the data in-line within the HTML web page.

data:[<media type>][;base64],<data>Copy

As seen in the above syntax the data to be included should be in base64 format. To encode our binary data to base 64, .Q.btoa can be used.

Lastly, if we inject some more HTML tags to format the table we can achieve the below:

Downloading the data 

Another option here is to include a downloadable link in the data. This would look something like below and is achieved in a similar way as displaying the data but with a download HTML tag.

Advantages/disadvantages

The above example shows we can store and view binary data straight out of a kdb+ database using a q process and some HTML5 web pages. This, however, does have some drawbacks. Uploading the binary data to the database this way is a little more complex than a conventional filestore as it needs to be converted into its binary format and then converted to base 64 for HTML viewing. Downloading the data is also more complex than accessing a file in a file store using a URL.

When viewing the data directly out of the database as in the above example, if a large file such as a high-resolution video is included in the data, the entire file needs to be read into the q process and sent to the browser. This can be memory intensive on both the q process and the browser. In an ideal setup this would be streamed in chunks directly from disk.

Despite these disadvantages, there are still some advantages to storing binary data in a kdb+ database including:

  • There is no complex linking between a filestore and a database where links can often break if files are rearranged. 
  • Integrated security and permissions: With everything in one place any existing permissions applied to the q processes/directories and/or tables can be automatically scaled to include this data. 
  • There is no need for an extra backup mechanism to account for another filestore database. 
  • Built-in q file compression can be used to reduce the disk footprint of these files within the kdb+ database.

Use case 

This method for storing binary data may come to use when working on implementations with small file sizes and where linking additional filestores is impractical. If large files are to be saved in the database it may be best to restrict queries from automatically selecting the binary data unless specified.

One use case would be in surveillance solutions in financial markets, where files such as traders’ audio phone conversations, emails/messenger chats and other files could be included directly alongside alert data for investigating trading misconduct.

Filestore solution:

Since using a filestore is a popular solution for including binary data in a database, we will also demonstrate this can also be achieved with kdb+.

This time the files can be left unaltered in a directory on disk, and we can add an extra column to our existing table with the file url. Again some html embedding tags can be added to allow viewing straight from the browser.

An advantage of this solution is when dealing with large video files. The files are not loaded into the q process occupying memory and when viewing through the browser, the data is automatically buffered in chunks with some built-in browser javascript. A download link can also be set up as mentioned previously.

To summarise, even though the filestore solution is a popular option, it may not always be the best one. We have shown that it’s possible to store binary data within a kdb+ database with the bonus of being able to view the data directly from it. Depending on the implementation this method could be the most effective solution.

Full source code used in this blog can be found here

Sponsored by Kx

You may also like...