Comments on: Scaling MySQL and MariaDB to TBs: Interview with Martín Farach-Colton.

By: Vilho Raatikka

Vilho Raatikka — Thu, 22 Nov 2012 16:51:43 +0000

@Martin, thanks. I found good material about it.

By: Jan Alanco

Jan Alanco — Mon, 29 Oct 2012 06:35:50 +0000

I don’t know why you are asking my opinion. I’m not familiar with databases; my focus is on digital images. But anyway, I can give you some of my thoughts.

Normally, digital image databases seem to be quite locked down, and they work only inside the archive that hosts them. Often, if you take a picture out of the database, you get only the image itself and nothing more. No information about anything: the database program has wiped it all out. The picture file contains only minor data such as:

file name plus extension (format)
dimensions
resolution
file size

Customer might want to know at least:

color profile
compression level, if any
description
photographer name
date and time of creation

The other way is to use the metadata features of a modern digital image. It is not common knowledge that it is possible to store information inside the file. This is ASCII, so it doesn’t take much room. The best part is that most of this kind of information can be saved automatically. There are several kinds of metadata:

Exif – technical information, which contains basic image data, the names and versions of the programs used, a thumbnail picture and so on (automatic)

IPTC – archive data, which contains the photographer’s name, descriptions, copyright notes, owner and storage location (must be entered by hand)

XMP – partly both, but you can also store the image processing history (Adobe’s own data format)

This is a constantly growing area – geodata (nowadays also shooting direction), face recognition and so on (partly automatic, since the results must be verified by a human)

In this new system, metadata rides with the image all over the world. You don’t need a heavy database system; the information can be read from the images themselves. But if the archive authorities wish, they might want to keep a storage database. The original images also need a place to be kept.

Nowadays I would keep a raw DNG store. It is possible to save metadata already in the raw file. You only need to develop pictures from the digital negatives. A TIFF store is not necessarily needed. That way you save a lot of storage space.

By: Martin Farach-Colton

Martin Farach-Colton — Mon, 22 Oct 2012 01:19:04 +0000

@Vilho,

In a traditional OLTP approach, if you need to modify part of a B-tree, but the part you need to modify is on disk, then you end up doing a full round-trip to disk for a single modification. On the other hand, approaches like OLAP batch up changes and apply them in a group. This makes the updates much more memory efficient, but the changes aren’t available for a while.

The advantage of fractal trees is that the changes to the tree are scheduled so that they happen in bunches, and are therefore fast (like OLAP), but the changes sit on the path of queries, so they affect the queries immediately (like OLTP).

So you get fast updates but also immediate updates. High throughput and lower latency, if you will.
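As a rough illustration of that idea, here is a minimal Python sketch of a two-level tree with a message buffer at the root. It is a toy under stated assumptions, not TokuDB's implementation: the class name `BufferedTree`, the `buffer_limit` parameter, and the `leaf_writes` counter (standing in for disk round-trips) are all hypothetical, and a real fractal tree keeps buffers at every internal node rather than only at the root.

```python
from bisect import bisect_right


class BufferedTree:
    """Toy two-level tree: a root with a message buffer above key-range leaves."""

    def __init__(self, pivots, buffer_limit=4):
        self.pivots = list(pivots)  # split keys between adjacent leaves
        self.leaves = [dict() for _ in range(len(self.pivots) + 1)]
        self.buffer = []            # pending (key, value) messages, oldest first
        self.buffer_limit = buffer_limit
        self.leaf_writes = 0        # hypothetical stand-in for disk round-trips

    def _leaf_index(self, key):
        return bisect_right(self.pivots, key)

    def put(self, key, value):
        # An update is just a message injected into the root buffer.
        self.buffer.append((key, value))
        if len(self.buffer) >= self.buffer_limit:
            self.flush()

    def flush(self):
        # Group pending messages by destination leaf, then touch each leaf once.
        batches = {}
        for key, value in self.buffer:
            batches.setdefault(self._leaf_index(key), []).append((key, value))
        for index, batch in batches.items():
            self.leaves[index].update(batch)  # later messages overwrite earlier ones
            self.leaf_writes += 1
        self.buffer.clear()

    def get(self, key):
        # Queries pass through the buffer, so pending messages are visible at once.
        for pending_key, value in reversed(self.buffer):
            if pending_key == key:
                return value
        return self.leaves[self._leaf_index(key)].get(key)
```

With `BufferedTree([100])`, a `put(1, "a")` is visible to `get(1)` straight away even though no leaf has been written yet; leaves are only touched when the buffer fills, one write per affected leaf rather than one per update.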

The most dramatic illustration is for messages that change a schema. For us, it’s a broadcast message that gets injected into the root and makes its way down every path to every leaf, eventually. We can schedule these message movements when we have lots of updates to make, so the cost of doing the update is small; yet, since it’s just a message injection into the root that notifies all queries of the change, the effect is immediate. Using traditional approaches, every leaf would be fetched into memory and changed, even leaves that are never read again. This is why InnoDB can have hours or even days of downtime to add a column, whereas TokuDB has no downtime and no noticeable performance hit for adding a column.
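The broadcast case can be sketched in a few lines of Python. This is only an illustration under assumptions, not how TokuDB is actually written: `LazySchemaTable`, `add_column`, and the `leaf_rewrites` counter are hypothetical names, each "leaf" holds a single row, and the broadcast list stays at the root instead of trickling down through internal nodes.

```python
class LazySchemaTable:
    """Toy table where a schema change is a broadcast message applied lazily."""

    def __init__(self, rows):
        self.leaves = {key: dict(row) for key, row in rows.items()}
        # Number of broadcast messages already applied to each leaf.
        self.leaf_version = {key: 0 for key in rows}
        self.broadcasts = []     # schema messages injected at the root
        self.leaf_rewrites = 0   # hypothetical stand-in for leaf I/O

    def add_column(self, name, default=None):
        # Constant-time injection: no leaf is fetched or rewritten here.
        self.broadcasts.append((name, default))

    def _pending(self, key):
        return self.broadcasts[self.leaf_version[key]:]

    def read(self, key):
        # The query applies pending broadcasts to its view of the row,
        # so the new column is visible immediately.
        row = dict(self.leaves[key])
        for name, default in self._pending(key):
            row.setdefault(name, default)
        return row

    def write(self, key, column, value):
        # The leaf is being rewritten anyway, so pending broadcasts ride along.
        leaf = self.leaves[key]
        for name, default in self._pending(key):
            leaf.setdefault(name, default)
        self.leaf_version[key] = len(self.broadcasts)
        leaf[column] = value
        self.leaf_rewrites += 1
```

After `add_column("age", 0)`, every `read` already returns the new column, while the stored leaves stay untouched until some later `write` happens to reach them.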

There are several videos of varying lengths available at http://www.tokutek.com/resources/technology/.

The quick version is at http://www.youtube.com/watch?v=DFT3DyUEVJU (8 minutes) and a longer presentation can be found at http://vimeo.com/26471692.

Hope these help.
Martin

By: Vilho Raatikka

Vilho Raatikka — Tue, 16 Oct 2012 10:11:16 +0000

“In a Fractal Tree Index, all changes — insertions, deletions, updates, schema changes — are messages that get injected into the tree.”

Is there a research paper or white paper on which the Fractal Tree Index is based? Personally, I find it hard to understand how immediate changes due to message injections differ from immediate changes due to mutating the contents of a memory location.