Image Recognition at the Speed of Memory Bandwidth

We often hear from our customers that they want to do various types of artificial intelligence (AI) and machine learning (ML) model evaluations for IoT data, as well as imagery, in real time.

A good example is finding similar images in a large corpus of image data: for instance, pointing a camera at a person and quickly determining whether that person is in a database. This is referred to as real-time facial recognition.

From Images to Feature Vectors

Efficiently extracting feature vectors from face images using deep learning is a subject of ongoing research. Here is a reference to a modern approach: http://www.robots.ox.ac.uk/~vgg/software/vgg_face/.

For the purpose of this post, we will assume that this is a somewhat solved problem and we can efficiently extract feature vectors from any incoming image. Once those feature vectors are produced, all you need to do is insert them into a MemSQL table with the following simple schema.

CREATE TABLE features (
    id bigint(11) NOT NULL AUTO_INCREMENT,
    feature_vector binary(4096) DEFAULT NULL,
    KEY id (id) USING CLUSTERED COLUMNSTORE
);
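The 4096-byte column width matches the 1024-feature, 4KB-per-vector assumption used later in this post, which works out to 4 bytes per feature. As a minimal sketch of how a client might serialize a vector before inserting it, the Python below packs 1024 floats into 4096 bytes. The little-endian 32-bit float layout and the helper names `pack_feature_vector` and `unpack_feature_vector` are assumptions made for illustration; check the DOT_PRODUCT documentation for the encoding it actually expects.

```python
import struct

NUM_FEATURES = 1024  # 1024 features x 4 bytes = 4096 bytes, matching binary(4096)

def pack_feature_vector(features):
    """Pack 1024 floats into 4096 bytes (assumed layout: little-endian float32)."""
    if len(features) != NUM_FEATURES:
        raise ValueError(f"expected {NUM_FEATURES} features, got {len(features)}")
    return struct.pack(f"<{NUM_FEATURES}f", *features)

def unpack_feature_vector(blob):
    """Inverse of pack_feature_vector: 4096 bytes back to a list of floats."""
    return list(struct.unpack(f"<{NUM_FEATURES}f", blob))

blob = pack_feature_vector([0.0] * NUM_FEATURES)
print(len(blob))  # 4096
```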

A typical way to insert the vectors is to use Apache Spark, which enables quick parallel data transfer into MemSQL.

Similarity Search

There are two frequently used approaches to measuring the similarity between vectors: cosine similarity (cosine of the angle between the vectors) and Euclidean distance. Cosine similarity is defined as the dot product of the vectors, divided by the product of the vector norms (length of the vectors). If the vectors are normalized, the cosine similarity is simply the dot product of the vectors (since the product of the norms is 1).
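To make the definition concrete, here is a small pure-Python sketch (not MemSQL code) that computes cosine similarity directly and then checks that, after normalization, the plain dot product gives the same value. The helper names are illustrative only.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b):
    """Dot product divided by the product of the norms."""
    return dot(a, b) / (norm(a) * norm(b))

def normalize(a):
    """Scale a vector to unit length."""
    n = norm(a)
    return [x / n for x in a]

a = [3.0, 4.0]
b = [4.0, 3.0]
similarity = cosine_similarity(a, b)                    # 24 / (5 * 5)
normalized_dot = dot(normalize(a), normalize(b))        # same value up to rounding
print(similarity, normalized_dot)
```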

To search using cosine similarity we can simply run this query to find similar images.

SELECT
    id
FROM
    features
WHERE
    DOT_PRODUCT(feature_vector, <Input>) > 0.9

Here, <Input> is a feature vector extracted from an incoming image, and 0.9 is an experimentally tuned constant that corresponds to an angle of less than 26 degrees between the feature vector and the input.
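The angle quoted for the threshold follows from taking the inverse cosine of 0.9. A quick check in Python, assuming normalized vectors so that the dot product equals the cosine of the angle:

```python
import math

THRESHOLD = 0.9  # dot-product threshold from the query

# For unit vectors, dot product = cos(angle), so angle = arccos(dot product).
angle_degrees = math.degrees(math.acos(THRESHOLD))
print(round(angle_degrees, 2))  # just under 26 degrees
```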

Euclidean distance is also frequently used to measure similarity. It is defined as the norm of the vector resulting from the subtraction of two input vectors. The EUCLIDEAN_DISTANCE built-in can also be used to efficiently measure the similarity between vectors.
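For normalized vectors the two measures are interchangeable, because the squared Euclidean distance equals 2 minus twice the dot product. The pure-Python sketch below (illustrative helpers, not the MemSQL built-in) checks that identity and derives the distance threshold equivalent to a dot product of 0.9.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    """Norm of the difference of the two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two unit-length vectors.
a = [1.0, 0.0]
b = [0.6, 0.8]

distance = euclidean_distance(a, b)
# Identity for unit vectors: distance^2 = 2 - 2 * dot(a, b)
identity_rhs = 2 - 2 * dot(a, b)

# A dot product above 0.9 corresponds to a distance below sqrt(2 - 2 * 0.9).
distance_threshold = math.sqrt(2 - 2 * 0.9)
print(distance, distance_threshold)
```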

The similarity query above performs a full table scan, which sounds like it might be slow, but we will share our approach to performing this computation at memory bandwidth speed.
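Logically, the full scan is a brute-force filter: compute the dot product of every stored vector with the input and keep the ids above the threshold. A minimal Python sketch of that logic follows, using a hypothetical `find_similar` helper and tiny two-dimensional vectors. It illustrates what the query computes, not how MemSQL executes it.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def find_similar(rows, query, threshold=0.9):
    """Scan every (id, vector) row and return ids whose dot product exceeds threshold."""
    return [row_id for row_id, vector in rows if dot(vector, query) > threshold]

# Toy "table" of normalized vectors keyed by id.
rows = [
    (1, [1.0, 0.0]),
    (2, [0.6, 0.8]),
    (3, [0.0, 1.0]),
]
query = [0.8, 0.6]

matches = find_similar(rows, query)
print(matches)  # only id 2 clears the 0.9 threshold
```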

Performance

Here is our set of assumptions:

Memory bandwidth: 50GB/sec
Each image feature vector contains 1024 features at 4 bytes per feature, resulting in 4KB/vector

So, if we are limited by memory bandwidth, we can search 12.5 million images a second per node, or roughly 1.25 billion images a second on a 100-node cluster. Let’s verify that this is actually true. I developed a simple test by creating a MemSQL columnstore table with the schema above and populating it with 12.5 million random, normalized 4KB feature vectors. The machine I used has a 6-core Xeon E5 processor. When I ran the search query, I got a response time of 0.25 seconds.
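The throughput estimate is simple division, reproduced here as a back-of-the-envelope Python check. It uses the post's round decimal units (4KB taken as 4,000 bytes); with 4,096-byte vectors the per-node figure would be about 12.2 million instead.

```python
MEMORY_BANDWIDTH_BYTES_PER_SEC = 50e9  # 50GB/sec, from the assumptions above
BYTES_PER_VECTOR = 4e3                 # 4KB per vector, in round decimal units
NODES = 100

# Vectors that can be streamed from memory per second on one node.
vectors_per_sec_per_node = MEMORY_BANDWIDTH_BYTES_PER_SEC / BYTES_PER_VECTOR

# Same figure scaled linearly across the cluster (assumes perfect scaling).
vectors_per_sec_cluster = vectors_per_sec_per_node * NODES

print(vectors_per_sec_per_node)  # 12.5 million
print(vectors_per_sec_cluster)   # 1.25 billion, i.e. on the order of a billion
```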

How can MemSQL run this faster than memory bandwidth? At 50GB/sec, scanning 50GB of uncompressed vectors would take a full second. The answer is the compression of columnstore tables. Because the random vectors were normalized, they compressed from 50GB down to a size that can be read from memory in less than 0.25 seconds.

This shows that the DOT_PRODUCT computation itself can keep up with more than 50GB/sec of vector data, and that when no compression is applied, memory bandwidth is the limiting factor.
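The numbers in the test imply some bounds that are easy to work out. The sketch below assumes the scan is purely bandwidth-bound at the stated 50GB/sec. The resulting compression ratio is therefore an implied lower bound under that assumption, not a measured value.

```python
ROWS = 12.5e6                 # vectors in the test table
BYTES_PER_VECTOR = 4e3        # 4KB per vector, in round decimal units
RESPONSE_TIME_SEC = 0.25      # measured query time reported in the post
MEMORY_BANDWIDTH = 50e9       # 50GB/sec, assumed above

uncompressed_bytes = ROWS * BYTES_PER_VECTOR                      # 50GB of logical data
effective_scan_rate = uncompressed_bytes / RESPONSE_TIME_SEC      # logical bytes processed per second
max_bytes_read = MEMORY_BANDWIDTH * RESPONSE_TIME_SEC             # most that can be read in 0.25s
min_compression_ratio = uncompressed_bytes / max_bytes_read       # implied lower bound

print(effective_scan_rate / 1e9)  # logical GB/sec processed by DOT_PRODUCT
print(max_bytes_read / 1e9)       # GB readable from memory in the response time
print(min_compression_ratio)
```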

MemSQL uses a fast vectorized table scan leveraging Intel’s latest instruction sets: AVX2 and AVX-512. MemSQL also uses these instruction set extensions to compute DOT_PRODUCT itself.

Conclusion

Because you can perform image recognition at in-memory speed, your bottleneck for similarity computation is not necessarily compute. We realize that there are other algorithms that gain efficiency by avoiding the full table scan and only lose a small amount of accuracy. However, you can achieve good practical results with a very straightforward implementation.

Future Work

Currently, we are adding more primitives to enable more machine learning use cases. We are also exploring GPUs, which have much higher memory bandwidth (up to 1TB/sec), to enable real-time scoring for more complex AI/ML problems.

Try It For Yourself

If you want to try real-time image recognition out for yourself, you can download the newest version of the MemSQL 6 beta, and look at the documentation for the DOT_PRODUCT function.

Sponsored by MemSQL
