The full potential of such an offering comes when customers are allowed to arbitrarily segment and calculate flexible aggregates. To do that, they need to be able to query raw unaggregated data. This way your company does not have to guess what the customer wants to aggregate, as all choices remain available. If raw data access is not provided, then data must be precomputed, at least on some dimensions, which limits flexibility and the extent of the insights users can get from data.

Cost and technology constraints have led most companies to build analytics with this precompute approach for customers, because they need to serve analytics to many customers concurrently. The scale required to offer raw data access remained untenable. It was unthinkable to perform computations on raw data points on the scale of billions of rows per request concurrently for thousands of customers.

Today, MemSQL is changing that conventional wisdom and offering companies the ability to serve raw unaggregated data performance to a range of customers.

To explain this capability further, there are three major pieces of technology in this use case:

  • Scale-out
  • Columnstore query execution
  • Efficient data isolation


Knowing system performance characteristics on a per-core basis, users can calculate how much compute and storage is needed to serve analytics at scale. Once that calculation is done, the key is to utilize a distributed system allowing enough dedicated compute power to meet demand. MemSQL can be used to run one to hundreds of nodes, which lets users scale the performance appropriately.

For example, if you have a million customers with one million data points each, you can say that you have one trillion data points. Imagine that at the peak, one thousand of those customers are looking at the dashboard simultaneously – essentially firing off one thousand concurrent queries against the database. Columnstore compression can store these trillion rows on a relatively small MemSQL cluster with approximately 20 nodes. Conservatively, MemSQL can scan 100 million rows per second per core, which mean that just one core can service 100 concurrent queries scanning one million rows each, and deliver sub-second results for analytical queries over raw data – below we will provide a benchmark for a columnstore query execution performance.

Columnstore Query Execution

A simple query over a columnstore table, such as a GROUP BY, can run at a rate of hundreds of millions to over a billion data points per second per core.

To demonstrate this, we loaded a public dataset about every airline flight in the United States from 1987 until 2015. As the goal was to understand performance per core, we loaded this into a single node MemSQL cluster running on a 4 core, 8 thread Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz.

To repeat this experiment, download the data using the following bash script:

Create this table:

Then load data into the table:

Once the data is loaded, run a simple group by command. The following query performs a full table scan:

On a machine with 4 cores, a 164 million row dataset query runs in 0.04 seconds which is 1 billion rows per second per core. No, that’s not a typo. That’s a billion rows per second per core. More complex queries will consume more CPU cycles, but with this level of baseline performance there is a lot of room across a cluster of 8, 16, or even hundreds of machines to handle multi-billion row datasets with response times under a quarter of a second. At that speed, queries appear to be instantaneous to users, leading to great user satisfaction.

Try this example using MemSQL 6 Beta 2. New vectorized query execution techniques in MemSQL 6, using SIMD and operations directly on encoded (compressed) data, make this speed possible.

Efficient Data Isolation Per Customer

Data warehouses such as Redshift and Big Query support large scale, but may not sufficiently isolate different queries in highly concurrent workloads. On top of that, both have a substantial fixed overhead on a per query basis. Redshift in particular does not support many concurrent queries:

Depending on the analytical requirements, MemSQL allows for an ordered and partitioned physical data layout to ensure only scanned data belongs to a single customer. In our example, the columnstore was clustered on AirlineID.

MemSQL supports clustered columnstore keys that allow global sorting of columnstore tables. In this case, if you have a predicate on AirlineID a user will only scan the subset of data belonging to that airline. This allows MemSQL to deliver on very high concurrency (in the thousands of concurrent queries) with each query scanning and aggregating millions of data points.

More on Query Execution

At MemSQL, we are continuously innovating with new query processing capabilities. This is a list of recent innovations in our shipping product:

Bringing it All Together

Going back to our original example, though our dataset is one trillion rows, because of the clustered columnstore key, each customer only needs to scan through one million rows. For a simple query like the above, scanning 500 million rows per second per core means that a single CPU core could support 500 concurrent queries and deliver sub-second performance.

To recreate the work mentioned in this blog, try out our beta for MemSQL 6: Or to get started with MemSQL, visit for our production release.

Sponsored by MemSQL