How to Make a Believable Benchmark

by Roberto Zicari · July 15, 2015

How to Make a Believable Benchmark

By Carlos Bueno on July 13th 2015 in Engineering

originally published in the memsql blog.

believable benchmark

A benchmark asks a specific question, makes a guess about the expected result, and confirms or denies it with experiment. If it compares anything, it compares like to like and discloses enough details so that others can plausibly repeat it. If your benchmark does not do all of these things, it is not a benchmark.

Today’s question comes from one of our engineers, who was talking to a customer about new features in MemSQL 4. We added support for SSL network encryption between clients and the cluster, and also between nodes in the cluster. The customer wanted to know how performance would be impacted.

slack question

The original question was about replication, but in MemSQL, replication happens over the same protocol as everything else. So we’ll generalize it a little to “What is the performance hit when SSL is turned on?” and then make it more specific, i.e. “what is the difference in sustained query throughput and latency between SSL on and SSL off for a given workload?”

Meet Herp and Derp

herp and derp

These are servers that used to be part of our continuous testing suite. Now they sit around doing benchmarks and party tricks. They are both Dell T610 mid-tower servers with two sockets, eight 2.5 GHz CPU cores, and 48GB of RAM. Not shown is burp, which is a 4-core box I scavenged out of a mess of parts at the local computer shop. We will set up the cluster with MemSQL 4, one aggregator / traffic generator (burp), and two leaves (herp and derp) and high availability replication turned on between the leaves. I’m being a little cavalier about mixing the server and client workload, but it is unlikely that the bottleneck will be on the aggregator.

The dataset is a table with 200 million records. Each record has a geospatial point, a few numbers, and an index. They will be loaded with 8 write threads in batches of 10,000 each. Then we will start 16 read threads, which will continuously fire queries that select all points that lay inside randomly-generated geospatial polygons drawn on the surface of the Earth. The polygons will be sized such that about 80 points are retrieved from the index, filtering down to 40-50 points returned to the aggregator.

geospatial points

The read query load will be increased until the leaves hit 95% CPU utilization. Then we will measure sustained throughput and median latency. Finally, we will do it all over again with SSL and measure the difference.

Loading a batch of records into MemSQL involves three kinds of network connections: client to aggregator, aggregator to leaves, and (asynchronous) replication between leaves for high availability. All of these happen over the same protocol; replication clients simply connect like regular SQL clients then run a command that starts the firehose of data. The read queries require two network hops: client to aggregator, and aggregator to the leaves.

When SSL is on, we should expect higher latencies and higher demand for CPU. We should also see a reduction in overall throughput for the same number of threads. That is the bet, anyway. Let’s watch what happens.

Non-SSL Benchmark

non ssl benchmark

Loading data without SSL moved at a good clip; about 550,000 to 600,000 rows per second. As the table grew in size, this throughput dropped off gently. MemSQL in-memory tables are based on skiplists, which have log(N) insertion behavior. As N grows larger and larger, a little more work is needed per insert. Graphing insert volume over long periods of time would show a pretty catenary curve. CPU load was at decent levels during loading, but not at the saturation point.

$ time ./benchmark write-spatial
[...snip...]
200000000 total,  488849 per sec

real    6m49.066s
user    14m39.200s
sys     0m21.392s

(NB: though the Ops screenshots imply “600GB of Memory”, that is physical RAM plus swap.)

Reading was also straightforward. 16 threads was about right to pin the CPUs on the leaves, but left plenty of headroom on the aggregator/client machine.

16 threads

Read throughput on these geospatial queries was a steady 76,000 rows per second, or about 1,700 queries per second. One of the nice things about an in-memory database is no “warm up” or caching time on benchmarks; you reach cruising altitude immediately. The median latency was 7.7 milliseconds, p95 was 14 msec. Not bad for a pile of junk computers.

junk computers rows read

SSL Benchmark

I dropped the database, then configured SSL certificates so that all intra-cluster communication was secured, and restarted. Just to check whether SSL was on, I connected with a MySQL client and the certificate I’d generated.

$ mysql -uroot -h burp --ssl-ca=ca-cert.pem -e "show variables like '%ssl%'"
+---------------+--------------------------+
| Variable_name | Value                    |
+---------------+--------------------------+
| have_openssl  | ON                       |
| have_ssl      | ON                       |
| ssl_ca        | ../certs/ca-cert.pem     |
| ssl_capath    |                          |
| ssl_cert      | ../certs/server-cert.pem |
| ssl_cipher    |                          |
| ssl_key       | ../certs/server-key.pem  |
+---------------+--------------------------+

Now, on to the benchmark. I modified the benchmark script, which uses the memsql.common Python convenience library, to connect via SSL.

pool = connection_pool.ConnectionPool()
db = pool.connect('127.0.0.1', '3306', 'root', '', '', {'ssl': '../certs/ca-cert.pem'})

And off we go! Writes initially clocked in at 425,000 rows per second, trending down to 350K or so by the end. The aggregator’s CPU was higher with SSL on, since it was handling three separate encrypted links (client to aggregator, aggregator to client, aggregator to leaves). Note that in both write tests, we did not achieve CPU saturation on the leaves, so these are lower bounds.

rows written 368

master aggregator 368

$ time ./benchmark write-spatial
[...snip...]
200000000 total,  387458 per sec

real    8m36.118s
user    14m29.685s
sys     0m14.852s

The overall insert time for 200 million rows was 8:36, compared to 6:49 for the non-SSL config. Overall, 25% less write throughput. Now, on to reads.

master agg 64

Read query throughput clocked in at 64,000 rows and just under 1,500 queries per second under SSL. Compared to the 76K from the non-SSL config, that’s a 16% drop in read throughput. Median latency was 10 msec with a p95 of 18 msec. Both reads tests reached full CPU saturation.

And that is a decent benchmark. We asked a specific question and answered it with an experiment, comparing the same metrics against one and only one difference. Even though I lost the bet that the aggregator + client load would not bottleneck burp, the skew was not too large. Lastly, there is enough detail that you can repeat the test and be reasonably sure you will get similar results.

table benchmarking

But do not generalize everything you read. You cannot flatly say that “SSL is a 16% latency hit on MemSQL”. Larger polygons would probably give different results. Other kinds of queries, like distributed joins, would perform extra communication between leaves. We can make any number of guesses at what the result may be, but only a good benchmark can answer it.

Try this at home

The script I used is up on Github (it is ugly; be kind). Below is the schema, some sample data, and queries.

CREATE TABLE terrain_points (
  location geographypoint DEFAULT 'Point(0 0)',
  elevation int(10) unsigned NOT NULL,
  ent_id int(10) unsigned NOT NULL,
  time_sec int(10) unsigned NOT NULL,
  SHARD KEY location (location, ent_id, time_sec)
);

memsql> select * from perf.terrain_points limit 5;
+----------------------------------+-----------+----------+------------+
| location                         | elevation | ent_id   | time_sec   |
+----------------------------------+-----------+----------+------------+
| POINT(-89.00991728 7.95282675)   |     28822 | 20333901 | 1432681250 |
| POINT(177.53195551 -32.35287545) |     64465 | 25248309 | 1432681250 |
| POINT(-150.17150895 10.02000130) |     43688 | 66406185 | 1432681250 |
| POINT(77.94888837 19.91037133)   |     57739 | 26693242 | 1432681250 |
| POINT(-125.70522605 45.54260644) |     41195 | 24162850 | 1432681250 |
+----------------------------------+-----------+----------+------------+
5 rows in set (0.02 sec)

memsql> SELECT * FROM perf.terrain_points with (index=location, resolution=6) WHERE geography_intersects(location, 'POLYGON((166.07059780 -13.90684205, 166.06868866 -13.90694224, 166.02333020 -13.93129089, 166.01865319 -13.99425334, 166.02876188 -14.00779541, 166.03506682 -14.01331439, 166.03979604 -14.01657349, 166.10317282 -14.01829710, 166.10531622 -14.01706579, 166.11815640 -14.00676434, 166.11893834 -14.00592391, 166.13045244 -13.98737615, 166.07059780 -13.90684205))');
+----------------------------------+-----------+----------+------------+
| location                         | elevation | ent_id   | time_sec   |
+----------------------------------+-----------+----------+------------+
| POINT(166.09984350 -14.00177767) |     36606 | 22886130 | 1432768116 |
| POINT(166.06090626 -13.95221591) |     70244 | 19588890 | 1432767950 |
| POINT(166.11221990 -13.97041329) |     32433 | 70087923 | 1432768053 |

   [...snip...]

| POINT(166.05442264 -13.96318868) |     84910 | 74432027 | 1432767899 |
+----------------------------------+-----------+----------+------------+
38 rows in set (0.01 sec)

You can download MemSQL Community Edition for free and try this for yourself.

Carlos is a Product Manager at MemSQL. Previously, he was an engineer at Facebook, Yahoo, and several startups. Carlos is the author of “Lauren Ipsum”, a popular children’s novel about computer science, and “Mature Optimization”, Facebook’s manual on performance measurement and optimization.

Sponsored by memsql.

How to Make a Believable Benchmark