One Million NOTPM DBT2 Benchmark on NuoDB 2.3

By Dai Klegg, NuoDB, Sr Director of Product Marketing

Throughout the development of our new Robins release (NuoDB 2.3), we’ve been benchmarking performance against our previous Larks release (NuoDB 2.2). We wanted to demonstrate the improvements to scale-out performance that we’ve been working on for Robins.

To do this we needed a realistic, or at least pseudo-realistic, test of transaction processing to make the benchmarking worthwhile. So we used DBT2 as a credible, public-domain scenario (details of DBT2 are at the end of this post) and configured it to increase the number of warehouses as the number of clients increased.

The tests were run on about $50,000 of standard equipment in our test labs: no fancy processors, no outsized memory, no super-fast network; just commodity hardware. More detail on the setup is below, but meanwhile…

Cut to the Chase: One Million+ DBT2 NOTPM

But let’s cut to the chase and come back for the details later. We broke the one-million New Order Transactions Per Minute (NOTPM) barrier, as we had hoped we would.

But more important was the scale-out curve. As we added Transaction Engines (TEs, which run the transactions) and Storage Managers (SMs, which maintain the persistent storage of the data), throughput continued to rise, and at a pretty consistent increment. As you can see from the graph below, we’re still adding throughput at 50 nodes (40 TEs and 10 SMs).

Figure: Results running Robins (v2.3) with different TE / SM configurations. For reference, all results are also compared against Larks (v2.2) running up to 8 TEs and one SM.

A NuoDB recap

If you’re not familiar with NuoDB and how it works, the best place to start is the NuoDB Technical White Paper. But all you really need to know to interpret the chart is that TEs run transactions. Client applications connect to TEs, and TEs maintain an in-memory cache of all the data their clients’ transactions need. TEs and SMs each maintain local, on-demand caches that know how to talk to each other and synchronize. Clients don’t connect to SMs; SMs manage the persistence of updated data in the cache and serve up data from the backing store (spinning disk or SSD) into the cache when requested by a TE.
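
To make the client-to-TE relationship concrete, here is a minimal sketch of an application connecting through a TE and running a query, using standard JDBC. The driver class, URL format (jdbc:com.nuodb://host/dbname), host name, database name and credentials are assumptions for illustration only; check the NuoDB driver documentation for the exact details.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.util.Properties;

    public class TeConnectExample {
        public static void main(String[] args) throws Exception {
            // Placeholder credentials and schema for the sketch.
            Properties props = new Properties();
            props.put("user", "dba");
            props.put("password", "secret");
            props.put("schema", "dbt2");

            // Assumed NuoDB JDBC URL format; the client talks to a TE,
            // never directly to an SM.
            String url = "jdbc:com.nuodb://te-host-1/benchmarkdb";

            try (Connection conn = DriverManager.getConnection(url, props);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM warehouse")) {
                if (rs.next()) {
                    System.out.println("warehouses: " + rs.getLong(1));
                }
            }
        }
    }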

In Robins, SMs can be configured so that each persists only a subset of the data. This isn’t sharding, which just creates multiple separate sub-set databases; this is still a single logical database, transparent to the application. But users now have the option to segment the data storage. There are a number of benefits to this, which are discussed in the Robins Release Technical Blog post. The essential benefit here is that not every SM has responsibility for every record, so the read/write load is shared out, and so is the distributed cache. As we tested a growing cluster, adding TEs and SMs to keep the balance, the overall client transaction throughput grew.
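
To make “not every SM has responsibility for every record” concrete, here is a toy model of range-based data segmentation: records map to storage groups by a simple rule, and each SM serves only its assigned groups. This is purely conceptual and uses hypothetical names; it is not NuoDB’s storage-group API, which is described in the Robins release documentation.

    import java.util.Map;
    import java.util.Set;

    // Toy model: warehouses are range-partitioned into storage groups,
    // and each SM persists only the groups assigned to it.
    public class SegmentedStorageSketch {
        // Hypothetical rule: 100 warehouses per storage group.
        static String storageGroupFor(int warehouseId) {
            return "sg" + (warehouseId / 100);
        }

        public static void main(String[] args) {
            // Hypothetical assignment of storage groups to SMs.
            Map<String, Set<String>> smToGroups = Map.of(
                    "sm-1", Set.of("sg0", "sg1"),
                    "sm-2", Set.of("sg2", "sg3"));

            int warehouseId = 250;                       // falls in sg2
            String group = storageGroupFor(warehouseId);

            // Only the SMs serving that group handle reads/writes for the record.
            smToGroups.forEach((sm, groups) -> {
                if (groups.contains(group)) {
                    System.out.println(sm + " persists warehouse " + warehouseId);
                }
            });
        }
    }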

What do the numbers say?

The numbers say that between v2.2 and v2.3, even without using multiple SMs, we significantly increased throughput for this test (the purple line vs. the green line), and we sustained that increment as the cluster grew. But that was really just a baseline. By adding SMs, and segmenting the data between them, we can sustain a higher rate, and one that continues to rise for much longer. And that’s how we got from a throughput of 140,000 NOTPM in v2.2 to 1,020,000 NOTPM in v2.3 on an equivalent test environment.

For the varying-SM test (the orange line), we configured 1 SM for every 4 TEs (with 1 SM for the 2-TE test, of course). For the tests that deployed more than 3 SMs, the SM count is indicated in the x-axis legend on the chart, along with the TE count.
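
Expressed as a rule, that schedule is just one SM per four TEs with a minimum of one SM. The sketch below spells the arithmetic out; the specific TE counts in the loop are illustrative, not a list of the exact configurations we ran.

    // One SM per four TEs, with a minimum of one SM (e.g. the 2-TE run).
    public class NodeSchedule {
        static int smCountFor(int teCount) {
            return Math.max(1, teCount / 4);
        }

        public static void main(String[] args) {
            for (int te : new int[] {2, 4, 8, 16, 24, 32, 40}) {
                System.out.printf("%d TEs -> %d SMs (%d nodes total)%n",
                        te, smCountFor(te), te + smCountFor(te));
            }
        }
    }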

What do the numbers mean?

They mean we have made great progress. Performance has been a key focus for v2.3, so we’ve been running these tests all along the way. We wanted to see how the newly available segmented data storage would allow us to exploit more SMs to support higher transaction rates and to continue to add throughput with scale-out, which is exactly what we observed. We achieved a seven-fold increase in throughput, and none of our customers were struggling for performance at the previous levels. That means we can deliver them more headroom than they need to expand their current use, and bags of capacity as they migrate more apps onto NuoDB.

And all this was achieved on $50,000 of kit.

The setup

Hardware configuration:

From 3 to 50 rack-mounted SuperMicro MicroCloud units (tests were run with different configurations), each consisting of:

  • Intel Xeon E3-1270V2 quad-core server processor (8 hyper-threads)
  • 4x8GB DDR3-1333MHz ECC memory
  • 1 or 4 TB HDD
  • Crucial 512GB M4 SSD (SMs)
  • Dual-port 10 Gigabit LAN cards
  • 24-port 10 Gigabit LAN switch (single port)

Software configuration:

  • 2 – 40 Transaction Engines, configured with --mem 24GB
  • 1 – 10 Storage Managers

DBT2 Test Configuration:

  • DBT2 benchmark client and driver processes executed on the TE machines
  • NuoDB setup with ChainableLocalityBalancer so each DBT2 process connected to the local TE
  • 64 database connections per DBT2 driver
  • 10 warehouses per TE
  • 300 second benchmark duration
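
Putting those settings together gives the total load at any cluster size. A minimal worked example for the largest configuration (40 TEs) follows; the assumption of one DBT2 driver process per TE machine is ours, while the per-TE numbers come from the list above.

    // Totals implied by the DBT2 configuration above at the largest run.
    public class Dbt2LoadSketch {
        public static void main(String[] args) {
            int teCount = 40;                // largest configuration tested
            int warehousesPerTe = 10;        // from the test configuration
            int driversPerTe = 1;            // assumption: one DBT2 driver per TE machine
            int connectionsPerDriver = 64;   // from the test configuration

            int totalWarehouses = teCount * warehousesPerTe;
            int totalConnections = teCount * driversPerTe * connectionsPerDriver;

            System.out.println("warehouses:  " + totalWarehouses);   // 400
            System.out.println("connections: " + totalConnections);  // 2,560
        }
    }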

DBT2 Definitions:

DBT2 is an implementation of the TPC-C specification, originally created for MySQL and later extended to support PostgreSQL, Drizzle and SQLite. It is written in C, and some integration at the source code level is required to add support for a new database.

TPC-C is an OLTP benchmark that emulates a real-world scenario in which a number of terminal operators execute five different types of transactions against a database. The business scenario is an order-processing system, and the five transactions are: New-Order, Payment, Delivery, Order-Status and Warehouse Stock-Level. The number of warehouses that the terminals are connected to defines the scaling factor of the database. The detailed TPC-C description and specification can be found at http://www.tpc.org/tpcc/.
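
To give a feel for what a New-Order transaction involves, here is a highly simplified JDBC sketch of its general shape: read and bump the district’s next order id, insert the order header and its order lines, and commit as one transaction. Table and column names follow the TPC-C schema, but this is illustrative only and omits most of the specification (stock updates, item lookups, error cases); it is not the DBT2 code.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    // Highly simplified shape of a TPC-C style New-Order transaction.
    public class NewOrderSketch {
        public static void newOrder(Connection conn, int wId, int dId, int cId,
                                    int[] itemIds, int[] quantities) throws Exception {
            conn.setAutoCommit(false);
            try {
                // 1. Read and bump the district's next order id.
                int orderId;
                try (PreparedStatement ps = conn.prepareStatement(
                        "SELECT d_next_o_id FROM district WHERE d_w_id = ? AND d_id = ?")) {
                    ps.setInt(1, wId);
                    ps.setInt(2, dId);
                    try (ResultSet rs = ps.executeQuery()) {
                        rs.next();
                        orderId = rs.getInt(1);
                    }
                }
                try (PreparedStatement ps = conn.prepareStatement(
                        "UPDATE district SET d_next_o_id = ? WHERE d_w_id = ? AND d_id = ?")) {
                    ps.setInt(1, orderId + 1);
                    ps.setInt(2, wId);
                    ps.setInt(3, dId);
                    ps.executeUpdate();
                }

                // 2. Insert the order header (simplified column list).
                try (PreparedStatement ps = conn.prepareStatement(
                        "INSERT INTO orders (o_id, o_d_id, o_w_id, o_c_id, o_ol_cnt) "
                        + "VALUES (?, ?, ?, ?, ?)")) {
                    ps.setInt(1, orderId);
                    ps.setInt(2, dId);
                    ps.setInt(3, wId);
                    ps.setInt(4, cId);
                    ps.setInt(5, itemIds.length);
                    ps.executeUpdate();
                }

                // 3. Insert one order line per item (stock updates omitted for brevity).
                try (PreparedStatement ps = conn.prepareStatement(
                        "INSERT INTO order_line (ol_o_id, ol_d_id, ol_w_id, ol_number, "
                        + "ol_i_id, ol_quantity) VALUES (?, ?, ?, ?, ?, ?)")) {
                    for (int i = 0; i < itemIds.length; i++) {
                        ps.setInt(1, orderId);
                        ps.setInt(2, dId);
                        ps.setInt(3, wId);
                        ps.setInt(4, i + 1);
                        ps.setInt(5, itemIds[i]);
                        ps.setInt(6, quantities[i]);
                        ps.executeUpdate();
                    }
                }

                conn.commit();   // NOTPM counts committed New-Order transactions
            } catch (Exception e) {
                conn.rollback();
                throw e;
            }
        }
    }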

DBT2 has a generic implementation that only needed some minor adjustments to work with NuoDB. Having made those changes, it was easy for us to add code to interface with our C++ driver to connect/disconnect, execute a query and iterate through a result set.

The NuoDB benchmark with all the minor changes can be found at https://github.com/nuodb/dbt2.

Summary

No benchmark is a substitute for testing real-world applications, and we encourage our customers to do exactly that. But in-house benchmark testing has its place too, for establishing the effectiveness of performance fixes in a code base across builds and releases, and for checking that new features haven’t adversely affected performance. DBT2 at least attempts to replicate real-world processing, so it’s as good a guide as we can get internally of how we’re doing for our customers, who are usually running transaction-processing workloads.

But one million NOTPM is an outstanding result!

It represents a 7x like-for-like improvement over our last release. Props to our engineers and QA folks, and I’m looking forward to writing up the results for our next release.

Next Steps

We’re already running the same tests against early builds of our next release, and for that release we also plan to publish multi-region, active-active performance results, because our customers are increasingly deploying across multiple data centers or multiple cloud regions. Geo-distribution is the future of databases, so we plan to be on the crest of that wave, in capability and in performance.

 

Originally published at NuoDB DevCenter.
