DataStax Proxy for DynamoDB™ and Apache Cassandra™ – Preview

DATE: September 13, 2019

Yesterday at ApacheCon, our very own Patrick McFadin announced the public preview of an open source tool that enables developers to run their AWS DynamoDB™ workloads on Apache Cassandra. With the DataStax Proxy for DynamoDB and Cassandra, developers can run DynamoDB workloads on premises, taking advantage of the hybrid, multi-model, and scalability benefits of Cassandra.

The Big Picture

Amazon DynamoDB is a key-value and document database which offers developers elasticity and a zero-ops cloud experience. However, the tight AWS integration that makes DynamoDB great for cloud is a barrier for customers that want to use it on premises.

Cassandra has always supported key-value and tabular data sets so supporting DynamoDB workloads just meant that DataStax customers needed a translation layer to their existing storage engine.

Today we are previewing a proxy that provides compatibility with the DynamoDB SDK, allowing existing applications to read/write data to DataStax Enterprise (DSE) or Cassandra without any code changes. It also provides the hybrid + multi-model + scalability benefits of Cassandra to DynamoDB users.

If you’re just here for the code you can find it in GitHub and DataStax Labs: https://github.com/datastax/dynamo-cassandra-proxy/

Possible Scenarios

Application Lifecycle Management: Many customers develop on premises and then deploy to the cloud for production. The proxy enables customers to run their existing DynamoDB applications using Cassandra clusters on-prem.

DataStax Proxy Overview

Hybrid Deployments: DynamoDB Streams can be used to enable hybrid workload management and transfers from DynamoDB cloud deployments to on-prem Cassandra-proxied deployments. This is supported in the current implementation and, like DynamoDB Global Tables, it uses DynamoDB Streams to move the data. For hybrid transfer to DynamoDB, check out the Cassandra CDC improvements which could be leveraged and stay tuned to the DataStax blog for updates on our Change Data Capture (CDC) capabilities.

DataStax Proxy Architecture

What’s in the Proxy?

The proxy is designed to enable users to back their DynamoDB applications with Cassandra. We determined that the best way to help users leverage this new tool and to help it flourish was to make it an open source Apache 2 licensed project.

The code consists of a scalable proxy layer that sits between your app and the database. It provides compatibility with the DynamoDB SDK which allows existing DynamoDB applications to read and write data to Cassandra without application changes.

How It Works

A few design decisions were made when designing the proxy. As always, these are in line with the design principles that we use to guide development for both Cassandra and our DataStax Enterprise product.

Why A Separate Process?

We could have built this as a Cassandra plugin that would execute as part of the core process but we decided to build it as a separate process for the following reasons:

  1. Ability to scale the proxy independently of Cassandra
  2. Ability to leverage k8s / cloud-native tooling
  3. Developer agility and to attract contributors—developers can work on the proxy with limited knowledge of Cassandra internals
  4. Independent release cadence, not tied to the Apache Cassandra project
  5. Better AWS integration story for stateless apps (i.e., leverage CloudWatch alarm, autoscaling, etc.)

Why Pluggable Persistence?

On quick inspection, DynamoDB’s data model is quite simple. It consists of a hash key, a sort key, and a JSON structure which is referred to as an item. Depending on your goals, the DynamoDB data model can be persisted in Cassandra Query Language (CQL) in different ways. To allow for experimentation and pluggability, we have built the translation layer in a pluggable way that allows for different translators. We continue to build on this scaffolding to test out multiple data models and determine which are best suited for:

  1. Different workloads
  2. Different support for consistency / linearization requirements
  3. Different performance tradeoffs based on SLAs

Conclusion

If you have any interest in running DynamoDB workloads on Cassandra, take a look at the project. Getting started is easy and spelled out in the readme and DynamoDB sections. Features supported by the proxy are quickly increasing and collaborators are welcome.

https://github.com/datastax/dynamo-cassandra-proxy/

All product and company names are trademarks or registered trademarks of their respective owner. Use of these trademarks does not imply any affiliation with or endorsement by the trademark owner.


1Often in the DynamoDB documentation, this key is referred to as a partition key, but since these are not one-to-one with DynamoDB partitions we will use the term hash key instead.

Subscribe to Our Blog Now

Sponsored by DataStax

You may also like...