On Amazon Keyspaces. Q&A with Meet Bhagdev
Q1. What is Amazon Keyspaces (for Apache Cassandra)?
Amazon Keyspaces (for Apache Cassandra) is a globally distributed, serverless, fully managed database service with up to 99.999% availability, giving you the performance, elasticity, and enterprise features you need to operate business-critical Cassandra workloads at scale. Keyspaces is an Apache Cassandra-compatible database, meaning you can run your Cassandra workloads on AWS using the same Cassandra application code, drivers, and developer tools that you use today. With Keyspaces, you don’t have to provision, patch, or manage servers, manage garbage collection, or install, maintain, or operate software. Keyspaces is also serverless, so you pay for only the resources you use, and the service can automatically scale tables up and down in response to application traffic without any human intervention. As a result, your application can scale seamlessly while getting consistent, single-digit millisecond performance at any scale. Data is encrypted by default, and Amazon Keyspaces enables you to back up your table data continuously using point-in-time recovery. In short, it is easy to manage Cassandra workloads with Keyspaces.
Q2. What are three main benefits of Amazon Keyspaces?
The first main benefit of Keyspaces is that it is designed to eliminate Cassandra database management tasks so developers can focus on building features for their application versus database management. With Keyspaces, you get benefits such as zero downtime patching (includes all database maintenance events), automatic no-impact backups, and encryption at rest and in transit. Moreover, customers migrating Apache Cassandra workloads don’t have to worry about managing tombstones, compaction strategies, read repair, and more.
The second main benefit of Keyspaces is that it is serverless and can scale to millions of reads and writes per second and petabytes of storage capacity automatically, without any downtime. You can also scale to zero for periods when you don’t use your database. You pay for exactly the number of reads and writes you process in your database and the amount of data you store. This is ideal for workloads of all sizes, as you don’t have to worry about capacity planning, managing instances, or taking downtime when scaling nodes.
The third main benefit of Keyspaces is that it integrates with other Apache open-source projects such as Apache Spark and Apache Kafka. To integrate with Spark, you can use the open-source Spark Cassandra Connector (“spark-cassandra-connector”), a library that allows Spark applications to read from and write to Cassandra; the same connector works with Keyspaces. Similarly, Apache Kafka has a feature called Kafka Connect, which enables scalable and reliable integration of Kafka with other data systems. A Kafka Connect Cassandra sink connector allows you to stream data from Kafka topics into Cassandra, including Keyspaces.
Q3. You recently launched Multi-Region Replication. Please tell us more about this feature and why I should use it.
Absolutely! High availability and resiliency are top of mind for our customers. With Amazon Keyspaces Multi-Region Replication, you can replicate your data with automated, fully managed, active-active replication across the AWS Regions of your choice and achieve 99.999% availability for your applications. With 99.999% availability, you can use Keyspaces for tier-0 applications. You can improve resiliency against the rare event of regional degradation while also benefiting from low-latency local reads and writes for global applications. With Multi-Region Replication, Keyspaces asynchronously replicates data between Regions, and data is typically propagated across Regions in less than one second. Multi-Region Replication also eliminates the difficult work of resolving update conflicts and correcting for data divergence issues, enabling you to focus on your application. If you have a global application that needs low-latency reads and writes, or if you need business continuity so that you can recover from regional degradation, you should use Multi-Region Replication.
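To put the 99.999% figure in perspective, the short Python sketch below converts an availability percentage into a yearly downtime budget. It is back-of-the-envelope arithmetic that assumes a 365-day year; the function name is illustrative, and the result says nothing about the terms of any service-level agreement.

```python
def downtime_minutes_per_year(availability_percent: float, days_per_year: int = 365) -> float:
    """Return the minutes of downtime per year implied by an availability percentage."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    minutes_per_year = days_per_year * 24 * 60
    unavailable_fraction = (100 - availability_percent) / 100
    return minutes_per_year * unavailable_fraction


if __name__ == "__main__":
    # Compare three, four, and five nines side by side.
    for pct in (99.9, 99.99, 99.999):
        print(f"{pct}% -> {downtime_minutes_per_year(pct):.2f} minutes/year")
```

Under these assumptions, five nines leaves a budget of roughly 5.26 minutes per year, compared with about 52.56 minutes at 99.99%.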
Q4. That’s exciting! What kind of Apache Cassandra compatibility do you offer?
That’s a great question and one we get a lot from customers. Cassandra-compatible means that Amazon Keyspaces provides Cassandra Query Language (CQL)-compatible APIs, so you can use the same Cassandra drivers, applications, and tools with Amazon Keyspaces with little or no changes. We don’t support 100 percent of the APIs today, but we do support the vast majority that customers actually use. We work backwards from our customers’ needs, so our focus has been to deliver the capabilities they actually use and need. Since launch, we have delivered additional capabilities such as Cassandra-compatible TTL, Multi-Region Replication, and Spark connector support.
Q5. Can you give us a simple example of how Amazon Keyspaces enables you to use the Cassandra Query Language (CQL) API code, Cassandra drivers, and developer tools that are already in use?
Keyspaces appears as a nine-node, Apache Cassandra 3.11.2 cluster to clients and supports drivers and clients that are compatible with Apache Cassandra 3.11.2. With Amazon Keyspaces, you can run your Cassandra workloads on AWS using the same Cassandra application code, Apache 2.0–licensed drivers, and tools that you use today. For instance, you can use the following CQL command to create a table in Keyspaces:
CREATE TABLE IF NOT EXISTS "myGSGKeyspace".employees_tbl (
    id text,
    name text,
    region text,
    division text,
    project text,
    role text,
    pay_scale int,
    vacation_hrs float,
    manager_id text,
    PRIMARY KEY (id, division))
WITH CLUSTERING ORDER BY (division ASC);
Similarly, you can use the following CQL commands to insert and read data from Keyspaces:
INSERT INTO "myGSGKeyspace".employees_tbl
    (id, name, project, region, division, role, pay_scale,
    vacation_hrs, manager_id)
VALUES ('012-34-5678', 'Russ', 'NightFlight', 'US',
    'Engineering', 'IC', 3, 12.5, '234-56-7890');
SELECT * FROM "myGSGKeyspace".employees_tbl;
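The primary key in the table definition above has two parts: id is the partition key and division is a clustering column sorted in ascending order. As a rough mental model only (not a description of how Keyspaces or Cassandra store data internally), the Python sketch below groups rows by partition key, keeps each partition ordered by the clustering column, and treats an INSERT on an existing primary key as an overwrite. The helper names and the "Design" row are made up for illustration.

```python
from collections import defaultdict


def upsert(rows, new_row, primary_key=("id", "division")):
    """Mimic CQL INSERT semantics: a row with the same primary key is overwritten."""
    key = tuple(new_row[c] for c in primary_key)
    kept = [r for r in rows if tuple(r[c] for c in primary_key) != key]
    return kept + [new_row]


def build_partitions(rows, partition_key="id", clustering_key="division"):
    """Group rows by partition key and sort each group by the clustering column."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)
    for group in partitions.values():
        # Corresponds to CLUSTERING ORDER BY (division ASC) in the table definition.
        group.sort(key=lambda r: r[clustering_key])
    return dict(partitions)


if __name__ == "__main__":
    rows = []
    rows = upsert(rows, {"id": "012-34-5678", "division": "Engineering", "name": "Russ"})
    # Hypothetical second row in the same partition, with a different clustering value.
    rows = upsert(rows, {"id": "012-34-5678", "division": "Design", "name": "Russ"})
    # Same primary key as the first row, so it replaces that row instead of adding one.
    rows = upsert(rows, {"id": "012-34-5678", "division": "Engineering", "name": "Russ", "role": "IC"})
    partitions = build_partitions(rows)
    print([r["division"] for r in partitions["012-34-5678"]])
```

Running the example leaves two rows in the single partition, listed as Design and then Engineering, because the third insert reuses an existing primary key.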
Q6. How do you update applications to use Amazon Keyspaces?
If you are using Cassandra today, you can simply change your connection endpoint to a Keyspaces endpoint and follow the best practices documented here. We recommend that you review our list of supported APIs to ensure Keyspaces will be able to handle your application’s requirements. If you are new to Cassandra or Keyspaces, you can use the code samples from our GitHub repository.
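For an application that already uses the open-source Python driver for Cassandra (cassandra-driver), the endpoint change might look like the sketch below. It assumes the regional endpoint pattern cassandra.<region>.amazonaws.com on port 9142 over TLS, a user name and password from service-specific credentials, and a root certificate downloaded locally. The helper names, file path, and credential values are placeholders, so check each of them against the Keyspaces documentation before use.

```python
import ssl


def keyspaces_endpoint(region: str, port: int = 9142):
    """Return the (host, port) pair assumed for a Keyspaces regional endpoint."""
    if not region:
        raise ValueError("region must be a non-empty string")
    return f"cassandra.{region}.amazonaws.com", port


def connect(region: str, username: str, password: str, ca_cert_path: str):
    """Open a session against Keyspaces. Requires `pip install cassandra-driver`."""
    # Imported lazily so keyspaces_endpoint() works without the driver installed.
    from cassandra import ConsistencyLevel
    from cassandra.auth import PlainTextAuthProvider
    from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT

    host, port = keyspaces_endpoint(region)

    # TLS context that verifies the server certificate against the given CA file.
    # Assumption: hostname checking is disabled so that only the certificate chain
    # is verified; tighten this if your driver version and setup allow it.
    tls = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    tls.check_hostname = False
    tls.load_verify_locations(ca_cert_path)

    # Assumption: writes are expected to use LOCAL_QUORUM, so make it the default.
    profile = ExecutionProfile(consistency_level=ConsistencyLevel.LOCAL_QUORUM)
    cluster = Cluster(
        [host],
        port=port,
        ssl_context=tls,
        auth_provider=PlainTextAuthProvider(username=username, password=password),
        execution_profiles={EXEC_PROFILE_DEFAULT: profile},
    )
    return cluster.connect()
```

With a session in hand, the rest of the application would keep issuing the same CQL statements as before, for example session.execute('SELECT * FROM "myGSGKeyspace".employees_tbl').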
Q7. Since there are no servers to manage, how does it work in practice? Where are the servers and who is managing them?
We take care of managing the servers for you! You can specify a scaling mode for each table and we will ensure that your application has enough throughput to handle its traffic. Moreover, we provision, manage, and patch the servers transparently to the user. Data is encrypted at rest and in transit by default and Amazon Keyspaces enables you to back up your table data continuously using point-in-time recovery.
Q8. Does Keyspaces have the concept of maintenance windows for patching? Can you please explain whether applications experience downtime from maintenance events?
No, Keyspaces does not have a concept of maintenance windows. All database maintenance events such as hardware replacement, security updates, feature launches, and bug fixes are performed with zero downtime! There are no maintenance windows, no versions, and no upgrades, which is one of the many reasons why Keyspaces is easy to manage.
Q9. You mentioned that it is possible to create continuous table backups with hundreds of terabytes of data with no performance impact to applications. Can you please explain how it works in practice?
With Keyspaces, you can back up tables with no impact on the performance and availability of your production applications. Backups process in seconds regardless of the size of your tables, so you do not have to worry about backup schedules or long-running processes. In addition, all backups are automatically encrypted and easily discoverable. There are two reasons why Keyspaces is able to perform backups at petabyte scale with no performance impact to applications. First, backups on Keyspaces are performed on the storage subsystem and do not share compute resources with your database queries. This is different from traditional databases, where database queries experience performance impact during background backup jobs. Second, Keyspaces uses incremental backups. With the incremental strategy, Keyspaces backs up only the data that has changed since the last backup occurred. When continuous backups are enabled, Keyspaces maintains backups of your table covering the last 35 days until you explicitly turn the feature off.
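The incremental idea can be illustrated with a toy model. The Python sketch below is not how Keyspaces implements backups. Under simplified assumptions (an in-memory dictionary standing in for a table, integer timestamps, hypothetical class and method names), it only shows why recording just the changes since the last backup keeps each backup small while still allowing a restore to an earlier moment.

```python
class IncrementalBackup:
    """Toy change log: each backup stores only rows changed since the previous one."""

    def __init__(self):
        self._last_state = {}   # table contents as of the most recent backup
        self._increments = []   # list of (timestamp, changed_rows, deleted_keys)

    def backup(self, table: dict, timestamp: int) -> int:
        """Record the difference from the last backup; return how many rows were copied."""
        if self._increments and timestamp <= self._increments[-1][0]:
            raise ValueError("timestamps must be strictly increasing")
        changed = {
            key: value
            for key, value in table.items()
            if key not in self._last_state or self._last_state[key] != value
        }
        deleted = [key for key in self._last_state if key not in table]
        self._increments.append((timestamp, changed, deleted))
        self._last_state = dict(table)
        return len(changed)

    def restore(self, timestamp: int) -> dict:
        """Rebuild the table as of `timestamp` by replaying increments in order."""
        state = {}
        for ts, changed, deleted in self._increments:
            if ts > timestamp:
                break
            state.update(changed)
            for key in deleted:
                state.pop(key, None)
        return state


if __name__ == "__main__":
    log = IncrementalBackup()
    table = {"a": 1, "b": 2}
    print(log.backup(table, timestamp=1))  # first backup copies both rows
    table["b"] = 3
    print(log.backup(table, timestamp=2))  # second backup copies only the changed row
    print(log.restore(timestamp=1))        # table as it looked at the first backup
```

The cost of each backup in this model depends on how much changed, not on the total table size, which is the property the incremental strategy relies on.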
Q10. Intuit migrated a 120TB workload from Cassandra to Keyspaces. Can you tell us a bit about the migration and how Amazon Keyspaces is used by Intuit?
The Intuit data exchange (IDX) platform team acquires and consolidates data from thousands of financial institutions to support data needs across Intuit products for millions of customers. The team previously relied on Apache Cassandra as its near-real-time data store and had a few key goals outlined around operational efficiency gains, effortless patching, and business agility for its target state. Additionally, it wanted the data store to scale elastically for additional workloads and peak traffic conditions with minimal latency to minimize performance impact for customers. Intuit evaluated several database services and chose Keyspaces for its database migration.
With a small team of engineers, Intuit seamlessly migrated more than 120 TB of data distributed across a 66-node Apache Cassandra cluster. On Keyspaces, Intuit was able to expand reliability and data availability for its customers, whereas the legacy system had experienced occasional time-outs. On Keyspaces, Intuit has delivered users a consistent experience without any production incidents. Furthermore, Intuit’s team members no longer need to dedicate time for Apache Cassandra maintenance and operations (including handling lower-level database management tasks, data compaction, read repair, and tombstone removal).
On Keyspaces, the Intuit team has enhanced the elasticity of its workloads, facilitating the ability to bring on additional workloads and get them to market quickly. “In our prior state, if we had to scale out our cluster for more capacity, we would need a lead time of a few weeks,” says Mohan. “Now, using Amazon Keyspaces, we can accomplish this in 1 day.” Intuit’s expanded scaling capabilities also make adding workloads simpler for the team, and it extends those growth benefits to users.
Also, to improve its recovery times, the data exchange team uses point-in-time recovery from Amazon Keyspaces. Using this feature, which provides continuous backups, the team can immediately restore its Amazon Keyspaces data to the second.
Q11. Anything else you would like to add?
As part of the AWS Free Tier (https://aws.amazon.com/free/), you can get started with Amazon Keyspaces for free. For the first three months, you are offered a monthly free tier of 30 million on-demand write request units, 30 million on-demand read request units, and 1 GB of storage (limit of one free tier per payer account). Your free tier starts from the first month when you create your first Amazon Keyspaces resource. Get started with Amazon Keyspaces (https://console.aws.amazon.com/keyspaces/).
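To get a feel for what the free tier covers, the Python sketch below estimates how many operations 30 million request units buy for a given row size. It assumes on-demand sizing rules of one write request unit per write of up to 1 KB and one read request unit per LOCAL_QUORUM read of up to 4 KB. These constants and the helper names are assumptions made here for illustration and should be confirmed against the Keyspaces pricing page.

```python
import math

FREE_TIER_UNITS = 30_000_000  # monthly write units and read units quoted above
WRITE_UNIT_BYTES = 1024       # assumption: 1 write request unit covers up to 1 KB
READ_UNIT_BYTES = 4096        # assumption: 1 read request unit covers up to 4 KB


def units_per_operation(row_bytes: int, unit_bytes: int) -> int:
    """Request units consumed by one operation on a row of the given size."""
    if row_bytes <= 0:
        raise ValueError("row_bytes must be positive")
    return math.ceil(row_bytes / unit_bytes)


def operations_in_free_tier(row_bytes: int, unit_bytes: int) -> int:
    """How many operations of this size fit in the monthly free-tier allotment."""
    return FREE_TIER_UNITS // units_per_operation(row_bytes, unit_bytes)


if __name__ == "__main__":
    row_bytes = 2560  # a hypothetical 2.5 KB row
    print("writes per month:", operations_in_free_tier(row_bytes, WRITE_UNIT_BYTES))
    print("reads per month:", operations_in_free_tier(row_bytes, READ_UNIT_BYTES))
```

Under these assumptions, a 2.5 KB row costs three write units and one read unit, so the monthly allotment covers 10 million such writes and 30 million such reads.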
…………………………………………….
Meet Bhagdev is a Principal Product Manager at Amazon Web Services. Meet is passionate about open-source, databases, and analytics and spends his time working with customers to understand their requirements and building delightful experiences. Meet has over a decade of experience as product manager on database and analytics services. At AWS, Meet leads the product team for Amazon Keyspaces and previously was a lead product manager on Amazon DocumentDB. Prior to his time at AWS, Meet worked on Azure databases at Microsoft.
Sponsored by AWS