Deep Dive: Amazon DocumentDB Elastic Clusters. Q&A with Vin Yu
Amazon Web Services (AWS) just announced the latest feature in the Amazon DocumentDB (with MongoDB compatibility) suite, which lets enterprises of virtually any size run mission-critical JSON workloads on Amazon DocumentDB. Amazon DocumentDB Elastic Clusters launched at re:Invent, and we connected with Vin Yu, senior technical product manager at AWS, to learn more.
Q1: What is Amazon DocumentDB Elastic Clusters? Why was it built?
Amazon DocumentDB Elastic Clusters lets customers scale their document databases in minutes with little to no downtime or performance impact, grow to petabytes of storage, and elastically handle over a million writes and reads per second.
Elastic Clusters uses sharding to divide data across underlying compute instances called shards. Amazon DocumentDB enables customers to use MongoDB sharding APIs to create sharded collections that allow their data to be distributed across the shards, each with its own writer, expanding their write throughput with little to no application changes.
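As a rough illustration of what using MongoDB sharding APIs can look like from application code, the sketch below builds the standard MongoDB `shardCollection` admin command as a Python dictionary. The database name, collection name, and shard key field are hypothetical, and the choice of a hashed shard key is an assumption made for the example. The commented lines show how such a command is typically sent with a MongoDB driver such as PyMongo.

```python
def build_shard_collection_command(database, collection, shard_key_field):
    """Return the MongoDB `shardCollection` admin command as a plain dict.

    The command name must be the first key. Python dicts preserve
    insertion order, so the dict literal below can be sent as-is.
    """
    if not database or not collection or not shard_key_field:
        raise ValueError("database, collection and shard_key_field are required")
    return {
        "shardCollection": f"{database}.{collection}",
        "key": {shard_key_field: "hashed"},  # assumption: a hashed shard key
    }


# Hypothetical names, for illustration only.
command = build_shard_collection_command("game", "players", "player_id")
print(command)

# With a driver such as PyMongo, the command would typically be sent like this
# (not executed here because it needs a live cluster endpoint):
#   from pymongo import MongoClient
#   client = MongoClient("<your-cluster-connection-string>")
#   client.admin.command(command)
```

Keeping the command construction separate from the connection makes it easy to review which collection and key are about to be sharded before anything is sent to a cluster.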
We built Elastic Clusters because over time, our customers’ businesses have grown with Amazon DocumentDB and require the ability to horizontally scale beyond a single instance to achieve millions of writes per second. We are inventing on their behalf as customers seek ease of operations, especially for advanced operational techniques like horizontal scaling.
Q2: What excites you about Amazon DocumentDB Elastic Clusters?
I’m excited to see all the amazing things our customers will be able to achieve with Amazon DocumentDB Elastic Clusters. For example, we have customers in the entertainment industry producing popular video games using Amazon DocumentDB that need to support a growing user base. With Elastic Clusters, they can support over a hundred million users simultaneously, because the cluster can scale to over a million writes and reads per second in minutes with little to no downtime.
In e-commerce or retail, our customers need to retain historical data beyond the storage limits of today’s standard instance-based clusters. Our customers in agriculture, manufacturing, and telecom require the ability to stream data from millions of IoT devices into Amazon DocumentDB. Elastic Clusters solves these problems by building on top of and expanding today’s Amazon DocumentDB architecture which dynamically scales storage to handle petabytes of data.
Q3: Why should customers consider Amazon DocumentDB Elastic Clusters?
Amazon DocumentDB is a fully managed, native JSON document database offering enterprise capabilities like high durability and scalability, and the ability to easily integrate with other AWS services. With standard instance-based clusters and elastic scale clusters, Amazon DocumentDB storage scales automatically up to 64 TiB (tebibytes) without any impact to the application, supports over a million reads per second, and replicates six copies of data across three AWS Availability Zones (AZs).
Now with Amazon DocumentDB Elastic Clusters, a customer can scale to support millions of reads and writes and scale up to 2 PB of storage with little to no downtime or performance impact.
As with Amazon DocumentDB standard instance-based clusters, durability and availability are built in by design. With Elastic Clusters, each shard has multiple compute nodes placed across multiple AZs, which gives our customers high availability for the compute layer. In the storage layer, as in standard instance-based clusters, Elastic Clusters stores six copies of each write across three AZs so that the data is durable there as well. But instead of paying for six copies, customers only pay for one!
Q4: How does Elastic Clusters work? What are the main benefits to highlight?
With Amazon DocumentDB Elastic Clusters, a customer can shard their collections without needing to manage instances or the cluster components.
To distribute data across multiple shards, a customer will need to create a sharded collection which involves selecting a good shard key. A shard key is a field in the JSON documents that Elastic Clusters uses to distribute data across the shards. A good shard key will evenly partition the data across the underlying shards, giving the workload the best throughput and performance. Elastic Clusters will redistribute the data as it increases or decreases the number of shards.
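To see why the choice of shard key matters, the sketch below simulates hash-based placement of documents on shards. It is an illustration only: the source does not describe how Elastic Clusters hashes keys internally, so MD5 is used here as a stand-in, and the `orders` documents and their field names are hypothetical.

```python
import hashlib
from collections import Counter


def shard_for(key_value, num_shards):
    """Map a shard key value to a shard index with a stable hash."""
    digest = hashlib.md5(str(key_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribution(documents, shard_key, num_shards):
    """Count how many documents land on each shard for a given shard key."""
    counts = Counter(shard_for(doc[shard_key], num_shards) for doc in documents)
    return [counts.get(shard, 0) for shard in range(num_shards)]


# Hypothetical documents: 10,000 orders with a unique order_id and 3 regions.
orders = [
    {"order_id": i, "region": ("us", "eu", "ap")[i % 3]}
    for i in range(10_000)
]

by_order_id = distribution(orders, "order_id", num_shards=8)
by_region = distribution(orders, "region", num_shards=8)

print("order_id:", by_order_id)  # high-cardinality key: all shards share the load
print("region:  ", by_region)    # three distinct values: at most three shards used
```

A key with many distinct values, such as `order_id`, spreads documents across all eight shards, while a key with only three distinct values can never occupy more than three of them, leaving the rest idle.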
As the workload grows, our customers do not need to worry about provisioning, managing, tuning, or upgrading instances or servers. Elastic Clusters does this automatically.
Q5: What is unique about Amazon DocumentDB Elastic Clusters architecture?
Amazon DocumentDB Elastic Clusters has three unique architecture attributes that increase performance for customers. First, the Elastic Clusters backend architecture decouples the storage and compute layers. This makes it possible to handle over a million reads and writes per second and to scale to petabytes of storage with little to no impact to application availability.
Second, where possible, Elastic Clusters uses volume cloning rather than copying individual documents, which makes scaling fast. If the cluster doubles in size, the operation can be performed in a fixed time, regardless of the amount of data in the cluster.
Third, Elastic Clusters uses query routers in the backend to route queries to the shards. Query routers are automatically scaled and distributed across AZs and require no manual intervention.
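The general idea of a query router can be sketched in a few lines. The toy class below is not the Elastic Clusters implementation, which the source does not describe in detail. It assumes hash-based placement with MD5 as a stand-in hash, and the `QueryRouter` class and `device_id` and `status` fields are hypothetical. It shows the common pattern where a query that includes the shard key goes to a single shard, while any other query is fanned out to all shards.

```python
import hashlib


class QueryRouter:
    """Toy router: targets one shard when the query includes the shard key,
    otherwise fans the query out to every shard (scatter-gather)."""

    def __init__(self, shard_key, num_shards):
        self.shard_key = shard_key
        self.shards = [[] for _ in range(num_shards)]

    def _shard_index(self, value):
        digest = hashlib.md5(str(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def insert(self, document):
        self.shards[self._shard_index(document[self.shard_key])].append(document)

    def target_shards(self, query):
        """Return the indexes of the shards that must be consulted."""
        if self.shard_key in query:
            return [self._shard_index(query[self.shard_key])]
        return list(range(len(self.shards)))

    def find(self, query):
        """Return documents whose fields equal every key/value in the query."""
        results = []
        for index in self.target_shards(query):
            for doc in self.shards[index]:
                if all(doc.get(k) == v for k, v in query.items()):
                    results.append(doc)
        return results


# Hypothetical IoT readings, sharded on device_id across 4 shards.
router = QueryRouter(shard_key="device_id", num_shards=4)
for i in range(100):
    router.insert({"device_id": i, "status": "ok" if i % 2 == 0 else "alert"})

print(len(router.target_shards({"device_id": 42})))  # one shard consulted
print(len(router.target_shards({"status": "alert"})))  # all four shards consulted
print(router.find({"device_id": 42}))
```

Queries that carry the shard key touch less of the cluster, which is one reason the shard key is usually chosen to match the most frequent lookup pattern.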
Q6: Why is Amazon DocumentDB Elastic Clusters a game changer?
Before Amazon DocumentDB Elastic Clusters, horizontal scaling required advanced operational skills to partition applications across tens to hundreds of database nodes. It could take months to years of developer effort to build special application logic to route requests to the correct database node or to orchestrate data changes across all database nodes. Moreover, constantly monitoring and managing the available capacity of each database node places a high operational burden on customers. Routine tasks such as these become increasingly cumbersome as an application scales. Elastic Clusters solves two of the most critical challenges our customers face: scaling storage to petabytes and horizontally scaling reads and writes beyond a single instance.
Q7: In addition to Elastic Clusters, what other capabilities have been added to Amazon DocumentDB?
In 2022, we also added several capabilities to Amazon DocumentDB that our customers have been requesting. Customers asked for deeper insights into their Amazon DocumentDB clusters, the ability to quickly test new software against their production data, and a way to roll out new software faster. We added Performance Insights, a database tuning and monitoring feature that helps customers quickly assess the load on their database and determine when and where to act. We also added Data Manipulation Language (DML) auditing, which is critical for financial services and healthcare organizations that need to record and analyze audit trails. We introduced dynamic volume resizing so our customers only pay for the storage they use. Lastly, we launched volume cloning, which enables fast creation of a new cluster for quick access to production data for development, troubleshooting, and testing. You can learn how these features can help you manage your databases by reviewing our technical documentation.
Q8: Anything else you would like to add?
Modern applications must handle unprecedented data growth, which creates a need to scale out beyond the limits of a single database node. We are committed to innovating new scaling capabilities to handle the next-generation requirements of modern applications. We look forward to having you review our technical documentation and contact us with any questions.
Vin Yu is a Senior Technical Product Manager at Amazon Web Services where he’s putting innovative database tech into products loved by developers, admins and DevOps. Before that, he was building database client tools and hybrid database products in containers and Kubernetes.
AWS positioned highest in execution and furthest in vision
Gartner has recognized Amazon Web Services (AWS) as a Leader and positioned it highest in execution and furthest in vision in the 2022 Magic Quadrant for Cloud Database Management Systems among 20 vendors evaluated. This Magic Quadrant report provides cloud data and analytics buyers with vendor insights based on Gartner research criteria. AWS has been a Leader in the report for eight consecutive years.
Magic Quadrant for Cloud Database Management Systems
Published 13 December 2022 – ID G00763557 – 71 min read
Figure 1: Magic Quadrant for Cloud Database Management Systems (source: Gartner, December 2022)
Sponsored by AWS