On High-Performance Applications at Scale and Amazon DynamoDB: Q&A with Joseph Idziorek
“When the COVID-19 pandemic began, there was an enormous demand for our voice and video services. In early 2020, we saw unprecedented usage grow from 10M to 300M Daily Meeting Participants from new and existing customers that needed to connect virtually. On the backend, we were able to manage this surge with Amazon DynamoDB for Zoom Meetings. Using DynamoDB global tables in conjunction with on-demand mode enabled us to scale nearly infinitely with no performance issues, even with our sudden spike in usage.” [source]
Q1. In your opinion, what are the main challenges in implementing high-performance applications at scale?
One of the main challenges customers face when building high-performance applications is getting consistent, fast performance from their database at any scale. Many database architectures are acceptable at small scale. But as an application becomes successful and its workload grows, database performance can degrade, and the workload can exceed what the architecture can handle. We built DynamoDB to provide customers like Zoom with consistent, single-digit millisecond performance at any scale.
Q2. What is Amazon DynamoDB?
Amazon DynamoDB is a fast, flexible NoSQL database service for single-digit millisecond performance at any scale. DynamoDB was built by working backwards from customers such as SmugMug and Amazon.com. The goal was a fully managed database service that scales seamlessly while keeping single-digit millisecond performance consistent at any scale. As one example, during Amazon Prime Day 2022, DynamoDB powered multiple high-traffic Amazon properties and systems, including Alexa, the Amazon.com sites, and all Amazon fulfillment centers. Over the course of Prime Day, these systems made trillions of calls to the DynamoDB API. DynamoDB maintained high availability while delivering single-digit millisecond responses and peaking at 105.2 million requests per second.
Q3. Amazon DynamoDB is an AWS managed database service. What does that mean in practice?
As an AWS database service, DynamoDB is fully managed. Customers do not have to concern themselves with operational tasks such as:
- provisioning hardware;
- installing or patching software;
- configuring and monitoring for high availability, resilience, or durability;
- implementing backups or security controls.

DynamoDB provides all of these capabilities and more. Developers can create a table and start developing against its API in minutes, without having to deal with undifferentiated heavy lifting.
Q4. What are the key benefits of a serverless database?
Customers love two key benefits of DynamoDB as a serverless database. First, there are no servers to manage. Second, DynamoDB scales automatically and seamlessly to handle virtually any workload. There are also no software versions to maintain and no maintenance windows, because DynamoDB performs maintenance with zero downtime. Lastly, DynamoDB offers per-request, pay-as-you-go pricing (on-demand mode), so you pay only for the requests you actually make and can optimize for cost. In on-demand mode, a table incurs no request charges while it is idle. It can also scale to millions of requests per second in response to customer demand.
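Per-request billing is metered in request units rather than raw API calls. The Python sketch below illustrates this; it is not an AWS SDK call. It assumes the on-demand metering rules as AWS documents them:
- A write consumes one write request unit per 1 KB of item size, rounded up.
- A strongly consistent read consumes one read request unit per 4 KB, rounded up.
- An eventually consistent read costs half as much as a strongly consistent read.

Transactional requests, which cost more, are left out. The function names are hypothetical. To estimate request cost, multiply the units by the per-unit price for your Region.

```python
import math

KB = 1024  # assumption of this sketch: 1 KB = 1,024 bytes


def write_request_units(item_bytes: int) -> int:
    """On-demand writes are metered in 1 KB increments, rounded up."""
    return max(1, math.ceil(item_bytes / KB))


def read_request_units(item_bytes: int, strongly_consistent: bool = True) -> float:
    """On-demand reads are metered in 4 KB increments, rounded up;
    an eventually consistent read is charged half of that."""
    units = max(1, math.ceil(item_bytes / (4 * KB)))
    return units if strongly_consistent else units / 2
```

For example, writing a 3,500-byte item consumes 4 write request units. An eventually consistent read of the same item consumes 0.5 read request units.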
Q5. What are the main properties of a key-value NoSQL database such as Amazon DynamoDB?
One of the key differences between DynamoDB and relational databases is that DynamoDB does not expose JOINs. As data sizes grow, JOIN performance degrades, so an application that relies on JOINs gets slower as it scales. Instead, DynamoDB customers design their data model so that each query can be answered by a single request against a primary key (a partition key, optionally combined with a sort key). As an application grows in its number of users, DynamoDB can therefore scale seamlessly and provide consistent performance at any scale.
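To make the "single request against a primary key" idea concrete, the Python sketch below models a denormalized, single-table layout in memory. It does not call the DynamoDB API. A customer's profile and orders share one partition key, and the sort key distinguishes the item types. The data a relational schema would assemble with a JOIN across customers and orders therefore comes back from one key-based query. The key formats (`CUSTOMER#<id>`, `ORDER#<date>`) and the helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical single-table layout: each item is addressed by a
# (partition key, sort key) pair, mirroring a composite primary key.
table: dict[str, dict[str, dict]] = defaultdict(dict)


def put_item(pk: str, sk: str, attrs: dict) -> None:
    """Store an item under its composite primary key."""
    table[pk][sk] = attrs


def query(pk: str, sk_prefix: str = "") -> list[dict]:
    """Return the items in one partition, ordered by sort key and
    optionally filtered by a sort-key prefix."""
    partition = table.get(pk, {})
    return [
        {"pk": pk, "sk": sk, **attrs}
        for sk, attrs in sorted(partition.items())
        if sk.startswith(sk_prefix)
    ]


# Profile and orders live in the same partition, so no JOIN is needed.
put_item("CUSTOMER#42", "PROFILE", {"name": "Ana"})
put_item("CUSTOMER#42", "ORDER#2023-01-05", {"total": 30})
put_item("CUSTOMER#42", "ORDER#2023-02-11", {"total": 12})
put_item("CUSTOMER#7", "PROFILE", {"name": "Raj"})

orders = query("CUSTOMER#42", "ORDER#")
```

The trade-off is that access patterns must be known up front. This layout makes "a customer and their orders" a single lookup. An unplanned query, such as all orders above a given total, would need a scan or an additional index.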
Q6. DynamoDB offers built-in security, continuous backups, automated multi-Region replication, in-memory caching, and data export tools. Can you briefly explain the benefits of these features?
Beyond providing a fully managed, serverless database experience, DynamoDB helps customers meet a broader set of requirements with capabilities such as:
- encryption at rest and in transit;
- continuous backups and point-in-time restore;
- active-active, multi-Region replication for high availability and better performance in global applications;
- streams for event-driven applications;
- a read-through, write-through in-memory cache (Amazon DynamoDB Accelerator, or DAX).

Each of these capabilities helps customers meet their enterprise requirements and unlock more value from their data in DynamoDB.
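To illustrate the "streams for event-driven applications" capability, here is a Python sketch of a consumer that turns change records into plain key/value updates. It assumes the record shape that DynamoDB Streams delivers to consumers such as AWS Lambda:
- an `eventName` of `INSERT`, `MODIFY`, or `REMOVE`;
- a `dynamodb` object with `Keys`;
- `NewImage`, present when the stream view type includes new images.

All values are in DynamoDB's typed JSON, for example `{"S": "..."}` or `{"N": "1"}`. The function names and the sample event are hypothetical, and only the `S`, `N`, and `BOOL` types are decoded.

```python
from decimal import Decimal
from typing import Any, Optional

Change = tuple[str, dict[str, Any], Optional[dict[str, Any]]]


def from_attribute_value(av: dict[str, Any]) -> Any:
    """Decode one typed attribute such as {"S": "x"} or {"N": "3"}.
    Only S, N and BOOL are handled; other types are returned unchanged."""
    if "S" in av:
        return av["S"]
    if "N" in av:
        return Decimal(av["N"])  # numbers arrive as strings; Decimal keeps precision
    if "BOOL" in av:
        return av["BOOL"]
    return av


def decode(image: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Decode every attribute in a Keys or NewImage map."""
    return {name: from_attribute_value(av) for name, av in image.items()}


def handle_stream_event(event: dict[str, Any]) -> list[Change]:
    """Return (event_name, key, new_image) for every record in the event.
    new_image is None whenever the record has no NewImage (e.g. REMOVE)."""
    changes: list[Change] = []
    for record in event.get("Records", []):
        ddb = record["dynamodb"]
        new_image = decode(ddb["NewImage"]) if "NewImage" in ddb else None
        changes.append((record["eventName"], decode(ddb["Keys"]), new_image))
    return changes


# Hypothetical sample event with one insert and one delete.
sample_event = {
    "Records": [
        {"eventName": "INSERT",
         "dynamodb": {"Keys": {"pk": {"S": "USER#1"}},
                      "NewImage": {"pk": {"S": "USER#1"}, "visits": {"N": "1"}}}},
        {"eventName": "REMOVE",
         "dynamodb": {"Keys": {"pk": {"S": "USER#2"}}}},
    ]
}

changes = handle_stream_event(sample_event)
```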
Q7. Disney+ delivers its extensive library of digital content directly to the homes of over 60.5 million subscribers, and Amazon DynamoDB is one of the technologies that supports this global footprint. Can you tell us a bit about how Amazon DynamoDB is used for this use case?
Disney+ uses DynamoDB for several of its workloads, including watchlists, bookmarks, and recommendations. One of the key DynamoDB features that Disney+ uses is global tables. Global tables provide active-active, cross-Region replication, so applications can read from and write to multiple Regions for high availability and improved performance. For Disney+’s content caching workload, the team chose DynamoDB because the scale of the workload at launch was uncertain, as was the number of subscriptions that would be active. With DynamoDB, Disney was able to set its table to on-demand mode and scale to the levels it needed on day one. [source]
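Because global tables accept writes in every Region, concurrent updates to the same item must be reconciled. AWS documents that global tables resolve such conflicts with a last-writer-wins policy. The Python sketch below shows that policy in isolation: each replica keeps whichever version carries the later write timestamp, so two Regions that exchange updates converge on the same item. It is not how DynamoDB implements replication internally. The `VersionedItem` type and the Region-name tie-breaker for equal timestamps are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VersionedItem:
    value: dict
    timestamp: float  # time the write was accepted (assumed comparable across Regions)
    region: str       # hypothetical tie-breaker when timestamps are equal


def reconcile(local: Optional[VersionedItem], incoming: VersionedItem) -> VersionedItem:
    """Last-writer-wins: keep the version with the later (timestamp, region)."""
    if local is None:
        return incoming
    if (incoming.timestamp, incoming.region) > (local.timestamp, local.region):
        return incoming
    return local


# A viewer's bookmark is updated in two Regions at nearly the same time.
us_write = VersionedItem({"bookmark": "episode-3"}, 100.0, "us-east-1")
eu_write = VersionedItem({"bookmark": "episode-4"}, 101.5, "eu-west-1")

# Each Region applies the other's replicated write to its own copy.
us_replica = reconcile(us_write, eu_write)
eu_replica = reconcile(eu_write, us_write)
```

Last-writer-wins keeps replicas convergent without cross-Region coordination, at the cost of silently discarding the earlier of two concurrent writes. A common mitigation is to route writes for a given item to a single Region.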
Q8. Dropbox Saves Millions by Building a Scalable Metadata Store on Amazon DynamoDB and Amazon S3. Can you tell us a bit about how Amazon DynamoDB is used for this use case?
In the summer of 2018, Dropbox experienced a capacity crunch in its on-premises metadata store due to fast data growth in some of the partitions. Dropbox’s database team had three choices: double the on-premises storage capacity (which would cost millions of dollars), delete swaths of metadata, or find a new, highly scalable yet cost-effective solution. The third option was the best, but achieving it would be a challenge. Dropbox had less than 2 years until its on-premises system would reach maximum capacity, and the implementation team for the project was made up of just two employees.
Those circumstances led Dropbox to pursue a managed solution from Amazon Web Services (AWS). Using Amazon DynamoDB, a fully managed, flexible NoSQL database that delivers single-digit millisecond performance at any scale, and Amazon Simple Storage Service (Amazon S3), a cloud object storage service, Dropbox rapidly developed a new managed storage system called Alki. This made room for virtually unlimited user metadata, saved Dropbox from having to spend millions of dollars to increase on-premises storage and reduced the cost per gigabyte by a factor of 5.5.
The Alki team, aided by AWS Solutions Architects, built a metadata storage system based on a log-structured merge-tree (LSM tree). It has two layers of data storage: an upper layer for hot metadata and a lower layer for cold metadata. Amazon DynamoDB acts as the hot storage layer, ingesting audit-logging data into six DynamoDB tables at 4,000–6,000 writes per second per table. Each of these tables accumulates 50–80 GB per day. At the end of each day, the team offloads the metadata from these tables into Amazon S3 for permanent storage, after which the tables in Amazon DynamoDB are deleted.
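The hot/cold split described above resembles an LSM tree:
- recent writes land in a mutable upper layer;
- that layer is periodically flushed into sorted, immutable runs in a cheaper lower layer;
- reads check the newest data first.

The Python sketch below models that flow in memory. A dictionary stands in for the day's DynamoDB tables, and sorted lists stand in for files in Amazon S3. The `TwoTierStore` class and its methods are hypothetical and are not Dropbox's Alki code; they only illustrate the layering.

```python
import bisect
from typing import Optional


class TwoTierStore:
    """Toy hot/cold store in the spirit of an LSM tree (hypothetical)."""

    def __init__(self) -> None:
        self.hot: dict[str, str] = {}  # stands in for the day's hot tables
        # Each cold run is an immutable, sorted list of (key, value); newest last.
        self.cold_runs: list[list[tuple[str, str]]] = []

    def put(self, key: str, value: str) -> None:
        self.hot[key] = value

    def offload(self) -> None:
        """End-of-day step: write the hot tier out as one sorted run,
        then drop it (like deleting the day's tables)."""
        if self.hot:
            self.cold_runs.append(sorted(self.hot.items()))
        self.hot = {}

    def get(self, key: str) -> Optional[str]:
        """Check the hot tier, then cold runs from newest to oldest,
        so a newer value shadows an older one."""
        if key in self.hot:
            return self.hot[key]
        for run in reversed(self.cold_runs):
            i = bisect.bisect_left(run, (key,))  # binary search within a sorted run
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None


store = TwoTierStore()
store.put("file:a", "v1")
store.put("file:b", "v1")
store.offload()                # end of day 1: hot data becomes a cold run
store.put("file:a", "v2")      # day 2 write, still in the hot tier
hot_read = store.get("file:a")
store.offload()                # end of day 2
```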
By the beginning of 2019, less than 6 months after the Alki team chose Amazon DynamoDB and Amazon S3, Alki was in beta, ingesting all data and serving a subset of reads. By October 2019, about 300 TB of audit log data had been migrated to Alki, which was by then in full production. That is a quarter of all data stored in Edgestore, Dropbox’s on-premises metadata store.
The scalability of Amazon DynamoDB and Amazon S3 helped the Dropbox team complete that data migration in less than 2 weeks. “Normally you might design a system for 10 times the scale you would expect in steady state,” explains Lee. “But we could scale 100–1,000 times on AWS without designing the system ahead of time.” The Alki team expected steady state to be 4,000 queries per second, yet it was able to provision Amazon DynamoDB for 600,000 queries per second during the migration.
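A quick arithmetic check on the figures in this paragraph shows how the migration capacity compares with the expected load and with Lee's quoted ranges. Only the numbers from the text are used; the variable names are illustrative.

```python
steady_state_qps = 4_000    # expected steady-state load (from the text)
migration_qps = 600_000     # capacity provisioned for the migration (from the text)
usual_design_factor = 10    # "design a system for 10 times the scale" (Lee)

headroom = migration_qps / steady_state_qps
print(f"Migration capacity: {headroom:.0f}x steady state")
print(f"That is {headroom / usual_design_factor:.0f}x the usual 10x design margin")

# Lee's quoted range for scaling on AWS was 100-1,000 times.
within_quoted_range = 100 <= headroom <= 1_000
```

The provisioned capacity was 150 times the expected steady state. That sits inside the 100–1,000 times range Lee describes and is 15 times the conventional 10x design margin.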
AWS Solutions Architects provided premium support to the Alki team throughout the migration, according to Lee. “We have nothing but positive things to say about our interaction with the AWS team working on Alki. They’ve always been very proactive with helping us find issues, pointing out how we might make things faster or identifying areas where we might want to be more careful operationally,” Lee says. The Alki team and the AWS Solutions Architects were able to stay in constant communication through real-time channels. And the Alki team will continue to reap the benefits of that collaboration through the managed services of AWS. “Running a system durably takes expertise, and we didn’t have that expertise,” says Stas Ilinskiy, software engineer on the Alki team. “But by using Amazon DynamoDB, we also gain the people with the expertise to run it.”
Alki saved Dropbox millions of dollars in expansion costs and significantly reduced per-user gigabyte costs by using Amazon DynamoDB and Amazon S3.
Q9. When is it preferable to use DynamoDB over RDS or Aurora?
The answer depends on the customer’s workload and objectives. Customers of every size and across virtually every industry, including start-ups, enterprises, and public sector organizations, run every imaginable use case on DynamoDB and Aurora. With any database workload, it is important to work backwards from the customer’s requirements to choose the best database for the job. For customers developing new applications with microservice architectures, DynamoDB offers a proven NoSQL database with single-digit millisecond performance at any scale. Relational databases are known to struggle to provide this. If a customer’s goal is to adopt MySQL or PostgreSQL as their open-source database engine, Aurora provides a fully managed, serverless option. With 11 database services supporting more than 15 database engines, AWS has a database for every application.
Q10. Anything else you would like to add?
Customers can try out DynamoDB today with the free tier and experience the simplicity of DynamoDB for building key-value and document model databases. The free tier for DynamoDB includes 25 GB of storage and up to 200 million read/write requests per month.
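The "200 million requests" figure is easier to read in terms of capacity units. This sketch assumes the free tier provides 25 provisioned write capacity units (WCUs) and 25 read capacity units (RCUs). Those were the published free-tier terms when this interview was given and may since have changed. The capacity-unit definitions used are:
- one WCU sustains one write per second for an item up to 1 KB;
- one RCU sustains one strongly consistent read, or two eventually consistent reads, per second for an item up to 4 KB.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month (2,592,000 s)

free_wcu = 25  # assumed free-tier provisioned write capacity units
free_rcu = 25  # assumed free-tier provisioned read capacity units

# One WCU = one write per second (item up to 1 KB).
writes = free_wcu * SECONDS_PER_MONTH
# One RCU = two eventually consistent reads per second (item up to 4 KB).
reads = free_rcu * 2 * SECONDS_PER_MONTH

total = writes + reads
print(f"{writes:,} writes + {reads:,} reads = {total:,} requests")
```

The result, about 194 million requests in a 30-day month, rounds to the 200 million quoted above. It assumes small items and eventually consistent reads; larger items or strongly consistent reads would lower it.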
…………………………….
Joseph Idziorek is currently a Director of Product Management at Amazon Web Services. Joseph has over a decade of experience working in both relational and non-relational database services and holds a PhD in Computer Engineering from Iowa State University. At AWS, Joseph leads product management for DynamoDB and Keyspaces and previously led Amazon DocumentDB as well as many other purpose-built database initiatives.
Sponsored by Amazon Web Services.