The Advantages of AWS Purpose-Built Databases. Q&A with Vlad Vlasceanu
“A managed database service is a cloud database that is managed and maintained by AWS, allowing your DBA and development teams to focus exclusively on the application and schema management.”
Q1. What is a purpose-built database?
Vlad: Purpose-built databases are specialized for specific workloads and application requirements. The introduction of Internet-enabled applications has changed the demands we place on our databases. As a result, developers now reach for databases that can go faster and scale globally, more than ever before. Additionally, the rise of cloud computing has changed what’s technically possible because we can build more resilient, scalable applications in an economical way.
Relational databases are no longer the single best answer for all database workloads. While relational databases are still essential — in fact, they are still growing — a “relational only” approach no longer works in today’s world. With the rapid growth of data in volume, velocity, variety, complexity, and interconnections, database requirements have changed. Many new applications with social, mobile, IoT, and global access requirements are unable to scale using a central relational database alone.
Also, a key application architecture change underlying the move to purpose-built databases is the shift to microservices. The current best practice for application development is to split large monolithic applications into a number of microservices. Each service is focused on its core domain and can be changed, scaled, and retired separately from the rest of the application. Microservices drive end-to-end ownership of each service, and decoupling and isolation from other services, as long as the interfaces between microservices are maintained. This gives each service the flexibility to use the best database for its data access patterns, scale, reliability, and performance needs. Some microservices need to store terabytes to petabytes of real-time data, others need sub-millisecond latency, and still others need the ability to process millions of requests per second from users located anywhere in the world. It is hard to find a single database that can do all of these things, let alone do them all well at the same time. More importantly, trying to consolidate all these service patterns in a single database reduces developer velocity, because teams have to coordinate their service changes with each other. This reality has led to an explosion of database types.
Q2: Do purpose-built databases add complexity because my architecture now requires more databases?
Vlad: While this may have been true with self-managed deployments, this is no longer an issue with AWS fully managed services. AWS does all the undifferentiated heavy lifting for you so the operational impact on IT organizations from adding more, and different types, of databases is minimal. You also get the benefit of the integration across the AWS portfolio. With fully managed services, the complexity of adding to your infrastructure increases in very small increments.
There is also another way to look at this perceived complexity. We have all experienced the challenges of operating large-scale monolithic databases that are cumbersome to change, upgrade, and maintain because of conflicting service and business priorities across their different parts. Just keeping the lights on introduces significant complexity and risk to the business, not to mention slowing time to market. Purpose-built database architectures mitigate those complexities.
Q3. What is a managed database?
Vlad: A managed database service is a cloud database that is managed and maintained by AWS, allowing your DBA and development teams to focus exclusively on the application and schema management. You can free your IT operations teams from time-consuming database tasks like server provisioning, high-availability, patching (including security vulnerabilities), scaling, backups, and hardware maintenance. AWS fully managed database services provide continuous monitoring, self-healing storage, and automated scaling to help you focus on application development. Many of the core components of AWS managed database services are fully automated through our streamlined control plane and advanced diagnostic mechanisms.
Q4: Do purpose-built databases add to the overall cost of the solution?
Vlad: To the contrary, based on the experience of our AWS customers, we find purpose-built databases reduce the cost of operating solutions at scale. For example, a video streaming service could try to use a single relational database to support its movie catalog, content discovery features, user profiles, subscription components, user viewing session state updates, and so on. It would have to super-size that relational database to fit the peak load and different access patterns of all those workloads combined, not to mention coordinate the impact of any schema change to support new features across all teams. Instead, the video streaming service may choose a graph database for content discovery and recommendations, at the scale that service needs; a low-latency, high-throughput, scalable key-value data store for viewing progress and session state; and a globally distributed data store for user profiles and subscriptions, which serve a global user base but change comparatively infrequently.
Other significant cost savings come from AWS automation of management tasks like provisioning, scaling, availability, updates, etc. We have hundreds of mechanisms in place to ensure everything is secure, reliable, and available. We can pass the savings from all this operational automation to customers. In addition, we have staffed and developed specialized operational skills across our global data centers.
Hundreds of thousands of customers have been able to save by moving to AWS managed database services, which merge the flexibility and low cost of open source databases with the robust, enterprise feature sets of commercial databases.
Q5. What are the top reasons to move to a managed database?
Vlad: In addition to lower cost, with a managed database, all of the undifferentiated heavy lifting is out of the way, so developers and IT operations teams can focus on higher-value initiatives. These may involve paying more attention to schema design and other tasks for optimizing the performance of applications. Other projects related to the organization’s digital transformation and evolution can commence without the shackles of database maintenance. You can innovate faster and pay more attention to core initiatives that truly differentiate your organization. The move to managed databases is a very liberating shift.
Q6. How many purpose-built databases does AWS have?
Vlad: AWS has the broadest set of managed database services of any cloud provider. Altogether, AWS offers more than 15 types of database engines, each built to uniquely address specific customer needs.
Q7. Which ones are relational and which ones are non-relational (NoSQL)?
Vlad: For relational databases, AWS provides fully managed services for both operational and analytical workloads. Operational workloads are handled using Amazon RDS, which has a number of commercial and open-source compatible database engines. The commercial database engines include Oracle and SQL Server. The open-source compatible options include Aurora, MySQL, PostgreSQL and MariaDB. Amazon Aurora itself supports two open-source compatible database engines, MySQL and PostgreSQL. In May 2022, Gartner Solution Scorecard awarded Amazon RDS, including Amazon Aurora, an industry high score of 95, including a 100 rating for required functionality and 94 rating for preferred functionality. For intensive analytical workloads, Amazon Redshift, a fully managed relational data warehouse, is your best option.
For non-relational databases, AWS offers multiple options that fit different data models or use cases: DynamoDB (key-value and document data models), DocumentDB (MongoDB compatible, document data model), Neptune (graph data model), Timestream (time series data model), QLDB (immutable ledger), Keyspaces (Cassandra compatible, wide-column data model), ElastiCache (in-memory Memcached and Redis data stores, key-value data model) and MemoryDB (in-memory Redis compatible persistent database, key-value data model).
Q8. Talking about non-relational (NoSQL) databases, what are the differences between the various options and their use cases?
Vlad: Key-value databases store data using simple “key equals value” constructs, or variations of them, and use simple data access methods (Get, Put, Delete). This simplicity in storage model and access patterns ensures data access operations always have a known, predictable processing time, regardless of the amount of data stored. Key-value databases are easy to scale horizontally into distributed systems that can sustain the highest levels of concurrent throughput with consistent performance. They are best suited for real-time bidding, shopping carts, storing and managing user sessions, ad serving, recommendations, gaming (including game state, player data, session history, and leaderboards), IoT, product catalogs, mobile apps, and low-latency lookups.
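To make the access pattern concrete, here is an illustrative sketch only: a minimal key-value store modeled with a Python dict, showing the Get/Put/Delete operations described above. Real services such as DynamoDB distribute keys across many partitions, but the access methods stay this simple, which is what keeps latency predictable.

```python
class KeyValueStore:
    """Teaching model of a key-value store: three operations, hash-based lookup."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        self._items[key] = value

    def get(self, key):
        # Hash lookup: cost does not grow with the number of items stored.
        return self._items.get(key)

    def delete(self, key):
        self._items.pop(key, None)


# Example: a shopping-cart session keyed by a (hypothetical) session ID.
store = KeyValueStore()
store.put("session#1234", {"cart": ["movie-night-bundle"], "ttl": 3600})
print(store.get("session#1234"))
store.delete("session#1234")
print(store.get("session#1234"))  # None: the key is gone
```

The point of the sketch is the shape of the API, not the implementation: because every operation resolves a single key, a distributed version can route each request to one partition and keep response times flat as data grows.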
A document database, such as Amazon DocumentDB, is a special type of key-value database, which stores values as a nested document (typically JSON documents). Documents assemble heterogeneous bits of data that are frequently accessed together rather than spreading this data across multiple normalized tables. Document databases are designed to store, organize, access, index and aggregate document data structures very efficiently at scale. While modern relational databases also support JSON data types, document databases typically provide better performance and scalability for document access patterns. You can search the database by any attribute in a document, and query on nested and array fields, aggregations or geospatial data. You can index on any key to accelerate performance. DocumentDB is API compatible with MongoDB. Typical use cases include content management, personalization and user preferences, user profile management, user authentication, and mobile applications.
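The “query on nested and array fields” capability can be sketched with plain Python. This is an illustrative model only, not the DocumentDB or MongoDB API; the dotted-path helper and the sample profile documents are hypothetical.

```python
def get_path(doc, path):
    """Resolve a dotted path like 'prefs.lang' inside a nested document."""
    for part in path.split("."):
        if not isinstance(doc, dict) or part not in doc:
            return None
        doc = doc[part]
    return doc


def find(collection, path, value):
    """Return all documents whose nested field at `path` equals `value`."""
    return [d for d in collection if get_path(d, path) == value]


# Heterogeneous documents kept together, rather than split across normalized tables.
profiles = [
    {"user": "ana", "prefs": {"lang": "en"}, "devices": ["tv", "phone"]},
    {"user": "bo", "prefs": {"lang": "fr"}, "devices": ["tablet"]},
]

print(find(profiles, "prefs.lang", "fr"))  # matches bo's profile
```

A real document database does the same kind of matching natively, backed by indexes on any key, which is why it outperforms scanning documents one by one at scale.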
A graph database, like Amazon Neptune, makes data relationships as important as the data itself. Nodes represent entities, and edges store the relationships between them. You can traverse the relationships between objects to find hidden connections in the data; often, related data can be retrieved in just one operation. Both nodes and edges can carry properties that can be queried. Fraud detection is much easier when you have a graph that links people, transactions, and institutions, and a graph of your network can facilitate early identification and remediation of security breaches. Other common use cases include social networks, real-time recommendation engines, knowledge graphs, drug discovery, and top-seller lists.
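The fraud-detection idea of “hidden connections” can be shown with a tiny graph. This is an illustrative sketch only, using adjacency lists and a breadth-first traversal; the accounts and device are made up, and a real graph database would express the same query in a graph language such as Gremlin or openCypher.

```python
from collections import deque

# Two seemingly unrelated accounts share a device: a classic fraud signal.
edges = {
    "account:alice": ["device:laptop-1"],
    "account:bob": ["device:laptop-1"],
    "device:laptop-1": ["account:alice", "account:bob"],
}


def connected(graph, start, target, max_hops=3):
    """Breadth-first search: is `target` reachable from `start` within max_hops edges?"""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, hops = queue.popleft()
        if node == target:
            return True
        if hops < max_hops:
            for nxt in graph.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, hops + 1))
    return False


print(connected(edges, "account:alice", "account:bob"))  # True, via the shared device
```

In a relational model, the same question would need a self-join per hop; a graph store makes hop-by-hop traversal the primitive operation.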
Time series databases, like Amazon Timestream, easily store and analyze trillions of events per day, up to 1,000 times faster and at as little as 1/10th the cost of relational databases. A time series database is also a special type of key-value store, where the key is a timestamp and the value can be an event or the status of a sensor or monitor. Time series databases are often used for IoT applications, industrial telemetry, and event tracking. They also provide a wealth of contextual data to help financial analysts, making it easy to cross-reference data and providing a richer, clearer picture.
Wide-column databases, like Amazon Keyspaces (for Apache Cassandra), group data into separately stored columns instead of rows. Wide-column databases are highly flexible and can efficiently store large amounts of data in a single column, which reduces disk usage and accelerates query response time. They are also easy to scale horizontally. Wide-column databases are well suited for log data, IoT sensors, attribute-based data such as user preferences or equipment features, and real-time analytics. They are often used in large-scale industrial applications for equipment maintenance, fleet management, and route optimization.
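The row-wise versus column-wise distinction is easy to see in miniature. This is an illustrative sketch only, with hypothetical pump telemetry: the same records are pivoted into one array per attribute, so a query over a single attribute touches only that column.

```python
# Row-oriented layout: each record stores all attributes together.
rows = [
    {"device": "pump-1", "temp": 71, "pressure": 30},
    {"device": "pump-2", "temp": 68, "pressure": 29},
    {"device": "pump-3", "temp": 75, "pressure": 31},
]

# Column-oriented layout: one array per attribute, stored separately.
columns = {key: [r[key] for r in rows] for key in rows[0]}

# An aggregate over one attribute reads a single contiguous column,
# instead of scanning every full row.
print(max(columns["temp"]))  # 75
```

Storing each column contiguously is also what makes per-column compression effective, since values of one attribute tend to resemble each other more than values across a whole row.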
Ledger databases, like Amazon Quantum Ledger Database (QLDB), maintain an immutable, cryptographically verifiable log of data changes. These databases are well suited for use cases that require a high degree of traceability like banking transactions, tracking items to minimize loss and theft and to make sure no counterfeits have entered a supply chain, and systems of record.
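The core idea behind “immutable, cryptographically verifiable” can be sketched as a hash chain. This is a teaching model only, not QLDB’s actual storage format: each appended entry records the hash of the previous entry, so tampering with history breaks every later link.

```python
import hashlib
import json


def entry_hash(entry):
    """Deterministic SHA-256 hash of an entry's canonical JSON form."""
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()


def append(ledger, data):
    prev = entry_hash(ledger[-1]) if ledger else "genesis"
    ledger.append({"data": data, "prev_hash": prev})


def verify(ledger):
    """Recompute the chain; any tampered entry invalidates every later link."""
    for i in range(1, len(ledger)):
        if ledger[i]["prev_hash"] != entry_hash(ledger[i - 1]):
            return False
    return True


ledger = []
append(ledger, {"account": "A-1", "debit": 250})
append(ledger, {"account": "A-1", "credit": 100})
print(verify(ledger))           # True: the chain is intact
ledger[0]["data"]["debit"] = 1  # tamper with history...
print(verify(ledger))           # False: the chain no longer verifies
```

This is why ledger databases suit systems of record: anyone holding the chain can independently prove that no past transaction was altered.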
In-memory databases maintain their data sets in computer memory. They support latency-sensitive workloads where data access needs to occur in the sub-millisecond range, and they use memory-efficient key-value data models. Amazon ElastiCache is used for caching in front of slower back-end persistent data stores, but it is also frequently used to store fast-changing data sets, like aggregates or rankings, that can be easily reconstructed. Amazon MemoryDB is a persistent, Redis-compatible, in-memory data store. Of the AWS purpose-built database services, it provides the lowest data access latency with full data durability. Databases that support microservices-based application architectures need to be ultra-fast because the response to a query may touch several microservices, so the latency of each microservice is critically important to the overall application latency. Amazon MemoryDB is ideally suited as a backing store for microservices.
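The “caching in front of a slower persistent store” pattern is commonly called cache-aside, and it can be sketched in a few lines. This is an illustrative model only: a dict stands in for both the in-memory cache and the slow persistent backend, and the movie record is made up.

```python
# Stand-ins: in practice the cache would be ElastiCache and the database
# a persistent store such as RDS or DynamoDB.
slow_database = {"movie:42": {"title": "Example Film", "views": 1_000_000}}
cache = {}
db_reads = 0  # counts trips to the slow backend


def get_movie(movie_id):
    global db_reads
    if movie_id in cache:        # cache hit: sub-millisecond in practice
        return cache[movie_id]
    db_reads += 1                # cache miss: fall through to the database
    record = slow_database[movie_id]
    cache[movie_id] = record     # populate the cache for the next reader
    return record


get_movie("movie:42")  # miss: reads the database
get_movie("movie:42")  # hit: served from memory
print(db_reads)        # 1
```

The trade-off the section describes falls out of this pattern: ElastiCache accepts that a cache entry may be lost and rebuilt from the backend, while MemoryDB makes the in-memory copy itself durable, removing the separate backend entirely.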
Q9. How to choose the right purpose-built engine? What practical tips can you offer?
Vlad: In choosing a purpose-built database, the first broad characteristic to consider is whether your workload is primarily transactional or analytical. A heavy analytical workload refers to use cases that aggregate and summarize large volumes of data, often consolidated from various sources. These workloads usually process far fewer concurrent queries, but each query operates on many more rows. They are also called online analytical processing (OLAP) workloads. Analytical access patterns are supported by several AWS analytics services: Redshift, a relational data warehouse; Glue for data integration; QuickSight for business intelligence and natural language queries; Athena for interactive queries; EMR (Elastic MapReduce) for big data workloads; SageMaker for machine learning model development; and OpenSearch for interactive log analytics, real-time application monitoring, website search, and more.
Transactional workloads are characterized by a high number of concurrent operations and where each operation is reading or writing a small number of rows. This is also called OLTP for online transactional processing. If your workload needs a highly structured data model, with referential integrity and support for complex transactions, then a relational database may be the best choice for such an OLTP system. This is supported by Amazon RDS, which includes Amazon Aurora. In a relational database, you normalize your data into separate tables and assemble related entities together at query time. It is a good choice when you have multiple related entities with varying access and update patterns. The strict schema validation and normalized data model helps to implement referential integrity of the data across your application.
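The normalization and referential-integrity points above can be shown with Python’s built-in sqlite3 as a stand-in relational engine. This is an illustrative sketch only; the schema and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # sqlite enforces FKs only when enabled
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""CREATE TABLE subscriptions (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    plan TEXT NOT NULL)""")

conn.execute("INSERT INTO users VALUES (1, 'ana')")
conn.execute("INSERT INTO subscriptions VALUES (10, 1, 'premium')")

# Normalized data: related entities are assembled at query time with a join.
row = conn.execute("""SELECT u.name, s.plan
                      FROM users u JOIN subscriptions s ON s.user_id = u.id""").fetchone()
print(row)  # ('ana', 'premium')

# Referential integrity: a subscription pointing at a missing user is rejected.
try:
    conn.execute("INSERT INTO subscriptions VALUES (11, 999, 'basic')")
except sqlite3.IntegrityError:
    print("rejected: no such user")
```

The foreign-key rejection is the relational guarantee the text refers to: the schema itself, not application code, keeps the data consistent across tables.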
If your workload processes semi-structured data, does not need referential integrity, has simpler transactional needs, or has flexibility in its data consistency, a non-relational (NoSQL) database may be a better fit. Additionally, if your data sets have certain defining characteristics, such as highly connected data or time series data, or require immutability and verifiability, that is also a strong indicator for using the relevant purpose-built database engine.
Performance requirements for a global deployment may call for local instances of a database near end users, to eliminate the impact of network lag. If your service handles a critical workload for users awaiting a rapid response, speed is of the utmost importance, and you need a database with cross-region synchronization so that responses can be delivered from a local database instance. Also, adding an in-memory layer at select points in your infrastructure can improve throughput and latency.
Q10. What are your tips for choosing the database service that is the best fit for the job?
Vlad: If the workload characteristics don’t lead to a clear choice, then you can access additional information in this discussion on selecting databases, or this microsite dedicated to the same topic. You can also head over to the product page of each database from this consolidated database page.
Q11. What are your practical suggestions on how to move from a legacy database to a cloud-native database?
Vlad: Your best option depends on your current situation and your goals. We are seeing a common pattern of migrations from general purpose databases to the best fit, purpose built, fully managed databases. This is the most aspirational migration scenario. Modernization of your data layer involves some up front refactoring, but it gives you the performance that modern applications need, and it reduces your ongoing administrative costs. With the tooling, training, and professional services that AWS provides, this is within reach and compelling, considering the opportunity cost of not modernizing your data layer.
Increasingly, we are also seeing migrations from customers who want to break free from old-guard proprietary databases associated with lock-in, punitive licensing policies, and frequent, costly, and time-consuming audits. These customers are looking for open source options that offer deep functionality, and can help their business continue to grow at a much lower cost. Amazon Aurora (MySQL and PostgreSQL compatible) provides a good destination for these customers with its no compromise enterprise capabilities.
Other customers are eager to offload mundane database administration tasks and benefit from the cost advantages of the operational automation built into our services. For both of these customer types, Amazon RDS offers a rich set of database engines and operational options.
Whatever may be your situation, AWS offers tools and experts to help assess, plan, and build the right migration path for your company. The recommended approach is to use the native backup and restore capabilities of the source and target databases, in conjunction with the AWS Database Migration Service (AWS DMS) and AWS Schema Conversion Tool (AWS SCT) to support any conversions and minimize downtime. This method is useful when you have to migrate the database code objects, including views, stored procedures, and functions, as part of the database migration, or have to convert between different database engines or data models. This solution is applicable to databases of any size. It keeps the database available for the application during migration and allows you to perform validation of the migrated data, while the data is getting replicated from source to target, thereby saving time on data validation.
We offer other programs and services, ranging from AWS Professional Services, which taps the deep expertise of tenured professionals for migration assistance, to the Database Migration Accelerator (DMA), where, for a fixed fee, a team of AWS professionals handles the conversion of both the database and the application for you. AWS also offers Optimization and Licensing Assessment (OLA) engagements to help you evaluate options to migrate to the cloud. You can start your cloud journey based on your specific needs, and sign up to have the AWS OLA team help you.
Q12. Anything else you wish to add?
Vlad: Whether you are looking for compute power, database storage, content delivery, or other functionality, AWS has the services to help you build sophisticated applications with increased flexibility, scalability and reliability. Our continual investment in increasing levels of integration between the services accelerates your time to market, reduces cost, and empowers you to innovate through rapid iterations. You can unlock the world of possibilities and tap into strategic options that were previously only available to large organizations with massive IT budgets.
Vlad Vlasceanu is a Principal DB Solutions Specialist at AWS, based in Santa Monica, California. Vlad helps customers adopt cloud-native, purpose-built database solutions, and deploy large-scale, high-performance database architectures on AWS. His focus is on designing and implementing sustainable, cost-effective, and scalable database workloads that take advantage of the latest best practices and capabilities that AWS offers. Prior to joining AWS, Vlad’s career included over 15 years of designing and developing both consumer-focused web applications and data-driven applications for the energy industry. Vlad holds a Master of Science in Information Systems from Baylor University.
Sponsored by AWS.