10 Things You Should Know about OpenStack Trove, the Open Source Database as a Service
By Ken Rugg, CEO, Tesora
Ken Rugg is a founder, CEO and board member of Tesora. Ken has spent most of his career around databases in technical, strategic and business generating roles. Tesora is the leading contributor to the OpenStack Trove project.
When it comes to DBaaS, today’s market leader in public clouds, Amazon, is demonstrating just how significant this business is in terms of customer value and interest. Toward the end of last year, the Amazon Web Services (AWS) database business was on a $1 billion annual revenue run rate. Not surprisingly, in the same timeframe AWS’s most popular hiring category was databases and data analysis, with 84 open positions.
With that kind of success, it isn’t surprising that operators of OpenStack clouds, public and private, would want to be able to offer this kind of capability to their users as well. OpenStack Trove lets them do just that. Trove is the database as a service component of OpenStack that lets administrators and DevOps professionals manage multiple instances of different database management systems (DBMS), both relational and NoSQL, using a common infrastructure. It makes database capacity available that can be consumed on-demand, and handles complete database lifecycle management.
If you are interested in OpenStack and Database as a Service, then here are some things you should know.
1. Simply put, the goal of OpenStack Trove is to make it quick and easy to deploy and manage databases of all kinds.
The OpenStack Trove project mission statement is “to provide scalable and reliable Cloud Database as a Service provisioning functionality for both relational and non-relational database engines, and to continue to improve its fully-featured and extensible open source framework.” To achieve this, Trove automates complex database administrative tasks including deployment, configuration, patching, backups, restores, and monitoring. Trove allows IT professionals to offer their users the ability to provision and manage a wide variety of relational and non-relational databases through a single consistent set of interfaces.
The Trove DBaaS dramatically improves agility. While the value of quickly provisioning a database is considerable, the fact that an IT customer can just as easily discard a database and provision a new one is just as significant. This makes it possible to experiment and rapidly arrive at the right long-term solution without unnecessary compromise. Gone are the days of requisitioning a database server and waiting weeks or months for it to be provisioned. Users simply request database instances, pairs of instances with replication, or clusters for scalability.
2. Only OpenStack Trove provides a single framework within which it is possible to operate 13 different DBMS technologies in a consistent way.
A survey by 451 Research notes that when it comes to databases, enterprises are likely to have multiple suppliers for different usages. Those will include both SQL and NoSQL data stores, ones optimized for both operational and analytic workloads, as well as both open source databases and commercial database products. As these enterprises move to private, public and hybrid cloud implementations, they bring these databases with them.
While enterprises are now using lots of databases, their management platforms have traditionally been technology specific. This trend has largely continued as database management has moved into the cloud with single database DBaaS offerings dominating the landscape. Examples of this include Azure SQL Database (Microsoft SQL Server), MongoLab (MongoDB) and Cloudant (CouchDB). While Amazon’s Relational Database Service (RDS) supports a handful of different databases, they are all traditional relational databases with similar architectures. And, AWS provides completely different management technologies to support data warehouse and NoSQL data management with Redshift and DynamoDB, respectively.
Trove takes a fundamentally different approach by creating a pluggable architecture where many different types of databases can be supported from a common infrastructure. OpenStack Trove currently supports Cassandra, Couchbase, CouchDB, DataStax Enterprise, DB2, MariaDB, MongoDB, MySQL, Oracle, Percona Server, PostgreSQL, Redis and Vertica, with several more currently under development.
3. Trove’s unique architecture is the key to supporting such a wide variety of database technologies while still exposing the best of each.
The OpenStack Trove architecture has a number of unique features that make supporting many different database technologies possible. The key elements of this architecture are the Trove Controller, Guest Agent and Guest Images.
At the center of the system is the Trove Controller, which is database agnostic. Users interact with the controller through GUIs or APIs to manage databases of all sorts. If the user wants to perform a backup or create a replica of a database, they don’t have to worry about the specific calls required to do that for some particular database engine.
The Guest Agent is the “database adaptor” which translates the commands that the Trove Controller receives into the language of the specific database. To support a new type of database, one can implement a new Guest Agent that implements the necessary APIs that allow the user to manage that datastore in a standard way.
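The controller and adaptor split described above can be sketched in a few lines of Python. This is only an illustration of the pattern, not Trove's actual code: the class names, the `create_backup` method and the strings it returns are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict


class GuestAgent(ABC):
    """Hypothetical adaptor interface; real Trove guest agents do far more."""

    @abstractmethod
    def create_backup(self, name: str) -> str:
        """Return a description of the native backup step that would run."""


class MySQLAgent(GuestAgent):
    def create_backup(self, name: str) -> str:
        # Illustrative only: stands in for a MySQL-native backup tool.
        return f"mysql: native dump -> {name}"


class RedisAgent(GuestAgent):
    def create_backup(self, name: str) -> str:
        # Illustrative only: stands in for a Redis snapshot.
        return f"redis: snapshot -> {name}"


class Controller:
    """Database-agnostic front end that routes requests to guest agents."""

    def __init__(self) -> None:
        self._agents: Dict[str, GuestAgent] = {}

    def register(self, instance_id: str, agent: GuestAgent) -> None:
        self._agents[instance_id] = agent

    def backup(self, instance_id: str, name: str) -> str:
        # The caller never needs to know which engine backs the instance.
        return self._agents[instance_id].create_backup(name)


controller = Controller()
controller.register("db-1", MySQLAgent())
controller.register("cache-1", RedisAgent())
print(controller.backup("db-1", "nightly"))
print(controller.backup("cache-1", "nightly"))
```

In this sketch, supporting a new datastore means adding one more `GuestAgent` subclass; the `Controller` and its callers stay unchanged.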
To make it easy to deploy database instances rapidly, on-demand, Guest Images are provided for each version of each datastore. These Guest Images are simply virtual machine images that bundle the database server software along with the Guest Agent code.
When a guest image boots, it unpacks itself and produces a full-service, ready-to-use database instance, eliminating the need to provision and configure the database from scratch. Guest images can be configured by the operator or downloaded from publicly available sources.
In addition, Guest Images can be pre-built in optimized configurations that are tuned to deliver optimal performance and to conform to industry best practices for security and reliability. These standardized configurations also make it easier for IT staff to manage these systems. When a new security alert is issued by a database vendor to address a newly found vulnerability, the Guest Image can be replaced with a patched version and all the systems that are subject to the issue can be updated en masse.
4. Beyond basic provisioning, Trove automates the lifecycle management of the database instances it provisions.
When people first consider database as a service, they often only think of the ability for developers to launch database servers on demand from a web-based UI. While Trove can certainly do this, it can also do much more. Trove provides APIs to automate tasks like backup, clustering, replication and failover, and does it in a way that leverages the native tools of the underlying database engines it supports.
Of course, performing an administrative task like backup on a MongoDB instance or Cassandra cluster will likely use a very different approach than backing up an instance of Oracle or MySQL. The Trove architecture ensures that the administrator does not have to bother with this detail, however. This way, administration and operations for a diverse set of database technologies can be unified and simplified through a standard set of interfaces. For example, when a backup is needed, the administrator simply issues the trove backup-create command, through the API, command line or web GUI, and Trove will initiate the appropriate native process for the particular datastore being backed up.
5. Trove lets you manage database clusters as easily as single instances.
As mentioned previously, Trove is often compared to Amazon RDS. One area where OpenStack Trove goes beyond what is provided by Amazon’s RDS, or other simple DBaaS offerings, is cluster management. Users can create, grow and shrink database clusters directly through the Trove GUI or API. The interface is flexible enough to accommodate a wide variety of clustering architectures, from relational databases like MySQL supporting master-master replication to traditional parallel data warehouses like Vertica to peer-to-peer distributed NoSQL key-value stores like Redis. Currently, clustering support is available for MongoDB, Vertica, MySQL, and Redis, with Cassandra and Couchbase coming in the next release.
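To make the create, grow and shrink operations concrete, here is a small in-memory model in Python. It is a sketch under stated assumptions, not Trove's API: the `Cluster` class, its node-naming scheme and the rule that a cluster keeps at least one node are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    """Hypothetical in-memory model of a database cluster (not Trove's API)."""
    name: str
    datastore: str
    nodes: List[str] = field(default_factory=list)
    _next_id: int = 1

    def grow(self, count: int) -> List[str]:
        """Add `count` nodes and return their generated names."""
        if count < 1:
            raise ValueError("count must be at least 1")
        added = []
        for _ in range(count):
            added.append(f"{self.name}-node-{self._next_id}")
            self._next_id += 1
        self.nodes.extend(added)
        return added

    def shrink(self, node_names: List[str]) -> None:
        """Remove the named nodes, refusing to empty the cluster."""
        missing = [n for n in node_names if n not in self.nodes]
        if missing:
            raise KeyError(f"unknown nodes: {missing}")
        if len(set(node_names)) >= len(self.nodes):
            raise ValueError("a cluster must keep at least one node")
        self.nodes = [n for n in self.nodes if n not in node_names]


cluster = Cluster(name="orders", datastore="mongodb")
cluster.grow(3)
cluster.shrink(["orders-node-2"])
print(cluster.nodes)
```

The same three calls apply regardless of the `datastore` value, which is the point of a common clustering interface; what happens on each node underneath would differ by engine.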
6. All of Trove’s power is accessible through a web-based GUI, a command line interface, or a set of RESTful APIs.
While Trove is often seen through the lens of its Horizon-based web dashboard to provision and manage databases, all the functionality that Trove provides is also available using the Trove command line interface or through a complete set of RESTful APIs. This makes it easy to automate administrative tasks. Trove can be built into automated test systems that must launch large numbers of database servers to accommodate diverse usage scenarios.
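As a rough illustration of the kind of automation this enables, the Python sketch below builds (but does not send) a REST request that asks for a new database instance. The endpoint URL, tenant ID, token, flavor and version values are placeholders, and the helper function is hypothetical; the path and body shape follow the Trove v1.0 API as best understood here and should be checked against the API reference for your release.

```python
import json
from urllib.request import Request


def build_create_instance_request(endpoint, tenant_id, token, name,
                                  flavor_ref, volume_gb, datastore, version):
    """Build (but do not send) a request for a new database instance."""
    body = {
        "instance": {
            "name": name,
            "flavorRef": flavor_ref,
            "volume": {"size": volume_gb},
            "datastore": {"type": datastore, "version": version},
        }
    }
    return Request(
        url=f"{endpoint}/v1.0/{tenant_id}/instances",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )


# Placeholder values; a test harness could loop over many datastores.
request = build_create_instance_request(
    endpoint="http://trove.example.com:8779",
    tenant_id="demo-tenant",
    token="placeholder-token",
    name="test-db-1",
    flavor_ref="2",
    volume_gb=5,
    datastore="mysql",
    version="5.6",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the resulting object to `urllib.request.urlopen` would submit it to a real endpoint; an automated test system could call the helper once per datastore it needs to exercise.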
7. Trove can be operated in your own company data centers ensuring conformance with corporate policies such as data retention and privacy.
When OpenStack Trove is deployed as a private cloud inside the data center, it is operated by the company’s IT staff, who can make sure that it adheres to enterprise best practices and policies, such as data retention, data privacy, encryption and backups. While public cloud based DBaaS offerings may provide the tools to do the same, ensuring that those tools are applied properly often falls to individual developers who provision the databases. This can result in inconsistent compliance with corporate or even regulatory policies. When operating Trove inside the envelope of corporate governance and data security, users can be assured that the configurations they are deploying have been reviewed by IT to verify that they follow industry best practices, corporate policies and the applicable data protection laws of the jurisdictions governing the data.
8. Trove is part of the fastest-growing open source project in history.
At this point, the benefits of open source software are well understood. With broad community participation, it can yield higher quality, more secure code while eliminating vendor lock-in. Since Trove is part of OpenStack, it benefits from being part of the fastest-growing open source community in history. According to OpenHUB, the OpenStack project has had contributions from more than 3,500 individuals with nearly half that number making contributions in the past year.
Trove is a very healthy open source project on its own, with over 200 individuals from 40 different companies contributing to the project over its lifetime. While a lot of the contributions come from companies supporting OpenStack or Trove like Red Hat, HP and Tesora, enterprises using the platform are also active contributors. For example, eBay has made significant contributions, adding support for several databases and implementing the initial clustering support in the project.
9. Trove leverages core components and shared services of OpenStack making it simple for enterprises to deploy DBaaS.
To paraphrase Sy Sperling of the Hair Club for Men (at the risk of showing my age), “Trove’s not only an OpenStack project but it’s also a client.” For example, Trove uses the Nova compute service to create virtual machines on which to run database servers, Cinder block storage to provision database storage, and Swift object storage to capture backups, as shown in the diagram. Since Trove is layered on these core services, its users can take advantage of these services without any special customization. They can also benefit from any enhancements to the underlying core services of OpenStack. For example, if Nova has been configured to offer bare metal resources through the OpenStack Ironic service, Trove can leverage that. Also, and importantly, using open source software takes much of the risk out of private cloud implementations by decreasing dependencies on technology suppliers, along with the potential for pricing fluctuations.
Figure 1: Trove Architecture – Building on OpenStack
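The layering on core services can be sketched as follows. This Python example is a simplified illustration with in-memory stand-ins for the compute, block storage and object storage services; none of the class or method names are real Nova, Cinder, Swift or Trove APIs.

```python
class FakeCompute:
    """Stand-in for a compute service such as Nova."""

    def boot(self, name, image):
        return f"vm:{name}:{image}"


class FakeBlockStorage:
    """Stand-in for a block storage service such as Cinder."""

    def create_volume(self, size_gb):
        return f"vol:{size_gb}GB"


class FakeObjectStore:
    """Stand-in for an object store such as Swift."""

    def __init__(self):
        self.objects = {}

    def put(self, container, key, data):
        self.objects[(container, key)] = data


class DbaasService:
    """Hypothetical DBaaS layer that acts as a client of the core services."""

    def __init__(self, compute, block_storage, object_store):
        self.compute = compute
        self.block_storage = block_storage
        self.object_store = object_store

    def provision(self, name, image, size_gb):
        # A database instance is a server plus a data volume.
        server = self.compute.boot(name, image)
        volume = self.block_storage.create_volume(size_gb)
        return {"name": name, "server": server, "volume": volume}

    def backup(self, instance, payload):
        # Backups are written to the object store under a derived key.
        key = f"{instance['name']}.bak"
        self.object_store.put("backups", key, payload)
        return key


store = FakeObjectStore()
service = DbaasService(FakeCompute(), FakeBlockStorage(), store)
instance = service.provision("orders-db", "mysql-guest-image", 10)
backup_key = service.backup(instance, b"dump-bytes")
print(instance)
print(backup_key)
```

Because the service only talks to the three interfaces, swapping the compute stand-in for one that hands out bare metal machines would require no change to `DbaasService`, which mirrors the Ironic example above.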
10. It’s easy to get started with OpenStack and Trove, and there are lots of resources to help.
OpenStack Trove is an open source implementation of a DBaaS platform, backed by an active and diverse development community that continues to expand its capabilities. All of the following options, along with additional information and links, can be found on the “How To Get Started With OpenStack” page.
- Try one of the many OpenStack public clouds in production around the world listed here.
- Run an OpenStack cloud on your laptop (or even inside a virtual machine) using devstack. This is ideal for seeing what OpenStack looks like from an admin or user perspective.
- Install OpenStack through a commercial software distribution. You can find those listed here.
If you want to take the next step and contribute to the project, there are lots of ways to do that as well. As an open source project, Trove welcomes participation. My advice for people who want to get involved is simple: do it. Most people I’ve met who “want to contribute” believe that it is hard, or that they have to write a lot of code. That is just not true. Contributions don’t necessarily have to be complex code, features or blueprints. For example, code reviews are extremely valuable and they are a great way to get to know the community, the code, and Trove itself. Read the documentation, use the product, report bugs, draft short write-ups that could be used to improve the documentation, and share your own tips and tricks about using Trove with others. These are all great ways to get involved.
# # #