On Open Source Databases. Interview with Peter Zaitsev
“To be competitive with non-open-source cloud deployment options, open source databases need to invest in “ease-of-use.” There is no tolerance for complexity in many development teams as we move to “ops-less” deployment models.” –Peter Zaitsev
I have interviewed Peter Zaitsev, Co-Founder and CEO of Percona.
In this interview, Peter talks about the Open Source Databases market; the Cloud; the scalability challenges at Facebook; compares MySQL, MariaDB, and MongoDB; and presents Percona’s contribution to the MySQL and MongoDB ecosystems.
Q1. What are the main technical challenges in obtaining application scaling?
Peter Zaitsev: When it comes to scaling, there are different types. There is a Facebook/Google/Alibaba/Amazon scale: these giants are pushing boundaries, and usually are solving very complicated engineering problems at a scale where solutions aren’t easy or known. This often means finding edge cases that break things like hardware, operating system kernels and the database. As such, these companies not only need to build a very large-scale infrastructures, with a high level of automation, but also ensure it is robust enough to handle these kinds of issues with limited user impact. A great deal of hardware and software deployment practices must to be in place for such installations.
While these “extreme-scale” applications are very interesting and get a lot of publicity at tech events and in tech publications, this is a very small portion of all the scenarios out there. The vast majority of applications are running at the medium to high scale, where implementing best practices gets you the scalability you need.
When it comes to MySQL, perhaps the most important question is when you need to “shard.” Sharding — while used by every application at extreme scale — isn’t a simple “out-of-the-box” feature in MySQL. It often requires a lot of engineering effort to correctly implement it.
While sharding is sometimes required, you should really examine whether it is necessary for your application. A single MySQL instance can easily handle hundreds of thousands per second (or more) of moderately complicated queries, and Terabytes of data. Pair that with MemcacheD or Redis caching, MySQL Replication or more advanced solutions such as Percona XtraDB Cluster or Amazon Aurora, and you can cover the transactional (operational) database needs for applications of a very significant scale.
Besides making such high-level architecture choices, you of course need to also ensure that you exercise basic database hygiene. Ensure that you’re using the correct hardware (or cloud instance type), the right MySQL and operating system version and configuration, have a well-designed schema and good indexes. You also want to ensure good capacity planning, so that when you want to take your system to the next scale and begin to thoroughly look at it you’re not caught by surprise.
Peter Zaitsev: The Facebook Team is the most qualified to answer this question. However, I imagine that at Facebook scale being efficient is very important because it helps to drive the costs down. If your hot data is in the cache when it is important, your database is efficient at handling writes — thus you want a “write-optimized engine.”
If you use Flash storage, you also care about two things:
- – A high level of compression since Flash storage is much more expensive than spinning disk.
– You are also interested in writing as little to the storage as possible, as the more you write the faster it wears out (and needs to be replaced).
RocksDB and MyRocks are able to achieve all of these goals. As an LSM-based storage engine, writes (especially Inserts) are very fast — even for giant data sizes. They’re also much better suited for achieving high levels of compression than InnoDB.
This Blog Post by Mark Callaghan has many interesting details, including this table which shows MyRocks having better performance, write amplification and compression for Facebook’s workload than InnoDB.
Q3. Beringei is Facebook’s open source, in-memory time series database. According to Facebook, large-scale monitoring systems cannot handle large-scale analysis in real time because the query performance is too slow. What is your take on this?
Peter Zaitsev: Facebook operates at extreme scale, so it is no surprise the conventional systems don’t scale well enough or aren’t efficient enough for Facebook’s needs.
I’m very excited Facebook has released Beringei as open source. Beringei itself is a relatively low-end storage engine that is hard to use for a majority of users, but I hope it gets integrated with other open source projects and provides a full-blown high-performance monitoring solution. Integrating it with Prometheus would be a great fit for solutions with extreme data ingestion rates and very high metric cardinality.
Q4. How do you see the market for open source databases evolving?
Peter Zaitsev: The last decade has seen a lot of open source database engines built, offering a lot of different data models, persistence options, high availability options, etc. Some of them were build as open source from scratch, while others were released as open source after years of being proprietary engines — with the most recent example being CMDB2 by Bloomberg. I think this heavy competition is great for pushing innovation forward, and is very exciting! For example, I think if that if MongoDB hadn’t shown how many developers love a document-oriented data model, we might never of seen MySQL Document Store in the MySQL ecosystem.
With all this variety, I think there will be a lot of consolidation and only a small fraction of these new technologies really getting wide adoption. Many will either have niche deployments, or will be an idea breeding ground that gets incorporated into more popular database technologies.
I do not think SQL will “die” anytime soon, even though it is many decades old. But I also don’t think we will see it being the dominant “database” language, as it has been since the turn of millennia.
The interesting disruptive force for open source technologies is the cloud. It will be very interesting for me to see how things evolve. With pay-for-use models of the cloud, the “free” (as in beer) part of open source does not apply in the same way. This reduces incentives to move to open source databases.
To be competitive with non-open-source cloud deployment options, open source databases need to invest in “ease-of-use.” There is no tolerance for complexity in many development teams as we move to “ops-less” deployment models.
Q5. In your opinion what are the pros and cons of MySQL vs. MariaDB?
Peter Zaitsev: While tracing it roots to MySQL, MariaDB is quickly becoming a very different database.
It implements some features MySQL doesn’t, but also leaves out others (MySQL Document Store and Group Replication) or implements them in a different way (JSON support and Replication GTIDs).
From the MySQL side, we have Oracle’s financial backing and engineering. You might dislike Oracle, but I think you agree they know a thing or two about database engineering. MySQL is also far more popular, and as such more battle-tested than MariaDB.
MySQL is developed by a single company (Oracle) and does not have as many external contributors compared to MariaDB — which has its own pluses and minuses.
MySQL is “open core,” meaning some components are available only in the proprietary version, such as Enterprise Authentication, Enterprise Scalability, and others. Alternatives for a number of these features are available in Percona Server for MySQL though (which is completely open source). MariaDB Server itself is completely open source, through there are other components that aren’t that you might need to build a full solution — namely MaxScale.
Another thing MariaDB has going for it is that it is included in a number of Linux distributions. Many new users will be getting their first “MySQL” experience with MariaDB.
For additional insight into MariaDB, MySQL and Percona Server for MySQL, you can check out this recent article
Q6. What’s new in the MySQL and MongoDB ecosystem?
Peter Zaitsev: This could be its own and rather large article! With MySQL, we’re very excited to see what is coming in MySQL 8. There should be a lot of great changes in pretty much every area, ranging from the optimizer to retiring a lot of architectural debt (some of it 20 years old). MySQL Group Replication and MySQL InnoDB Cluster, while still early in their maturity, are very interesting products.
For MongoDB we’re very excited about MongoDB 3.4, which has been taking steps to be a more enterprise ready database with features like collation support and high-performance sharding. A number of these features are only available in the Enterprise version of MongoDB, such as external authentication, auditing and log redaction. This is where Percona Server for MongoDB 3.4 comes in handy, by providing open source alternatives for the most valuable Enterprise-only features.
Q7. Anything else you wish to add?
Peter Zaitsev: I would like to use this opportunity to highlight Percona’s contribution to the MySQL and MongoDB ecosystems by mentioning two of our open source products that I’m very excited about.
First, Percona XtraDB Cluster 5.7.
While this has been around for about a year, we just completed a major performance improvement effort that allowed us to increase performance up to 10x. I’m not talking about improving some very exotic workloads: these performance improvements are achieved in very typical high-concurrency environments!
I’m also very excited about our Percona Monitoring and Management product, which is unique in being the only fully packaged open source monitoring solution specifically built for MySQL and MongoDB. It is a newer product that has been available for less than a year, but we’re seeing great momentum in adoption in the community. We are focusing many of our resources to improving it and making it more effective.
Peter Zaitsev co-founded Percona and assumed the role of CEO in 2006. As one of the foremost experts on MySQL strategy and optimization, Peter leveraged both his technical vision and entrepreneurial skills to grow Percona from a two-person shop to one of the most respected open source companies in the business. With more than 150 professionals in 29 countries, Peter’s venture now serves over 3000 customers – including the “who’s who” of Internet giants, large enterprises and many exciting startups. Percona was named to the Inc. 5000 in 2013, 2014, 2015 and 2016.
Peter was an early employee at MySQL AB, eventually leading the company’s High Performance Group. A serial entrepreneur, Peter co-founded his first startup while attending Moscow State University where he majored in Computer Science. Peter is a co-author of High Performance MySQL: Optimization, Backups, and Replication, one of the most popular books on MySQL performance. Peter frequently speaks as an expert lecturer at MySQL and related conferences, and regularly posts on the Percona Data Performance Blog. He has also been tapped as a contributor to Fortune and DZone, and his recent ebook Practical MySQL Performance Optimization Volume 1 is one of percona.com’s most popular downloads.
follow us on Twitter: @odbmsorg