MySQL: State of the Union. Interview with Tomas Ulin.
“With MySQL 5.6, developers can now commingle the ‘best of both worlds’ with fast key-value lookup operations and complex SQL queries to meet user and application specific requirements.” – Tomas Ulin
On February 5, 2013, Oracle announced the general availability of MySQL 5.6.
I have interviewed Tomas Ulin, Vice President for the MySQL Engineering team at Oracle. I asked him several questions on the state of the union for MySQL.
RVZ
Q1. You support several different versions of the MySQL database. Why? How do they differ from each other?
Tomas Ulin: Oracle provides technical support for several versions of the MySQL database to allow our users to maximize their investments in MySQL. Additional details about Oracle’s Lifetime Support policy can be found here.
Each new version of MySQL has added new functionality and improved the user experience. Oracle just made available MySQL 5.6, delivering enhanced linear scalability, simplified query development, better transactional throughput and application availability, flexible NoSQL access, improved replication and enhanced instrumentation.
Q2. Could you please explain in some more details how MySQL can offer a NoSQL access?
Tomas Ulin: MySQL 5.6 provides simple, key-value interaction with InnoDB data via the familiar Memcached API. Implemented via a new Memcached daemon plug-in to mysqld, the Memcached protocol is mapped directly to the native InnoDB API, enabling developers to use existing Memcached clients to bypass the expense of query parsing and go directly to InnoDB data for lookups and transactionally compliant updates. With MySQL 5.6, developers can now commingle the “best of both worlds” with fast key-value lookup operations and complex SQL queries to meet user and application specific requirements. More information is available here.
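For readers who want to try the Memcached interface, the basic setup on a MySQL 5.6 server looks roughly like this (a sketch based on the 5.6 documentation; file locations vary by platform):

```
-- Load the helper tables the memcached plug-in uses to map
-- key-value operations onto InnoDB tables (shipped with MySQL 5.6):
SOURCE /usr/share/mysql/innodb_memcached_config.sql;

-- Install the memcached daemon plug-in into the running mysqld:
INSTALL PLUGIN daemon_memcached SONAME "libmemcached.so";
```

After this, any standard Memcached client can connect on the default memcached port (11211), and its get/set operations read and write InnoDB rows directly, bypassing SQL parsing.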
MySQL Cluster presents multiple interfaces to the database, also providing the option to bypass the SQL layer entirely for native, blazing fast access to the tables. Each of the SQL and NoSQL APIs can be used simultaneously, across the same data set. NoSQL APIs for MySQL Cluster include memcached as well as the native C++ NDB API, Java (ClusterJ and ClusterJPA) and HTTP/REST. Additionally, during our MySQL Connect Conference last fall, we announced a new Node.js NoSQL API to MySQL Cluster as an early access feature. More information is available here.
Q3. No single data store is best for all uses. What are the applications that are best suited for MySQL, and which ones are not?
Tomas Ulin: MySQL is the leading open source database for web, mobile and social applications, delivered either on-premise or in the cloud. MySQL is increasingly used for Software-as-a-Service applications as well.
It is also a popular choice as an embedded database with over 3,000 ISVs and OEMs using it.
MySQL is also widely deployed for custom IT and departmental enterprise applications, where it is often complementary to Oracle Database deployments. It also represents a compelling alternative to Microsoft SQL Server with the ability to reduce database TCO by up to 90 percent.
From a developer’s perspective, there is a need to address growing data volumes and very high data ingestion and query speeds, while also allowing for flexibility in what data is captured. For this reason, the MySQL team works to deliver the best of the SQL and NoSQL worlds to our users, including native NoSQL access to MySQL storage engines, with benchmarks showing 9x higher INSERT rates than using SQL, while also supporting online DDL.
At the same time, we do not sacrifice data integrity by trading away ACID compliance, and we do not trade away the ability to run complex SQL-based queries across the same data sets. This approach enables developers of new services to get the best out of MySQL database technologies.
Q4. Playful Play, a Mexico-based company, is using MySQL Cluster Carrier Grade Edition (CGE) to support three million subscribers on Facebook in Latin America. What are the technical challenges they are facing for such project? How do they solve them?
Tomas Ulin: As a start-up business, fast time to market at the lowest possible cost was their leading priority. As a result, they developed the first release of the game on the LAMP stack.
To meet both the scalability and availability requirements of the game, Playful Play initially deployed MySQL in a replicated, multi-master configuration.
As Playful Play’s game, La Vecindad de El Chavo, spread virally across Facebook, subscriptions rapidly exceeded one million users, leading Playful Play to consider how to best architect their gaming platforms for long-term growth.
The database is core to the game, responsible for managing:
• User profiles and avatars
• Gaming session data
• In-app purchases
• Advertising and digital marketing event data
In addition to growing user volumes, the El Chavo game also added new features that changed the profile of the database. Operations became more write-intensive, with INSERTs and UPDATEs accounting for up to 70 percent of the database load.
The game’s popularity also attracted advertisers, who demanded strict SLAs for both performance (predictable throughput with low latency) as well as uptime.
After their evaluation, Playful Play decided MySQL Cluster was best suited to meet their needs for scale and HA.
After their initial deployment, they engaged MySQL consulting services from Oracle to help optimize query performance for their environment, and started to use MySQL Cluster Manager to manage their installation, including automating the scaling of their infrastructure to support growth of 30,000 new users every day. With subscriptions to MySQL Cluster CGE, which includes Oracle Premier Support and MySQL Cluster Manager in an integrated offering, Playful Play also has access to qualified technical and consultative support, which is very important to them.
Playful Play currently supports more than four million subscribers with MySQL.
More details on their use of MySQL Cluster are here.
Q5. What are the current new commercial extensions for MySQL Enterprise Edition? How do commercial extensions differ from standard open source features of MySQL?
Tomas Ulin: MySQL Community Edition is available to all at no cost under the GPL. MySQL Enterprise Edition includes advanced features, management tools and technical support to help customers improve productivity and reduce the cost, risk and time to develop, deploy and manage MySQL applications. It also helps customers improve the performance, security and uptime of their MySQL-based applications.
This link is to a short demo that illustrates the added value MySQL Enterprise Edition offers.
Further details about all the commercial extensions can be found in this white paper. (Edit: You must be logged in to access this content.)
The newest additions to MySQL Enterprise Edition, released last fall during MySQL Connect, include:
* MySQL Enterprise Audit, to quickly and seamlessly add policy-based auditing compliance to new and existing applications.
* Additional MySQL Enterprise High Availability options, including Distributed Replicated Block Device (DRBD) and Oracle Solaris Clustering, increasing the range of certified and supported HA options for MySQL.
Q6. In September 2012, Oracle announced the first development milestone release of MySQL Cluster 7.3. What is new?
Tomas Ulin: The release comprises:
• Development Release 1: MySQL Cluster 7.3 with Foreign Keys. This has been one of the most requested enhancements to MySQL Cluster – enabling users to simplify their data models and application logic – while extending the range of use-cases.
• Early Access “Labs” Preview: MySQL Cluster NoSQL API for Node.js. Implemented as a module for the V8 engine, the new API provides Node.js with a native, asynchronous JavaScript interface that can be used to both query and receive result sets directly from MySQL Cluster, without transformations to SQL. This gives lower latency for simple queries, while also allowing developers to build “end-to-end” JavaScript-based services – from the browser, to the web/application layer, through to the database – with less complexity.
• Early Access “Labs” Preview: MySQL Cluster Auto-Installer. Implemented with a standard HTML GUI and Python-based web server back-end, the Auto-Installer intelligently configures MySQL Cluster based on application requirements and available hardware resources. This makes it simple for DevOps teams to quickly configure and provision highly optimized MySQL Cluster deployments – whether on-premise or in the cloud.
Additional details can be found here.
Q7. John Busch, previously CTO of the former Schooner Information Technology, commented in a recent interview (1): “legacy MySQL does not scale well on a single node, which forces granular sharding and explicit application code changes to make them sharding-aware and results in low utilization of servers”. What is your take on this?
Tomas Ulin: Improving scalability on single nodes has been a significant area of development; for example, MySQL 5.6 introduced a series of enhancements across the server, the optimizer, and the InnoDB storage engine. Benchmarks show close to linear scalability on systems with 48 cores/threads, with 230% higher performance than MySQL 5.5. Details are here.
Q8. Could you please explain in your opinion the trade-off between scaling out and scaling up? What does it mean in practice when using MySQL?
Tomas Ulin: On the scalability front, MySQL has come a long way in the last five years, and MySQL 5.6 makes huge improvements here – depending on your workload, you may scale well up to 32 or 48 cores. The old, proven techniques still work well too: you can use master-slave replication and split your load by directing writes to the master and reads to the slaves, or use “sharding” to partition data across multiple servers. The question remains the same: do you hit any bottlenecks on a single (big) server? If so, you can start to think about which kind of “distributed” solution is most appropriate for you. And this is not only a MySQL server question – similar problems and solutions arise for any application targeting high load today.
Some things to consider…
Scale-up pros:
– easier management
– easier to achieve consistency of your data
Scale-up cons:
– you need to dimension your server up-front, or take into account the cost of throwing away your old hardware when you scale
– cost/performance is typically better on smaller servers
– in the end higher cost
– at some point you reach the limit, i.e. you can only scale-up so much
Scale-out pros:
– you can start small with limited investment, and invest incrementally as you grow, reusing existing servers
– you can choose hardware with the optimal cost/performance
Scale-out cons:
– more complicated management
– you need to manage data and consistency across multiple servers, typically by making the application/middle tier aware of the data distribution and server roles in your scale-out (this applies to a traditional master-slave MySQL setup; choosing MySQL Cluster for scale-out avoids it)
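To make the last con concrete, here is a minimal sketch (in Python, with hypothetical server names) of the shard-routing logic an application or middle tier takes on when data is partitioned across servers:

```python
import hashlib

# Hypothetical shard map: each entry stands for one MySQL server.
SHARDS = ["mysql-shard-0", "mysql-shard-1", "mysql-shard-2"]

def shard_for(user_id: str) -> str:
    """Route a user to a shard by hashing the shard key.

    A stable hash is used (not Python's built-in hash(), which is
    randomized per process) so routing stays consistent across restarts.
    """
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# Every query for a given user must now be sent to that user's shard;
# cross-shard queries (JOINs, aggregates) become application-level work.
```

A real deployment must also handle resharding when servers are added or removed – exactly the management burden listed above, and the part MySQL Cluster handles internally by partitioning data transparently.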
Q9. How can you obtain scalability and high-performance with Big Data and at the same time offer SQL joins?
Tomas Ulin: Based on estimates from leading Hadoop vendors, around 80 percent of their deployments are integrated with MySQL.
As discussed above, there has been significant development in NoSQL APIs to the InnoDB and MySQL Cluster storage engines, which allow high-speed ingestion of high-velocity key/value data while also allowing complex queries, including JOINs, to run across that same data set using SQL.
Technologies like Apache Sqoop are commonly used to load data to and from MySQL and Hadoop, so many users will JOIN structured, relational data from MySQL with unstructured data such as clickstreams in MapReduce jobs within Hadoop. We are also working on our Binlog API to enable real-time change data capture (CDC) with Hadoop. More on the Binlog API is here.
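As a rough illustration of such a pipeline (the connection string, table names, and paths here are hypothetical), a Sqoop transfer between MySQL and HDFS looks like this:

```
# Import a MySQL table into HDFS for use in MapReduce jobs:
sqoop import --connect jdbc:mysql://db-host/webdb \
      --username etl --table clicks --target-dir /data/clicks

# Export aggregated results from HDFS back into MySQL:
sqoop export --connect jdbc:mysql://db-host/webdb \
      --username etl --table click_summary --export-dir /data/click_summary
```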
Q10. Talking about scalability and performance what are the main differences if the database is stored on hard drives, SAN, flash memory (Flashcache)? What happens when data does not fit in DRAM?
Tomas Ulin: The answer depends on your workload.
The most important question is: “how long does your active data set remain cached?” If not long at all, your activity will remain I/O-bound, and faster storage will help. But if the active data set is small enough to be cached most of the time, the impact of faster storage will be much less noticeable. MySQL 5.6 delivers many changes that improve performance on heavily I/O-bound workloads.
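Ulin’s point can be illustrated with a toy simulation – a plain LRU cache standing in for the InnoDB buffer pool, with made-up sizes: when the active data set fits in the cache, the hit rate stays near 100% and storage speed barely matters; when it does not, nearly every access goes to storage and the workload stays I/O-bound.

```python
from collections import OrderedDict

def hit_rate(cache_size: int, accesses: list) -> float:
    """Replay a sequence of page accesses through an LRU cache
    (a stand-in for a database buffer pool) and report the hit rate."""
    cache = OrderedDict()
    hits = 0
    for page in accesses:
        if page in cache:
            hits += 1
            cache.move_to_end(page)      # mark as most recently used
        else:
            cache[page] = True
            if len(cache) > cache_size:  # evict least recently used
                cache.popitem(last=False)
    return hits / len(accesses)

# Hot working set of 100 pages fits in a 1,000-page cache: almost all hits.
hot = [i % 100 for i in range(10_000)]
# Working set of 10,000 pages overflows the same cache: all misses.
cold = [i % 10_000 for i in range(10_000)]

print(hit_rate(1_000, hot))   # 0.99
print(hit_rate(1_000, cold))  # 0.0
```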
————–
Mr. Tomas Ulin has been working with the MySQL Database team since 2003. He is Vice President for the MySQL Engineering team, responsible for the development and maintenance of the MySQL related software products within Oracle, such as the MySQL Server, MySQL Cluster, MySQL Connectors, MySQL Workbench, MySQL Enterprise Backup, and MySQL Enterprise Monitor. Prior to working with MySQL, his background was in the telecom industry, working for the Swedish telecom operator Telia and the telecom vendor Ericsson. He has a Master’s degree in Computer Science and Applied Physics from Case Western Reserve University and a PhD in Computer Science from the Royal Institute of Technology.
Related Posts
(1): A Super-Set of MySQL for Big Data. Interview with John Busch, Schooner. February 20, 2012.
(2): Scaling MySQL and MariaDB to TBs. Interview with Martín Farach-Colton. October 8, 2012.
(3): On Eventual Consistency. Interview with Monty Widenius. October 23, 2012.
On MySQL 5.6 scaling up: this is only feasible for internal applications, where you don’t want to spend the effort to refactor your code. Any Internet application that needs to support tens of millions of users needs a scale-out strategy.
On MySQL Cluster 7.x: this is a huge step forward, but there are still hardware and scale limitations. It is a stop-gap on the way to properly scaling an application to tens of millions of users.
– Hardware: every node needs to hold at least the entire index of the dataset in RAM. This makes using commodity or virtualized hardware impractical.
– Scale: there is an upper limit to how many nodes your cluster can have, so once you hit that limit, you are back to scaling up.
I tackled the complexities of scaling a MySQL solution in my previous company. If your use case fits the limitations, these are great products. If you are going to be the next Big Data “thing”, these are both non-starters.
Thanks for taking the time to read and comment on the article. We are very excited about the new MySQL 5.6 release and the trajectory of MySQL Cluster adoption.
To answer your specific points: it is correct that scaling up will only get a user so far, but in a little over four years we have scaled MySQL from 4 cores (at best) to 48 cores – this takes users a very long way, for both internal and Internet-based applications.
MySQL allows you to add replication slaves online, so you can scale out your database on commodity hardware, on-premise or in the cloud – and you can choose between asynchronous, semi-synchronous, and synchronous replication depending on your availability and consistency requirements.
To enumerate specific scalability enhancements in MySQL 5.6:
– InnoDB enhancements – allow you to scale your MySQL system further, giving you 2x more performance
– Optimizer enhancements – allow anything from 2x to 280x faster query performance
– Memcached API – allows you to optimize performance-critical parts of your application without sacrificing consistency, while retaining the flexibility and rapid development that SQL enables – 10x higher performance
– Replication enhancements: multithreaded slave – allows you to scale your MySQL system further, giving you 5x more performance
– Replication enhancements: binlog group commit – allows you to scale your MySQL system further, giving you 4x more performance
With respect to your comments on MySQL Cluster:
– Hardware: the indexes are partitioned just like the rest of the data – each data node stores only its share, which enables scale-out across commodity hardware.
– Scale: up to 255 nodes are currently supported by MySQL Cluster. The important factor is node performance. In recent benchmark tests conducted with Intel (1), a 30-node cluster delivered just under 20 million UPDATE operations per second (1.2 billion per minute). We believe that MySQL Cluster scale is not inhibited by node count.
In summary, MySQL has been selected by many of the world’s largest and most demanding internet-scale services – for example, Facebook, YouTube, and Twitter all run on MySQL. Equally important, some of the newest and fastest-growing services have also selected MySQL despite having a myriad of other options to choose from – e.g., Tumblr, Pinterest, Box, and Quora.
With MySQL 5.6 and MySQL Cluster we are pushing the boundaries further, and with future releases we will continue to do even more.
1. http://www.mysql.com/why-mysql/benchmarks/mysql-cluster/
Hi Tomas,
Thanks for replying. I do agree that replication will help scale out a MySQL installation for reads. I have used this strategy successfully and recommend it regularly. However, you quickly run into a write-scaling issue on a single master node. When your writes per second go beyond what the hardware (mostly) and the software (likely) can handle, that is where a single RDBMS node (not just MySQL) fails to support an Internet-based application.
I also agree that MySQL Cluster’s 255-node limit should handle all but the most extreme edge cases. Zillow has been using MySQL Cluster successfully for a while now. However, as companies move more and more to the cloud, they realize that solutions that depend on physical hardware’s IOPS and network speeds don’t translate very well to a virtualized environment with limited IOPS and network throughput. Applications need to adjust to that reality, and so eventually consistent solutions, which can excel in less powerful environments, are being used more and more.
Furthermore, there are other limitations to Cluster, which you even point out here: http://dev.mysql.com/doc/mysql-cluster-excerpt/5.1/en/mysql-cluster-limitations.html
I’m a fan of MySQL, I even hosted a podcast for a while called the MySQLGuy Podcast. My goal is not to diminish the great work your team is doing, but to set the appropriate expectations for any users reading Roberto’s article.
Cheers,
Christos
There is a solution to these problems: architecting the big data solution in a different way. We have several customers using our in-memory product to hold terabytes of data in memory for fast access. For reads, the data can be bulk-loaded from the data source (like MySQL) and accessed in microseconds within the application, without any scale limitations. For writes, the data is set up to be eventually consistent by doing write-behinds every few hours. In instances where strong consistency is required, the data is either micro-batched or uses a variety of locking configurations to optimize performance. Happy to share more details or discuss specific use cases.
Tomas, I wonder if you have any comments regarding the role of MySQL and the direction it is headed, as compared to the Oracle NoSQL Database initiative, which is building on scale-out key-value distribution and reliable replication services – this seems to overlap considerably with some of the messaging on MySQL.
Also – somewhat of a side note, but I think it’s interesting that both Oracle’s NoSQL Database and MySQL are offering an API specifically integrated with Node.js. While I am not a huge fan of JavaScript, its penetration is undeniable, and what the Node.js folks have done may prove to be transformative for the software industry. It may very well move the web space to distributed objects and eliminate the pain of XML/JSON/BSON, etc. Giving that server technology seamless database access is a smart move.
-Robert
Christos, a MySQL setup that is not partitioned in some way (i.e. where all data is written to all MySQL servers) will eventually not scale with further updates.
This is, however, shared with any architecture that fully replicates the data, irrespective of whether you have a “single master” or “several masters” to write to. That is, the scalability issue does not come from having a single master: you will be able to get the same amount of data through a cluster that has a single master as through one with multiple masters, since in the end all nodes must be able to sustain the same number of updates.
What we have done in 5.6 is allow you to scale your master (and slaves) even further, pushing the limit out. When you eventually hit the limit, you need to start partitioning your data somehow. This is how, for example, MySQL Cluster achieves linear scalability: it internally partitions the data transparently to the user. For regular MySQL master-slave replication we currently do not provide any off-the-shelf solution to partition your data transparently. This is something you either have to build into your application logic, or handle with a middleware layer, as several companies have successfully done. Some of these implementations are available as open source solutions.
BR, Tomas
Robert,
The Oracle NoSQL Database and MySQL are complementary technologies and enable Oracle to offer a complete solution stack to its customers. For web applications well served by a relational database, users can rely on MySQL, with the option to use the NoSQL access to MySQL via memcached to speed up key value read and write operations. For applications primarily handling large amounts of horizontally distributed unstructured and/or evolving data, Oracle NoSQL Database is the way to go. Oracle NoSQL Database can be used standalone or in conjunction with Hadoop and the Oracle Database for complex queries.
Regards,
Dave