MySQL-State of the Union. Interview with Tomas Ulin.
“With MySQL 5.6, developers can now commingle the “best of both worlds” with fast key-value look up operations and complex SQL queries to meet user and application specific requirements” –Tomas Ulin.
On February 5, 2013, Oracle announced the general availability of MySQL 5.6.
I have interviewed Tomas Ulin, Vice President for the MySQL Engineering team at Oracle. I asked him several questions on the state of the union for MySQL.
Q1. You support several different versions of the MySQL database. Why? How do they differ with each other?
Tomas Ulin: Oracle provides technical support for several versions of the MySQL database to allow our users to maximize their investments in MySQL. Additional details about Oracle’s Lifetime Support policy can be found here.
Each new version of MySQL has added new functionality and improved the user experience. Oracle just made available MySQL 5.6, delivering enhanced linear scalability, simplified query development, better transactional throughput and application availability, flexible NoSQL access, improved replication and enhanced instrumentation.
Q2. Could you please explain in some more details how MySQL can offer a NoSQL access?
Tomas Ulin: MySQL 5.6 provides simple, key-value interaction with InnoDB data via the familiar Memcached API. Implemented via a new Memcached daemon plug-in to mysqld, the new Memcached protocol is mapped directly to the native InnoDB API and enables developers to use existing Memcached clients to bypass the expense of query parsing and go directly to InnoDB data for lookups and transactional compliant updates. With MySQL 5.6, developers can now commingle the “best of both worlds” with fast key-value look up operations and complex SQL queries to meet user and application specific requirements. More information is available here.
MySQL Cluster presents multiple interfaces to the database, also providing the option to bypass the SQL layer entirely for native, blazing fast access to the tables. Each of the SQL and NoSQL APIs can be used simultaneously, across the same data set. NoSQL APIs for MySQL Cluster include memcached as well as the native C++ NDB API, Java (ClusterJ and ClusterJPA) and HTTP/REST. Additionally, during our MySQL Connect Conference last fall, we announced a new Node.js NoSQL API to MySQL Cluster as an early access feature. More information is available here.
Q3. No single data store is best for all uses. What are the applications that are best suited for MySQL, and which ones are not?
Tomas Ulin: MySQL is the leading open source database for web, mobile and social applications, delivered either on-premise or in the cloud. MySQL is increasingly used for Software-as-a-Service applications as well.
It is also a popular choice as an embedded database with over 3,000 ISVs and OEMs using it.
MySQL is also widely deployed for custom IT and departmental enterprise applications, where it is often complementary to Oracle Database deployments. It also represents a compelling alternative to Microsoft SQL Server with the ability to reduce database TCO by up to 90 percent.
From a developer’s perspective, there is a need to address growing data volumes, and very high data ingestion and query speeds, while also allowing for flexibility in what data is captured. For this reason, the MySQL team works to deliver the best of the SQL and Non-SQL worlds to our users, including native, NoSQL access to MySQL storage engines, with benchmarks showing 9x higher INSERT rates than using SQL, while also supporting online DDL.
At the same time, we do not sacrifice data integrity by trading away ACID compliance, and we do not trade away the ability to run complex SQL-based queries across the same data sets. This approach enables developers of new services to get the best out of MySQL database technologies.
Q4. Playful Play, a Mexico-based company, is using MySQL Cluster Carrier Grade Edition (CGE) to support three million subscribers on Facebook in Latin America. What are the technical challenges they are facing for such project? How do they solve them?
Tomas Ulin: As a start-up business, fast time to market at the lowest possible cost was their leading priority. As a result, they developed the first release of the game on the LAMP stack.
To meet both the scalability and availability requirements of the game, Playful Play initially deployed MySQL in a replicated, multi-master configuration.
As Playful Play’s game, La Vecidad de El Chavo, spread virally across Facebook, subscriptions rapidly exceeded one million users, leading Playful Play to consider how to best architect their gaming platforms for long-term growth.
The database is core to the game, responsible for managing:
• User profiles and avatars
• Gaming session data;
• In-app (application) purchases;
• Advertising and digital marketing event data.
In addition to growing user volumes, the El Chavo game also added new features that changed the profile of the database. Operations became more write-intensive, with INSERTs and UPDATEs accounting for up to 70 percent of the database load.
The game’s popularity also attracted advertisers, who demanded strict SLAs for both performance (predictable throughput with low latency) as well as uptime.
After their evaluation, PlayFul Play decided MySQL Cluster was best suited to meet their needs for scale and HA.
After their initial deployment, they engaged MySQL consulting services from Oracle to help optimize query performance for their environment, and started to use MySQL Cluster Manager to manage their installation, including automating the scaling of their infrastructure to support the growth from 30,000 new users every day. With subscriptions to MySQL Cluster CGE, which includes Oracle Premier Support and MySQL Cluster Manager in an integrated offering, Playful Play has access to qualified technical and consultative support which is also very important to them.
Playful Play currently supports more than four million subscribers with MySQL.
More details on their use of MySQL Cluster are here.
Q5. What are the current new commercial extensions for MySQL Enterprise Edition? How do commercial extensions differ from standard open source features of MySQL?
Tomas Ulin: MySQL Community Edition is available to all at no cost under the GPL. MySQL Enterprise Edition includes advanced features, management tools and technical support to help customers improve productivity and reduce the cost, risk and time to develop, deploy and manage MySQL applications. It also helps customers improve the performance, security and uptime of their MySQL-based applications.
This link is to a short demo that illustrates the added value MySQL Enterprise Edition offers.
Further details about all the commercial extensions can be found in this white paper. (Edit: You must be logged in to access this content.)
The newest additions to MySQL Enterprise Edition, released last fall during MySQL Connect, include:
* MySQL Enterprise Audit, to quickly and seamlessly add policy-based auditing compliance to new and existing applications.
* Additional MySQL Enterprise High Availability options, including Distributed Replicated Block Device (DRBD) and Oracle Solaris Clustering, increasing the range of certified and supported HA options for MySQL.
Q6. In September 2012, Oracle announced the first development milestone release of MySQL Cluster 7.3. What is new?
Tomas Ulin: The release comprises:
• Development Release 1: MySQL Cluster 7.3 with Foreign Keys. This has been one of the most requested enhancements to MySQL Cluster – enabling users to simplify their data models and application logic – while extending the range of use-cases.
• Early Access “Labs” Preview: MySQL Cluster Auto-Installer. Implemented with a standard HTML GUI and Python-based web server back-end, the Auto-Installer intelligently configures MySQL Cluster based on application requirements and available hardware resources. This makes it simple for DevOps teams to quickly configure and provision highly optimized MySQL Cluster deployments – whether on-premise or in the cloud.
Additional details can be found here.
Q7. John Busch, previously CTO of former Schooner Information Technology commented in a recent interview (1): “legacy MySQL does not scale well on a single node, which forces granular sharding and explicit application code changes to make them sharding-aware and results in low utilization of severs”. What is your take on this?
Tomas Ulin: Improving scalability on single nodes has been a significant area of development, for example MySQL 5.6 introduced a series of enhancements which have been further built upon in the server, optimizer and InnoDB storage engine. Benchmarks are showing close to linear scalability for systems with 48 cores / threads, with 230% higher performance than MySQL 5.5. Details are here.
Q8. Could you please explain in your opinion the trade-off between scaling out and scaling up? What does it mean in practice when using MySQL?
Tomas Ulin: On the scalability front, MySQL has come a long way in the last five years and MySQL 5.6 has made huge improvements here – depending on your workload, you may scale well up to 32 or 48 cores. While the old and proven techniques can work well as well: you may use master -slave(s) replication and split your load by having writes on the master only, while reads on slave(s), or using “sharding” to partition data across multiple computers. The question here is still the same: do you hit any bottlenecks on a single (big) server or not?… – and if yes, then you may start to think which kind of “distributed” solution is more appropriate for you. And it’s not only MySQL server related — similar problems and solutions are coming with all applications today targeting a high load activity.
Some things to consider…
- easier management
- easier to achieve consistency of your data
- you need to dimension your server up-front, or take into account the cost of throwing away your old hardware when you scale
- cost/performance is typically better on smaller servers
- in the end higher cost
- at some point you reach the limit, i.e. you can only scale-up so much
- you can start small with limited investment, and invest incrementally as you grow, reusing existing servers
- you can choose hardware with the optimal cost/performance
- more complicated management
- you need to manage data and consistency across multiple servers, typically by making the application/middle tier aware of the data distribution and server roles in your scale-out (choosing MySQL Cluster for scale-out does not incur this con, only if you choose a more traditional master-slave MySQL setup)
Q9. How can you obtain scalability and high-performance with Big Data and at the same time offer SQL joins?
Tomas Ulin: Based on estimates from leading Hadoop vendors, around 80 percent of their deployments are integrated with MySQL.
As discussed above, there has been significant development in NoSQL APIs to InnoDB and MySQL Cluster storage engines which allow high speed ingestion of high velocity data Key/Value data, but which also allows complex queries, including JOIN operations to run across that same data set using SQL.
Technologies like Apache Sqoop are commonly used to load data to and from MySQL and Hadoop, so many users will JOIN structured, relational data from MySQL with unstructured data such as clickstreams within Map/Reduce processes within Hadoop. We are also working on our Binlog API to enable real time CDC with Hadoop. More on the Binlog API is here.
Q10. Talking about scalability and performance what are the main differences if the database is stored on hard drives, SAN, flash memory (Flashcache)? What happens when data does not fit in DRAM?
Tomas Ulin: The answer depends on your workload.
The most important question is: “how long is your active data set remaining cached?” — if not long at all, then it means your activity will remain I/O-bound, and having faster storage here will help. But, if the “active data set” is small enough to be cached most of the time, the impact of the faster storage will not be as noticeable. MySQL 5.6 delivers many changes improving performance on heavy I/O-bound workloads.
Mr. Tomas Ulin has been working with the MySQL Database team since 2003. He is Vice President for the MySQL Engineering team, responsible for the development and maintenance of the MySQL related software products within Oracle, such as the MySQL Server, MySQL Cluster, MySQL Connectors, MySQL Workbench, MySQL Enterprise Backup, and MySQL Enterprise Monitor. Prior to working with MySQL his background was in the telecom industry, working for the Swedish telecom operator Telia and Telecom vendor Ericsson. He has a Masters degree in Computer Science and Applied Physics from Case Western Reserve University and a PhD in Computer Science from the Royal Institute of Technology.
ODBMS.org: Relational Databases, NewSQL, XML Databases, RDF Data Stores:
Blog Posts | Free Software | Articles and Presentations| Lecture Notes | Tutorials| Journals |
Follow us on Twitter: @odbmsorg