A super-set of MySQL for Big Data. Interview with John Busch, Schooner.
“Legacy MySQL does not scale well on a single node, which forces granular sharding and explicit application code changes to make them sharding-aware and results in low utilization of servers” – Dr. John Busch, Schooner Information Technology
A super-set of MySQL suitable for Big Data? On this subject, I have interviewed Dr. John Busch, Founder, Chairman, and CTO of Schooner Information Technology.
Q1. What are the limitations of MySQL when handling Big Data?
John Busch: Legacy MySQL does not scale well and uses single-threaded asynchronous replication. Its poor scaling forces granular sharding across many servers and explicit application code changes to make them sharding-aware. Its single-threaded asynchronous replication results in slave lag and data inconsistency, and it requires complex manual fail-over with downtime and data loss. The net result is low utilization of servers, server sprawl, limited service availability, limited data integrity, and complex programming and administration. These are all serious problems when handling Big Data.
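To make "sharding-aware application code" concrete, here is a minimal sketch of the kind of routing logic an application has to carry when sharding is explicit. The shard names, the `orders` table, and the helper functions are all hypothetical and chosen only for illustration; they do not come from the interview or from any Schooner product.

```python
import zlib
from typing import Tuple

# Hypothetical shard list; the names are illustrative only.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]


def shard_for(user_id: int, shards=SHARDS) -> str:
    """Pick a shard by hashing the key. With explicit sharding,
    the application has to do this before every query."""
    digest = zlib.crc32(str(user_id).encode("utf-8"))
    return shards[digest % len(shards)]


def route_query(user_id: int) -> Tuple[str, str, tuple]:
    """Return (shard, sql, params). The caller is responsible for
    sending the statement to the connection for that shard."""
    sql = "SELECT * FROM orders WHERE user_id = %s"  # hypothetical table
    return shard_for(user_id), sql, (user_id,)
```

Every query path in the application needs this extra step, and changing the number of shards changes where existing keys map, which is part of the programming and administration burden described above.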
Q2. How is SchoonerSQL different with respect to Oracle/MySQL (5.1, 5.5, and the future 5.6)?
John Busch: Schooner licensed the source for MySQL and InnoDB directly from Oracle, with the right to enhance it in a compatible manner. Schooner made fundamental and extensive architectural and resource management advances to MySQL/InnoDB in order to make it enterprise class. SchoonerSQL fully exploits today’s commodity multi-core servers, flash memory, and high-performance networking while dramatically improving performance, availability, scalability, and cost of ownership relative to MySQL 5.X. SchoonerSQL advances include:
- very high thread-level parallelism with granular concurrency control and highly parallel DRAM <-> flash memory hierarchy management, enabling linear vertical scaling as a function of processor cores;
- tightly integrated (DRAM to DRAM) synchronous replication, coupled with fully parallel asynchronous replication, with automated fail-over within and between data centers, enabling the highest levels of availability with no data loss; and
- transparent, workload-aware, relational sharding with DBShards, enabling unlimited high-performance horizontal scaling.
SchoonerSQL is a super-set of Oracle MySQL/InnoDB 5.1/5.5/5.6, providing 100% compatibility for applications and data, while delivering order of magnitude improvements in availability, scalability, performance and cost of ownership.
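The difference between synchronous and asynchronous replication on fail-over can be illustrated with a toy model. This is only a sketch under simplifying assumptions (one primary, one replica, in-memory logs); the class and function names are hypothetical and it does not represent SchoonerSQL's or MySQL's actual replication code.

```python
class Replica:
    def __init__(self):
        self.log = []

    def apply(self, entry):
        self.log.append(entry)


class Primary:
    def __init__(self, replica, synchronous):
        self.replica = replica
        self.synchronous = synchronous
        self.log = []
        self.pending = []  # committed but not yet shipped (async mode only)

    def commit(self, entry):
        self.log.append(entry)
        if self.synchronous:
            # Synchronous: the replica has the entry before commit returns.
            self.replica.apply(entry)
        else:
            # Asynchronous: the entry is shipped later, so the replica lags.
            self.pending.append(entry)

    def ship_pending(self):
        while self.pending:
            self.replica.apply(self.pending.pop(0))


def lost_on_failover(primary):
    """Entries committed on the primary that the replica never received."""
    return [e for e in primary.log if e not in primary.replica.log]
```

If the primary fails right after two asynchronous commits and before `ship_pending()` runs, both entries are missing on the replica; in synchronous mode the same scenario loses nothing.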
Q3. How can SchoonerSQL achieve high availability (HA) with performance, scalability and at a reasonable cost?
John Busch: In the past, major trade-offs were required between performance, availability and TCO (total cost of ownership). Today’s commodity multi-core servers, flash memory, and high-speed networking, coupled with new database architectures and resource management algorithms, make it possible to achieve radical improvements in performance, scalability, availability, data integrity, and cost of ownership all at once. SchoonerSQL innovations incorporate fundamental database architecture and resource management advances, including:
· Linear vertical scaling, which fully utilizes modern commodity multi-core servers, providing 10:1 consolidation and capital and operating expense reduction;
· Unlimited horizontal scaling, which allows support of very large databases with high performance, high availability, and low cost of ownership using commodity hardware and standard SQL; and
· High-performance synchronous and parallel asynchronous replication with automated fail-over, which provides 99.999% HA with full data integrity and no loss of performance.
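For readers unfamiliar with "five nines", the 99.999% figure can be turned into a downtime budget with simple arithmetic. The sketch below assumes an average year of 365.25 days; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes in an average year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Yearly downtime budget implied by an availability percentage."""
    return MINUTES_PER_YEAR * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    for pct in (99.9, 99.99, 99.999):
        print(f"{pct}% -> {allowed_downtime_minutes(pct):.2f} minutes/year")
```

At 99.999%, the budget is roughly 5.26 minutes of downtime per year, which is why manual fail-over procedures cannot meet such a target.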
Q4. What is your relationship with Oracle/MySQL?
John Busch: Schooner is an Oracle gold partner, and an OEM and go-to-market partner of Oracle. Schooner licensed the source for MySQL and InnoDB directly from Oracle and developed SchoonerSQL, which is completely compatible with Oracle’s MySQL. SchoonerSQL is an enterprise class database, and is targeted for customers requiring a mission-critical database. SchoonerSQL provides an order of magnitude improvement in performance, availability, scalability, and cost of ownership relative to Oracle’s MySQL 5.X.
Q5. What is special about SchoonerSQL’s transparent sharding?
John Busch: Beyond SchoonerSQL’s linear vertical scaling and clustering, which enables high-performance and high availability support of multi-terabyte databases, SchoonerSQL offers optional transparent relational sharding with DBShards to enable horizontal scaling across nodes for unlimited sized databases and unlimited scaling. SchoonerSQL’s DBShards relational transparent sharding is application-aware, based on analysis and optimization for the query and data access behavior of the specific customer workload.
Based on observed workload behavior, it optimally partitions the data across nodes and transparently replicates supporting data structures to eliminate cross-node communication to accomplish query execution.
Q6. How can you obtain scalability and high-performance with Big Data and at the same time offer SQL joins?
John Busch: SchoonerSQL’s DBShards relational transparent sharding optimally replicates supporting data structures used in dynamic queries. This is done on a workload-specific basis, based on dynamic query and data access patterns. As a result, there is no cross-node communication required to execute queries, and in particular for SQL joins, the data is fully coalesced at the client with Schooner libraries transparently invoking the involved nodes.
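The general idea described here (partition the large table, replicate the small supporting table to every shard, run the join locally on each shard, and merge at the client) can be sketched with two in-memory SQLite databases standing in for shards. This is an illustrative toy, not DBShards' actual implementation; the tables, data, and function names are all assumptions.

```python
import sqlite3

# Small supporting table, replicated in full to every shard.
PRODUCTS = [(1, "widget"), (2, "gadget")]
# Large table, partitioned across shards.
ORDERS = [(10, 1, 3), (11, 1, 5), (12, 2, 1), (13, 2, 2)]

JOIN_SQL = """
    SELECT p.name, SUM(o.qty)
    FROM orders o JOIN products p ON p.id = o.product_id
    GROUP BY p.name
"""


def make_shard(orders):
    """Create one in-memory shard: a slice of orders plus all products."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
    db.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, product_id INTEGER, qty INTEGER)"
    )
    db.executemany("INSERT INTO products VALUES (?, ?)", PRODUCTS)
    db.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    return db


# Partition orders across two shards by order id parity.
shards = [make_shard([o for o in ORDERS if o[0] % 2 == i]) for i in range(2)]


def totals_by_product(shards):
    """Run the join locally on each shard, then coalesce at the client."""
    totals = {}
    for db in shards:
        for name, qty in db.execute(JOIN_SQL):
            totals[name] = totals.get(name, 0) + qty
    return totals
```

Because each shard holds a full copy of `products`, the join never has to fetch rows from another shard; the only cross-shard work is the final merge in `totals_by_product`, which happens on the client side.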
Q7. What is your take on MariaDB?
John Busch: MariaDB is trying to offer an alternative to MySQL and to Oracle.
SchoonerSQL is focused on offering a superior MySQL for mission-critical applications and services, with 100% MySQL compatibility, vastly superior performance, availability, scalability and TCO, all in partnership with the Oracle corporation.
Q8. You also offer a Memcached-based product (Membrain). Why Membrain?
John Busch: Membrain is a very high-performance and very high availability scalable key-value store supporting the memcached protocol. Schooner’s experience in the market is that SchoonerSQL and Membrain are both required and very complementary.
Schooner Membrain provides high performance, scalability, and high availability with low TCO for unstructured data based on fully exploiting flash memory and multi-core servers with synchronous replication and transparent fail-over. SchoonerSQL provides high performance, scalability, and high availability with low TCO for structured data.
Q9. Talking about scalability and performance what are the main differences if the database is stored on hard drives, SAN, flash memory (Flashcache)? What happens when data does not fit in DRAM?
John Busch: SchoonerSQL and Schooner Membrain are designed to fully exploit the high IOPS of flash memory and SANs and the cores of today’s commodity servers. With SchoonerSQL, the performance when executing out of flash or SAN is almost the same as if everything were executing from DRAM memory, enabling the full utilization of today’s commodity multi-core servers with very high vertical scaling and consolidation at low cost.
SchoonerSQL also provides significant performance improvements with disc storage and flash cache. Measurements based on standard benchmarks show that Schooner offers much higher performance, consolidation, and scalability than any other MySQL or NoSQL product, and SchoonerSQL does this with 99.999% availability and much lower cost of ownership relative to legacy MySQL 5.X or other NoSQL offerings.
Q10. How do you differentiate yourselves from other NoSQL vendors (Key/Value stores, document-based databases and similar NoSQL databases)?
John Busch: SchoonerSQL and Membrain are unique in the industry. Schooner has over 20 filed patents on its advances in database/data store architecture and resource management.
SchoonerSQL and Membrain deliver order of magnitude improvements in performance, scalability, availability, and cost of ownership relative to any other MySQL or NoSQL, while maintaining 100% SQL and memcached compatibility.
Q11. What is your take on a database such as VoltDB?
John Busch: VoltDB is a large DRAM-only database. DRAM is expensive and volatile.
Schooner effectively utilizes parallel flash memory in a tightly integrated architecture, effectively exploiting flash, DRAM, multi-core, and multi-node scalability and availability. This results in superior cost of ownership and availability relative to VoltDB while providing high performance and unlimited scalability, all with 100% SQL compatibility.
Proliferation of Analytics
Q12. A/B testing, sessionization, bot detection, and pathing analysis all require powerful analytics on many petabytes of semi-structured Web data. How do you handle big semi-structured data?
John Busch: As we discussed above, SchoonerSQL provides optimized vertical scaling and clustering coupled with DBShards transparent relational horizontal scaling. This enables queries to be performed on unlimited semi-structured datasets, with 99.999% high availability, full data integrity, and the minimal number of commodity servers.
Q13. How do you see converging data from multiple data sources, both structured and unstructured?
John Busch: Today, SchoonerSQL and Schooner Membrain are often used in conjunction to provide support for both unstructured and structured data. There are also emerging standards in interfacing heterogeneous structured and unstructured data stores. Schooner believes these are very important, and will contribute to and support these standards in our products.
Q14. Does it make sense to use Apache Hadoop, MapReduce and MySQL together?
John Busch: Hadoop/MapReduce provide a new distributed computational model which is very appropriate for certain application classes, and which is receiving industry acceptance and traction.
Schooner intends to enhance our products to provide exceptional interoperability with Hadoop so that customers can use these products in conjunction in delivering their services.
Dr. John Busch is the Founder, Chairman, and CTO of Schooner Information Technology.
Prior to Schooner, John was director of computer system architecture at Sun Microsystems Laboratories from 1999 through 2006. In this role, John led research in multi-core processors, multi-tier scale-out architectures, and advanced high-performance computer systems. John received the President’s Award for Innovation at Sun.
Prior to Sun, John was VP of engineering and business partnerships with Diba, Inc., co-founder, CTO and VP of engineering of Clarity Software, and director of computer systems R&D at Hewlett Packard.
John holds a Ph.D. in Computer Science from UCLA, an M.A. in Mathematics from UCLA, an M.S. in Computer Science from Stanford University, and attended the Sloan Program at Stanford.