The evolving market for NoSQL Databases: Interview with James Phillips.
“It is possible we will see standards begin to emerge, both in on-the-wire protocols and perhaps in query languages, allowing interoperability between NoSQL database technologies similar to the kind of interoperability we’ve seen with SQL and relational database technology.” — James Phillips.
In my understanding of how the market of Data Management Platforms is evolving, I have identified three phases:
Phase I– New Proprietary data platforms developed: Amazon (Dynamo), Google (BigTable). Both systems remained proprietary and are in use by Amazon and Google.
Phase II– The advent of Open Source Developments: Apache projects such as Cassandra, Hadoop (MapReduce, Hive, Pig). Facebook and Yahoo! played major roles. Multitude of new data platforms emerge.
Phase III– Evolving Analytical Data Platforms. Hadoop for analytics. Companies such as Cloudera, and also IBM’s BigInsights, are in this space.
I wanted to learn more about the background of Phases I and II. I interviewed James Phillips, who co-founded Membase and, since last month, is Co-Founder and Senior Vice President of Couchbase, the new company that originated from the merger of Membase and CouchOne.
Phase I– New Proprietary data platforms developed: Amazon (Dynamo), Google (BigTable). Both systems remained proprietary and are in use by Amazon and Google.
Q1. Why did Amazon and Google have to develop their own database systems? Why didn’t they use or “adjust” existing database systems?
James Phillips: Existing relational database technologies were a poor match for the flexibility, performance, scaling and cost requirements of these organizations. It wasn’t enough to simply “adjust” relational database technology; a wholesale rethinking was required. See our (non-marketing 🙂) white paper for a detailed look at why that was the case.
Phase II– The advent of Open Source Developments: Apache projects such as Cassandra, Hadoop (MapReduce, Hive, Pig). Facebook and Yahoo! played major roles. Multitude of new data platforms emerge.
Q2. How was it possible that Amazon and Google’s proprietary systems were used as input for Open Source Projects?
James Phillips: Amazon and Google published academic papers [e.g. Google BigTable, Amazon Dynamo] highlighting the design of many of their data management technologies. These papers inspired the creation of a number of open source software projects.
The Apache Cassandra open source project, initially developed at Facebook, has been described by Jeff Hammerbacher, who led the Facebook Data team at the time, as a BigTable data model running on an Amazon Dynamo-like infrastructure.
Key Apache Hadoop project technologies (initially developed by Doug Cutting while employed at Yahoo!) were heavily influenced by published papers describing the Google File System and MapReduce technologies.
Q3. Why did Facebook and Yahoo! develop Open Source data platforms rather than proprietary software?
James Phillips: Obviously I can’t answer why another company made a business decision, because I wasn’t involved in the decisions.
But one can make a reasonable guess as to why these companies would do this: Facebook is not in the database business; they are a social networking company.
They would probably have taken and used Dynamo and/or BigTable had they been open-sourced by Google and/or Amazon.
But they weren’t, and Facebook was forced to write Cassandra.
By open-sourcing the technology, Facebook could ostensibly benefit from the advancement of that technology by the open source community: new features, increased performance, bug fixes and other community-driven value.
Assuming rational behavior, one can reasonably infer that the potential value of these community-driven benefits was deemed greater than the perceived cost of possibly “arming” a potential social networking competitor with data management technology, of having to fully maintain the technology themselves, or of any sort of liability associated with open-sourcing the software.
There is also some recruiting value to be gained for companies like Facebook: by demonstrating that they are developing leading-edge technology, solving hard computer science problems, etc., they are able to attract the best and brightest minds to the company.
Q4. Not everyone has data and scalability requirements such as Amazon and Google. Who currently needs such new data management platforms, and why?
James Phillips: Our white paper covers this in great detail. While not everyone has the scalability requirements, anyone building web applications (and who isn’t?) needs the flexibility and cost advantages these solutions deliver.
Google and Amazon had the biggest pain initially, driving the innovation, but everyone benefits now. Velcro was invented to solve problems in space travel. Not everyone struggles with those problems. But use cases for Velcro are still being discovered.
Q5. Is there a common denominator between the business models built around such open source projects?
James Phillips: The only common denominator between business models related to open source software is open source software. There are support, product, services, training and countless other offerings that are derived from, based on or related to open source software projects.
Q6. Last month Membase and CouchOne merged to form Couchbase. What is the reasoning for this merger?
James Phillips: Prior to merging, Membase and CouchOne had focused on different layers of the NoSQL database technology stack:
– Membase had focused on distributed caching, cluster management and high-performance memory-to-disk data flows.
– CouchOne had concentrated on advanced indexing, document-oriented operations, real-time map-reduce, replication and support for mobile/web application synchronization.
There were many Membase customers asking for features of the CouchOne platform, and vice versa. This merger has allowed each of us to eliminate roughly 18 months of redundant R&D we would have done separately. We’ve effectively doubled the size of our engineering team and eliminated 2 net years of work, allowing us to get better products to market more quickly and focus on innovating versus duplicating functionality.
Q7. Technically speaking, do you plan to “merge” the two products into one? If you do not “merge” the two products, what do you plan to do instead?
James Phillips: Yes, Elastic Couchbase is a new product we will introduce this summer; it will combine technologies from Membase and CouchOne.
Q8. What happens to existing customers who are using the respective Membase and CouchOne products?
James Phillips: We will continue to support customers using existing Membase and CouchOne products, while providing a seamless upgrade path for customers who want to migrate to Elastic Couchbase. Couch customers who migrate get a higher-performance, elastic version of CouchDB. Membase customers who migrate gain the ability to index and query data stored in the cluster.
Q9. How do you see the NoSQL market evolving in the next 12 months?
James Phillips: It is possible we will see standards begin to emerge, both in on-the-wire protocols and perhaps in query languages, allowing interoperability between NoSQL database technologies similar to the kind of interoperability we’ve seen with SQL and relational database technology. It would not be surprising to see additional consolidation as well.
James Phillips, Co-Founder and Senior Vice President, Couchbase.
A twenty-five-year veteran of the software industry, James Phillips started his career writing software for the Apple II and TRS-80 microcomputer platforms. In 1984, at age 17, he co-founded his first software company, Fifth Generation Systems, which was acquired by Symantec in 1993, forming the foundation of Symantec’s PC backup software business.
Most recently, James was co-founder and CEO of Akimbi Systems, a venture-backed software company acquired by VMware in 2006. Book-ended by these entrepreneurial successes, James has held executive leadership roles in software engineering, product management, marketing and corporate development at large public companies including Intel, Synopsys and Intuit and with venture-backed software startups including Central Point Software (acquired by Symantec), Ensim and Actional Corporation (acquired by Progress Software). Additionally, James spent two years as a technology investment banker with PaineWebber and Robertson Stephens and Co., delivering M&A advisory services to software companies.
James holds a BS in Mathematics and earned his MBA, with honors, from the University of Chicago.
He currently serves on the board of directors of Teneros and as an investor in and advisor to a number of privately-held software companies including Delphix, Replay Solutions and Virsto.
For Further Reading
NoSQL Database Technology.
White Paper, Couchbase.
This white paper introduces the basic concepts of NoSQL Database technology and compares them with RDBMS.
Paper | Introductory | English | PDF | March 2011
– “Marrying objects with graphs”: Interview with Darren Wood.
– “Distributed joins are hard to scale”: Interview with Dwight Merriman.
– On Graph Databases: Interview with Daniel Kirstenpfad.
– Interview with Rick Cattell: There is no “one size fits all” solution.