{"id":4218,"date":"2016-08-31T03:33:42","date_gmt":"2016-08-31T03:33:42","guid":{"rendered":"http:\/\/www.odbms.org\/blog\/?p=4218"},"modified":"2016-08-31T03:33:42","modified_gmt":"2016-08-31T03:33:42","slug":"database-challenges-and-innovations-interview-with-jim-starkey","status":"publish","type":"post","link":"https:\/\/www.odbms.org\/blog\/2016\/08\/database-challenges-and-innovations-interview-with-jim-starkey\/","title":{"rendered":"Database Challenges and Innovations.  Interview with Jim Starkey"},"content":{"rendered":"<blockquote><p><strong>&#8220;Isn\u2019t it ironic that in 2016 a non-skilled user can find a web page from Google\u2019s untold petabytes of data in millisecond time, but a highly trained SQL expert can\u2019t do the same thing in a relational database one billionth the size?&#8221; &#8211; Jim Starkey<\/strong><\/p><\/blockquote>\n<p>I have interviewed <strong>Jim Starkey<\/strong>. A database legend<i>,\u00a0<\/i>Jim\u2019s career as an entrepreneur, architect, and innovator spans more than three decades of database history.<\/p>\n<p>RVZ<\/p>\n<p><strong>Q1. In your opinion, what are the most significant advances in databases in the last few years?<\/strong><\/p>\n<p><strong>Jim Starkey: <\/strong>I\u2019d have to say the \u201catom programming model\u201d where a database is layered on a substrate of peer-to-peer replicating distributed objects rather than disk files. The atom programming model enables scalability, redundancy, high availability, and distribution not available in traditional, disk-based database architectures.<\/p>\n<p><strong>Q2. What was your original motivation to invent the NuoDB Emergent Architecture?<\/strong><\/p>\n<p><strong>Jim Starkey: <\/strong>It all grew out of a long Sunday morning shower. 
I knew that the performance limits of single-computer database systems were in sight, so distributing the load was the only possible solution, but existing distributed systems required that a new node copy a complete database or partition before it could do useful work. I started thinking of ways to attack this problem and came up with the idea of <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/en.wikipedia.org\/wiki\/Peer-to-peer');\"  href=\"https:\/\/en.wikipedia.org\/wiki\/Peer-to-peer\" target=\"_blank\">peer to peer<\/a> replicating <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/en.wikipedia.org\/wiki\/Distributed_object');\"  href=\"https:\/\/en.wikipedia.org\/wiki\/Distributed_object\" target=\"_blank\">distributed objects<\/a> that could be serialized for network delivery and persisted to disk. It was a pretty neat idea. I came out much later with the core architecture nearly complete and very wrinkled (we have an awesome domestic hot water system).<\/p>\n<p><strong>Q3. In your career as an entrepreneur and architect what was the most significant innovation you did?<\/strong><\/p>\n<p><strong>Jim Starkey: <\/strong>Oh, clearly <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/en.wikipedia.org\/wiki\/Multiversion_concurrency_control');\"  href=\"https:\/\/en.wikipedia.org\/wiki\/Multiversion_concurrency_control\" target=\"_blank\">multi-generational concurrency control<\/a> (MVCC). The problem I was trying to solve was allowing ad hoc access to a production database for a 4GL product I was working on at the time, but the ramifications go far beyond that. MVCC is the core technology that makes true <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/en.wikipedia.org\/wiki\/Distributed_database');\"  href=\"https:\/\/en.wikipedia.org\/wiki\/Distributed_database\" target=\"_blank\">distributed database systems<\/a> possible. 
<a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/en.wikipedia.org\/wiki\/Serializability');\"  href=\"https:\/\/en.wikipedia.org\/wiki\/Serializability\" target=\"_blank\">Transaction serialization<\/a> is like Newtonian physics \u2013 all observers share a single universal reference frame. MVCC is like special relativity, where each observer views the universe from his or her reference frame. The views appear different but are, in fact, consistent.<\/p>\n<p><strong>Q4. Proprietary vs. open source software: what are the pros and cons?<\/strong><\/p>\n<p><strong>Jim Starkey: <\/strong>It\u2019s complicated. I\u2019ve had feet in both camps for 15 years. But let\u2019s draw a distinction between open source and open development. Open development \u2013 where anyone can contribute &#8211; is pretty good at delivering implementations of established technologies, but it\u2019s very difficult to push the state of the art in that environment. Innovation, in my experience, requires focus, vision, and consistency that are hard to maintain in open development. If you have a controlled development environment, the question of open source versus proprietary is tactics, not philosophy. Yes, there\u2019s an argument that having the source available gives users guarantees they don\u2019t get from proprietary software, but with something as complicated as a database, most users aren\u2019t going to try to master the sources. But having source available lowers the perceived risk of new technologies, which is a big plus.<\/p>\n<p><strong>Q5. You led the Falcon project &#8211; a transactional storage engine for the MySQL server &#8211; through the acquisition of MySQL by Sun Microsystems. 
What impact did this project have in the database space?<\/strong><\/p>\n<p><strong>Jim Starkey: <\/strong>In all honesty, I\u2019d have to say that <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/en.wikipedia.org\/wiki\/Falcon_(storage_engine)');\"  href=\"https:\/\/en.wikipedia.org\/wiki\/Falcon_(storage_engine)\" target=\"_blank\">Falcon<\/a>\u2019s most important contribution was its competition with <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/en.wikipedia.org\/wiki\/InnoDB');\"  href=\"https:\/\/en.wikipedia.org\/wiki\/InnoDB\" target=\"_blank\">InnoDB<\/a>. In the end, that competition made InnoDB three times faster. Falcon, multi-version in memory using the disk for backfill, was interesting, but no matter how we cut it, it was limited by the performance of the machine it ran on. It was fast, but no single-node database can be fast enough.<\/p>\n<p><strong>Q6. What are the most challenging issues in databases right now?<\/strong><\/p>\n<p><strong>Jim Starkey: <\/strong>I think it\u2019s time to step back and reexamine the assumptions that have accreted around database technology \u2013 data model, API, access language, data semantics, and implementation architectures. The \u201c<a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/en.wikipedia.org\/wiki\/Relational_model');\"  href=\"https:\/\/en.wikipedia.org\/wiki\/Relational_model\" target=\"_blank\">relational model<\/a>\u201d, for example, is based on what <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/en.wikipedia.org\/wiki\/Edgar_F._Codd');\"  href=\"https:\/\/en.wikipedia.org\/wiki\/Edgar_F._Codd\" target=\"_blank\">Codd<\/a> called relations and we call tables, but otherwise has nothing to do with his mathematical model. That model, based on set theory, requires automatic duplicate elimination. 
To the best of my knowledge, nobody ever implemented Codd\u2019s model, but we still have tables which bear a scary resemblance to decks of punch cards. Are they necessary? Or do they just get in the way?<br \/>\nIsn\u2019t it ironic that in 2016 a non-skilled user can find a web page from Google\u2019s untold petabytes of data in millisecond time, but a highly trained SQL expert can\u2019t do the same thing in a relational database one billionth the size? SQL has no provision for flexible text search, no provision for multi-column, multi-table search, and no mechanics in the APIs to handle the results if it could do them. And this is just one of a dozen problems that SQL databases can\u2019t handle. It was a really good technical fit for computers, memory, and disks of the 1980s, but is it the right answer now?<\/p>\n<p><strong>Q7. How do you see the database market evolving?<\/strong><\/p>\n<p><strong>Jim Starkey: <\/strong>I\u2019m afraid my crystal ball isn\u2019t that good. <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/en.wikipedia.org\/wiki\/Binary_large_object');\"  href=\"https:\/\/en.wikipedia.org\/wiki\/Binary_large_object\" target=\"_blank\">Blobs<\/a>, another of my creations, spread throughout the industry in two years. MVCC took 25 years to become ubiquitous. I have a good idea of where I think it should go, but little expectation of how or when it will.<\/p>\n<p><strong>Qx. Anything else you wish to add?<\/strong><\/p>\n<p><strong>Jim Starkey: <\/strong>Let me say a few things about my current project, AmorphousDB, an implementation of the Amorphous Data Model (meaning, no data model at all). AmorphousDB is my modest effort to question everything database.<br \/>\nThe best way to think about Amorphous is to envision a relational database and mentally erase the boxes around the tables so all records free float in the same space \u2013 including data and metadata. 
Then, if you\u2019re uncomfortable, add back a \u201crecord type\u201d attribute and associated syntactic sugar, so table-type semantics are available, but optional. Then abandon punch card data semantics and view all data as abstract and subject to search. Eliminate the fourteen different types of numbers and strings, leaving simply numbers and strings, but add useful types like URL\u2019s, email addresses, and money. Index everything unless told not to. Finally, imagine an API that fits on a single sheet of paper (OK, 9 point font, both sides) and an implementation that can span hundreds of nodes. That\u2019s AmorphousDB.<\/p>\n<p>&#8212;&#8212;&#8212;&#8212;<br \/>\n<strong>Jim Starkey <\/strong><em>invented the NuoDB Emergent Architecture, and developed the initial implementation of the product. He founded <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.nuodb.com');\"  href=\"http:\/\/www.nuodb.com\" target=\"_blank\">NuoDB<\/a> [formerly NimbusDB] in 2008, and retired at the end of 2012, shortly before the NuoDB product launch.<\/em><\/p>\n<p><em>Jim\u2019s career as an entrepreneur, architect, and innovator spans more than three decades of database history from the Datacomputer project on the fledgling ARPAnet to his most recent startup, NuoDB, Inc. Through the period, he has been<\/em><br \/>\n<em> responsible for many database innovations from the date data type to the BLOB to multi-version concurrency control (MVCC). Starkey has extensive experience in proprietary and open source software.<\/em><\/p>\n<p><em>Starkey joined Digital Equipment Corporation in 1975, where he created the Datatrieve family of products, the DEC Standard Relational Interface architecture, and the first of the Rdb products, Rdb\/ELN. Starkey was also software architect for DEC\u2019s database machine group.<\/em><\/p>\n<p><em>Leaving DEC in 1984, Starkey founded Interbase Software to develop relational database software for the engineering workstation market. 
Interbase was a technical leader in the database industry producing the first commercial implementations of heterogeneous networking, blobs, triggers, two phase commit, database events, etc. Ashton-Tate acquired Interbase Software in 1991, and was, in turn, acquired by Borland International a few months later. The Interbase database engine was released open source by Borland in 2000 and became the basis for the Firebird open source database project.<\/em><\/p>\n<p><em>In 2000, Starkey founded Netfrastructure, Inc., to build a unified platform for distributable, high quality Web applications. The Netfrastructure platform included a relational database engine, an integrated search engine, an integrated Java virtual machine, and a high performance page generator.<\/em><\/p>\n<p><em>MySQL, AB, acquired Netfrastructure, Inc. in 2006 to be the kernel of a wholly owned transactional storage engine for the MySQL server, later known as Falcon. Starkey led the Falcon project through the acquisition of MySQL by Sun Microsystems.<\/em><\/p>\n<p><em>Jim has a degree in Mathematics from the University of Wisconsin.<\/em><br \/>\n<em> For amusement, Jim codes on weekends, while sailing, but not while flying his plane.<\/em><\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<\/p>\n<p><strong>Resources<\/strong><\/p>\n<p>&#8211; <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/go.nuodb.com\/rs\/nuodb\/images\/Greenbook_Final.pdf');\"  href=\"http:\/\/go.nuodb.com\/rs\/nuodb\/images\/Greenbook_Final.pdf\" target=\"_blank\">NuoDB Emergent Architecture (.PDF)<\/a><\/p>\n<p>&#8211; <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.odbms.org\/blog\/2015\/03\/interview-seth-proctor\/');\"  href=\"http:\/\/www.odbms.org\/blog\/2015\/03\/interview-seth-proctor\/\" target=\"_blank\">On Database Resilience. 
Interview with Seth Proctor, ODBMs Industry Watch,\u00a0March 17, 2015<\/a><\/p>\n<p><strong>Related Posts<\/strong><\/p>\n<p>&#8211;\u00a0<a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.odbms.org\/blog\/2015\/10\/challenges-and-opportunities-of-the-internet-of-things-interview-with-steve-cellini\/');\"  href=\"http:\/\/www.odbms.org\/blog\/2015\/10\/challenges-and-opportunities-of-the-internet-of-things-interview-with-steve-cellini\/\" target=\"_blank\">Challenges and Opportunities of The Internet of Things. Interview with Steve Cellini, ODBMS Industry Watch,\u00a0October 7, 2015<\/a><\/p>\n<p>&#8211;\u00a0<a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.odbms.org\/2016\/05\/hands-on-with-nuodb-and-docker\/');\"  href=\"http:\/\/www.odbms.org\/2016\/05\/hands-on-with-nuodb-and-docker\/\" target=\"_blank\">Hands-On with NuoDB and Docker, BY\u00a0MJ Michaels, NuoDB. ODBMS.org&#8211;\u00a0OCT 27 2015<\/a><\/p>\n<p>&#8211;\u00a0<a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.odbms.org\/2016\/01\/how-leading-operational-dbmss-rank-popularity-wise\/');\"  href=\"http:\/\/www.odbms.org\/2016\/01\/how-leading-operational-dbmss-rank-popularity-wise\/\" target=\"_blank\">How leading Operational DBMSs rank popularity wise?\u00a0By\u00a0Michael Waclawiczek&#8211; ODBMS.org\u00a0\u00b7 JANUARY 27, 2016<\/a><\/p>\n<p>&#8211;\u00a0<a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.odbms.org\/2015\/12\/a-glimpse-into-u-sql\/');\"  href=\"http:\/\/www.odbms.org\/2015\/12\/a-glimpse-into-u-sql\/\" target=\"_blank\">A Glimpse into U-SQL\u00a0BY\u00a0Stephen Dillon,\u00a0<em>Schneider Electric, <\/em>ODBMS.org-DECEMBER 7, 2015<\/a><\/p>\n<p>&#8211;\u00a0<a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.odbms.org\/2015\/10\/gartner-magic-quadrant-for-operational-dbms\/');\"  href=\"http:\/\/www.odbms.org\/2015\/10\/gartner-magic-quadrant-for-operational-dbms\/\" target=\"_blank\">Gartner 
Magic Quadrant for Operational DBMS 2015<\/a><\/p>\n<p><strong>Follow us on Twitter: <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/twitter.com\/odbmsorg');\"  href=\"https:\/\/twitter.com\/odbmsorg\" target=\"_blank\">@odbmsorg<\/a><\/strong><\/p>\n<p>##<\/p>\n","protected":false},"excerpt":{"rendered":"<p>&#8220;Isn\u2019t it ironic that in 2016 a non-skilled user can find a web page from Google\u2019s untold petabytes of data in millisecond time, but a highly trained SQL expert can\u2019t do the same thing in a relational database one billionth the size?.&#8211;Jim Starkey. I have interviewed Jim Starkey. A database legend,\u00a0Jim\u2019s career as an entrepreneur, [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[998,66,997,995,224,310,395,406,412,413,415,446,490,996,549],"_links":{"self":[{"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/posts\/4218"}],"collection":[{"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/comments?post=4218"}],"version-history":[{"count":3,"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/posts\/4218\/revisions"}],"predecessor-version":[{"id":4228,"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/posts\/4218\/revisions\/4228"}],"wp:attachment":[{"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/media?parent
=4218"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/categories?post=4218"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/tags?post=4218"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}