On PostgreSQL. Interview with Tom Kincaid.
“Application designers need to start by thinking about what level of data integrity they need, rather than what they want, and then design their technology stack around that reality. Everyone would like a database that guarantees perfect availability, perfect consistency, instantaneous response times, and infinite throughput, but it´s not possible to create a
product with all of those properties”–Tom Kincaid.
What is new with PostgreSQL? I have Interviewed Tom Kincaid, head of Products and Engineering at EnterpriseDB.
(Tom prepared the following responses with contributions from the EnterpriseDB development team)
Q1. EnterpriseDB products are based upon PostgreSQL. What is special about your product offering?
Tom Kincaid: EnterpriseDB has integrated many enterprise features and performance enhancements into the core PostgreSQL code to create a database with the lowest possible TCO and provide the “last mile” of service needed by enterprise database users.
EnterpriseDB´s Postgres Plus software provides the performance, security and Oracle compatibility needed to address a range of enterprise business applications. EnterpriseDB´s Oracle compatibility, also integrated into the PostgreSQL code base, allows many Oracle shops to realize a much lower database TCO while utilizing their Oracle skills and applications designed to work against Oracle databases.
EnterpriseDB also creates enterprise-grade tools around PostgreSQL and Postgres Plus Advanced Server for use in large-scale deployments. They are Postgres Enterprise Manager, a powerful management console for managing, monitoring and tuning databases en masse whether they´re PostgreSQL community version or EnterpriseDB´s enhanced Postgres Plus Advanced Server; xDB Replication Server with multi-master replication and replication between Postgres, Oracle and SQL Server databases; and SQL/Protect for guarding against SQL Injection attacks.
Tom Kincaid: There are several areas of difference. PostgreSQL has traditionally had a stronger focus on data integrity and compliance with the SQL standard.
MySQL has traditionally been focused on raw performance for simple queries, and a typical benchmark is the number of read queries per second that the database engine can carry out, while PostgreSQL tends to focus more on having a sophisticated query optimizer that can efficiently handle more complex queries, sometimes at the expense of speed on simpler queries. And, for a long time, MySQL had a big lead over PostgreSQL in the area of replication technologies, which discouraged many users from choosing PostgreSQL.
Over time, these differences have diminished. PostgreSQL´s replication options have expanded dramatically in the last three releases, and its performance on simple queries has greatly improved in the most recent release (9.2). On the other hand, MySQL and MariaDB have both done significant recent work on their query optimizers. So each product is learning from the strengths of the other.
Of course, there´s one other big difference, which is that PostgreSQL is an independent open source project that is not, and cannot be, controlled by any single company, while MySQL is now owned and controlled by Oracle.
MariaDB is primarily developed by the Monty Program and shows signs of growing community support, but it does not yet have the kind of independent community that PostgreSQL has long enjoyed.
Q3. Tomas Ulin mentioned in an interview that “with MySQL 5.6, developers can now commingle the “best of both worlds” with fast key-value look up operations and complex SQL queries to meet user and application specific requirements”. What is your take on this?
Tom Kincaid: I think anyone who is developing an RDBMS today has to be aware that there are some users who are looking for the features of a key-value store or document database.
On the other hand, many NoSQL vendors are looking to add the sorts of features that have traditionally been associated with an enterprise-grade RDBMS. So I think that theme of convergence is going to come up over and over again in different contexts.
That´s why, for example, PostgreSQL added a native JSON datatype as part of the 9.2 release, which is being further enhanced for the forthcoming 9.3 release.
Will we see a RESTful or memcached-like interface to PostgreSQL in the future? Perhaps.
Right now our customers are much more focused on improving and expanding the traditional RDBMS functionality, so that´s where our focus is as well.
Q4. How would you compare your product offering with respect to NoSQL data stores, such as CouchDB, MongoDB, Cassandra and Riak, and NewSQL such as NuoDB and VoltDB?
Tom Kincaid: It is a matter of the right tools for the right problem. Many of our customers use our products together with the NoSQL solutions you mention. If you need ACID transaction properties for your data, with savepoints and rollback capabilities, along with the ability to access data in a standardized way and a large third party tool set for doing it, a time tested relational database is the answer.
The SQL standard provides the benefit of always being able to switch products and having a host of tools for reporting and administration. PostgreSQL, like Linux, provides the benefit of being able to switch service partners.
If your use case does not mandate the benefits mentioned above and you have data sets in the Petabyte range and require the ability to ingest Terabytes of data every 3-4 hours, a NoSQL solution is likely the right answer. As I said earlier many of our customers use our database products together with NoSQL solutions quite successfully. We expect to be working with many of the NoSQL vendors in the coming year to offer a more integrated solution to our joint customers.
Since it is still pretty new, I haven´t had a chance to evaluate NuoDB so I can´t comment on how it compares with PostgreSQL or Postgres Plus Advanced Server.
As far as VoltDB is concerned there is a blog by Dave Page, our Chief Architect for tools and installers, that describes the differences between PostgreSQL and VoltDB. It can be found here.
There is also some terrific insight, on this topic, in an article by my colleague Bruce Momjian, who is one of the most active contributors to PostgreSQL, that can be found here.
Q5. Justin Sheehy of Basho in an interview said “I would most certainly include updates to my bank account as applications for which eventual consistency is a good design choice. In fact, bankers have understood and used eventual consistency for far longer than there have been computers in the modern sense”. What is your opinion on this?
Tom Kincaid: It´s overly simplistic. There is certainly room for asynchronous multi-master replication in applications such as banking, but it has to be done very, very carefully to avoid losing track of the money.
It´s not clear that the NoSQL products which provide eventual consistency today make the right trade-offs or provide enough control for serious enterprise applications – or that the products overall are sufficiently stable. Relational databases remain the most mature, time-tested, and stable solution for storing enterprise data.
NoSQL may be appealing for Internet-focused applications that must accommodate truly staggering volumes of requests, but we anticipate that the RDBMS will remain the technology of choice for most of the mission-critical applications it has served so well over the last 40 years.
Q6. What are the suggested criteria for users when they need to choose between durability for lower latency, higher throughput and write availability?
Tom Kincaid: Application designers need to start by thinking about what level of data integrity they need, rather than what they want, and then design their technology stack around that reality.
Everyone would like a database that guarantees perfect availability, perfect consistency, instantaneous response times, and infinite throughput, but it´s not possible to create a product with all of those properties.
If you have an application that has a large write throughput and you assume that you can store all of that data using a single database server, which has to scale vertically to meet the load, you´re going to be unhappy eventually. With a traditional RDBMS, you´re going to be unhappy when you can´t scale far enough vertically. With a distributed key-value store, you can avoid that problem, but then you have all the challenges of maintaining a distributed system, which can sometimes involve correlated failures, and it may also turn out that your application makes assumptions about data consistency that are difficult to guarantee in a distributed environment.
By making your assumptions explicit at the beginning of the project, you can consider alternative designs that might meet your needs better, such as incorporating mechanisms for dealing with data consistency issues or even application-level shading into the application itself.
Q7. How do you handle Large Objects Support?
Tom Kincaid: PostgreSQL supports storing objects up to 1GB in size in an ordinary database column.
For larger objects, there1s a separate large object API. In current releases, those objects are limited to just 2GB, but the next release of PostgreSQL (9.3) will increase that limit to 4TB. We don´t necessarily recommend storing objects that large in the database, though
in many cases, it´s more efficient to store enormous objects on a file server rather than as database objects. But the capabilities are there for those who need them.
Q8. Do you use Data Analytics at EnterpriseDB and for what?
Tom Kincaid: Most companies today use some form of data analytics to understand their customers and their marketplace and we1re no exception. However, how we use data is rapidly changing given our rapid growth and deepening
penetration into key markets.
Q9. Do you have customers who have Big Data problem? Could you please give us some examples of Big Data Use Cases?
Tom Kincaid: We have found that most customers with big data problems are using specialized appliances and in fact we partnered with Netezza to assist in creating such an appliance – The Netezza TwinFin Data Warehousing appliance.
Q10. How do you handle the Big Data Analytics “process” challenges with deriving insight?
Tom Kincaid: EnterpriseDB does not specialize in solutions for the Big Data market and will refer prospects to specialists like Netezza.
Q11. Do you handle un-structured data? If yes, how?
Tom Kincaid: PostgreSQL has an integrated full-text search capability that can be used for document processing, and there are also XML and JSON data types that can be used for data of those types. We also have a PostgreSQL-specific data type called hstore that can be used to store groups of key-value pairs.
Q12. Do you use Hadoop? If yes, what is your experience with Hadoop so far?
Tom Kincaid: We developed, and released in late 2011, our Postgres Plus Connector for Hadoop, which allows massive amounts of data from a Postgres Plus Advanced Server (PPAS) or PostgreSQL database to be accessed, processed and analyzed in a Hadoop cluster. The Postgres Plus Connector for Hadoop allows programmers to process large amounts of SQL-based data using their familiar MapReduce constructs. Hadoop combined with PPAS or PostgreSQL enables users to perform real time queries with Postgres and non-real time CPU intensive analysis and with our connector, users can load SQL data to Hadoop, process it and even push the results back to Postgres.
Q13 Cloud computing and open source: How does it relate to PostgreSQL?
Tom Kincaid: In 2012, EnterpriseDB released its Postgres Plus Cloud Database. We´re seeing a wide-scale migration to cloud computing across the enterprise. With that growth has come greater clarity in what developers need in a cloudified database. The solutions are expected to deliver lower costs and management ease with even greater functionality because they are taking advantage of the cloud.
Tom Kincaid.As head of Products and Engineering, Tom leads the company’s product development and directs the company’s world-class software engineers. Tom has nearly 25 years of experience in the Enterprise Software Industry.
Prior to EnterpriseDB, he was VP of software development for Oracle’s GlassFish and Web Tier products.
He integrated Sun’s Application Server Product line into Oracle’s Fusion middleware offerings. At Sun Microsystems, he was part of the original Java EE architecture and management teams and played a critical role in defining and delivering the Java Platform.
Tom is a veteran of the Object Database industry and helped build Object Design’s customer service department holding management and senior technical contributor roles. Other positions in Tom’s past include Director of Quality Engineering at Red Hat and Director of Software Engineering at Unica.
ODBMS.org: Relational Databases, NewSQL, XML Databases, RDF Data Stores
Blog Posts |Free Software | Articles and Presentations| Lecture Notes | Tutorials| Journals |
Follow ODBMS.org on Twitter: @odbmsorg