” There aren’t enough open source contributors to keep projects competitive in features and performance, and the companies supporting the open source offerings will have trouble making enough money to keep the products competitive themselves. Likewise, companies with closed source will have trouble finding customers willing to risk a closed source (or limited open source) solution. It will be interesting to see what happens. But I don’t see NoSQL going away, there is a well-established following.” –Rick Cattell.
I have asked Rick Cattell, one of the leading independent consultants in database systems, a few questions on NoSQL.
Q1. For years, you have been studying the NoSQL area and writing articles about scalable databases. What is new in the last year, in your view? What is changing?
Rick Cattell: It seems like there’s a new NoSQL player every month or two, now!
It’s hard to keep track of them all. However, a few players have become much more popular than the others.
Q2. Which players are those?
Rick Cattell: Among the open source players, I hear the most about MongoDB, Cassandra, and Riak now, and often HBase and Redis. However, don’t forget that the proprietary players like Amazon, Oracle, and Google have NoSQL systems as well.
Q3. How do you define “NoSQL”?
Rick Cattell: I use the term to mean systems that provide a simple operations like key/value storage or simple records and indexes, and that focus on horizontal scalability for those simple operations. Some people categorize horizontally scaled graph databases and object databases to be “NoSQL” as well. However, those systems have very different characteristics. Graph databases and object databases have to efficiently break connections up over distributed servers, and have to provide operations that somehow span servers as you traverse the graph. Distributed graph/object databases have been around for a while, but efficient distribution is a hard problem. The NoSQL databases simply distribute (or shard) each data type based on a primary key; that’s easier to do efficiently.
Q4. What other categories of systems do you see?
Rick Cattell: Well, there are systems that focus on horizontal scaling for full SQL with joins, which are generally called “NewSQL“, and systems optimized for “Big Data” analytics, typically based on Hadoop map/reduce. And of course, you can also sub-categorize the NoSQL systems based on their data model and distribution model.
Q5. What subcategories would those be?
Rick Cattell: On data model, I separate them into document databases like MongoDB and CouchBase, simple key/value stores like Riak and Redis, and grouped-column stores like HBase and Cassandra. However, a categorization by data model is deceptive, because they also differ quite a bit in their performance and concurrency guarantees.
Q6: Which systems perform best?
Rick Cattell: That’s hard to answer. Performance is not a scale from “good” to “bad”… the different systems have better performance for different kinds of applications. MongoDB performs incredibly well if all your data fits in distributed memory, for example, and Cassandra does a pretty good job of using disk, because of its scheme of writing new data to the end of disk files and consolidating later.
Q7: What about their concurrency guarantees?
Rick Cattell: They are all over the place on concurrency. The simplest provide no guarantees, only “eventual consistency“. You don’t know which version of data you’ll get with Cassandra. MongoDB can keep a “primary” replica consistent if you can live with their rudimentary locking mechanism.
Some of the new systems try to provide full ACID transactions. FoundationDB and Oracle NoSQL claim to do that, but I haven’t yet verified that. I have studied Google’s Spanner paper, and they do provide true ACID consistency in a distributed world, for most practical purposes. Many people think the CAP theorem makes that impossible, but I believe their interpretation of the theorem is too narrow: most real applications can have their cake and eat it too, given the right distribution model. By the way, graph/object databases also provide ACID consistency, as does VoltDB, but as I mentioned I consider them a different category.
Q8: I notice you have an unpublished paper on your website, called 2x2x2 Requirements for Scalability. Can you explain what the 2x2x2 means?
Rick Cattell: Well, the first 2x means that there are two different kinds of scalability: horizontal scaling over multiple servers, and vertical scaling for performance on a single server. The remaining 2×2 means that there are two key features needed to achieve the horizontal and vertical scaling, and for each of those, there are two additional things you have to do to make the features practical.
Q9: What are those key features?
Rick Cattell: For horizontal scaling, you need to partition and replicate your data. But you also need automatic failure recovery and database evolution with no downtime, because when your database runs on 200 nodes, you can’t afford to take the database offline and you can’t afford operator intervention on every failure.
To achieve vertical scaling, you need to take advantage of RAM and you need to avoid random disk I/O. You also need to minimize the overhead for locking and latching, and you need to minimize network calls between servers. There are various ways to do that. The best systems have all eight of these key features. These eight features represent my summary of scalable databases in a nutshell.
Q10: What do you see happening with NoSQL, going forward?
Rick Cattell: Good question. I see a lot of consolidation happening… there are too many players! There aren’t enough open source contributors to keep projects competitive in features and performance, and the companies supporting the open source offerings will have trouble making enough money to keep the products competitive themselves. Likewise, companies with closed source will have trouble finding customers willing to risk a closed source (or limited open source) solution.
It will be interesting to see what happens. But I don’t see NoSQL going away, there is a well-established following.
R. G. G. “Rick” Cattell is an independent consultant in database systems.
He previously worked as a Distinguished Engineer at Sun Microsystems, most recently on open source database systems and distributed database scaling. Dr. Cattell served for 20+ years at Sun Microsystems in management and senior technical roles, and for 10 years in research at Xerox PARC and at Carnegie-Mellon University. Dr. Cattell is best known for his contributions in database systems and middleware, including database scalability, enterprise Java, object/relational mapping, object-oriented databases, and database interfaces. He is the author of several dozen papers and five books, and a co-inventor of six U.S. patents.
At Sun he instigated the Enterprise Java, Java DB, and Java Blend projects, and was a contributor to a number of Java APIs and products. He previously developed the Cedar DBMS at Xerox PARC, the Sun Simplify database GUI, and SunSoft’s CORBA-database integration.
He is a co-founder of SQL Access (a predecessor to ODBC), the founder and chair of the Object Data Management Group (ODMG), the co-creator of JDBC, the author of the world’s first monograph on object/relational and object databases, a recipient of the ACM Outstanding PhD Dissertation Award, and an ACM Fellow.
- ODBMS.org Free Downloads and Links
In this section you can download free resources covering the following topics:
Big Data and Analytical Data Platforms
Cloud Data Stores
NoSQL Data Stores
Graphs and Data Stores
Entity Framework (EF) Resources
Object-Relational Impedance Mismatch
NewSQL, XML, RDF Data Stores, RDBMS
Follow ODBMS.org on Twitter: @odbmsorg
” One of the biggest challenges with any project of such a long duration is coping with change. There are many aspects to coping with change, including changes in requirements, changes in technology, vendor stability, changes in staffing and so on”–Jon Brumfitt.
I first did an interview with Dr. Jon Brumfitt, System Architect & System Engineer of Herschel Scientific Ground Segment, at the European Space Agency in March 2011. You can read that interview here.
Two years later, I wanted to know the status of the project. This is a follow up interview.
Q1. What is the status of the mission?
Jon Brumfitt: The operational phase of the Herschel mission came to an end on 29th April 2013, when the super-fluid helium used to cool the instruments was finally exhausted. By operating in the far infra-red, Herschel has been able to see cold objects that are invisible to normal telescopes.
However, this requires that the detectors are cooled to an even lower temperature. The helium cools the instruments down to 1.7K (about -271 Celsius). Individual detectors are then cooled down further to about 0.3K. This is very close to absolute zero, which is the coldest possible temperature. The exhaustion of the helium marks the end of new observations, but it is by no means the end of the mission.
We still have a lot of work to do in getting the best results from the data processing to give astronomers a final legacy archive of high-quality data to work with for years to come.
The spacecraft has been in orbit around a point known as the second Lagrangian point “L2″, which is about 1.5 million kilometres from Earth (around four times as far away as the Moon). This location provided a good thermal environment and a relatively unrestricted view of the sky. The spacecraft cannot be left in this orbit because regular correction manoeuvres would be needed. Consequently, it is being transferred into a “parking” orbit around the Sun.
Q2. What are the main results obtained so far by using the “Herschel” telescope?
Jon Brumfitt: That is a difficult one to answer in a few sentences. Just to take a few examples, Herschel has given us new insights into the way that stars form and the history of star formation and galaxy evolution since the big-bang.
It has discovered large quantities of cold water vapour in the dusty disk surrounding a young star, which suggests the possibility of other water covered planets. It has also given us new evidence for the origins of water on Earth.
The following are some links giving more detailed highlights from the mission:
With its 3.5 metre diameter mirror, Herschel is the largest space telescope ever launched. The large mirror not only gives it a high sensitivity but also allows us to observe the sky with a high spatial resolution. So in a sense every observation we make is showing us something we have never seen before. We have performed around 35,000 science observations, which have already resulted in over 600 papers being published in scientific journals. There are many years of work ahead for astronomers in interpreting the results, which will undoubtedly lead to many new discoveries.
Q3. How much data did you receive and process so far? Could you give us some up to date information?
Jon Brumfitt: We have about 3 TB of data in the Versant database, most of which is raw data from the spacecraft. The data received each day is processed by our data processing pipeline and the resulting data products, such as images and spectra, are placed in an archive for access by astronomers.
Each time we make a major new release of the software (roughly every six months at this stage), with improvements to the data processing, we reprocess everything.
The data processing runs on a grid with around 35 nodes, each with typically 8 cores and between 16 and 256 GB of memory. This is able to process around 40 days worth of data per day, so it is possible to reprocess everything in a few weeks. The data in the archive is stored as FITS files (a standard format for astronomical data).
The archive uses a relational (PostgreSQL) database to catalogue the data and allow queries to find relevant data. This relational database is only about 60 GB, whereas the product files account for about 60 TB.
This may reduce somewhat for the final archive, once we have cleaned it up by removing the results of earlier processing runs.
Q4. What are the main technical challenges in the data management part of this mission and how did you solve them?
Jon Brumfitt: One of the biggest challenges with any project of such a long duration is coping with change. There are many aspects to coping with change, including changes in requirements, changes in technology, vendor stability, changes in staffing and so on.
The lifetime of Herschel will have been 18 years from the start of software development to the end of the post-operations phase.
We designed a single system to meet the needs of all mission phases, from early instrument development, through routine in-flight operations to the end of the post-operations phase. Although the spacecraft was not launched until 2009, the database was in regular use from 2002 for developing and testing the instruments in the laboratory. By using the same software to control the instruments in the laboratory as we used to control them in flight, we ended up with a very robust and well-tested system. We call this approach “smooth transition”.
The development approach we adopted is probably best classified as an Agile iterative and incremental one. Object orientation helps a lot because changes in the problem domain, resulting from changing requirements, tend to result in localised changes in the data model.
Other important factors in managing change are separation of concerns and minimization of dependencies, for example using component-based architectures.
When we decided to use an object database, it was a new technology and it would have been unwise to rely on any database vendor or product surviving for such a long time. Although work was under way on the ODMG and JDO standards, these were quite immature and the only suitable object databases used proprietary interfaces.
We therefore chose to implement our own abstraction layer around the database. This was similar in concept to JDO, with a factory providing a pluggable implementation of a persistence manager. This abstraction provided a route to change to a different object database, or even a relational database with an object-relational mapping layer, should it have proved necessary.
One aspect that is difficult to abstract is the use of queries, because query languages differ. In principle, an object database could be used without any queries, by navigating to everything from a global root object. However, in practice navigation and queries both have their role. For example, to find all the observation requests that have not yet been scheduled, it is much faster to perform a query than to iterate by navigation to find them. However, once an observation request is in memory it is much easier and faster to navigate to all the associated objects needed to process it. We have used a variety of techniques for encapsulating queries. One is to implement them as methods of an extent class that acts as a query factory.
Another challenge was designing a robust data model that would serve all phases of the mission from instrument development in the laboratory, through pre-flight tests and routine operations to the end of post-operations. We approached this by starting with a model of the problem domain and then analysing use-cases to see what data needed to be persistent and where we needed associations. It was important to avoid the temptation to store too much just because transitive persistence made it so easy.
One criticism that is sometimes raised against object databases is that the associations tend to encode business logic in the object schema, whereas relational databases just store data in a neutral form that can outlive the software that created it; if you subsequently decide that you need a new use-case, such as report generation, the associations may not be there to support it. This is true to some extent, but consideration of use cases for the entire project lifetime helped a lot. It is of course possible to use queries to work-around missing associations.
Examples are sometimes given of how easy an object database is to use by directly persisting your business objects. This may be fine for a simple application with an embedded database, but for a complex system you still need to cleanly decouple your business logic from the data storage. This is true whether you are using a relational or an object database. With an object database, the persistent classes should only be responsible for persistence and referential integrity and so typically just have getter and setter methods.
We have encapsulated our persistent classes in a package called the Core Class Model (CCM) that has a factory to create instances. This complements the pluggable persistence manager. Hence, the application sees the persistence manager and CCM factories and interfaces, but the implementations are hidden.
Applications define their own business classes which can work like decorators for the persistent classes.
Q5. What is your experience in having two separate database systems for Herschel? A relational database for storing and managing processed data products and an object database for storing and managing proposal data, mission planning data, telecommands and raw (unprocessed) telemetry?
Jon Brumfitt: There are essentially two parts to the ground segment for a space observatory.
One is the “uplink” which is used for controlling the spacecraft and instruments. This includes submission of observing proposals, observation planning, scheduling, flight dynamics and commanding.
The other is the “downlink”, which involves ingesting and processing the data received from the spacecraft.
On some missions the data processing is carried out by a data centre, which is separate from spacecraft operations. In that case there is a very clear separation.
On Herschel, the original concept was to build a completely integrated system around an object database that would hold all uplink and downlink data, including processed data products. However, after further analysis it became clear that it was better to integrate our product archive with those from other missions. This also means that the Herschel data will remain available long after the project has finished. The role of the object database is essentially for operating the spacecraft and storing the raw data.
The Herschel archive is part of a common infrastructure shared by many of our ESA science projects. This provides a uniform way of accessing data from multiple missions.
The following is a nice example of how data from Herschel and our XMM-Newton X-ray telescope have been combined to make a multi-spectral image of the Andromeda Galaxy.
Our archive, in turn, forms part of a larger international archive known as the “Virtual Observatory” (VO), which includes both space and ground-based observatories from all over the world.
I think that using separate databases for operations and product archiving has worked well. In fact, it is more the norm rather than the exception. The two databases serve very different roles.
The uplink database manages the day-to-day operations of the spacecraft and is constantly being updated. The uplink data forms a complex object graph which is accessed by navigation, so an object database is well suited.
The product archive is essentially a write-once-read-many repository. The data is not modified, but new versions of products may be added as a result of reprocessing. There are a large number of clients accessing it via the Internet. The archive database is a catalogue containing the product meta-data, which can be queried to find the relevant product files. This is better suited to a relational database.
The motivation for the original idea of using a single object database for everything was that it allowed direct association between uplink and downlink data. For example, processed products could be associated with their observation requests. However, using separate databases does not prevent one database being queried with an observation identifier obtained from the other.
One complication is that processing an observation requires both downlink data and the associated uplink data.
We solved this by creating “uplink products” from the relevant uplink data and placing them in the archive. This has the advantage that external users, who do not have access to the Versant database, have everything they need to process the data themselves.
Q6. What are the main lessons learned so far in using Versant object database for managing telemetry data and information on steering and calibrating scientific on-board instruments?
Jon Brumfitt: Object databases can be very effective for certain kinds of application, but may have less benefit for others. A complex system typically has a mixture of application types, so the advantages are not always clear cut. Object databases can give a high performance for applications that need to navigate through a complex object graph, particularly if used with fairly long transactions where a significant part of the object graph remains in memory. Web (JavaEE) applications lose some of the benefit because they typically perform many short transactions with each one performing a query. They also use additional access layers that result in a system which loses the simplicity of the transparent persistence of an object database.
In our case, the object database was best suited for the uplink. It simplified the uplink development by avoiding object-relational mapping and the complexity of a design based on JDBC or EJB 2. Nowadays with JPA, relational databases are much easier to use for object persistence, so the rationale for using an object database is largely determined by whether the application can benefit from fast navigational access and how much effort is saved in mapping. There are now at least two object database vendors that support both JDO and JPA, so the distinction is becoming somewhat blurred.
For telemetry access we query the database instead of using navigation, as the packets don’t fit neatly into a single containment hierarchy. Queries allows packets to be accessed by many different criteria, such as time, instrument, type, source and so on.
Processing calibration observations does not introduce any special considerations as far as the database is concerned.
Q7. Did you have any scalability and or availability issues during the project? If yes, how did you solve them?
Jon Brumfitt: Scalability would have been an important issue if we had kept to the original concept of storing everything including products in a single database. However, using the object database for just uplink and telemetry meant that this was not a big issue.
The data processing grid retrieves the raw telemetry data from the object database server, which is a 16-core Linux machine with 64 GB of memory. The average load on the server is quite low, but occasionally there have been high peak loads from the grid that have saturated the server disk I/O and slowed down other users of the database. Interactive applications such as mission planning need a rapid response, whereas batch data processing is less critical. We solved this by implementing a mechanism to spread out the grid load by treating the database as a resource.
Once a year, we have made an “Announcement of Opportunity” for astronomers to propose observations that they would like to perform with Herschel. It is only human nature that many people leave it until the last minute and we get a very high peak load on the server in the last hour or two before the deadline! We have used a separate server for this purpose, rather than ingesting proposals directly into our operational database. This has avoided any risk of interfering with routine operations. After the deadline, we have copied the objects into the operational database.
Q8. What about the overall performance of the two databases? What are the lessons learned?
Jon Brumfitt: The databases are good at different things.
As mentioned before, an object database can give a high performance for applications involving a complex object graph which you navigate around. An example is our mission planning system. Object persistence makes application design very simple, although in a real system you still need to introduce layers to decouple the business logic from the persistence.
For the archive, on the other hand, a relational database is more appropriate. We are querying the archive to find data that matches a set of criteria. The data is stored in files rather than as objects in the database.
Q9. What are the next steps planned for the project and the main technical challenges ahead?
Jon Brumfitt: As I mentioned earlier, the coming post-operations phase will concentrate on further improving the data processing software to generate a top-quality legacy archive, and on provision of high-quality support documentation and continued interactive support for the community of astronomers that forms our “customer base”. The system was designed from the outset to support all phases of the mission, from early instrument development tests in the laboratory, though routine operations to the end of the post-operations phase of the mission. The main difference moving into post-operations is that we will stop uplink activities and ingesting new telemetry. We will continue to reprocess all the data regularly as improvements are made to the data processing software.
We are currently in the process of upgrading from Versant 7 to Versant 8.
We have been using Versant 7 since launch and the system has been running well, so there has been little urgency to upgrade.
However, with routine operations coming to an end, we are doing some “technology refresh”, including upgrading to Java 7 and Versant 8.
Q10. Anything else you wish to add?
Jon Brumfitt: These are just some personal thoughts on the way the database market has evolved over the lifetime of Herschel. Thirteen years ago, when we started development of our system, there were expectations that object databases would really take off in line with the growing use of object orientation, but this did not happen. Object databases still represent rather a niche market. It is a pity there is no open-source object-database equivalent of MySQL. This would have encouraged more people to try object databases.
JDO has developed into a mature standard over the years. One of its key features is that it is “architecture neutral”, but in fact there are very few implementations for relational databases. However, it seems to be finding a new role for some NoSQL databases, such as the Google AppEngine datastore.
NoSQL appears to be taking off far quicker than object databases did, although it is an umbrella term that covers quite a few kinds of datastore. Horizontal scaling is likely to be an important feature for many systems in the future. The relational model is still dominant, but there is a growing appreciation of alternatives. There is even talk of “Polyglot Persistence” using different kinds of databases within a system; in a sense we are doing this with our object database and relational archive.
More recently, JPA has created considerable interest in object persistence for relational databases and appears to be rapidly overtaking JDO.
This is partly because it is being adopted by developers of enterprise applications who previously used EJB 2.
If you look at the APIs of JDO and JPA they are actually quite similar apart from the locking modes. However, there is an enormous difference in the way they are typically used in practice. This is more to do with the fact that JPA is often used for enterprise applications. The distinction is getting blurred by some object database vendors who now support JPA with an object database. This could expand the market for object databases by attracting some traditional relational type applications.
So, I wonder what the next 13 years will bring! I am certainly watching developments with interest.
Dr Jon Brumfitt, System Architect & System Engineer of Herschel Scientific Ground Segment, European Space Agency.
Jon Brumfitt has a background in Electronics with Physics and Mathematics and has worked on several of ESA’s astrophysics missions, including IUE, Hipparcos, ISO, XMM and currently Herschel. After completing his PhD and a post-doctoral fellowship in image processing, Jon worked on data reduction for the IUE satellite before joining Logica Space and Defence in 1980. In 1984 he moved to Logica’s research centre in Cambridge and then in 1993 to ESTEC in the Netherlands to work on the scientific ground segments for ISO and XMM. In January 2000, he joined the newly formed Herschel team as science ground segment System Architect. As Herschel approached launch, he moved down to the European Space Astronomy Centre in Madrid to become part of the Herschel Science Operations Team, where he is currently System Engineer and System Architect.
You can follow ODBMS.org on Twitter : @odbmsorg
” Our customers have understood that their data is much safer in our secure and professionally-managed cloud environment than it was on the server in the basement of the hotel.”– Keith Gruen.
I heard of hetras, a German-Austrian company that created a cloud-based hotel management product for global hotel chains. I wanted to know more. I therefore interviewed Keith Gruen, co-founder and managing director of hetras GmbH.
Q1. What is the business of Hetras?
Keith Gruen: hetras is the first company to build a fully cloud-based application for hotels and global chains of all sizes. Particularly suited for the new generation of hotels with a high degree of automation, the hetras hotel management software combines property management system (PMS) with powerful distribution and channel management into a unified application.
The product is offered on a SaaS basis, meaning that hotels pay an all-inclusive flat fee per month per room. Built from the ground up for the internet generation, hetras offers a refreshing new user experience and users only need a tablet or browser with a standard internet connection.
Q2. What are the new generation of hotels? How do they differ from conventional ones?
Keith Gruen: New generation hotels and chains offer a high quality hotel experience in a prime location but at a budget price. They can achieve low prices through elimination of services such as full-service restaurant, room service, SPA, conference and banqueting, valet parking, porters and concierge. These hotels also save money by reducing or eliminating reservation and front desk staff. As a result, the hotels encourage or even require self-service by the guests, including self-reservation via the hotel website, self-check-in and self-checkout via kiosks or mobile apps and self-management of preferences and frequent guest programs. New generation hotels typically place a large focus on high design, top quality bedding, high-tech rooms and excellent showers. Rooms are generally small but stylish and efficient.
Q3. What are the specific data management requirements for these new generation of hotels?
Keith Gruen: New generation hotels have a high degree of automation requirements and integration with third party systems. Integration with online reservation portals such as booking.com and Expedia as well as GDS (global distribution systems) have to be seamless and always up-to-date. Credit card authorization and other payment systems have to work without any human intervention.
The check-in and check-out kiosks or apps have to be intuitive and fast. Revenue management, i.e. establishing rates, restrictions, policies has to be automatic and reliable. As guests are doing more and more of the work, the user experience has to be magnitudes better than any hotel system in the past.
Q4. You developed a Cloud-based Hotel Management System as an Internet-based software-as-a-service. Which hotels currently use your system and for what?
Keith Gruen: Our customers include several new generation hotel groups, such as citizenM (based in Amsterdam), OKKO (Paris), BLOC (UK) and ADDUCCO (Romania). In addition, a number of independent hotels around Europe also use hetras.
Q5. How do you handle data processing, sorting, storage and retrieval?
Q6. Could you be more specific and explain what are these “speed-critical functions” and why do you use an in-memory database for that?
Keith Gruen: Our most speed-critical functions are rate and availability look-ups.
Q7. Rate/availability lookups are complex queries, and demand high performance. Do you handle such queries in MS-SQL or with ExtremeDB?
Keith Gruen: ExtremeDB.
Q8. Most rate/reservation queries are not done by humans but by machines. How do you handle requests from global distribution systems (GDSs)- such as Sabre, Amadeus, Worldport and others-used across the travel industry to check rates and book reservations? What kind of data requirements do they have?
Keith Gruen: The GDS tend to query the hotel reservation systems via robots on a frequent basis. This keeps their cache up-to-date and they can then offer nearly real-time rates and availability to the travel agents and end-users who query their system. Checking if a hotel is available or not is not a single query. The GDS has to check every individual date and for every reasonable length of stay. Because of the peculiarities of the hotel business, a hotel might be sold out for a single-night’s stay on Monday night and Tuesday night, but a guest who wants to stay both nights may still be offered a room. Furthermore, the GDS have to query every different room type. Standard rooms could be sold out while a few deluxe rooms remain. The GDS, most of which are based on 1960s technology, are not capable to poll for changes. They simply query everything on a regular basis. All in all, GDSs are known to make up to 70,000 queries for every single confirmed reservation.
Q9. How did you design your system to achieve scalability (scale out and scale up)?
Keith Gruen: We use virtualized environments. So far, we have scaled up by adding RAM and CPU to our machines.
We have two fully redundant sets of hardware with a load balancer. We could add a third set if necessary.
Q10: Which virtualized environments do you use? Don`t you have performance issue if you use virtualization?
Keith Gruen: We use VMWare. Virtualization has not caused us any performance issues.
Q11. Why do you use an in-memory database system for your system and for what?
Keith Gruen: We use the In-Memory Database (IMDB) especially for the queries as described above. We call this our “quote” module. We do not write to the IMDB. When a reservation is finally completed, we write directly to the main MS-SQL database, which in turn updates the IMDB. The data in the IMDB is configured to answer the most common queries very quickly.
Q12. What are the potential bottlenecks for your system?
Keith Gruen: Hotel staff can theoretically launch batch processes or large-scale searches across vast amounts of data that would slow down the system for all other hotels as well. This is a drawback of having a single-instance multi-tenant architecture.
Q13. Are there any concerns by customers to have their personal data stored in the Cloud?
Keith Gruen: No. Our customers have understood that their data is much safer in our secure and professionally-managed cloud environment than it was on the server in the basement of the hotel.
Keith Gruen is a co-founder and managing director of hetras GmbH, a German-Austrian company that created the first cloud-based hotel management product for global hotel chains. Mr. Gruen is also the founder of Fidelio Software, which built the market-leading Hotel PMS in the 1980’s and 90’s. He later sold the company to Micros. Mr. Gruen co-founded NXN, a developer of computer game technology and Kappa IT, a venture capital firm that invested in technology startups. Prior to hetras, he led corporate development for Conject AG, a developer of software for the real estate and construction industry. Mr. Gruen graduated from Brown University in Mathematics and Computer Science.
Follow ODBMS.org on Twitter: @odbmsorg
“My target is to be the MapReduce solution for the world’s realtime problems.”– Ken Birman.
I had the pleasure to interview Ken Birman, Professor of Computer Science at Cornell University. Ken is a pioneer and one of the leading researchers in the field of distributed systems. Ken has been working recently on a new system called Isis2 (isis2.codeplex.com). I asked him a few questions on Isis2.
Q1. Recently, you have been working on the Isis2 project. What is it?
Ken Birman: Isis2 started as a bit of a hobby for me, and the story actually dates back almost 25 years.
Early in my career, I created a system called the Isis Toolkit, using a model we called virtual synchrony to support strongly consistent replication for services running on clusters of various kinds. We had quite a success with that version of Isis, and it was the basis of the applications I mentioned above – I started a company and we did very well with it. In fact, for more than a decade there was never a disruptive failure at the New York Stock Exchange: components crashed now and then, obviously, but Isis would help guide the solution to a clean recovery and the traders were never aware of any issues. The same is true for the French ATC system or the US Navy AEGIS.
Now this model, virtual synchrony, has deep connections both to the ACID models used in database settings, and to the state machine replication model supported by protocols like Lamport’s Paxos. Indeed, recently we were able to show a bisimulation of the virtually synchronous protocol called gbcast (dating to around 1985) and a version of Paxos for dynamically reconfigurable systems. In some sense, gbcast was the first Paxos – Leslie Lamport says we should have named the protocol “virtually synchronous Paxos”.
(Of course if we had, I suspect that he would have named his own protocol something else!). I could certainly do the same with respect to database serializability – basically, virtual synchrony is like the ACID model, but aimed at “groups of processes” that might use purely in-memory replication. In effect my protocols were optimized ones aimed at supporting ACI but without the D.
Anyhow, over the years, the Isis Toolkit matured and people moved on. I moved on too, and worked on topics involving gossip communication, and security. But eventually this came to frustrate me: I felt that my own best work was being lost to disuse. And so starting four years ago, I decided to create a new and more modern Isis for the cloud. I’m calling it Isis2, and it uses the model Leslie and I talked about (virtually synchronous state machine replication). I’ve implemented the whole thing from scratch, using modern object-oriented languages and tools.
So Isis2 is a new open platform for data replication in the cloud, aiming at cases where a developer might be building some sort of service that would end up running on a cloud platform like Eucalyptus, Amazon EC2, etc. My original idea was to create the ultimate library for distributed computing, but also to make it as easy to use as the GUI builders that our students use in their first course on object oriented programming: powerful technologies but entirely accessed via very simple ideas, such as attaching an event handler to a suitable object. A first version of the Isis2 system became available about two years ago, and I’ve been working on it as much as possible since then, with new releases every couple of months.
But I’ve also slightly shifted my focus, in ways that are bringing Isis2 closer and closer to the big-data and ODBMS community.
I realized that unlike the situation 20 years ago, today the people who most need tools for replicating data with strong consistency are mostly working to create in-memory big-data platforms for the cloud, and then running large machine-learning algorithms on them.
For example, one important target for Isis2 is to support the smart grid here in the United States. So one imagines an application capturing all sorts of data in real-time, from what might be a vast number of sensing points, pulling this into the cloud, and then running a machine learning algorithm on the resulting in-memory data set. You replicate such a service for high availability and to gain faster read-only performance, and then run a distributed algorithm that learns the parameters to some model: perhaps a convex optimization, perhaps a support vector, etc.
I’ve run into other applications with a similar structure. For example, self-driving cars and other AUVs need to query a rapidly evolving situational database, asking questions such as “what road hazards should I be watching for the next 250 meters?” The community doing large-scale simulations for problems in particle physics needs to solve the “nearby particle” problem: “now that I’m at location such-and-such, what particles are close enough to interact with me?” One can make quite a long list. All of these demand rapid updates, in place, to a database that is living in-memory and being queried frequently.
With this in mind, I’ve been extending the Isis2 system to enlarge its support for key-value data, spread within a service and replicated, but with each item residing at just small subsets of the total set of members (“sharded”). My angle has been to offer strong consistency (not full transactions, because those involve locking and become expensive, but a still-powerful “one-shot atomic actions” model). Because Isis2 can offer these guarantees for a workload that includes a mix of updates and queries, the community working on graphical learning and other key-value learning problems where data might evolve even as the system is tracking various properties has shown strong interest in the platform.
Q2. You claim that Isis2 is a new option for cloud computing that can enable reliable, secure replication of data even in the highly elastic first-tier of the cloud. How does it different from existing commercial products such as NoSQL, Amazon Web Services etc.?
Ken Birman: Well, there are really a few differences. One is that Isis2 is a library. So if you compare with other technologies, the closest matches are probably Graphlab from CMU and Spark, Berkeley’s in-memory storage system for accelerating Hadoop computations. A second big difference is that Isis2 has a very strong consistency model, putting me at the other end of the spectrum relative to NoSQL. As for Web Services, well, the thinking is that these services you use Isis2 to create would often be web services, accessed over the web via some representative, which then turns around and interacts with other members using the Isis2 primatives.
Q3. How do you handle Big Data in Isis2
Ken Birman: I’m adding more and more features, but there are two that should be especially interesting to your readers. One is the Isis2 DHT: a key-value store spread over a group of nodes, such that any given key maps to some small subset of the members (a shard). So keys might, for example, be node-ids for a graph, and the values could represent data associated with that node: perhaps a weight, in and out edges to other nodes, etc. The model is incredibly flexible: a key can be any object you like, and the values can also be arbitrary objects. You even can control the mapping of keys to group members, by implementing a specialized version of GetHashCode().
What I do is to allow atomic actions on sets of key-value pairs. So you can insert a collection of key-value pairs, or query a set of them, or even do an ordered multicast to just the nodes that host some set of keys. These actions occur on a consistent cut, giving a form of action by action atomicity. And of course, the solution is also fault-tolerant and even offers a strong form of security via AES-256 encryption, if desired.
What this enables is a powerful new form of distributed aggregation, in which one can guarantee that a query result reflects exactly-once contributions from each matching key-value pair. You can also insert new key-value tuples as part of one of these actions (although those occur as new atomic actions, ordered after the one that initiated them), or even reshuffle the mapping of keys to nodes, by “reconfiguring” the group to use a new GetHashCode() method.
You can also use LINQ in these groups, for example to efficiently query the “slice” of a key-value set that mapped to a particular set of nodes.
As I mentioned, I also have a second big-data feature that should become available later this summer: a tool for moving huge objects around within a cluster of cloud nodes. So suppose that an application is working with gigabyte memory-mapped files.
Sending these around inside messages could be very costly, and will cause the system to stutter because anything ordered after that send or multicast might have to wait for the gigabyte to transfer. With this new feature, I do out-of-band movement of the big objects in a very efficient way, using IP multicast if available (and a form of fat-tree TCP mesh if not), and then ship just a reference to the big object in my messages. This way the in-band communication won’t stutter, and the gigabyte object gets replicated using very efficient out-of-band protocols. By the end of the summer, I’m hoping to have the world’s fastest tool for replicating big memory-mapped objects in the cloud. (Of course, there can be quite a gap between “hoping” and “will have”, but I’m optimistic).
Q4. Isis2 uses a distributed sharding scheme. What is it?
Ken Birman: Again, there are a few answers – if Isis2 has a flaw, I suspect that it comes down to trying to offer every possible cool mechanism to every imaginable developer. Just like Microsoft .NET this can mean that you end up with too many stories. (You won’t be surprised to learn that Isis2 is actually built in .NET, using C#, and hence usable from any .NET language. We use Mono to compile it for use on Linux. And it does have a bit of that .NET flavor).
Anyhow, one scheme is the sharding approach mentioned above. The user takes some group – perhaps it has 10,000 members in it and runs on 10,000 virtual machines on Amazon EC2. And they ask Isis2 to shard the group into shards of some size, maybe 5 or 10 – you pick a factor aimed at soaking up the query load but not imposing too high an update cost.
So in that case I might end up with 1000 or 2000 subgroups: those would be my shards. The mapping is very explicit: given a key-value pair with key K, I compute K.GetHashCode()%NS where NS is the number of shards obtained from the target group size (call it N) and the target shard size (call this S): hashcode%(N/S). And that tells me which shard will hold your key-value pair.
Given the group membership, which is consistent in Isis2, my protocols just count off from left to right to find the members that host this shard. The atomic update or query reaches out to those members: I involve all of them if the action is an update, and I load-balance over the shard for queries.
The other mechanism is coarser-grained: one Isis2 application can actually support multiple process groups. And each of those groups can be a DHT, sharded separately. This is convenient if an application has one long-lived in-memory group for the database, but wants to create temporary data structures too. What you can do is to use a separate temporary group for those temporary key-value pairs or structures. Then when your computation is finished, you have an easy way to clean up: you just tell Isis2 to discard the temporary group and all its contents will evaporate, like magic.
Q5. Isis2 sounds like MapReduce. Is this right? What are the differences and what are the similarities?
Ken Birman: Roberto, you are exactly right about this. While you can treat the Isis2 DHT as a key-value store, a natural style of computing would be to use the kinds of code one sees with iterative MapReduce applications. Isis2 is in-memory, of course (although nothing stops you from pairing it with a persistent database or log – in fact I provide tools for doing that). But you could do this, and if you did, Isis2 would be acting a lot like Spark.
I think the main point is that whereas a system like Spark just speeds MapReduce up by caching partial results, and really works only for immutable or append-only data sets, Isis2 can support dynamic updates while the queries are running. My target is to be the MapReduce solution for the world’s realtime problems. Moreover, I’m doing this for people who need strong consistency.
Get the answer wrong in the power grid and you cause a power outage or damage a transformer (does anyone remember how Christ Church took itself off the New Zealand power grid for a month?) Get the answer wrong in a self-driving car, and it might have an accident. So for situations in which data is rapidly evolving, Isis2 offers a way to do highly scalable queries even as you support updates in a highly scalable way too.
I actually see the similarity to MapReduce as a good thing. To me this model has been incredibly popular, because people find it easy to use. I can leverage that popularity.
The bigger contrast is with true transactional databases. Isis2 does support locking, but not full-scale transactions. It would be dishonest to say that I don’t think transactions can run at massive scale – in fact I’m working with a post-doc, Ittay Eyal, on an extension of Isis2 that we call ACID-RAIN that has a full transactional model. And I know of others who are doing competing systems – Marcos Aguilera at Microsoft, for example, who is hard at work on his follow-on to the famous Sinfornia system.
But I don’t think we need the full costs of ACID transactions for in-memory key-value stores, and this has pushed me towards a lock-free computing model for this part of Isis2, and towards the kind of weaker atomicity I offer: guarantees for individual actions on sets of key-value pairs, but with each step in your computation treated as a separate one-shot transaction.
Q6. You claim to be able to run consistent queries even on rapidly changing data, and yet scales as well as any sharding scheme. Please explain how?
Ken Birman: The question centers on semantics: what do I mean by a consistent query? For me, a query that ran on a consistent cut (like with snapshot isolation) and gives a result that reflects exactly one contribution from each matching key-value pair, is a “consistent” query. This is definitely what you want for some purposes. But if you want to define consistency to mean for full transactions, I’m not taking that last step. I offer the building blocks – locking, atomic multicast, etc. You could easily run a transaction and do a 2-phase commit at the end. But I just doubt that this would perform well and so I haven’t picked it as my main consistency model.
Q7. Why using LINQ as a query language and not SQL?
Ken Birman: I’m using LINQ, but maybe not in the way you are thinking. There are some projects that do use LINQ as their main user API. In the Isis2 space, Naiad and the Dryad system that preceded it would be famous examples. But I find it hard to work with systems that “map” my LINQ query to a distributed key-value structure. I’ve always favored simpler building blocks.
So the way I’m using LINQ is more limited. For me, a user might issue some sort of query, and at the first step this looks like a multicast: you specify the target group or the target shards (depending on whether you want every member or just the ones associated with certain keys), and the information you offered as arguments to the query or multicast are automatically translated to an efficient out-form and transmitted to the relevant members. On the members an upcall event occurs to a handler the developer coded, perhaps in C# or perhaps in some other .NET language like C++/CLI, Visual Basic, Java, etc. I think .NET has something like 40 languages you can use – I myself work mostly in C#.
So this C# handler is invoked with the specified arguments, in parallel, on the set of group members that matched the target you specified. Each one of them has a slice of the full DHT: the subset of key-value pairs that mapped to that particular member, in the way we talked about earlier – a mapping you as the user can control, and even modify dynamically (if you do, Isis2 reshuffles the key-value pairs appropriately).
What I do with LINQ is to let your code access this slice of the DHT as a collection on which you could run a LINQ query. And in fact, as you may know, LINQ has an SQL front end, so you could even use SQL on these collections. So the C# handler has this potentially big set of key-value tuples, the full set for that slice (remember that I guarantee consistency!), and with LINQ each of those members now can compute its share of the answer.
What happens next is up to you. You could use this answer to insert new key-value tuples: that would be like the shuffle and reduce step in MapReduce. You could send the answers to the query initiator, as a reply – Isis2 supports 1-K queries that get lists of K results back, and the initiator could then aggregate the results (perhaps using LINQ at that step too, or SQL). And finally I have a tree-structured aggregation option, where I build a binary tree spanning the participants and combine the subresults using user-specified code, so that just one aggregated result comes out, after log(N) delay. That last option would be sensible if you might end up with a very large number of answers, or really big ones.
Q8. How fault-tolerant is Isis2?
Ken Birman: Virtual synchrony is very powerful as a tool for building pretty much any fault-tolerance approach one likes (people who prefer Paxos can reread that sentence as “the dynamic state machine model…” and people who think in terms of ACID transactions can see this as “ACID-based SQL systems…”). The model I favor is one in which updates are always applied in an all-or-nothing way, but queries might be partially completed and then abandoned (“aborted”) if a failure occurs.
So with Isis2, the basic guarantee is that an atomic multicast or query reaches all the operational group members that it is supposed to reach, and in a total order with respect to conflicting updates. Queries either reflect exactly once contributions from each matching key-value pair, or they abort (if a failure disrupts them), and you need to reissue the request in the new group membership – the new process group “view”.
Q9. What about Hadoop? Do you plan to have an interface to Hadoop?
Ken Birman: I’ve suggested this topic to a few of my PhD students but up to now, none has “bit”. I think my group tends to attract students who have more of a focus on lower levels of the infrastructure. But with our work on the smart power grid, this could change; I’m starting to interact much more with students who come from the machine learning community and who might find that step very appealing. It wouldn’t really be all that hard to do.
Q10. Isis2 is open source. How can developers contribute?
Ken Birman: This is a great question. The basic system is open source under a free 3-clause BSD license. You can access it here: isis2.codeplex.com. – I have a big user manual there, and one of those compiled HTML documentation pages for each of my APIs, and I’m going to be doing some form of instructional MOOC soon, so there should end up being ten or so short videos showing how to program with the system.
Initially, I think that people who would want to play with Isis2 might be best off limiting themselves to working with the system but not changing my code directly. My worry is that the code really is quite subtle and while I would love to have help, my experience here at Cornell has been that even well-intentioned students have made a lot of mistakes by not really understanding my code and then trying to change it. I’m sorry to say that as of now, not a line of third party code has survived – not because I don’t want help (I would love help), but because so far, all the third-party code has died horribly when I really tested it carefully!
But over time, I’m hoping that Isis2 could become more and more of a community tool. In fact complex or not, I do think others could definitely master it and help me debug and extend it. They just need to move no faster than the speed of light, which is kind of slow where large, complex tools are concerned. Building things that work “over” Isis2 but don’t change it is an especially good path: I’ll be happy to fix bugs other people identify, and then your add-ons can become third party extensions without being somehow key parts of the system. Then as things mature (keep in mind that Isis2 itself is nearly four years old right now), things could gradually migrate into the core release.
I think this is how other big projects tend to evolve towards an open development model: nobody trusts a contributor who shows up one day and announces that he wants to rewrite half the system. But if that person hangs around for a while and proves his or her talents over time, they end up in the inner circle. So I’m not trying to be overprotective, but I do want the system to be incredibly robust.
This is how we’re building the ACID-RAIN system I mentioned. Ittay Eyal owns the architecture of that system, but he has no interest at all in replicating things already available in Isis2, which after all is quite a big and powerful system. So he’s using it but rather than building ACID-RAIN by modifying Isis2, his system will be more of an application that uses Isis2. To the extent that he needs things Isis2 is lacking, I can build them for him. But later, if ACID-RAIN becomes the world’s ultimate SQL solution for the cloud, maybe Isis2 and ACID-RAIN would merge in some way. Over time I have no doubt at all that a talented developer like Ittay could become as expert as he needs to be even with my most obscure stuff.
And the fact is that I do need help. Tuning Isis2 to work really well in these massive settings is a hard challenge for me; more hands would definitely help. Right now I’m in the middle of porting it to work well with Infiniband connections, for example. You might thing such a step would be trivial, and in a way it is: I just need to adapt the way that I talk to my network sockets. But in fact this has all sorts of implications for flow control, and timing in many protocols. A small step becomes a big task. (I’m kind of shuddering at the thought of needing to move to IPv6!) I can support the system now, with 3000 downloads to date but relatively few really active users. If someday I have lots of users and a StackOverflow.com tag of my very own, I’ll need a hand!
Qx Anything else you wish to add?
Ken Birman: Roberto, I just want to thank you for the great job you do with the ODBMS.org blog. I think it has become a great resource, and I hope this little interview gets a few of your readers interested in fooling around with Isis2! Tell them to feel free to contact me at Cornell, or email@example.com/
Ken Birman is the N. Rama Rao Professor of Computer Science at Cornell. An ACM Fellow and the winner of the IEEE Tsutomu Kanai award, Ken has written 3 textbooks and published more than 150 papers in prestigious journals and conferences. Software he developed operated the New York Stock Exchange for more than a decade without trading disruptions, and played central roles in the French Air Traffic Control System (now expanding into much of Europe) and the US Navy AEGIS warship. Other technologies from his group found their way into IBM’s Websphere product, Amazon’s EC2 and S3 systems, Microsoft’s cluster management solutions. His latest system, Isis2 (isis2.codeplex.com) helps developers create secure, strongly consistent and scalable cloud computing solutions.
- ODBMS.org Cloud Data Stores – Lecture Notes: “Data Management in the Cloud” by Michael Grossniklaus, David Maier, Portland State University.
Lecture Notes | Intermediate/Advanced | English | DOWNLOAD ~280 slides (PDF)| 2011-12|
Follow ODBMS.org on Twitter: @odbmsorg
“Application designers need to start by thinking about what level of data integrity they need, rather than what they want, and then design their technology stack around that reality. Everyone would like a database that guarantees perfect availability, perfect consistency, instantaneous response times, and infinite throughput, but it´s not possible to create a
product with all of those properties”–Tom Kincaid.
What is new with PostgreSQL? I have Interviewed Tom Kincaid, head of Products and Engineering at EnterpriseDB.
(Tom prepared the following responses with contributions from the EnterpriseDB development team)
Q1. EnterpriseDB products are based upon PostgreSQL. What is special about your product offering?
Tom Kincaid: EnterpriseDB has integrated many enterprise features and performance enhancements into the core PostgreSQL code to create a database with the lowest possible TCO and provide the “last mile” of service needed by enterprise database users.
EnterpriseDB´s Postgres Plus software provides the performance, security and Oracle compatibility needed to address a range of enterprise business applications. EnterpriseDB´s Oracle compatibility, also integrated into the PostgreSQL code base, allows many Oracle shops to realize a much lower database TCO while utilizing their Oracle skills and applications designed to work against Oracle databases.
EnterpriseDB also creates enterprise-grade tools around PostgreSQL and Postgres Plus Advanced Server for use in large-scale deployments. They are Postgres Enterprise Manager, a powerful management console for managing, monitoring and tuning databases en masse whether they´re PostgreSQL community version or EnterpriseDB´s enhanced Postgres Plus Advanced Server; xDB Replication Server with multi-master replication and replication between Postgres, Oracle and SQL Server databases; and SQL/Protect for guarding against SQL Injection attacks.
Tom Kincaid: There are several areas of difference. PostgreSQL has traditionally had a stronger focus on data integrity and compliance with the SQL standard.
MySQL has traditionally been focused on raw performance for simple queries, and a typical benchmark is the number of read queries per second that the database engine can carry out, while PostgreSQL tends to focus more on having a sophisticated query optimizer that can efficiently handle more complex queries, sometimes at the expense of speed on simpler queries. And, for a long time, MySQL had a big lead over PostgreSQL in the area of replication technologies, which discouraged many users from choosing PostgreSQL.
Over time, these differences have diminished. PostgreSQL´s replication options have expanded dramatically in the last three releases, and its performance on simple queries has greatly improved in the most recent release (9.2). On the other hand, MySQL and MariaDB have both done significant recent work on their query optimizers. So each product is learning from the strengths of the other.
Of course, there´s one other big difference, which is that PostgreSQL is an independent open source project that is not, and cannot be, controlled by any single company, while MySQL is now owned and controlled by Oracle.
MariaDB is primarily developed by the Monty Program and shows signs of growing community support, but it does not yet have the kind of independent community that PostgreSQL has long enjoyed.
Q3. Tomas Ulin mentioned in an interview that “with MySQL 5.6, developers can now commingle the “best of both worlds” with fast key-value look up operations and complex SQL queries to meet user and application specific requirements”. What is your take on this?
Tom Kincaid: I think anyone who is developing an RDBMS today has to be aware that there are some users who are looking for the features of a key-value store or document database.
On the other hand, many NoSQL vendors are looking to add the sorts of features that have traditionally been associated with an enterprise-grade RDBMS. So I think that theme of convergence is going to come up over and over again in different contexts.
That´s why, for example, PostgreSQL added a native JSON datatype as part of the 9.2 release, which is being further enhanced for the forthcoming 9.3 release.
Will we see a RESTful or memcached-like interface to PostgreSQL in the future? Perhaps.
Right now our customers are much more focused on improving and expanding the traditional RDBMS functionality, so that´s where our focus is as well.
Q4. How would you compare your product offering with respect to NoSQL data stores, such as CouchDB, MongoDB, Cassandra and Riak, and NewSQL such as NuoDB and VoltDB?
Tom Kincaid: It is a matter of the right tools for the right problem. Many of our customers use our products together with the NoSQL solutions you mention. If you need ACID transaction properties for your data, with savepoints and rollback capabilities, along with the ability to access data in a standardized way and a large third party tool set for doing it, a time tested relational database is the answer.
The SQL standard provides the benefit of always being able to switch products and having a host of tools for reporting and administration. PostgreSQL, like Linux, provides the benefit of being able to switch service partners.
If your use case does not mandate the benefits mentioned above and you have data sets in the Petabyte range and require the ability to ingest Terabytes of data every 3-4 hours, a NoSQL solution is likely the right answer. As I said earlier many of our customers use our database products together with NoSQL solutions quite successfully. We expect to be working with many of the NoSQL vendors in the coming year to offer a more integrated solution to our joint customers.
Since it is still pretty new, I haven´t had a chance to evaluate NuoDB so I can´t comment on how it compares with PostgreSQL or Postgres Plus Advanced Server.
As far as VoltDB is concerned there is a blog by Dave Page, our Chief Architect for tools and installers, that describes the differences between PostgreSQL and VoltDB. It can be found here.
There is also some terrific insight, on this topic, in an article by my colleague Bruce Momjian, who is one of the most active contributors to PostgreSQL, that can be found here.
Q5. Justin Sheehy of Basho in an interview said “I would most certainly include updates to my bank account as applications for which eventual consistency is a good design choice. In fact, bankers have understood and used eventual consistency for far longer than there have been computers in the modern sense”. What is your opinion on this?
Tom Kincaid: It´s overly simplistic. There is certainly room for asynchronous multi-master replication in applications such as banking, but it has to be done very, very carefully to avoid losing track of the money.
It´s not clear that the NoSQL products which provide eventual consistency today make the right trade-offs or provide enough control for serious enterprise applications – or that the products overall are sufficiently stable. Relational databases remain the most mature, time-tested, and stable solution for storing enterprise data.
NoSQL may be appealing for Internet-focused applications that must accommodate truly staggering volumes of requests, but we anticipate that the RDBMS will remain the technology of choice for most of the mission-critical applications it has served so well over the last 40 years.
Q6. What are the suggested criteria for users when they need to choose between durability for lower latency, higher throughput and write availability?
Tom Kincaid: Application designers need to start by thinking about what level of data integrity they need, rather than what they want, and then design their technology stack around that reality.
Everyone would like a database that guarantees perfect availability, perfect consistency, instantaneous response times, and infinite throughput, but it´s not possible to create a product with all of those properties.
If you have an application that has a large write throughput and you assume that you can store all of that data using a single database server, which has to scale vertically to meet the load, you´re going to be unhappy eventually. With a traditional RDBMS, you´re going to be unhappy when you can´t scale far enough vertically. With a distributed key-value store, you can avoid that problem, but then you have all the challenges of maintaining a distributed system, which can sometimes involve correlated failures, and it may also turn out that your application makes assumptions about data consistency that are difficult to guarantee in a distributed environment.
By making your assumptions explicit at the beginning of the project, you can consider alternative designs that might meet your needs better, such as incorporating mechanisms for dealing with data consistency issues or even application-level shading into the application itself.
Q7. How do you handle Large Objects Support?
Tom Kincaid: PostgreSQL supports storing objects up to 1GB in size in an ordinary database column.
For larger objects, there1s a separate large object API. In current releases, those objects are limited to just 2GB, but the next release of PostgreSQL (9.3) will increase that limit to 4TB. We don´t necessarily recommend storing objects that large in the database, though
in many cases, it´s more efficient to store enormous objects on a file server rather than as database objects. But the capabilities are there for those who need them.
Q8. Do you use Data Analytics at EnterpriseDB and for what?
Tom Kincaid: Most companies today use some form of data analytics to understand their customers and their marketplace and we1re no exception. However, how we use data is rapidly changing given our rapid growth and deepening
penetration into key markets.
Q9. Do you have customers who have Big Data problem? Could you please give us some examples of Big Data Use Cases?
Tom Kincaid: We have found that most customers with big data problems are using specialized appliances and in fact we partnered with Netezza to assist in creating such an appliance – The Netezza TwinFin Data Warehousing appliance.
Q10. How do you handle the Big Data Analytics “process” challenges with deriving insight?
Tom Kincaid: EnterpriseDB does not specialize in solutions for the Big Data market and will refer prospects to specialists like Netezza.
Q11. Do you handle un-structured data? If yes, how?
Tom Kincaid: PostgreSQL has an integrated full-text search capability that can be used for document processing, and there are also XML and JSON data types that can be used for data of those types. We also have a PostgreSQL-specific data type called hstore that can be used to store groups of key-value pairs.
Q12. Do you use Hadoop? If yes, what is your experience with Hadoop so far?
Tom Kincaid: We developed, and released in late 2011, our Postgres Plus Connector for Hadoop, which allows massive amounts of data from a Postgres Plus Advanced Server (PPAS) or PostgreSQL database to be accessed, processed and analyzed in a Hadoop cluster. The Postgres Plus Connector for Hadoop allows programmers to process large amounts of SQL-based data using their familiar MapReduce constructs. Hadoop combined with PPAS or PostgreSQL enables users to perform real time queries with Postgres and non-real time CPU intensive analysis and with our connector, users can load SQL data to Hadoop, process it and even push the results back to Postgres.
Q13 Cloud computing and open source: How does it relate to PostgreSQL?
Tom Kincaid: In 2012, EnterpriseDB released its Postgres Plus Cloud Database. We´re seeing a wide-scale migration to cloud computing across the enterprise. With that growth has come greater clarity in what developers need in a cloudified database. The solutions are expected to deliver lower costs and management ease with even greater functionality because they are taking advantage of the cloud.
Tom Kincaid.As head of Products and Engineering, Tom leads the company’s product development and directs the company’s world-class software engineers. Tom has nearly 25 years of experience in the Enterprise Software Industry.
Prior to EnterpriseDB, he was VP of software development for Oracle’s GlassFish and Web Tier products.
He integrated Sun’s Application Server Product line into Oracle’s Fusion middleware offerings. At Sun Microsystems, he was part of the original Java EE architecture and management teams and played a critical role in defining and delivering the Java Platform.
Tom is a veteran of the Object Database industry and helped build Object Design’s customer service department holding management and senior technical contributor roles. Other positions in Tom’s past include Director of Quality Engineering at Red Hat and Director of Software Engineering at Unica.
ODBMS.org: Relational Databases, NewSQL, XML Databases, RDF Data Stores
Blog Posts |Free Software | Articles and Presentations| Lecture Notes | Tutorials| Journals |
Follow ODBMS.org on Twitter: @odbmsorg
“The benefits of hybrid storage are lower cost, and more varied price/performance options. We are seeing SSD pricing range down to $3/GB for enterprise drives, and less than $1/GB for consumer drives, and we also support using standard rotational drives for persistence. Our goal is in-memory performance with the price of ‘storage’ – which makes multi-terabyte analytics caches possible” –Brian Bulkowski.
I have interviewed Brian Bulkowski, Founder & CTO of Aerospike.
Q1. What is Aerospike?
Brian Bulkowski: Aerospike is a high speed and highly available NoSQL database. We have proven levels of performance and reliability on commodity hardware that go far beyond any other database – and have the years of production experience to back up our claims.
Q2. How does it compare with other NoSQL or NewSQL databases?
Brian Bulkowski: Aerospike has specialized in areas where downtime is not an option – and at the highest levels of performance. Another area of specialization is applications where low latencies dramatically simplify application development.
As a row-based store with very high levels of read and write concurrency, we perform well where behavior and interactions are concerned – whether those interactions are user-oriented, such as advertising and gaming; or machine to machine, such as capital markets or the emerging Internet of Things.
Q3. What are the typical applications for which Aerospike is currently in use?
Brian Bulkowski: A number of our customers use Aerospike for real-time audience and behavior analysis. A common requirement is to hold profile information for several billion cookies, with a rich schema-free ability to add new data and types. In this use case, a cost-effective multi-terabyte store changes what a business can achieve.
Other use cases include massive multi-player gaming, where a shared game state (at scale) is key to a great user experience, and retaining peak capacity is critical.
Q4. Could you give us some detail on Aerospike`s hybrid memory architecture?
Brian Bulkowski: One of the greatest challenges with databases is keeping indexes in sync with data, and inserting and deleting index entries while under read load. In this area, in-memory techniques shine. An in-memory index can handle a high level of parallelism, and support simultaneous read and write loads. Indexes can be made very small – Aerospike has optimized index entries to one CPU cache line.
Aerospike also uses the technique of allowing a process to restart and retain its index, by using the shared memory facility of Linux. An index needs to be recalculated only when a machine is restarting, and in those cases it can be best to simply use the promoted replica during the short time the index is being re-created.
Q5. What is the main advantage of using a hybrid memory architecture?
Brian Bulkowski: The benefits of hybrid storage are lower cost, and more varied price/performance options. We are seeing SSD pricing range down to $3/GB for enterprise drives, and less than $1/GB for consumer drives, and we also support using standard rotational drives for persistence. Our goal is in-memory performance with the price of ‘storage’ – which makes multi-terabyte analytics caches possible.
Q6. How do you ensure scale up and scale out?
Brian Bulkowski: Scale up is maintained through short individual code paths, and single-server performance benchmarks. Our architecture focuses on enabling multiple cores, deployment configurations with appropriate interrupt routing rules, and classic database techniques like asynchronous IO.
Scale out is ensured through a number of techniques, such as enforced randomness through provably random storage distribution, a minimal and deterministic number of nodes participating in a transaction, and client techniques which limit the effect of a single slow node on other nodes.
Q7. Do you have any performance measures to share with us?
Brian Bulkowski: Please see this performance benchmark report from Thumbtack Technology, demonstrating how Aerospike is 10 times faster than MongoDB or Cassandra.
Please also see Aerospike Principal Architect Russ Sullivan’s High Scalability blog post on achieving 1M TPS on commodity hardware using Aerospike. This use case involves more system tuning than the Thumbtack tests.
Q8. How do you handle real-time big data analytics?
Brian Bulkowski: Many customers use Aerospike as a persistent part of their real-data analytics processing chain. Their application logic needs to know counts of recent visits, recent searches, preferences, preferences of friends, and also store batch analytics results. For example, you might calculate that a certain user is part of an audience, then keep up-to-date per audience counters.
By placing the calculations in the hands of the developer – with extreme low latency – we find developers and architects are able to scale increasingly powerful real-time computations.
Q9. You claim to provide 17x better TCO than other NoSQL databases. Could you please provide some evidence for this claim?
Brian Bulkowski: A great benefit of Aerospike is the ability to store data in Flash, and indexes in DRAM. This plays to the strengths of both SSD and DRAM, as memory accelerates the indexes and finding of the data, and then only one device data read is required to fetch (or write) the data.
Most NoSQL databases require all the data in memory in order to achieve high performance. A more detailed implementation is outlined in David Floyer’s Dec 12, 2012 Wikibon article.
Q10. How do you manage replication in Aerospike?
Brian Bulkowski: We commit each write to memory, and reserve space for the data to be written to disk. The writes are aggregated into large blocks to allow the streaming writes that are optimal for both SSDs and standard rotational disks, which passes through a short disk queue (often committing in less than a few milliseconds).
Each row has a current master server within the cluster. This mapping changes as servers are added and removed, and one or more replicas are chosen randomly from the remaining nodes.
Q11. What about data consistency when updating the data?
Brian Bulkowski: By applying the change to all relevant machines in the cluster, we achieve the important application requirement of a write taking effect immediately. By applying the write to multiple machines in-memory, we achieve very high reliability, because multiple machines must fail in the same millisecond to have uncommitted transactions. When the application code reads, they will always see the updated data by default – with no performance impact.
Q12. Could you please explain how cross data center synchronization works for Aerospike?
Brian Bulkowski: Our cross data center synchronization is a no-single-point of failure asynchronous replication mechanism. Asynchronous replication is important because a network outage or bandwidth issues impedes long haul links, but a data center can still serve requests based on its local cluster. The system can be operated both with a ‘star topology’, where a single master cluster is replicated to other more local data centers, or in a ‘ring topology’, where every cluster takes writes and those writes are forwarded around the ring. Conflicts can happen in a ring deployment if a row is modified in different locations at the same time.
This requires a conflict resolution system – like Aerospike’s versions – that retains conflicts for later resolution, or a heuristic – retaining a row with the most writes and a later timestamp.
Q13. What is the contribution of Don Haderle to Aerospike?
Brian Bulkowski: Don first exposed us to DB2‘s historical strategy of building very fast and reliable single-row operations, then adding higher level primitives. Primary key database operations are similar to storage operations in a file store, and have extraordinary business value. By deploying with many paying customers, we’ve proven our reliable and fast primary key layer – unlike other new, clustered databases which are still seeking large production proof.
Brian Bulkowski, Co-Founder & CTO Aerospike
Brian has 20+ years experience designing, developing and tuning networking systems and high performance high scale infrastructures. He founded Aerospike after learning first hand, the scaling limitations of sharded MySQL systems at Aggregated Knowledge. As Director of Performance at this Media Intelligence SaaS company, Brian’s team built and operated a clustered recommendation engine. Prior to Aggregate Knowledge, Brian was a founding member of the digital TV team at Navio Communications and Chief Architect of Cable Solutions at Liberate Technologies where he built the high performance embedded networking stack and the internet scale broadcast server infrastructure. Before Liberate, Brian was a Lead Engineer at Novell, where he was responsible for the AppleTalk stack for Netware 3 and 4. Brian holds a B.S. from Brown University in Mathematics/Computer Science.
NoSQL Failover Characteristics: Aerospike, Cassandra, Couchbase, and MongoDB.
Ben Engber, CEO, Thumbtack Technology
Thumbtack’s latest whitepaper focuses on how NoSQL databases perform during hardware failures. This paper follows our previous study of NoSQL performance and explores how different consistency and durability models affect hardware planning and application design.
Paper | Advanced| English |DOWNLOAD (PDF)| MARCH 2013
Ultra-High Performance NoSQL Benchmarking: Analyzing Durability and Performance Tradeoffs.
Ben Engber, CEO, Thumbtack Technology, JANUARY 2013
One question that comes up a lot in our discussions with customers is “Tell me which NoSQL database is best.” Of course, the answer to this depends on the use cases and the transactional needs of the specific customer. We regularly go through this analysis with customers, but one thing we find as we start is often the basic questions of how much consistency is needed and what are the tradeoffs involved that have not been answered. Ultra-High Performance NoSQL Benchmarking: Analyzing Durability and Performance Tradeoffs is a benchmark report created to help organizations answer exactly those questions pertaining to NoSQL databases. Our goal was to create a benchmark that addressed a specific class of real-world problems with big data, unlike other published reports that are out there. We focused on how the durability and consistency needs affected raw performance.
Paper | Advanced| English | DOWNLOAD (PDF)| 2013|
- Don Haderle, Father of IBM DB2, Talks about the Information Explosion:
** Follow ODBMS.org on Twitter:@odbmsorg
“The only obstacle to Semantic Web technologies in the enterprise lies in better articulation of the value proposition in a manner that reflects the concerns of enterprises. For instance, the non disruptive nature of Semantic Web technologies with regards to all enterprise data integration and virtualization initiatives has to be the focal point”
–Kingsley Uyi Idehen.
I have interviewed Kingsley Idehen founder and CEO of OpenLink Software. The main topics of this interview are: the Semantic Web, and the Virtuoso Hybrid Data Server.
Q1. The vision of the Semantic Web is the one where web pages contain self describing data that machines will be able to navigate them as easily as humans do now. What are the main benefits? Who could profit most from the Semantics Web?
Kingsley Uyi Idehen: The vision of a Semantic Web is actually the vision of the Web. Unbeknownst to most, they are one and the same. The goal was always to have HTTP URIs denote things, and by implication, said URIs basically resolve to their meaning  .
Paradoxically, the Web bootstrapped on the back of URIs that denoted HTML documents (due to Mosaic’s ingenious exploitation of the “view source” pattern ) thereby accentuating its Web of hyper-linked Documents (i.e., Information Space) aspect while leaving its Web of hyper-linked Data aspect somewhat nascent.
The nascence of the Web of hyper-linked Data (aka Web of Data, Web of Linked Data etc.) laid the foundation for the “Semantic Web Project” which naturally evoled into “The Semantic Web” meme. Unfortunately, “The Semantic Web” meme hit a raft of issues (many self inflicted) that basically disconnected it from its original Web vision and architecture aspect reality.
The Semantic Web is really about the use of hypermedia to enhance the long understood entity relationship model  via the incorporation of _explicit_ machine- and human-comprehensible entity relationship semantics via the RDF data model. Basically, RDF is just about an enhancement to the entity relationship model that leverages URIs for denoting entities and relations that are described using subject->predicate->object based proposition statements.
For the rest of this interview, I would encourage readers to view “The Semantic Web” phrase as meaning: a Web-scale entity relationship model driven by hypermedia resources that bear entity relationship model description graphs that describe entities and their relations (associations).
To answer your question, the benefits of the Semantic Web are as follows: fine-grained access to relevant data on the Web (or private Web-like networks) with increasing degrees of serendipity .
Q2. Who is currently using Semantic Web technologies and how? Could you please give us some examples of current commercial projects?
Kingsley Uyi Idehen: I wouldn’t used “project” to describe endeavors that exploit Semantic Web oriented solutions. Basically, you have entire sectors being redefined by this technology. Examples range from “Open Government” (US, UK, Italy, Spain, Portugal, Brazil etc..) all the way to publishing (BBC, Globo, Elsevier, New York Times, Universal etc..) and then across to pharmaceuticals (OpenPHACTs, St. Judes, Mayo, etc.. ) and automobiles (Daimler Benz, Volkswagen etc..). The Semantic Web isn’t an embryonic endeavor deficient on usecases and case studies, far from it.
Q3. Virtuoso is a Hybrid RDBMS/Graph Column store. How does it differ from relational databases and from XML databases?
Kingsley Uyi Idehen:: First off, we really need to get the definitions of databases clear. As you know, the database management technology realm is vast. For instance, there isn’t anything such thing as a non relational database.
Such a system would be utterly useless beyond an comprehendible definition, to a marginally engaged audience. A relational database management system is typically implemented with support for a relational model oriented query language e.g., SQL, QUEL, OQL (from the Object DBMS era), and more recently SPARQL (for RDF oriented databases and stores). Virtuoso is comprised of a relational database management system that supports SQL, SPARQL, and XQuery. It is optimized to handle relational tables and/or relational property graphs (aka. entity relationship graphs) based data organization. Thus, Virtuoso is about providing you with the ability to exploit the intensional (open world propositions or claims) and extensional (closed world statements of fact) aspects of relational database management without imposing either on its users.
Q4. Is there any difference with Graph Data stores such as Neo4j?
Kingsley Uyi Idehen: Yes, as per my earlier answer, it is a hybrid relational database server that supports relational tables and entity relationship oriented property graphs. It’s support for RDF’s data model enables the use of URIs as native types. Thus, every entity in a Virtuoso DBMS is endowed with a URI as its _super key_. You can de-reference the description of a Virtuoso entity from anywhere on a network, subject to data access policies and resource access control lists.
Q5. How do you position Virtuoso with respect to NoSQL (e.g Cassandra, Riak, MongoDB, Couchbase) and to NewSQL (e.g.NuoDB, VoltDB)?
Kingsley Uyi Idehen: Virtuoso is a SQL, NoSQL, and NewSQL offering. Its URI based _super keys_ capability differentiates it from other SQL, NewSQL, and NoSQL relational database offerings, in the most basic sense. Virtuoso isn’t a data silo, because its keys are URI based. This is a “deceptively simple” claim that is very easy to verify and understand. All you need is a Web Browser to prove the point i.e., a Virtuoso _super key_ can be placed in the address bar of any browser en route to exposing a hypermedia based entity relationship graph that navigable using the Web’s standard follow-your-nose pattern.
Q6. RDF can be encoded in various formats. How do you handle that in Virtuoso?
Kingsley Uyi Idehen: Virtuoso supports all the major syntax notations and data serialization formats associated with the RDF data model. This implies support for N-Triples, Turtle, N3, JSON-LD, RDF/JSON, HTML5+Microdata, (X)HTML+RDFa, CSV, OData+Atom, OData+JSON.
Q7. Does Virtuoso restrict the contents to triples?
Kingsley Uyi Idehen: Assuming you mean: how does it enforce integrity constraints on triple values?
It doesn’t enforce anything per se. since the principle here is “schema last” whereby you don’t have a restrictive schema acting as an inflexible view over the data (as is the case with conventional SQL relational databases). Of course, an application can apply reasoning to OWL (Web Ontology Language) based relation semantics (i.e, in the so-called RBox) as option for constraining entity types that constitute triples. In addition, we will soon be releasing a SPARQL Views mechanism that provides a middle ground for this matter whereby the aforementioned view can be used in a loosely coupled manner at the application, middleware, or dbms layer for applying constraints to entity types that constitute relations expressed by RDF triples.
Q8. RDF can be represented as a direct graph. Graphs, as data structure do not scale well. How do you handle scalability in Virtuoso? How do you handle scale-out and scale-up?
Kingsley Uyi Idehen: The fundamental mission statement of Virtuoso has always be to destroy any notion of performance and scalability as impediments to entity relationship graph model oriented database management. The crux of the matter with regards to Virtuoso is that it is massively scalable due for the following reasons:
• fine-grained multi-threading scoped to CPU cores
• vectorized (array) execution of query commands across fine-grained threads
• column-store based physical storage which provides storage layout and data compaction optimizations (e.g., key compression)
• share-nothing clustering that scales from multiple instances (leveraging the items above) on a single machine all the way up to a cluster comprised of multiple machines.
The scalability prowess of Virtuoso are clearly showcased via live Web instances such as DBpedia and the LOD Cloud Cache (50+ Billion Triples). You also have no shortage of independent benchmark reports to compliment the live instances:
•50 – 150 Billion scale Berlin SPARQL Benchmark (BSBM) report (.pdf)
Q9. Could you give us some commercial examples where Virtuoso is in use?
Kingsley Uyi Idehen: Elsevier, Globo, St. Judes Medical, U.S. Govt., EU, are a tiny snapshot of entities using Virtuoso on a commercial basis.
Q10. Do you plan in the near future to develop integration interfaces to other NoSQL data stores?
Kingsley Uyi Idehen: If a NewSQL or NoSQL store supports any of the following, their integration with Virtuoso is implicit: HTTP based RESTful interaction patterns, SPARQL, ODBC, JDBC, ADO.NET, OLE-DB. In the very worst of cases, we have to convert the structured data returned into 5-Star Linked Data using Virtuoso’s in-built Linked Data middleware layer for heterogeneous data virtualization.
Q11. Virtuoso supports SPARQL. SPARQL is not SQL, how do handle querying relational data then?
Kingsley Uyi Idehen: Virtuoso support SPARQL, SQL, SQL inside SPARQL and SPARQL inside SQL (we call this SPASQL). Virtuoso has always had its own native SQL engine, and that’s integral to the entire product. Virtuoso provides an extremely powerful and scalable SQL engine as exemplified by the fact that the RDF data management services are basically driven by the SQL engine subsystem.
Q12. How do you support Linked Open Data? What advantages are the main benefits of Linked Open Data in your opinion?
Kingsley Uyi Idehen: Virtuoso enables you expose data from the following sources, courtesy of its in-built 5-star Linked Data Deployment functionality:
• RDF based triples loaded from Turtle, N-Triples, RDF/XML, CSV etc. documents
• SQL relational databases via ODBC or JDBC connections
• SOAP based Web Services
• Web Services that provide RESTful interaction patterns for data access.
• HTTP accessible document types e.g., vCard, iCalendar, RSS, Atom, CSV, and many others.
Q13. What are the most promising application domains where you can apply triple store technology such as Virtuoso?
Kingsley Uyi Idehen: Any application that benefits from high-performance and scalable access to heterogeneously shaped data across disparate data sources. Healthcare, Pharmaceuticals, Open Government, Privacy enhanced Social Web and Media, Enterprise Master Data Management, Big Data Analytics etc..
Q14. Big Data Analysis: could you connect Virtuoso with Hadoop? How does Viruoso relate to commercial data analytics platforms, e.g Hadapt, Vertica?
Kingsley Uyi Idehen: You can integrate data managed by Hadoop based ETL workflows via ODBC or Web Services driven by Hapdoop clusters that expose RESTful interaction patterns for data access. As for how Virtuoso relates to the likes of Vertica re., analytics, this is about Virtuoso being the equivalent of Vertica plus the added capability of RDF based data management, Linked Data Deployment, and share-nothing clustering. There is no job that Vertica performs that Virtuoso can’t perform.
There are several jobs that Virtuoso can perform that Vertica, VoltDB, Hadapt, and many other NoSQL and NewSQL simply cannot perform with regards to scalable, high-performance RDF data management and Linked Data deployment. Remember, RDF based Linked Data is all about data management and data access without any kind of platform lock-in. Virtuoso locks you into a value proposition (performance and scale) not the platform itself.
Q15. Do you also benchmark loading trillion of RDF triples? Do you have current benchmark results? How much time does it take to querying them?
Kingsley Uyi Idehen: As per my earlier responses, there is no shortage of benchmark material for Virtuoso.
The benchmarks are also based on realistic platform configurations unlike the RDBMS patterns of the past which compromised the utility of TPC benchmarks.
Q16. In your opinion, what are the main current obstacles for the adoption of Semantic Web technologies in the Enterprise?
Kingsley Uyi Idehen:The only obstacle to Semantic Web technologies in the enterprise lies in better articulation of the value proposition in a manner that reflects the concerns of enterprises. For instance, the non disruptive nature of Semantic Web technologies with regards to all enterprise data integration and virtualization initiatives has to be the focal point.
. — 5-Star Linked Data URIs and Semiotic Triangle
. — what do HTTP URIs Identify?
. — View Source Pattern & Web Bootstrap
. — Unified View of Data using the Entity Relationship Model (Peter Chen’s 1976 dissertation)
. — Serendipitous Discovery Quotient (SDQ).
Kingsley Idehen is the Founder and CEO of OpenLink Software. He is an industry acclaimed technology innovator and entrepreneur in relation to technology and solutions associated with data management systems, integration middleware, open (linked) data, and the semantic web.
Kingsley has been at the helm of OpenLink Software for over 20 years during which he has actively provided dual contributions to OpenLink Software and the industry at large, exemplified by contributions and product deliverables associated with: Open Database Connectivity (ODBC), Java Database Connectivity (JDBC), Object Linking and Embedding (OLE-DB), Active Data Objects based Entity Frameworks (ADO.NET), Object-Relational DBMS technology (exemplified by Virtuoso), Linked (Open) Data (where DBpedia and the LOD cloud are live showcases), and the Semantic Web vision in general.
-ODBMS.org free resources on : Relational Databases, NewSQL, XML Databases, RDF Data Stores
Follow ODBMS Industry Watch on Twitter: @odbmsorg
“A common misconception when virtualizing Hadoop clusters is that we decouple the data nodes from the physical infrastructure. This is not necessarily true. When users virtualize a Hadoop cluster using Project Serengeti, they separate data from compute while preserving data locality. By preserving data locality, we ensure that performance isn’t negatively impacted, or essentially making the infrastructure appear as static.” – Joe Russell.
VMware announced in June last year an open source project called Serengeti.
The main idea of Project Serengeti is to enable users and companies to quickly deploy, manage and scale Apache Hadoop on virtual infrastructure.
I have interviewed Joe Russell, VMware, Product Line Marketing Manager, Big Data.
Q1. Why Virtualize Hadoop?
Joe Russell: Hadoop is a technology that Enterprises are increasingly using to process large amounts of information. While the technology is generally pretty early in its lifecycle, we are starting to see more enterprise-grade use cases. In its current form, Hadoop is difficult to use and lacking the toolsets to efficiently deploy run and manage Hadoop clusters in an Enterprise context. Virtualizing Hadoop not only brings enterprise-tested High Availability and Fault Tolerance to Hadoop, but it also allows for much more agile and automated management of Hadoop clusters. Additionally, virtualization allows for separation of data and compute, which allows users to preserve data locality and paves the way towards more advanced use cases such as mixed workload deployments and Hadoop-as-a-service.
Q2. You claim to be able to deploy an Apache Hadoop cluster (HDFS, MapReduce, Pig, Hive) in minutes on an existing vSphere cluster using Serengeti. How do you do this? Could you give us an example on how you customize a Hadoop Cluster?
You will be able to customize Hadoop clusters easily from within the Serengeti tool by specifying node and resource allocations through an easy to use user interface.
Q3. There are concerns on the approach of decoupling Apache Hadoop nodes from the underlying physical infrastructure. Quoting Steve Loughran (HP Research): “Hadoop contains lots of assumptions about running in a static infrastructure; it’s scheduling and recovery algorithms assume this.” What is your take on this?
Joe Russell: A common misconception when virtualizing Hadoop clusters is that we decouple the data nodes from the physical infrastructure. This is not necessarily true. When users virtualize a Hadoop cluster using Project Serengeti, they separate data from compute while preserving data locality. By preserving data locality, we ensure that performance isn’t negatively impacted, or essentially making the infrastructure appear as static. Additionally, it creates true multi-tenancy within more layers of the Hadoop stack, not just the name node.
I think there is some confusion when we say “in the cloud”. Here, Steve is talking about running it on a public cloud like Amazon. Steve is largely introducing the concept of data locality, or the notion that large amounts of data are hard to move. In this scenario, it makes sense to bring compute resources to the data to ensure performance isn’t negatively impacted by networking limitations. VMware advocates that Hadoop should be virtualized, as it introduces a level of flexibility and management that allows companies to easily deploy, manage, and scale internal Hadoop clusters.
Q4. How do ensure High Availability (HA)? How do you protect against host and VM failures?
Joe Russell: We ensure High Availability (HA) by leveraging vSphere’s tested solution via Project Serengeti’s integration with vCenter (management console of vSphere).
In the event of physical server failure, affected virtual machines are automatically restarted on other production servers with spare capacity. In the case of operating system failure, vSphere HA restarts the affected virtual machine on the same physical server.
In Hadoop nomenclature, this means that there is HA on more than just the name node. vSphere’s solution also allows for HA on the jobtracker node, metastores, and on the management server, which are critical pieces of any Hadoop system that require high availability.
More importantly, as Hadoop is a batch-oriented process, it is important that when a physical host does fail, that you are able to pause and then restart that job from the point in time in which it went down. VMware’s vSphere solution allows for this and has been tested amongst the biggest Enterprises for the better part of the past decade.
Q5. How do you get Data Insights? Do you already have examples how such Virtualize Hadoop is currently used in the Enterprises? If yes, which ones?
Joe Russell: Data Insights occur farther up the stack with analytics vendors.
Project Serengeti is a tool that allows you to run Hadoop ontop of vSphere and is a solution designed to allow users to consolidate Hadoop clusters on a single underlying virtual infrastructure. The tool allows for users to run different types of Hadoop distributions on a hypervisor to gain the benefits of virtualization, which include efficiency, elasticity, and agility.
Q6. Does Serengeti only works with VMware vSphere® platform?
Joe Russell: Project Serengeti today only works with the vSphere hypervisor.
However, VMware made the decision to open source Project Serengeti to make the code available to anyone who wishes to use it.
By making it open source vs. just offering a free closed source product, VMware allows users to take the Serengeti code and alter it for their own purposes. For example, any user could download the Project Serengeti code and alter it to make it work with other hypervisors other than vSphere. While it isn’t in VMware’s interest to dedicate resources to make Project Serengeti run with other hypervisors, it doesn’t prevent users from doing so. This is an important point.
Q7. VMware is working with the Apache Hadoop community to contribute changes to the Hadoop Distributed File System (HDFS) and Hadoop MapReduce projects to make them “virtualization-aware”. What does it mean? What are these changes?
Joe Russell: Hadoop Virtual Extensions (“HVE”) is one example of this. VMware contributed HVE back to the Apache community to make Hadoop distributions virtualization aware. This means inserting a node group layer between the rack and host to make Hadoop distributions topology aware for virtualized platforms. In its simplest terms, this allows for VMware to preserve data locality and increase reliability through the separation of data and compute.
A link to a whitepaper with further detail can be found here (.pdf).
Q8. What about the performance of such “Virtualize” Hadoop? Do you have performance measures to share?
Joe Russell: Please see whitepaper referenced above.
Q9. What is the value of Hadoop-in-cloud? How does it relate to the virtualization of Hadoop?
Joe Russell: I don’t necessarily understand the question and it would be particularly helpful to define what you mean by “Hadoop-in-Cloud”.
I think you may be referring to Hadoop-as-a-Service, which is valuable in that users are able to deliver Hadoop resources to internal users based on need. Centralized control through Hadoop-as-a-Service ensures high cluster utilization, lower TCO, and an agile framework to adjust to ever-changing business needs. As Enterprises increasingly look to service internal customers, I expect Hadoop-as-a-Service to become more popular as the Hadoop technology emerges within the enterprise. Please keep in mind that this relates both to private and public clouds. Virtualizing Hadoop is the first step toward being able to provision Hadoop in the cloud.
Q10. VMware also announced updates to Spring for Apache Hadoop. Could you tell us what are these updates?
Q11 VMware is working with a number of Apache Hadoop distribution vendors (Cloudera, Greenplum, Hortonworks, IBM and MapR ) to support a wide range of distributions. Why? Could you tell us exactly what is VMware contribution?
Joe Russell: VMware is focused on providing a common underlying virtual infrastructure so each of these vendors can run their software better on vSphere. Project Serengeti is a toolset that pre-configures, tunes and makes it easier to deploy and run Hadoop with increased reliability on vSphere. These efforts make it easier for enterprises to make architectural decisions around how to setup Hadoop within their companies. Deciding to virtualize Hadoop can have dramatic effects not only on companies just beginning to use Hadoop, but also on more advanced users of the technology. VMware’s contributions through Project Serengeti allow each of the vendor’s software to run better on virtualized infrastructure. As you know, these contributions are available for anyone to use.
Q12 Serengeti, “Virtualize” Hadoop, Hadoop in the Cloud, Spring for Apache Hadoop: what is the global picture here? How all of these efforts relate to each other? What are the main benefits for developers and users of Apache Hadoop?
Joe Russell: All of these efforts improve the technology and make it easier for developers and users of Hadoop to actually use Hadoop. Additionally, these efforts focus on virtualizing Hadoop to make the technology more elastic, reliable, and performant.
VMware is focused on bringing the benefits of virtualization to Hadoop, both from a community standpoint and a customer standpoint. It has been open in its approach to contributing back technology that makes it easier for users / developers to utilize virtualization for their Hadoop clusters. Conversely, it is investing in bringing Hadoop to its existing customers by making the technology more reliable and building easy to use tools around the technology to make it easier to deploy and administrate in an Enterprise setting with SLAs and business critical workloads.
Joe Russell is responsible for product strategy, GTM, evangelism and product marketing of Big Data at VMware.
He has over a decade of experience in a blend of product marketing, finance, operations, and M&A roles.
Previously he worked for Yahoo!, and as an Investment Banking – Technology M&A Analyst for GCA Savvian, Credit Suisse and Societe Generale.
He holds a MSc, Accounting & Finance from London School of Economics and Political Science, a BS, Economics with Honors from University of Washington, and a MBA from Wharton School, University of Pennsylvania.
ODBMS.org- Lecture Notes: Data Management in the Cloud.
by Michael Grossniklaus, David Maier, Portland State University.
Course Description: “Cloud computing has recently seen a lot of attention from research and industry for applications that can be parallelized on shared-nothing architectures and have a need for elastic scalability. As a consequence, new data management requirements have emerged with multiple solutions to address them. This course will look at the principles behind data management in the cloud as well as discuss actual cloud data management systems that are currently in use or being developed. The topics covered in the courserange from novel data processing paradigms (MapReduce, Scope, DryadLINQ), to commercial cloud data management platforms (Google BigTable, Microsoft Azure, Amazon S3 and Dynamo, Yahoo PNUTS) and open-source NoSQL databases (Cassandra, MongoDB, Neo4J). The world of cloud data management is currently very diverse and heterogeneous. Therefore, our course will also report on efforts to classify, compare and benchmark the various approaches and systems. Students in this course will gain broad knowledge about the current state of the art in cloud data management and, through a course project, practical experience with a specific system.”
Lecture Notes | Intermediate/Advanced | English | LINK TO DOWNLOAD ~280 slides (PDF)| 2011-12|
Follow ODBMS.org on Twitter: @odbmsorg