Skip to content

"Trends and Information on Big Data, New Data Management Technologies, and Innovation."

This is the Industry Watch blog. To see the complete ODBMS.org
website with useful articles, downloads and industry information, please click here.

Oct 28 13

On multi-model databases. Interview with Martin Schönert and Frank Celler.

by Roberto V. Zicari

“We want to prevent a deadlock where the team is forced to switch the technology in the middle of the project because it doesn’t meet the requirements any longer.”–Martin Schönert and Frank Celler.

On “multi-model” databases, I have interviewed Martin Schönert and Frank Celler, founders and creators of the open source ArangoDB.

RVZ

Q1. What is ArangoDB and for what kind of applications is it designed for?

Frank Celler: ArangoDB is a multi-model mostly-memory database with a flexible data model for documents and graphs. It is designed as a “general purpose database”, offering all the features you typically need for modern web applications.

ArangoDB is supposed to grow with the application—the project may start as a simple single-server prototype, nothing you couldn’t do with a relational database equally well. After some time, some geo-location features are needed and a shopping cart requires transactions. ArangoDB’s graph data model is useful for the recommendation system. The smartphone app needs a lean API to the back-end—this is where Foxx, ArangoDB’s integrated Javascript application framework, comes into play.
The overall idea is: “We want to prevent a deadlock where the team is forced to switch the technology in the middle of the project because it doesn’t meet the requirements any longer.”

ArangoDB is open source (Apache 2 licence)—you can get the source code at GitHub or download the precompiled binaries from our website.

Though ArangoDB as a universal approach, there are edge cases where we don’t recommend ArangoDB. Actually, ArangoDB doesn’t compete with massively distributed systems like Cassandra with thousands of nodes and many terabytes of data.

Q2. What’s so special about the ArangoDB data model?

Martin Schönert: ArangoDB is a multi-model database. It stores documents in collections. A specialized binary data file format is used for disk storage. Documents that have similar structure (i.e., that have the same attribute names and attribute types) can share their structural information. The structure (called “shape”) is saved just once, and multiple documents can re-use it by storing just a pointer to their “shape”.
In practice, documents in a collection are likely to be homogenous, and sharing the structure data between multiple documents can greatly reduce disk storage space and memory usage for documents.

Q3. Who is currently using ArangoDB for what?

Frank Celler: ArangoDB is open source. You don’t have to register to download the source code or precompiled binaries. As a user, you can get support via Google Group, GitHub’s issue tracker and even via Twitter. We are very amenable, which is an essential part of the project. The drawback is that we don’t really know what people are doing with ArangoDB in detail. We are noticing an exponentially increasing number of downloads over the last months.
We are aware of a broad range of use cases: a CMS, a high-performance logging component, a geo-coding tool, an annotation system for animations, just to name a few. Other interesting use cases are single page apps or mobile apps via Foxx, ArangoDB’s application framework. Many of our users have in-production experience with other NoSQL databases, especially the leading document stores.

Q4. Could you motivate your design decision to use Google’s V8 JavaScript engine?

Martin Schönert: ArangoDB uses Google’s V8 engine to execute server-side JavaScript functions. Users can write server-side business logic in JavaScript and deploy it in ArangoDB. These so-called “actions” are much like stored procedures living close to the data.
For example, with actions it is possible to perform cascading deletes/updates, assign permissions, and do additional calculations and modifications to the data.
ArangoDB also allows users to map URLs to custom actions, making it usable as an application server that handles client HTTP requests with user-defined business logic.
We opted for Javascript as it meets our requirements for an “embedded language” in the database context:
• Javascript is widely used. Regardless in which “back-end language” web developers write their code, almost everybody can code also in Javascript.
• Javascript is effective and still modern.
Just as well, we chose Google V8, as it is the fastest, most stable Javascript interpreter available for the time being.

Q5. How do you query ArangoDB if you don’t want to use JavaScript?

Frank Celler: ArangoDB offers a couple of options for getting data out of the database. It has a REST interface for CRUD operations and also allows “querying by example”. “Querying by example” means that you create a JSON document with the attributes you are looking for. The database returns all documents which look like the “example document”.
Expressing complex queries as JSON documents can become a tedious task—and it’s almost impossible to support joins following this approach. We wanted a convenient and easy-to-learn way to execute even complex queries, not involving any programming as in an approach based on map/reduce. As ArangoDB supports multiple data models including graphs, it was neither sufficient to stick to SQL nor to simply implement UNQL. We ended up with the “ArangoDB query language” (AQL), a declarative language similar to SQL and Jsoniq. AQL supports joins, graph queries, list iteration, results filtering, results projection, sorting, variables, grouping, aggregate functions, unions, and intersections.
Of course, ArangoDB also offers drivers for all major programming languages. The drivers wrap the mentioned query options following the paradigm of the programming language and/or frameworks like Ruby on Rails.

Q6. How do you perform graph queries? How does this differ from systems such as Neo4J?

Frank Celler: SQL can’t cope with the required semantics to express the relationships between graph nodes, so graph databases have to provide other ways to access the data.
The first option is to write small programs, so called “path traversals.” In ArangoDB, you use Javascript; in neo4j Java, the general approach is very similar.
Programming gives you all the freedom to do whatever comes to your mind. That’s good. For standard use cases, programming might be too much effort. So, both ArangoDB and neo4j offer a declarative language—neo4j has “Cypher,” ArangoDB the “ArangoDB Query Language.” Both also implement the blueprints standard so that you can use “Gremlin” as query-language inside Java. We already mentioned that ArangoDB is a multi-model database: AQL covers documents and graphs, it provides support for joins, lists, variables, and does much more.

The following example is taken from the neo4j website:

“For example, here is a query which finds a user called John in an index and then traverses the graph looking for friends of John’s friends (though not his direct friends) before returning both John and any friends-of-friends that are found.

START john=node:node_auto_index(name = ‘John’)
MATCH john-[:friend]->()-[:friend]->fof
RETURN john, fof ”

The same query looks in AQL like this:

FOR t IN TRAVERSAL(users, friends, “users/john”, “outbound”,
{minDepth: 2}) RETURN t.vertex._key

The result is:
[ "maria", "steve" ]

You see that Cypher describes patterns while AQL describes joins. Internally, ArangoDB has a library of graph functions—those functions return collections of paths and paths or use those collections in a join.

Q7. How did you design ArangoDB to scale out and/or scale up? Please give us some detail.

Martin Schönert: Solid state disks are becoming more and more a commodity hardware. ArangoDB’s append-only design is a perfect fit for such SSD, allowing for data-sets which are much bigger than the main memory but still fit unto a solid state disk.
ArangoDB supports master/slave replication in version 1.4 which will be released in the next days (a beta has been available for some time). On the one hand this provides easy fail-over setups. On the other hand it provides a simple way to scale the read-performance.
Sharding is implemented in version 2.0. This enables you to store even bigger data-sets and increase the write-performance. As noted before, however, we see our main application when scaling to a low number of nodes. We don’t plan to optimize ArangoDB for massive scaling with hundreds of nodes. Plain key/value stores are much more usable in such scenarios.

Q8. What is ArangoDB’s update and delete strategy?

Martin Schönert: ArangoDB versions prior to 1.3 store all revisions of documents in an append-only fashion; the objects will never be overwritten. The latest version of a document is available to the end user.

With the current version 1.3, ArangoDB introduces transactions and sets the technical fundament for replication and sharding. In the course of those highly wanted features comes “real” MVCC with concurrent writes.

In databases implementing an append-only strategy, obsolete versions of a document have to be removed to save space. As we already mentioned, ArangoDB is multi-threaded: The so-called compaction is automatically done in the background in a different thread without blocking reads and writes.

Q9. How does ArangoDB differ from other NoSQL data stores such as Couchbase and MongoDB and graph data stores such as Neo4j, to name a few?

Frank Celler: ArangoDB’s feature scope is driven by the idea to give the developer everything he needs to master typical tasks in a web application—in a convenient and technically sophisticated way alike.
From our point of view it’s the combination of features and quality of the product which accounts for ArangoDB: ArangoDB not only handles documents but also graphs.
ArangoDB is extensible via Javascript and Ruby. Enclosed with ArangoDB you get “Foxx”. Foxx is an integrated application framework ideal for lean back-ends and single page Javascript applications (SPA).
Multi-collection transactions are useful not only for online banking and e-commerce but they become crucial in any web app in a distributed architecture. Here again, we offer the developers many choices. If transactions are needed, developers can use them.
If, on the other hand, the problem requires a higher performance and less transaction-safety, developers are free to ignore multi-collections transactions and to use the standard single-document transactions implemented by most NoSQL databases.
Another unique feature is ArangoDB’s query language AQL—it makes querying powerful and convenient. For simple queries, we offer a simple query-by-example interface. Then again, AQL enables you to describe complex filter conditions and joins in a readable format.

Q10. Could you summarize the main results of your benchmarking tests?

Frank Celler: To quote Jan Lenhardt from CouchDB: “Nosql is not about performance, scaling, dropping ACID or hating SQL—it is about choice. As nosql databases are somewhat different it does not help very much to compare the databases by their throughput and chose the one which is fasted. Instead—the user should carefully think about his overall requirements and weight the different aspects. Massively scalable key/value stores or memory-only system[s] can archive much higher benchmarks. But your aim is [to] provide a much more convenient system for a broader range of use-cases—which is fast enough for almost all cases.”
Anyway, we have done a lot of performance tests and are more than happy with the results. ArangoDB 1.3 inserts up to 140,000 documents per second. We are going to publish the whole test suite including a test runner soon, so everybody can try it out on his own hardware.

We have also tested the space usage: Storing 3.5 millions AQL search queries takes about 200 MB in MongoDB with pre-allocation compared to 55 MB in ArangoDB. This is the benefit of implementing the concept of shapes.

Q11. ArangoDB is open source. How do you motivate and involve the open source development community to contribute to your projects rather than any other open source NoSQL?

Frank Celler: To be honest: The contributors come of their own volition and until now we didn’t have to “push” interested parties. Obviously, ArangoDB is fascinating enough, even though there are more than 150 NoSQL databases available to choose from.

It all started when Ruby inventor Yukihiro “Matz” Matsumoto tweeted on ArangoDB and recommended it to the community. Following this tweet, ArangoDB’s first local fan base was established in Japan—and we learned a lot about the limits of automatic translation from Japanese tweets to English and the other way around ;-).

In our daily “work” with our community, we try to be as open and supportive as possible. The core developers communicate directly and within short response times with people having ideas or needing help through Google Groups or GitHub. We take care of a community, especially for contributors, where we discuss future features and inform about upcoming changes early so that API contributors can keep their implementations up to date.

——————————
Martin Schönert
Martin is the origin of many fancy ideas in ArangoDB. As chief architect he is responsible for the overall architecture of the system, bringing in his experience from more than 20 years in IT as developer, architect, project manager and entrepreneur.
Martin started his career as scientist at the technical university of Aachen after earning his degree in Mathematics. Later he worked as head of product development (Team4 Systemhaus), Director IT (OnVista Technologies) and head of division at Deutsche Post.
Martin has been working with relational and non-relations databases (e.g. a torrid love-hate relationsship with the granddaddy of all non-relational databases: Lotus Notes) for the largest part of his professional life.
When no database did what he needed he also wrote his own, one for extremely high update rate and the other for distributed caching
.

Frank Celler
Frank is both entrepreneur and backend developer, developing mostly memory databases for two decades. He is the lead developer of ArangoDB and co-founder of triAGENS. Besides Frank organizes Cologne’s NoSQL user group, NoSQL conferences and is speaking at developer conferences.
Frank studied in Aachen and London and received a PHD in Mathematics. Prior to founding triAGENS, the company behind ArangoDB, he worked for several German tech companies as consultant, team lead and developer.
His technical focus is C and C++, recently he gained some experience with Ruby when integrating Mruby into ArangoDB.

Resources

- The stable version (1.3 branch) of ArangoDB can be downloaded here.
- ArangoDB on Twitter
- ArangoDB Google Group
- ArangoDB questions on StackOverflow
- Issue Tracker at Github

Related Posts

- On geo-distributed data management — Interview with Adam Abrevaya. October 19, 2013

- On Big Data and NoSQL. Interview with Renat Khasanshyn. October 7, 2013

- On NoSQL. Interview with Rick Cattell. August 19, 2013

- On Big Graph Data. August 6, 2012

Follow ODBMS.org on Twitter: @odbmsorg

##

Oct 19 13

On geo-distributed data management — Interview with Adam Abrevaya.

by Roberto V. Zicari

“Geo-distribution is the ability to distribute a single, logical SQL/ACID database that delivers transactional consistency across multiple datacenters, cloud provider regions, or a hybrid” — Adam Abrevaya.

I have interviewed Adam Abrevaya, Vice President of Engineering, NuoDB.

RVZ

Q1. You just launched NuoDB 2.0, what is special about it?

Adam Abrevaya: NuoDB Blackbirds Release 2.0 demonstrates a strong implementation of the NuoDB vision. It includes over 200 new features and improvements, making it even more stable and reliable than previous versions.
We have improved migration tools; included Java stored procedures; are introducing powerful automated administration; made enhancements to core geo-distribution functionality and more.

Q2. You offer a feature called geo-distribution. What is it and why is it useful?

Adam Abrevaya: Geo-distribution is the ability to distribute a single, logical SQL/ACID database that delivers transactional consistency across multiple datacenters, cloud provider regions, or a hybrid.

NuoDB’s geo-distributed data management lets customers build an active/active, highly-responsive database for high availability and low latency. By bringing the database closer to the end user, we can enable faster responses while simultaneously eliminating the time spent on complex tasks like replication, backup and recovery schemes.

One of the most exciting aspects of the Release 2.0 launch was the discussion about a major deployment of NuoDB Geo-Distribution by a customer. We were very excited to include Cameron Weeks, CEO and Co-Founder of Fathom Voice, talking about the challenges his company was facing—both managing his existing business and cost-effectively expanding globally. After a lengthy evaluation of alternative technologies, he found NuoDB’s distributed database is the only one that met his needs.

Q3. NuoDB falls broadly into the category of NewSQL databases, but you say that you are also a distributed database and that your architecture is fundamentally different than other databases out there. What’s different about it?

Adam Abrevaya: Yes, we are a NewSQL database and we offer the scale-out performance typically associated with NoSQL solutions, while still maintaining the safety and familiarity of SQL and ACID guarantees.

Our architecture, envisioned by renowned data scientist, Jim Starkey, is based on what we call “On-demand Replication”. We have an architecture whitepaper (registration required) which provides all the technical differentiators of our approach.

Q4. NuoDB is SQL compliant, and you claim that it scales elastically. But how do you handle complex join operations on data sets that are geographically distributed and at the same time scale (in) (out)?

Adam Abrevaya: NuoDB can have transactions that work against completely different on-demand caches.
For example, you can have OLTP transactions running in 9 Amazon AWS regions, each working on a subset of the overall database. Separately, there can be on-demand caches that can be dedicated to queries across the entire data set. NuoDB manages these on-demand ACID-compliant caches with very different use cases automatically without impact to the critical end user OLTP operations.

Q5. What is special about NuoDB with respect to availability? Several other NoSQL data stores are also resilient to infrastructure and partition failures.

Adam Abrevaya: First off, NuoDB offers a distributed SQL database system that provides all the ACID guarantees you expect from a relational database. We scale out like NoSQL databases, and offer support for handling independent failures at each level of our architecture. Redundant processes take over for failed processes (due to machine or other failures) and we make it easy for new machines and process to be brought online and added to the overall database dynamically. Applications that make use of the typical facilities when building an enterprise application will automatically reconnect to surviving processes in our system. We can detect network partition failures and allow the application to take appropriate measures.

Q6 How are some of your customers using NuoDB?

Adam Abrevaya: We are seeing a number of common uses of NuoDB among our customers. These range from startups building new web-facing solutions, to geo-distributed SaaS applications, to ISVs moving existing apps to the cloud, to all sorts of other apps that hit the performance wall with MySQL and other traditional DBMS. Ultimately, with lots of replication, sharding, new server hardware, etc., customers can use traditional databases to scale out or up but at a very high cost in terms of both time, money and usually by giving up transactional guarantees. One customer said he decided to look at alternatives to MySQL just because he was spending so much time in meetings talking about how to get it to do what they needed it to do. He added up the cost of the man-hours and he said “migrate.”

As I mentioned already, Fathom Voice, a SaaS provider offering VoIP, conference bridging, receptionist services and some innovative communications apps, had a global deployment challenge. How to get the database near their globe trotting customers; reduce latency and ensure redundancy. They are one of many customers and prospects tackling these issues.

———————-
Adam Abrevaya, Vice President of Engineering, NuoDB
Adam has been building and managing world-class engineering teams and products for almost two decades. His passion is around building and delivering high-performance core infrastructure products that companies depend on to build their businesses.

Adam started his career at MIT Lincoln Laboratory where he developed a distributed platform and image processing algorithms for detecting dangerous weather patterns in radar images. The system was deployed at several airports around the country.

From there, Adam joined Object Design and held various senior management positions where he was responsible for managing several major releases of ObjectStore (an Object database) along with spearheading the development team building XML products that included: Stylus Studio, an XML database, and a Business Process Manager.

Adam joined Pantero Corporation as VP of Development where he developed a revolutionary Semantic Data Integration product. Pantero was eventually sold to Progress Software.

From Pantero, Adam joined m-Qube to manage and build the team creating its Mobile Messaging Gateway platform. The m-Qube platform is a carrier grade product that has become the leading Mobile Messaging Gateway in North America and generated billions of dollars in revenue. Adam continued managing the mQube platform along with expanded roles after acquisitions of the technology from VeriSign and Mobile Messenger.

———

Related Posts

- On Big Data and NoSQL. Interview with Renat Khasanshyn. October 7, 2013

- On NoSQL. Interview with Rick Cattell. August 19, 2013

Resources

- Download NuoDB Pro Edition (Registration required) (NuoDB Blackbirds Release 2.0)

-ODBMS.org free resources on
Relational Databases, NewSQL, XML Databases, RDF Data Stores:
Blog Posts |Free Software | Articles and Presentations| Lecture Notes | Tutorials| Journals |

Follow ODBMS.org on Twitter: @odbmsorg

##

Oct 7 13

On Big Data and NoSQL. Interview with Renat Khasanshyn.

by Roberto V. Zicari

“The most important thing is to focus on a task you need to solve instead of a technology” –Renat Khasanshyn.

I have interviewed Renat Khasanshyn, Founder and Chairman of Altoros.
Renat is a NoSQL and Big Data specialist.

RVZ

Q1. In your opinion, what are the most popular players in the NoSQL market?

Khasanshyn: I think, MongoDB is definitely one of the key players of the NoSQL market. This database has a long history, I mean for this kind of products, and a good commercial support. For many people this database became the first mass market NoSQL store. I can assume that MongoDB is going to become something like MySQL for the field of relational databases. The second position I would give to Cassandra. It has a great architecture and enables building clusters with geographically dispersed nodes. For me it seems absolutely amazing. In addition, this database is often chosen by big companies that need a large highly available cluster.

Q2. How do you evaluate and compare different NoSQL Databases?

Khasanshyn: Thank you for an interesting question. How to choose a database? Which one is the best? These are the main questions for any company that wants to try a NoSQL solution. Definitely, for some cases it may be quite easy to select a suitable NoSQL store. However, very often it is not enough just to know customer’s business goals. When we suggest a particular database we take into consideration the following factors: the business issues a NoSQL store should solve, database reading/writing speed, availability, scalability, and many other important indicators. Sometimes we use a hybrid solution that may include several NoSQL databases.
Or we can see that a relational database will be a good match for a case. The most important thing is to focus on a task you need to solve instead of a technology.

We think that a good scalability, performance, and ease of administration are the most important criteria for choosing a NoSQL database. These are the key factors that we take into consideration. Of course, there are some additional criteria that sometimes may be even more important than those mentioned above. To somehow simplify a choice of a database for our engineers and for many other people, for two years, we carry out independent tests that evaluate performance of different NoSQL databases. Although aimed at comparing performance, these investigations also touch consistency, scalability, and configuration issues. You can take a look at our most recent article on this subject on Network World. Some new researches on this subject are to be published in a couple of months.

Q3. Which NoSQL databases did you evaluate so far, and what are the results did you obtain?

Khasanshyn: We used a great variety of NoSQL databases, for instance, Cassandra, HBase, MongoDB, Riak, Couchbase Server, Redis, etc. for our researches and real-life projects. From this experience, we learned that one should be very careful when choosing a database. It is better to spend more time on architecture design and make some changes to the project in the beginning rather than come across a serious issue in the future.

Q4. For which projects did use NoSQL databases and for what kind of problems?

Khasanshyn: It is hardly possible to name a project for which a NoSQL database would be useless, except for a blog or a home page. As the main use cases for NoSQL stores I would mention the following tasks:

● collecting and analyzing large volumes of data
● scaling large historical databases
● building interactive applications for which performance and fast response time to users’ actions are crucial

The major “drawback” of NoSQL architecture is the absence of ACID engine that provides a verification of transaction. It means that financial operations or user registration should be better performed by RDBMS like Oracle or MS SQL. However, absence of ACID allows for significant acceleration and decentralization of NoSQL databases which are their major advantages. The bottom line, non-relational databases are much faster in general, and they pay for it with a fraction of their reliability. Is it a good tradeoff? Depends on the task.

Q5. What do you see happening with NoSQL, going forward?

Khasanshyn: It’s quite difficult to make any predictions, but we guess that NoSQL and relational databases will become closer. For instance, NewSQL solutions took good scalability from NoSQL and a query language from the SQL products.
Probably, a kind of a standard query language based on SQL or SQL-like language will soon appear for NoSQL stores. We are also looking forward to improved consistency, or to be more precise, better predictability of NoSQL databases. In other words, NoSQL solutions should become more mature. We will also see some market consolidation. Smaller players will form alliances or quit the game. Leaders will take bigger part of the market. We will most likely see a couple of acquisitions. Overall, it will be easier to work with NoSQL and to choose a right solution out of the available options.

Q6. What what do you think is still needed for big data analytics to be really useful for the enterprise?

Khasanshyn: It is all about people and their skills. Storage is relatively cheap and available. Variety of databases is enormous and it helps solving virtually any task. Hadoop is stable. Majority of software is open source-based, or at least doesn’t cost a fortune. But all these components are useless without data scientists who can do modeling and simulations on the existing data. As well as without engineers who can efficiently employ the toolset. As well as without managers who understand the outcomes of data handling revolution that happened just recently. When we have these three types of people in place, then we will say that enterprises are fully equipped for making an edge in big data analytics.

Q7. Hadoop is still quite new for many enterprises, and different enterprises are at different stages in their Hadoop journey. When you speak with your customers what are the typical use cases and requirements they have?

Khasanshyn: I agree with you. Some customers just make their first steps with Hadoop while others need to know how to optimize their Hadoop-based systems. Unfortunately, the second group of customers is much smaller. I can name the following typical tasks our customers have:

● To process historical data that has been collected for a long period of time. Some time ago, users were unable to process large volumes of unstructured data due to some financial and technical limitations. Now Hadoop can do it at a moderate cost and for reasonable time.

● To develop a system for data analysis based on Hadoop. Once an hour, the system builds patterns on a typical user behavior on a Web site. These patterns help to react to users’ actions in the real-time mode, for instance, allow doing something or temporary block some actions because they are not typical of this user. The data is collected continuously and is analyzed at the same time. So, the system can rapidly respond to the changes in the user behavior.

● To optimize data storage. It is interesting that in some cases HDFS can replace a database, especially when the database was used for storing raw input data. Such projects do not need an additional database level.

I should say that our customers have similar requirements. Apart from solving a particular business task, they need a certain level of performance and data consistency.

Q8. In your opinion is Hadoop replacing the role of OLAP (online analytical processing) in preparing data to answer specific questions?

Khasanshyn: In a few words, my answer is yes. Some specific features of Hadoop enable to prepare data for future analysis fast and at a moderate cost. In addition, this technology can work with unstructured data. However, I do not think it will happen very soon. There are many OLAP systems and they solve their tasks, doing it better or worse. In general enterprises are very reluctant to change something. In addition, replacing the OLAP tools requires additional investments. The good news is that we don’t have to choose one or another. Hadoop can be used as a pre-processor of the data for OLAP analysis. And analysts can work with the familiar tools.

Q9. How do you categorize the various stages of the Hadoop usage in the enterprises?

Khasanshyn: I would name the following stages of Hadoop usage:

1. Development of prototypes to check out whether Hadoop is suitable for their tasks
2. Using Hadoop in combination with other tools for storing and processing data of some business units
3. Implementation of a centralized enterprise data storage system and gradual integration of all business units into it

Q10. Data Warehouse vs Big “Data Lake”. What are the similarities and what are the differences?

Khasanshyn: Even though Big “Data Lake” is a metaphor, I do not really like it.
This name highlights that it is something limited, isolated from other systems. I would better call this concept a “Big Data Ocean”, because the data added to the system can interact with the surrounding systems. In my opinion, data warehouses are a bit outdated. At the earlier stage, such systems enabled companies to aggregate a lot of data in one central storage and arrange this data. All this was done with the acceptable speed. Now there are a lot of cheap storage solutions, so we can keep enormous volumes of data and process it much faster than with data warehouses.

The possibility to store large volumes of data is a common feature of data warehouses and a Big “Data Lake”. With a flexible architectures and broad capabilities for data analysis and discovery, a Big “Data Lake” provides a wider range of business opportunities. A modern company should adjust to changes very fast. The structure that was good yesterday may become a limitation today.

Q11. In your opinion, is there a technology which is best suited to build a Big Data Analytics Data Platform? If yes, which one?

Khasanshyn: As I have already said, there is no “magic bullet” that can cure every disease. There is no “Universal Big Data Analytics Data Platform” fitting each size; everything depends on the requirements of a particular business. A system of this kind should have the following features:

● A Hadoop-based system for storing and processing data
● An operational database that contains the most recent data, it can be raw data, that should be analyzed.
A NoSQL solution can be used for this case.
● A database for storing predicted indicators. Most probably, it should be a relational data store.
● A system that allows for creating data analysis algorithms. The R language can be used to complete this task.
● A report building system that provides access to data. For instance, there such good options like Tableau or Pentaho.

Q12. What about elastic computing in the Cloud? How does it relate to Big Data Analytics?

Khasanshyn: In my opinion, cloud computing became the force that raised a Big Data wave. Elastic computing enabled us to use the amount of computation resources we need and also reduced the cost of maintaining a large infrastructure.

There is a connection between elastic computing and big data analytics. For us it is quite a typical case that we have to process data from time to time, for instance, once a day. To solve this task, we can deploy a new cluster or just scale up an existing Hadoop cluster in the cloud environment. We can temporary increase the speed of data processing by scaling a Hadoop cluster. The task will be executed faster and after that we can stop the Hadoop cluster or reduce its size. I can even say, that cloud technologies is a must have component for a Big Data analysis system.

——-

Renat Khasanshyn, Founder and Chairman, Altoros.
Renat is founder & CEO of Altoros, and Venture Partner at Runa Capital. Renat helps define Altoros’s strategic vision, and its role in Big Data, Cloud Computing, and PaaS ecosystem. Renat is a frequent conference and meetup speaker on this topic.
Under his supervision Altoros has been servicing such innovative companies as Couchbase, RightScale, Canonical, DataStax, Joyent, Nephoscale, and NuoDB.
In the past, Renat has been selected as finalist for the Emerging Executive of the Year award by the Massachusetts Technology Leadership Council and once won an IBM Business Mashup Challenge. Prior to founding Altoros, Renat was VP of Engineering for Tampa-based insurance company PriMed. Renat is also founder of Apatar, an open source data integration toolset, founder of Silicon Valley NewSQL User Group and co-founder of the Belarusian Java User Group.

——————–
Related Posts

- On NoSQL. Interview with Rick Cattell. August 19, 2013

- On Oracle NoSQL Database –Interview with Dave Segleau. July 2, 2013

- On Big Data and Hadoop. Interview with Paul C. Zikopoulos. June 10, 2013
————————-

Resources

- Evaluating NoSQL Performance, Sergey Bushik, Altoros (Slideshare)

- “A Vendor-independent Comparison of NoSQL Databases: Cassandra, HBase, MongoDB, Riak”. Altoros (registration required)

- ODBMS.org free resources on Big Data and Analytical Data Platforms:
Blog Posts | Free Software| Articles | Lecture Notes | PhD and Master Thesis|

- ODBMS.org free resources on NoSQL Data Stores:
Blog Posts | Free Software | Articles, Papers, Presentations| Documentations, Tutorials, Lecture Notes | PhD and Master Thesis

- ODBMS.org free resources on Cloud Data Stores:
Blog Posts | Lecture Notes| Articles and Presentations| PhD and Master Thesis|

———————–

Follow ODBMS.org on Twitter: @odbmsorg

##

Sep 23 13

Data Analytics at NBCUniversal. Interview with Matthew Eric Bassett.

by Roberto V. Zicari

“The most valuable thing I’ve learned in this role is that judicious use of a little bit of knowledge can go a long way. I’ve seen colleagues and other companies get caught up in the “Big Data” craze by spend hundreds of thousands of pounds sterling on a Hadoop cluster that sees a few megabytes a month. But the most successful initiatives I’ve seen treat it as another tool and keep an eye out for valuable problems that they can solve.” –Matthew Eric Bassett.

I have interviewed Matthew Eric Bassett, Director of Data Science for NBCUniversal International.
NBCUniversal is one of the world’s leading media and entertainment companies in the development, production, and marketing of entertainment, news, and information to a global audience.
RVZ

Q1. What is your current activity at Universal?

Bassett: I’m the Director of Data Science for NBCUniversal International. I lead a small but highly effective predictive analytics team. I’m also a “data evangelist”; I spend quite a bit of my time helping other business units realize they can find business value from sharing and analyzing their data sources.

Q2. Do you use Data Analytics at Universal and for what?

Bassett: We predict key metrics for the different businesses – everything from television ratings, to how an audience will respond to marketing campaigns, to the value of a particular opening weekend for the box office. To do this, we use machine learning regression and classification algorithms, semantic analysis, monte-carlo methods, and simulations.

Q3. Do you have Big Data at Universal? Could you pls give us some examples of Big Data Use Cases at Universal?

Bassett: We’re not working with terabyte-scale data sources. “Big data” for us often means messy or incomplete data.
For instance, our cinema distribution company operates in dozens of countries. For each day in each one, we need to know how much money was spent and by whom -and feed this information into our machine-learning simulations for future predictions.
Each country might have dozens more cinema operators, all sending data in different formats and at different qualities. One territory may neglect demographics, another might mis-report gross revenue. In order for us to use it, we have to find missing or incorrect data and set the appropriate flags in our models and reports for later.

Automating this process is the bulk of our Big Data operation.

Q4. What “value” can be derived by analyzing Big Data at Universal?

Bassett: “Big data” helps everything from marketing, to distribution, to planning.
“In marketing, we know we’re wasting half our money. The problem is that we don’t know which half.” Big data is helping us solve that age-old marketing problem.
We’re able to track how the market is responding to our advertising campaigns over time, and compare it to past campaigns and products, and use that information to more precisely reach our audience (a bit how the Obama campaign was able to use big data to optimize its strategy).

In cinema alone, the opening weekend of a film can affect gross revenue by seven figures (or more), so any insight we can provide into the most optimal time can directly generate thousands or millions of dollars in revenue.

Being able to distill “big data” from historical information, audiences responses in social media, data from commercial operators, et cetera, into a useable and interactive simulation completely changes how we plan our strategy for the next 6-15 months.

Q5. What are the main challenges for big data analytics at Universal ?

Bassett: Internationalization, adoption, and speed.
We’re an international operation, so we need to extend our results from one country to another.
Some territories have a high correlation between our data mining operation and the metrics we want to predict. But when we extend to other territories we have several issues.
For instance, 1) it’s not as easy for us to do data mining on unstructured linguistic data (like audience’s comments on a youtube preview) and 2) User-generated and web analytics data is harder to find (and in some cases nonexistent!) in some of our markets, even if we did have a multi-language data mining capability. Less reliable regions, send us incoming data or historicals that are erroneous, incomplete, or simply not there – see my comment about “messy data”.

Reliability with internationalization feeds into another issue – we’re in an industry that historically uses qualitative and not quantitative processes. It takes quite a bit of “evangelicalism” to convince people what is possible with a bit of statistics and programming, and even after we’ve created a tool for a business, it takes some time for all the key players to trust and use it consistently.

A big part of accomplishing that is ensuring that our simulations and predictions happen fast.
Naturally, our systems need to be able to respond to market changes (a competing film studio changes a release date, an event in the news changes television ratings, et cetera) and inform people what happens.
But we need to give researchers and industry analysts feedback instantly – even while the underlying market is static – to keep them engaged. We’re often asking ourselves questions like “how can we make this report faster” or “how can we speed up this script that pulls audience info from a pdf”.

Q6. How do you handle the Big Data Analytics “process” challenges with deriving insight?
For example when:

  • -capturing data
  • -aligning data from different sources (e.g., resolving when two objects are the same)
  • -transforming the data into a form suitable for analysis
  • -modeling it, whether mathematically, or through some form of simulation
  • -understanding the output
  • -visualizing and sharing the results

Bassett: We start with the insight in mind: What blind-spots do our businesses have, what questions are they trying to answer and how should that answer be presented? Our process begins with the key business leaders and figuring out what problems they have – often when they don’t yet know there’s a problem.

Then we start our feature selection, and identify which sources of data will help achieve our end goal – sometimes a different business unit has it sitting in a silo and we need to convince them to share, sometimes we have to build a system to crawl the web to find and collect it.
Once we have some idea of what we want, we start brainstorming about the right methods and algorithms we should use to reveal useful information: Should we cluster across a multi-variate time series of market response per demographic and use that as an input for a regression model? Can we reliably get a quantitative measure of a demographics engagement from sentiment analysis on comments? This is an iterative process, and we spend quite a bit of time in the “capturing data/transforming the data” step.
But it’s where all the fun is, and it’s not as hard as it sounds: typically, the most basic scientific methods are sufficient to capture 90% of the business value, so long as you can figure out when and where to apply it and where the edge cases lie.

Finally, we have an another excited stage: find surprising insight from the results.
You might start by trying to get a metric for risk in cinema, and you might find a metric for how the risk changes for releases that target a specific audience in the process – and this new method might work for a different business.

Q7. What kind of data management technologies do you use? What is your experience in using them? Do you handle un-structured data? If yes, how?

Bassett: For our structured, relational data, we make heavy use of MySQL. Despite collecting and analyzing a great deal of un-structured data, we haven’t invested much in a NoSQL or related infrastructure. Rather, we store and organize such data as raw files on Amazon’s S3 – it might be dirty, but we can easily mount and inspect file systems, use our Bash kung-fu, and pass S3 buckets to Hadoop/Elastic MapReduce.

Q8. Do you use Hadoop? If yes, what is your experience with Hadoop so far?

Bassett: Yes, we sometimes use Hadoop for that “learning step” I described earlier, as well as batch jobs for data mining on collected information. However, our experience is limited to Amazon’s Elastic MapReduce, which makes the whole process quite simple – we literally write our map and reduce procedures (in whatever language we chose), tell Amazon where to find the code and the data, and grab some coffee while we wait for the results.

Q9. Hadoop is a batch processing system. How do you handle Big Data Analytics in real time (if any)?

Bassett: We don’t do any real-time analytics…yet. Thus far, we’ve created a lot of value from simulations that responds to changing marketing information.

Q10 Cloud computing and open source: Do you they play a role at Universal? If yes, how?

Bassett: Yes, cloud computing and open source play a major role in all our projects: our whole operation makes extensive use of Amazon’s EC2 and Elastic MapReduce for simulation and data mining, and S3 for data storage.

We’re big believers in functional programming – many projects start with “experimental programming” in Racket (a dialect of the Lisp programming
language) and often stay there into production.

Additionally, we take advantage of the thriving Python community for computational statistics: Ipython notebook, NumPy, SciPi, NLTK, et cetera.

Q11 What are the main research challenges ahead? And what are the main business challenges ahead?

Bassett: I alluded to some already previously: collecting and analyzing multi-lingual data, promoting the use of predictive analytics, and making things fast.

Recruiting top talent is frequently a discussion among my colleagues, but we’ve been quite fortunate in this regards. (And we devote a great deal of time in training for machine learning and big data methods.)

Qx Anything else you wish to add?

Bassett: The most valuable thing I’ve learned in this role is that judicious use of a little bit of knowledge can go a long way. I’ve seen colleagues and other companies get caught up in the “Big Data” craze by spend hundreds of thousands of pounds sterling on a Hadoop cluster that sees a few megabytes a month. But the most successful initiatives I’ve seen treat it as another tool and keep an eye out for valuable problems that they can solve.

Thanks!

—–

Matthew Eric Bassett -Director of Data Science, NBCUniversal International
Matthew Eric Bassett is a programmer and mathematician from Colorado and started his career there building web and database applications for public and non-profit clients. He moved to London in 2007 and worked as a consultant for startups and small businesses. In 2011, he joined Universal Pictures to work on a system to quantify risk in the international box office market, which led to his current position leading a predictive analytics “restructuring” of NBCUniversal International.
Matthew holds an MSci in Mathematics and Theoretical Physics from UCL and is currently pursuing a PhD in Noncommutative Geometry from Queen Mary, University of London, where he is discovering interesting, if useless, applications of his field to number theory and machine learning.

Resources

- How Did Big Data Help Obama Campaign? (Video Bloomberg TV)

- Google’s Eric Schmidt Invests in Obama’s Big Data Brains (Bloomberg Businessweek Technology)

- Cloud Data Stores – Lecture Notes: “Data Management in the Cloud”. Michael Grossniklaus, David Maier, Portland State University.
Lecture Notes | Intermediate/Advanced | English | DOWNLOAD ~280 slides (PDF)| 2011-12|

Related Posts

- Big Data from Space: the “Herschel” telescope. August 2, 2013

- Cloud based hotel management– Interview with Keith Gruen July 25, 2013

- On Big Data and Hadoop. Interview with Paul C. Zikopoulos. June 10, 2013

Follow ODBMS.org on Twitter: @odbmsorg

##

Sep 1 13

On Linked Data. Interview with John Goodwin.

by Roberto V. Zicari

“Semantic technologies may be unfamiliar, but when you have used them for a while you will realise they are no harder than many other technologies…in fact I would argue they are easier.”– John Goodwin.

On the topics of Semantic web technologies, ontology engineering, and linked data, I have interviewed John Goodwin. John is Principal Scientist in the Research Department of Ordnance Survey, which is Great Britain’s National Mapping Authority.

RVZ

Q1. You are a senior data scientist at the Ordnance Survey, Great Britain’s national mapping agency. What is your role there?

John Goodwin: I am a Principal Scientist in the Research Department of Ordnance Survey, which is Great Britain’s National Mapping Authority [note: we are authorities now...not agencies].

I have worked in research for Ordnance Survey for over 10 years now, and my research was mainly focused in semantic web technologies, ontology engineering and linked data. The Principal Scientist role is a fairly new one for me, and as part of this role I am now responsibly for a stream of research work around data management, data delivery and web services. This involves looking at new and novel technologies that ensure we have the correct infrastructure and data models to meet the challenges of the future. Furthermore, it is investigating new ways we can serve our our data to the end customer.

Q2. Do you have a Big Data problem at the at the Ordnance Survey? Could you please give us some examples of Big Data Use Cases?

John Goodwin:Hmmm, that is debatable. Ordnance Survey certainly has big ‘data problems’ but I don’t know if they qualify as ‘big data’ problems. I have heard Big Data defined as any data that won’t fit into Excel (which is a definition I personally hate), and if that is the case then we certainly have ‘Big Data’. Ordnance Survey currently stores information about half a billion topographic features, and 27.5 million geocoded address (with around 500,000 changes a year). So we may not have the sheer volume of data that some folk have, but I believe the combination of volume and complexity means that performing analysis over this data or running queries would certainly be a ‘Big Data’ problem.
For example, if you wanted to calculate the number of postboxes in Scotland, find the length of all roads in Great Britain you could be waiting some time using traditional database solutions.

Q3. The vision of the Semantic Web is the one where web pages contain self describing data that machines will be able to navigate them as easily as humans do now. What are the main benefits? Who could profit most from the Semantics Web?

John Goodwin: I think an immediate benefit is the ability to provide more structured data to search engines so that they can provide better search services. Structured web content means more meaningful search results and offers new ways to summarise and present summaries of pages in a search engine.

Q4. Who is currently using Semantic Web technologies and how? Could you please give us some examples of current commercial projects?

John Goodwin: One interesting example is a company called Garlik (now part of Experian). Garlik provides services to protect people from identify theft and financial fraud. They use semantic web technologies to integrate a number of different datasets, and provide a flexible way to integrate new datasets so they can perform queries across these datasets to find potential victims more easily. The BBC are a great users of linked data technology and used triplestore technologies as part of their content management systems for their World Cup and Olympics websites. Again the flexibility of the technology, and ability to link data across the whole of the BBC proved invaluable.

We are using linked data technologies at Ordnance Survey in research projects to look at way of integrating our data with third party data.

The major search engines are back an initiative called schema.org which will provide a unified schema for structure data in web content, and this has the potential (as mentioned above) to provide a richer search experience.

Q5. Do you use Linked Data? What are the main benefits of Linked Data in your opinion?

John Goodwin: I am a big supporter of linked data, and this has been the focus of my work for the last few years. I have used it in research projects and also produced the Ordnance Survey linked data.

Linked data is great for data integration – a common data language makes it easy (or rather easier) to bring a number of disparate datasets together. It is also more flexible than traditional relational database technologies. Like other NoSQL technologies linked data can be seen as ‘schemaless’ to some extent. This means if you want to change the datamode by, say, adding new attributes or properties it is very easy to do so. Furthermore, and this is a more personal thing, I find graphs to be the most natural way to think about data. It feel far more intuitive and I have to say I think querying graph data using SPARQL is far easier than querying relation data using SQL (especially if you have a lot of joins).

Q6. What data management technologies are best suited to model and query Linked Open Data?

John Goodwin: Linked Open Data is built around W3C standards such as RDF (resource description frame) – which is the data language of choice on the linked data web (although some people like to debate whether or not RDF is needed for linked data). RDF is to the web of data as HTML is to the web of documents…or at least that is how I see it. RDF has its own query language called SPARQL. A large number of programming libraries (e.g. Jena) are emerging to handle RDF. Furthermore, RDF can be stored in databases called triplestores and there are many triplestores to choose from. I am not in a position to advocate one triplestore over another but there are a large number of great technologies being developed by SMEs and more traditional database vendors alike. Furthermore, there are a number of open source options. We have experimented with a number of them at Ordnance Survey.

Q7. How do you integrate data from different sources that are not in Linked Open Data format (e.g. relational, raw data, etc.)?

John Goodwin: So far by converting the underlying data to RDF. Most relational data is a simple script away from being RDF. Tools do exist to help ‘triplify’ the data, but if I am honest I find that most of the time it is easier to write a quick Python script to do the job.
OpenRefine is a useful tool that lets you clean up csv data and has a plugin that allow export of data to RDF. OpenRefine additionally has the benefit of being able to work with reconciliation APIs. If a linked data site offers a reconciliation API you can use it with OpenRefine to, for example, convert a column of cities names or postcodes to URIs in the Ordnance Survey linked data. This is useful when you need to create explicit links to other datasets. For example, if you had a spreadsheet with place names like ‘City of Southampton’ you could use OpenRefine and the Ordnance Survey linked data reconciliation API to turn ‘City of Southampton’ into its URI.

Q8. What are the most promising application domains where you can apply RDF triple store technology such as AllegroGraph and Virtuoso?

John Goodwin: I think any domain where you either want to integrate lots of disparate datasets or you want a data model that is flexible, and where schema evolution might be a problem. I think geospatial is a promising domain as ‘everything happens somewhere’ and location provide a useful integration hub for many datasets. Semantic web technologies have also been used widely in the bioinformatics domain. The BBC are another great usecase – they use the technology to integrate data across their whole enterprise. This brings together data from news, radio, sport, television and music and allows new and exciting ways to explore the data.
I dare say it is also a technology that will prove useful/interesting to certain three letter American government agencies that have made the news recently :)

Q9. Do you use Data Analytics at the Ordnance Survey and for what?

John Goodwin: I would say currently we don’t really – thought it depends what you mean by analytics. We are largely concerned with collecting and maintaining data, and then shipping this out as products and services. We have experimented with an IBM® Netezza® appliance to perform queries over our data that would have taken too much time in traditional databases to answer questions such as ‘how many post boxes are there in Great Britain?’.

Q10. Can you do data analytics using Linked Open Data? If yes how?

John Goodwin: I think again it depends what is meant by analytics. Linked data offers a great way to bring lots of datasets together and then, maybe, materialise a view of those integrated datasets that could then be used to perform some analytics. Many people are doing ‘graph analytics’, and given that linked data is a graph I think there is some interesting work to be done in looking at the intersections of graph/network theory and linked data.

Q11. What are the main current obstacles for the adoption of Semantic Web technologies in the Enterprise?

John Goodwin: I think two main obstacles. The first one is a perception that RDF and linked data are hard, and somehow we need to overcome with perception. Lots of things in the ICT domain are hard…RDBMS is hard, C++ is hard etc. Semantic technologies may be unfamiliar, but when you have used them for a while you will realise they are no harder than many other technologies…in fact I would argue they are easier. I know a lot of developers who have moved onto using SPARQL and after a few months using it find it much easier to understand that SQL. Furthermore, I think it is harder to hire people with expertise in these technologies – there are still more people skilled up in traditional RDBMS and other newer NoSQL technologies like MongoDB.

I think the second obstacle is that semantic web technologies are, obviously, not going to be as mature as a good old relational database. There are some great triplestores out there, and there are enterprises who have successfully incorporated them (the BBC are a great example) but being a relatively new technology I suspect many enterprises are nervous to invest.

——————-
John Goodwin went to university at Royal Holloway and Bedford New College (University of London – based in Egham, Surrey) and graduated in 1992 with a 1st class honours degree in mathematics. Following that he moved to Cambridge and studied Part III of the Mathematics Tripos at the Department of Applied Maths and Theoretical Physics (University of Cambridge) where he obtained a Certificate of Advanced Study in Mathematics. John then moved to the University of Southampton to start his PhD.
He graduated in 1997 with a PhD in “The Cauchy Problem in Spacetimes with Closed Timelike Curves” (which can very roughly be paraphrased as ‘do timemachines blow up when you turn them on?’). In 1998 John left academia to start work at Ordnance Survey (located at postcode SO16 0AS) as a systems developer. He left Ordnance Survey in 2000 to start work at a small software company called Neusciences where he gained experience in various A.I. techniques. After just ten months at Neusciences John returned to Ordnance Survey to work in the research department where his research concentrated on the semantic web, ontologies and linked data. On the back of this research John produced the current Ordnance Survey linked data. He is currently still at Ordnance Survey and working as a Principal Scientist, where he leads research (at a technical and strategic level) into data managment, data delivery and services.
John currently chair the UK location council linked data working group and participate in the UK Government Linked Data working group.

Related Posts

- On Hybrid Relational Databases. Interview with Kingsley Uyi Idehen. May 13, 2013

- Graphs vs. SQL. Interview with Michael Blaha. April 11, 2013

- On Big Graph Data. August 6, 2012

Resources

ODBMS.org: free resources on Graphs and Data Stores
Blog Posts | Free Software | Articles, Papers, Presentations| Tutorials, Lecture Notes

——————————–
Follow ODBMS.org on Twitter: @odbmsorg

Aug 19 13

On NoSQL. Interview with Rick Cattell.

by Roberto V. Zicari

” There aren’t enough open source contributors to keep projects competitive in features and performance, and the companies supporting the open source offerings will have trouble making enough money to keep the products competitive themselves. Likewise, companies with closed source will have trouble finding customers willing to risk a closed source (or limited open source) solution. It will be interesting to see what happens. But I don’t see NoSQL going away, there is a well-established following.” –Rick Cattell.

I have asked Rick Cattell, one of the leading independent consultants in database systems, a few questions on NoSQL.

RVZ

Q1. For years, you have been studying the NoSQL area and writing articles about scalable databases. What is new in the last year, in your view? What is changing?

Rick Cattell: It seems like there’s a new NoSQL player every month or two, now!
It’s hard to keep track of them all. However, a few players have become much more popular than the others.

Q2. Which players are those?

Rick Cattell: Among the open source players, I hear the most about MongoDB, Cassandra, and Riak now, and often HBase and Redis. However, don’t forget that the proprietary players like Amazon, Oracle, and Google have NoSQL systems as well.

Q3. How do you define “NoSQL”?

Rick Cattell: I use the term to mean systems that provide a simple operations like key/value storage or simple records and indexes, and that focus on horizontal scalability for those simple operations. Some people categorize horizontally scaled graph databases and object databases to be “NoSQL” as well. However, those systems have very different characteristics. Graph databases and object databases have to efficiently break connections up over distributed servers, and have to provide operations that somehow span servers as you traverse the graph. Distributed graph/object databases have been around for a while, but efficient distribution is a hard problem. The NoSQL databases simply distribute (or shard) each data type based on a primary key; that’s easier to do efficiently.

Q4. What other categories of systems do you see?

Rick Cattell: Well, there are systems that focus on horizontal scaling for full SQL with joins, which are generally called “NewSQL“, and systems optimized for “Big Data” analytics, typically based on Hadoop map/reduce. And of course, you can also sub-categorize the NoSQL systems based on their data model and distribution model.

Q5. What subcategories would those be?

Rick Cattell: On data model, I separate them into document databases like MongoDB and CouchBase, simple key/value stores like Riak and Redis, and grouped-column stores like HBase and Cassandra. However, a categorization by data model is deceptive, because they also differ quite a bit in their performance and concurrency guarantees.

Q6: Which systems perform best?

Rick Cattell: That’s hard to answer. Performance is not a scale from “good” to “bad”… the different systems have better performance for different kinds of applications. MongoDB performs incredibly well if all your data fits in distributed memory, for example, and Cassandra does a pretty good job of using disk, because of its scheme of writing new data to the end of disk files and consolidating later.

Q7: What about their concurrency guarantees?

Rick Cattell: They are all over the place on concurrency. The simplest provide no guarantees, only “eventual consistency“. You don’t know which version of data you’ll get with Cassandra. MongoDB can keep a “primary” replica consistent if you can live with their rudimentary locking mechanism.
Some of the new systems try to provide full ACID transactions. FoundationDB and Oracle NoSQL claim to do that, but I haven’t yet verified that. I have studied Google’s Spanner paper, and they do provide true ACID consistency in a distributed world, for most practical purposes. Many people think the CAP theorem makes that impossible, but I believe their interpretation of the theorem is too narrow: most real applications can have their cake and eat it too, given the right distribution model. By the way, graph/object databases also provide ACID consistency, as does VoltDB, but as I mentioned I consider them a different category.

Q8: I notice you have an unpublished paper on your website, called 2x2x2 Requirements for Scalability. Can you explain what the 2x2x2 means?

Rick Cattell: Well, the first 2x means that there are two different kinds of scalability: horizontal scaling over multiple servers, and vertical scaling for performance on a single server. The remaining 2×2 means that there are two key features needed to achieve the horizontal and vertical scaling, and for each of those, there are two additional things you have to do to make the features practical.

Q9: What are those key features?

Rick Cattell: For horizontal scaling, you need to partition and replicate your data. But you also need automatic failure recovery and database evolution with no downtime, because when your database runs on 200 nodes, you can’t afford to take the database offline and you can’t afford operator intervention on every failure.
To achieve vertical scaling, you need to take advantage of RAM and you need to avoid random disk I/O. You also need to minimize the overhead for locking and latching, and you need to minimize network calls between servers. There are various ways to do that. The best systems have all eight of these key features. These eight features represent my summary of scalable databases in a nutshell.

Q10: What do you see happening with NoSQL, going forward?

Rick Cattell: Good question. I see a lot of consolidation happening… there are too many players! There aren’t enough open source contributors to keep projects competitive in features and performance, and the companies supporting the open source offerings will have trouble making enough money to keep the products competitive themselves. Likewise, companies with closed source will have trouble finding customers willing to risk a closed source (or limited open source) solution.
It will be interesting to see what happens. But I don’t see NoSQL going away, there is a well-established following.

————–
R. G. G. “Rick” Cattell is an independent consultant in database systems.
He previously worked as a Distinguished Engineer at Sun Microsystems, most recently on open source database systems and distributed database scaling. Dr. Cattell served for 20+ years at Sun Microsystems in management and senior technical roles, and for 10 years in research at Xerox PARC and at Carnegie-Mellon University. Dr. Cattell is best known for his contributions in database systems and middleware, including database scalability, enterprise Java, object/relational mapping, object-oriented databases, and database interfaces. He is the author of several dozen papers and five books, and a co-inventor of six U.S. patents.
At Sun he instigated the Enterprise Java, Java DB, and Java Blend projects, and was a contributor to a number of Java APIs and products. He previously developed the Cedar DBMS at Xerox PARC, the Sun Simplify database GUI, and SunSoft’s CORBA-database integration.
He is a co-founder of SQL Access (a predecessor to ODBC), the founder and chair of the Object Data Management Group (ODMG), the co-creator of JDBC, the author of the world’s first monograph on object/relational and object databases, a recipient of the ACM Outstanding PhD Dissertation Award, and an ACM Fellow.

Related Posts

-On Oracle NoSQL Database –Interview with Dave Segleau. July 2, 2013

-On Real Time NoSQL. Interview with Brian Bulkowski. May 21, 2013

Resources

- Rick Cattell home page.

- ODBMS.org Free Downloads and Links
In this section you can download free resources covering the following topics:
Big Data and Analytical Data Platforms
Cloud Data Stores
Object Databases
NoSQL Data Stores
Graphs and Data Stores
Object-Oriented Programming
Entity Framework (EF) Resources
ORM Technology
Object-Relational Impedance Mismatch
NewSQL, XML, RDF Data Stores, RDBMS

Follow ODBMS.org on Twitter: @odbmsorg

##

Aug 2 13

Big Data from Space: the “Herschel” telescope.

by Roberto V. Zicari

” One of the biggest challenges with any project of such a long duration is coping with change. There are many aspects to coping with change, including changes in requirements, changes in technology, vendor stability, changes in staffing and so on”–Jon Brumfitt.

On May 14, 2009, the European Space Agency launched an Arianne 5 rocket carrying the largest telescope ever flown: the “Herschel” telescope, 3.5 meters in diameter.

I first did an interview with Dr. Jon Brumfitt, System Architect & System Engineer of Herschel Scientific Ground Segment, at the European Space Agency in March 2011. You can read that interview here.

Two years later, I wanted to know the status of the project. This is a follow up interview.

RVZ

Q1. What is the status of the mission?

Jon Brumfitt: The operational phase of the Herschel mission came to an end on 29th April 2013, when the super-fluid helium used to cool the instruments was finally exhausted. By operating in the far infra-red, Herschel has been able to see cold objects that are invisible to normal telescopes.
However, this requires that the detectors are cooled to an even lower temperature. The helium cools the instruments down to 1.7K (about -271 Celsius). Individual detectors are then cooled down further to about 0.3K. This is very close to absolute zero, which is the coldest possible temperature. The exhaustion of the helium marks the end of new observations, but it is by no means the end of the mission.
We still have a lot of work to do in getting the best results from the data processing to give astronomers a final legacy archive of high-quality data to work with for years to come.

The spacecraft has been in orbit around a point known as the second Lagrangian point “L2″, which is about 1.5 million kilometres from Earth (around four times as far away as the Moon). This location provided a good thermal environment and a relatively unrestricted view of the sky. The spacecraft cannot be left in this orbit because regular correction manoeuvres would be needed. Consequently, it is being transferred into a “parking” orbit around the Sun.

Q2. What are the main results obtained so far by using the “Herschel” telescope?

Jon Brumfitt: That is a difficult one to answer in a few sentences. Just to take a few examples, Herschel has given us new insights into the way that stars form and the history of star formation and galaxy evolution since the big-bang.
It has discovered large quantities of cold water vapour in the dusty disk surrounding a young star, which suggests the possibility of other water covered planets. It has also given us new evidence for the origins of water on Earth.
The following are some links giving more detailed highlights from the mission:

- Press
- Results
- Press Releases
- Latest news

With its 3.5 metre diameter mirror, Herschel is the largest space telescope ever launched. The large mirror not only gives it a high sensitivity but also allows us to observe the sky with a high spatial resolution. So in a sense every observation we make is showing us something we have never seen before. We have performed around 35,000 science observations, which have already resulted in over 600 papers being published in scientific journals. There are many years of work ahead for astronomers in interpreting the results, which will undoubtedly lead to many new discoveries.

Q3. How much data did you receive and process so far? Could you give us some up to date information?

Jon Brumfitt: We have about 3 TB of data in the Versant database, most of which is raw data from the spacecraft. The data received each day is processed by our data processing pipeline and the resulting data products, such as images and spectra, are placed in an archive for access by astronomers.
Each time we make a major new release of the software (roughly every six months at this stage), with improvements to the data processing, we reprocess everything.
The data processing runs on a grid with around 35 nodes, each with typically 8 cores and between 16 and 256 GB of memory. This is able to process around 40 days worth of data per day, so it is possible to reprocess everything in a few weeks. The data in the archive is stored as FITS files (a standard format for astronomical data).
The archive uses a relational (PostgreSQL) database to catalogue the data and allow queries to find relevant data. This relational database is only about 60 GB, whereas the product files account for about 60 TB.
This may reduce somewhat for the final archive, once we have cleaned it up by removing the results of earlier processing runs.

Q4. What are the main technical challenges in the data management part of this mission and how did you solve them?

Jon Brumfitt: One of the biggest challenges with any project of such a long duration is coping with change. There are many aspects to coping with change, including changes in requirements, changes in technology, vendor stability, changes in staffing and so on.

The lifetime of Herschel will have been 18 years from the start of software development to the end of the post-operations phase.
We designed a single system to meet the needs of all mission phases, from early instrument development, through routine in-flight operations to the end of the post-operations phase. Although the spacecraft was not launched until 2009, the database was in regular use from 2002 for developing and testing the instruments in the laboratory. By using the same software to control the instruments in the laboratory as we used to control them in flight, we ended up with a very robust and well-tested system. We call this approach “smooth transition”.

The development approach we adopted is probably best classified as an Agile iterative and incremental one. Object orientation helps a lot because changes in the problem domain, resulting from changing requirements, tend to result in localised changes in the data model.
Other important factors in managing change are separation of concerns and minimization of dependencies, for example using component-based architectures.

When we decided to use an object database, it was a new technology and it would have been unwise to rely on any database vendor or product surviving for such a long time. Although work was under way on the ODMG and JDO standards, these were quite immature and the only suitable object databases used proprietary interfaces.
We therefore chose to implement our own abstraction layer around the database. This was similar in concept to JDO, with a factory providing a pluggable implementation of a persistence manager. This abstraction provided a route to change to a different object database, or even a relational database with an object-relational mapping layer, should it have proved necessary.

One aspect that is difficult to abstract is the use of queries, because query languages differ. In principle, an object database could be used without any queries, by navigating to everything from a global root object. However, in practice navigation and queries both have their role. For example, to find all the observation requests that have not yet been scheduled, it is much faster to perform a query than to iterate by navigation to find them. However, once an observation request is in memory it is much easier and faster to navigate to all the associated objects needed to process it. We have used a variety of techniques for encapsulating queries. One is to implement them as methods of an extent class that acts as a query factory.

Another challenge was designing a robust data model that would serve all phases of the mission from instrument development in the laboratory, through pre-flight tests and routine operations to the end of post-operations. We approached this by starting with a model of the problem domain and then analysing use-cases to see what data needed to be persistent and where we needed associations. It was important to avoid the temptation to store too much just because transitive persistence made it so easy.

One criticism that is sometimes raised against object databases is that the associations tend to encode business logic in the object schema, whereas relational databases just store data in a neutral form that can outlive the software that created it; if you subsequently decide that you need a new use-case, such as report generation, the associations may not be there to support it. This is true to some extent, but consideration of use cases for the entire project lifetime helped a lot. It is of course possible to use queries to work-around missing associations.

Examples are sometimes given of how easy an object database is to use by directly persisting your business objects. This may be fine for a simple application with an embedded database, but for a complex system you still need to cleanly decouple your business logic from the data storage. This is true whether you are using a relational or an object database. With an object database, the persistent classes should only be responsible for persistence and referential integrity and so typically just have getter and setter methods.
We have encapsulated our persistent classes in a package called the Core Class Model (CCM) that has a factory to create instances. This complements the pluggable persistence manager. Hence, the application sees the persistence manager and CCM factories and interfaces, but the implementations are hidden.
Applications define their own business classes which can work like decorators for the persistent classes.

Q5. What is your experience in having two separate database systems for Herschel? A relational database for storing and managing processed data products and an object database for storing and managing proposal data, mission planning data, telecommands and raw (unprocessed) telemetry?

Jon Brumfitt: There are essentially two parts to the ground segment for a space observatory.
One is the “uplink” which is used for controlling the spacecraft and instruments. This includes submission of observing proposals, observation planning, scheduling, flight dynamics and commanding.
The other is the “downlink”, which involves ingesting and processing the data received from the spacecraft.

On some missions the data processing is carried out by a data centre, which is separate from spacecraft operations. In that case there is a very clear separation.
On Herschel, the original concept was to build a completely integrated system around an object database that would hold all uplink and downlink data, including processed data products. However, after further analysis it became clear that it was better to integrate our product archive with those from other missions. This also means that the Herschel data will remain available long after the project has finished. The role of the object database is essentially for operating the spacecraft and storing the raw data.

The Herschel archive is part of a common infrastructure shared by many of our ESA science projects. This provides a uniform way of accessing data from multiple missions.
The following is a nice example of how data from Herschel and our XMM-Newton X-ray telescope have been combined to make a multi-spectral image of the Andromeda Galaxy.

Our archive, in turn, forms part of a larger international archive known as the “Virtual Observatory” (VO), which includes both space and ground-based observatories from all over the world.

I think that using separate databases for operations and product archiving has worked well. In fact, it is more the norm rather than the exception. The two databases serve very different roles.
The uplink database manages the day-to-day operations of the spacecraft and is constantly being updated. The uplink data forms a complex object graph which is accessed by navigation, so an object database is well suited.
The product archive is essentially a write-once-read-many repository. The data is not modified, but new versions of products may be added as a result of reprocessing. There are a large number of clients accessing it via the Internet. The archive database is a catalogue containing the product meta-data, which can be queried to find the relevant product files. This is better suited to a relational database.

The motivation for the original idea of using a single object database for everything was that it allowed direct association between uplink and downlink data. For example, processed products could be associated with their observation requests. However, using separate databases does not prevent one database being queried with an observation identifier obtained from the other.
One complication is that processing an observation requires both downlink data and the associated uplink data.
We solved this by creating “uplink products” from the relevant uplink data and placing them in the archive. This has the advantage that external users, who do not have access to the Versant database, have everything they need to process the data themselves.

Q6. What are the main lessons learned so far in using Versant object database for managing telemetry data and information on steering and calibrating scientific on-board instruments?

Jon Brumfitt: Object databases can be very effective for certain kinds of application, but may have less benefit for others. A complex system typically has a mixture of application types, so the advantages are not always clear cut. Object databases can give a high performance for applications that need to navigate through a complex object graph, particularly if used with fairly long transactions where a significant part of the object graph remains in memory. Web (JavaEE) applications lose some of the benefit because they typically perform many short transactions with each one performing a query. They also use additional access layers that result in a system which loses the simplicity of the transparent persistence of an object database.

In our case, the object database was best suited for the uplink. It simplified the uplink development by avoiding object-relational mapping and the complexity of a design based on JDBC or EJB 2. Nowadays with JPA, relational databases are much easier to use for object persistence, so the rationale for using an object database is largely determined by whether the application can benefit from fast navigational access and how much effort is saved in mapping. There are now at least two object database vendors that support both JDO and JPA, so the distinction is becoming somewhat blurred.

For telemetry access we query the database instead of using navigation, as the packets don’t fit neatly into a single containment hierarchy. Queries allows packets to be accessed by many different criteria, such as time, instrument, type, source and so on.
Processing calibration observations does not introduce any special considerations as far as the database is concerned.

Q7. Did you have any scalability and or availability issues during the project? If yes, how did you solve them?

Jon Brumfitt: Scalability would have been an important issue if we had kept to the original concept of storing everything including products in a single database. However, using the object database for just uplink and telemetry meant that this was not a big issue.

The data processing grid retrieves the raw telemetry data from the object database server, which is a 16-core Linux machine with 64 GB of memory. The average load on the server is quite low, but occasionally there have been high peak loads from the grid that have saturated the server disk I/O and slowed down other users of the database. Interactive applications such as mission planning need a rapid response, whereas batch data processing is less critical. We solved this by implementing a mechanism to spread out the grid load by treating the database as a resource.

Once a year, we have made an “Announcement of Opportunity” for astronomers to propose observations that they would like to perform with Herschel. It is only human nature that many people leave it until the last minute and we get a very high peak load on the server in the last hour or two before the deadline! We have used a separate server for this purpose, rather than ingesting proposals directly into our operational database. This has avoided any risk of interfering with routine operations. After the deadline, we have copied the objects into the operational database.

Q8. What about the overall performance of the two databases? What are the lessons learned?

Jon Brumfitt: The databases are good at different things.
As mentioned before, an object database can give a high performance for applications involving a complex object graph which you navigate around. An example is our mission planning system. Object persistence makes application design very simple, although in a real system you still need to introduce layers to decouple the business logic from the persistence.

For the archive, on the other hand, a relational database is more appropriate. We are querying the archive to find data that matches a set of criteria. The data is stored in files rather than as objects in the database.

Q9. What are the next steps planned for the project and the main technical challenges ahead?

Jon Brumfitt: As I mentioned earlier, the coming post-operations phase will concentrate on further improving the data processing software to generate a top-quality legacy archive, and on provision of high-quality support documentation and continued interactive support for the community of astronomers that forms our “customer base”. The system was designed from the outset to support all phases of the mission, from early instrument development tests in the laboratory, though routine operations to the end of the post-operations phase of the mission. The main difference moving into post-operations is that we will stop uplink activities and ingesting new telemetry. We will continue to reprocess all the data regularly as improvements are made to the data processing software.

We are currently in the process of upgrading from Versant 7 to Versant 8.
We have been using Versant 7 since launch and the system has been running well, so there has been little urgency to upgrade.
However, with routine operations coming to an end, we are doing some “technology refresh”, including upgrading to Java 7 and Versant 8.

Q10. Anything else you wish to add?

Jon Brumfitt: These are just some personal thoughts on the way the database market has evolved over the lifetime of Herschel. Thirteen years ago, when we started development of our system, there were expectations that object databases would really take off in line with the growing use of object orientation, but this did not happen. Object databases still represent rather a niche market. It is a pity there is no open-source object-database equivalent of MySQL. This would have encouraged more people to try object databases.

JDO has developed into a mature standard over the years. One of its key features is that it is “architecture neutral”, but in fact there are very few implementations for relational databases. However, it seems to be finding a new role for some NoSQL databases, such as the Google AppEngine datastore.
NoSQL appears to be taking off far quicker than object databases did, although it is an umbrella term that covers quite a few kinds of datastore. Horizontal scaling is likely to be an important feature for many systems in the future. The relational model is still dominant, but there is a growing appreciation of alternatives. There is even talk of “Polyglot Persistence” using different kinds of databases within a system; in a sense we are doing this with our object database and relational archive.

More recently, JPA has created considerable interest in object persistence for relational databases and appears to be rapidly overtaking JDO.
This is partly because it is being adopted by developers of enterprise applications who previously used EJB 2.
If you look at the APIs of JDO and JPA they are actually quite similar apart from the locking modes. However, there is an enormous difference in the way they are typically used in practice. This is more to do with the fact that JPA is often used for enterprise applications. The distinction is getting blurred by some object database vendors who now support JPA with an object database. This could expand the market for object databases by attracting some traditional relational type applications.

So, I wonder what the next 13 years will bring! I am certainly watching developments with interest.
——

Dr Jon Brumfitt, System Architect & System Engineer of Herschel Scientific Ground Segment, European Space Agency.

Jon Brumfitt has a background in Electronics with Physics and Mathematics and has worked on several of ESA’s astrophysics missions, including IUE, Hipparcos, ISO, XMM and currently Herschel. After completing his PhD and a post-doctoral fellowship in image processing, Jon worked on data reduction for the IUE satellite before joining Logica Space and Defence in 1980. In 1984 he moved to Logica’s research centre in Cambridge and then in 1993 to ESTEC in the Netherlands to work on the scientific ground segments for ISO and XMM. In January 2000, he joined the newly formed Herschel team as science ground segment System Architect. As Herschel approached launch, he moved down to the European Space Astronomy Centre in Madrid to become part of the Herschel Science Operations Team, where he is currently System Engineer and System Architect.

Related Posts

- The Gaia mission, one year later. Interview with William O’Mullane. January 16, 2013

- Objects in Space: “Herschel” the largest telescope ever flown. March 18, 2011

Resources

- Introduction to ODBMS By Rick Grehan

- ODBMS.org Resources on Object Database Vendors.

—————————————
You can follow ODBMS.org on Twitter : @odbmsorg

##

Jul 25 13

Cloud based hotel management– Interview with Keith Gruen

by Roberto V. Zicari

” Our customers have understood that their data is much safer in our secure and professionally-managed cloud environment than it was on the server in the basement of the hotel.”– Keith Gruen.

I heard of hetras, a German-Austrian company that created a cloud-based hotel management product for global hotel chains. I wanted to know more. I therefore interviewed Keith Gruen, co-founder and managing director of hetras GmbH.

RVZ

Q1. What is the business of Hetras?

Keith Gruen: hetras is the first company to build a fully cloud-based application for hotels and global chains of all sizes. Particularly suited for the new generation of hotels with a high degree of automation, the hetras hotel management software combines property management system (PMS) with powerful distribution and channel management into a unified application.
The product is offered on a SaaS basis, meaning that hotels pay an all-inclusive flat fee per month per room. Built from the ground up for the internet generation, hetras offers a refreshing new user experience and users only need a tablet or browser with a standard internet connection.

Q2. What are the new generation of hotels? How do they differ from conventional ones?

Keith Gruen: New generation hotels and chains offer a high quality hotel experience in a prime location but at a budget price. They can achieve low prices through elimination of services such as full-service restaurant, room service, SPA, conference and banqueting, valet parking, porters and concierge. These hotels also save money by reducing or eliminating reservation and front desk staff. As a result, the hotels encourage or even require self-service by the guests, including self-reservation via the hotel website, self-check-in and self-checkout via kiosks or mobile apps and self-management of preferences and frequent guest programs. New generation hotels typically place a large focus on high design, top quality bedding, high-tech rooms and excellent showers. Rooms are generally small but stylish and efficient.

Q3. What are the specific data management requirements for these new generation of hotels?

Keith Gruen: New generation hotels have a high degree of automation requirements and integration with third party systems. Integration with online reservation portals such as booking.com and Expedia as well as GDS (global distribution systems) have to be seamless and always up-to-date. Credit card authorization and other payment systems have to work without any human intervention.
The check-in and check-out kiosks or apps have to be intuitive and fast. Revenue management, i.e. establishing rates, restrictions, policies has to be automatic and reliable. As guests are doing more and more of the work, the user experience has to be magnitudes better than any hotel system in the past.

Q4. You developed a Cloud-based Hotel Management System as an Internet-based software-as-a-service. Which hotels currently use your system and for what?

Keith Gruen: Our customers include several new generation hotel groups, such as citizenM (based in Amsterdam), OKKO (Paris), BLOC (UK) and ADDUCCO (Romania). In addition, a number of independent hotels around Europe also use hetras.

Q5. How do you handle data processing, sorting, storage and retrieval?

Keith Gruen: Our application is built on a Microsoft stack. We use MS-SQL Server as a primary database plus ExtremeDB from McObject for some speed-critical functions.

Q6. Could you be more specific and explain what are these “speed-critical functions” and why do you use an in-memory database for that?

Keith Gruen: Our most speed-critical functions are rate and availability look-ups.

Q7. Rate/availability lookups are complex queries, and demand high performance. Do you handle such queries in MS-SQL or with ExtremeDB?

Keith Gruen: ExtremeDB.

Q8. Most rate/reservation queries are not done by humans but by machines. How do you handle requests from global distribution systems (GDSs)- such as Sabre, Amadeus, Worldport and others-used across the travel industry to check rates and book reservations? What kind of data requirements do they have?

Keith Gruen: The GDS tend to query the hotel reservation systems via robots on a frequent basis. This keeps their cache up-to-date and they can then offer nearly real-time rates and availability to the travel agents and end-users who query their system. Checking if a hotel is available or not is not a single query. The GDS has to check every individual date and for every reasonable length of stay. Because of the peculiarities of the hotel business, a hotel might be sold out for a single-night’s stay on Monday night and Tuesday night, but a guest who wants to stay both nights may still be offered a room. Furthermore, the GDS have to query every different room type. Standard rooms could be sold out while a few deluxe rooms remain. The GDS, most of which are based on 1960s technology, are not capable to poll for changes. They simply query everything on a regular basis. All in all, GDSs are known to make up to 70,000 queries for every single confirmed reservation.

Q9. How did you design your system to achieve scalability (scale out and scale up)?

Keith Gruen: We use virtualized environments. So far, we have scaled up by adding RAM and CPU to our machines.
We have two fully redundant sets of hardware with a load balancer. We could add a third set if necessary.

Q10: Which virtualized environments do you use? Don`t you have performance issue if you use virtualization?

Keith Gruen: We use VMWare. Virtualization has not caused us any performance issues.

Q11. Why do you use an in-memory database system for your system and for what?

Keith Gruen: We use the In-Memory Database (IMDB) especially for the queries as described above. We call this our “quote” module. We do not write to the IMDB. When a reservation is finally completed, we write directly to the main MS-SQL database, which in turn updates the IMDB. The data in the IMDB is configured to answer the most common queries very quickly.

Q12. What are the potential bottlenecks for your system?

Keith Gruen: Hotel staff can theoretically launch batch processes or large-scale searches across vast amounts of data that would slow down the system for all other hotels as well. This is a drawback of having a single-instance multi-tenant architecture.

Q13. Are there any concerns by customers to have their personal data stored in the Cloud?

Keith Gruen: No. Our customers have understood that their data is much safer in our secure and professionally-managed cloud environment than it was on the server in the basement of the hotel.

———————–
Keith Gruen is a co-founder and managing director of hetras GmbH, a German-Austrian company that created the first cloud-based hotel management product for global hotel chains. Mr. Gruen is also the founder of Fidelio Software, which built the market-leading Hotel PMS in the 1980’s and 90’s. He later sold the company to Micros. Mr. Gruen co-founded NXN, a developer of computer game technology and Kappa IT, a venture capital firm that invested in technology startups. Prior to hetras, he led corporate development for Conject AG, a developer of software for the real estate and construction industry. Mr. Gruen graduated from Brown University in Mathematics and Computer Science.

Related Posts

-In-memory OLTP database. Interview with Asa Holmstrom. December 27, 2012

-In-memory database systems. Interview with Steve Graves, McObject. March 16, 2012

Resources

-ODBMS.org: Resources on Big Data Analytics, NewSQL, NoSQL, Object Database Vendors

Follow ODBMS.org on Twitter: @odbmsorg
##

Jul 11 13

Isis2: A new Open Platform for Data Replication in the Cloud.–Interview with Ken Birman.

by Roberto V. Zicari

“My target is to be the MapReduce solution for the world’s realtime problems.”– Ken Birman.

I had the pleasure to interview Ken Birman, Professor of Computer Science at Cornell University. Ken is a pioneer and one of the leading researchers in the field of distributed systems. Ken has been working recently on a new system called Isis2 (isis2.codeplex.com). I asked him a few questions on Isis2.

RVZ

Q1. Recently, you have been working on the Isis2 project. What is it?

Ken Birman: Isis2 started as a bit of a hobby for me, and the story actually dates back almost 25 years.
Early in my career, I created a system called the Isis Toolkit, using a model we called virtual synchrony to support strongly consistent replication for services running on clusters of various kinds. We had quite a success with that version of Isis, and it was the basis of the applications I mentioned above – I started a company and we did very well with it. In fact, for more than a decade there was never a disruptive failure at the New York Stock Exchange: components crashed now and then, obviously, but Isis would help guide the solution to a clean recovery and the traders were never aware of any issues. The same is true for the French ATC system or the US Navy AEGIS.

Now this model, virtual synchrony, has deep connections both to the ACID models used in database settings, and to the state machine replication model supported by protocols like Lamport’s Paxos. Indeed, recently we were able to show a bisimulation of the virtually synchronous protocol called gbcast (dating to around 1985) and a version of Paxos for dynamically reconfigurable systems. In some sense, gbcast was the first Paxos – Leslie Lamport says we should have named the protocol “virtually synchronous Paxos”.
(Of course if we had, I suspect that he would have named his own protocol something else!). I could certainly do the same with respect to database serializability – basically, virtual synchrony is like the ACID model, but aimed at “groups of processes” that might use purely in-memory replication. In effect my protocols were optimized ones aimed at supporting ACI but without the D.

Anyhow, over the years, the Isis Toolkit matured and people moved on. I moved on too, and worked on topics involving gossip communication, and security. But eventually this came to frustrate me: I felt that my own best work was being lost to disuse. And so starting four years ago, I decided to create a new and more modern Isis for the cloud. I’m calling it Isis2, and it uses the model Leslie and I talked about (virtually synchronous state machine replication). I’ve implemented the whole thing from scratch, using modern object-oriented languages and tools.

So Isis2 is a new open platform for data replication in the cloud, aiming at cases where a developer might be building some sort of service that would end up running on a cloud platform like Eucalyptus, Amazon EC2, etc. My original idea was to create the ultimate library for distributed computing, but also to make it as easy to use as the GUI builders that our students use in their first course on object oriented programming: powerful technologies but entirely accessed via very simple ideas, such as attaching an event handler to a suitable object. A first version of the Isis2 system became available about two years ago, and I’ve been working on it as much as possible since then, with new releases every couple of months.

But I’ve also slightly shifted my focus, in ways that are bringing Isis2 closer and closer to the big-data and ODBMS community.
I realized that unlike the situation 20 years ago, today the people who most need tools for replicating data with strong consistency are mostly working to create in-memory big-data platforms for the cloud, and then running large machine-learning algorithms on them.
For example, one important target for Isis2 is to support the smart grid here in the United States. So one imagines an application capturing all sorts of data in real-time, from what might be a vast number of sensing points, pulling this into the cloud, and then running a machine learning algorithm on the resulting in-memory data set. You replicate such a service for high availability and to gain faster read-only performance, and then run a distributed algorithm that learns the parameters to some model: perhaps a convex optimization, perhaps a support vector, etc.

I’ve run into other applications with a similar structure. For example, self-driving cars and other AUVs need to query a rapidly evolving situational database, asking questions such as “what road hazards should I be watching for the next 250 meters?” The community doing large-scale simulations for problems in particle physics needs to solve the “nearby particle” problem: “now that I’m at location such-and-such, what particles are close enough to interact with me?” One can make quite a long list. All of these demand rapid updates, in place, to a database that is living in-memory and being queried frequently.

With this in mind, I’ve been extending the Isis2 system to enlarge its support for key-value data, spread within a service and replicated, but with each item residing at just small subsets of the total set of members (“sharded”). My angle has been to offer strong consistency (not full transactions, because those involve locking and become expensive, but a still-powerful “one-shot atomic actions” model). Because Isis2 can offer these guarantees for a workload that includes a mix of updates and queries, the community working on graphical learning and other key-value learning problems where data might evolve even as the system is tracking various properties has shown strong interest in the platform.

Q2. You claim that Isis2 is a new option for cloud computing that can enable reliable, secure replication of data even in the highly elastic first-tier of the cloud. How does it different from existing commercial products such as NoSQL, Amazon Web Services etc.?

Ken Birman: Well, there are really a few differences. One is that Isis2 is a library. So if you compare with other technologies, the closest matches are probably Graphlab from CMU and Spark, Berkeley’s in-memory storage system for accelerating Hadoop computations. A second big difference is that Isis2 has a very strong consistency model, putting me at the other end of the spectrum relative to NoSQL. As for Web Services, well, the thinking is that these services you use Isis2 to create would often be web services, accessed over the web via some representative, which then turns around and interacts with other members using the Isis2 primatives.

Q3. How do you handle Big Data in Isis2

Ken Birman: I’m adding more and more features, but there are two that should be especially interesting to your readers. One is the Isis2 DHT: a key-value store spread over a group of nodes, such that any given key maps to some small subset of the members (a shard). So keys might, for example, be node-ids for a graph, and the values could represent data associated with that node: perhaps a weight, in and out edges to other nodes, etc. The model is incredibly flexible: a key can be any object you like, and the values can also be arbitrary objects. You even can control the mapping of keys to group members, by implementing a specialized version of GetHashCode().

What I do is to allow atomic actions on sets of key-value pairs. So you can insert a collection of key-value pairs, or query a set of them, or even do an ordered multicast to just the nodes that host some set of keys. These actions occur on a consistent cut, giving a form of action by action atomicity. And of course, the solution is also fault-tolerant and even offers a strong form of security via AES-256 encryption, if desired.

What this enables is a powerful new form of distributed aggregation, in which one can guarantee that a query result reflects exactly-once contributions from each matching key-value pair. You can also insert new key-value tuples as part of one of these actions (although those occur as new atomic actions, ordered after the one that initiated them), or even reshuffle the mapping of keys to nodes, by “reconfiguring” the group to use a new GetHashCode() method.

You can also use LINQ in these groups, for example to efficiently query the “slice” of a key-value set that mapped to a particular set of nodes.

As I mentioned, I also have a second big-data feature that should become available later this summer: a tool for moving huge objects around within a cluster of cloud nodes. So suppose that an application is working with gigabyte memory-mapped files.
Sending these around inside messages could be very costly, and will cause the system to stutter because anything ordered after that send or multicast might have to wait for the gigabyte to transfer. With this new feature, I do out-of-band movement of the big objects in a very efficient way, using IP multicast if available (and a form of fat-tree TCP mesh if not), and then ship just a reference to the big object in my messages. This way the in-band communication won’t stutter, and the gigabyte object gets replicated using very efficient out-of-band protocols. By the end of the summer, I’m hoping to have the world’s fastest tool for replicating big memory-mapped objects in the cloud. (Of course, there can be quite a gap between “hoping” and “will have”, but I’m optimistic).

Q4. Isis2 uses a distributed sharding scheme. What is it?

Ken Birman: Again, there are a few answers – if Isis2 has a flaw, I suspect that it comes down to trying to offer every possible cool mechanism to every imaginable developer. Just like Microsoft .NET this can mean that you end up with too many stories. (You won’t be surprised to learn that Isis2 is actually built in .NET, using C#, and hence usable from any .NET language. We use Mono to compile it for use on Linux. And it does have a bit of that .NET flavor).

Anyhow, one scheme is the sharding approach mentioned above. The user takes some group – perhaps it has 10,000 members in it and runs on 10,000 virtual machines on Amazon EC2. And they ask Isis2 to shard the group into shards of some size, maybe 5 or 10 – you pick a factor aimed at soaking up the query load but not imposing too high an update cost.

So in that case I might end up with 1000 or 2000 subgroups: those would be my shards. The mapping is very explicit: given a key-value pair with key K, I compute K.GetHashCode()%NS where NS is the number of shards obtained from the target group size (call it N) and the target shard size (call this S): hashcode%(N/S). And that tells me which shard will hold your key-value pair.
Given the group membership, which is consistent in Isis2, my protocols just count off from left to right to find the members that host this shard. The atomic update or query reaches out to those members: I involve all of them if the action is an update, and I load-balance over the shard for queries.

The other mechanism is coarser-grained: one Isis2 application can actually support multiple process groups. And each of those groups can be a DHT, sharded separately. This is convenient if an application has one long-lived in-memory group for the database, but wants to create temporary data structures too. What you can do is to use a separate temporary group for those temporary key-value pairs or structures. Then when your computation is finished, you have an easy way to clean up: you just tell Isis2 to discard the temporary group and all its contents will evaporate, like magic.

Q5. Isis2 sounds like MapReduce. Is this right? What are the differences and what are the similarities?

Ken Birman: Roberto, you are exactly right about this. While you can treat the Isis2 DHT as a key-value store, a natural style of computing would be to use the kinds of code one sees with iterative MapReduce applications. Isis2 is in-memory, of course (although nothing stops you from pairing it with a persistent database or log – in fact I provide tools for doing that). But you could do this, and if you did, Isis2 would be acting a lot like Spark.

I think the main point is that whereas a system like Spark just speeds MapReduce up by caching partial results, and really works only for immutable or append-only data sets, Isis2 can support dynamic updates while the queries are running. My target is to be the MapReduce solution for the world’s realtime problems. Moreover, I’m doing this for people who need strong consistency.
Get the answer wrong in the power grid and you cause a power outage or damage a transformer (does anyone remember how Christ Church took itself off the New Zealand power grid for a month?) Get the answer wrong in a self-driving car, and it might have an accident. So for situations in which data is rapidly evolving, Isis2 offers a way to do highly scalable queries even as you support updates in a highly scalable way too.

I actually see the similarity to MapReduce as a good thing. To me this model has been incredibly popular, because people find it easy to use. I can leverage that popularity.

The bigger contrast is with true transactional databases. Isis2 does support locking, but not full-scale transactions. It would be dishonest to say that I don’t think transactions can run at massive scale – in fact I’m working with a post-doc, Ittay Eyal, on an extension of Isis2 that we call ACID-RAIN that has a full transactional model. And I know of others who are doing competing systems – Marcos Aguilera at Microsoft, for example, who is hard at work on his follow-on to the famous Sinfornia system.

But I don’t think we need the full costs of ACID transactions for in-memory key-value stores, and this has pushed me towards a lock-free computing model for this part of Isis2, and towards the kind of weaker atomicity I offer: guarantees for individual actions on sets of key-value pairs, but with each step in your computation treated as a separate one-shot transaction.

Q6. You claim to be able to run consistent queries even on rapidly changing data, and yet scales as well as any sharding scheme. Please explain how?

Ken Birman: The question centers on semantics: what do I mean by a consistent query? For me, a query that ran on a consistent cut (like with snapshot isolation) and gives a result that reflects exactly one contribution from each matching key-value pair, is a “consistent” query. This is definitely what you want for some purposes. But if you want to define consistency to mean for full transactions, I’m not taking that last step. I offer the building blocks – locking, atomic multicast, etc. You could easily run a transaction and do a 2-phase commit at the end. But I just doubt that this would perform well and so I haven’t picked it as my main consistency model.

Q7. Why using LINQ as a query language and not SQL?

Ken Birman: I’m using LINQ, but maybe not in the way you are thinking. There are some projects that do use LINQ as their main user API. In the Isis2 space, Naiad and the Dryad system that preceded it would be famous examples. But I find it hard to work with systems that “map” my LINQ query to a distributed key-value structure. I’ve always favored simpler building blocks.

So the way I’m using LINQ is more limited. For me, a user might issue some sort of query, and at the first step this looks like a multicast: you specify the target group or the target shards (depending on whether you want every member or just the ones associated with certain keys), and the information you offered as arguments to the query or multicast are automatically translated to an efficient out-form and transmitted to the relevant members. On the members an upcall event occurs to a handler the developer coded, perhaps in C# or perhaps in some other .NET language like C++/CLI, Visual Basic, Java, etc. I think .NET has something like 40 languages you can use – I myself work mostly in C#.

So this C# handler is invoked with the specified arguments, in parallel, on the set of group members that matched the target you specified. Each one of them has a slice of the full DHT: the subset of key-value pairs that mapped to that particular member, in the way we talked about earlier – a mapping you as the user can control, and even modify dynamically (if you do, Isis2 reshuffles the key-value pairs appropriately).

What I do with LINQ is to let your code access this slice of the DHT as a collection on which you could run a LINQ query. And in fact, as you may know, LINQ has an SQL front end, so you could even use SQL on these collections. So the C# handler has this potentially big set of key-value tuples, the full set for that slice (remember that I guarantee consistency!), and with LINQ each of those members now can compute its share of the answer.

What happens next is up to you. You could use this answer to insert new key-value tuples: that would be like the shuffle and reduce step in MapReduce. You could send the answers to the query initiator, as a reply – Isis2 supports 1-K queries that get lists of K results back, and the initiator could then aggregate the results (perhaps using LINQ at that step too, or SQL). And finally I have a tree-structured aggregation option, where I build a binary tree spanning the participants and combine the subresults using user-specified code, so that just one aggregated result comes out, after log(N) delay. That last option would be sensible if you might end up with a very large number of answers, or really big ones.

Q8. How fault-tolerant is Isis2?

Ken Birman: Virtual synchrony is very powerful as a tool for building pretty much any fault-tolerance approach one likes (people who prefer Paxos can reread that sentence as “the dynamic state machine model…” and people who think in terms of ACID transactions can see this as “ACID-based SQL systems…”). The model I favor is one in which updates are always applied in an all-or-nothing way, but queries might be partially completed and then abandoned (“aborted”) if a failure occurs.

So with Isis2, the basic guarantee is that an atomic multicast or query reaches all the operational group members that it is supposed to reach, and in a total order with respect to conflicting updates. Queries either reflect exactly once contributions from each matching key-value pair, or they abort (if a failure disrupts them), and you need to reissue the request in the new group membership – the new process group “view”.

Q9. What about Hadoop? Do you plan to have an interface to Hadoop?

Ken Birman: I’ve suggested this topic to a few of my PhD students but up to now, none has “bit”. I think my group tends to attract students who have more of a focus on lower levels of the infrastructure. But with our work on the smart power grid, this could change; I’m starting to interact much more with students who come from the machine learning community and who might find that step very appealing. It wouldn’t really be all that hard to do.

Q10. Isis2 is open source. How can developers contribute?

Ken Birman: This is a great question. The basic system is open source under a free 3-clause BSD license. You can access it here: isis2.codeplex.com. – I have a big user manual there, and one of those compiled HTML documentation pages for each of my APIs, and I’m going to be doing some form of instructional MOOC soon, so there should end up being ten or so short videos showing how to program with the system.

Initially, I think that people who would want to play with Isis2 might be best off limiting themselves to working with the system but not changing my code directly. My worry is that the code really is quite subtle and while I would love to have help, my experience here at Cornell has been that even well-intentioned students have made a lot of mistakes by not really understanding my code and then trying to change it. I’m sorry to say that as of now, not a line of third party code has survived – not because I don’t want help (I would love help), but because so far, all the third-party code has died horribly when I really tested it carefully!

But over time, I’m hoping that Isis2 could become more and more of a community tool. In fact complex or not, I do think others could definitely master it and help me debug and extend it. They just need to move no faster than the speed of light, which is kind of slow where large, complex tools are concerned. Building things that work “over” Isis2 but don’t change it is an especially good path: I’ll be happy to fix bugs other people identify, and then your add-ons can become third party extensions without being somehow key parts of the system. Then as things mature (keep in mind that Isis2 itself is nearly four years old right now), things could gradually migrate into the core release.

I think this is how other big projects tend to evolve towards an open development model: nobody trusts a contributor who shows up one day and announces that he wants to rewrite half the system. But if that person hangs around for a while and proves his or her talents over time, they end up in the inner circle. So I’m not trying to be overprotective, but I do want the system to be incredibly robust.

This is how we’re building the ACID-RAIN system I mentioned. Ittay Eyal owns the architecture of that system, but he has no interest at all in replicating things already available in Isis2, which after all is quite a big and powerful system. So he’s using it but rather than building ACID-RAIN by modifying Isis2, his system will be more of an application that uses Isis2. To the extent that he needs things Isis2 is lacking, I can build them for him. But later, if ACID-RAIN becomes the world’s ultimate SQL solution for the cloud, maybe Isis2 and ACID-RAIN would merge in some way. Over time I have no doubt at all that a talented developer like Ittay could become as expert as he needs to be even with my most obscure stuff.

And the fact is that I do need help. Tuning Isis2 to work really well in these massive settings is a hard challenge for me; more hands would definitely help. Right now I’m in the middle of porting it to work well with Infiniband connections, for example. You might thing such a step would be trivial, and in a way it is: I just need to adapt the way that I talk to my network sockets. But in fact this has all sorts of implications for flow control, and timing in many protocols. A small step becomes a big task. (I’m kind of shuddering at the thought of needing to move to IPv6!) I can support the system now, with 3000 downloads to date but relatively few really active users. If someday I have lots of users and a StackOverflow.com tag of my very own, I’ll need a hand!

Qx Anything else you wish to add?

Ken Birman: Roberto, I just want to thank you for the great job you do with the ODBMS.org blog. I think it has become a great resource, and I hope this little interview gets a few of your readers interested in fooling around with Isis2! Tell them to feel free to contact me at Cornell, or ken@cs.cornell.edu/

—————–
Ken Birman is the N. Rama Rao Professor of Computer Science at Cornell. An ACM Fellow and the winner of the IEEE Tsutomu Kanai award, Ken has written 3 textbooks and published more than 150 papers in prestigious journals and conferences. Software he developed operated the New York Stock Exchange for more than a decade without trading disruptions, and played central roles in the French Air Traffic Control System (now expanding into much of Europe) and the US Navy AEGIS warship. Other technologies from his group found their way into IBM’s Websphere product, Amazon’s EC2 and S3 systems, Microsoft’s cluster management solutions. His latest system, Isis2 (isis2.codeplex.com) helps developers create secure, strongly consistent and scalable cloud computing solutions.

Resources

- Ken Birmann`s Publication Listings.

- ODBMS.org Cloud Data Stores – Lecture Notes: “Data Management in the Cloud” by Michael Grossniklaus, David Maier, Portland State University.
Lecture Notes | Intermediate/Advanced | English | DOWNLOAD ~280 slides (PDF)| 2011-12|

Related Posts

- On Big Data: Interview with Dr. Werner Vogels, CTO and VP of Amazon.com. November 2, 2011

Follow ODBMS.org on Twitter: @odbmsorg
##

Jul 2 13

On Oracle NoSQL Database –Interview with Dave Segleau.

by Roberto V. Zicari

“We went down the path of building Oracle NoSQL database because of explicit request from some of our largest Oracle Berkeley DB installations that wanted to move away from maintaining home grown sharding implementations and very much wanted an out of box technology that can replicate the robustness of what they had built “out of box” ” –Dave Segleau.

On October 3, 2011 Oracle announced the Oracle NoSQL Database, and on December 17, 2012, Oracle shipped Oracle NoSQL Database R2. I wanted to know more about the status of the Oracle NoSQL Database. I have interviewed Dave Segleau, Director of Product Management,Oracle NoSQL Database.

RVZ

Q1. Who is currently using Oracle NoSQL Database, and for what kind of domain specific applications? Please give us some examples.

Dave Segleau: There are a range of users from segments such as Web-scale Transaction Processing, to Web-scale Personalization and Real-time Event Processing. To pick the area where I would say we see the largest adoption, it would be the Real-time Event Processing category. This is basically the use case that covers things like Fraud Detection, Telecom Services Billing, Online Gaming and Mobile Device Management.

Q2. What is new in Oracle NoSQL Database R2?

Dave Segleau: We added significant enhancements to NoSQL Database in the areas of Configuration Management/Monitoring (CM/M), APIs and Application Developer Usability, as well as Integration with the Oracle technology stack.
In the area of CM/M, we added “Smart Topology” ( an automated capacity and reliability-aware data storage allocation with intelligent request routing), configuration elasticity and rebalancing, JMX/SNMP support. In the area of APIs and Application Developer Usability we added a C API, support for values as JSON objects (with AVRO serialization), JSON schema definitions, and a Large Object API (including a highly efficient streaming interface). In the area of Integration we added support for accessing NoSQL Database data via Oracle External Tables (using SQL in the Oracle Database), RDF Graph support in NoSQL Database, Oracle Coherence as well as integration with Oracle Event Processing.

Q3. How would you compare Oracle NoSQL with respect to other NoSQL data stores, such as CouchDB, MongoDB, Cassandra and Riak?

Dave Segleau: The Oracle NoSQL Database is a key-value store, although it also supports JSON as a value type similar to a document store. Architecturally it is closer to Riak, Cassandra and the Amazon Dynamo-based implementations, rather than the other technologies, at least at the highest level of abstraction. With regards to features, Oracle NoSQL Database shares a lot of commonality with Riak. Our performance and scalability characteristics are showing up with the best results in YCSB benchmarks.

Q4. What is the implication of having Oracle Berkeley DB Java Edition as the core engine for the Oracle NoSQL database?

Dave Segleau: It means that Oracle NoSQL Database provides a mission-critical proven database technology at the heart of the implementation. Many of the other NoSQL databases use relatively new implementations for data storage and replication. Databases in general, and especially distributed parallel databases, are hard to implement well and achieve high product quality and reliability. So we see the use of Oracle Berkeley DB, a pervasively deployed database engine for 1000′s of mission-critical applications, as a big differentiation. Plus, many of the early NoSQL technologies are based on Oracle Berkeley DB, for example LinkedIn’s Voldemort, Amazon’s Dynamo and other popular commercial and enterprise social media products like Yammer.
The bottom line is that we went down the path of building Oracle NoSQL database because of explicit request from some of our largest Oracle Berkeley DB installations that wanted to move away from maintaining home grown sharding implementations and very much wanted an out of box technology that can replicate the robustness of what they had built “out of box”.

Q5. What is the relationships between the underlying “cleaning” mechanism to free up unused space in Oracle Berkeley DB, and the predictability and throughput in Oracle NoSQL Database?

Dave Segleau: As mentioned in the previous section, Oracle NoSQL Database uses Oracle Berkeley DB Java Edition as the key-value storage mechanism within the Storage Nodes. Oracle Berkeley DB Java Edition uses a no-overwrite log file system to store the data and a configurable multi-threaded background log cleaner task to compact and clean log files and free up unused disk space. The Oracle Berkeley DB log cleaner has underdone many years of in-house and real world high volume validation and tuning. Oracle NoSQL Database pre-defines the BDB cleaner parameters for optimal configuration for this particular use case. The cleaner enhances system throughput and predictability by a) running as a low level background task, b) being preconfigured to minimize impact on the running system. The combination of these two characteristics leads to more predictable system throughput.

Several other NoSQL database products have implemented heavy weight tasks to compact, compress and free up disk space. Running them definitely impacts system throughput and predictability. From our point of view, not only do you want a NoSQL database that has excellent performance, but you also need predictable performance. Routine tasks like Java GCs and disk space management should not cause major impacts to operational throughput in a production system.

Q7. Oracle NoSQL data model is using the concepts of “major” and “minor” key path. Why?

Dave Segleau: We heard from customers that they wanted both even distribution of data as well as co-location of certain sets of records. The Major/Minor key paradigm provides the best of both worlds. The Major key is the basis for the hash function which causes Major key values to be evenly distributed throughput the key-value data store. The Minor key allows us to cluster multiple records for a given Major key together in the same storage location. In addition to being very flexible, it also provided additional benefits:
a) A scalable two-tier indexing structure. A hash map of Major Keys to partitions that contain the data, and then a B-tree within each partition to quickly locate Minor key values.
b) Minor keys allow us to perform efficient lookups and range scans within a Major key. For example, for userID 1234 (Major key), fetch all of the products that they browsed from January 1st to January 15th (Minor key).
c) Because all of the Minor key records for a given Major key are co-located on the same storage location, this becomes our basic unit of ACID transactions, allowing applications to have a transaction that spans a single record, multiple records or even multiple write operations on multiple records for a given major key.

This degree of flexibility is often lacking in other NoSQL database offerings.

Q8. Oracle NoSQL database is a distributed, replicated key-value store using a shared-nothing master-slave architecture. Why did you choose to have a master node architecture? How does it compare with other systems which have no master?

Dave Segleau: First of all, lets clarify that each shard has a master (multi master) and it is an elected master based system. The Oracle NoSQL Database topology is deployed with user-specified replication factor (how many copies of the data should the system maintain) and then using a PAXOS based mechanism, a master is elected. It is quite possible that a new master is elected under certain operating conditions. Plus, if you throw more hardware resources at the system, those “masters” will shift the data for which they are responsible, again to achieve the optimal latency profile. We are leveraging the enterprise grade replication technology that is widely deployed via the Oracle Berkeley DB Java Edition. Also, by using an elected master implementation, we can provide a fully ACID transaction on an operation by operation basis

Q9. It is known that when the master node for a particular key-value fails (or because of a network failure), some writes may get lost. What is the implication from an application point of view?

Dave Segleau: This is a complex question in that it depends largely on the type of durability requested for the operation and that is controlled by the developer. In general though, committed transactions acknowledged by a simple majority of nodes (our default durability) are not lost when a master fails. In the case of less aggressive durability policies, in-flight transactions that have been subject to network, disk, server failures, are handled similar to process failure in other database implementations, the transactions are rolled back. However, a new master will quickly be elected and future requests will go thru without a hitch. The applications can guard against such situations by handling exceptions and performing a retry.

Q10. Justin Sheehy of Basho in an interview said (1): “I would most certainly include updates to my bank account as applications for which eventual consistency is a good design choice. In fact, bankers have understood and used eventual consistency for far longer than there have been computers in the modern sense” Would you recommend to your clients to use Oracle NoSQL database for banking applications?

Dave Segleau: Absolutely. The Oracle NoSQL Database offers a range of transaction durability and consistency options on a per operation basis. The choice of eventual consistency is best made on a case by case basis, because while using it can open up new levels of scalability and performance, it does come with some risk and/or alternate processes which have a cost. Some NoSQL vendors don’t provide the options to leverage ACID transactions where they make sense, but the Oracle NoSQL Database does.

Q11. Could you detail how Elasticity is provided in R2?

Dave Segleau: The Oracle NoSQL database slices data up into partitions within highly available replication groups. Each replication group contains an elected master and a number of replicas based on user configuration. The exact configuration will vary depending on the read latency /write throughput requirements of the application. The processes associated with those replication groups run on hardware (Storage Nodes) declared to the Oracle NoSQL Database. For elasticity purposes, additional Storage Nodes can be declaratively added to a running system, in which case some of the data partitions will be re-allocated onto the new hardware, thereby increasing the number of shards and the write throughput. Additionally, the number of replicas can be increased to improved read latency and increase reliability. The process of rebalancing data partitions, spawning new replicas, and forming new Replication Groups will cause those internal data partitions to automatically move around the Storage Nodes to take advantage of the new storage capacity.

Q12. What is the implication from a developer perspective of having a Avro Schema Support?

Dave Segleau: For the developer, it means better support for seamless JSON storage. There are other downstream implications, like compatibility and integration with Hadoop processing where AVRO is quickly becoming a standard not only for efficient wireline serialization protocols, but for HDFS storage. Also, AVRO is a very efficient serialization format for JSON, unlike other serialization options like BSON which tend to be much less efficient. In the future, Oracle NoSQL Database will leverage this flexible AVRO schema definition in order to provide features like table abstractions, projections and secondary index support.

Q13. How do you handle Large Object Support?

Dave Segleau: Oracle NoSQL Database provides a streaming interface for Large Objects. Internally, we break a Large Object up into chunks and use parallel operations to read and write those chunks to/from the database. We do it in an ordered fashion so that you can begin consuming the data stream before all of the contents are returned to the application. This is useful when implementing functionality like scrolling partial results, streaming video, etc. Large Object operations are restartable and recoverable. Let’s say that you start to write a 1 GB Large Object and sometime during the write operation a failure occurs and the write is only partially completed. The application will get an exception. When the application re-issues the Large Object operation, NoSQL resumes where it left off, skipping chunks that were already successfully written.
The Large Object chunking implementation also ensures that partially written Large Objects are not readable until they are completely written.

Q14. A NoSQL Database can act as an Oracle Database External Table. What does it mean in practice?

Dave Segleau: What we have achieved here is the ability to treat the Oracle NoSQL Database as a resource that can participate in SQL queries originating from an Oracle Database via standard SQL query facilities. Of course, the developer has to define a template that maps the “value” into a table representation. In release 2 we provide sample templates and configuration files that the application developer can use in order to define the required components. In the future, Oracle NoSQL Database will automate template definitions for JSON values. External Table capabilities give seamless access to both structured relational and unstructured NoSQL data using familiar SQL tools.

Q15. Why Atomic Batching is important?

Dave Segleau: If by Atomic Batching you mean the ability to perform more than one data manipulation in a single transaction, then atomic batching is the only real way to ensure logical consistency in multi-data update transactions. The Oracle NoSQL Database provides this capability for data beneath a given major key.

Q16 What are the suggested criteria for users when they need to choose between durability for lower latency, higher throughput and write availability?

Dave Segleau: That’s a tough one to answer, since it is so case by case dependent. As discussed above in the banking question, in general if you can achieve your application latency goals while specifying high durability, then that’s your best course of action. However, if you have more aggressive low-latency/high-throughput requirements, you may have to assess the impact of relaxing your durability constraints and of the rare case where a write operation may fail. It’s useful to keep in mind that a write failure is a rare event because of the inherent reliability built into the technology.

Q17. Tomas Ulin mentioned in an interview (2) that “with MySQL 5.6, developers can now commingle the “best of both worlds” with fast key-value look up operations and complex SQL queries to meet user and application specific requirements”. Isn’t MySQL 5.6 in fact competing with Oracle NoSQL database?

Dave Segleau: MySQL is an SQL database with a KV API on top. We are a KV database. If you have an SQL application with occasional need for fast KV access, MySQL is your best option. If you need pure KV access with unlimited scalability, then NoSQL DB is your best option.

———
David Segleau is the Director Product Management for the Oracle NoSQL Database, Oracle Berkeley DB and Oracle Database Mobile Server. He joined Oracle as the VP of Engineering for Sleepycat Software (makers of Berkeley DB). He has more than 30 years of industry experience, leading and managing technical product teams and working extensively with database technology as both a customer and a vendor.

Related Posts

(1) On Eventual Consistency– An interview with Justin Sheehy. August 15, 2012.

(2) MySQL-State of the Union. Interview with Tomas Ulin. February 11, 2013.

Resources

- Charles Lamb’s Blog

- ODBMS.org: Resources on NoSQL Data Stores:
Blog Posts | Free Software | Articles, Papers, Presentations| Documentations, Tutorials, Lecture Notes | PhD and Master Thesis.

Follow ODBMS.org on Twitter: @odbmsorg
##