Oracle Big Data Spatial and Graph
Q&A with James Steiner, VP of product management, Oracle
Editor Roberto V. Zicari
Q1. You just announced Oracle Big Data Spatial and Graph. What is the difference (if any) from existing graph databases, such as Neo4j?
We have a set of spatial functions, services and visualization components to support Hadoop workloads and a number of traditional geospatial workloads well suited to the Hadoop environment. While Big Data Spatial and Graph can be used to address a range of traditional GIS-related workloads, it also supports any HDFS data and does not enforce a “spatial-centric” approach to working with data.
For graph, we have a distributed property graph database that supports HBase and Oracle NoSQL Database with a rich set of built-in, in-memory graph analytic functions.
There are a number of unique characteristics in what we are delivering today, but it is also important to highlight that there are many things that are common with other graph database technologies. Like other property graphs, Oracle Big Data Spatial and Graph supports the Blueprints APIs, commonly used graph formats, Java APIs and popular scripting languages, bulk loading and text index integration, and an application developer console. It also includes Solr/Lucene text processing.
One of the challenges with graph database technologies is that they provide only a few, basic graph analysis operators and rely on the developer or data scientist to roll their own functions.
Oracle Big Data Spatial and Graph includes a multiuser, in-memory graph analytics engine with more than 35 high performance, parallel, analytic functions. It incorporates the leading Social Network Analysis (SNA) algorithms.
These algorithms include most of the commonly used functions for link prediction, detecting components and communities, ranking and walking, and path finding. These built-in graph functions eliminate the need for data analysts to define, write, compile and test their own algorithms using an ad-hoc analysis tool.
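To give a sense of what "rolling your own" means for one of the function families mentioned above (detecting components), here is a minimal plain-Python sketch. It is an illustration only: the function name `connected_components` and the edge-list input are assumptions made for this example and are not part of the Oracle Big Data Spatial and Graph API.

```python
from collections import deque


def connected_components(edges):
    """Return the connected components of an undirected graph.

    `edges` is an iterable of (u, v) pairs; each component is a set of nodes.
    Hypothetical helper for illustration, not a product API.
    """
    # Build an undirected adjacency map from the edge list.
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components
```

Even this simple routine has to be written, tested and made to scale by hand, which is the effort a library of built-in, parallel functions is meant to remove.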
Q2. How does Oracle Big Data Spatial and Graph relate to Apache Hadoop and NoSQL database technologies?
The Spatial features support any HDFS data. When applying the spatial functions to HDFS data, the programmer maps their data using a Hadoop input format. The spatial filtering, location proximity and containment analysis, etc. run as MapReduce jobs against any user data. Users can perform large-scale operations for data cleansing, preparation, and processing of imagery, sensor data, and raw input data with the raster services. Users can work with raster data stored in HDFS in virtually any file format, perform analysis such as mosaic and subset, write and carry out other analysis operations, visualize data, and manage workflows. Hadoop environments are ideally suited to storing and processing these high data volumes quickly and in parallel across MapReduce nodes.
What may be most interesting to the Hadoop user is an enrichment and categorization service, implemented as a MapReduce application. Users can take data with any location information, enrich it, and use it to harmonize their data.
For example, Oracle Big Data Spatial and Graph can look at datasets like Twitter feeds that include a zip code or street address, and the enrichment service will use a popular geographic hierarchy/place-name dataset included with Big Data Spatial and Graph to add or update city, state, and country information. It can also filter or group results based on spatial relationships: for example, filtering customer data from logfiles based on how near one customer is to another, or finding how many customers are in each sales territory. These results can be visualized on a map with the included HTML5-based web mapping tool. Location can be used as a universal key across disparate data commonly found in Hadoop-based analytic solutions.
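The proximity filtering described above can be sketched in a few lines of plain Python. This is a simplified, single-machine illustration using the haversine great-circle distance; the function names, the record layout and the sample coordinates are assumptions for this example, and the product itself runs such operations as MapReduce jobs through its own Java APIs.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def customers_within(customers, center, radius_km):
    """Return ids of customers whose location lies within radius_km of center.

    `customers` is a list of dicts with hypothetical keys "id", "lat", "lon".
    """
    lat0, lon0 = center
    return [c["id"] for c in customers
            if haversine_km(lat0, lon0, c["lat"], c["lon"]) <= radius_km]


# Hypothetical sample records, e.g. parsed from log files.
customers = [
    {"id": "sf", "lat": 37.7749, "lon": -122.4194},
    {"id": "oak", "lat": 37.8044, "lon": -122.2712},
    {"id": "nyc", "lat": 40.7128, "lon": -74.0060},
]
nearby = customers_within(customers, (37.7749, -122.4194), 20.0)
```

In a Hadoop setting the same distance test would typically run inside a mapper, so each record is filtered independently and in parallel.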
For the property graph in Big Data Spatial and Graph, the graph database is stored in HBase or Oracle NoSQL Database.
Hadoop provides flexible, scalable and persistent storage management that is not possible in single machine environments.
The graph database architecture is designed for highly scalable, distributed storage and supports graphs of nearly unlimited size.
Q3. Why support a distributed property graph?
Oracle Big Data Spatial and Graph architecture is designed to overcome the scalability and performance issues that our enterprise customers are encountering when trying to deploy existing graph databases. Graph analysis can be computationally expensive because most graph analytic functions routinely involve touching most of the nodes in the graph.
As a result, the data access pattern is commonly non-sequential (random). The Big Data Graph overcomes this challenge by carrying out fast graph analytics in a parallel, in-memory space. Scalability is further enhanced by combining parallel, in-memory queries with a distributed (HBase or NoSQL) persistent storage layer.
The graph APIs introduce efficient access and filtering of the graph (or sub-graphs) from back-end storage into memory, resulting in an ideal platform for Big Data problems.
Essentially, we want to enable analysis of massive graphs. For example, when you deploy the property graph on Oracle Big Data Appliance, a single node of the BDA can manage a graph with 25 billion nodes and 200 billion edges. A graph this size could be used to model a friend-of-a-friend graph for the population of the planet, with each person averaging 25 friends.
BDAs range from 6 to 18 nodes and multiple BDAs can be interconnected. So this graph on an Oracle Big Data Appliance can comprise trillions of nodes and edges – the kind of scale we expect to see when modeling and analyzing the Internet of Things.
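The sizing claim above can be checked with simple arithmetic. The sketch below is a back-of-the-envelope calculation only: the world population figure is an assumption (roughly 7.3 billion), each friendship is counted once per person, and the multi-node figure assumes capacity scales linearly with node count, which the interview does not state.

```python
# Figures quoted in the interview for a single BDA node.
NODE_CAPACITY_VERTICES = 25_000_000_000
NODE_CAPACITY_EDGES = 200_000_000_000
AVG_FRIENDS = 25

# Assumption: approximate world population, not taken from the interview.
WORLD_POPULATION = 7_300_000_000

# One edge per (person, friend) pair.
friend_edges = WORLD_POPULATION * AVG_FRIENDS

fits_on_one_node = (WORLD_POPULATION <= NODE_CAPACITY_VERTICES
                    and friend_edges <= NODE_CAPACITY_EDGES)

# Assumption: linear scaling across the 18 nodes of a full BDA.
full_rack_edges = 18 * NODE_CAPACITY_EDGES

print(f"friend edges: {friend_edges:,}")
print(f"fits on one node: {fits_on_one_node}")
print(f"edges on 18 nodes (linear assumption): {full_rack_edges:,}")
```

Under these assumptions the friend graph needs about 182.5 billion edges, which is inside the quoted 200 billion per node, and a full 18-node appliance would reach the trillions of edges mentioned above.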
Q4. What kind of spatial analysis functions and services do you offer?
Oracle Big Data Spatial and Graph has a pretty robust set of spatial analysis functions, but the vector functions are currently oriented towards the “location enrichment” of structured and semi-structured data for business applications.
A library of geographic hierarchy data covering worldwide countries, states, counties and cities is provided as a template out of the box, as well as named hierarchy datasets for text matching. You simply select the data set you wish to process, and the template with the geographic hierarchy of your choice.
Businesses can use spatial data as the basis for associating and linking disparate sales results to actual sales territories. Location information can also be used to track and categorize entities based on their proximity to another person, place, or object. Real-time applications such as location-based advertising are supported through the use of geo-fencing techniques. These techniques allow vendors to offer store specific promotions to mobile customers approaching their place of business.
Oracle Big Data Spatial and Graph APIs invoke commonly used spatial operations. You write a MapReduce job in your application that calls Java methods such as buffer or point-in-polygon, which execute these operations very quickly.
Users can specify query results to be written to HDFS.
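For readers unfamiliar with the point-in-polygon operation mentioned above, here is a minimal sketch of the classic ray-casting test in plain Python. It illustrates the general technique only; the function name and the coordinate layout are assumptions for this example, and it does not reproduce the product's Java methods or handle points lying exactly on an edge.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside `polygon`.

    `polygon` is a list of (x, y) vertices in order, without repeating the
    first vertex at the end. Points exactly on an edge are not handled
    specially in this sketch.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from the point cross this edge's y-range?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses the ray's height.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing toggles the state
    return inside


# Hypothetical sales territory as a simple square.
territory = [(0, 0), (4, 0), (4, 4), (0, 4)]
is_inside = point_in_polygon(2, 2, territory)
```

A geo-fencing check of the kind described earlier amounts to running this test for each incoming location against each fence polygon.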
Q5. Some graphs (like road, telco, water, and utility networks) have an inherent spatial component. Other types of graphs (Internet of Things, biological pathways, FOAF) may not. How do you handle this variety of data?
You are correct in observing that there is a range of graph models. But one of the unique things about Oracle’s Big Data management offering is the way our Hadoop and NoSQL technologies and our relational database technologies complement one another. On Oracle Database, the Oracle Spatial and Graph option includes a Network Data Model graph specifically designed with an inherent spatial component for road, telco, water and utility networks. These graphs are well suited to the Oracle Database environment in terms of their size, tight association with transaction-oriented and operational applications, and compliance with corporate security policies common to applications built on Oracle Database.
The property graph in Big Data Spatial and Graph is designed to address the social network analysis, friend-of-a-friend and Internet of Things workloads that are increasingly being addressed on the Hadoop/NoSQL platform.
Q6. What insights can be provided by various forms of graph analysis?
The recent interest in graph databases is due to the rich insight provided by various forms of graph analysis, such as graph traversal, recommendations, finding communities and influencers, and pattern matching. For example, important relationships and patterns are found in social network data from Facebook, a listener’s music preferences from an online music service like Spotify, online shopper behavior on eBay, and bloggers and their relationship to followers and other bloggers.
These relationships can be readily structured as a graph: a set of vertices and edges, each carrying properties, that represent these relationships.
Q7. What about scalability and performance for your graph analytics? Do you have any performance measures you can share with us?
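As a rough illustration of the property graph model just described, the sketch below stores vertices and edges with key/value properties in plain Python. The class and method names (`PropertyGraph`, `add_vertex`, `add_edge`, `neighbors`) and the sample data are hypothetical and chosen for this example; they are not the Blueprints API or the product's Java API.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    target: str
    label: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Tiny in-memory property graph (illustrative sketch only)."""

    def __init__(self):
        self.vertices = {}  # vertex id -> Vertex
        self.edges = []     # list of directed Edge objects

    def add_vertex(self, vertex_id, **properties):
        self.vertices[vertex_id] = Vertex(vertex_id, dict(properties))

    def add_edge(self, source, target, label, **properties):
        if source not in self.vertices or target not in self.vertices:
            raise KeyError("both endpoints must be added as vertices first")
        self.edges.append(Edge(source, target, label, dict(properties)))

    def neighbors(self, vertex_id, label=None):
        """Ids reachable over outgoing edges, optionally filtered by label."""
        return [e.target for e in self.edges
                if e.source == vertex_id and (label is None or e.label == label)]


# Hypothetical data: a listener, a blogger and a song.
graph = PropertyGraph()
graph.add_vertex("alice", kind="listener")
graph.add_vertex("bob", kind="blogger")
graph.add_vertex("song1", kind="track", genre="jazz")
graph.add_edge("alice", "bob", "follows", since=2014)
graph.add_edge("alice", "song1", "likes", rating=5)
```

Traversals such as "whom does alice follow" then become lookups over labeled edges, for example `graph.neighbors("alice", label="follows")`.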
We’ve designed the graph to use in-memory computational analysis and in-memory pattern matching for maximum performance.
The graph algorithms are pre-compiled for much faster execution. While we do not publish performance numbers as a rule, the commercial product is based on leading-edge work performed by Oracle Labs. In tests done on the underlying PGX algorithms that we have incorporated into Big Data Spatial and Graph, operations like PageRank, Triangle Counting and Path Queries can run multiple times faster than with some other distributed property graph frameworks and property graph execution engines.
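To make one of the operations named above concrete, here is a minimal, unoptimized sketch of triangle counting on an undirected graph in plain Python. It shows what the operation computes, not how PGX implements it; the function name and the edge-list input are assumptions for this example.

```python
from itertools import combinations


def count_triangles(edges):
    """Count distinct triangles in an undirected graph given as (u, v) pairs."""
    # Build an undirected adjacency map, ignoring self-loops.
    adjacency = {}
    for u, v in edges:
        if u == v:
            continue
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    corner_hits = 0
    for node, neighbors in adjacency.items():
        # A triangle exists at `node` when two of its neighbors are connected.
        for a, b in combinations(neighbors, 2):
            if b in adjacency[a]:
                corner_hits += 1
    # Every triangle is found once from each of its three corners.
    return corner_hits // 3
```

The nested neighbor check is what makes this workload expensive on large graphs and is why it benefits from parallel, in-memory execution.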
Qx. Anything else you wish to add?
Oracle has nearly two decades of experience working with spatial and graph database technologies. We have combined this with cutting-edge research from Oracle Labs to deliver advanced analytics for the NoSQL and Hadoop platform. This is part of a broad Big Data Management strategy to give our customers a wide range of choices when deploying workloads in their enterprises. Big Data products like Oracle Big Data Appliance, Big Data SQL, Big Data Connectors, Oracle R, Oracle NoSQL Database and Big Data Discovery along with Oracle Database 12c comprise a complete enterprise Big Data Management platform.
– Oracle Big Data Spatial and Graph (LINK to Download and Documentation)