Introducing Geospatial Capabilities in VoltDB v6.0
Introducing Geospatial Capabilities in VoltDB v6.0
Thursday, February 4, 2016 BY Chris Wolff
As part of our 6.0 release, we’ve added new data types and functions that let you work with geospatial data in your applications. If you care about where things are in the real world and how they relate to each other spatially, we have some new features that will help you out.
The New Data Types
We’ve added two new SQL data types that let you store the locations and regions that are important to your application: GEOGRAPHY and GEOGRAPHY_POINT. We’ll discuss the latter first.
The new data type GEOGRAPHY_POINT lets you store data that represents a point on the surface of the earth. This could be the location of a cell phone, a sensor, or anything, really. Here’s some DDL:
create table devices ( | |
id integer not null primary key, | |
location geography_point | |
); | |
insert into devices values ( | |
0, | |
pointFromText(‘POINT(-122.0264 36.9719)’) | |
); |
The above SQL creates a table with a GEOGRAPHY_POINT column and then inserts a row where the device id is zero, and the location field is a point at 36.9719° N, 122.0264° W. To create a point, we use a scalar function called pointFromPolygon. It produces a GEOGRAPHY_POINT value from string input that is formatted as “well-known text.” Well-known text (WKT) is a common convention for representing geometric data as strings.
Likewise, GEOGRAPHY is a data type that can be used to represent polygons whose vertices are points on the surface of the earth. You might wish to use this data type to represent political boundaries (such as states or counties), or any region of interest, with arbitrary shape and size.
create table regions ( | |
id integer not null primary key, | |
name varchar(128) not null, | |
border geography | |
); | |
insert into regions values ( | |
0, | |
‘The Bermuda Triangle’, | |
polygonFromText(‘POLYGON((-64.72 32.16, -80.41 25.30, -65.82 18.40, -64.72 32.16))’) | |
); |
The New Functions
New data types for representing points and polygons aren’t terribly useful without ways to query the data in a meaningful way. That’s why we’ve also added functions. The following examples use the schema described in the previous section.
CONTAINS
The new sql function CONTAINS answers the question, “Does this polygon contain that point?” Here’s an example:
— Which devices are within which borders? | |
select r.id, r.name, d.id | |
from devices as d | |
inner join regions as r | |
on contains(r.border, d.location); |
This query does a geospatial join – that is, relating rows in one table to rows in another, based on a geospatial relationship.
DISTANCE
The DISTANCE function answers the question, “How far is this point from that point or polygon?” Here’s an example:
— Given the id of a particular point, which points are within 1000 | |
— meters of that point? | |
select d2.id | |
from devices as d1 | |
inner join devices as d2 | |
on distance(d1.location, d2.location) <= 1000 | |
where d1.id = 0; |
The AREA function tells us the size of a polygon in square meters.
— Which regions have the highest density of devices? | |
— Returns the top 5 regions and devices per m^2. | |
select r.id, count(*) / area(r.border) as device_density | |
from devices as d | |
inner join regions as r | |
on contains(r.border, d.location) | |
group by r.id | |
order by device_density desc | |
limit 5; |
There are other functions described in our documentation, but these are the most important ones for answering interesting questions.
Putting It All Together
We’ve got an example of a geospatial app in the 6.0 kit. In the example, we simulate a mobile ad broker app. Consider the problem of serving particular ads to a mobile device user based on his or her location. This is a problem that requires high throughput, because there are a lot of mobile device users out there. It also requires a fast response. We want to deliver the ad while the user is still standing right in front of the store, with the device in hand, and we want to deliver it only once.
In the example, we maintain a number of bids made by businesses participating in our ad network. Each bid says how much the business will pay to serve an ad, the region (as a GEOGRAPHY column) in which the ad should be served, and a time window during which the bid is valid (such as during a sale):
CREATE TABLE bids | |
( | |
id bigint not null | |
, advertiser_id bigint not null | |
, region geography(16384) not null | |
, ts_start timestamp not null | |
, ts_end timestamp not null | |
, bid_amount float not null | |
, primary key (id) | |
); |
select id, advertiser_id, bid_amount | |
from bids | |
where current_timestamp between ts_start and ts_end | |
and contains(region, ?) | |
order by bid_amount desc, id | |
limit 1; |
The parameter in this query, passed in from the client, is the current location of the user’s device. The query finds the relevant bids for the current time and the user’s location, and returns the id of the bid that was for the highest dollar amount.
For a more complete discussion of the app, see the accompanying README: https://github.com/VoltDB/voltdb/tree/master/examples/geospatial(link is external)
Or if you’re interested in trying out the demo for yourself, download our VoltDB 6.0 kit and have a look.
Stay Tuned for Future Work
In upcoming versions we’ll introduce indexes on GEOGRAPHY columns that will allow very fast evaluation of CONTAINS predicates, so geospatial queries can be evaluated at the exceptional speeds to which our users have become accustomed.