On Geospatial Data, Time Series Data. Q&A with Brian Gilmore
What is geospatial data?
Geospatial data is simply information about a physical location. We typically format this information as some combination of latitude, longitude, and altitude. Other times, systems generate XYZ coordinates alongside a single geographic “key” for point 0, which encodes spatial data as geospatial data in the fewest possible characters. Geospatial data can stand alone, but it usually annotates other information as metadata. For example, temperature readings from a weather station would likely include the geo coordinates of the weather station. This metadata would enable us to visualize the temperatures on a map and to filter, group, and aggregate them with location-tagged information from nearby weather stations.
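As a sketch of the idea, the snippet below tags hypothetical temperature readings with station coordinates as metadata, then uses a standard haversine distance to filter to "nearby" stations before averaging. The station names, coordinates, and 25 km threshold are all invented for illustration.

```python
import math

# Hypothetical temperature readings, each annotated with the
# station's fixed geo coordinates as metadata.
readings = [
    {"station": "A", "lat": 47.61, "lon": -122.33, "alt_m": 56.0, "temp_c": 14.2},
    {"station": "B", "lat": 47.66, "lon": -122.31, "alt_m": 80.0, "temp_c": 13.8},
    {"station": "C", "lat": 45.52, "lon": -122.68, "alt_m": 15.0, "temp_c": 16.1},
]

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in km."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Use the location metadata to keep only stations within 25 km of
# station A, then aggregate their readings.
origin = readings[0]
nearby = [r for r in readings
          if haversine_km(origin["lat"], origin["lon"], r["lat"], r["lon"]) < 25]
avg = sum(r["temp_c"] for r in nearby) / len(nearby)
```

Without the geo metadata on each reading, this kind of proximity filter would be impossible.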
What is time series data?
Time series data is a sequence of measurements (usually numeric, but not always) collected over time. For example, if we were to collect a temperature reading from our weather station every 5 minutes, we would want to append the reading’s date and time (a timestamp) to put it in the context of time. This context allows us to organize our temperature readings into individual trends and to aggregate those readings within and across windows of time. For example, it would be impossible to calculate the average daily temperature over the last eight weeks without those timestamps.
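A minimal sketch of that daily aggregation, using synthetic 5-minute readings (the values and dates are invented): the timestamp on each reading is what lets us bucket readings into daily windows at all.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical 5-minute temperature readings as (timestamp, temp_c)
# pairs; the values follow a simple synthetic daily cycle.
start = datetime(2023, 6, 1)
series = [(start + timedelta(minutes=5 * i), 15.0 + (i % 288) * 0.01)
          for i in range(288 * 3)]  # 288 readings/day for three days

# Bucket readings into daily windows using the timestamp context,
# then compute the average temperature for each day.
buckets = defaultdict(list)
for ts, temp in series:
    buckets[ts.date()].append(temp)

daily_avg = {day: sum(temps) / len(temps) for day, temps in buckets.items()}
```

A real time series database would run this windowed aggregation in the query engine, but the shape of the computation is the same.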
How do they overlap?
There’s also a third, closely related class of data called “geo-temporal” (sometimes also called spatio-temporal). Here, both time and location are valuable contextual information, and together with the measurement, we can express state, value, space, and time within a single record. This context is especially valuable when location information changes along with time, for example, from a mobile asset. If we were to attach our weather station to a free-floating weather balloon, the temperature reading would be of little value without the key context of geo coordinates (including altitude) and timestamps.
How is geo-temporal data stored and analyzed?
It is common to store geo-temporal data in a table, usually a CSV file or a relational database table or view. While it may seem logical (and simple) to keep timestamp, latitude, longitude, altitude, and temperature in rows and columns, this strategy can break down as you scale the database and try to derive value from its contents. You can see how EnerKey solved this problem in their case study.
If you can imagine a fleet of 1,000 weather balloons, you can start to see where we have a problem. If we create a CSV file or database table for each balloon, aggregating and comparing temperatures across the fleet becomes a massive JOIN. In addition, when we store second- or millisecond-precision timestamps, they are unlikely to match up cleanly across balloons, so normalizing those timestamps and temperatures across our fleet requires lossy evaluation. Adding a column for “Balloon ID” may look like a good solution, but get ready to dig into significant grouping and cursoring challenges when you run more complex geo-temporal analysis, such as comparing temperatures by time, location, and Balloon ID. Finally, imagine adding and analyzing other key data points from the balloon, such as wind speed and humidity, and factoring those values into your calculations. No thanks!
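To make the timestamp-normalization point concrete, here is a toy sketch (all IDs, timestamps, and temperatures invented): readings arrive with ragged second-level timestamps, so we truncate them into shared 1-minute windows and average per (window, balloon). This is the group-and-window work a purpose-built database would handle for you.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical "long" records from a small balloon fleet: one row per
# reading, tagged with a balloon_id, rather than one table per balloon.
rows = [
    {"ts": datetime(2023, 6, 1, 0, 0, 12), "balloon_id": "b1", "temp_c": 14.1},
    {"ts": datetime(2023, 6, 1, 0, 0, 50), "balloon_id": "b1", "temp_c": 14.3},
    {"ts": datetime(2023, 6, 1, 0, 0, 47), "balloon_id": "b2", "temp_c": 13.6},
    {"ts": datetime(2023, 6, 1, 0, 1, 3),  "balloon_id": "b1", "temp_c": 14.0},
    {"ts": datetime(2023, 6, 1, 0, 1, 30), "balloon_id": "b2", "temp_c": 13.5},
]

# Normalize ragged second-level timestamps into shared 1-minute windows,
# then average per (window, balloon) -- the group-and-window query that
# becomes painful with a separate CSV or table per balloon.
groups = defaultdict(list)
for r in rows:
    window = r["ts"].replace(second=0, microsecond=0)
    groups[(window, r["balloon_id"])].append(r["temp_c"])

avg_by_window = {k: sum(v) / len(v) for k, v in groups.items()}
```

Even here, truncating to the minute is lossy; at fleet scale, with more fields and finer windows, doing this by hand gets worse quickly.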
Are there technologies that make this easier and better?
Specialized time series databases are purpose-built for these use cases, and some go beyond simple storage and recall of information by time and location. For example, if database performance and speed are key to your project, look for a time series database that can index both timestamp and geolocation. Indexing time and space is no small task, because both dimensions rely on logical windowing, for example, “daily temperature by country.” Effectively and efficiently converting raw time and location data into human-readable, understandable constructs is a wheel you don’t want to reinvent.
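A "daily temperature by country" window combines a time bucket with a location key. The sketch below (with invented readings and country codes) shows the shape of that composite grouping, which a database with time and geo indexes would execute far more efficiently:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical readings already tagged with a country code, sketching
# the "daily temperature by country" windowing that a time series
# database with composite time + location indexes would do for us.
readings = [
    (datetime(2023, 6, 1, 9, 0),  "US", 18.0),
    (datetime(2023, 6, 1, 21, 0), "US", 12.0),
    (datetime(2023, 6, 1, 9, 0),  "FR", 20.0),
    (datetime(2023, 6, 2, 9, 0),  "US", 17.0),
]

# Group by (calendar day, country), then average each group.
by_day_country = defaultdict(list)
for ts, country, temp in readings:
    by_day_country[(ts.date(), country)].append(temp)

daily_by_country = {k: sum(v) / len(v) for k, v in by_day_country.items()}
```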
What comes next for geo-temporal data?
Beyond this, some of the best time series databases provide highly efficient geometric indexing as part of their ingest API, data pipeline, or query language. For example, evaluating geo-temporal data with a library like Google S2 or Uber H3 allows us to encode high-precision latitude and longitude data into new fields that represent hierarchical cell- and hash-based location groups. These hashes allow highly efficient filtering, nested grouping, and sorting of geo-temporal data. In addition, they can enable advanced geospatial analytics via geofences, paths, and geometry edges: necessary capabilities for today’s and tomorrow’s IoT and Industrial IoT analytics use cases. When it comes to creating, training, and deploying ML models for important predictive use cases, every inch and every second matters. Making complex geo-temporal data useful and valuable to both humans and machines is critical to seeing a return on your database investment.
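The core idea behind those hierarchical cells can be illustrated without the real libraries. The toy encoder below is not the S2 or H3 API; it is a simple quadtree-style stand-in that shows why cell IDs are so useful: each extra character refines the cell, so a coarse cell is a string prefix of every finer cell inside it, and nested grouping becomes a prefix match.

```python
def cell_id(lat, lon, level):
    """Encode lat/lon into a hierarchical quadtree cell string.

    A toy stand-in for libraries like Google S2 or Uber H3: each
    character splits the current cell into four quadrants, so coarser
    cells are string prefixes of finer ones.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    digits = []
    for _ in range(level):
        lat_mid = (lat_lo + lat_hi) / 2
        lon_mid = (lon_lo + lon_hi) / 2
        quad = 0
        if lat >= lat_mid:        # upper half of the current cell
            quad |= 2
            lat_lo = lat_mid
        else:
            lat_hi = lat_mid
        if lon >= lon_mid:        # right half of the current cell
            quad |= 1
            lon_lo = lon_mid
        else:
            lon_hi = lon_mid
        digits.append(str(quad))
    return "".join(digits)

# Two nearby points share a long prefix; a distant one diverges early,
# so "same coarse cell" filtering is just a string-prefix comparison.
a = cell_id(47.6062, -122.3321, 12)   # Seattle
b = cell_id(47.6097, -122.3331, 12)   # a short distance away
c = cell_id(35.6762, 139.6503, 12)    # Tokyo
```

Production libraries like S2 and H3 use far more sophisticated sphere-aware cell geometries, but the prefix-based hierarchy is the property that makes filtering, nesting, and sorting so efficient.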
Brian Gilmore, Director of IoT Product Management, InfluxData
Brian Gilmore is Director of IoT Product Management at InfluxData, the creators of InfluxDB. He has focused the last decade of his career on working with organizations around the world to drive the unification of industrial and enterprise IoT with machine learning, cloud, and other truly transformational technology trends.
Sponsored by InfluxData.