On IoT and Time Series Databases. Q&A with Brian Gilmore.
“As the IoT becomes mainstream and consumerized, many look at IoT as a threat. Everyone has heard some horror story about a vacuum cleaner botnet or backdoor access to customer data through the automation system attached to the fish tank in the lobby. But, fundamentally, these are still IT systems, and the IT and cybersecurity best practices we’ve designed over the decades still apply.” – Brian Gilmore
1. What is the IoT, really?
The IoT is simply the next evolutionary phase of the internet: an era in which physical machines (things) become full citizens of the world’s networks, producing and consuming information and communicating as peers with other devices, applications, and humans. Like many of the previous phases, the early days have been experimental, with much of the work done in the realm of science, academia, and niche technical industries.
As the IoT becomes mainstream and consumerized, many look at IoT as a threat. Everyone has heard some horror story about a vacuum cleaner botnet or backdoor access to customer data through the automation system attached to the fish tank in the lobby. But, fundamentally, these are still IT systems, and the IT and cybersecurity best practices we’ve designed over the decades still apply. We see this regularly when it comes to monitoring: zero-day vulnerabilities aren’t specific to IoT. Using threat frameworks to test for known threats and applying anomaly detection to find new vulnerabilities is critical in IoT, just as it is elsewhere.
Ultimately, the IoT is an opportunity for technology practitioners to contribute to the digital transformation of enterprises and beyond. IoT professionals’ networking, programming, and analytics skills translate well from those built in information technology and operational technology roles. As a result, IoT will slowly blend back into the more prominent megatrends of cloud, edge, and internet and will find its purpose at the intersection of technology and cultural and economic progress.
2. What role do time series databases take in the IoT ecosystem?
Across the range of solutions built on systems where IoT is a factor, metrics and events are at the core of everything. These two data types are fundamentally time series and represent the performance and operation of the system itself. Therefore, instrumenting every machine, process, and interaction, and collecting and storing that data so it can be easily recalled, analyzed, aggregated, and forecast, is critical to the monitoring, management, optimization, and commercialization of the IoT investment itself.
Unfortunately, many IoT projects don’t start with this in mind. Many off-the-shelf IoT platforms come with some form of general-purpose database — something built for document or graph storage or even a relational database. While these databases can store and analyze timestamps, they often break down at production scale or run out of horsepower for truly transformational use cases. Time series databases bring critical functionality to the solution in both back-end architecture (time series databases rarely use row-based tables) and in the scripting languages they provide. In addition, practitioners and developers can find value in the tools and application integrations offered by the IoT and time series ecosystems the vendors work diligently to develop and maintain.
Planning for a time series database early and as a core component of the IoT solution stack can make the difference between a project that stays in the lab and one that makes it into the real world. It can also prevent very expensive re-tooling later in the development process. It is essential to consider how you will deploy your time series database, how it will work in a distributed environment, whether it is open source, commercial, or both, and whether it will lock you into a specific cloud vendor or region/availability zone. You want your time series database to enable your IoT solution, not stand in its way.
3. What are some typical time series analytics applied in IoT solutions?
Monitoring: Keeping a record of metrics (numeric and other indicators) over time and using that data to understand trends and how they relate to system availability, performance, and security. Monitoring may include counting how many times a CPU hit 100% in 24 hours or tracking the rise and fall of surface temperatures on a satellite as it orbits the Earth. Monitoring can be done visually, through dashboarding tools, or automatically through the application of machine learning and other statistical analysis processes. Anomaly detection, forecasting, clustering, and categorical and continuous prediction all pull indicators from the noise of operational data and make those indicators actionable for operators and consumers.
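As a rough illustration of the monitoring ideas above, the following Python sketch counts how many CPU samples reached 100% in a trailing 24-hour window and flags outliers with a simple z-score test. The function names, the sample data, and the z-score limit of 3.0 are all assumptions made for this example; a production system would normally run an equivalent query inside the time series database rather than in application code.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev


def count_saturated(samples, now, window=timedelta(hours=24), threshold=100.0):
    """Count samples at or above `threshold` in the window ending at `now`.

    `samples` is a list of (timestamp, value) pairs (a hypothetical layout).
    """
    start = now - window
    return sum(1 for ts, value in samples if start < ts <= now and value >= threshold)


def zscore_anomalies(samples, limit=3.0):
    """Return samples whose value is more than `limit` standard deviations from the mean."""
    values = [value for _, value in samples]
    if len(values) < 2:
        return []
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # a flat series has no outliers under this test
    return [(ts, value) for ts, value in samples if abs(value - mu) / sigma > limit]


# Illustrative data: hourly CPU readings, three of them at 100%,
# one of which falls outside the 24-hour window.
now = datetime(2024, 1, 2)
cpu = [(now - timedelta(hours=h), 100.0 if h in (1, 5, 30) else 40.0) for h in range(36)]
saturated = count_saturated(cpu, now)

# Illustrative data: per-minute temperatures with one injected spike.
temps = [(now - timedelta(minutes=m), 21.0 if m % 2 else 20.0) for m in range(30)]
temps[10] = (temps[10][0], 80.0)
anomalies = zscore_anomalies(temps)

print(saturated, anomalies)
```

A global z-score is only one of many anomaly detection approaches; it is used here because it is short enough to read at a glance.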
Optimization: To optimize an IoT system, you must define the relationships between related but discrete time series streams and capture those relationships in behavioral models (sometimes called Digital Twins). Stakeholders can then use these models to detect and remove operational inefficiencies. Optimization takes monitoring one step further and overlays additional context, which considers time series data in a more macro sense, whether through labeling, grouping, or hierarchical organization. Applying similar statistical and machine learning techniques to these higher-definition time series groups opens the door to causal analysis, what-if analysis, Monte Carlo simulation, and more to anticipate and prevent failure. In addition, these efforts improve the ability to tune processes for cost reduction and customer satisfaction.
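One simple way to start quantifying the relationship between two streams is to align them on shared timestamps and compute a Pearson correlation coefficient. The sketch below does that in plain Python. The stream names (`motor_load`, `bearing_temp`), the data, and the helper functions are hypothetical, and a correlation is far short of a full behavioral model; it only shows the first step of relating discrete series to one another.

```python
from math import sqrt


def align(stream_a, stream_b):
    """Pair up values from two streams that share a timestamp.

    Each stream is a list of (timestamp, value) pairs; unmatched points are dropped.
    """
    b_by_ts = dict(stream_b)
    return [(value, b_by_ts[ts]) for ts, value in stream_a if ts in b_by_ts]


def pearson(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two aligned points")
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Illustrative data: timestamps are plain integers for brevity.
motor_load = [(0, 10.0), (1, 20.0), (2, 30.0), (3, 40.0)]
bearing_temp = [(0, 50.0), (1, 55.0), (2, 60.0), (4, 70.0)]

pairs = align(motor_load, bearing_temp)  # only timestamps 0, 1, and 2 match
r = pearson(pairs)
print(len(pairs), round(r, 3))
```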
Commercialization: One of the great promises of IoT is that it’s an enabler of the “Great Servitization” of our global economy. As IT shifts to the cloud, practitioners are exposed to consumption-based business models. Here you pay for computing based on CPU time, storage based on disk utilization over time, and networking based on bandwidth utilization over time. The “over time” part should be clear; services like this monitor and track your consumption in a time series database and use those metrics to calculate your bill. Similar “pay for what you use” models are finding their way outside of the cloud providers and into transportation, housing, insurance, and more. Next time you use Uber or Airbnb, or use a pay-per-mile auto insurance service, think about the IoT and time series databases at work behind the scenes.
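The “over time” calculation can be sketched as integrating a usage series. The Python example below treats storage readings as a step function (each reading holds until the next one) and multiplies the resulting GB-hours by a rate. The sample readings, the rate, and the function name are assumptions for illustration only and do not reflect any real provider’s pricing or metering method.

```python
def gb_hours(samples):
    """Integrate storage usage over time.

    `samples` is a time-ordered list of (hour, gigabytes) readings. Each reading
    is assumed to hold until the next one, so the final reading only marks the
    end of the billing period.
    """
    total = 0.0
    for (t0, gb), (t1, _) in zip(samples, samples[1:]):
        total += gb * (t1 - t0)
    return total


# Illustrative readings over one day: (hour of day, GB stored).
storage = [(0, 100.0), (6, 150.0), (18, 120.0), (24, 120.0)]

RATE_PER_GB_HOUR = 0.0001  # hypothetical price, not a real tariff

usage = gb_hours(storage)  # 100*6 + 150*12 + 120*6 GB-hours
bill = usage * RATE_PER_GB_HOUR
print(usage, round(bill, 4))
```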
4. What other technologies run alongside time series databases in production IoT?
The most important thing to understand is that IoT alone isn’t a thing. IoT is one of many technologies that can drive significant value and transformative outcomes when implemented together and organized well. Time series databases are a vital component of both IoT solution architecture and success, but understanding how the whole ecosystem comes together in the solution is critical.
IoT solutions almost always start with a machine-to-machine (M2M) application. In the “Industrial Internet of Things,” this is usually an industrial control system, either SCADA or a related gateway application like Kepware KEPServerEX. In modern enterprise IoT and consumer-focused connected products, these M2M systems are often MQTT brokers, messaging systems like Pulsar, or IoT platforms like ThingWorx and Losant. Therefore, choosing a time series database that integrates easily with these platforms is an essential first step.
For end-user interfaces and interactivity, mobile and cloud services are king. For example, the mobile and web applications built for homeowners to manage their thermostats and heating and cooling systems need to interface with the M2M system for setpoint control and the time series database for temperature, humidity, and heating/cooling state history.
That historical data from the time series database integrates with advanced modeling and statistical analysis tools like R, Spark, and TensorFlow. These integrations power the energy-saving automation to help customers realize additional ROI on their IoT investment.
The developers building these applications will eventually need a time series database with the APIs and SDKs to connect their devices, applications, M2M systems, and ML platforms at scale. You must choose wisely here. When it comes to production applications, segmenting time series data by customer and device, modeling and predicting across customers by demographic, and returning that information to the customer-facing app quickly, in seconds if at all possible, are challenges very few databases can meet.
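To make “segmenting time series data by customer and device” concrete, here is a minimal in-memory Python sketch that groups points by a (customer, device) key and computes a mean per series. The field names and sample points are hypothetical; in a real deployment the database would typically handle this grouping through tags or an equivalent indexing mechanism, and at far greater scale.

```python
from collections import defaultdict
from statistics import mean


def mean_by_series(points):
    """Group points by (customer, device) and return the mean value of each group."""
    groups = defaultdict(list)
    for point in points:
        groups[(point["customer"], point["device"])].append(point["value"])
    return {key: mean(values) for key, values in groups.items()}


# Illustrative thermostat readings; all names are made up for this example.
points = [
    {"customer": "acme", "device": "thermostat-1", "value": 20.0},
    {"customer": "acme", "device": "thermostat-1", "value": 22.0},
    {"customer": "acme", "device": "thermostat-2", "value": 19.0},
    {"customer": "globex", "device": "thermostat-1", "value": 25.0},
]

averages = mean_by_series(points)
for key, value in sorted(averages.items()):
    print(key, value)
```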
Brian Gilmore, Director of IoT Product Management, InfluxData.
Brian Gilmore is Director of IoT Product Management at InfluxData, the creators of InfluxDB. He has focused the last decade of his career on working with organizations around the world to drive the unification of industrial and enterprise IoT with machine learning, cloud, and other truly transformational technology trends.
Sponsored by InfluxData.