Five Challenges to IoT Analytics Success
By Dr. Srinath Perera
From smart personal devices to smart homes to smart cities, the Internet of Things (IoT) is changing the way we work, play, travel, and even power our homes and offices. In fact, IDC predicts that 30 billion IoT devices and sensors will be in use by 2020, and collectively they will generate 10% of the world’s data.
Of course, whether IDC’s prediction plays out will depend on how effectively we can make use of the vast information being captured by all those “things.” IoT-driven solutions carry a number of specific challenges when it comes to capturing, analyzing, and acting on data in a meaningful way. As a result, many enterprises will need to rethink the analytics strategies that have traditionally served them well.
Let’s dig into the five IoT analytics challenges most likely to affect businesses and government organizations.
IoT Analytics Need to Balance Scale and Speed
Most of the serious analysis for IoT will happen in the cloud, a data center, or, more likely, a hybrid of cloud and server-based environments. That is because, despite its elasticity and scalability, the cloud may not be suited to scenarios that require large amounts of data to be processed in real time. For example, moving 1 terabyte over a 10Gbps network takes about 13 minutes, which is fine for batch processing and management of historical data but is not practical for analyzing real-time event streams.
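The transfer-time figure above can be checked with a few lines of arithmetic. The sketch below is only an idealized estimate: it assumes a decimal terabyte (10^12 bytes), a fully used link, and no protocol overhead, and the function name is chosen here for illustration.

```python
def transfer_seconds(num_bytes: float, link_bits_per_second: float) -> float:
    """Idealized transfer time: payload size in bits divided by the link rate."""
    return num_bytes * 8 / link_bits_per_second


TERABYTE = 1e12  # assumption: decimal terabyte, in bytes

# 1 TB over a 10 Gbps data center link
minutes = transfer_seconds(TERABYTE, 10e9) / 60
print(f"{minutes:.1f} minutes")  # 13.3 minutes

# 1 TB over a 10 Mbps broadband link
days = transfer_seconds(TERABYTE, 10e6) / 86_400
print(f"{days:.1f} days")  # 9.3 days
```

The second calculation reproduces the broadband figure cited in the next section; real-world throughput will be lower once overhead and contention are taken into account.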
At the same time, because some aspects of IoT analytics may need to scale more than others, the analysis software should be flexible enough to scale those parts independently. That is true whether the software is deployed in the data center, the cloud, or some combination of the two.
Some Analytics Will Occur at the Edge
IoT sensors, devices, and gateways are distributed across manufacturing floors, homes, retail stores, and farm fields, to name just a few locations. Moving one terabyte of data from such sites over a 10Mbps broadband network would take about nine days. So enterprises need to plan for the projected 40% of IoT data that will be processed at the edge in just a few years’ time. This is particularly true for large IoT deployments where billions of events may stream through each second, but systems only need to know an average over time or be alerted when a trend falls outside established parameters.
The answer is to conduct some analytics on IoT devices or gateways at the edge and send aggregated results to the central system. A good example is the sensor in a refrigerated railroad car that reports the average temperature every minute or communicates when the temperature rises above an acceptable threshold. Through such edge analytics, organizations can ensure the timely detection of important trends or aberrations while significantly reducing network traffic to improve performance.
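The refrigerated railroad car example can be sketched in a few lines of code. This is a minimal illustration, not a production design: the class name, the message format, and the 5 degree threshold are all hypothetical, and readings are assumed to arrive in timestamp order.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Message = Tuple[str, int, float]  # (kind, timestamp in seconds, value)


@dataclass
class EdgeAggregator:
    """Runs on the device or gateway; forwards only averages and alerts."""
    threshold_c: float   # hypothetical alert threshold, degrees Celsius
    window_s: int = 60   # report one average per minute
    _window_start: Optional[int] = field(default=None, init=False)
    _readings: List[float] = field(default_factory=list, init=False)

    def ingest(self, ts: int, temp_c: float) -> List[Message]:
        """Consume one reading; return the messages to send upstream (often none)."""
        out: List[Message] = []
        bucket = ts - ts % self.window_s  # start of the window this reading belongs to
        if self._window_start is None:
            self._window_start = bucket
        elif bucket != self._window_start:
            # The previous window has closed: emit its average and start a new one.
            avg = sum(self._readings) / len(self._readings)
            out.append(("avg", self._window_start, avg))
            self._window_start = bucket
            self._readings = []
        self._readings.append(temp_c)
        if temp_c > self.threshold_c:
            # Threshold breaches are reported immediately, not once per window.
            out.append(("alert", ts, temp_c))
        return out


agg = EdgeAggregator(threshold_c=5.0)
sent: List[Message] = []
for ts, temp in [(0, 2.0), (20, 3.0), (40, 4.0), (60, 6.5)]:
    sent.extend(agg.ingest(ts, temp))
print(sent)  # [('avg', 0, 3.0), ('alert', 60, 6.5)]
```

Four raw readings become two upstream messages here; with one reading per second the reduction would be far larger, which is the point of aggregating at the edge.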
Performing edge analytics requires very lightweight software, since IoT nodes and gateways are low-power devices with limited capacity for query processing. To address the challenge, more analytics solutions are using complex event processing (CEP) engines with a small footprint suited to the cloud and gateways. Several companies are also working on edge analytics products and reference architectures. Significantly, because edge computing is heavily contextual, there is no one-size-fits-all solution.
Event Streams Drive Real-Time Insights
Most IoT information is based on streams of event data analyzed in real or near-real time. Often, it is used to support critical infrastructure, such as patient care systems, electric grids, and train networks where the ability to respond in seconds can protect people’s health and safety, as well as prevent system failures costing millions of dollars.
As a result, IoT deployments typically require some form of complex event processing (CEP) and streaming analytics. The software should handle time-series data, time windows, moving averages, and temporal event patterns. Open source CEP engines include Esper and Siddhi, and two popular open source technologies for streaming analytics are Apache Storm and Apache Spark, both of which provide capabilities for real-time event processing. Another option is the cloud-based Google Cloud Dataflow. Each offering involves tradeoffs, so an IoT implementation’s specific requirements will determine the technology approach.
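To make "time windows" and "moving averages" concrete, here is a minimal sketch of a sliding time-window average in plain Python. It does not use the API of any engine named above; the class and method names are illustrative, and events are assumed to arrive in timestamp order.

```python
from collections import deque


class TimeWindowAverage:
    """Moving average over the events of the last `window_s` seconds."""

    def __init__(self, window_s: float):
        self.window_s = window_s
        self._events = deque()  # (timestamp, value) pairs, oldest first
        self._total = 0.0

    def add(self, ts: float, value: float) -> float:
        """Add one event and return the average over the current window."""
        self._events.append((ts, value))
        self._total += value
        # Evict events that have fallen out of the window ending at `ts`.
        while self._events[0][0] <= ts - self.window_s:
            _, old_value = self._events.popleft()
            self._total -= old_value
        return self._total / len(self._events)


avg = TimeWindowAverage(window_s=10)
print(avg.add(0, 10.0))   # 10.0
print(avg.add(5, 20.0))   # 15.0
print(avg.add(12, 30.0))  # 25.0 (the event at t=0 has left the window)
```

A CEP or streaming engine typically lets this kind of window be declared in a query rather than hand-coded, and adds distribution and fault tolerance on top.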
Along with streaming analytics, some organizations are turning to time series databases (TSDBs) for their IoT implementations. The databases require timestamps on all data and are capable of writing data within milliseconds. Examples of TSDBs include OpenTSDB, InfluxDB, and KairosDB.
IoT Analysis is Only as Reliable as the Data
Data coming in from IoT sensors presents its own special challenges. Tremendous numbers of nodes are pushing data through low-bandwidth IoT networks, which may cause delays that put information arriving from different sensors out of sequence. And inevitably sensors will fail, raising the question of whether sensors should store data and send it later. Other challenges can include collection latency, duplicate messages, and reliability.
IoT analysis that uses time windows and temporal sequences can help ensure that inbound data is handled in the proper order. This is important because a progression of events may, for example, indicate that an engine part is heading for failure. However, time windows and temporal sequences require dedicated rule sets and queries. Moreover, there is no off-the-shelf commercial solution or open source project that developers can simply apply to their systems. Instead, many IT organizations will need to develop custom rules and queries to support the specific requirements of their IoT analytics implementations.
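One common way to restore event order is to hold events briefly and release them only once later arrivals can no longer precede them. The sketch below illustrates that idea with a small reorder buffer; the class name and the `max_delay_s` tolerance are assumptions made for illustration, and an event that arrives later than the tolerance allows would still be released out of order.

```python
import heapq
from itertools import count


class ReorderBuffer:
    """Releases events in timestamp order, tolerating delays up to max_delay_s."""

    def __init__(self, max_delay_s: float):
        self.max_delay_s = max_delay_s
        self._heap = []                 # min-heap of (timestamp, sequence, payload)
        self._seq = count()             # tie-breaker so payloads are never compared
        self._max_seen = float("-inf")  # newest timestamp observed so far

    def push(self, ts: float, payload) -> list:
        """Buffer one event; return the events that are now safe to release."""
        heapq.heappush(self._heap, (ts, next(self._seq), payload))
        self._max_seen = max(self._max_seen, ts)
        # Anything at or before the watermark is assumed to be complete.
        watermark = self._max_seen - self.max_delay_s
        ready = []
        while self._heap and self._heap[0][0] <= watermark:
            event_ts, _, event_payload = heapq.heappop(self._heap)
            ready.append((event_ts, event_payload))
        return ready

    def flush(self) -> list:
        """Release everything still buffered, in timestamp order."""
        ready = []
        while self._heap:
            event_ts, _, event_payload = heapq.heappop(self._heap)
            ready.append((event_ts, event_payload))
        return ready


buf = ReorderBuffer(max_delay_s=5)
print(buf.push(10, "a"))  # []
print(buf.push(8, "b"))   # []
print(buf.push(16, "c"))  # [(8, 'b'), (10, 'a')]
print(buf.flush())        # [(16, 'c')]
```

The tolerance trades latency for correctness: a larger value catches more stragglers but delays every downstream rule that depends on the ordered stream.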
Additionally, IoT analytics need a sophisticated data “munging” or cleanup layer to reorder data, detect and discard or fix erroneous data, and impute the gaps in the data when they inevitably arise. A number of tools in big data processors exist to support munging, such as those provided with Apache Hadoop, Apache Hive, Apache Pig, Apache Storm, Apache Spark, and Shark.
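As a small-scale illustration of what such a cleanup layer does, the sketch below reorders readings, drops duplicates and out-of-range values, and fills gaps by linear interpolation. The function name, the valid range, and the choice of linear interpolation are assumptions for this example; the tools named above would apply the same steps at far larger scale.

```python
from typing import Iterable, List, Tuple

Reading = Tuple[int, float]  # (timestamp in seconds, value)


def clean(readings: Iterable[Reading], step_s: int, lo: float, hi: float) -> List[Reading]:
    """Return readings in time order, without duplicates or outliers, with gaps filled."""
    by_ts = {}
    for ts, value in readings:
        if lo <= value <= hi:            # discard physically implausible values
            by_ts.setdefault(ts, value)  # keep the first copy of a duplicate timestamp
    ordered = sorted(by_ts.items())      # restore timestamp order

    cleaned: List[Reading] = []
    for (t0, v0), (t1, v1) in zip(ordered, ordered[1:]):
        cleaned.append((t0, v0))
        t = t0 + step_s
        while t < t1:                    # impute missing samples between two neighbors
            fraction = (t - t0) / (t1 - t0)
            cleaned.append((t, v0 + fraction * (v1 - v0)))
            t += step_s
    if ordered:
        cleaned.append(ordered[-1])
    return cleaned


raw = [(0, 1.0), (20, 3.0), (10, 999.0), (0, 1.0), (30, 4.0)]
print(clean(raw, step_s=10, lo=-50.0, hi=50.0))
# [(0, 1.0), (10, 2.0), (20, 3.0), (30, 4.0)]
```

Here the out-of-range reading at t=10 is discarded and then replaced by an interpolated value, and the duplicate at t=0 is dropped.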
Prediction Adds to the Power of IoT Analytics
The greatest—and as yet largely untapped—power of IoT analysis is to go beyond reacting to issues and opportunities in real time and instead prepare for them beforehand. That is why prediction is central to many IoT analytics strategies, whether to project demand, anticipate maintenance, detect fraud, predict churn, or segment customers.
Increasingly, machine-learning algorithms complement statistical models for handling prediction. These algorithms automatically learn the underlying rules, providing an attractive alternative to rules-only systems, which require professionals to author rules and evaluate their performance. When applied at runtime to a CEP engine, in-memory computing, or other system that supports real-time, streaming analytics, machine learning for predictive analytics can provide valuable and actionable insights.
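The difference between a hand-written rule and a learned one can be shown with a deliberately simple stand-in. The sketch below fits a straight line to past sensor readings with ordinary least squares and projects when the trend will cross a maintenance threshold. The vibration data and the threshold are invented for illustration, and a real deployment would likely use a richer model from a machine-learning library.

```python
from typing import Optional, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def time_to_threshold(slope: float, intercept: float, threshold: float) -> Optional[float]:
    """Projected x at which the fitted line reaches the threshold, if it ever rises to it."""
    if slope <= 0:
        return None  # the trend is flat or falling, so no crossing is predicted
    return (threshold - intercept) / slope


hours = [0.0, 1.0, 2.0, 3.0]
vibration = [1.0, 1.5, 2.0, 2.5]   # hypothetical sensor readings
slope, intercept = fit_line(hours, vibration)
print(time_to_threshold(slope, intercept, threshold=4.0))  # 6.0
```

The parameters come from the data rather than from an expert's rule; once fitted, the model is cheap enough to evaluate on each incoming event in a streaming pipeline.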
Several frameworks for machine learning have emerged in recent years. These include state-of-the-art open source machine learning libraries, such as Apache Spark MLlib, TensorFlow, and Theano. The fact that there are no licensing fees for such solutions has helped to level the playing field and foster innovation among companies of all sizes. It also means that, even if you don’t innovate using these combined technologies, most likely your competitors will.
# # #
Srinath Perera, Ph.D., is vice president of research at WSO2 (http://wso2.com). He is an elected member of the Apache Software Foundation, and a Project Management Committee (PMC) member. Srinath also serves as a research scientist at the non-profit Lanka Software Foundation.