However, in the IoT world, running analytics as close to the source as possible has become a necessity in order to reduce the amount of data transferred over the wire. It is also crucial if your goal is to provide quick feedback, even if limited, to the edge device (a transportation vehicle, oil rig, mobile device, etc.).
The value of data decreases as the gap grows between the moment it arrives and the moment action is taken in response; from a business point of view, we need to analyze fast and as close to the source as possible.
The shorter that span, and the closer we get to near-real-time, or ideally real-time, insight and action, the more value we deliver.
Shifting to Edge Analytics
Edge analytics is an enhancement to the classic approach to data collection and analysis: automated analytical computation is performed on the data at a sensor, network switch or other device, instead of waiting for the data to be sent back to a centralized data store and having the analytics run in the backend.
We’re all too familiar nowadays with WiFi-connected devices such as washing machines, robot vacuum cleaners, home thermostats and so on. These connected devices usually send out statistics or status updates, which is a nice thing to have via a smart home hub or displayed on a mobile phone if we want the illusion of control. However, connected devices are only half of the IoT revolution. The second half is having them run self-diagnostics and statistical analytics on themselves before sending the information on for further analysis, making them not only connected but also smart. A simple example: the thermostat computes a prediction from the weather, our expected arrival time, traffic and other information, so it can start heating the water and tell the vacuum to start cleaning in time to finish before we arrive home. It’s not just a matter of informing us, but of making decisions and taking actions on the source, or edge, device.
Instead of sending all of the information to a single central location, only then applying transformations and analytics, and only then sending feedback to the other devices, edge analytics offers a faster, more agile way to go about it.
If we look at the telecommunications industry, for example, that’s exactly how it once operated. There was one central hub where a phone operator would sit, take your call and connect you to your destination. You often had to “order” a line beforehand, and only once such a line became available could the operator connect you. All calls were routed through one central location, and analytics was only done in that central hub.
Nowadays, you reach your destination by simply entering the number on your personal device, while all the routing is done automatically and faster. There is no need to wait for the information to reach a central hub before analyzing it. Information can now be analyzed at any time, partially or fully, in any way you choose, even on your own device.
Let’s discuss another real-life example. Think about dozens of airplanes circling an airport, waiting to take off or land. The ability to analyze their data in seconds or sub-seconds is about more than revenue; its true business value relates to human life. Every second counts, and every slight wrong turn or technical issue has a massive domino effect.
In theory, if the initial analytics could begin before the data is sent to a centralized location, e.g. on the originating device (an airplane or any other computing device), then action could be taken based on preliminary insights, shortening the gap between insight and action.
In GigaSpaces terms, partial algorithmic or analytical functions can be performed on the edge device, while the data is synced asynchronously, in a non-intrusive manner, from the edge device’s local grid to a hub or centralized grid where further, much heavier calculations are performed on the raw or aggregated data.
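As a minimal sketch of that pattern, in plain Python (the function names, queue-based sync channel and alert threshold below are illustrative assumptions, not GigaSpaces APIs): raw readings stay on the device, a compact summary is computed locally, immediate action is taken on it, and only the aggregate is queued for asynchronous sync to the hub.

```python
import statistics
from queue import Queue

# Hypothetical edge-side pre-aggregation: raw readings stay local,
# only compact summaries travel over the wire to the central grid.

def summarize(readings):
    """Reduce a window of raw sensor readings to a small summary record."""
    return {
        "count": len(readings),
        "mean": statistics.mean(readings),
        "max": max(readings),
        "min": min(readings),
    }

outbox = Queue()  # stands in for an async, non-intrusive sync channel

def on_window_closed(readings, alert_threshold=90.0):
    summary = summarize(readings)
    # Act locally first: preliminary insight drives immediate action.
    if summary["max"] > alert_threshold:
        print("local alert: reading exceeded threshold")
    # Then enqueue only the aggregate for asynchronous sync to the hub.
    outbox.put(summary)
    return summary

window = [71.2, 73.8, 95.1, 70.4]  # raw readings never leave the device
result = on_window_closed(window)
```

The point of the sketch is the shape of the flow, not the statistics: four raw readings collapse into one summary record, so the payload crossing the wire shrinks while the device still reacts to the spike immediately.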
Embracing Quality Analytics Platforms
Doing real-time analytics well is all about ingestion rate, simplified workflows and a component-reduced architecture that lowers TCO. The Lambda and Kappa architectures popular today simplify everything by treating all information as streams, which is great, but they still introduce accidental complexity due to the number of moving parts involved.
Taking conventional analytics workflows and shifting the heavy lifting onto the in-memory platform for convergence, or stream unification, is the only logical conclusion. This “NoETL” workflow relies on Kafka as the message broker and InsightEdge as the NoETL/stream processing mechanism, on top of all the other Spark advantages.
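To make the “NoETL” idea concrete, here is an illustrative sketch in plain Python, with the broker and sink stubbed out (in practice Kafka and an in-memory grid such as InsightEdge would fill these roles; the event fields and helper names are assumptions). The defining trait is that events are transformed in-flight rather than landed, batch-extracted and reloaded.

```python
import json

# Stubbed "broker": in a real deployment these bytes would arrive
# from a Kafka topic rather than a hardcoded list.
raw_events = [
    b'{"device": "pump-7", "temp_c": 41.0}',
    b'{"device": "pump-7", "temp_c": 43.5}',
]

sink = []  # stands in for the in-memory grid

def process(message: bytes) -> None:
    """Parse, enrich and store a single event with no staging area."""
    event = json.loads(message)
    event["temp_f"] = event["temp_c"] * 9 / 5 + 32  # enrich in-flight
    sink.append(event)

for msg in raw_events:
    process(msg)
```

There is no extract step, no staging table and no scheduled load job: each message is parsed, enriched and made queryable in one hop, which is what removes the “accidental complexity” of a separate ETL tier.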
The second phase is polyglot analytics: the ability to converge real-time streams with archived data stored in a persistent storage layer.
Keeping in mind that experts are predicting that 40 zettabytes of data will be in existence by 2020, we need platforms that can handle this kind of volume. Going back to the suggested component-reduced architecture, the correlation between these two event streams is a key player in the Kafka/data lake (or, more loosely, data swamp) combination.
This architecture not only preserves the possibility of returning to every piece of data at every resolution; it also lets us leverage high-end yet simplified analytical tools, such as Spark, to run in-memory computations that converge real-time and historical data.
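The convergence step itself can be sketched in a few lines of plain Python (standing in for a Spark/InsightEdge job; the sensor names, baseline values and anomaly threshold are illustrative assumptions): live events are joined against a historical baseline pulled from the cold store, and deviations beyond a threshold are flagged.

```python
# Illustrative convergence of a hot event stream with cold history.
# Baselines would come from the data lake; events from the live stream.
historical_avg = {"sensor-1": 20.0, "sensor-2": 55.0}

def enrich(event):
    """Join a live event with its historical baseline and compute deviation."""
    baseline = historical_avg.get(event["sensor"])
    deviation = None if baseline is None else event["value"] - baseline
    return {**event, "baseline": baseline, "deviation": deviation}

stream = [
    {"sensor": "sensor-1", "value": 21.5},
    {"sensor": "sensor-2", "value": 80.0},
]

enriched = [enrich(e) for e in stream]
anomalies = [e for e in enriched if e["deviation"] and abs(e["deviation"]) > 10]
```

In a real deployment the in-memory join would run at stream speed while the baselines are refreshed from the persistent layer, which is precisely the real-time/historical convergence the architecture is built for.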
It’s clear that performing real-time analytics on data has become increasingly important for businesses across industries. In fact, it has become a necessity in order to keep up with a fast-changing world, demanding customer expectations and fast-changing regulations.
However, despite the transformational potential of big data, a 2016 report published by the McKinsey Global Institute (MGI) found that most industries have still not come close to realizing the full potential of data and analytics.
What’s holding them back? We’ll deep dive into that in a future post.