Q1. What are the technical challenges posed by the 5G revolution?
The challenges of the 5G revolution can be broadly categorized as spectrum related, service-based-architecture related, and agility-of-decisions related. When you think of 5G, the media is of course focused on the promised speeds and latencies. What gets drowned out is that the advertised speeds are only achievable with high frequency bands. But high frequency bands also suffer from high attenuation and poor penetration, which means you will have 100X more small cell towers than macro towers. This increase in cells is going to multiply the number of events hitting two specific functions, namely the Access and Mobility Management Function (AMF) and the Session Management Function (SMF).
With the service based architecture, data stores have been unified into two categories, viz. structured subscriber data in the UDR (Unified Data Repository) and all other data in the UDSF (Unstructured Data Storage Function). These are extensions of the shared data storage concept that became a mainstay with the genesis of Virtualized Network Functions (VNFs). Both functions require a scalable, in-memory database technology that not only performs, but also does not lose data and delivers accurate answers and decisions. This pushes the requirement beyond plain data storage and streaming data, combining both needs into a single platform.
The decisions being made in modern systems are no longer based on static rules and policies – rules evolve continually as new data is learned from and models are retrained. These learnings must be deployed into the decision making process, and this needs to happen seamlessly, in a live environment, with no stoppage of service.
Q2. What are the limitations of the current data streaming architectures?
Current data streaming architectures treat data streams as either event-time windows or processing-time windows, “batching” the data in order to process it. But reality doesn’t operate in discrete time windows. With either form of time window, you are either missing important events or waiting on/holding data for too long.
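The "missing important events" failure mode can be illustrated with a toy tumbling-window aggregator (a sketch only; the 10-second window size and the events are invented for illustration). Once the watermark has advanced past a window, a late-arriving event that belongs in that window is silently dropped:

```python
from collections import defaultdict

WINDOW = 10  # tumbling event-time window of 10 seconds (assumed size)

def window_of(ts):
    """Map an event timestamp to the start of its tumbling window."""
    return ts - (ts % WINDOW)

def process(events):
    """Count events per window; drop events whose window already closed."""
    counts = defaultdict(int)
    watermark = 0   # highest event time seen so far
    dropped = []
    for ts, value in events:
        if window_of(ts) < window_of(watermark):
            dropped.append((ts, value))   # late event: its window is closed
            continue
        watermark = max(watermark, ts)
        counts[window_of(ts)] += 1
    return dict(counts), dropped

# The event at t=8 arrives after the watermark has moved to t=14, so its
# window [0, 10) is considered closed and the event is lost.
counts, dropped = process([(1, "a"), (5, "b"), (14, "c"), (8, "late")])
```

Making the window longer (or holding it open for stragglers) avoids the drop, but only by holding data back longer before acting on it – exactly the trade-off described above.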
The evolution from the Lambda architecture to the Kappa architecture lets updated code reprocess the data without a separate batch layer, but it still operates within the constraints of time windows. This binary view of time is unnatural because it forces data processing into chunks. On top of this, current stream processing solutions can only process individual streams in conjunction with some static data, or process the data as an enrichment pipeline (even when done as a continuous stream). This is not suitable for evolved streaming requirements, in which the stream is not only processed for downstream applications but also needs to drive decisions, either per event or in a complex event processing frame of reference.
Turning database technology inside out involves a lot of compromise, as seen in almost all stream processing solutions that attempt to retrofit a database and interact with the store. This approach sacrifices many guarantees, such as atomicity and consistency, that one would expect and need from a data platform. The bottom line is that we need to go beyond the Kappa architecture.
Q3. What is VoltDB’s new Smart Stream Processing Architecture?
When you look at what needs to happen in the earliest stages of data’s journey, i.e. the streaming space, there are three main functions: ingestion, processing, and storage.
While there are disparate solutions for each of these functions, in a world that now requires low latency, complex decisions at scale, you need a system that brings all of these functions together. The interesting part is that while typical stream processing systems treat the processing step as a “pipeline” activity, it is more often than not a decision making process oriented toward initiating an action. Second, all new streaming data ultimately becomes new training data for the learning systems. When a learning iteration is complete, you need to bring that new and improved insight into your decision making process. What VoltDB has done is bring all of these capabilities into a single platform: ingest, store, process/decide, notify/alert, and import machine learning outcomes in a SQL accessible form. This helps our customers deploy continuously evolving systems such as credit card fraud prevention, telecom fraud prevention, customer value management, and IIoT security management. This confluence of capabilities in a meaningful manner is what we call the Smart Stream Processing Architecture.
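One way to picture that ingest/store/decide/notify confluence is a single per-event loop that updates stored context and consults it to decide immediately, rather than handing the event down a pipeline. This is a toy sketch, not VoltDB's actual API; the running-spend state, the threshold, and the event shape are all invented for illustration:

```python
# Toy per-event ingest/store/decide/notify loop (illustrative only).
state = {}          # in-memory "store": running spend per card
alerts = []         # stand-in "notify/alert" channel

THRESHOLD = 100.0   # assumed fraud threshold for this sketch

def on_event(card, amount):
    """Ingest one event, update stored context, and decide in the same step."""
    state[card] = state.get(card, 0.0) + amount   # store: update context
    if state[card] > THRESHOLD:                   # decide: per-event rule
        alerts.append((card, state[card]))        # notify downstream

for card, amount in [("c1", 40.0), ("c2", 30.0), ("c1", 70.0)]:
    on_event(card, amount)
```

The point of the sketch is that the decision fires on the very event that crosses the threshold, using accumulated context, with no window boundary between "storing" and "deciding."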
Q4. How do you manage to offer accurate contextual state and embedded machine learning without sacrificing scale, performance or accuracy across multiple streams of data?
VoltDB was designed to bring a massive level of distributed storage and computing to database technology. But once we broke the shackles of “are we a database” versus “are we a data processing platform,” we were able to extend this fundamental architecture to handle all things “fast data.” VoltDB’s database roots already ensure the accuracy of the data, while the streaming integration frameworks and the ability to migrate data out of VoltDB as and when necessary allow our customers to leverage VoltDB for convergent stream processing. Here, multiple streams can converge to create a macro context for smarter decisions, instead of being bound by the artificial constraints of event-time or processing-time windows. Our users deploy machine learning insights into their decision making code by converting a PMML model into a VoltDB User Defined Function that can be used in SQL to drive ever-improving decisions without needing to change code.
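The PMML-to-UDF idea can be sketched as a scoring function registered under a stable name: the decision code keeps calling `score(...)` while the model behind that name is swapped. Everything here is hypothetical stand-in code (real VoltDB UDFs are defined in Java and invoked from SQL, and the "models" below are trivial rules standing in for converted PMML models):

```python
# Hypothetical registry standing in for a UDF catalog: decision code looks up
# score() by name, so redeploying a model swaps the function, not the caller.
udfs = {}

def register(name, fn):
    udfs[name] = fn

def decide(event):
    """Decision code: unchanged across model redeployments."""
    return "FLAG" if udfs["score"](event) > 0.5 else "OK"

# v1 model: a trivial rule standing in for a converted PMML model.
register("score", lambda e: 0.9 if e["amount"] > 50 else 0.1)
v1 = decide({"amount": 80})

# Retrained v2 model deployed live: same name, new logic, caller untouched.
register("score", lambda e: 0.9 if e["amount"] > 200 else 0.1)
v2 = decide({"amount": 80})
```

The same event is flagged under v1 and cleared under v2, yet `decide` never changed – which is the "ever-improving decisions without needing to change code" property described above.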
Dheeraj Remella is the Chief Technologist at VoltDB, responsible for technical OEM partnerships and for enabling customers to take their next step in data-driven decision making. Dheeraj has been instrumental in each of our significant customer acquisitions. He brings 22 years of experience in creating enterprise solutions across a variety of industries. Dheeraj is a strong believer in the cross-pollination of ideas and innovation between industries and technologies.