David Rolfe brings 20+ years of experience managing data in the telecom industry. David helps telecom software vendors meet the scale and latency requirements imposed by 5G data utilizing VoltDB. He helps companies take the steps they need to deploy mass-scale, ultra-low latency transactional applications in cloud-native environments. He has over 25 years of experience with high-performance databases and telco systems and demonstrated expertise with charging and policy systems. He has authored multiple patents relating to geo-replicated conflict resolution.
Q1. How do 5G and edge computing relate to each other?
5G has some fairly ambitious expectations for round-trip latency. In older 4G systems, people (and standards) were thinking in terms of tens of milliseconds. Now, we’re talking single-digit milliseconds or sometimes even a single millisecond. This is easy enough to do on a whiteboard but rather hard to do in reality. Traditionally, we’ve relied on the cavalry, in the form of Moore’s Law, to come and help us with this dramatic growth in volumes. But when you get into single-digit millisecond latency, the speed of light limits how fast we can communicate to around 180 miles per millisecond. So, I may have unlimited access to a planet-sized data center, but that data center is of no use if it’s 60 milliseconds away and I need a response in 3 milliseconds.
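To put rough numbers on that limit, here is a minimal sketch of the arithmetic. It assumes the 180 miles per millisecond figure quoted above and a signal that travels in a straight line with no switching delay; the function name and the processing-time parameter are illustrative only.

```python
# Figure taken from the text above: roughly 180 miles per millisecond.
MILES_PER_MS = 180.0


def max_one_way_miles(budget_ms: float, processing_ms: float = 0.0) -> float:
    """Farthest a server can be if request plus reply must fit in budget_ms.

    Assumes an idealized straight-line path with no network equipment delay,
    so real-world distances will be shorter than this upper bound.
    """
    travel_ms = budget_ms - processing_ms
    if travel_ms <= 0:
        return 0.0
    # The signal goes out and comes back, so only half the travel time
    # is available for the one-way distance.
    return MILES_PER_MS * travel_ms / 2


if __name__ == "__main__":
    print(max_one_way_miles(3))     # 3 ms budget, nothing spent on processing
    print(max_one_way_miles(3, 1))  # 3 ms budget, 1 ms spent on processing
```

Even under these generous assumptions, a 3 millisecond budget keeps the server within a few hundred miles, and every millisecond of processing shrinks that radius further.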
So how does edge come into this? By using small devices that are either nearby or possibly even attached to the target device, we can avoid the latency issues and a lot of the implicit network traffic that a centralized approach would dictate. While the term ‘edge’ is new, the concept isn’t. Smart meters and other devices that interact with some remote controlling entity have been around for years. What’s different this time is that while connectivity used to be an optional extra, a lot of edge applications have it as a baseline assumption and crucial requirement for being able to function. In many cases, the connectivity will be so good that you could make do with a much simpler, dumber edge device, if it weren’t for the latency requirements.
Q2. How do you see 5G latency and scale expectations challenging the traditional options of data processing?
Latency: When Ikea invents a new product they start with the price and work backwards. Similarly, if we’re to successfully operate in a low-latency 5G universe, we need to start with the available latency budget and then figure out how to meet our business needs within it.
Several things developers regard as ‘normal’ become problematic when latency is your enemy:
- Useful services hosted by hyperscalers may be too many milliseconds away to be of use. While Amazon, Google, et al. are busy extending their footprint to telco data centers to try and address this, there is no indication or reason that they will price edge services as cheaply as their data center offerings.
- Multi-layer stacks inevitably introduce internal latency that can only be removed by getting rid of the layers.
- Even a simple stack consisting of a client and a back end data processing platform may become problematic if solving a business problem involves dozens of round trips.
- Then there’s garbage collection in JVM languages. A 4-millisecond stall wasn’t a huge problem until you had a new SLA to answer requests in 3 milliseconds.
So, to work in this space, you ideally need a platform that can maintain state and take decisions in one move.
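The round-trip point can be illustrated with a small back-of-the-envelope calculation. This is only a sketch: the round-trip time, server time, and trip counts below are made-up numbers, and the function names are hypothetical.

```python
def total_latency_ms(round_trips: int, rtt_ms: float, server_ms: float) -> float:
    """Latency of a request that needs `round_trips` sequential calls.

    Each call pays one network round trip plus the server-side work for that call.
    """
    return round_trips * (rtt_ms + server_ms)


def fits_budget(round_trips: int, rtt_ms: float, server_ms: float,
                budget_ms: float) -> bool:
    """True if the whole exchange completes within the latency budget."""
    return total_latency_ms(round_trips, rtt_ms, server_ms) <= budget_ms


if __name__ == "__main__":
    budget_ms = 3.0  # the 3 ms SLA used as an example in the text

    # Chatty design: 24 small calls, each cheap on the server (assumed numbers).
    print(total_latency_ms(24, 0.5, 0.25), fits_budget(24, 0.5, 0.25, budget_ms))

    # "One move" design: a single call that does all the work server-side.
    print(total_latency_ms(1, 0.5, 1.0), fits_budget(1, 0.5, 1.0, budget_ms))
```

With these assumed figures the chatty design blows the budget several times over, while the single call fits even though it does more work per request.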
Scale: I was recently on a call with a major company that provides the policy and charging systems used by about a quarter of the world’s phone companies to manage their network. We were discussing 5G and the future in general and my counterpart casually mentioned that data volumes would be going up ten-fold. What was interesting was that this wasn’t an attempt to shock me or surprise me; it was a simple statement of fact. The telco world used to be about human-to-human communications, but we will soon pass the tipping point where the majority of traffic is computer generated. So while there are practical limits to human-generated activity, there is no limit on how many devices humans can own or indirectly use. Each one of these devices will use the network in its own way, and thus there is no absolute limit on how much network traffic we can generate or how complex that traffic can be. For such a system to work, that traffic needs to be managed and managed well, without compromising on data accuracy, especially as we no longer have humans in the loop to stop clearly bad behavior. The need becomes even more pronounced when machine type communication (MTC) comes into the picture since data accuracy leads to decision accuracy.
Q3. What are the main requirements for managing the demands of event-driven transactional stream processing?
- An acceptance of complexity: The most important thing is that people understand how complicated real-world data streams and the rules for processing them can be in real life. In a prior job I was responsible for handling tens of billions of records a day for a major telco. We were doing data warehousing, event processing and big data before any of those terms had even been invented. This stuff is complicated, and anyone who thinks that their needs will be met by a stream disguised as a single SQL statement is going to be sadly disappointed.
- Scalable support for transactions: The word “transactional” is important here. By definition, the outcome of a transaction is not known in advance. If it were predestined, we could just randomly distribute traffic to a swarm of stateless processing engines and it wouldn’t make a difference. But in a transactional world, we need to take decisions based on a stable set of facts that might change slightly based on our actions as we allocate finite resources and thus change how we will treat the next incoming transaction. Doing this at scale is a huge challenge.
- Reliability: The nature of streaming data is that if you stop processing it you immediately end up with a backlog. While backlogs can be handled, trying to work with multiple streams that are no longer synchronized in time is challenging to say the least, and is one of those things you need to avoid. As a consequence, you need a platform that is designed to stay up even if you lose one or more servers.
- Low-latency introspectable queues: This term is a bit of a mouthful, but we keep seeing requirements where people have a firehose of data, all of which needs to be processed and then sent downstream to a data lake, but maybe 0.5% needs to be acted on within milliseconds because it’s talking about situations that can be monetized if only you can react fast enough. Video game analytics is a good example of this. The vast majority of the data leads to and is used for insights around level design and things like that, but a tiny portion is needed to adjust the difficulty of the game in real time or even to sell virtual goods to players at the time they are most incentivized to pay for them. A traditional queue won’t handle the 0.5% use cases, and a traditional OLTP system can’t handle the volume.
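The last requirement can be sketched in a few lines. This is a toy, in-memory illustration rather than a description of any product: `route_events`, the event fields, and the "five deaths in a row" rule are all hypothetical. The idea is simply that every event flows downstream, while a small urgent fraction is also acted on as it passes through.

```python
from typing import Callable, Dict, Iterable, List

Event = Dict[str, int]


def route_events(events: Iterable[Event],
                 is_urgent: Callable[[Event], bool],
                 act_now: Callable[[Event], None],
                 send_downstream: Callable[[Event], None]) -> int:
    """Inspect each event in flight; return how many were acted on immediately."""
    urgent_count = 0
    for event in events:
        if is_urgent(event):
            act_now(event)          # the small fraction that cannot wait
            urgent_count += 1
        send_downstream(event)      # everything still reaches the data lake
    return urgent_count


if __name__ == "__main__":
    data_lake: List[Event] = []
    eased_players: List[int] = []
    game_events = [
        {"player": 1, "deaths_in_a_row": 0},
        {"player": 2, "deaths_in_a_row": 6},
        {"player": 3, "deaths_in_a_row": 1},
    ]
    acted = route_events(
        game_events,
        is_urgent=lambda e: e["deaths_in_a_row"] >= 5,  # assumed rule
        act_now=lambda e: eased_players.append(e["player"]),
        send_downstream=data_lake.append,
    )
    print(acted, eased_players, len(data_lake))
```

A real system would also need the durability, ordering, and throughput guarantees discussed above; the sketch only shows the inspect-while-forwarding shape.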
Q4. Who is going to use 5G services such as massive machine-type communications (mMTC) and ultra-reliable low latency communications (uRLLC)?
“mMTC” assumes you have a very large number of relatively taciturn devices in a small geographical area less than a mile in size. Aside from the telco-related challenges of getting that many small antennas to play well together in a small space, there are many downstream operational ones. Here at VoltDB we’ve had relevant experience in this space in the form of ‘smart meters’. While they are spread out geographically, they mirror other aspects of mMTC, such as scale, limited communications capabilities and a wild variety of APIs and protocols.
What we’ve learnt is that even if somebody provides you with baseline connectivity to your heterogeneous herd of devices, an enormous amount of time and energy will be required to manage them. Simply translating the different APIs is a major project in itself, and that’s before we get into the realms of secure message delivery and software patches.
As a consequence, I simply don’t see the traditional internet business model of “give it away now and figure out how to charge later” working in this space, nor do I see it being a fully open ecosystem. This implies that mMTC will be very much industrial/factory floor stuff with fairly high ‘table stakes’ required to enter.
As for uRLLC: I’m not sure I ‘get’ the connected car use cases. Manufacturers such as Tesla are focusing on using computing power in the vehicle instead of offloading the work to a remote processing location. Connected factories make much more sense. Prior to the invention of electricity, factories were laid out so that the machines could somehow be attached to the available power source, be it steam or hydraulic. Electrical power allowed us to lay out the machines rationally and thus create a production line, albeit for a single product. By allowing every machine to communicate with every other machine reliably, wirelessly, and nearly instantly, we could create factories where the flow of parts and unfinished goods continually changes as the facility dynamically re-arranges itself according to the needs of the moment. For many items, you could even use teams of robots that throw parts to each other instead of using a conveyor belt. Before you dismiss this as unlikely, see what Boston Dynamics robots can now do individually and then imagine what a team of them could do in a factory if they had a technology such as uRLLC to coordinate their behaviour.
Q5. Are ACID properties still needed for supporting 5G services?
Nobody ‘needs’ ACID until they experience the consequences of not having it. ACID, along with SQL, created a 54 billion dollar industry that still largely exists today.
But the original legacy RDBMS vendors were reluctant to rewrite their products for the 21st century and stuck with an architecture that originally assumed a single server, a single processor, human end users, and a tiny amount of RAM fronting cheaper spinning hard drives.
While this 1980s architecture easily supports hundreds of transactions per second, diseconomies of scale kick in as you add additional CPU cores. Solving this problem is one of the main reasons VoltDB was created. Most of the NoSQL products were launched without solving ACID fully, on the assumption that when ACID was needed, developers could implement it themselves. We’re now at a point where ACID is being retrofitted to NoSQL products as it becomes apparent that the same problems that led to the RDBMS still exist even if you aren’t using one.
So, to bring this back to the subject of 5G: 5G needs ACID transactions as much, if not more so, than any other problem we might want to solve. Think of a really simple example such as allocating an IP address that one and only one end device is supposed to use. If we allocate it twice, chaos will erupt. If we think we’ve allocated it, but nobody has it, we have the beginnings of a resource leak. Or consider a more complicated example where someone has a finite amount of credit and owns multiple devices, all of which are trying to spend that last unit at the same moment in time. The sheer volume of 5G transactions will mean that events which are in theory rare will start happening every couple of minutes. Some of these events will involve valuable things, which is why we need ACID.
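The "last unit of credit" case can be shown with a toy example. This is a minimal sketch that uses an in-process lock as a stand-in for a database transaction; it is not how VoltDB or any other product implements ACID, and `CreditAccount` and `race_for_last_unit` are hypothetical names.

```python
import threading
from typing import List, Tuple


class CreditAccount:
    """Toy account whose check-and-decrement happens as one indivisible step."""

    def __init__(self, balance: int) -> None:
        self._balance = balance
        self._lock = threading.Lock()  # stand-in for transactional isolation

    def spend(self, amount: int) -> bool:
        # The check and the update must not be separated, otherwise two
        # devices could both see the last unit and both spend it.
        with self._lock:
            if self._balance >= amount:
                self._balance -= amount
                return True
            return False

    @property
    def balance(self) -> int:
        with self._lock:
            return self._balance


def race_for_last_unit(devices: int) -> Tuple[int, int]:
    """Let `devices` threads try to spend the same last unit of credit.

    Returns (number of successful spends, final balance).
    """
    account = CreditAccount(balance=1)
    results: List[bool] = [False] * devices

    def attempt(slot: int) -> None:
        results[slot] = account.spend(1)

    threads = [threading.Thread(target=attempt, args=(i,)) for i in range(devices)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results), account.balance


if __name__ == "__main__":
    print(race_for_last_unit(8))
```

However many devices race, exactly one spend succeeds and the balance never goes negative. Doing the same thing across a horizontally scaled cluster, at 5G volumes, is the hard part.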
Q6. So, is eventual consistency a bad idea?
Immediate consistency is a hard technical challenge for a horizontally scaled, shared-nothing data platform. Eventual consistency is much simpler to implement, but it’s very hard to claim to be ACID compliant when the underlying data structures that provide the data your ACID transaction uses as input can silently and retroactively change after your transaction. In addition to producing the wrong results, it’s a potential regulatory and legal nightmare, as not only did you take the wrong decision, but you might struggle to explain why you did it. While this level of ambiguity is acceptable for many use cases such as social media feeds, at the end of the day, 5G and telco involve the allocation and consumption of shared, finite resources such as credit and bandwidth. The decisions that get made have consequences that are potentially hard to reverse. So, from a 5G/telco perspective, eventual consistency is generally unwelcome when it involves changeable data.
Q7. Anything else you wish to add?
Much of the talk and chatter over the last decade has been about exciting new database products that didn’t have the licencing constraints and feature bloat that we came to associate with legacy RDBMS.
The stereotypical “2017 Database Story” would be someone who is using a ‘free’ NoSQL database with no ACID and eventual consistency for a mass scale social media application. I would argue that we’re now at the point where things are swinging the other way:
- A lot of legacy applications are reaching the point where a rewrite makes commercial sense, especially if the goal is to make them cloud native. The ‘low hanging fruit’ is gone and now we’re into hard use cases that directly affect a business’s bottom line. The newer database platforms face challenges in this situation.
- ACID is making a comeback. Database vendors are either retrofitting ACID on top of architectures that never envisaged it, or making heavily footnoted claims that they support ACID.
- People are becoming leery of eventual consistency. It turns out that while it does not matter if an entry in your social media feed changes due to eventual consistency, the same flexibility does not apply when money or other things of value are at stake. Here at VoltDB we’ve had multiple customer conversations with organizations where eventual consistency has attracted negative attention at the C level.
- SQL is also making a comeback as people need a standard language for manipulating complex data structures. However, as with ACID, retrofitting SQL can create as many problems as it solves, especially if you’re creating some kind of pidgin-SQL with subtly different semantics. VoltDB has used SQL from the very start, which means that the very early design decisions that frame a product’s future don’t impede our ability to use it now.
- We also see the concept of ‘free databases’ coming under pressure. We’ve had several conversations with people over the years along the lines of, “We’re never going to pay anything for a database ever again. If you’re not free, we’re not interested”. But this has changed over the last 18 months. The fact is that pretty much every new database company out there is VC-funded, and sooner or later Sand Hill Road will want its money back. We’re also seeing the hyperscalers offer their own versions of popular open source databases as DBaaS, bypassing the creators of the code and aiming to capture the “Fortune 500” revenue that may have played an important part in the original business plan.
So far, both MongoDB and Redis have tweaked their licencing to try and capture the money involved, but the changes also raise real questions about DBaaS for private clouds, which has led to at least one large corporation we’ve spoken to suddenly backing away from offering DBaaS because their lawyers suspect they are vulnerable to a licence audit.
My guess is that the stereotypical “2023 Database Story” will involve people trying to migrate a complex legacy system to a newer SQL-ish database, and that a depressing amount of time will be spent navigating the small print of repeated changes to database vendor licenses. As for VoltDB? We’ll continue to focus on what we’re good at, which is Telco and high volume/low latency transactional stuff.
Sponsored by VoltDB