On AI, Storage Systems and Scalability. Q&A with Paul Speciale
Q1. In the year ahead, you anticipated a resurgence of data architectures that support two key IT priorities: 1) accelerating AI-driven initiatives, and 2) strengthening end-to-end cyber-resiliency to counter the persistent rise in ransomware attacks. Why are these two priorities so important?
In nearly every customer interaction over the past two years, we have encountered the dual topics of AI initiatives and cyber-resiliency. These topics are now clearly intersecting: AI is increasingly being used to fuel ransomware attacks, while AI data and models themselves risk being lost or tainted by cyber attacks. These are clearly the overriding priorities at the CIO level within enterprises and government agencies across all of our geographies.
The reasons for 1) are clear: customers are increasingly linking business value to AI-centric projects. Customers are convinced that their data is central to these projects, but this still leaves multiple paths to explore in terms of analyzing and processing the data, innovating with new services, and monetizing the data through AI. From a data storage perspective, this has led organizations to a new pattern of data hoarding, in an effort to preserve data that they believe they may need for future AI projects and might otherwise have deleted or purged. Given that, a key new challenge is to find storage solutions that offer the scalability and affordability to make holding all of this data practical for the long term.
For 2), ransomware is certainly the most significant cyberthreat facing organizations worldwide. This issue isn’t going away; quite the reverse, AI-fueled attacks are raising the stakes even further. Second, with all of the data hoarding mentioned above, organizations now realize that protecting the data they use in AI initiatives from new cyber-threats is a rising concern. Imagine training an AI model with data that has been infected, tainted, poisoned or biased through unauthorized access. What about corporate AI training data being lost or exposed through exfiltration attacks?
For these reasons, a new standard of cyber-resilient storage is mandatory. That’s why we are calling on the storage industry to move beyond the paradigm of simple “data immutability” and embrace a new, more comprehensive standard of end-to-end cyber resilience. This approach encompasses not only the strongest form of true immutability but also robust, multi-layer protections against data exfiltration and other emerging threat vectors.
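By way of illustration, here is a minimal sketch of what object-level immutability can look like through an S3-compatible API of the kind object stores such as RING typically expose. The endpoint, credentials, bucket and object names, and the retention period are all hypothetical placeholders, not a description of any specific product configuration.

```python
import boto3
from datetime import datetime, timedelta, timezone

# Hypothetical S3-compatible endpoint and credentials (placeholders).
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.example.internal",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Object Lock must be enabled when the bucket is created.
s3.create_bucket(Bucket="backups", ObjectLockEnabledForBucket=True)

# Write a backup object, then place a COMPLIANCE-mode retention on it:
# the object cannot be deleted or overwritten until the retention date
# passes, even by an administrator -- one concrete form of the
# "true immutability" discussed above.
s3.put_object(Bucket="backups", Key="db/2025-01-15.dump", Body=b"...")
s3.put_object_retention(
    Bucket="backups",
    Key="db/2025-01-15.dump",
    Retention={
        "Mode": "COMPLIANCE",
        "RetainUntilDate": datetime.now(timezone.utc) + timedelta(days=90),
    },
)
```

Immutability of this kind addresses tampering and encryption of stored copies; the exfiltration and access-control threats mentioned above still require additional layers of protection around it.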
Q2. You also mentioned that object storage will emerge as the go-to data storage model for AI application developers. What is your definition of “object storage” and why does it matter in this context?
Object storage simplifies the model of storing large quantities of data into a flat namespace, consisting of object identifiers (IDs), descriptive attributes (referred to as metadata) and the data object itself. Managing these in a flat namespace is fundamentally much more scalable than hierarchical file systems, especially when data may grow to 100s of billions of objects and exabytes of capacity. Object storage is therefore an ideal choice specifically for AI data in several respects:
- Unbounded scale in terms of number of objects and data capacity
- Simplified data access based on object IDs, instead of navigational access through file system folders
- Stateless, API-driven access that is a natural fit with the “cloud native” design principles being used for AI applications (a minimal access sketch follows this list)
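The sketch below shows this model in practice, assuming an S3-compatible object API; the endpoint, bucket, key and metadata values are hypothetical placeholders. The point is that an application addresses data by a flat key plus descriptive metadata, with no directory tree to manage.

```python
import boto3

# Hypothetical S3-compatible endpoint (placeholder); credentials are
# resolved from the environment in this sketch.
s3 = boto3.client("s3", endpoint_url="https://s3.example.internal")

# Store an object: a flat key (the object ID), descriptive metadata,
# and the data itself -- no directory hierarchy to create or traverse.
s3.put_object(
    Bucket="training-data",
    Key="images/cat-000001.jpg",
    Body=open("cat.jpg", "rb"),
    Metadata={"label": "cat", "source": "camera-17"},
)

# Retrieve it directly by ID; the metadata travels with the object.
obj = s3.get_object(Bucket="training-data", Key="images/cat-000001.jpg")
print(obj["Metadata"])   # {'label': 'cat', 'source': 'camera-17'}
data = obj["Body"].read()
```

Because every request is a stateless API call, the same pattern works unchanged whether the bucket holds a thousand objects or hundreds of billions.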
Application developers are under huge pressure to bring out new AI-driven services, to boost efficiency and outpace the competition. By using object storage, they no longer need to worry about the limitations in capacity and performance inherent to legacy file and block storage.
Q3. What is your assessment of the different kinds of storage models supported by a variety of different vendors?
Many storage vendors still provide support for legacy block and file storage, mainly to stay compatible with older applications that were built in the pre-cloud era, when lower-scale storage requirements were common. For some use cases such as enterprise file services, these are the right fit, and offer the right combination of scale and performance for those workloads.
We do now see vendors who provide a mix of object and file system storage in a single system, as is the case with our own Scality RING. While we see enterprise applications and new AI applications moving toward object storage, a hybrid solution can help bridge the transition from legacy applications.
Q4. Do you really believe that Global 2000 enterprises, government agencies, and cloud service providers will pour resources into AI and machine learning projects?
Yes, we foresee investment in AI maintaining a record-breaking course for the next several years. However, despite the overall hype, organizations ultimately are going to embrace AI with a clear focus on practical purposes. In this highly dynamic situation, only what works in the long run will keep seeing maximum investment, and what fails to bring the promised ROI will be left behind.
Q5. What are the opportunities and challenges in this respect?
We believe that Cloud Service Providers (CSPs) are ideally positioned here to serve as aggregators of AI infrastructure for end-user enterprise customers. Since AI compute and storage infrastructure is costly (especially when considering large GPU clusters), CSPs can amortize these costs and recoup them through economies of scale. We already see this happening across the globe, with many large providers announcing cloud-based AI services for managing AI applications and the data pipeline.
Q6. You also predicted that emerging workloads will demand storage systems that scale beyond capacity and performance alone. What are “emerging” workloads for you?
We include many applications involved in the various phases of the AI data pipeline in the category of emerging workloads. This includes applications that cleanse, filter and augment data ahead of analysis, but also the applications that directly process AI models (and here there are many specific tasks as well, ranging from foundation model training to more common model fine-tuning and inference). More broadly speaking, emerging workloads include Machine Learning, Deep Learning and Natural Language Processing, as well as Gen-AI and complex, multi-step, large-scale media workloads, to name just a few.
Q7. As an example, you mentioned that the challenges presented by modern AI data pipelines go far beyond the need to store massive datasets. Can you please elaborate on this?
Legacy systems are under pressure and must evolve to handle a whole range of new, complex demands that go well beyond capacity considerations alone. For instance, the unpredictable data patterns and rapid access speeds required by modern workloads expose the limitations of legacy systems. We have seen extreme scale challenges in many dimensions; for example, consider the following real-world requirements placed on our own RING data storage:
- A customer application that creates over a billion objects that need to be stored
- A single customer has deployed over 2000 distinct applications on a single storage infrastructure
- An application that required access from a million concurrent client threads
- An application that demanded over 1 million authentication requests per second on our storage system
- An application that created over 1000 policies on a single bucket for access control, object locking and lifecycle management. Every policy had to be evaluated by the storage system on every API request, generating massive demands on “storage compute” resources (a minimal sketch of such bucket policies follows below).
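To make the last item concrete, here is a hedged sketch of what bucket-level policies can look like through an S3-compatible API: one lifecycle rule and one access-control statement. The endpoint, bucket name, prefixes and principal ARN are hypothetical placeholders; a deployment of the scale described above would carry hundreds of such rules and statements on a single bucket.

```python
import json
import boto3

# Hypothetical S3-compatible endpoint (placeholder); credentials are
# resolved from the environment in this sketch.
s3 = boto3.client("s3", endpoint_url="https://s3.example.internal")

# A lifecycle rule: expire objects under one prefix after a year.
s3.put_bucket_lifecycle_configuration(
    Bucket="app-data",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-staging-after-1y",
                "Filter": {"Prefix": "staging/"},
                "Status": "Enabled",
                "Expiration": {"Days": 365},
            }
        ]
    },
)

# An access-control policy: allow one principal read-only access to a
# single prefix. The storage system has to check every incoming request
# against statements like this one.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:user/analyst"},
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::app-data/reports/*",
        }
    ],
}
s3.put_bucket_policy(Bucket="app-data", Policy=json.dumps(policy))
```

Multiply this by a thousand policies, evaluated on every request, and the “storage compute” load becomes a scaling dimension in its own right.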
The future of storage lies in systems built with a clear focus on flexibility and the capability of scaling across multiple dimensions. A modern storage infrastructure must be able to scale seamlessly in any dimension that might conceivably be required.
Q8. What do you mean when you talk about scalability across multiple dimensions?
Inflexible storage is a hidden cost. As described in the previous question, new workloads are placing unprecedented demands on storage. In addition to the examples provided in the previous response, consider these requirements on a data storage repository that stress the system in different ways:
- Following the data hoarding pattern introduced earlier, organizations require an AI data lake or “lakehouse” repository to aggregate hundreds of petabytes to exabytes of data from multiple external sources into a single repository that can be accessed across the various phases of their AI data pipeline.
- In contrast, an IoT application may send hundreds of thousands of transaction events per second, each of which needs to be stored as an individual small object in near real-time.
When we look at the dimensions of scale, we are talking (apart from capacity) about the number of concurrent applications, the amount of storage compute resources, the growing metadata footprint needed to hold very high numbers of objects or buckets, ultra-high numbers of authentication transactions, and the various dimensions of performance (throughput, object transactions, maintaining low latency); ultimately, we also need to scale systems management across such larger systems. While some of these requirements were predictable ahead of time, many demands on data storage can only be understood at the time of deployment.
Q9. The World Economic Forum estimated that 75% of companies will adopt AI by 2027. What does this mean for society at large?
As with most emerging technologies, the effects on society will be a mixture of positives and negatives. For example, we can clearly see the positive effects AI can have in the areas of healthcare and overall productivity. We will also witness a huge impact on the global job market: the WEF argues that almost a quarter (23%) of jobs will change, with AI and tech positions being at the forefront. Job availability will change and many kinds of positions are likely to be replaced, particularly in areas where workforces are employed entirely or almost entirely on computers. Other far-reaching implications of AI are to be expected in individuals’ lives, with compromised or biased data leading to potential AI-augmented issues. In the legal or political realm, AI could be used by malign actors for malicious purposes. Properly controlled and monitored AI utilisation can and should, however, transform lives for the better.
…………………………………..

Paul Speciale, CMO Scality
Over 20 years of experience in technology marketing and product management. Key member of the team at four high-profile startup companies and two Fortune 500 companies.
Sponsored by Scality