On the AI technology stack. Q&A with Alexis Wedrychowski.
Q1: In your opinion, how is GenAI changing how enterprises do their business?
Generative AI is transforming enterprise operations in profound ways, offering capabilities once considered aspirational. IDC projects that AI and generative AI spending will reach $307 billion in 2025 and more than double by 2028, underscoring its growing impact. GenAI is revolutionizing business by automating repetitive tasks, enhancing customer experiences, predicting market trends, and accelerating R&D cycles. Because it leverages vast amounts of both structured and unstructured data, GenAI demands an agile and scalable infrastructure. This is especially true of data storage.
As organizations scale, they’ll find that legacy storage systems may struggle to meet the performance, throughput, and capacity demands of AI workloads. This is particularly true during the data preparation phase, where petabytes of data are cleaned and staged for AI model training, and storage must sustain high-throughput access without bottlenecks. Enterprises are under growing pressure to invest in storage solutions that let them integrate vast amounts of data quickly and efficiently, support flexible interfaces like the S3 API, and meet the overall performance demands of AI models.
Enterprises focused on enabling GenAI are well advised to prioritize object storage for its ability to manage large-scale, unstructured datasets, providing the high-speed access that efficient model training demands.
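To make the access pattern concrete, here is a minimal sketch of streaming prepared training shards over the S3 API straight into a PyTorch loader. The endpoint, bucket, prefix, and credentials are hypothetical; any S3-compatible object store would be configured the same way:

```python
import io

import boto3
import torch
from torch.utils.data import DataLoader, IterableDataset

# Hypothetical S3-compatible endpoint and credentials.
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.ring.example.internal",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

class S3ShardDataset(IterableDataset):
    """Streams shards saved with torch.save() directly from object storage."""

    def __init__(self, bucket: str, prefix: str):
        self.bucket, self.prefix = bucket, prefix

    def __iter__(self):
        paginator = s3.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=self.bucket, Prefix=self.prefix):
            for obj in page.get("Contents", []):
                body = s3.get_object(Bucket=self.bucket, Key=obj["Key"])["Body"]
                yield torch.load(io.BytesIO(body.read()))

# batch_size=None because each stored shard is assumed to be pre-batched.
loader = DataLoader(S3ShardDataset("training-data", "prepared/"), batch_size=None)
```

The point is less the specific classes than the pattern: training jobs read shards over the S3 API at whatever parallelism the backend can sustain, so throughput, not capacity alone, becomes the sizing criterion.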
Q2: What are the main challenges you see for enterprises to be “GenAI-ready”?
Enterprises looking to be “GenAI-ready” face several hurdles, the primary one being the sheer scale of data needed for training and deploying AI models. Petabytes are often involved, especially during the data preparation phase, where large datasets are aggregated, cleaned, and augmented. Traditional storage solutions, whether file systems or other legacy platforms, often struggle to provide the throughput and scalability that AI workloads require.
Metadata consistency and searchability become critical, as enterprises need to quickly filter and retrieve specific data subsets, often through SQL-based searches on object storage or file system interfaces. A further challenge lies in handling unstructured data from many sources and getting it into a format ready for training, which typically requires data cleansing, augmentation, and transformation. The integration of AI frameworks such as PyTorch or TensorFlow also places high demands on storage performance, where low-latency access is crucial for efficient model training.
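As a sketch of what such SQL-based filtering can look like in practice, assuming dataset metadata is kept as Parquet on an S3-compatible endpoint (the endpoint, bucket, columns, and credentials below are hypothetical), DuckDB can query the object store in place:

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs")
con.execute("LOAD httpfs")
# Hypothetical endpoint and credentials for an S3-compatible store.
con.execute("SET s3_endpoint = 's3.ring.example.internal'")
con.execute("SET s3_access_key_id = 'ACCESS_KEY'")
con.execute("SET s3_secret_access_key = 'SECRET_KEY'")
con.execute("SET s3_url_style = 'path'")

# Filter a training subset directly out of object storage, no copy step.
subset = con.execute("""
    SELECT sample_id, source_uri
    FROM read_parquet('s3://training-data/metadata/*.parquet')
    WHERE quality_score > 0.9 AND language = 'en'
""").df()
```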
In addition, enterprises face the complexity of data silos and of integrating legacy systems into modern AI stacks, making storage modernization and workflow streamlining key components of AI readiness.
Q3: Could you share your suggestions for a preferable “AI technology stack” to address these challenges?
To overcome the challenges of scaling for GenAI, modern enterprises need a technology stack that is comprehensive, scalable, and performance-optimized. Key components include:
- Data Storage & Management:
A flexible object storage solution, like Scality’s RING, is critical for managing unstructured datasets at scale. The ability to handle petabytes of data, provide fast access across both cloud and on-premises environments, and ensure interoperability via the S3 API is key for AI workflows. Given that AI models often require vast datasets for training, choosing a system that can sustain the throughput-heavy data preparation phase should be a top priority.
- Data Processing Engines:
Tools like Apache Spark, Starburst, and Dremio are indispensable for parallel processing and analysis of massive datasets. These tools integrate well with object storage systems and allow AI pipelines to access and process data quickly, which is crucial for speeding up training cycles (a minimal Spark sketch follows this list).
- Machine Learning Frameworks:
AI frameworks like PyTorch and TensorFlow are essential for building sophisticated models. They benefit from the high throughput and low-latency access that advanced storage systems like RING provide.
- AI Infrastructure (Cloud & GPUs):
High-performance GPUs are fundamental for resource-intensive workloads. As AI models scale, so must the storage infrastructure, with systems capable of feeding thousands of GPUs for model training and inferencing. GPUDirect Storage will also play a role in reducing latency when data is moved between storage and GPU memory (see the second sketch after this list).
- Governance & Security:
As more data is used for AI tasks, ensuring compliance, encryption, and auditability is paramount. A storage solution like RING provides built-in data protection, encryption, and scalability to safeguard sensitive data, which is crucial in regulated sectors.
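For the data processing layer, here is a minimal Spark sketch, assuming the hadoop-aws package is on the classpath and using hypothetical endpoint, bucket, and credential values, of a cleaning job that reads from and writes back to S3-compatible storage:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("genai-data-prep")
    # Hypothetical S3-compatible endpoint and credentials.
    .config("spark.hadoop.fs.s3a.endpoint", "https://s3.ring.example.internal")
    .config("spark.hadoop.fs.s3a.access.key", "ACCESS_KEY")
    .config("spark.hadoop.fs.s3a.secret.key", "SECRET_KEY")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)

# Deduplicate and clean raw records in parallel, then write prepared
# Parquet shards back to the object store for the training pipeline.
raw = spark.read.json("s3a://raw-landing/events/")
prepared = raw.dropna(subset=["text"]).dropDuplicates(["doc_id"])
prepared.write.mode("overwrite").parquet("s3a://training-data/prepared/")
```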
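On the GPUDirect point, the RAPIDS KvikIO library gives a feel for the programming model. This is a sketch assuming GDS-capable hardware and a locally mounted path (the path and buffer size are hypothetical); without GDS, KvikIO falls back to a host-memory copy:

```python
import cupy
import kvikio

# Allocate a device buffer and read file bytes directly into GPU memory.
# With GPUDirect Storage enabled, the transfer bypasses host RAM.
gpu_buf = cupy.empty(64 * 1024 * 1024, dtype=cupy.uint8)
f = kvikio.CuFile("/mnt/shards/shard-0000.bin", "r")
nbytes = f.read(gpu_buf)  # blocking read into device memory
f.close()
print(f"read {nbytes} bytes into GPU memory")
```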
Q4: How is Scality contributing?
Scality supports enterprises tackling the data management challenges of GenAI adoption by offering a high-performance, scalable, and secure storage platform. Scality RING provides seamless scalability to manage massive datasets, which is crucial in AI workloads like data preparation. The ability to store and quickly retrieve petabytes of unstructured data is essential for feeding AI models efficiently, particularly where high-throughput access to datasets is a prerequisite for optimized model training.
Scality’s solution enables rapid retrieval from data lakes and integrates smoothly with popular tools like PyTorch, TensorFlow, and Starburst. As enterprises scale their AI initiatives, Scality ensures that their data infrastructure can scale without performance bottlenecks, all while delivering robust data protection and compliance.
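At the API level, data protection and compliance often start with standard S3 features such as versioning and server-side encryption. Here is a minimal sketch, assuming the deployment exposes these standard S3 calls and using hypothetical endpoint, bucket, and key names:

```python
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.ring.example.internal",  # hypothetical endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Versioning keeps prior object versions recoverable after overwrite/delete.
s3.put_bucket_versioning(
    Bucket="training-data",
    VersioningConfiguration={"Status": "Enabled"},
)

# Request server-side encryption for an object at write time.
with open("shard-0000.parquet", "rb") as payload:
    s3.put_object(
        Bucket="training-data",
        Key="prepared/shard-0000.parquet",
        Body=payload,
        ServerSideEncryption="AES256",
    )
```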
Q5: As part of the French Genomic Medicine Plan (France Médecine Génomique 2025), SeqOIA is one of only two national laboratories integrating whole genome sequencing into the French healthcare system to benefit patients with rare diseases and cancer. Could you explain how Scality helped SeqOIA to implement a solution for their genomics data?
Scality’s RING storage platform enabled SeqOIA to integrate whole genome sequencing into the French healthcare system by providing a highly scalable, performance-optimized storage solution for genomics data at multi-petabyte scale. To be precise, SeqOIA’s use case is scientific computing (big data/HPC) rather than AI, but it still required a storage system that could handle large data volumes and provide quick, efficient access for analysis.
Over time, RING demonstrated it could meet most of SeqOIA’s performance needs, and the deployment has grown in several phases to serve 90% of their genomics storage requirements cost-effectively at scale. RING also enabled SeqOIA to manage this data across multiple locations, ensuring high availability and durability even in the event of hardware failures. With data replicated across nodes, SeqOIA was able to safeguard sensitive patient data and meet regulatory requirements, including encryption standards.
Q6: Specifically, how did you help solve their analytics processing needs?
Scality solved SeqOIA’s analytics processing needs by providing a storage solution that integrates seamlessly with analytics platforms. The ability to quickly access and process large genomic datasets is crucial for SeqOIA’s work in healthcare. Scality’s RING architecture supports parallelized data access, so analytics engines can process nearly 10 petabytes of critical research and diagnostic data throughout its lifecycle, from raw lab output to processed results, at accelerated speeds and at a cost 3 to 5 times lower than all-flash file storage.
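To illustrate, querying such a dataset through a Starburst/Trino coordinator from Python might look like the following sketch (the host, catalog, schema, table, and column names are hypothetical):

```python
import trino

conn = trino.dbapi.connect(
    host="trino.example.internal",  # hypothetical coordinator
    port=8080,
    user="analyst",
    catalog="hive",
    schema="genomics",
)
cur = conn.cursor()
# The engine fans the scan out across workers reading from object storage.
cur.execute("""
    SELECT run_id, count(*) AS variant_count
    FROM variant_calls
    WHERE chromosome = 'chr17' AND quality > 30
    GROUP BY run_id
""")
for run_id, variant_count in cur.fetchall():
    print(run_id, variant_count)
```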
Q7: How is Scality RING used in this context? How does RING protect the petabytes of mission-critical data that enable SeqOIA to carry out their mission of improving care for patients?
Scality RING plays a critical role in SeqOIA’s mission by providing a distributed, fault-tolerant storage solution that ensures the protection and availability of mission-critical data. SeqOIA’s genomic data, now at multi-petabyte scale, is stored across multiple locations to ensure data redundancy and high availability.
RING’s features such as automated replication, continuous data protection, and built-in encryption ensure that sensitive patient data remains safe and accessible for analysis and future use. Additionally, the platform’s scalability allows SeqOIA to keep pace with rapidly growing datasets without sacrificing performance, enabling them to continue advancing genomic research.
Resources
2025 French Genomic Medicine Initiative

Alexis Wedrychowski, VP Sales France, Scality.
Sponsored by Scality.