On Distributed File Systems and Object Storage. Q&A with Paul Speciale 

Q1. What is the common definition of distributed file systems and object storage?

A distributed file system (DFS) or object storage system is one that is “distributed” across multiple servers and/or locations. This allows users and applications to access any file transparently, regardless of its underlying server or location. The definition commonly requires that both data and metadata (descriptors of the data) be distributed across multiple servers, to provide scalability and to avoid single points of failure that could limit system availability or data durability. 

Note that a distributed file system typically provides data access over standard network file protocols (NFS and SMB), whereas a distributed object storage system is accessed over a RESTful API (the Amazon S3 API is today’s de facto standard object protocol). Some offerings – such as Scality RING – combine a DFS and distributed object storage in a single solution.
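
To make the protocol distinction concrete, here is a minimal sketch of object access over the S3 API using Python and the boto3 library. The endpoint, credentials, bucket and key names are hypothetical placeholders; any S3-compatible system accepts the same calls, while the same content on a DFS would simply be read through a mounted path.

    import boto3

    # Hypothetical S3-compatible endpoint and credentials (placeholders).
    s3 = boto3.client(
        "s3",
        endpoint_url="https://objectstore.example.com",
        aws_access_key_id="ACCESS_KEY",
        aws_secret_access_key="SECRET_KEY",
    )

    # Object access: PUT an object, then GET it back over the RESTful API.
    s3.put_object(Bucket="media-assets", Key="videos/intro.mp4", Body=b"...")
    data = s3.get_object(Bucket="media-assets", Key="videos/intro.mp4")["Body"].read()

    # File access to the same content on a DFS would instead go through a
    # standard NFS or SMB mount point, for example:
    #   with open("/mnt/assets/videos/intro.mp4", "rb") as f:
    #       data = f.read()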

Q2. What is their purpose?

The purpose is to store and protect large amounts of unstructured data such as files representing documents, media assets (videos and images), log and event data, analytics data and much more. The distributed nature is required to store large amounts of data, more than can fit on a single server. The distributed system provides access through a single namespace to data stored across all the servers/locations in the system, simplifying access to large data capacities.

Q3. Scality is positioned as a Leader again in the 2023 Gartner® Magic Quadrant™ for Distributed File Systems and Object Storage. What is your view of the future, the direction of the market and your role in shaping the market?

We see key customer requirements for future and enhanced capabilities in a few areas:

  1. Supercharged cyber-resiliency for data, above and beyond today’s already capable immutable storage.
  2. Simplified management of large amounts of data across the lifecycle (as data ages from hot to warm to cold).
  3. Broadening the spectrum of object storage to use-cases that require ultra-fast response times.

Q4. In the Gartner report it is mentioned that “The challenges with managing unstructured data are expanding from scale, performance or availability to include cyber resilience, data management, hybrid cloud, single platform for file and object workloads and storage-as-a-service.” How does Scality handle these challenges?

Scality has built end-to-end cyber-resiliency into our offerings, layered on top of already strong data immutability (Scality terms these capabilities the CORE5). Together, this creates a multi-layered defense against both today’s ransomware threats and future (potentially AI-driven) ransomware.

Our solutions also provide integrated hybrid-cloud capabilities, including a namespace that can span local storage and public cloud storage in AWS, Azure, Google, Wasabi and multiple regional cloud service providers. This enables users to mirror or tier on-premises data to any desired public cloud location, as the sketch below illustrates.
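
As a generic illustration of the tiering concept over the S3 API (not Scality’s specific management interface, which is not covered here), this sketch uses boto3 to install a lifecycle rule that transitions objects to a colder storage class after 30 days. The bucket name and storage class are illustrative assumptions; S3-compatible systems differ in which classes and cloud targets they expose.

    import boto3

    s3 = boto3.client("s3")  # or an S3-compatible endpoint, as shown earlier

    # Lifecycle rule: tier every object to colder storage after 30 days.
    s3.put_bucket_lifecycle_configuration(
        Bucket="on-prem-backups",                  # hypothetical bucket
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "tier-after-30-days",
                    "Status": "Enabled",
                    "Filter": {"Prefix": ""},      # apply to all objects
                    "Transitions": [
                        {"Days": 30, "StorageClass": "GLACIER"}  # assumed class
                    ],
                }
            ]
        },
    )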

Scality’s flagship RING has provided integrated, distributed file AND object storage in one solution for over a decade, and is in use at hundreds of large enterprises globally for use-cases that require both interface presentations. It has also been offered as-a-service under the HPE GreenLake Managed Services (GMS) umbrella for the last several years, with major customers in financial services, healthcare, law enforcement and more using the solution through this model.

Q5. What are the pros and cons in offering a single platform to manage all of the unstructured data in an organization?

There are several compelling advantages (pros), including:

  • One system to manage, which is a massive simplification for storage admins versus managing tens (or more) of individual storage systems (the well-known “islands of storage” problem, a major contributor to higher storage-management costs).
  • Simplified data access for users and applications: a single system with a single namespace makes data much more easily accessible, since users won’t need a priori knowledge of which system to connect to or mount for their data.
  • Distributed systems can grow in capacity (known as scale-out) without disruptive limits or boundaries. This helps businesses grow without being constrained by underlying infrastructure limitations.
  • Many solutions also provide integrated utilization metering and reporting, making it easy to provide billing or internal charge-back based on storage consumption.

As for the disadvantages (cons):

  • Admins will need to implement strict security guidelines and best practices for user authentication and access control to ensure data privacy in a larger shared system. Most distributed systems make this possible by providing comprehensive security and multi-tenancy capabilities; the sketch below illustrates one common mechanism.
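
As one concrete example of such controls, this sketch applies an S3 bucket policy that restricts a single tenant account to its own prefix. The account ID, bucket and prefix are hypothetical; real deployments would pair a policy like this with per-tenant credentials and authentication.

    import json
    import boto3

    # Hypothetical policy: tenant A may read and write only under its prefix.
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "TenantAOwnPrefixOnly",
                "Effect": "Allow",
                "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": "arn:aws:s3:::shared-data/tenant-a/*",
            }
        ],
    }

    s3 = boto3.client("s3")
    s3.put_bucket_policy(Bucket="shared-data", Policy=json.dumps(policy))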

Q6. Tell us about the concept of storage-as-a-service, what is it useful for and for whom?

For cloud service providers, this is a natural offering for their customers, much like AWS provides with the S3 service or Azure with Blob Storage. STaaS means that they can expose file or object interfaces in the cloud for access by customer applications – with consumption-based metering and billing. The service provider delivers this to customers with full management, which greatly simplifies the burden of large-scale storage on the end-user customer.

Large enterprises employ the STaaS model internally as the foundation for their own private cloud environments. This enables large-scale storage services for tens to thousands of internal applications running in the cloud, and provides internal IT with all of the advantages outlined above.

Q7. Is using a public cloud really a good idea to store all of the unstructured data in an organization?

Yes and no. The public cloud is ideal for many use-cases; for example, for in-cloud applications it would not make sense to store data in a remote on-premises storage solution. On the other hand, for on-premises applications in a corporate data center, using remote cloud storage may lead to higher latency and lower performance.

Ultimately, the cost of public cloud storage must be evaluated carefully for each application or workload. Very I/O-intensive workloads will incur the dreaded “hidden fees” in cloud storage related to data retrieval, which can dramatically amplify their total cost, as the rough arithmetic below illustrates.
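
To see how retrieval fees amplify cost, here is a back-of-the-envelope calculation in Python. All prices and volumes are illustrative assumptions, not any provider’s actual rates.

    # Assumed figures, for illustration only.
    capacity_tb        = 100     # data kept in cloud storage
    storage_price_gb   = 0.023   # $/GB-month, assumed storage rate
    egress_price_gb    = 0.09    # $/GB, assumed retrieval/egress rate
    retrieved_tb_month = 300     # I/O-intensive workload re-reads data 3x

    storage_cost = capacity_tb * 1024 * storage_price_gb        # ~ $2,355
    egress_cost  = retrieved_tb_month * 1024 * egress_price_gb  # ~ $27,648

    print(f"storage: ${storage_cost:,.0f}/month")
    print(f"egress:  ${egress_cost:,.0f}/month")

Under these assumptions, the retrieval fees come to roughly twelve times the raw storage cost, which is why per-workload evaluation matters.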

Q8. What about hybrid cloud workflows? 

Hybrid-cloud storage certainly fits well into a multi-tiered (multi-copy) data protection strategy, such as for backups with copies both on-premises and in the public cloud. For organizations without a second corporate data center, the public cloud can be an ideal secondary data center to store a mirrored copy of data for disaster recovery purposes.

Q9. Most ransomware attacks target unstructured datasets. What are the solutions available to avoid or minimize this?

For securing backup data from ransomware threats, we believe that immutable storage forms a baseline of defense. However, with ransomware actors becoming more sophisticated, immutability alone is not enough to make a storage solution truly unbreakable. Traditional attack vectors go through the data path (for example, to encrypt or delete data for ransom), but others exploit vulnerabilities in networks, in humans (malicious and inadvertent), or attack the core architecture at the operating system level. For that reason, we advise looking for storage solutions that offer end-to-end cyber-resiliency capabilities directly in the solution. Scality ARTESCA today offers this through its integrated CORE5 capabilities.
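
For reference, the immutability baseline over the S3 API is typically achieved with Object Lock. The minimal sketch below writes a backup object under a 90-day COMPLIANCE-mode retention, so it cannot be overwritten or deleted until that date passes. The bucket, key and retention period are hypothetical, and the target system must support Object Lock (enabled when the bucket is created).

    from datetime import datetime, timedelta, timezone
    import boto3

    s3 = boto3.client("s3")

    # Object Lock must be enabled at bucket creation time.
    s3.create_bucket(Bucket="backups", ObjectLockEnabledForBucket=True)

    # COMPLIANCE mode: no one, not even an administrator, can delete or
    # overwrite this object until the retention date passes.
    s3.put_object(
        Bucket="backups",
        Key="db/2024-01-01.dump",
        Body=b"...",
        ObjectLockMode="COMPLIANCE",
        ObjectLockRetainUntilDate=datetime.now(timezone.utc) + timedelta(days=90),
    )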

…………………………………….

Paul Speciale, CMO, Scality

Over 20 years of experience in Technology Marketing & Product Management.

Sponsored by Scality
