On Data Fabric and Data Mesh. Q&A with Jeffrey Fried
“It’s fairly common that organizations are using a data fabric pattern without calling it by that name. For example, Health Information Exchanges (HIEs) use the data fabric pattern, but they were conceived before the term data fabric existed.”
Q1. What is a Data Fabric and what is it useful for?
A Data Fabric is an architecture to access data across multiple technologies and platforms within an enterprise. It takes data from disparate sources and makes it available for various applications and analytics use cases. One key attribute of a Data Fabric is the use of metadata to help orchestrate data across all these different places, and to provide common governance across all of it.
Data Fabrics are useful wherever there are multiple sources of data that need to be tapped by multiple different applications.
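To make the role of metadata concrete, here is a minimal sketch in Python of the pattern described above: a catalog of metadata entries that tells the fabric where each dataset lives, how to read it in place, and which roles may access it. Every name in it (DataFabric, DatasetEntry, the source labels, the roles) is a hypothetical illustration rather than any vendor's API, and a real fabric would add lineage, caching, query pushdown, and much more.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Row = Dict[str, object]


@dataclass
class DatasetEntry:
    """Metadata describing one dataset that stays in its source system."""
    name: str
    source: str                      # hypothetical label, e.g. "warehouse"
    fetch: Callable[[], List[Row]]   # how to read the data where it lives
    allowed_roles: Set[str] = field(default_factory=set)


class DataFabric:
    """Toy fabric: one metadata catalog, one governance check, many sources."""

    def __init__(self) -> None:
        self._catalog: Dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        self._catalog[entry.name] = entry

    def read(self, name: str, role: str) -> List[Row]:
        entry = self._catalog[name]
        # Common governance: the same rule is applied whatever the source is.
        if role not in entry.allowed_roles:
            raise PermissionError(f"role {role!r} may not read {name!r}")
        return entry.fetch()


# Two stand-in sources; the lambdas take the place of real connectors.
fabric = DataFabric()
fabric.register(DatasetEntry(
    "orders", "warehouse",
    lambda: [{"order_id": 1, "amount": 250.0}],
    {"analyst", "finance"},
))
fabric.register(DatasetEntry(
    "patients", "clinical_api",
    lambda: [{"patient_id": 7}],
    {"clinician"},
))

print(fabric.read("orders", role="analyst"))
```

The point of the sketch is that applications ask the fabric for a dataset by name, and the metadata decides both where the data comes from and whether the caller is allowed to see it.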
Q2. Is Data Fabric the next trend in the data architecture approach for enterprises? Why?
Yes indeed, the Data Fabric is highly hyped and lauded as the next data architecture. Every major analyst firm now talks about Data Fabrics. Forrester has been talking about Data Fabrics for 4 years and is now talking about “Data Fabric 2.0”.
Gartner lists “Data Fabric” as a “top strategic technology trend” and predicts that by 2024, 25% of data management vendors will provide a complete framework for data fabrics. This hype is fast becoming a reality; InterSystems has customers in every market segment we serve benefiting from the data fabric architecture.
This is because there is a real and pervasive problem not solved by data warehouses, data lakes, or data lakehouses. Data is not only siloed, it is distributed, and it needs to be governed well independent of where it lives. Unlike these previous trends, Data Fabrics don’t aim to replace current data repositories; they tap into the data across systems and unify it.
Q3. And what is a Data Mesh instead?
Data Mesh is a more recent term and a very similar concept. Data Mesh is described as an intentionally designed distributed data architecture, under centralized governance and standardization for interoperability, enabled by a shared and harmonized self-serve data infrastructure.
Q4. Did you ask ChatGPT: What is the difference between a data fabric and a data mesh?
Of course. I’m obsessed with ChatGPT and I always like seeing what it comes up with for questions like this. Try it yourself!
Q4. Do you agree with the answer you got?
Mostly. ChatGPT correctly identified that these two concepts are very similar and address the same underlying problem of siloed distributed data that is a mess to govern. It said: “A data fabric is technology-centric and tackles the complexity of data and metadata in a smart way that works well together. On the other hand, a data mesh focuses on organizational change and is more about people and process than architecture. In a data mesh, domain teams own and are accountable for their data, while in a data fabric, a single team oversees all data across a single environment.” That’s a key difference in my view: the data fabric emphasizes common governance with centralized policies, while the data mesh advocates for federated governance where each team sets their own policies.
However, ChatGPT went on to describe differences in data access and data storage between the two that are just incorrect. I won’t go so far as to call these hallucinations, they’re just fallacies. Both of these data architectures can handle distributed data and either API access or analytic access.
The definitions of both of these architectures are still evolving and morphing, but I think they will settle out to view a data mesh as a governance pattern that is best implemented on a data fabric. In other words, I don’t think it’s either/or. Perhaps that’s what threw ChatGPT off.
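The governance distinction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the policy keys, the domain names, and the effective_policy function are all hypothetical, and layering domain overrides on top of a central baseline is one plausible reading of "a data mesh as a governance pattern implemented on a data fabric", not a definition of either term.

```python
from typing import Dict

Policy = Dict[str, object]

# Centralized governance: one set of policies for all data (hypothetical values).
CENTRAL_POLICY: Policy = {"retention_days": 365, "pii_masking": True}

# Federated governance: each domain team may set its own policies.
DOMAIN_POLICIES: Dict[str, Policy] = {
    "marketing": {"retention_days": 90},
    "finance": {"retention_days": 2555},
}


def effective_policy(domain: str, federated: bool) -> Policy:
    """Return the policy that applies to a dataset owned by `domain`.

    With federated=False the central policy applies unchanged (fabric-style
    common governance). With federated=True the owning domain's settings
    override the central baseline (mesh-style federated governance).
    """
    policy = dict(CENTRAL_POLICY)
    if federated:
        policy.update(DOMAIN_POLICIES.get(domain, {}))
    return policy


print(effective_policy("marketing", federated=False))
print(effective_policy("marketing", federated=True))
```

In this sketch the two modes share the same catalog and the same baseline, which is why the choice reads less like either/or and more like a switch in how policies are resolved.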
Q5. Data Silos are the #1 Barrier to Innovation. Will a Data Fabric and/or a Data Mesh be of help to break and share these silos?
Certainly yes. Bridging across many data silos with a variety of types of data requires a data architecture that can connect to many systems, collect data if appropriate, and unify this data with common metadata and governance. Data fabrics make it easier to tap into silos of data and make them available securely. They also include facilities like data catalogs that make it easier to understand the provenance and purpose of a data set, so that data can be used appropriately.
Q6. Isn’t this also a people attitude rather than a pure technical one?
Absolutely. Data Governance, for example, is all about people and processes, and not very much about the technology.
I think there are a few things that make people not want to share their data. Sometimes this can be about power – there is still an attitude that controlling data brings power, but most people do come to understand that sharing the data they control brings more power than holding it tightly.
Mostly people don’t want to share data due to fear. Fear of data loss or data breaches is behind security policies, and it’s a well-founded fear. Fear of misuse or misapplying data is another real concern – using data out of context or without regard to its limitations can lead to false conclusions. And lack of trust in data, often stemming from real data quality issues, can keep people from sharing data.
I often run into tension between “IT” and “the business” around these areas. Business owners with a need for agility come into conflict with IT owners’ requirements to provide safety, compliance, and security. This may seem like a power issue, but it rarely is. Most IT organizations want to provide self-service access to business owners and help them move quickly…it’s just a hard thing to do safely.
That’s why an architecture like data fabric and data mesh can help with people’s attitudes too. Making it easier to tap into silos securely and safely reduces security fears. Simplifying self-service access alleviates this tension between IT and business groups. Data catalogs can help prevent misuse of data, as well as making it quicker to identify what datasets and data products are available and who to talk to about them. Data quality tools help to assess and improve data quality and improve trust.
Technology is only part of the answer, of course, but the right data architecture can make it much easier to do change management and address people’s attitudes and habits.
Q7. What are your recommendations on how to support an organization? Data Warehouse vs Data Lakes vs Data Fabric?
I don’t think there’s a one-size-fits-all answer to that question. It depends on what an organization already has and what is already working. But creating a data fabric, or even multiple domain-specific data fabrics, is generally a good idea if an organization has a complex data picture. And most large enterprises have plenty of complexity.
If you have one or more data warehouses, I recommend keeping what’s working and tapping into these warehouses from a data fabric. Fix the things that aren’t working by understanding the problem and seeing if a data fabric can simplify the picture. The same goes for data lakes (although I rarely find a data lake implementation that’s truly working well).
Should you look at a data mesh in addition to a data fabric? I think that depends on whether you want to build and publish true data products. Some organizations have data that they can offer widely as curated, high value data sets. But if you are taking this on, it needs to be staffed and managed by people that have a true product mindset. A data product should have a product manager and a technical team focused on it, in which case federated governance makes sense, since that data product will also have some specific policies associated with it.
Q8. Could you give us an example of a “good” data strategy? What would be an example of a “bad” data strategy?
A good data strategy is connected to the business strategy, written down succinctly and kept alive, and is supported by (and supportable by) the technology and staffing deployed in an organization. A good example is the NHS information strategy – you can find the document online. A bad data strategy is the converse – it might be conceived in a vacuum, published and then forgotten, buried in a mound of paper and policies until it is incoherent, or unrealistic and unimplementable. Or it might be missing completely; having no strategy is definitely a bad strategy. Sadly, I think bad data strategies outnumber good ones in practice.
Q9. InterSystems’ approach calls for a Smart Data Fabric. Please tell us more about it.
The “Smart Data Fabric” is the InterSystems take on the Data Fabric architecture. It includes everything that we’ve been talking about, including the idea that a data mesh is essentially a variant of the governance approach and can be built on the Smart Data Fabric.
What makes this “Smart” is that it has built-in analytics and a common data plane. Putting analytics within the data fabric is the best approach for real-time analytics, since it ensures that data is in sync and available at the lowest latency.
Typically there are analytics and applications fed by the Smart Data Fabric in addition to the analytics and machine learning built into it. The common data plane comes from the approach we use with InterSystems IRIS, where all the parts are built together and we project data into different forms rather than copying it. This keeps the architecture simple and the footprint and cost low.
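The general idea of projecting data into different forms rather than copying it can be illustrated with a small Python sketch. To be clear, this is not InterSystems IRIS code and makes no claim about how IRIS works internally; the CommonDataPlane class and its methods are hypothetical. It only shows the principle: one stored copy of each record, with a row-style view and a document-style view computed on read, so the two can never drift out of sync.

```python
from typing import Dict, List, Tuple


class CommonDataPlane:
    """Toy store: one copy of each record, several projections computed on read."""

    def __init__(self) -> None:
        self._records: Dict[int, Dict[str, object]] = {}

    def put(self, key: int, record: Dict[str, object]) -> None:
        # The only place data is written; no per-format copies are kept.
        self._records[key] = record

    def as_rows(self, columns: List[str]) -> List[Tuple[object, ...]]:
        """Relational-style projection: one tuple per record, ordered by key."""
        return [
            tuple(record.get(column) for column in columns)
            for _, record in sorted(self._records.items())
        ]

    def as_document(self, key: int) -> Dict[str, object]:
        """Document-style projection of a single record."""
        return {"id": key, **self._records[key]}


plane = CommonDataPlane()
plane.put(1, {"name": "Ada", "balance": 100})
plane.put(1, {"name": "Ada", "balance": 120})  # a single update...

# ...is visible through both projections immediately.
print(plane.as_rows(["name", "balance"]))
print(plane.as_document(1))
```

Because nothing is duplicated, there is no synchronization job between the forms, which is the simplicity and low-latency argument made above.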
I could talk about this for hours, but we’ve published some good material on it that you can read if you want to know more: https://www.intersystems.com/use-cases/smart-data-fabrics/
Q10. Can you briefly describe some relevant Smart Data Fabric Use Cases?
Sure. In financial services, the Smart Data Fabric is being used to build out multiple applications on the same data, including regulatory reporting, asset management, market intelligence, and more. Each new application can take advantage of all the data that’s under the umbrella of the data fabric, and may connect to more data sources and bring them into the data fabric. In supply chain applications, the Smart Data Fabric is being used to create a ‘digital control tower’ that spans across multiple existing systems such as inventory, shipping, and orders. In health care, the Smart Data Fabric helps with managing operations, improving care, and managing claims – similarly tapping into multiple disparate data sources to create one reality.
Qx Anything else you wish to add?
It’s fairly common that organizations are using a data fabric pattern without calling it by that name. For example, Health Information Exchanges (HIEs) use the data fabric pattern, but they were conceived before the term data fabric existed. It’s been relatively straightforward for InterSystems to build out our Smart Data Fabric because we’ve had most of the underlying capabilities for some time already, and have quite a number of customers for whom we fielded Data Fabrics without using the term. Now that the concept has a name in the industry and organizations are adopting it, we are in a great position to serve them.
Jeff Fried, Director of Platform Strategy for InterSystems, is a long-standing data management nerd, and particularly passionate about helping people create powerful data-driven applications. Prior to joining InterSystems, Jeff was CTO of BA Insight, and ran product management for FAST Search and Transfer and for Microsoft. He has extensive experience in data management, text analytics, enterprise search, machine learning, and interoperability. Jeff is a frequent speaker and writer in the industry; holds 15 patents; and has authored more than 50 technical papers and co-authored three technical books.
Sponsored by InterSystems.