On next-generation Data Architectures. Q&A – Jeffrey Fried
“The most common mistakes organizations make when implementing a next-generation data architecture, like a data fabric or data mesh, stem from trying to do everything at once.”
Q1. InterSystems Global Summit 2022 will be a three-and-a-half-day event held in Seattle, Washington, June 20-23, 2022 at the Hyatt Regency Seattle. What is the main focus of this event?
The theme of the event this year is Innovations in Data, so the conference will focus on how we create and provide the tools to find new and better ways to leverage InterSystems technology. This year’s event will be live and in-person and present a number of opportunities to network, interact with the InterSystems product team, and see demos. We’ll also have several customers and startups sharing the innovations they’ve done using InterSystems technology. And, of course, we’ll unveil the latest innovations from InterSystems, too.
Q2. What are the most common challenges in implementing a next-generation data architecture?
The demand for data continues to explode at unprecedented volumes, new types of data are being captured, and a need for advanced analytics is growing in importance. Further, companies face a governance issue as data is increasingly spread across multiple locations.
Increasing amounts of data processing are moving to the cloud; in many organizations transitioning to the cloud is now a mandate. Traditional data architectures just can’t keep up with the range of requirements, hence the shift to the cloud and call for next-generation data architecture.
The stakes are high. Businesses are increasingly data-driven, sometimes entirely data-driven, so when data management can’t keep up, the business suffers. Many organizations face the challenge of selecting a next-generation architecture and distinguishing among the complex implementations available. A common misstep is to approach the situation by stitching together a number of separate point solutions. While it may seem less risky or even cost-effective, this type of strategy introduces complexity, delays time to value, and increases the total cost of ownership. Modern data platform software provides a broad and deep set of the needed functionality, spanning integration, database management, analytics, and API management. This greatly reduces the number of moving parts, simplifies architectures, lowers the total cost of ownership, and speeds time to value.
The most common mistakes organizations make when implementing a next-generation data architecture, like a data fabric or data mesh, stem from trying to do everything at once. Don’t try to boil the ocean. Start small. Measure and quantify the benefits and learn as you go. It’s a process and a journey. Learn, adjust and get value at every step along the way.
Q3. What are your suggestions on how to overcome them?
In order to overcome these challenges, a business needs to focus on what its end goal is – streamlining and simplifying its architecture to focus on productizing data. When building a data fabric or mesh for the first time, data interoperability is essential. Some of this is a mindset – recognizing that there isn’t a one-size-fits-all solution and focusing instead on the sets of patterns that need to be supported.
There’s no product on the market that does a complete data fabric, so organizations should expect to combine a couple of different products to implement a data fabric architecture. This can be a benefit, as some of the existing data repositories can fit into the data fabric architecture. In addition to data repositories, integration facilities, and self-service analytics, the key components of a data fabric include data catalogs and data quality facilities. The trick is to keep this as simple as possible, since merely putting all existing systems under a common umbrella can result in a very complex and fragile solution.
Before building the actual architecture, teams need to understand their data consumption and regulatory compliance needs in order to make proper use of the system. A lack of understanding can create complexity or even cause the architecture itself to fail. Appointing a Chief Data Officer (CDO) can be a good organizational move and can speed the path to success. A CDO will foster top-down data governance and provide the necessary organizational support for a cohesive data strategy from start to finish.
Sunsetting legacy applications takes a lot of time and effort, but organizations shouldn’t be held back by these limitations. Investment in modern data management technologies such as enterprise data fabrics enables firms to continue to run their legacy systems and stitch together distributed data from across the enterprise, as well as provide analytical capabilities and insights from the source data in real-time. Cataloging these patterns and implementing flexible architectures that can handle multiple different patterns (often with the same data) is important. In turn, modern data management can greatly simplify architectures by reducing the number of different products needed to build and maintain a smart data fabric.
Q4. What is a data fabric and what are the benefits?
A data fabric is a reference architecture that provides the capabilities needed to discover, connect, integrate, transform, analyze, manage, utilize and store data assets, enabling the business to meet its many goals faster and with less complexity than previous approaches, such as data lakes. At its most basic level, a data fabric can be described as a web that is stretched across a network of a business’s existing data and technology assets. The fabric connects disparate data and applications, whether they are on-premises, from partners, in the public cloud, or any combination of these.
The next generation of innovation and automation must be built on strong data foundations. Emerging technologies, such as artificial intelligence and machine learning, require a large volume of current, clean, and accurate data from different business silos to function. However, seamless access across a global company’s multiple data silos is extremely difficult without a real-time, consistent, and secure data layer to deliver the required information to the relevant stakeholders and applications at the right time.
While data lakes have been implemented in attempts to solve many data management challenges, many organizations are hamstrung by these data lakes turning into data swamps – murky with disorganized data that presents challenges around accessibility and the ability to leverage the data for actionable insights.
Data fabrics allow businesses to maximize the value of their existing data architectures, instead of requiring an entire rebuild of each silo or application already in use. By enabling existing applications and data to remain in place, organizations can access, harmonize and analyze the data in flight and on-demand to meet a variety of business initiatives.
By using a data fabric, a business can also use its existing data to pull real-time insights, risk data, and more. This means businesses can respond immediately to developments occurring in the industry, and improve decision making both internally and externally, as all stakeholders can be confident they have a current and accurate understanding of potential risk.
For example, leading capital markets firms are leveraging smart data fabrics to stitch together distributed data from across the enterprise, as well as power a wide variety of mission-critical initiatives, from business management reporting and scenario planning, to modeling enterprise risk and liquidity, regulatory compliance, and portfolio optimization. This gives these firms a holistic and comprehensive view of what has happened in the past, what’s currently happening, and what is likely to happen in the future so they can be proactive and prescriptive rather than reactive to market changes.
Q5. What is a data mesh architecture? What is the difference (if any) with a data fabric?
There’s not a lot of agreement in the industry about the definition of a data mesh. There are four general principles commonly cited: decentralized data ownership, data as a product, federated computational governance, and self-service data infrastructure as a platform – but it is not as well understood or established as the data fabric architecture. I have actually met quite a few organizations that have a data fabric architecture in production, and none that have a data mesh running.
At its core, a data mesh architecture leverages a domain-oriented, self-serve design, enabling data consumers to discover, understand, trust, and use data to inform decisions and initiatives. This approach enables end-users to easily access and request data where it lives, without the need to first move it to a data lake or warehouse. Similar to how engineering teams have adopted microservice architectures over monolithic applications, data teams view data mesh as an opportunity to adopt data microservices that provide business contextual services over monolithic data platforms.
Unlike a data fabric, a data mesh is focused on an overall organizational change where domain teams own the delivery of data products with the understanding that the domain teams are closer to their data and thus understand it better. This allows teams to pull the data and analytics they need without needing an expert data team on-site, making data more accessible and interoperable.
The data fabric and data mesh architectures can co-exist, just as some organizations have both a data lake and a data warehouse. I think the main tension between the two comes down to governance. A data fabric aims to provide consistent data governance, often across many different sources and locations of data. A data mesh is looking to have distributed and federated governance. When there are multiple applications using the same data, and those applications have different compliance or governance requirements, it’s really tough to have them work together without some level of centralized oversight.
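To illustrate why some level of central oversight helps in that situation, here is a minimal sketch in which several applications consult one shared policy table before reading the same dataset. Everything in it is an assumption made for illustration: the application names, the field names, and the `allowed_fields` and `read_record` helpers are hypothetical and do not come from any particular product.

```python
# Hypothetical central policy: which fields each application may read
# from the same shared customer dataset.
CENTRAL_POLICY = {
    "billing_app": {"customer_id", "name", "card_last4"},
    "analytics_app": {"customer_id", "region"},
}


def allowed_fields(app):
    """Look up the fields an application may read; unknown apps get none."""
    return CENTRAL_POLICY.get(app, set())


def read_record(app, record):
    """Return only the fields of `record` that the policy permits for `app`."""
    permitted = allowed_fields(app)
    return {key: value for key, value in record.items() if key in permitted}


if __name__ == "__main__":
    record = {
        "customer_id": 42,
        "name": "Ada",
        "region": "EU",
        "card_last4": "1234",
    }
    print(read_record("billing_app", record))
    print(read_record("analytics_app", record))
    print(read_record("unregistered_app", record))
```

Because the rules live in one place, two applications with different compliance requirements can share the same record without each team reimplementing the checks. A real deployment would need far more than this (auditing, row-level rules, policy versioning), so treat it as a picture of the idea rather than a design.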
Q6. What are your practical suggestions for evaluating when an organization needs to move from a data lake to a data fabric and/or to a data mesh architecture?
First, teams need to step back and look at what is working in the organization today and what is not. It may be that your data lake or data warehouse is working fine and “if it ain’t broke, don’t fix it.” You may also be able to leverage an existing data repository within a data fabric architecture.
Next, catalog the different patterns of data use you have today in addition to those you know you’ll need in the next year or two as the company grows. If you find a variety of data-use patterns with an overlap in the data sources involved, that is a key indicator that you will need a next-generation architecture.
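The cataloging step can start as something very small. The sketch below records each data-use pattern together with the data sources it reads, then reports which sources are shared by more than one pattern. The pattern names, source names, and the `shared_sources` helper are all hypothetical, chosen only to show the shape of the exercise.

```python
from collections import defaultdict

# Hypothetical catalog: each data-use pattern mapped to the sources it reads.
PATTERN_CATALOG = {
    "regulatory_reporting": {"trades_db", "reference_data"},
    "risk_modeling": {"trades_db", "market_feed"},
    "customer_dashboard": {"crm"},
}


def shared_sources(catalog):
    """Return {source: sorted list of patterns} for sources used by 2+ patterns."""
    users = defaultdict(set)
    for pattern, sources in catalog.items():
        for source in sources:
            users[source].add(pattern)
    return {
        source: sorted(patterns)
        for source, patterns in users.items()
        if len(patterns) > 1
    }


if __name__ == "__main__":
    for source, patterns in sorted(shared_sources(PATTERN_CATALOG).items()):
        print(f"{source} is shared by: {', '.join(patterns)}")
```

In this made-up catalog, only `trades_db` is read by more than one pattern. The more sources that show up in the result, and the more varied the patterns that share them, the stronger the signal described above.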
Finally, look for ways to simplify what you are already doing. This could be reducing the number of moving parts by combining what is now multiple products into a single one. There isn’t a universal product for everything, but a product like InterSystems IRIS® can handle data in multiple data models and across multiple workloads efficiently.
Q7. There is a discussion on “democratizing” data. What is your take on that?
I’m very passionate about the process of enabling everybody in an organization, irrespective of their technical know-how, to work with data comfortably, to feel confident talking about it, and, as a result, make data-informed decisions and build customer experiences powered by data. That’s what I mean by “democratizing data.”
A business, and its customers, cannot make relevant, accurate decisions without first having all of the information – to do this they need to have clean, accessible, actionable data. However, access to data is not always readily available and can require special teams and security measures in order to reach it.
Individuals, both customers and internal stakeholders, have a right to their data. Giving them access to it will improve decision-making abilities, customer experiences and overall satisfaction. Data access also enables companies to focus more effectively on key priorities, such as developing new applications or using data insights to make strategic decisions.
I’m not, however, a fan of decentralizing or federating data governance. Sometimes I hear people describe “democratizing data” as freeing data from controls, getting away from the “data police.” To me, this seems more like a desire for anarchy than a desire for democracy. I think that governance is fundamental to good data management and that if it’s done well it can speed up processes rather than slow them down. Governance is part of enabling people, so where it seems like an obstacle it should be fixed rather than eliminated.
Qx. Anything else you wish to add?
We live in very exciting times, and data management is a hugely innovative field today.
We are thrilled to kick off InterSystems Global Summit 2022 from June 20-23 at the Hyatt Regency in Seattle, Wash. If you are interested in attending the conference or learning more about the sessions, visit summit.intersystems.com. While the conference will not be live-streamed, audio recordings of the most popular sessions will be available on our learning services site. And be sure to stay tuned for some of the content to be available afterward.
Jeffrey Fried, Director of Product Management for InterSystems
Jeff Fried is passionate about helping people create data-driven applications that empower their decision-making. It’s this very passion that drives Fried, director of product management at InterSystems and a self-proclaimed data management nerd, to encourage the enterprise-level operationalization of data, a process that strictly defines variables into measurable factors.
Sponsored by InterSystems