Business Requirements First, Technology Second
By Tamara Dull, Director of Emerging Technologies ▪ SAS Best Practices
Four or five years ago, when the term “big data” or “Hadoop” was mentioned around business professionals who had been in the business of “doing data” for a long time, there were mixed feelings about what was coming next. Some called Hadoop the EDW killer, others called it the ETL killer, and still others dismissed it as a passing fad. Many people, frankly, didn’t know what to think.
Today, we know that Hadoop has not killed the EDW or ETL—and it’s far from being a passing fad. We also know that big data technologies are still maturing, and we’re still a ways out from entering a steady state—especially with the rise of the Internet of Things.
And one lesson we’ve learned from our traditional, relational experience that applies to these newer big data technologies is this: identify business requirements first, then explore technology solutions that can fulfill those requirements. As obvious as this sounds, there are still organizations today that go after the latest technology (Hadoop, anyone?) without any business requirements defined, and are then disappointed when the technology doesn’t live up to its hyped promise.
We’re also learning that some of our organizations’ core business requirements can be fulfilled by existing relational technologies and newer big data technologies. Let’s take a look at a few of these requirements:
- Discovery of unexplored business questions. While we’ve been using our data warehouses for discovery work for years, this technology isn’t always optimal for comparisons across and between large, and often unstructured, data sets. Big data technologies like Hadoop are also well suited to this type of work: they excel at fast pattern recognition, which makes discovery across varied data sets considerably quicker.
- Clean, transformed, high-quality aggregated data. In terms of data quality, many data warehouses have integrated data quality functions built-in. In big data environments, however, there could be a reason to provision data in its “raw” or unstructured format. Data quality can happen inside Hadoop—as more and more vendors continue to offer up solutions—but the market is still young.
- Low-latency, interactive reports. Data warehouses have typically been the answer for low-latency or interactive reporting. But with new data visualization tools, big data technologies present an interesting alternative: reporting directly against big data platforms.
- High volumes of raw, highly granular, unstructured data. Big data technologies are primed to process raw, unstructured data. They not only process it quickly, but they can also store it cheaply and make it available to a range of projects locally. Data warehouses, on the other hand, often deal with aggregated data, which can now be produced using big data technologies and then provisioned to the data warehouse.
- Exploratory analysis of preliminary data. This refers to data that might be in the midst of being processed, as in a staging area or sandbox. Hadoop offers a standalone environment for structured and unstructured exploration, whereas with data warehouses, the data modeling, acquisition, cleansing, structuring, and loading of that data might take more time.
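The “aggregate on the big data side, then provision to the warehouse” pattern mentioned above can be sketched in miniature. This is a hypothetical illustration, not a reference implementation: plain Python stands in for a Hadoop or Spark aggregation job, and an in-memory SQLite database stands in for the data warehouse; the table and field names are invented for the example.

```python
import sqlite3
from collections import defaultdict

# Raw, highly granular event records (in practice: files landed in a
# big data platform such as HDFS, in their original, unaggregated form).
raw_events = [
    {"product": "A", "amount": 10.0},
    {"product": "A", "amount": 5.0},
    {"product": "B", "amount": 7.5},
]

# Step 1: aggregate the raw data. This plays the role of the big data
# job, which handles the high-volume, granular records cheaply.
totals = defaultdict(float)
for event in raw_events:
    totals[event["product"]] += event["amount"]

# Step 2: provision only the aggregates to the "data warehouse",
# which serves the cleaned, summarized view to reporting tools.
edw = sqlite3.connect(":memory:")
edw.execute("CREATE TABLE sales_summary (product TEXT, total REAL)")
edw.executemany("INSERT INTO sales_summary VALUES (?, ?)", totals.items())

for row in edw.execute(
    "SELECT product, total FROM sales_summary ORDER BY product"
):
    print(row)
```

The design point is the division of labor: the raw, granular records stay on the cheap, scalable side, and only the smaller aggregated result is loaded into the warehouse for low-latency reporting.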
The one thing to keep in mind is that while there are exceptions to every rule, big data and traditional, relational technologies are optimized for different purposes. The goal is to use these solutions for what they were designed to do; in other words, use the best tool for the job. This is not a new lesson. We’ve learned it before. Now let’s apply it.