On Big Data and the Internet of Things. Interview with Bill Franks
“Perhaps the biggest challenge is that the IoT has the potential to generate orders of magnitude more data than any other source in existence today. So, in the world of the IoT we will test the limits of ‘big.’”–Bill Franks
On topics of Data Warehouses, Hadoop, the Internet of Things, and Teradata`s perspective on the world of Big Data, I have interviewed Bill Franks, Chief Analytics Officer for Teradata.
Q1. What is Teradata`s perspective on the world of Big Data?
Bill Franks: Our perspective has not really changed with regard to ‘big data:’ the primary mission of Teradata for decades has been helping organizations utilize and analyze large volumes of data to produce insight for business value. Note that our Teradata database was originally designed exclusively for analytics, then called ‘decision support’ – unlike most other platforms, which were designed for general computing – then later adapted for analytic uses. As a result, the Teradata analytic engine is – and has always been – uniquely architected for large – ‘big data’ – volume and complexity aimed at producing actionable intelligence.
Of course, the amount of data that’s considered ‘big’ and thus a challenge – has changed, and we have a lot of novel data sources in recent times. However, we believe that companies which have always focused on analyzing and acting upon data intelligently can adapt to the new world of big data. After all, big data is just more data and the analysis of big data is still analysis. There are as many similarities as differences from the past.
Teradata has engineered further analytic enhancements over the years to create a diverse portfolio of products, partnerships, and services to allow our customers to continue to get the most from their data assets. The pace of change is very rapid today and we expect that to continue. We believe our strength is in our experience, expertise and our ability to help organizations navigate the changing landscape and continue to derive new, useful insights from their increasingly large and diverse data sources.
Q2. Most data warehousing projects consolidate data from different source systems. What is different in the world of Big Data?
Bill Franks: By definition, if you want to look at two different data sources together, you must either move one set of data to the other or move them both to a 3rd location. If data is truly disparate, you can’t use it effectively. That is what drove data warehousing to prominence. One huge difference between data warehousing practices years back and then today is that previously, all data that was captured in the business world met three criteria almost 100% of the time.
1) It was immensely important; given the cost to capture and store it,
2) The data was well structured, and
3) The data was generated by an organization’s internal business processes.
— Therefore, it was mostly placed in relational databases or on a mainframe since those technologies easily handle that type of data. Data warehousing solved the problem of many structured data platforms being spread out – by consolidating the sources for analytic purposes into a single structured platform.
What is different with big data is that today, the data often violates all of the rules.
1) Much of it is not important, or has not yet been proven to be important,
2) The data is not structured in the classic fashion at the outset (though most can and must be structured for analytical purposes), and
3) The data is often from sources external to an organization.
— As a result, we now have disparate data platforms that each serve different functions. Some focus on one type of data, while others focus on flexibility. However, the downside is that these platforms don’t integrate well and it isn’t as easy to tie everything together. That’s a problem Teradata is working diligently to solve with our Unified Data Architecture – our pioneering version of the visionary Gartner Logical Data Warehouse.
Q3. Will data warehouses become obsolete soon and be replaced by Hadoop?
Bill Franks: Absolutely not. A few years ago, that was a common claim. That claim is rarely heard today. In fact, all of the big Hadoop vendors partner with Teradata.
This is because our data warehousing platforms provide some important things Hadoop does not — just as Hadoop provides some things a data warehouse does not. Each platform has its strengths and weaknesses, but when positioned together, additional value is added. Part of the issue is that people mistake policy decisions for technology limitations.
There is no reason you can’t place untested, raw, unclean data of unknown value on a data warehousing platform; it’s the corporate policies that often forbid it. It is true that once data is critical and is leveraged by many applications and business users, you have to keep some control and consistency over it. This is what a data warehouse does for an organization.
But, that doesn’t mean you can’t experiment with new sources freely using the technology that supports formal data warehouses.
A colleague of mine mentioned a conversation he had with a Hadoop user. That user was boasting about how he could with a single command change the data type of information on Hadoop, for instance, if it would help him more easily solve his next problem. My friend then asked him what would happen to the prior dozen or two processes that were built expecting the data to be in the original data type format. Wouldn’t they all then break? The user had a blank stare for a moment and then realized his error. As you develop more processes, you must implement security, consistency, and controls on the underlying data. This is why data warehousing – as Gartner defines it, is going to be around for a long time.
Q4. With the increased need of tools for combining data together, are we going to see a “federated”- Big Data architecture?
Bill Franks: A form of that is exactly what we are pursuing with Teradata’s Unified Data Architecture. Again – we refer to Gartner’s vision of the “Logical Data Warehouse.” What we are doing is putting in place a layer of architecture that connects multiple disparate data stores. This architecture includes – and connects – relational databases like Teradata and Oracle, discovery platforms like our Teradata Aster offering, Hadoop, and other platforms such as MongoDB. The idea is that we make information available to users about data throughout the ecosystem, not just the data on the platform they are operating from. So, I see a data dictionary that includes a “table” called “Sensor Feed.”
I can see the data elements available and write analytic logic against those elements. However, I don’t need to be aware of whether the data is a database table, or a Hadoop file, or is in MongoDB. Users can simply build analytics instead of worrying about where data resides, how to log on to various systems, and how to move data. We’ll handle that for them.
We are also beginning to push processing across the various platforms to optimize performance. Just like with a ‘table’ versus a ‘view’ in a database, making a process enterprise-ready might require moving data around the architecture permanently. But now, users are free to discover where that is required. And, the technical team behind the curtain can worry about the details just as they do with traditional data warehousing. We are very bullish on our approach and think we are well positioned to maintain our leadership position in the analytics space.
Q5. Teradata made several acquisitions lately. How do the tools that Teradata acquired fit the current Teradata Data Architectural Framework?
Bill Franks: I believe this in general was addressed. However, in addition I would point out that we acquired Revelytix in 2014 to obtain Loom: an open platform for discovering, profiling, preparing, and tracking data lineage for data in Hadoop. Likewise, we acquired Hadapt, which created a big data analytic platform natively integrating SQL with Apache Hadoop. Plus our recent RainStor acquisition strengthens Teradata’s enterprise-grade Hadoop solutions and enables organizations to add archival data store capabilities for their entire enterprise, including data stored in OLTP, data warehouses, and applications.
Q6. What are the key differentiators of the Teradata Database core architecture?
Bill Franks: As I said, the Teradata DW was differentiated from the start – uniquely architected for analytics from day one. However, I would add that Teradata continues to broaden our differentiation: we’ve built the best data orchestration software in the industry (Teradata Unity and QueryGrid). The orchestration software is key – because it enables our customers to choose a file system that they use to store the data in – and the analytics that they apply to that data independently — and marry them together with software.
It helps reduce the complexity of connecting to, accessing, understanding interfaces and getting value from multiple analytical systems. Another differentiator is Teradata Intelligent Memory, introduced two years ago. TIM is the world’s first extended memory technology beyond cache to increase query performance. Users can configure the exact amount of in-memory capability needed for critical workloads – based on temperature – hot or cold data. The list goes on. I would say that our data technology really does focus on how data is best used – and what proficient users need most.
Q7. Is SQL really the right language to handle Big Data Analytics?
Bill Franks: In some cases yes and in some cases no. We want users to be able to utilize whatever language or platform is best for any given task. There are many big data requirements that perfectly fit SQL and many that don’t. The key is enabling scalable access to the data and flexibility in approach. Most people are aware that there is a big effort to add a SQL interface to Hadoop. What most haven’t realized is how far we’ve also come the other direction. For some time, Teradata has allowed C and Java processing directly against our database platforms via User Defined Functions and other similar extensions. We are now also enabling other languages such as R and Python to be executed within a Teradata context. What is possible today is so far beyond what was possible even 5 or 10 years ago.
Q8. How do you see the adoption of Cloud for Analytics?
Bill Franks: We are aggressively rolling out our own cloud offerings across our product suites. Many of our enterprise customers also configure our products as a private cloud behind their firewall. Adoption will be mixed based on the type of data and nature of work being done. Anything involving sensitive data is still typically not allowed outside a firewall. If you think back to the issue raised in a prior question of having to be able to combine data for analytics, you can’t really have some data locked behind a firewall and some data locked outside it. The real driver behind the cloud is that people want flexible, pay on demand access to analysis platforms. We have multiple ways to provide that to our clients, of which our cloud offerings are only one option. We have some other novel pricing and licensing options the help customers get access to the resources they require for analytics.
Q9. What are the most important data challenges posed by the Internet of Things (IoT)?
Bill Franks: Perhaps the biggest challenge is that the IOT has the potential to generate orders of magnitude more data than any other source in existence today.
So, in the world of the IOT we will test the limits of ‘big.’ At the same time, much of the data generated by the IOT will have low value in the short term and no value in the long term. One of the biggest challenges will be determining which pieces of the information generated by a given sensor actually matters to your business and for how long. In the long run, it is likely that only a small fraction of the raw data produced by the IOT will be stored beyond a few moments of immediate usage. For example, why keep the sensor readings that help navigate my car into a tight parking spot? Once I’m safely in the spot, I really don’t ever need to revisit that data again. If I hit a car in front of me, I might make an exception and keep the data so that the cause can be identified.
Q10. Could you mention some successful Big Data projects you have recently completed with customers?
Bill Franks: We are seeing a lot of very interesting analytics come about. We’ve helped health organizations discover genetic patterns associated with disease, we’ve helped manufacturers reduce cost and increase customer satisfaction by building predictive maintenance algorithms, we’ve helped cable providers identify valuable consumer viewing habits.
I could go on and on. A great place to see some of the examples, and even hear from some of the companies and people behind it, is at our website.
Bill Franks is the Chief Analytics Officer for Teradata, where he provides insight on trends in the analytics and big data space and helps clients understand how Teradata and its analytic partners can support their efforts. His focus is to translate complex analytics into terms that business users can understand and work with organizations to implement their analytics effectively. His work has spanned many industries for companies ranging from Fortune 100 companies to small non-profits. Franks also helps determine Teradata’s strategies in the areas of analytics and big data.
Franks is the author of the book Taming The Big Data Tidal Wave (John Wiley & Sons, Inc., April, 2012). In the book, he applies his two decades of experience working with clients on large-scale analytics initiatives to outline what it takes to succeed in today’s world of big data and analytics. The book made Tom Peter’s list of 2014 “Must Read” books and also the Top 10 Most Influential Translated Technology Books list from CSDN in China.
Franks’ second book The Analytics Revolution (John Wiley & Sons, Inc., September, 2014) lays out how to move beyond using analytics to find important insights in data (both big and small) and into operationalizing those insights at scale to truly impact a business.
He is a faculty member of the International Institute for Analytics, founded by leading analytics expert Tom Davenport, and an active speaker who has presented at dozens of events in recent years. His blog, Analytics Matters, addresses the transformation required to make analytics a core component of business decisions.
Franks earned a Bachelor’s degree in Applied Statistics from Virginia Tech and a Master’s degree in Applied Statistics from North Carolina State University. More information is available here: http://www.bill-franks.com.
Follow ODBMS.org on Twitter: @odbmsorg