ODBMS Industry Watch: Trends and Information on Big Data, New Data Management Technologies, Data Science and Innovation.

On Technology Innovation, AI and IoT. Interview with Philippe Kahn
http://www.odbms.org/blog/2018/01/on-technology-innovation-ai-and-iot-interview-with-philippe-kahn/
Sat, 27 Jan 2018 18:59:01 +0000

“There is a lot of hype about the dangers of IoT and AI. It’s important to understand that nobody is building Blade-Runner style replicants.” — Philippe Kahn

I have interviewed Philippe Kahn. Philippe is a mathematician, well-known technology innovator, entrepreneur and founder of four technology companies: Fullpower Technologies, LightSurf Technologies, Starfish Software and Borland.


Q1. Twenty years ago, you spent about a year working on a Web-based infrastructure that you called Picture Mail. Picture Mail would do what we now call photo “sharing”. How come it took so long before the introduction of the iPhone, Snapchat, Instagram, Facebook Live and co.?

Philippe Kahn: Technology adoption takes time. We designed a system where a picture would be stored once and a link-back would be sent as a notification to thousands. That’s how Facebook and others function today. At the time, necessity created function: for wireless devices and the first Camera-Phones/Cellphone-Cameras, the bandwidth on cellular networks was 1200 baud at most and very costly. Today a picture or a video is shared once on Facebook and millions or billions can be notified. It’s exactly the same approach.
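
The store-once, notify-many pattern Kahn describes is easy to sketch. Below is a minimal, hypothetical Python illustration of the idea (the store, the link format and the notification function are invented for the example, not Picture Mail's or Facebook's actual code):

import hashlib

MEDIA_STORE = {}   # content-addressed blob store, standing in for cloud storage
SUBSCRIBERS = {}   # owner -> list of follower ids

def share_photo(owner, photo_bytes):
    # Store the picture exactly once, keyed by its content hash.
    key = hashlib.sha256(photo_bytes).hexdigest()
    MEDIA_STORE.setdefault(key, photo_bytes)
    link = "https://media.example.com/p/" + key   # hypothetical link-back URL
    # Notify every follower with the link, never the payload itself.
    for follower in SUBSCRIBERS.get(owner, []):
        send_notification(follower, owner + " shared a photo: " + link)
    return link

def send_notification(recipient, message):
    # Placeholder for SMS, push or e-mail delivery.
    print("to", recipient, ":", message)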

Q2. Do you have any explanation why established companies such as Kodak, Polaroid, and other camera companies (they all had wireless camera projects at that time) could not imagine that the future was digital photography inside the phone?

Philippe Kahn: Yes, I met with all of them and proposed our solution, to no avail. They had an established business and thought that it would never go away and that they could wait. They totally missed the paradigm shift. Paradigm shifts are challenges for any established player; look at the demise of Nokia for missing the smartphone.

Q3. What is your take on Citizen journalism?

Philippe Kahn: Citizen journalism is one of the pillars of future democracy. There is always someone snapping and pushing forward a different point of view. We see it every day around the world.

Q4. Do you really believe that people can’t hide things anymore?

Philippe Kahn: I think that people can’t hide what they do in public: Brutality, Generosity, Politics, Emotions. We all have a right to privacy. However in public, there is always someone snapping.

Q5. What about fake news?

Philippe Kahn: There is nothing new about Fake News. It’s always been around. What’s new is that with the web omnipresent, it’s much more effective. Add modern powerful editing and publishing tools and sometimes it’s very challenging to differentiate what’s real from what’s fake.

Q6. You told Bob Parks, who interviewed you for a Wired article in 2000: ‘In the future people will document crimes using video on their phones. Then everyone will know the real story.’ Has this really changed our world?

Philippe Kahn: Yes, it has. It’s forced policing, for example, to re-examine protocols. Of course not every act of violence or crime is captured, but video and photos are helping victims.

Q7. What are the challenges and opportunities in regions like Africa, where people don’t have laptops but have phones with cameras?

Philippe Kahn: The opportunities are great. Those countries are skipping the laptop and focusing on a Smartphone with a cloud infrastructure. That’s pretty much what I do daily. In fact, this is what I am doing as I am answering these questions.

Q8. Back to the future: you now live in the world of massive firehoses of machine data and AI-driven algorithms. How will these new technologies change the world (for better or for worse)?

Philippe Kahn: There are always two sides to everything: Even shoes can be used to keep me warm or march fascist armies across illegitimately conquered territories. The dangers of AI lie in police states and in a massive focus on an advertising business model. But what we do with AI is helping us find solutions for better sleep, diabetes, high blood pressure, cancer and more. We need to accept one to get the other in some ways.

Q9. In my recent interview with Vinton G. Cerf, he expressed great concerns about the safety, security and privacy of IoT devices. He told me: “A particularly bad scenario would have a hacker taking over the operating system of 100,000 refrigerators.”

Philippe Kahn: When we build AI-powered IoT solutions at Fullpower, security and privacy are paramount. We follow the strictest protocols. Security and privacy are at risk every day with computer viruses and hacking. Nothing is new. It’s always a game of cat and mouse. I want to believe that we are a great cat. We work hard at it.

Q10. With your new startup, Fullpower Technologies, you have developed under-the-mattress sensors and cloud-based artificial intelligence to gather data and personalize recommendations to help customers improve their sleep. What do you think of Cerf’s concerns and how can they be mitigated in practice?

Philippe Kahn: Vint’s concerns are legitimate. At Fullpower, our privacy, security and anonymity protocols are our #1 focus, together with quality, accuracy, reliability and repeatability. We think of what we build as a fortress. We’ve built in security, privacy, preventive maintenance, and automated secure troubleshooting.

Qx Anything else you wish to add?

Philippe Kahn: There is a lot of hype about the dangers of IoT and AI. It’s important to understand that nobody is building Blade-Runner style replicants. AI is very good at solving specialized challenges: Like being the best at playing chess, where the rules are clear and simple. AI can’t deal with general purpose intelligence that is necessary for a living creature to prosper. We are all using AI, Machine Learning, Deep Learning, Supervised Learning for simple and useful solutions.


Philippe Kahn is CEO of Fullpower, the creative team behind the AI-powered Sleeptracker IoT Smartbed technology platform and the MotionX Wearable Technology platform. Philippe is a mathematician, scientist, inventor, and the creator of the camera phone, whose original 1997 implementation is now with the Smithsonian in Washington, D.C.





Related Posts

– Internet of Things: Safety, Security and Privacy. Interview with Vint G. Cerf. ODBMS Industry Watch, 2017-06-11

– On Artificial Intelligence and Analytics. Interview with Narendra Mulani, ODBMS Industry Watch, 2017-12-08

Follow us on Twitter: @odbmsorg


Internet of Things: Safety, Security and Privacy. Interview with Vint G. Cerf
http://www.odbms.org/blog/2017/06/internet-of-things-safety-security-and-privacy-interview-with-vint-g-cerf/
Sun, 11 Jun 2017 17:06:03 +0000

” I like the idea behind programmable, communicating devices and I believe there is great potential for useful applications. At the same time, I am extremely concerned about the safety, security and privacy of such devices.” –Vint G. Cerf

I had the pleasure to interview Vinton G. Cerf. Widely known as one of the “Fathers of the Internet,” Cerf is the co-designer of the TCP/IP protocols and the architecture of the Internet. The main topic of the interview is the Internet of Things (IoT) and its challenges, especially the safety, security and privacy of IoT devices.
Vint is currently Chief Internet Evangelist for Google.

Q1. Do you like the Internet of Things (IoT)?

Vint Cerf: This question is far too general to answer. I like the idea behind programmable, communicating devices and I believe there is great potential for useful applications. At the same time, I am extremely concerned about the safety, security and privacy of such devices. Penetration and re-purposing of these devices can lead to denial of service attacks (botnets), invasion of privacy, harmful dysfunction, serious security breaches and many other hazards. Consequently the makers and users of such devices have a great deal to be concerned about.

Q2. Who is going to benefit most from the IoT?

Vint Cerf: The makers of the devices will benefit if they become broadly popular and perhaps even mandated to become part of the local ecosystem. Think “smart cities” for example. The users of the devices may benefit from their functionality, and from the information they provide that can be analyzed and used for decision-making purposes, for example. But see Q1 for concerns.

Q3. One of the most important requirements for collections of IoT devices is that they guarantee physical safety and personal security. What are the challenges from a safety and privacy perspective that the pervasive introduction of sensors and devices poses? (e.g. at home, in cars, hospitals, wearables and ingestibles, etc.)

Vint Cerf: Access control and strong authentication of parties authorized to access device information or control planes will be a primary requirement. The devices must be configurable to resist unauthorized access and use. Putting physical limits on the behavior of programmable devices may be needed or at least advisable (e.g., cannot force the device to operate outside of physically limited parameters).
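
Cerf’s last point, that a device should not be forceable outside its physical limits, can be made concrete with a small sketch. The following Python fragment assumes a hypothetical thermostat whose safe range is baked into firmware; the function and limit names are invented for the example:

SAFE_MIN_C = 5.0    # hard limits compiled into the firmware, not remotely configurable
SAFE_MAX_C = 35.0

def apply_setpoint(requested_c):
    # Never trust the requested value, even from an authenticated control plane.
    clamped = max(SAFE_MIN_C, min(SAFE_MAX_C, requested_c))
    if clamped != requested_c:
        log_event("setpoint %.1f C rejected, clamped to %.1f C" % (requested_c, clamped))
    set_heater_target(clamped)
    return clamped

def log_event(msg):
    print("audit:", msg)                      # stand-in for a tamper-evident audit log

def set_heater_target(value_c):
    print("heater target:", value_c, "C")     # stand-in for the hardware control call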

Q5. Consumers want privacy. With IoT physical objects in our everyday lives will increasingly detect and share observations about us. How is it possible to reconcile these two aspects?

Vint Cerf: This is going to be a tough challenge. Videocams that help manage traffic flow may also be used to monitor individuals or vehicles without their permission or knowledge, for example (cf: UK these days). In residential applications, one might want (insist on) the ability to disable the devices manually, for example. One would also want assurances that such disabling cannot be defeated remotely through the software.

Q6. Let’s talk more about security. It is reported that badly configured “smart devices” might provide a backdoor for hackers. What is your take on this?

Vint Cerf: It depends on how the devices are connected to the rest of the world. A particularly bad scenario would have a hacker taking over the operating system of 100,000 refrigerators. The refrigerator programming could be preserved but the hacker could add any of a variety of other functionality including DDOS capacity, virus/worm/Trojan horse propagation and so on.
One might want the ability to monitor and log the sources and sinks of traffic to/from such devices to expose hacked devices under remote control, for example. This is all a very real concern.

Q7. What measures can be taken to ensure a more “secure” IoT?

Vint Cerf: Hardware to inhibit some kinds of hacking (e.g. through buffer overflows) can help. Digital signatures on bootstrap programs checked by hardware to inhibit boot-time attacks. Validation of software updates as to integrity and origin. Whitelisting of IP addresses and identifiers of end points that are allowed direct interaction with the device.
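
Two of those measures, validating software updates and whitelisting the endpoints allowed to interact with a device, can be sketched in a few lines. This is only an illustration using the Python standard library; the expected digest and the allowed addresses are invented values, and a production device would normally verify a vendor signature rather than a bare hash:

import hashlib
import ipaddress

EXPECTED_SHA256 = "replace-with-vendor-published-digest"    # hypothetical value

ALLOWED_PEERS = {ipaddress.ip_address("192.0.2.10"),
                 ipaddress.ip_address("192.0.2.11")}        # hypothetical whitelist

def update_is_valid(firmware_bytes):
    # Integrity check before the image is ever written to flash.
    return hashlib.sha256(firmware_bytes).hexdigest() == EXPECTED_SHA256

def peer_is_allowed(peer_ip):
    # Whitelist check on every inbound connection.
    return ipaddress.ip_address(peer_ip) in ALLOWED_PEERS

candidate = b"...firmware image bytes..."   # placeholder payload
if peer_is_allowed("192.0.2.10") and update_is_valid(candidate):
    print("update accepted")
else:
    print("update rejected, discarding image")   # the placeholder digest fails, as it should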

Q8. Is there a danger that IoT evolves into a possible enabling platform for cyber-criminals and/or for cyber war offenders?

Vint Cerf: There is no question this is already a problem. The Dyn DDoS attack was launched by a botnet of webcams that were readily compromised because they had no access controls or used well-known usernames and passwords. This is the reason that companies must feel great responsibility and be provided with strong incentives to limit the potential for abuse of their products.

Q9. What are your personal recommendations for a research agenda and policy agenda based on advances in the Internet of Things?

Vint Cerf: Better hardware reinforcement of access control and use of the IOT computational assets. Better quality software development environments to expose vulnerabilities before they are released into the wild. Better software update regimes that reduce barriers to and facilitate regular bug fixing.

Q10. The IoT is still very much a work in progress. How do you see the IoT evolving in the near future?

Vint Cerf: Chaotic “standardization” with many incompatible products on the market. Many abuses by hackers. Many stories of bugs being exploited or serious damaging consequences of malfunctions. Many cases of “one device, one app” that will become unwieldy over time. Dramatic and positive cases of medical monitoring that prevents serious medical harms or signals imminent dangers. Many experiments with smart cities and widespread sensor systems.
Many applications of machine learning and artificial intelligence associated with IOT devices and the data they generate. Slow progress on common standards.

Vinton G. Cerf co-designed the TCP/IP protocols and the architecture of the Internet and is Chief Internet Evangelist for Google. He is a member of the National Science Board and National Academy of Engineering and Foreign Member of the British Royal Society and Swedish Royal Academy of Engineering, and Fellow of ACM, IEEE, AAAS, and BCS.
Cerf received the US Presidential Medal of Freedom, US National Medal of Technology, Queen Elizabeth Prize for Engineering, Prince of Asturias Award, Japan Prize, ACM Turing Award, Legion d’Honneur and 29 honorary degrees.


European Commission, Internet of Things Privacy & Security Workshop’s Report, 10/04/2017

Securing the Internet of Things. US Homeland Security, November 16, 2016

Related Posts

Social and Ethical Behavior in the Internet of Things. By Francine Berman, Vinton G. Cerf. Communications of the ACM, Vol. 60 No. 2, Pages 6-7, February 2017

Security in the Internet of Things, McKinsey & Company, May 2017

Interview to Vinton G. Cerf. ODBMS Industry Watch, July 27, 2009

Five Challenges to IoT Analytics Success. By Dr. Srinath Perera. ODBMS.org, September 23, 2016

Follow us on Twitter: @odbmsorg


Democratizing the use of massive data sets. Interview with Dave Thomas.
http://www.odbms.org/blog/2016/09/democratizing-the-use-of-massive-data-sets-interview-with-dave-thomas/
Mon, 12 Sep 2016 19:04:14 +0000

“Any important data driving a business decision needs to be sanity checked, just as it would if one was using a spreadsheet.”–Dave Thomas.

I have interviewed Dave Thomas, Chief Scientist at Kx Labs.


Q1. For many years business users have had their data locked up in databases and data warehouses. What is wrong with that?

Dave Thomas: It isn’t so much an issue of where the data resides, whether it is in files, databases, data warehouses or a modern data lake. The challenge is that modern businesses need access to the raw data, as well as the ability to rapidly aggregate and analyze their data.

Q2. Typical business intelligence (BI) tool users have never seen their actual data. Why?

Dave Thomas: For large corporations hardware and software both used to be prohibitively expensive, hence much of their data was aggregated prior to making it available to users. Even today when machines are very inexpensive most corporate IT infrastructures are impoverished relative to what one can buy on the street or in the Cloud.
Compounding the problem, IT charge-back mechanisms are biased to reduce IT spending rather than to maximize the value of data delivered to the business.
Traditional technologies are not sufficiently performant to allow processing of large volumes of data.
Many companies have inexpensive data lakes and have realized after the fact that using commodity storage systems, such as HDFS, has severely constrained their performance and limited their utility. Hence more corporations are moving data away from HDFS into high-performance storage or memory.

Q3. What are the limitations of the existing BI and extract, transform and load (ETL) data tools?

Dave Thomas: Traditional BI tools assume that it is possible for DBAs and BI experts to a priori define the best way to structure and query the data. This reduces the whole power of BI to mere reporting. In an attempt to deal with huge BI backlogs, generic query and reporting tools have become popular to shift reporting to self-serve. However, they are often designed for sophisticated BI users rather than for normal business users. They are often not performant because they depend on the implementation of the underlying data stores.
For the most part, existing ETL tools are constrained by having to move the data to the ETL process and then on to the end user. Many ETL tools only work against one kind of data source. ETL can’t be written by normal users and due to the cost of an incorrect ETL run, such tools are not available to the data analyst. One of the major topics of discussion in Big Data shops is the complexity and performance of their Big Data pipeline. ETL, data blending, shouldn’t be a separate process or product. It should be something one can do with queries in a single efficient data language.

Q4. What are the typical technical challenges in finance, IoT and other time-series applications?

Dave Thomas:
1. Speed, as data volumes and variety are always increasing.
2. Ability to deal with both real-time events and historical events efficiently. Ideally in a single technology.
3. To handle time-series one needs to be able to deal with simultaneous arrival of events. Time with nanosecond precision is our solution. Other solutions are constrained by using milliseconds and event counters that are much less efficient.
4. High-performance operations on time, over days, months and years are essential for time-series. This is why time is a native type in Kx.
5. The essence of time-series is processing sliding time windows of data for both joins and aggregations (a minimal sketch follows after this list).
6. In IoT, data is always dirty. Kx’s native support for missing data and out-of-band data due to failing sensors allows one to deal with the realities of sensor data.
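
As a rough illustration of the sliding-window point in item 5, here is a minimal pure-Python sketch (not Kx/q code) that maintains a moving average over the last N seconds of a stream of timestamped readings:

from collections import deque

class SlidingWindowMean:
    # Mean of all readings that arrived within the last window_s seconds.
    def __init__(self, window_s):
        self.window_s = window_s
        self.events = deque()    # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, ts, value):
        self.events.append((ts, value))
        self.total += value
        # Evict readings that have slid out of the window.
        while self.events and self.events[0][0] <= ts - self.window_s:
            _, old = self.events.popleft()
            self.total -= old
        return self.total / len(self.events)

w = SlidingWindowMean(window_s=60)
for ts, v in [(0, 10.0), (30, 20.0), (61, 30.0)]:
    print(ts, w.add(ts, v))   # at ts=61 the reading from ts=0 has been evicted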

Q5. Kx offers analysts a language called q. Why not extend standard SQL?

Dave Thomas: I think there is a misunderstanding about q. Q is a full functional data language that both includes and extends SQL. Selects are easier than in SQL because they provide implicit joins and group-bys. This makes queries roughly 50% of the code of the SQL equivalent. Unlike many flavors of SQL, q lets one put a functional expression in any position in an SQL statement. One can easily extend the aggregation operations available to the end-user.

Q6. Can you show the difference between a query written in q and in standard SQL?

Dave Thomas: Here’s an example of retrieving parts from an orders table with a foreign key join to a parts table, summing by quantity and then sorting by color:

In q:

select sum qty by p.color from sp

In SQL:

select p.color, sum(sp.qty) from sp, p
where sp.p=p.p group by p.color order by color

Q7. How do queries execute inside the database?

Dave Thomas: Q is native to the database engine. Hence queries and analytics execute in the columns of the Kx database. There is no data shipping between the client and database server.

Q8. Shawn Rogers of Dell said: “A ‘citizen data scientist’ is an everyday, non-technical user that lacks the statistical and analytical prowess of a traditional data scientist, but is equally eager to leverage data in order to uncover insights, and importantly, do so at the speed of business.” What is your take on this?

Dave Thomas: High-performance data technologies, such as Kx, using modern large-memory hardware, can support data analyst as well as data scientist queries. In the product Analyst for Kx, for example, users can work interactively on a sample of data using visual tools to import, clean, query, transform, analyze and visualize data with minimal, if any, programming or even SQL. Given correct operations on one or more samples, these can then be run against trillions of rows of data. Data analysts today can truly live in their data.

Q9. What are the risks of bringing the power of analytics to users who are non-expert programmers?

Dave Thomas: Clearly any important analysis needs to be validated and cross-checked. Hence any important data driving a business decision needs to be sanity checked, just as it would if one was using a spreadsheet.
In our experience users do make initial mistakes, but as they live in their data they quickly learn.
Visualization really helps, as does the provision of metadata about the data sources. Reducing the cycle time provides increased understanding, and allows one to make mistakes.
Runaway query performance has been a concern of DBAs, but for many years frameworks have been in place such as our smart query router that will ensure that ad hoc queries against massive datasets are throttled so they don’t run away. Fortunately, recent cost reductions in non-volatile memory make it possible to have high-performance query-only replicas of data that can be made available to different parts of the organization based on its needs.

Q10. How can non-expert programmers understand if the information expressed in visual analytics such as heat maps or in operational dashboard charts, is of good quality or not?

Dave Thomas: In our experience users spot visual anomalies much faster than inconsistencies in a spreadsheet.

Q11. What are the opportunities arising in “democratizing” the use of massive data sets?

Dave Thomas: We are finally living in a world where for many companies it is possible to run a real-time business where everyone can have fast, efficient access to the data they need. Rather than being held hostage to aggregations, spreadsheets and all sorts of variants of the truth, the organization can expediently see new opportunities to improve results in sales, marketing, production and other business operations.

Q12. How important is data query and data semantics?

Dave Thomas: Unfortunately we are not educated on how to express data semantics and data query.
Even computer scientists often study more about how to execute queries efficiently than about how to write them.
We need to educate students and employees on how to live in their data. It may well be that the future of programming for most will be writing queries. Given powerful data languages even compiler optimizations can be expressed by queries.
We need to invest much more in data governance and the use of standard terminology in order to share data within and across companies.

Dave Thomas, Kx Labs.
As Chief Scientist Dave envisions the future roadmap for Kx tools. Dave has had a long and storied career in computer software development and is perhaps best known as the founder and past CEO of Object Technology International, formerly OTI, now IBM OTI Labs, a pioneer in Agile Product Development. He was the principal visionary and architect for IBM VisualAge Smalltalk and Java tools and virtual machines including the popular open-source, multi-language Eclipse.org IDE. As the cofounder of Bedarra Research Labs he led the creation of the Ivy visual analytics workbench. Dave is a renowned speaker, university lecturer and Chairman of the Australian developer YOW! conferences.


New Kx release includes encryption, enhanced compression and Tableau integration. ODBMS.org JULY 4, 2016.

Resources for learning more about kdb+ and q benchmarking results.

Kdb+ and the Internet of Things/Big Data. InDetail Paper by Bloor Research Author: Philip Howard. ODBMS.org- JANUARY 28, 2015

Related Posts

Democratizing fast access to Big Data. By Dave Thomas, chief scientist at Kx Labs. ODBMS.org-April 26, 2016

On Data Governance. Interview with David Saul. ODBMS Industry Watch, Published on 2016-07-23

On the Challenges and Opportunities of IoT. Interview with Steve Graves. ODBMS Industry Watch, Published on 2016-07-06

On Data Analytics and the Enterprise. Interview with Narendra Mulani. ODBMS Industry Watch, Published on 2016-05-24

Follow us on Twitter: @odbmsorg


On the Challenges and Opportunities of IoT. Interview with Steve Graves
http://www.odbms.org/blog/2016/07/on-the-challenges-and-opportunities-of-iot-interview-with-steve-graves/
Wed, 06 Jul 2016 09:00:29 +0000

“Assembling a team with the wide range of skills needed for a successful IoT project presents an entirely different set of challenges. The skills needed to build a ‘thing’ are markedly different than the skills needed to implement the data analytics in the cloud.”–Steve Graves.

I have interviewed Steve Graves, co-founder and CEO of McObject. The main topic of the interview is the Internet of Things and how it relates to databases.


Q1. What are in your opinion the main Challenges and Opportunities of the Internet of Things (IoT) seen from the perspective of a database vendor?

Steve Graves: Let’s start with the opportunities.

When we started McObject in 2001, we chose “eXtremeDB, the embedded database for intelligent, connected devices” as our tagline. eXtremeDB was designed from the get-go to live in the “things” comprising what the industry now calls the Internet of Things. The popularization of this term has created a lot of visibility and, more importantly, excitement and buzz for what was previously viewed as the relatively boring “embedded systems.” And that creates a lot of opportunities.

A lot of really smart, creative people are thinking of innovative ways to improve our health, our workplace, our environment, our infrastructure, and more. That means new opportunities for vendors of every component of the technology stack.
The challenges are manifold, and I can’t begin to address all of them. The media is largely fixated on security, which itself is multi-dimensional.
We can talk about protecting IoT-enabled devices (e.g. your car) from being hacked. We can talk about protecting the privacy of your data at rest. And we can talk about protecting the privacy of data in motion.
Every vendor needs to recognize the importance of security. But it isn’t enough for a vendor, like McObject, to provide the features to secure the target system; the developer that assembles the stack along with their own proprietary technology to create an IoT solution needs to use the available security features, and use them correctly.

After security, scaling IoT systems is the next big challenge. It’s easy enough to prototype something.
But careful planning is needed to leap from prototype to full-blown deployment. Obvious decisions have to be made about connectivity and necessary bandwidth, how many things per gateway, one tier of gateways or more, and how much compute capacity is needed in the cloud. Beyond that, there are less obvious decisions to be made that will affect scalability, like making sure the DBMS used on devices and/or gateways is able to handle the workload (e.g. that the gateway DBMS can scale from 10 input streams to 100 input streams); determining how to divide the analytics workload between gateways and the cloud; and ensuring that the gateway, its DBMS and its communication stack can stream data to the cloud while simultaneously processing its own input streams and analytics.
Assembling a team with the wide range of skills needed for a successful IoT project presents an entirely different set of challenges. The skills needed to build a ‘thing’ are markedly different than the skills needed to implement the data analytics in the cloud. In fact, ‘things’ are usually very much like good ol’ embedded systems, and system engineers that know their way around real-time/embedded operating systems, JTAG debuggers, and so on, have always been at a premium.

Q2. Data management for the IoT: What are the main differences between data management in field-deployed devices and at aggregation points?

Steve Graves: Quite simply: scale. A field-deployed device (or a gateway to field-deployed devices that do not, themselves, have any data management need or capability) has to manage a modest amount of data. But an aggregation point (the cloud being the most obvious example) has to manage many times more data – possibly orders of magnitude more.
At the same time, I have to say that they might not be all that different. Some IoT systems are going to be closed, meaning the nature of the things making up the system is known, and these won’t require much scaling. For example, a building automation system for a small- to mid-size building would have perhaps 100s of sensors and 10s of gateways, and may (or may not) push data up to a central aggregation point. If there are just 10s of gateways, we can create a UI that connects to the database on each gateway where each database is one shard of a single logical database, and execute analytics against that logical database without any need of a central aggregation point. We can extend this hypothetical case to a campus of buildings, or to a landlord with many buildings in a metropolitan area, and then a central aggregation point makes sense.

But the database system would not necessarily be different, only the organization of the physical and logical databases.
The gateways of each building would stream to a database server in the cloud. In the case of 10 buildings, we could have 10 database servers in the cloud that represent 10 shards of that logical database in the cloud. This architecture allows for great scalability. The landlord acquires another building? Great, stand up another database server and the UI connects to 11 shards instead of 10. In this scenario, database servers are software, not hardware. For the numbers we’re talking about (10 or 11 buildings), it could easily be handled by a single hardware server of modest ability.
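
The shard-per-building pattern described above can be sketched generically. The snippet below uses SQLite from Python purely as a stand-in for the per-building database servers; the schema and the idea of summing a metric across shards are assumptions made for the example, not details of any particular eXtremeDB deployment:

import sqlite3

def open_shards(paths):
    # One connection per building; each database is one shard of the logical database.
    return [sqlite3.connect(p) for p in paths]

def total_energy_kwh(shards):
    # Fan the same query out to every shard and combine the partial results.
    total = 0.0
    for conn in shards:
        row = conn.execute("SELECT COALESCE(SUM(kwh), 0) FROM meter_readings").fetchone()
        total += row[0]
    return total

shards = open_shards([":memory:", ":memory:"])
for i, conn in enumerate(shards):
    conn.execute("CREATE TABLE meter_readings (sensor_id TEXT, kwh REAL)")
    conn.execute("INSERT INTO meter_readings VALUES (?, ?)", ("s%d" % i, 10.5 * (i + 1)))
print(total_energy_kwh(shards))   # 10.5 + 21.0 = 31.5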

At the other end of the scale (pun intended) are IoT systems that are wide open. By that, I mean the creators are not able to anticipate the universe of “things” that could be connected, or their quantity. In the first case (unanticipated kinds of things), the database system should be able to ingest data that was heretofore unknown. This argues for a NoSQL database system, i.e. a database system that is schema-less. In this scenario, the database system on field-deployed devices is probably radically different from the database system in the cloud. Field-deployed devices are purpose-specific, so A) they don’t need and wouldn’t benefit from a NoSQL database system, and B) most NoSQL database systems are too resource-hungry to reside on embedded device nodes.

Q3. If we look at the characteristics of a database system for managing device-based data in the IoT, how do they differ from the characteristics of a database system (typically deployed on a server) for analyzing the “big data” generated by myriad devices?

Steve Graves: Again, let’s recognize that field-deployed devices in the IoT are classic embedded systems. In practical terms, that means relatively modest hardware like an ARM, MIPS, PowerPC or Atom processor running at 100s of megahertz, or perhaps 1 ghz if we’re lucky, and with only enough memory to perform its function. Further, it may require a real-time operating system, or at least an embedded operating system that is less resource hungry than a full-on Linux distro. So, for a database system to run in this environment, it will need to have been designed to run in this environment. It isn’t practical to try to shoehorn in a database system that was written on the assumption that CPU cycles and memory are abundant. It may also be the case that the device has little-to-no persistent storage, which mandates an in-memory database.

So a database system for a field-deployed device is going to
1. have a small code size
2. use little stack
3. preferably, allocate no heap memory
4. have no, or minimal, external dependencies (e.g. not link in an extra 1 MB of code from the C run-time library)
5. have built-in ability to replicate data (to a gateway or directly to the cloud)
a. Replication should be “open”, meaning be able to replicate to a different database system
6. Have built-in security features

7. Nice to have:
a. built-in analytics to aggregate data prior to replicating it
b. ability to define the schema
c. ability to operate entirely in memory

A database system for the cloud might benefit from being schema-less, as described previously. It should certainly have pretty elastic scalability. Servers in the cloud are going to have ample resources and robust operating systems. So a database system for the cloud doesn’t need to have a small code size, use a small amount of stack memory, or worry about external dependencies such as the C run-time library. On the contrary, a database system for the cloud is expected to do much more (handle data at scale, execute analytics, etc.) and will, therefore, need ample resources. In fact, this database system should be able to take maximum advantage of the resources available, including being able to scale horizontally (across cores, CPUs, and servers).
In summary, the edge (device-based) DBMS needs to operate in a constrained environment. A cloud DBMS needs to be able to effectively and efficiently utilize the ample resources available to it.

Q4. Why is the ability to define a database schema important (versus a schema-less DBMS, aka NoSQL) for field-deployed devices?

Steve Graves: Field-deployed devices will normally perform a few specific functions (sometimes, just one function). For example, a building automation system manages HVAC, lighting, etc. A livestock management system manages feed, output, and so on. In such systems, the data requirements are well known. The hallmark NoSQL advantage of being able to store data without predefining its structure is unwarranted. The other purported hallmark of NoSQL is horizontal scalability, but this is not a need for field-deployed devices.
Walking away from the relational database model (and its implicit use of a database schema) has serious implications.
A great deal of scientific knowledge has been amassed around the relational database model over the last few decades, and without it developers are completely on their own with respect to enforcing sound data management practices.

In the NoSQL sphere, there is nothing comparable to the relational model (e.g. E.F. Codd’s work) and the mathematical foundation (relational calculus) underpinning it.
There should be overwhelming justification for a decision to not use relational.
In my experience, that justification is absent for data management of field-deployed devices.
A database system that “knows” the data design (via a schema) can more intelligently manage the data. For example, it can manage constraints, domain dependencies, events and much more. And some of the purported inflexibility imposed by a schema can be eliminated if the DBMS supports dynamic DDL (see more details on this in the answer to question Q6, below).

Q5. In your opinion, do IoT aggregation points resemble data lakes?

Steve Graves: The term data lake was originally conceived in the context of Hadoop and map-reduce functionality. In more recent times, the meaning of the term has morphed to become synonymous with big data, and that is how I use the term. Insofar as a gateway can also be an aggregation point, I would not say ‘aggregation points resemble data lakes’ because gateway aggregation points, in all likelihood, will not manage Big Data.

Q6. What are the main technical challenges for database systems used to accommodate new and unforeseen data, for example when a new type of device begins streaming data?

Steve Graves: The obvious challenges are
1. The ability to ingest new data that has a previously unknown structure
2. The ability to execute analytics on #1
3. The ability to integrate analytics on #1 with analytics on previously known data

#1 is handled well by NoSQL DBMSs. But, it might also be handled well by an RDBMS via “dynamic DDL” (dynamic data definition language), e.g. the ability to execute CREATE TABLE, ALTER TABLE, and/or CREATE INDEX statements against an existing database.
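
As a simplified illustration of dynamic DDL, the snippet below uses SQLite from Python as a generic stand-in relational engine, creating and then extending a table at runtime as a previously unknown device type starts streaming; the table and column names are hypothetical:

import sqlite3

db = sqlite3.connect(":memory:")

# A new device type appears: create a table for it on the fly.
db.execute("CREATE TABLE IF NOT EXISTS thermostat_readings ("
           "device_id TEXT, ts INTEGER, temperature REAL)")

# A firmware update adds a humidity field: extend the schema in place.
db.execute("ALTER TABLE thermostat_readings ADD COLUMN humidity REAL")

# Index the stream so analytics over it stay fast.
db.execute("CREATE INDEX idx_thermo_ts ON thermostat_readings (ts)")

db.execute("INSERT INTO thermostat_readings VALUES (?, ?, ?, ?)",
           ("dev-42", 1700000000, 21.5, 40.0))
print(db.execute("SELECT * FROM thermostat_readings").fetchall())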
To efficiently execute analytics against any data, the structure of the data must eventually be understood.
RDBMS handle this through the database dictionary (the binary equivalent of the data definition language).
But some NoSQL DBMSs handle this through different meta data. For example, the MarkLogic DBMS uses JSON metadata to understand the structure of documents in its document store.
NoSQL DBMSs with no meta data whatsoever put the entire burden on the developers. In other words, since the data is opaque to the DBMS, the application code must read and interpret the content.

Q7. Client/server DBMS architecture vs. in-process DBMSs: which one is more suitable for IoT?

Steve Graves: For edge DBMSs (on constrained devices), an in-process architecture will be more suitable. It requires fewer resources than a client/server architecture, and imposes less latency through elimination of inter-process communication. For cloud DBMSs, a client/server architecture will be more suitable. In the cloud environment, resources are not scarce, and the advantage of being able to scale horizontally will outweigh the added latency associated with client/server.

Qx Anything else you wish to add?

Steve Graves: We feel that eXtremeDB is uniquely positioned for the Internet of Things. Not only have devices and gateways been in eXtremeDB’s wheelhouse for 15 years with over 25 million real world deployments, but the scalability, time series data management, and analytics built into the eXtremeDB server (big data) offering make it an attractive cloud database solution as well. Being able to leverage a single DBMS across devices, gateways and the cloud has obvious synergistic advantages.

Steve Graves is co-founder and CEO of McObject, a company specializing in embedded Database Management System (DBMS) software. Prior to McObject, Steve was president and chairman of Centura Solutions Corporation and vice president of worldwide consulting for Centura Software Corporation.


Big Data, Analytics, and the Internet of Things, by Mohak Shah, analytics leader and research scientist at Bosch Research, USA. ODBMS.org, APRIL 6, 2015

Privacy considerations & responsibilities in the era of Big Data & Internet of Things, by Ramkumar Ravichandran, Director, Analytics, Visa Inc. ODBMS.org, January 8, 2015.

Securing Your Largest USB-Connected Device: Your Car, by Shomit Ghose, General Partner, ONSET Ventures. ODBMS.org, MARCH 31, 2016.

eXtremeDB Financial Edition DBMS Sweeps Records in Big Data Benchmark. ODBMS.org, JULY 2, 2016

eXtremeDB in-memory database

User Experience Design for the Internet of Things

Related Posts

On the Internet of Things. Interview with Colin Mahony. ODBMS Industry Watch, Published on 2016-03-14

A Grand Tour of Big Data. Interview with Alan Morrison. ODBMS Industry Watch, Published on 2016-02-25

On the Industrial Internet of Things. Interview with Leon Guzenda. ODBMS Industry Watch, January 28, 2016

Follow us on Twitter: @odbmsorg


On Data Interoperability. Interview with Julie Lockner.
http://www.odbms.org/blog/2016/06/on-data-interoperability-interview-with-julie-lockner/
Tue, 07 Jun 2016 16:47:14 +0000

“From a healthcare perspective, how can we aggregate all the medical data, in all forms from multiple sources, such as wearables, home medical devices, MRI images, pharmacies and so on, and also blend in intelligence or new data sources, such as genomic data, so that doctors can make better decisions at the point of care?”– Julie Lockner.

I have interviewed Julie Lockner. Julie leads data platform product marketing for InterSystems. The main topics of the interview are Data Interoperability and InterSystems’ data platform strategy.


Q1. Everybody is talking about Big Data — is the term obsolete?

Julie Lockner: Well, there is no doubt that the sheer volume of data is exploding, especially with the proliferation of smart devices and the Internet of Things (IoT). An overlooked aspect of IoT is the enormous volume of data generated by a variety of devices, and how to connect, integrate and manage it all.

The real challenge, though, is not just processing all that data, but extracting useful insights from the variety of device types. Put another way, not all data is created using a common standard. You want to know how to interpret data from each device, know which data from what type of device is important, and which trends are noteworthy. Better information can create better results when it can be aggregated and analyzed consistently, and that’s what we really care about. Better, higher quality outcomes, not bigger data.

Q2. If not Big Data, where do we go from here?

Julie Lockner: We always want to be focusing on helping our customers build smarter applications to solve real business challenges, such as helping them to better compete on service, roll out high-quality products quicker, simplify processes – not build solutions in search of a problem. A canonical example is in retail. Our customers want to leverage insight from every transaction they process to create a better buying experience online or at the point of sale. This means being able to aggregate information about a customer, analyze what the customer is doing while on the website, and make an offer at transaction time that would delight them. That’s the goal – a better experience – because that is what online consumers expect.

From a healthcare perspective, how can we aggregate all the medical data, in all forms from multiple sources, such as wearables, home medical devices, MRI images, pharmacies and so on, and also blend in intelligence or new data sources, such as genomic data, so that doctors can make better decisions at the point of care? That implies we are analyzing not just more data, but better data that comes in all shapes and sizes, and that changes more frequently. It really points to the need for data interoperability.

Q3. What are the challenges software developers are telling you they have in today’s data-intensive world?

Julie Lockner: That they have too many database technologies to choose from and prefer to have a simple data platform architecture that can support multiple data models and multiple workloads within a single development environment.
We understand that our customers need to build applications that can handle a vast increase in data volume, but also a vast array of data types – static, non-static, local, remote, structured and non-structured. It must be a platform that coalesces all these things, brings services to data, offers a range of data models, and deals with data at any volume to create a more stable, long-term foundation. They want all of these capabilities in one platform – not a platform for each data type.

For software developers today, it’s not enough to pick elements that solve some aspect of a problem and build enterprise solutions around them; not all components scale equally. You need a common platform without sacrificing scalability, security, resilience, rapid response. Meeting all these demands with the right data platform will create a successful application.
And the development experience is significantly improved and productivity drastically increased when they can use a single platform that meets all these needs. This is why they work with InterSystems.

Q4. Traditionally, analytics is used with structured data, “slicing and dicing” numbers. But the traditional approach also involves creating and maintaining a data warehouse, which can only provide a historical view of data. Does this work also in the new world of Internet of Things?

Julie Lockner: I don’t think so. It is generally possible to take amorphous data and build it into a structured data model, but to respond effectively to rapidly changing events, you need to be able to take data in the form in which it comes to you.

If your data platform lacks certain fields, or if you lack a schema definition, you need to be able to capitalize on all these forms of data without generating a static model or a refinement process. With a data warehouse approach, it can take days or weeks to create fully cleansed, normalized data.
That’s just not fast enough in today’s always-on world – especially as machine-generated data is not conforming to a common format any time soon. It comes back to the need for a data platform that supports interoperability.

Q5. How hard is it to make decisions based on real-time analysis of structured and unstructured data?

Julie Lockner: It doesn’t have to be hard. You need to generate rules that feed rules engines that, in turn, drive decisions, and then constantly update those rules. That is a radical enhancement of the concept of analytics in the service of improving outcomes, as more real-time feedback loops become available.
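
The rules-engine idea can be illustrated with a toy sketch in Python (not InterSystems code; the thresholds are invented for the example). The point is that the rule set is data, so it can be replaced as real-time feedback arrives, without redeploying the application:

rules = [
    (lambda r: r["heart_rate"] > 120, "alert: elevated heart rate"),
    (lambda r: r["spo2"] < 90,        "alert: low oxygen saturation"),
]

def evaluate(reading, ruleset):
    # Return the action for every rule whose predicate matches the reading.
    return [action for predicate, action in ruleset if predicate(reading)]

print(evaluate({"heart_rate": 130, "spo2": 95}, rules))
# Later, a refined threshold replaces the old rule in place.
rules[0] = (lambda r: r["heart_rate"] > 110, "alert: elevated heart rate")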

The collection of changes we describe as Big Data will profoundly transform enterprise applications of the future. Today we can see the potential to drive business in new ways and take advantage of a convergence of trends, but it is not happening yet. Where progress has been made is the intelligence of devices and first-level data aggregation, but not in the area of services that are needed. We’re not there yet.

Q6. What’s next on the horizon for InterSystems in meeting the data platform requirements of this new world?

Julie Lockner: We continually work on our data platform, developing the most innovative ways we can think of to integrate with new technologies and new modes of thinking. Interoperability is a hugely important component. It may seem a simple task to get to the single most pertinent fact, but the means to get there may be quite complex. You need to be able to make the right data available – easily – to construct the right questions.

Data is in all forms and at varying levels of completeness, cleanliness, and accuracy. For data to be consumed as we describe, you need measures of how well you can use it. You need to curate data so it gets cleansed and you can cull what is important. You need flexibility in how you view data, too. Gathering data without imposing an orthodoxy or structure allows you to gain access to more data. Not all data will conform to a schema a priori.

Q7. Recently you conducted a benchmark test of an application based on InterSystems Caché®. Could you please summarize the main results you have obtained?

Julie Lockner: One of our largest customers is Epic Systems, one of the world’s top healthcare software companies.
Epic relies on Caché as the data platform for electronic medical record solutions serving more than half the U.S. patient population and millions of patients worldwide.

Epic tested the scalability and performance improvements of Caché version 2015.1. Almost doubling the scalability of prior versions, Caché delivers what Epic President Carl Dvorak has described as “a key strategic advantage for our user organizations that are pursuing large-scale medical informatics programs as well as aggressive growth strategies in preparation for the volume-to-value transformation in healthcare.”

Qx Anything else you wish to add?

Julie Lockner: The reason why InterSystems has succeeded in the market for so many years is a commitment to the success of those who depend on our technology. A recent Gartner Magic Quadrant report found we had the highest number of customers surveyed – 85% – who would buy from us again. That is the highest number of any vendor participating in that study.

The foundation of the company’s culture is all about helping our customers succeed. When our customers come to us with a challenge, we all pitch in to solve it. Many times our solutions may address an unusual problem that could benefit others – which then becomes the source of many of our innovations. It is one of the ways we are using problem-solving skills as a winning strategy to benefit others. When our customers are successful at using our engine to solve the world’s most important challenges, we all win.


Julie Lockner leads data platform product marketing for InterSystems. She has more than 20 years of experience in IT product marketing management and technology strategy, including roles at analyst firm ESG as well as Informatica and EMC.



“InterSystems Unveils Major New Release of Caché,” Feb. 25, 2015.

“Gartner Magic Quadrant for Operational DBMS,” Donald Feinberg, Merv Adrian, Nick Heudecker, Adam M. Ronthal, and Terilyn Palanca, October 12, 2015, ID: G00271405.

– White Paper: Big Data Healthcare: Data Scalability with InterSystems Caché® and Intel® Processors (LINK to .PDF)

Related Posts

– A Grand Tour of Big Data. Interview with Alan Morrison. ODBMS Industry Watch, February 25, 2016

–  RIP Big Data. By Carl Olofson, Research Vice President, Data Management Software Research, IDC. ODBMS.org, JANUARY 6, 2016.

– What is data blending. By Oleg Roderick, David Sanchez, Geisinger Data Science. ODBMS.org, November 2015

Follow us on Twitter: @odbmsorg


Challenges and Opportunities of The Internet of Things. Interview with Steve Cellini
http://www.odbms.org/blog/2015/10/challenges-and-opportunities-of-the-internet-of-things-interview-with-steve-cellini/
Wed, 07 Oct 2015 00:01:17 +0000

“The question of ‘who owns the data’ will undoubtedly add requirements on the underlying service architecture and database, such as the ability to add meta-data relationships representing the provenance or ownership of specific device data.”–Steve Cellini

I have interviewed Steve Cellini, Vice President of Product Management at NuoDB. We covered the challenges and opportunities of The Internet of Things, seen from the perspective of a database vendor.


Q1. What are in your opinion the main Challenges and Opportunities of The Internet of Things (IoT) seen from the perspective of a database vendor?

Steve Cellini: Great question. With the popularity of Internet of Things, companies have to deal with various requirements, including data confidentiality and authentication, access control within the IoT network, privacy and trust among users and devices, and the enforcement of security and privacy policies. Traditional security counter-measures cannot be directly applied to IoT technologies due to the different standards and communication stacks involved. Moreover, the high number of interconnected devices leads to scalability issues; therefore a flexible infrastructure is needed to be able to deal with security threats in such a dynamic environment.

If you think about IoT from a data perspective, you’d see these characteristics:
• Distributed: lots of data sources, and consumers of workloads over that data are cross-country, cross-region and worldwide.
• Dynamic: data sources come and go, data rates may fluctuate as sets of data are added, dropped or moved into a locality. Workloads may also fluctuate.
• Diverse: data arrives from different kinds of sources
• Immediate: some workloads, such as monitoring, alerting, exception handling require near-real-time access to data for analytics. Especially if you want to spot trends before they become problems, or identify outliers by comparison to current norms or for a real-time dashboard.
These issues represent opportunities for the next generation of databases. For instance, the need for immediacy turns into a strong HTAP (Hybrid Transactional and Analytic Processing) requirement to support that as well as the real-time consumption of the raw data from all the devices.

Q2. Among the key challenge areas for IoT are Security, Trust and Privacy. What is your take on this?

Steve Cellini: IoT scenarios often involve human activities, such as tracking utility usage in a home or recording motion received from security cameras. The data from a single device may be by itself innocuous, but when the data from a variety of devices is combined and integrated, the result may be a fairly complete and revealing view of one’s activities, and may not be anonymous.

With this in mind, the associated data can be thought of as “valuable” or “sensitive” data, with attendant requirements on the underlying database, not dissimilar from, say, the kinds of protections you’d apply to financial data — such as authentication, authorization, logging or encryption.

Additionally, data sovereignty or residency regulations may also require that IoT data for a given class of users be stored in a specific region only, even as workloads that consume that data might be located elsewhere, or may in fact roam in other regions.

There may be other requirements, such as the need to be able to track and audit intermediate handlers of the data, including IoT hubs or gateways, given the increasing trend to closely integrate a device with a specific cloud service provider, which intermediates general access to the device. Also, the question of ‘who owns the data’ will undoubtedly add requirements on the underlying service architecture and database, such as the ability to add meta-data relationships representing the provenance or ownership of specific device data.

Q3. What are the main technical challenges to keep in mind while selecting a database for today’s mobile environment?

Steve Cellini: Mobile users represent sources of data and transactions that move around, imposing additional requirements on the underlying service architecture. One obvious requirement is to enable low-latency access to a fully active, consistent, and up-to-date view of the database, for both mobile apps and their users, and for backend workloads, regardless of where users happen to be located. These two goals may conflict if the underlying database system is locked to a single region, or if it’s replicated and does not support write access in all regions.

It can also get interesting when you take into account the growing body of data sovereignty or residency regulations. Even as your users are traveling globally, how do you ensure that their data-at-rest is being stored in only their home region?

If you can’t achieve these goals without a lot of special-case coding in the application, you are going to have a very complex, error-prone application and service architecture.

Q4. You define NuoDB as a scale-out SQL database for global operations. Could you elaborate on the key features of NuoDB?

Steve Cellini: NuoDB offers several key value propositions to customers: the ability to geo-distribute a single logical database across multiple data centers or regions, arbitrary levels of continuous availability and storage redundancy, elastic horizontal scale out/in on commodity hardware, automation, ease and efficiency of multi-tenancy.
All of these capabilities enable operations to cope flexibly, efficiently and economically as the workload rises and dips around the business lifecycle, or expands with new business requirements.

Q5. What are the typical customer demands that you are responding to?

Steve Cellini: NuoDB is the database for today’s on-demand economy. Businesses have to respond to their customers who demand immediate response and expect a consistent view of their data, whether it be their bank account or e-commerce apps — no matter where they are located. Therefore, businesses are looking to move their key applications to the cloud and ensure data consistency – and that’s what is driving the demand for our geo-distributed SQL database.

Q6. Who needs a geo-distributed database? Could you give some example of relevant use cases?

Steve Cellini: A lot of our customers come to us precisely for our geo-distributed capability, by which I mean our ability to run a single unified database spread across multiple locations, accessible for querying and updating equally in all those locations. This is important where applications have mobile users switching the location they connect to; that happens a lot in the telecommunications industry. Or they’re operating ‘follow the sun’ services where a user might need to access any data from anywhere; that’s a pattern with global financial services customers. Or just so they can offer the same low-latency service everywhere. That’s what we call “local everywhere”, which means you don’t see increasing delays if you are traveling further from the central database.

Q7. You performed recently some tests using the DBT2 Benchmark. Why are you using the DBT2 Benchmark and what are the results you obtained so far?

Steve Cellini: The DBT-2 (TPC-C) benchmark is a good test for an operational database, because it simulates a real-world transactional workload.
Our focus with DBT-2 hasn't been on achieving a new record for absolute NOTPM rates, but rather on exploring one of our core value propositions: horizontal scale-out on commodity hardware. We recently passed the 1 million NOTPM mark on a cluster of 50 low-cost machines, and we are very excited about that.
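As a rough, back-of-the-envelope reading of that result (and only that; real scale-out is rarely perfectly linear), the quoted numbers work out to an average of 20,000 NOTPM per host:

```python
# Back-of-the-envelope reading of the result quoted above: 1,000,000 NOTPM
# (new-order transactions per minute) across 50 commodity hosts.

TOTAL_NOTPM = 1_000_000
HOSTS = 50

per_host = TOTAL_NOTPM / HOSTS          # 20,000 NOTPM per host on average
print(f"average per host: {per_host:,.0f} NOTPM")

def projected_notpm(hosts: int, efficiency: float = 1.0) -> float:
    """Naive projection: per-host rate times host count, scaled by an
    assumed efficiency factor (1.0 = perfectly linear scale-out)."""
    return per_host * hosts * efficiency

print(f"75 hosts at 90% efficiency: {projected_notpm(75, 0.9):,.0f} NOTPM")
```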

Q8. How is your offering in the area of automation, resiliency, and disaster recovery different (or comparable) with some of the other database competitors?

Steve Cellini: We’ve heard from customers who need to move beyond the complexity, pain and cost of their disaster recovery operations, such as expanding from a typical two data center replication operation to three or more data centers, or addressing lags in updates to the replica, or moving to active/active.

With NuoDB, you use our automation capability to dynamically expand the number of hosts and regions a database operates in, without any interruption of service. You can dial in the level of compute and storage redundancy required and there is no single point of failure in a production NuoDB configuration. And you can update in every location – which may be more than two, if that’s what you need.

Steve Cellini, VP of Product Management, NuoDB
Steve joined NuoDB in 2014 and is responsible for Product Management and Product Support, as well as helping with strategic partnerships.

In his 30-year career, he has led software and services programs at various companies – from startups to Fortune 500 – focusing on bringing transformational technology to market. Steve started his career building simulator and user interface systems for electrical and mechanical CAD products and currently holds six patents.

Prior to NuoDB, Steve held senior technical and management positions on cloud, database, and storage projects at EMC, Mozy, and Microsoft. At Microsoft, Steve helped launch one of the first cloud platform services and led a company-wide technical evangelism team. Steve has also built and launched several connected mobile apps. He also managed Services and Engineering groups at two of the first object database companies – Ontos (Ontologic) and Object Design.

Steve holds a Sc.B in Engineering Physics from Cornell University.


DBT-2 Clone from SourceForge

Setting up DBT-2 for NuoDB, Github

One Million NOTPM DBT2 Benchmark on NuoDB 2.3 By Dai Klegg, NuoDB, Sr Director of Product Marketing. ODBMS.org

Hybrid Transaction and Analytical Processing with NuoDB. Technical Whitepaper, NuoDB. ODBMS.org

Related Posts

Big Data, Analytics, and the Internet of Things. Mohak Shah, analytics leader and research scientist at Bosch Research, USA, ODBMS.org

SMART DATA: Running the Internet of Things as a Citizen Web. by Dirk Helbing , ETH Zurich. ODBMS.org

On Big Data and the Internet of Things. Interview with Bill Franks. ODBMS Industry Watch, March 9, 2015

Follow ODBMS.org on Twitter: @odbmsorg


On Big Data and the Internet of Things. Interview with Bill Franks http://www.odbms.org/blog/2015/03/interview-bill-franks/ http://www.odbms.org/blog/2015/03/interview-bill-franks/#comments Mon, 09 Mar 2015 15:52:38 +0000 http://www.odbms.org/blog/?p=3791

“Perhaps the biggest challenge is that the IoT has the potential to generate orders of magnitude more data than any other source in existence today. So, in the world of the IoT we will test the limits of ‘big.’”–Bill Franks

On the topics of data warehouses, Hadoop, the Internet of Things, and Teradata's perspective on the world of Big Data, I have interviewed Bill Franks, Chief Analytics Officer for Teradata.


Q1. What is Teradata's perspective on the world of Big Data?

Bill Franks: Our perspective has not really changed with regard to ‘big data’: the primary mission of Teradata for decades has been helping organizations utilize and analyze large volumes of data to produce insight for business value. Note that our Teradata database was originally designed exclusively for analytics, then called ‘decision support’, unlike most other platforms, which were designed for general computing and only later adapted for analytic uses. As a result, the Teradata analytic engine is, and has always been, uniquely architected for ‘big data’ volume and complexity aimed at producing actionable intelligence.

Of course, the amount of data that’s considered ‘big’ and thus a challenge – has changed, and we have a lot of novel data sources in recent times. However, we believe that companies which have always focused on analyzing and acting upon data intelligently can adapt to the new world of big data. After all, big data is just more data and the analysis of big data is still analysis. There are as many similarities as differences from the past.

Teradata has engineered further analytic enhancements over the years to create a diverse portfolio of products, partnerships, and services to allow our customers to continue to get the most from their data assets. The pace of change is very rapid today and we expect that to continue. We believe our strength is in our experience, expertise and our ability to help organizations navigate the changing landscape and continue to derive new, useful insights from their increasingly large and diverse data sources.

Q2. Most data warehousing projects consolidate data from different source systems. What is different in the world of Big Data?

Bill Franks: By definition, if you want to look at two different data sources together, you must either move one set of data to the other or move them both to a third location. If data is truly disparate, you can't use it effectively. That is what drove data warehousing to prominence. One huge difference between data warehousing practices then and now is that previously, the data captured in the business world met three criteria almost 100% of the time.
1) It was immensely important, given the cost to capture and store it,
2) The data was well structured, and
3) The data was generated by an organization’s internal business processes.
— Therefore, it was mostly placed in relational databases or on a mainframe since those technologies easily handle that type of data. Data warehousing solved the problem of many structured data platforms being spread out – by consolidating the sources for analytic purposes into a single structured platform.

What is different with big data is that today, the data often violates all three of these rules.
1) Much of it is not important, or has not yet been proven to be important,
2) The data is not structured in the classic fashion at the outset (though most can and must be structured for analytical purposes), and
3) The data is often from sources external to an organization.
— As a result, we now have disparate data platforms that each serve different functions. Some focus on one type of data, while others focus on flexibility. However, the downside is that these platforms don’t integrate well and it isn’t as easy to tie everything together. That’s a problem Teradata is working diligently to solve with our Unified Data Architecture – our pioneering version of the visionary Gartner Logical Data Warehouse.

Q3. Will data warehouses become obsolete soon and be replaced by Hadoop?

Bill Franks: Absolutely not. A few years ago, that was a common claim. That claim is rarely heard today. In fact, all of the big Hadoop vendors partner with Teradata.
This is because our data warehousing platforms provide some important things Hadoop does not — just as Hadoop provides some things a data warehouse does not. Each platform has its strengths and weaknesses, but when positioned together, additional value is added. Part of the issue is that people mistake policy decisions for technology limitations.
There is no reason you can’t place untested, raw, unclean data of unknown value on a data warehousing platform; it’s the corporate policies that often forbid it. It is true that once data is critical and is leveraged by many applications and business users, you have to keep some control and consistency over it. This is what a data warehouse does for an organization.
But, that doesn’t mean you can’t experiment with new sources freely using the technology that supports formal data warehouses.

A colleague of mine mentioned a conversation he had with a Hadoop user. That user was boasting about how, with a single command, he could change the data type of information on Hadoop, for instance, if it would help him more easily solve his next problem. My colleague then asked him what would happen to the prior dozen or two processes that were built expecting the data to be in the original data type. Wouldn't they all then break? The user had a blank stare for a moment and then realized his error. As you develop more processes, you must implement security, consistency, and controls on the underlying data. This is why data warehousing, as Gartner defines it, is going to be around for a long time.
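The anecdote is easy to reproduce in miniature. The toy sketch below, with invented field names and records, shows a downstream process written against one data type failing the moment a producer re-types the field:

```python
# Toy illustration: a producer changes a field's type "because it is easy",
# and a downstream process written against the original type breaks.

old_records = [{"device_id": "42", "reading": "17.5"}]                        # reading as text
new_records = [{"device_id": "42", "reading": {"value": 17.5, "unit": "C"}}]  # re-typed as a struct

def downstream_average(records):
    """Written back when 'reading' was a plain numeric string."""
    return sum(float(r["reading"]) for r in records) / len(records)

print(downstream_average(old_records))       # works: 17.5
try:
    print(downstream_average(new_records))   # breaks after the type change
except (TypeError, ValueError) as exc:
    print(f"downstream process failed: {exc}")
```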

Q4. With the increased need of tools for combining data together, are we going to see a “federated”- Big Data architecture?

Bill Franks: A form of that is exactly what we are pursuing with Teradata's Unified Data Architecture. Again, we refer to Gartner's vision of the “Logical Data Warehouse.” What we are doing is putting in place a layer of architecture that connects multiple disparate data stores. This architecture includes, and connects, relational databases like Teradata and Oracle, discovery platforms like our Teradata Aster offering, Hadoop, and other platforms such as MongoDB. The idea is that we make information available to users about data throughout the ecosystem, not just the data on the platform they are operating from. So, for example, I might see a data dictionary that includes a “table” called “Sensor Feed.”
I can see the data elements available and write analytic logic against those elements. However, I don’t need to be aware of whether the data is a database table, or a Hadoop file, or is in MongoDB. Users can simply build analytics instead of worrying about where data resides, how to log on to various systems, and how to move data. We’ll handle that for them.
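As an illustration of the idea only (this is not Teradata's QueryGrid or UDA API), a small catalog that maps logical table names to whichever store actually holds them captures the essence of what such a layer does for the analyst:

```python
# Illustrative only: a catalog resolves a logical table name to the backend
# that holds it, so analytic code never mentions the physical location.
# Backends, locations, and table names are invented for this sketch.

CATALOG = {
    "Sensor Feed":  ("hadoop",   "/data/sensor_feed/"),
    "Orders":       ("teradata", "sales.orders"),
    "Click Events": ("mongodb",  "events.clicks"),
}

def resolve(logical_name: str):
    """Map a logical table name to (backend, physical location)."""
    try:
        return CATALOG[logical_name]
    except KeyError:
        raise LookupError(f"{logical_name!r} is not registered in the catalog")

backend, location = resolve("Sensor Feed")
print(f"a query against 'Sensor Feed' would be pushed to {backend} at {location}")
```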

We are also beginning to push processing across the various platforms to optimize performance. Just like with a ‘table’ versus a ‘view’ in a database, making a process enterprise-ready might require moving data around the architecture permanently. But now, users are free to discover where that is required. And, the technical team behind the curtain can worry about the details just as they do with traditional data warehousing. We are very bullish on our approach and think we are well positioned to maintain our leadership position in the analytics space.

Q5. Teradata made several acquisitions lately. How do the tools that Teradata acquired fit the current Teradata Data Architectural Framework?

Bill Franks: I believe this was largely addressed above. However, in addition I would point out that we acquired Revelytix in 2014 to obtain Loom: an open platform for discovering, profiling, preparing, and tracking data lineage for data in Hadoop. Likewise, we acquired Hadapt, which created a big data analytic platform natively integrating SQL with Apache Hadoop. And our recent RainStor acquisition strengthens Teradata's enterprise-grade Hadoop solutions and enables organizations to add archival data store capabilities for their entire enterprise, including data stored in OLTP systems, data warehouses, and applications.

Q6. What are the key differentiators of the Teradata Database core architecture?

Bill Franks: As I said, the Teradata DW was differentiated from the start, uniquely architected for analytics from day one. However, I would add that Teradata continues to broaden our differentiation: we've built the best data orchestration software in the industry (Teradata Unity and QueryGrid). The orchestration software is key because it enables our customers to choose, independently, the file system they use to store the data and the analytics they apply to that data, and then marry the two together with software.
It helps reduce the complexity of connecting to, accessing, understanding interfaces and getting value from multiple analytical systems. Another differentiator is Teradata Intelligent Memory, introduced two years ago. TIM is the world’s first extended memory technology beyond cache to increase query performance. Users can configure the exact amount of in-memory capability needed for critical workloads – based on temperature – hot or cold data. The list goes on. I would say that our data technology really does focus on how data is best used – and what proficient users need most.

Q7. Is SQL really the right language to handle Big Data Analytics?

Bill Franks: In some cases yes and in some cases no. We want users to be able to utilize whatever language or platform is best for any given task. There are many big data requirements that fit SQL perfectly and many that don't. The key is enabling scalable access to the data and flexibility in approach. Most people are aware that there is a big effort to add a SQL interface to Hadoop. What most haven't realized is how far we've also come in the other direction. For some time, Teradata has allowed C and Java processing directly against our database platforms via User Defined Functions and other similar extensions. We are now also enabling other languages such as R and Python to be executed within a Teradata context. What is possible today is so far beyond what was possible even 5 or 10 years ago.
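The general idea of bringing the function to the data, rather than the data to the function, can be sketched generically; the toy 'engine', table, and registration decorator below are invented for illustration and are not Teradata's UDF interface:

```python
# Generic sketch of in-database processing: a user-supplied function is
# registered with the engine and applied where the rows live, instead of
# pulling the rows out to a client. Everything here is a toy stand-in.

ENGINE_TABLES = {
    "readings": [{"device": "a", "temp_c": 21.0}, {"device": "b", "temp_c": 38.5}],
}

UDFS = {}

def register_udf(name):
    """Register a user-defined function with the toy engine."""
    def wrap(fn):
        UDFS[name] = fn
        return fn
    return wrap

@register_udf("to_fahrenheit")
def to_fahrenheit(row):
    return {**row, "temp_f": row["temp_c"] * 9 / 5 + 32}

def apply_udf(table, udf_name):
    """'Push' the function to the data: run it row by row inside the engine."""
    return [UDFS[udf_name](row) for row in ENGINE_TABLES[table]]

print(apply_udf("readings", "to_fahrenheit"))
```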

Q8. How do you see the adoption of Cloud for Analytics?

Bill Franks: We are aggressively rolling out our own cloud offerings across our product suites. Many of our enterprise customers also configure our products as a private cloud behind their firewall. Adoption will be mixed based on the type of data and nature of work being done. Anything involving sensitive data is still typically not allowed outside a firewall. If you think back to the issue raised in a prior question of having to be able to combine data for analytics, you can't really have some data locked behind a firewall and some data locked outside it. The real driver behind the cloud is that people want flexible, pay-on-demand access to analysis platforms. We have multiple ways to provide that to our clients, of which our cloud offerings are only one option. We also have some other novel pricing and licensing options that help customers get access to the resources they require for analytics.

Q9. What are the most important data challenges posed by the Internet of Things (IoT)?

Bill Franks: Perhaps the biggest challenge is that the IoT has the potential to generate orders of magnitude more data than any other source in existence today.
So, in the world of the IoT we will test the limits of ‘big.’ At the same time, much of the data generated by the IoT will have low value in the short term and no value in the long term. One of the biggest challenges will be determining which pieces of the information generated by a given sensor actually matter to your business, and for how long. In the long run, it is likely that only a small fraction of the raw data produced by the IoT will be stored beyond a few moments of immediate usage. For example, why keep the sensor readings that help navigate my car into a tight parking spot? Once I'm safely in the spot, I really don't ever need to revisit that data again. If I hit a car in front of me, I might make an exception and keep the data so that the cause can be identified.
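A sketch of that retention logic for the parking example, with an invented record layout: readings are kept only within a short window unless an exceptional event, such as a collision, makes them worth promoting to long-term storage.

```python
# Sketch of a sensor-data retention rule: keep raw readings only briefly,
# unless they are tied to an event we may need to revisit (a collision).
# The record layout and window size are invented for illustration.

from datetime import datetime, timedelta, timezone

SHORT_WINDOW = timedelta(minutes=5)

def should_retain(reading: dict, now: datetime) -> bool:
    """Keep a reading if it is inside the short window, or flagged as part
    of an exceptional event worth keeping long term."""
    if reading.get("collision"):
        return True
    return now - reading["timestamp"] <= SHORT_WINDOW

now = datetime.now(timezone.utc)
readings = [
    {"sensor": "park-assist", "timestamp": now - timedelta(minutes=1), "collision": False},
    {"sensor": "park-assist", "timestamp": now - timedelta(hours=2),   "collision": False},
    {"sensor": "park-assist", "timestamp": now - timedelta(hours=2),   "collision": True},
]
kept = [r for r in readings if should_retain(r, now)]
print(f"retained {len(kept)} of {len(readings)} readings")  # retained 2 of 3
```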

Q10. Could you mention some successful Big Data projects you have recently completed with customers?

Bill Franks: We are seeing a lot of very interesting analytics come about. We’ve helped health organizations discover genetic patterns associated with disease, we’ve helped manufacturers reduce cost and increase customer satisfaction by building predictive maintenance algorithms, we’ve helped cable providers identify valuable consumer viewing habits.
I could go on and on. A great place to see some of the examples, and even hear from some of the companies and people behind it, is at our website.

Bill Franks is the Chief Analytics Officer for Teradata, where he provides insight on trends in the analytics and big data space and helps clients understand how Teradata and its analytic partners can support their efforts. His focus is to translate complex analytics into terms that business users can understand and work with organizations to implement their analytics effectively. His work has spanned many industries for companies ranging from Fortune 100 companies to small non-profits. Franks also helps determine Teradata’s strategies in the areas of analytics and big data.

Franks is the author of the book Taming The Big Data Tidal Wave (John Wiley & Sons, Inc., April 2012). In the book, he applies his two decades of experience working with clients on large-scale analytics initiatives to outline what it takes to succeed in today's world of big data and analytics. The book made Tom Peters' list of 2014 “Must Read” books and also the Top 10 Most Influential Translated Technology Books list from CSDN in China.

Franks’ second book The Analytics Revolution (John Wiley & Sons, Inc., September, 2014) lays out how to move beyond using analytics to find important insights in data (both big and small) and into operationalizing those insights at scale to truly impact a business.

 He is a faculty member of the International Institute for Analytics, founded by leading analytics expert Tom Davenport, and an active speaker who has presented at dozens of events in recent years. His blog, Analytics Matters, addresses the transformation required to make analytics a core component of business decisions. 

Franks earned a Bachelor’s degree in Applied Statistics from Virginia Tech and a Master’s degree in Applied Statistics from North Carolina State University.  More information is available here: http://www.bill-franks.com.

2014 Gartner Magic Quadrant for Data Warehouse and Database Management Systems. 07 March 2014 Analyst(s): Mark A. Beyer | Roxane Edjlali

Related Posts

On MarkLogic 8. Interview with Stephen Buxton. ODBMS Industry Watch Published on 2015-02-13

On Hadoop RDBMS. Interview with Monte Zweben. ODBMS Industry Watch Published on 2014-11-02

Follow ODBMS.org on Twitter: @odbmsorg
