On the Industrial Internet of Things. Interview with Leon Guzenda
“Apart from security, the single biggest new challenges that the Industrial Internet of Things poses are the number of devices involved, the rate that many of them can generate data and the database and analytical requirements.” –Leon Guzenda.
I have interviewed Leon Guzenda, Chief Technical Marketing Officer at Objectivity. Topics of the interview are data analytics, the Industrial Internet of Things (IIoT), and ThingSpan.
Q1. What is the difference between Big Data and Fast Data?
Leon Guzenda: Big Data is a generic term for datasets that are too large or complex to be handled with traditional technology. Fast Data refers to streams of data that must be processed or acted upon immediately once received.
If most, or all, of it is stored, it will probably end up as Big Data. Hadoop standardized the parallel processing approach for Big Data, and HDFS provided a resilient storage infrastructure. Meanwhile, Complex Event Processing became the main way of dealing with fast-moving streams of data, applying business logic and triggering event processing. Spark is a major step forward in controlling workflows that have streaming, batch and interactive elements, but it only offers a fairly primitive way to bridge the gap between the Fast and Big Data worlds via tabular RDDs or DataFrames.
ThingSpan, Objectivity’s new information fusion platform, goes beyond that. It integrates with Spark Streaming and HDFS to provide a dynamic Metadata Store that holds information about the many complex relationships between the objects in the Hadoop repository or elsewhere. It can be used to guide data mining using Spark SQL or GraphX and analytics using Spark MLlib.
Q2. Shawn Rogers, Chief Research Officer, Dell Statistica, recently said in an interview: “A ‘citizen data scientist’ is an everyday, non-technical user that lacks the statistical and analytical prowess of a traditional data scientist, but is equally eager to leverage data in order to uncover insights and, importantly, do so at the speed of business”. What is your take on this?
Leon Guzenda: It’s a bit like the difference between amateur and professional astronomers.
There are far more data users than trained data scientists, and it’s important that the data users have all of the tools needed to extract value from their data. Things filter down from the professionals to the occasional users. I’ve heard the term “NoHow” applied to tools that make this possible. In other words, the users don’t have to understand the intricacy of the algorithms. They only need to apply them and interpret the results. We’re a long way from that with most kinds of data, but there is a lot of work in this area.
We are making advances in visual analytics, but there is also a large and rapidly growing set of algorithms that the tool builders need to make available. Users should be able to define their data sources, say roughly what they’re looking for and let the tool assemble the workflow and visualizers. We like the idea of “Citizen Data Scientists” being able to extract value from their data more efficiently, but let’s not forget that data blending at the front end is still a challenge and may need some expert help.
That’s another reason why the ThingSpan Metadata Store is important. An expert can describe the data there in terms that are familiar to the user. Applying the wrong analytical algorithm can produce false patterns, particularly when the data has been sampled inadequately. Once again, having an expert constrain the use of particular algorithms to certain types of data can make it much more likely that the Citizen Data Scientists will obtain useful results.
Q3. Do we really need the Internet of Things?
Leon Guzenda: That’s a good question. It’s only worth inventing a category if the things that it applies to are sufficiently different from other categories to merit it. If we think of the Internet as a network of connected networks that share the same protocol, then it isn’t necessary to define exactly what each node is. The earliest activities on the Internet were messaging, email and file sharing. The WWW made it possible to set up client-server systems that ran over the Internet. We soon had “push” systems that streamed messages to subscribers rather than having them visit a site and read them. One of the fastest growing uses is the streaming of audio and video. We still haven’t overcome some of the major issues associated with the Internet, notably security, but we’ve come a long way.
Around the turn of the century it became clear that there are real advantages in connecting a wider variety of devices directly to each other in order to improve their effectiveness or that of an overall system. Separate areas of study, such as smart power grids, cities and homes, each came to the conclusion that new protocols were needed if there were no humans tightly coupled to the loop. Those efforts are now converging into the discipline that we call the Internet of Things (IoT), though you only have to walk the exhibitor hall at any IoT conference to find that we’re at about the same point as we were in the early NoSQL conferences. Some companies have been tackling the problems for many years whilst others are trying to bring value by making it easier to handle connectivity, configuration, security, monitoring, etc.
The Industrial IoT (IIoT) is vital, because it can help improve our quality of life and safety whilst increasing the efficiency of the systems that serve us. The IIoT is a great opportunity for some of the database vendors, such as Objectivity, because we’ve been involved with companies or projects tackling these issues for a couple of decades, notably in telecoms, process control, sensor data fusion, and intelligence analysis. New IoT systems generally need to store data somewhere and make it easy to analyze. That’s what we’re focused on, and why we decided to build ThingSpan: to combine our existing technology with new open source components and enable real-time relationship and pattern discovery in IIoT applications.
Q4. What is special about the Industrial Internet of Things? And what are the challenges and opportunities in this area?
Leon Guzenda: Apart from security, the single biggest new challenges that the IIoT poses are the number of devices involved, the rate that many of them can generate data and the database and analytical requirements. The number of humans on the planet is heading towards eight billion, but not all of them have Internet access. The UN expects that there will be around 11 billion of us by 2100. There are likely to be around 25 billion IIoT devices by 2020.
Organizations increasingly recognize that they could make better use of their sensor-based data to gain competitive advantage. According to McKinsey & Co., organizations in many industry segments are currently using less than 5% of the data from their sensors. Better utilization of sensor-based data could lead to a positive impact of up to $11.1 trillion per year by 2025 through improved productivity.
Q5. Could you give us some examples of predictive maintenance and asset management within the Industrial IoT?
Leon Guzenda: Yes, neither use case is new nor directly the result of the IIoT, but the IIoT makes it easier to collect, aggregate and act upon information gathered from devices. We have customers building telecom, process control and smart building management systems that aggregate information from multiple customers in order to make better predictions about when equipment should be tweaked or maintained.
One of our customers provides systems for conducting seismic surveys for oil and gas companies and for helping them maximize the yield from the resources that they discover. A single borehole can have 10,000 sensors in the equipment at the site.
That’s a lot of data to process in order to maintain control of the operation and avoid problems. Replacing a broken drill bit can take one to three days, with the downtime costing between $1 million and $3.5 million. Predictive maintenance can be used to schedule timely replacement or servicing of the drill bit, reducing the downtime to three hours or so.
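As a rough, back-of-the-envelope illustration of those figures (not part of the interview), the sketch below assumes that downtime cost scales linearly with time and that the low and high cost figures pair with the one-day and three-day outages respectively. Both are simplifying assumptions, and the function name is hypothetical.

```python
def downtime_cost(hours, cost_per_hour):
    """Cost of an outage, assuming cost grows linearly with downtime."""
    return hours * cost_per_hour


# Implied hourly rates, ASSUMING $1M pairs with 1 day and $3.5M with 3 days.
low_rate = 1_000_000 / 24    # dollars per hour, one-day outage
high_rate = 3_500_000 / 72   # dollars per hour, three-day outage

# Downtime quoted for a planned replacement.
planned_hours = 3

planned_low = downtime_cost(planned_hours, low_rate)
planned_high = downtime_cost(planned_hours, high_rate)

print(f"Planned replacement: ${planned_low:,.0f} to ${planned_high:,.0f}")
```

Under those assumptions, a three-hour planned replacement would cost on the order of a hundred thousand dollars rather than millions. Real costs need not scale linearly.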
There are similar case studies across industries. The CEO of one of the world’s largest package transportation companies said recently that shaving a single mile off every driver’s route resulted in savings of $50 million per year! Airlines also use predictive maintenance to service engines and other aircraft parts to keep passengers safely in the air, and mining companies use GPS tracking beacons on all of their assets to schedule the servicing of vital and very costly equipment optimally. Prevention is much better than treatment when it comes to massive or expensive equipment.
Q6. What is ThingSpan? How is it positioned in the market?
Leon Guzenda: ThingSpan is an information fusion software platform, architected for performance and extensibility, to accelerate time-to-production of IoT applications. ThingSpan is designed to sit between streaming analytics platforms and Big Data platforms in the Fast Data pipeline to create contextual information in the form of transformed data and domain metadata from streaming data and static, historical data. Its main differentiators from other tools in the field are its abilities to handle concurrent high volume ingest and pathfinding query loads.
ThingSpan is built around object-data management technology that is battle-tested in data fusion solutions in production use with U.S. government and Fortune 1000 organizations. It provides out-of-the-box integration with Spark and Hadoop 2.0 as well as other major open source technologies. Objectivity has been bridging the gap between Big Data and Fast Data within the IIoT for leading government agencies and commercial enterprises for decades, in industries such as manufacturing, oil and gas, utilities, logistics and transportation, and telecommunications. Our software is embedded as a key component in several custom IIoT applications, such as management of real-time sensor data, security solutions, and smart grid management.
Q7. Graphs are hard to scale. How do you handle this in ThingSpan?
Leon Guzenda: ThingSpan is based on our scalable, high-performance, distributed object database technology. ThingSpan isn’t constrained to graphs that can be handled in memory, nor is it dependent upon messaging between vertices in the graph. The address space could be easily expanded to the Yottabyte range or beyond, so we don’t expect any scalability issues. The underlying kernel handles difficult tasks, such as pathfinding between nodes, so performance is high and predictable. Supplementing ThingSpan’s database capabilities with the algorithms available via Spark GraphX makes it possible for users to handle a much broader range of tasks.
We’ve also noted over the years that most graphs aren’t as randomly connected as you might expect. We often see clusters of subgraphs, or dandelion-like structures, that we can use to optimize the physical placement of portions of the graph on disk. Having said that, we’ve also done a lot of work to reduce the impact of supernodes (ones with extremely large numbers of connections) and to speed up pathfinding in the cases where physical clustering doesn’t work.
Q8. Could you describe how ThingSpan’s graph capabilities can be beneficial for use cases, such as cybersecurity, fraud detection and anti-money laundering in financial services, to name a few?
Leon Guzenda: Each of those use cases, particularly cybersecurity, deals with fast-moving streams of data, which can be analyzed by checking thresholds in individual pieces of data or accumulated statistics. ThingSpan can be used to correlate the incoming (“Fast”) data that is handled by Spark Streaming with a graph of connections between devices, people or institutions. At that point, you can recognize Denial of Service attacks, fraudulent transactions or money laundering networks, all of which will involve nodes representing suspicious people or organizations.
The faster you can do this, the more chance you have of containing a cybersecurity threat or preventing financial crimes.
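To make the idea of correlating fast-moving events with a relationship graph concrete, here is a minimal Python sketch. It is an illustration only and does not use ThingSpan's or Spark's APIs: the graph, the account names, the threshold and the helper functions are all hypothetical, and a plain in-memory breadth-first search stands in for the pathfinding a database kernel would do.

```python
from collections import deque

# Hypothetical relationship graph: node -> set of directly connected nodes.
GRAPH = {
    "acct_A": {"acct_B"},
    "acct_B": {"acct_A", "acct_C"},
    "acct_C": {"acct_B", "shell_co"},
    "shell_co": {"acct_C"},
    "acct_D": set(),
}
SUSPICIOUS = {"shell_co"}  # nodes already known to be suspicious (assumed)


def hops_to_suspicious(graph, start, suspicious, max_hops=3):
    """Breadth-first search: hops from start to the nearest suspicious
    node within max_hops, or None if there is no such path."""
    if start not in graph:
        return None
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node in suspicious:
            return dist
        if dist == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def flag_events(events, graph, suspicious, amount_threshold=10_000, max_hops=3):
    """Step 1: per-event threshold check. Step 2: graph correlation."""
    flagged = []
    for event in events:
        if event["amount"] < amount_threshold:
            continue
        hops = hops_to_suspicious(graph, event["account"], suspicious, max_hops)
        if hops is not None:
            flagged.append((event["account"], event["amount"], hops))
    return flagged


# A small batch standing in for a stream of incoming transactions.
events = [
    {"account": "acct_A", "amount": 25_000},
    {"account": "acct_D", "amount": 50_000},
    {"account": "acct_B", "amount": 500},
]
flagged = flag_events(events, GRAPH, SUSPICIOUS)
print(flagged)
```

Only the event from `acct_A` is flagged here: `acct_D` passes the threshold but has no path to a suspicious node, and `acct_B` is below the threshold. The threshold check alone would have flagged `acct_D` as well, which is the kind of false alarm the graph correlation is meant to filter out.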
Q9. Objectivity has traditionally focused on a relatively narrow range of verticals. How do you intend to support a much broader range of markets than your current base?
Leon Guzenda: Our base has evolved over the years and the number of markets has expanded since the industry’s adoption of Java and widespread acceptance of NoSQL technology. We’ve traditionally maintained a highly focused engineering team and very responsive product support teams at our headquarters and out in the field. We have never attempted to be like Microsoft or Apple, with huge teams of customer service people handling thousands of calls per day. We’ve worked with VARs that embed our products in their equipment or with system integrators that build highly complex systems for their government and industry customers.
We’re expanding this approach with ThingSpan by working with the open source community, as well as building partnerships with technology and service providers. We don’t believe that it’s feasible or necessary to suddenly acquire expertise in a rapidly growing range of disciplines and verticals. We’re happy to hand much of the service work over to partners with the right domain expertise while we focus on strengthening our technologies. We recently announced a technology partnership with Intel via their Trusted Analytics Platform (TAP) initiative. We’ll soon be announcing certification by key technology partners and the completion of major proof of concept ThingSpan projects. Each of us will handle a part of a specific project, supporting our own products or providing expertise and working together to improve our offerings.
Leon Guzenda, Chief Technical Marketing Officer at Objectivity
Leon Guzenda was one of the founding members of Objectivity in 1988 and one of the original architects of Objectivity/DB.
He currently works with Objectivity’s major customers to help them effectively develop and deploy complex applications and systems that use the industry’s highest-performing, most reliable DBMS technology, Objectivity/DB. He also liaises with technology partners and industry groups to help ensure that Objectivity/DB remains at the forefront of database and distributed computing technology.
Leon has more than five decades of experience in the software industry. At Automation Technology Products, he managed the development of the ODBMS for the Cimplex solid modeling and numerical control system.
Before that, he was Principal Project Director for International Computers Ltd. in the United Kingdom, delivering major projects for NATO and leading multinationals. He was also design and development manager for ICL’s 2900 IDMS database product. He spent the first 7 years of his career working in defense and government systems. Leon has a B.S. degree in Electronic Engineering from the University of Wales.
– What is data blending. By Oleg Roderick, David Sanchez, Geisinger Data Science. ODBMS.org, November 2015.
– Industrial Internet of Things: Unleashing the Potential of Connected Products and Services. World Economic Forum, January 2015.
– Can Columnar Database Systems Help Mathematical Analytics? By Carlos Ordonez, Department of Computer Science, University of Houston. ODBMS.org, 23 JAN, 2016.
– The Managers Who Stare at Graphs. By Christopher Surdak, JD. ODBMS.org, 23 SEP, 2015.
– From Classical Analytics to Big Data Analytics. By Peter Weidl, IT-Architect, Zürcher Kantonalbank. ODBMS.org, 11 AUG, 2015.
– Streamlining the Big Data Landscape: Real World Network Security Usecase. By Sonali Parthasarathy, Accenture Technology Labs. ODBMS.org, 2 JUL, 2015.