ODBMS Industry Watch: Trends and Information on Big Data, New Data Management Technologies, Data Science and Innovation.

On the InterSystems IRIS Data Platform (February 9, 2018)

“We believe that businesses today are looking for ways to leverage the large amounts of data collected, which is driving them to try to minimize, or eliminate, the delay between event, insight, and action to embed data-driven intelligence into their real-time business processes.” –Simon Player

I have interviewed Simon Player, Director of Development for TrakCare and Data Platforms; Helene Lengler, Regional Director for DACH & BeNeLux; and Joe Lichtenberg, Director of Marketing for Data Platforms. All three work at InterSystems. We talked about the new InterSystems IRIS Data Platform.

RVZ

Q1. You recently announced the InterSystems IRIS Data Platform®. What is it?

Simon Player: We believe that businesses today are looking for ways to leverage the large amounts of data collected, which is driving them to try to minimize, or eliminate, the delay between event, insight, and action to embed data-driven intelligence into their real-time business processes.

It is time for database software to evolve and offer multiple capabilities to manage that business data within a single, integrated software solution. This is why we chose to include the term ‘data platform’ in the product’s name.
InterSystems IRIS Data Platform supports transactional and analytic workloads concurrently, in the same engine, without requiring moving, mapping, or translating the data, eliminating latency and complexity. It incorporates multiple, disparate and dissimilar data sources, supports embedded real-time analytics, easily scales for growing data and user volumes, interoperates seamlessly with other systems, and provides flexible, agile, DevOps-compatible deployment capabilities.

InterSystems IRIS provides concurrent transactional and analytic processing capabilities; support for multiple, fully synchronized data models (relational, hierarchical, object, and document); a complete interoperability platform for integrating disparate data silos and applications; and sophisticated structured and unstructured analytics capabilities supporting both batch and real-time use cases in a single product built from the ground up with a single architecture. The platform also provides an open analytics environment for incorporating best-of-breed analytics into InterSystems IRIS solutions, and offers flexible deployment capabilities to support any combination of cloud and on-premises deployments.

Q2. How is InterSystems IRIS Data Platform positioned with respect to other Big Data platforms in the market (e.g. Amazon Web Services, Cloudera, Hortonworks Data Platform, Google Cloud Platform, IBM Watson Data Platform and Watson Analytics, Oracle Data Cloud system, Microsoft Azure, to name a few) ?

Joe Lichtenberg: Unlike other approaches that require organizations to implement and integrate different technologies, InterSystems IRIS delivers all of the functionality in a single product with a common architecture and development experience, making it faster and easier to build real-time, data-rich applications. However, it is an open environment and can integrate with existing technologies already in use in the customer’s environment.

Q3. How do you ensure High Performance with Horizontal and Vertical Scalability? 

Simon Player: Scaling a system vertically by increasing its capacity and resources is a common, well-understood practice. Recognizing this, InterSystems IRIS includes a number of built-in capabilities that help developers leverage the gains and optimize performance. The main areas of focus are memory, IOPS and processing management. Some of these tuning mechanisms operate transparently, while others require specific adjustments by the developer to take full advantage.
One example of those capabilities is parallel query execution. Built on a flexible infrastructure for maximizing CPU usage, it spawns one process per CPU core and is most effective with large data volumes, such as analytical workloads that perform large aggregations.

When vertical scaling does not provide the complete solution—for example, when you hit the inevitable hardware (or budget) ceiling—data platforms can also be scaled horizontally. Horizontal scaling fits very well with virtual and cloud infrastructure, in which additional nodes can be quickly and easily provisioned as the workload grows, and decommissioned if the load decreases.
InterSystems IRIS accomplishes this by providing the ability to scale for both increasing user volume and increasing data volume.

For increased user capacity, we leverage a distributed cache with an architectural solution that partitions users transparently across a tier of application servers sitting in front of our data server(s). Each application server handles user queries and transactions using its own cache, while all data is stored on the data server(s), which automatically keeps the application server caches in sync.

For increased data volume, we distribute the workload to a sharded cluster with partitioned data storage, along with the corresponding caches, providing horizontal scaling for queries and data ingestion. In a basic sharded cluster, a sharded table is partitioned horizontally into roughly equal sets of rows called shards, which are distributed across a number of shard data servers. For example, if a table with 100 million rows is partitioned across four shard data servers, each stores a shard containing about 25 million rows. Queries against a sharded table are decomposed into multiple shard-local queries to be run in parallel on multiple servers; the results are then transparently combined and returned to the user. This distributed data layout can further be exploited for parallel data loading and with third party frameworks like Apache Spark.
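
To make the row-distribution arithmetic above concrete, here is a minimal Python sketch of hash-based sharding and scatter-gather query decomposition. It illustrates the general concept only – the modulo shard assignment and helper names are assumptions for illustration, not InterSystems’ actual implementation.

```python
# Conceptual sketch of hash-based sharding and scatter-gather queries.
# Illustrative only -- not the InterSystems IRIS implementation.
from collections import Counter

NUM_SHARDS = 4

def shard_for(shard_key: int) -> int:
    """Map a row's shard key to one of the shard data servers."""
    return hash(shard_key) % NUM_SHARDS

# Rows distribute roughly evenly: 100M rows over 4 shards -> ~25M each.
counts = Counter(shard_for(row_id) for row_id in range(1_000_000))  # scaled down
print(counts)

def scatter_gather(shards, run_local_query):
    """Decompose a query into shard-local queries and combine the results."""
    partials = [run_local_query(shard) for shard in shards]  # parallel in practice
    return [row for part in partials for row in part]
```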

Horizontal clusters require greater attention to the networking component to ensure that it provides sufficient bandwidth for the multiple systems involved and is entirely transparent to the user and the application.

Q4. How can you simultaneously process both transactional and analytic workloads in a single database?

Simon Player: At the core of InterSystems IRIS is a proven, enterprise-grade, distributed, hybrid transactional-analytic processing (HTAP) database. It can ingest and store transactional data at very high rates while simultaneously processing high volumes of analytic workloads on real-time data (including ACID-compliant transactional data) and non-real-time data. This architecture eliminates the delays associated with moving real-time data to a different environment for analytic processing. InterSystems IRIS is built on a distributed architecture to support large data volumes, enabling organizations to analyze very large data sets while simultaneously processing large amounts of real-time transactional data.

Q5. There are a wide range of analytics, including business intelligence, predictive analytics, distributed big data processing, real-time analytics, and machine learning. How do you support them in the InterSystems IRIS  Data Platform?

Simon Player: Many of these capabilities are built into the platform itself and leverage that tight integration to simultaneously process both transactional and analytic workloads; however, we realize that there are multiple use cases where customers and partners would like InterSystems IRIS Data Platform to access data on other systems, or to build solutions that leverage best-of-breed tools (such as ML algorithms, Spark, etc.) to complement our platform and quickly access data stored on it.
That’s why we chose to provide open analytics capabilities supporting industry standard APIs such as UIMA, Java Integration, xDBC and other connectivity options.

Q6. What about third-party analytics tools? 

Simon Player: The InterSystems IRIS Data Platform offers embedded analytics capabilities such as business intelligence, distributed big data processing and natural language processing, which can handle both structured and unstructured data with ease. It is designed as an Open Analytics Platform, built around a universal, high-performance and highly scalable data store.
Third-party analytics tools can access data stored on the platform via standard APIs including ODBC, JDBC, .NET, SOAP, REST, and the new Apache Spark Connector. In addition, the platform supports working with industry-standard analytical artifacts such as predictive models expressed in PMML and unstructured data processing components adhering to the UIMA standard.
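
As a minimal illustration of the standard-API access described above, the sketch below queries the platform from Python over ODBC using pyodbc. The DSN, credentials and table name are hypothetical placeholders; a real deployment would take the driver details from the InterSystems documentation.

```python
import pyodbc  # generic ODBC bridge, usable with any ODBC-compliant database

# Hypothetical DSN and credentials -- substitute your own configuration.
conn = pyodbc.connect("DSN=IRIS;UID=user;PWD=password")
cursor = conn.cursor()

# Plain SQL against a hypothetical table stored in the platform.
cursor.execute(
    "SELECT TOP 10 id, amount FROM Sales.Transactions ORDER BY amount DESC"
)
for row in cursor.fetchall():
    print(row.id, row.amount)

conn.close()
```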

Q7. How does InterSystems IRIS Data Platform integrate into existing infrastructures and with existing best-of-breed technologies (including your own products)?

Simon Player: InterSystems IRIS offers a powerful, flexible integration technology that enables you to eliminate “siloed” data by connecting people, processes, and applications. It includes the comprehensive range of technologies needed for any connectivity task.
InterSystems IRIS can connect to your existing data and applications, enabling you to leverage your investment, rather than “ripping and replacing.” With its flexible connectivity capabilities, solutions based on InterSystems IRIS can easily be deployed in any client environment.

Built-in support for standard APIs enables solutions based on InterSystems IRIS to leverage applications that use Java, .NET, JavaScript, and many other languages. Support for popular data formats, including JSON, XML, and more, cuts down time to connect to other systems.

A comprehensive library of adapters provides out-of-the-box connectivity and data transformations for packaged applications, databases, industry standards, protocols, and technologies – including SQL, SOAP, REST, HTTP, FTP, SAP, TCP, LDAP, Pipe, Telnet, and Email.

Object inheritance minimizes the effort required to build any needed custom adapters. Using InterSystems IRIS’ unit testing service, custom adapters can be tested without first having to complete the entire solution. Traceability of each event allows efficient analysis and debugging.

The InterSystems IRIS messaging engine offers guaranteed message delivery, content-based routing, high-performance message transformation, and support for both synchronous and asynchronous interactions. InterSystems IRIS has a graphical editor for business process orchestration, a business rules engine, and a workflow editor that enable you to automate your enterprise-wide business procedures or create new composite applications. With world-class support for XML, SOAP, JSON and REST, InterSystems IRIS is ideal for creating an Enterprise Service Bus (ESB) or employing a Service-Oriented Architecture (SOA).

Because it includes a high performance transactional-analytic database, InterSystems IRIS can store and analyze messages as they flow through your system. It enables business activity monitoring, alerting, real-time business intelligence, and event processing.

· Other integration points with industry standards or best-of-breed technologies include the ability to transport files securely between client machines and the server via our Managed File Transfer (MFT) capability. This functionality leverages state-of-the-art MFT providers like Box, Dropbox and KiteWorks to provide a simple client that non-technical users can install and companies can pre-configure and brand. InterSystems IRIS connects with these providers as a peer and exposes common APIs (e.g. to manage users).

· When using Apache Spark for large distributed data processing and analytics tasks, the Spark Connector will leverage the distributed data layout of sharded tables and push computation as close to the data as possible, increasing parallelism and thus overall throughput significantly versus regular JDBC connections (see the sketch below).
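
For a sense of what this looks like from the application side, here is a minimal PySpark sketch reading a table over generic JDBC; the dedicated Spark Connector mentioned above would replace this generic data source with a sharding-aware one. The JDBC URL, driver details and table name are hypothetical placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iris-spark-sketch").getOrCreate()

# Generic JDBC read -- URL and table are hypothetical placeholders.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:IRIS://host:51773/MYNAMESPACE")
      .option("dbtable", "Sales.Transactions")
      .option("user", "user")
      .option("password", "password")
      .load())

# With the sharding-aware connector, an aggregation like this can be
# pushed down and executed shard-locally before results are combined.
df.groupBy("region").sum("amount").show()
```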

Q8. What market segments do you address with IRIS  Data Platform?

Helene Lengler: InterSystems IRIS is an open platform that suits virtually any industry, but we will be initially focusing on a couple of core market segments, primarily due to varying regional demand. For instance, we will concentrate on the financial services industry in the US or UK and the retail and logistics market in the DACH and Benelux regions. Additionally, in Germany and Japan, our major focus will be on the manufacturing industry, where we see a rapidly growing demand for data-driven solutions, especially in the areas of predictive maintenance and predictive analytics.
We are convinced that InterSystems IRIS is ideal for this and also for other kinds of IoT applications with its ability to handle large-scale transactional and analytic workloads. On top of this, we are also looking to engage with companies that are at the very beginning of product development – in other words, start-ups and innovators working on solutions that require a robust, future-proof data platform.

Q9. Are there any proof of concepts available? 

Helene Lengler: Yes. Although the solution has only been available to selected partners for a couple of weeks, we have already completed the first successful migration in Germany. A partner that offers an Enterprise Information Management system, which allows organizations to archive and access all of an organization’s data, documents, emails and paper files, was able to migrate from InterSystems Caché to InterSystems IRIS in as little as a couple of hours and – most importantly – without any issues at all. The partner decided to move to InterSystems IRIS because they are in the process of signing a contract with one of the biggest players in the German travel & transport industry. With customers like this, you are looking at data volumes in the petabyte range very, very shortly, meaning you require the right technology from the start in order to be able to scale horizontally – using InterSystems IRIS technologies such as sharding – as well as vertically.

In addition, we were able to show a live IoT demonstrator at our InterSystems DACH Symposium in November 2017. This proof of concept is actually a lighthouse example of what the new platform brings to the table: a team of three different business partners and InterSystems experts leveraged InterSystems IRIS’ capabilities to rapidly develop and implement a fully functional solution for a predictive maintenance scenario. Numerous other test scenarios and PoCs are currently being conducted in various industry segments with different partners around the globe.

Q10. Can developers already use InterSystems IRIS Data Platform? 

Simon Player: Yes. Starting on January 31, developers can use our sandbox, the InterSystems IRIS Experience, at www.intersystems.com/experience.

Qx. Anything else you wish to add?

Simon Player: The public is welcome to join the discussion on how to graduate from database to data platform on our developer community at https://community.intersystems.com.

——————————–
Simon Player is director of development for both TrakCare and Data Platforms at InterSystems. Simon has used and developed on InterSystems technologies since the early 1990s. He holds a BSc in Computer Sciences from the University of Manchester.


Helene Lengler is the Regional Managing Director for the DACH and Benelux regions. She joined InterSystems in July 2016 and has more than 25 years of experience in the software technology industry. During her professional career, she has held various senior positions at Oracle, including Vice President (VP) Sales Fusion Middleware and member of the executive board at Oracle Germany, VP Enterprise Sales and VP of Oracle Direct. Prior to her 16 years at Oracle, she worked for the Digital Equipment Corporation in several business disciplines such as sales, marketing and presales.
Helene holds a Master’s degree from the Julius-Maximilians-University in Würzburg and a post-graduate Business Administration degree from AKAD in Pinneberg.

Joe Lichtenberg is responsible for product and industry marketing for data platform software at InterSystems. Joe has decades of experience working with various data management, analytics, and cloud computing technology providers.

Resources

InterSystems IRIS Data Platform, Product Page.

E-Book (IDC): Slow Data Kills Business.

White Paper (ESG): Building Smarter, Faster, and Scalable Data-rich Applications for Businesses that Operate in Real Time. 

Achieving Horizontal Scalability, Alain Houf – Sales Engineer, InterSystems

Horizontal Scalability with InterSystems IRIS

Press release: InterSystems IRIS Data Platform™ Now Available.

Related Posts

Facing the Challenges of Real-Time Analytics. Interview with David Flower. Source: ODBMS Industry Watch, published on 2017-12-19

On the future of Data Warehousing. Interview with Jacque Istok and Mike Waas. Source: ODBMS Industry Watch, published on 2017-11-09

On Vertica and the new combined Micro Focus company. Interview with Colin Mahony. Source: ODBMS Industry Watch, published on 2017-10-25

On Open Source Databases. Interview with Peter Zaitsev. Source: ODBMS Industry Watch, published on 2017-09-06

Follow us on Twitter: @odbmsorg

##

On Technology Innovation, AI and IoT. Interview with Philippe Kahn (January 27, 2018)

“There is a lot of hype about the dangers of IoT and AI. It’s important to understand that nobody is building Blade-Runner style replicants.” — Philippe Kahn

I have interviewed Philippe Kahn. Philippe is a mathematician, well-known technology innovator, entrepreneur and founder of four technology companies: Fullpower Technologies, LightSurf Technologies, Starfish Software and Borland.

RVZ

Q1. Twenty years ago, you spent about a year working on a Web-based infrastructure that you called Picture Mail. Picture Mail would do what we now call photo “sharing”. How come it took so long before the introduction of the iPhone, Snapchat, Instagram, Facebook Live and co.?

Philippe Kahn: Technology adoption takes time. We designed a system where a picture would be stored once and a link-back would be sent as a notification to thousands. That’s how Facebook and others function today. At the time, necessity created function: for wireless devices and the first camera phones/cellphone cameras, the bandwidth on cellular networks was 1200 baud at most and very costly. Today a picture or a video is shared once on Facebook and millions/billions can be notified. It’s exactly the same approach.

Q2. Do you have any explanation why established companies such as Kodak, Polaroid, and other camera companies (they all had wireless camera projects at that time), could not imagine that the future was digital photography inside the phone?

Philippe Kahn: Yes, I met with all of them and proposed our solution, to no avail. They had an established business and thought that it would never go away and that they could wait. They totally missed the paradigm shift. Paradigm shifts are challenges for any established player; look at the demise of Nokia for missing the smartphone.

Q3. What is your take on Citizen journalism?

Philippe Kahn: Citizen journalism is one of the pillars of future democracy. There is always someone snapping and pushing forward a different point of view. We see it every day around the world.

Q4. Do you really believe that people can’t hide things anymore?

Philippe Kahn: I think that people can’t hide what they do in public: Brutality, Generosity, Politics, Emotions. We all have a right to privacy. However in public, there is always someone snapping.

Q5. What about fake news?

Philippe Kahn: There is nothing new about Fake News. It’s always been around. What’s new is that with the web omnipresent, it’s much more effective. Add modern powerful editing and publishing tools and sometimes it’s very challenging to differentiate what’s real from what’s fake.

Q6. You told Bob Parks, who interviewed you for a Wired article in 2000: ‘In the future people will document crimes using video on their phones. Then everyone will know the real story.’ Has this really changed our world?

Philippe Kahn: Yes, it has. It’s forced policing, for example, to re-examine protocols. Of course not every act of violence or crime is captured, but video and photos are helping victims.

Q7. What are the challenges and opportunities in regions like Africa, where people don’t have laptops, but have phones with cameras?

Philippe Kahn: The opportunities are great. Those countries are skipping the laptop and focusing on a Smartphone with a cloud infrastructure. That’s pretty much what I do daily. In fact, this is what I am doing as I am answering these questions.

Q8. Back to the future: you live now in the world of massive firehoses of machine data and AI-driven algorithms. How will these new technologies change the world (for the better or the worse)?

Philippe Kahn: There are always two sides to everything: Even shoes can be used to keep me warm or march fascist armies across illegitimately conquered territories. The dangers of AI lie in police states and in a massive focus on an advertising business model. But what we do with AI is helping us find solutions for better sleep, diabetes, high blood pressure, cancer and more. We need to accept one to get the other in some ways.

Q9. In my recent interview with Vinton G. Cerf, he expressed great concerns about the safety, security and privacy of IoT devices. He told me: “A particularly bad scenario would have a hacker taking over the operating system of 100,000 refrigerators.”

Philippe Kahn: When we build AI-powered IoT solutions at Fullpower, security and privacy are paramount. We follow the strictest protocols. Security and privacy are at risk every day with computer viruses and hacking. Nothing is new. It’s always a game of cat and mouse. I want to believe that we are a great cat. We work hard at it.

Q10. With your new startup, Fullpower Technologies, you have developed under-the-mattress sensors and cloud-based artificial intelligence to gather data and personalize recommendations to help customers improve their sleep. What do you think of Cerf´s concerns and how can they be mitigated in practice?

Philippe Kahn: Vint’s concerns are legitimate. At Fullpower, our privacy, security and anonymity protocols are our #1 focus, together with quality, accuracy, reliability and repeatability. We think of what we build as a fortress. We’ve built in security, privacy, preventive maintenance, and automated secure troubleshooting.

Qx Anything else you wish to add?

Philippe Kahn: There is a lot of hype about the dangers of IoT and AI. It’s important to understand that nobody is building Blade-Runner style replicants. AI is very good at solving specialized challenges: Like being the best at playing chess, where the rules are clear and simple. AI can’t deal with general purpose intelligence that is necessary for a living creature to prosper. We are all using AI, Machine Learning, Deep Learning, Supervised Learning for simple and useful solutions.

———————-

Philippe Kahn is CEO of Fullpower, the creative team behind the AI-powered Sleeptracker IoT Smartbed technology platform and the MotionX Wearable Technology platform. Philippe is a mathematician, scientist, inventor, and the creator of the camera phone, whose original 1997 implementation is now with the Smithsonian in Washington, D.C.

Resources

SleepTracker

MotionX

Fullpower

Related Posts

– Internet of Things: Safety, Security and Privacy. Interview with Vint G. Cerf. ODBMS Industry Watch, 2017-06-11

– On Artificial Intelligence and Analytics. Interview with Narendra Mulani, ODBMS Industry Watch, 2017-12-08

Follow us on Twitter: @odbmsorg

##

Facing the Challenges of Real-Time Analytics. Interview with David Flower (December 19, 2017)

“We are now seeing a number of our customers in financial services adopt a real-time approach to detecting and preventing fraudulent credit card transactions. With the use of ML integrating into the real-time rules engine within VoltDB, the transaction can be monitored, validated and either rejected or passed, before being completed, saving time and money for both the financial institution and the consumer.”–David Flower.

I have interviewed David Flower, President and Chief Executive Officer of VoltDB. We discussed his strategy for VoltDB, and the main data challenges enterprises face nowadays in performing real-time analytics.

RVZ

Q1. You joined VoltDB as Chief Revenue Officer last year, and on March 29, 2017 you were appointed President and Chief Executive Officer. What is your strategy for VoltDB?

David Flower: When I joined the company we took a step back to really understand our business and move from the start-up phase to the growth stage. As with all organizations, you learn from what you have achieved, but you also have to be honest about where your value lies. We looked at three fundamentals:
1) Success in our customer base – industries, use cases, geography
2) Market dynamics
3) Core product DNA – the underlying strengths of our solution, over and above any other product in the market

The outcome of this exercise is that we have moved from a generic, veneer market approach to a highly focused, specialized business with deep domain knowledge. As with any business, you are looking for repeatability in clearly defined and understood market sectors, and this is the natural next phase in our business evolution. I am very pleased to report that we have made significant progress to date.

With the growing demand for massive data management aligned with real-time decision making, VoltDB is well positioned to take advantage of this opportunity.

Q2. VoltDB is not the only in-memory transactional database in the market. What is your unique selling proposition and how do you position VoltDB in the broader database market?

David Flower: The advantage of operating in the database market is the pure size and scale that it offers – and that is also the disadvantage. You have to be able to express your target value. Through our customers and the strategic review we undertook, we are now able to express more clearly what value we have and where – and, equally importantly, where we do not play! Our USPs revolve around our product principles – vast data ingestion scale, full ACID consistency and the ability to undertake real-time decisioning, all supported through a distributed low-latency in-memory architecture – and we embrace traditional RDBMS through SQL to leverage existing market skills and reduce the associated cost of change. We offer a proven enterprise-grade database that is used by some of the world’s leading and most demanding brands, a fact that many other companies in our market cannot claim.

Q3. VoltDB was founded in 2009 by a team of database experts, including Dr. Michael Stonebraker (winner of the ACM Turing award). How much of Stonebraker`s ideas are still in VoltDB and what is new?

David Flower: We are both proud and privileged to be associated with Dr. Stonebraker, and his stature in the database arena is without comparison. Mike’s original ideas underpin our product philosophy and our future direction, and he continues to be actively engaged in the business and will always remain a fundamental part of our heritage. Through our internal engineering experts and in conjunction with our customers, we have built on Mike’s original ideas to bring additional features, functions and enterprise-grade capabilities into the product.

Q4. Stonebraker co-founded several other database companies. Before VoltDB, in 2005, Stonebraker co-founded Vertica to commercialize the technology behind C-Store; and after VoltDB, in 2013 he co-founded another company called Tamr. Is there any relationship between Vertica, VoltDB and Tamr (if any)?

David Flower: Mike’s legacy in this field speaks for itself. VoltDB evolved from the Vertica business and, while we have no formal ties, we are actively engaged with numerous leading technology companies that enable clients to gain deeper value through close integrations.

Q5. VoltDB is a ground-up redesign of a relational database. What are the main data challenges enterprises face nowadays in performing real-time analytics?

David Flower: The demand for ‘real-time’ is one of the most challenging areas for many businesses today. Firstly, the definition of real-time is changing: batch or micro-batch processing is now unacceptable – whether for the consumer, the customer or, in some cases, for compliance. Secondly, analytics is moving from the back-end (post-event) to the front-end (in-event or in-process).
The drivers around AI and ML are forcing this even more. The market requirement is now for real-time analytics, but what is the value of this if you cannot act on it? This is where VoltDB excels – we enable action on this data, in process, when the data/time is most valuable. VoltDB is able to truly deliver on the value of translytics – the combination of real-time transactions with real-time analytics – and we can demonstrate this through real use cases.

Q6. VoltDB is specialized in high-velocity applications that thrive on fast streaming data. What is fast streaming data and why does it matter?

David Flower: As previously mentioned, VoltDB is designed for high-volume data streams that require a decision to be taken ‘in-stream’, with full consistency. Fast streaming data is best defined through real applications – policy management, authentication and billing as examples in telecoms; fraud detection and prevention in finance (such as massive credit card processing streams); customer engagement offerings in media and gaming; and areas such as smart metering in IoT.
The underlying principle is that the window of opportunity (action) is available in the fast data stream process, and once passed the opportunity’s value diminishes.

Q7. You have recently announced an “Enterprise Lab Program” to accelerate the impact of real-time data analysis at large enterprise organizations. What is it and how does it work?

David Flower: The objective of the Enterprise Lab Program is to enable organizations to access, test and evaluate our enterprise solution within their own environment and determine the applicability of VoltDB for either the modernization of existing applications or for the support of next-gen applications. This comes without restriction, and provides full access to our support, technical consultants and engineering resources. We realize that selecting a database is a major decision and we want to ensure the potential of our product can be fully understood, tested and piloted with access to all our core assets.

Q8. You have been quoted saying that “Fraud is a huge problem on the Internet, and is one of the most scalable cybercrimes on the web today. The only way to negate the impact of fraud is to catch it before a transaction is processed”. Is this really always possible? How do you detect a fraud in practice?

David Flower: With the phenomenal growth in e-commerce and the changing consumer demands for web-driven retailing, the concerns relating to credit card fraud are only going to increase. The internet creates the challenge of handling massive transaction volumes, and cyber criminals are becoming ever more sophisticated in their approach.
Traditional fraud models simply were not designed to manage at this scale, and in many cases post-transaction capture is too late – the damage has been done. We are now seeing a number of our customers in financial services adopt a real-time approach to detecting and preventing fraudulent credit card transactions. With the use of ML integrated into the real-time rules engine within VoltDB, the transaction can be monitored, validated and either rejected or passed, before being completed, saving time and money for both the financial institution and the consumer. By using the combination of post-transaction analytics and ML, the most relevant, current and effective set of rules can be applied as the transaction is processed.
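
As a simplified illustration of that pattern (not VoltDB’s actual API, where such logic would live in a stored procedure executed inside the transaction), the following Python sketch shows the shape of an in-flight screening step; the rules, thresholds and model call are all hypothetical.

```python
# Simplified sketch of in-flight fraud screening -- illustrative only.

def screen_transaction(txn, recent_history, risk_model):
    """Return a decision for a card transaction before it completes."""
    # Rule 1: velocity check against the card's recent activity window.
    if len(recent_history) > 20:                      # hypothetical threshold
        return "REJECT", "too many transactions in window"

    # Rule 2: amount anomaly relative to the card's recent average.
    avg = sum(t["amount"] for t in recent_history) / max(len(recent_history), 1)
    if txn["amount"] > 10 * avg:                      # hypothetical threshold
        return "REJECT", "amount anomaly"

    # Rule 3: score from an ML model trained offline on historical data.
    if risk_model(txn) > 0.9:                         # hypothetical cutoff
        return "REJECT", "high model risk score"

    return "ACCEPT", None
```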

Q9. Another area where VoltDB is used is in mobile gaming. What are the main data challenges with mobile gaming platforms?

David Flower: Mobile gaming is a perfect example of fast data – large data streams that require real-time decisioning for in-game customer engagement. The consumer wants the personal interaction but with relevant offers at that precise moment in the game. VoltDB is able to support this demand, at scale and based on the individual’s profile and stage in the application/game. The concept of the right offer, to the right person, at the right time ensures that the user remains loyal to the game and the game developer (company) can maximize its revenue potential through high customer satisfaction levels.

Q11. Can you explain the purpose of VoltDB`s recently announced co-operations with Huawei and Nokia?

David Flower: We have developed close OEM relationships with a number of major global clients, of which Huawei and Nokia are representative. Our aim is to be more than a traditional vendor and bring additional value to the table, be it in the form of technical innovation, through advanced application development, or in terms of our ‘total company’ support philosophy. We also recognize that infrastructure decisions are critical by nature and are not made for the short term.
VoltDB has been rigorously tested by both Huawei and Nokia and was selected for several reasons against some of the world’s leading technologies, but fundamentally because our product works – and works in the most demanding environments providing the capability for existing and next-generation enterprise grade applications.

—————

David Flower brings more than 28 years of experience within the IT industry to the role of President and CEO of VoltDB. David has a track record of building significant shareholder value across multiple software sectors on a global scale through the development and execution of focused strategic plans, organizational development and product leadership.

Before joining VoltDB, David served as Vice President EMEA for Carbon Black Inc. Prior to Carbon Black he held senior executive positions in numerous successful software companies including Senior Vice President International for Everbridge (NASDAQ: EVBG); Vice President EMEA (APM division) for Compuware (formerly NASDAQ: CPWR); and UK Managing Director and Vice President EMEA for Gomez. David also held the position of Group Vice President International for MapInfo Corp. He began his career in senior management roles at Lotus Development Corp and Xerox Corp – Software Division.

David attended Oxford Brookes University where he studied Finance. David retains strong links within the venture capital investment community.

Resources

– eBook: Fast Data Use Cases for Telecommunications. Ciara Byrne, 2017, O’Reilly Media. (Link to PDF; registration required.)

– Fast Data Pipeline Design: Updating Per-Event Decisions by Swapping Tables. July 11, 2017, by John Piekos, VoltDB

– VoltDB Extends Open Source Capabilities for Development of Real-Time Applications · OCTOBER 24, 2017

– New VoltDB Study Reveals Business and Psychological Impact of Waiting · OCTOBER 11, 2017

– VoltDB Accelerates Access to Translytical Database with Enterprise Lab Program · SEPTEMBER 29, 2017

Related Posts

– On Artificial Intelligence and Analytics. Interview with Narendra Mulani. ODBMS Industry Watch, December 8, 2017

– Internet of Things: Safety, Security and Privacy. Interview with Vint G. Cerf. ODBMS Industry Watch, June 11, 2017

Follow us on Twitter: @odbmsorg

##

On Apache Ignite, Apache Spark and MySQL. Interview with Nikita Ivanov (June 30, 2017)

“Spark and Ignite can complement each other very well. Ignite can provide shared storage for Spark so state can be passed from one Spark application or job to another. Ignite can also be used to provide distributed SQL with indexing that accelerates Spark SQL by up to 1,000x.”–Nikita Ivanov.

I have interviewed Nikita Ivanov, CTO of GridGain.
The main topics of the interview are Apache Ignite, Apache Spark and MySQL, and how well they perform for big data analytics.

RVZ

Q1. What are the main technical challenges of SaaS development projects?

Nikita Ivanov: SaaS requires that the applications be highly responsive, reliable and web-scale. SaaS development projects face many of the same challenges as software development projects including a need for stability, reliability, security, scalability, and speed. Speed is especially critical for modern businesses undergoing the digital transformation to deliver real-time services to their end users. These challenges are amplified for SaaS solutions which may have hundreds, thousands, or tens of thousands of concurrent users, far more than an on-premise deployment of enterprise software.
Fortunately, in-memory computing offers SaaS developers solutions to the challenges of speed, scale and reliability.

Q2. In your opinion, what are the limitations of MySQL® when it comes to big data analytics?

Nikita Ivanov: MySQL was originally designed as a single-node system and not with the modern data center concept in mind. MySQL installations cannot scale to accommodate big data on a single node. Instead, MySQL must rely on sharding, or splitting a data set over multiple nodes or instances, to manage large data sets. However, most companies shard their database manually, making the creation and maintenance of their application much more complex. Manually creating an application that can then perform cross-node SQL queries on the sharded data multiplies the level of complexity and cost.

MySQL was also not designed to run complicated queries against massive data sets. MySQL optimizer is quite limited, executing a single query at a time using a single thread. A MySQL query can neither scale among multiple CPU cores in a single system nor execute distributed queries across multiple nodes.

Q3. What solutions exist to enhance MySQL’s capabilities for big data analytics?

Nikita Ivanov: Companies which require real-time analytics may attempt to manually shard their database. Tools such as Vitess, a framework YouTube released for MySQL sharding, or ProxySQL are often used to help implement sharding.
To speed up queries, caching solutions such as Memcached and Redis are often deployed; a common shape for this is the cache-aside pattern, sketched below.
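
The sketch below shows cache-aside with the redis-py client; the MySQL query helper is a hypothetical stand-in for a real database call.

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)

def fetch_user_from_mysql(user_id):
    # Hypothetical stand-in for a real MySQL query (e.g. via mysql-connector).
    raise NotImplementedError

def get_user(user_id):
    """Cache-aside: check Redis first, fall back to MySQL, then populate."""
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)                 # cache hit
    row = fetch_user_from_mysql(user_id)          # cache miss: query MySQL
    r.setex(key, 300, json.dumps(row))            # cache for five minutes
    return row
```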

Many companies turn to data warehousing technologies. These solutions require ETL processes and a separate technology stack which must be deployed and managed. There are many external solutions, such as Hadoop and Apache Spark, which are quite popular. Vertica and ClickHouse have also emerged as analytics solutions for MySQL.

Apache Ignite offers speed, scale and reliability because it was built from the ground up as a highly performant and highly scalable distributed in-memory computing platform.
In contrast to the MySQL single-node design, Apache Ignite automatically distributes data across nodes in a cluster, eliminating the need for manual sharding. The cluster can be deployed on-premise, in the cloud, or in a hybrid environment. Apache Ignite easily integrates with Hadoop and Spark, using in-memory technology to complement these technologies and achieve significantly better performance and scale. The Apache Ignite In-Memory SQL Grid is highly optimized and easily tuned to execute high-performance ANSI-99 SQL queries. The In-Memory SQL Grid offers access via JDBC/ODBC and the Ignite SQL API for external SQL commands or integration with analytics visualization software such as Tableau.

Q4. What is exactly Apache® Ignite™?

Nikita Ivanov: Apache Ignite is a high-performance, distributed in-memory platform for computing and transacting on large-scale data sets in real-time. It is 1,000x faster than systems built using traditional database technologies that are based on disk or flash technologies. It can also scale out to manage petabytes of data in memory.

Apache Ignite includes the following functionality:

· Data grid – An in-memory key-value data cache that can be queried (see the sketch after this list)

· SQL grid – Provides the ability to interact with data in-memory using ANSI SQL-99 via JDBC or ODBC APIs

· Compute grid – A stateless grid that provides high-performance computation in memory using clusters of computers and massive parallel processing

· Service grid – A service grid in which grid service instances are deployed across the distributed data and compute grids

· Streaming analytics – The ability to consume an endless stream of information and process it in real-time

· Advanced clustering – The ability to automatically discover nodes, eliminating the need to restart the entire cluster when adding new nodes
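
As a small taste of the data grid and SQL grid items above, the sketch below uses pyignite, the Python thin client for Apache Ignite, for key-value access and a SQL query. It assumes an Ignite node listening on the default thin-client port (10800); the cache and table names are made up.

```python
from pyignite import Client

client = Client()
client.connect("127.0.0.1", 10800)   # default thin-client port

# Data grid: key-value access to an in-memory cache.
quotes = client.get_or_create_cache("quotes")
quotes.put("AAPL", 167.53)
print(quotes.get("AAPL"))

# SQL grid: ANSI SQL over the cluster (table name is hypothetical).
cursor = client.sql("SELECT symbol, price FROM Quotes WHERE price > ?",
                    query_args=[100])
for row in cursor:
    print(row)
```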

Q5. How Apache Ignite differs from other in-memory data platforms?

Nikita Ivanov: Most in-memory computing solutions fall into one of three types: in-memory data grids, in-memory databases, or a streaming analytics engine.
Apache Ignite is a full-featured in-memory computing platform which includes an in-memory data grid, in-memory database capabilities, and a streaming analytics engine. Furthermore, Apache Ignite supports distributed ACID compliant transactions and ANSI SQL-99 including support for DML and DDL via JDBC/ODBC.

Q6. Can you use Apache® Ignite™ for Real-Time Processing of IoT-Generated Streaming Data?

Nikita Ivanov: Yes, Apache Ignite can ingest and analyze streaming data using its streaming analytics engine which is built on a high-performance and scalable distributed architecture. Because Apache Ignite natively integrates with Apache Spark, it is also possible to deploy Spark for machine learning at in-memory computing speeds.
Apache Ignite supports both high volume OLTP and OLAP use cases, supporting Hybrid Transactional Analytical Processing (HTAP) use cases, while achieving performance gains of 1000x or greater over systems which are built on disk-based databases.

Q7. How do you stream data to an Apache Ignite cluster from embedded devices?

Nikita Ivanov: It is very easy to stream data to an Apache Ignite cluster from embedded devices.
The Apache Ignite streaming functionality allows for processing never-ending streams of data from embedded devices in a scalable and fault-tolerant manner. Apache Ignite can handle millions of events per second on a moderately sized cluster for embedded devices generating massive amounts of data.

Q8. Is this different then using Apache Kafka?

Nikita Ivanov: Apache Kafka is a distributed streaming platform that lets you publish and subscribe to data streams. Kafka is most commonly used to build a real-time streaming data pipeline that reliably transfers data between applications. This is very different from Apache Ignite, which is designed to ingest, process, analyze and store streaming data.

Q9. How do you conduct real-time data processing on this stream using Apache Ignite?

Nikita Ivanov: Apache Ignite includes a connector for Apache Kafka so it is easy to connect Apache Kafka and Apache Ignite. Developers can either push data from Kafka directly into Ignite’s in-memory data cache or present the streaming data to Ignite’s streaming module where it can be analyzed and processed before being stored in memory.
This versatility makes the combination of Apache Kafka and Apache Ignite very powerful for real-time processing of streaming data.
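
The client-side shape of the first option might look like the Python sketch below, using kafka-python to consume events and the pyignite thin client to store them. The official Kafka connector is a server-side component; this is a simplified stand-in, and the topic and cache names are made up.

```python
import json
from kafka import KafkaConsumer   # kafka-python package
from pyignite import Client

ignite = Client()
ignite.connect("127.0.0.1", 10800)
events = ignite.get_or_create_cache("sensor-events")

consumer = KafkaConsumer(
    "sensor-readings",                      # hypothetical topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

# Push each Kafka event into Ignite's in-memory cache as it arrives.
for msg in consumer:
    events.put(msg.value["device_id"], msg.value)
```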

Q10. Is this different then using Spark Streaming?

Nikita Ivanov: Spark Streaming enables processing of live data streams. This is merely one of the capabilities that Apache Ignite supports. Although Apache Spark and Apache Ignite utilize the power of in-memory computing, they address different use cases. Spark processes but doesn’t store data. It loads the data, processes it, then discards it. Ignite, on the other hand, can be used to process data and it also provides a distributed in-memory key-value store with ACID compliant transactions and SQL support.
Spark is also for non-transactional, read-only data while Ignite supports non-transactional and transactional workloads. Finally, Apache Ignite also supports purely computational payloads for HPC and MPP use cases while Spark works only on data-driven payloads.

Spark and Ignite can complement each other very well. Ignite can provide shared storage for Spark so state can be passed from one Spark application or job to another. Ignite can also be used to provide distributed SQL with indexing that accelerates Spark SQL by up to 1,000x.
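
For a feel of the Spark integration, the PySpark sketch below reads an Ignite table as a DataFrame. The “ignite” data-source format and option names reflect the Ignite Spark module as I understand it, and the config path and table name are hypothetical; check the current documentation before relying on them.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ignite-spark-sketch").getOrCreate()

# Read an Ignite SQL table as a Spark DataFrame. Ignite acts as shared,
# indexed storage, so filters can be executed cluster-side.
df = (spark.read.format("ignite")                      # assumed format name
      .option("config", "/path/to/ignite-config.xml")  # hypothetical path
      .option("table", "Quotes")                       # hypothetical table
      .load())

df.filter(df.price > 100).show()
```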

Qx. Is there anything else you wish to add?

Nikita Ivanov: The world is undergoing a digital transformation which is driving companies to get closer to their customers. This transformation requires that companies move from big data to fast data, the ability to gain real-time insights from massive amounts of incoming data. Whether that data is generated by the Internet of Things (IoT), web-scale applications, or other streaming data sources, companies must put architectures in place to make sense of this river of data. As companies make this transition, they will be moving to memory-first architectures which ingest and process data in memory before offloading to disk-based datastores, and increasingly will be applying machine learning and deep learning to understand the data. Apache Ignite continues to evolve in directions that will support and extend the abilities of memory-first architectures and machine learning/deep learning systems.

——–
Nikita Ivanov, Founder & CTO, GridGain
Nikita Ivanov is the founder of the Apache Ignite project and CTO of GridGain Systems, started in 2007. Nikita has led GridGain to develop advanced distributed in-memory data processing technologies – the top Java in-memory data fabric, starting up every 10 seconds around the world today. Nikita has over 20 years of experience in software application development, building HPC and middleware platforms, and contributing to the efforts of other startups and notable companies including Adaptec, Visa and BEA Systems. He is an active member of the Java middleware community and a contributor to the Java specification. He is also a frequent international speaker with over two dozen talks at various developer conferences globally.

Resources

Apache Ignite Community Resources

apache/ignite on GitHub

Yardstick Apache Ignite Benchmarks

Accelerate MySQL for Demanding OLAP and OLTP Use Cases with Apache Ignite

Misys Uses GridGain to Enable High Performance, Real-Time Data Processing

The Spark Python API (PySpark)

Related Posts

Supporting the Fast Data Paradigm with Apache Spark. By Stephen Dillon, Data Architect, Schneider Electric

On the new developments in Apache Spark and Hadoop. Interview with Amr Awadallah. ODBMS Industry Watch, March 13, 2017

Follow ODBMS.org on Twitter: @odbmsorg

##

Internet of Things: Safety, Security and Privacy. Interview with Vint G. Cerf (June 11, 2017)

” I like the idea behind programmable, communicating devices and I believe there is great potential for useful applications. At the same time, I am extremely concerned about the safety, security and privacy of such devices.” –Vint G. Cerf

I had the pleasure to interview Vinton G. Cerf. Widely known as one of the “Fathers of the Internet,” Cerf is the co-designer of the TCP/IP protocols and the architecture of the Internet. The main topic of the interview is the Internet of Things (IoT) and its challenges, especially the safety, security and privacy of IoT devices.
Vint is currently Chief Internet Evangelist for Google.
RVZ

Q1. Do you like the Internet of Things (IoT)?

Vint Cerf: This question is far too general to answer. I like the idea behind programmable, communicating devices and I believe there is great potential for useful applications. At the same time, I am extremely concerned about the safety, security and privacy of such devices. Penetration and re-purposing of these devices can lead to denial of service attacks (botnets), invasion of privacy, harmful dysfunction, serious security breaches and many other hazards. Consequently the makers and users of such devices have a great deal to be concerned about.

Q2. Who is going to benefit most from the IoT?

Vint Cerf: The makers of the devices will benefit if they become broadly popular and perhaps even mandated to become part of local ecosystem. Think “smart cities” for example. The users of the devices may benefit from their functionality, from the information they provide that can be analyzed and used for decision-making purposes, for example. But see Q1 for concerns.

Q3. One of the most important requirements for collections of IoT devices is that they guarantee physical safety and personal security. What are the challenges from a safety and privacy perspective that the pervasive introduction of sensors and devices poses? (e.g. at home, in cars, hospitals, wearables and ingestibles, etc.)

Vint Cerf: Access control and strong authentication of parties authorized to access device information or control planes will be a primary requirement. The devices must be configurable to resist unauthorized access and use. Putting physical limits on the behavior of programmable devices may be needed or at least advisable (e.g., cannot force the device to operate outside of physically limited parameters).

Q5. Consumers want privacy. With IoT, physical objects in our everyday lives will increasingly detect and share observations about us. How is it possible to reconcile these two aspects?

Vint Cerf: This is going to be a tough challenge. Videocams that help manage traffic flow may also be used to monitor individuals or vehicles without their permission or knowledge, for example (cf: UK these days). In residential applications, one might want (insist on) the ability to disable the devices manually, for example. One would also want assurances that such disabling cannot be defeated remotely through the software.

Q6. Let’s talk more about security. It is reported that badly configured “smart devices” might provide a backdoor for hackers. What is your take on this?

Vint Cerf: It depends on how the devices are connected to the rest of the world. A particularly bad scenario would have a hacker taking over the operating system of 100,000 refrigerators. The refrigerator programming could be preserved but the hacker could add any of a variety of other functionality including DDOS capacity, virus/worm/Trojan horse propagation and so on.
One might want the ability to monitor and log the sources and sinks of traffic to/from such devices to expose hacked devices under remote control, for example. This is all a very real concern.

Q7. What measures can be taken to ensure a more “secure” IoT?

Vint Cerf: Hardware to inhibit some kinds of hacking (e.g. through buffer overflows) can help. Digital signatures on bootstrap programs, checked by hardware, can inhibit boot-time attacks. Validation of software updates as to integrity and origin. Whitelisting of IP addresses and identifiers of endpoints that are allowed direct interaction with the device.
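
To ground one of these measures, here is a minimal sketch of verifying a signed software update with the Python cryptography library, assuming the vendor’s Ed25519 public key is baked into the device; the key bytes and file handling are simplified placeholders.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

# In practice the vendor's public key would be burned into device hardware;
# the 32 zero bytes here are a placeholder, not a usable key.
VENDOR_PUBLIC_KEY = Ed25519PublicKey.from_public_bytes(b"\x00" * 32)

def verify_update(firmware: bytes, signature: bytes) -> bool:
    """Accept a firmware image only if its signature verifies."""
    try:
        VENDOR_PUBLIC_KEY.verify(signature, firmware)
        return True
    except InvalidSignature:
        return False
```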

Q8. Is there a danger that IoT evolves into a possible enabling platform for cyber-criminals and/or for cyber war offenders?

Vint Cerf: There is no question this is already a problem. The DYN Corporation DDOS attack was launched by a botnet of webcams that were readily compromised because they had no access controls or well-known usernames and passwords. This is the reason that companies must feel great responsibility and be provided with strong incentives to limit the potential for abuse of their products.

Q9. What are your personal recommendations for a research agenda and policy agenda based on advances in the Internet of Things?

Vint Cerf: Better hardware reinforcement of access control and use of the IOT computational assets. Better quality software development environments to expose vulnerabilities before they are released into the wild. Better software update regimes that reduce barriers to and facilitate regular bug fixing.

Q10. The IoT is still very much a work in progress. How do you see the IoT evolving in the near future?

Vint Cerf: Chaotic “standardization” with many incompatible products on the market. Many abuses by hackers. Many stories of bugs being exploited or serious damaging consequences of malfunctions. Many cases of “one device, one app” that will become unwieldy over time. Dramatic and positive cases of medical monitoring that prevents serious medical harms or signals imminent dangers. Many experiments with smart cities and widespread sensor systems.
Many applications of machine learning and artificial intelligence associated with IOT devices and the data they generate. Slow progress on common standards.

—————
Google-HS-9-2008
Vinton G. Cerf co-designed the TCP/IP protocols and the architecture of the Internet and is Chief Internet Evangelist for Google. He is a member of the National Science Board and National Academy of Engineering and Foreign Member of the British Royal Society and Swedish Royal Academy of Engineering, and Fellow of ACM, IEEE, AAAS, and BCS.
Cerf received the US Presidential Medal of Freedom, US National Medal of Technology, Queen Elizabeth Prize for Engineering, Prince of Asturias Award, Japan Prize, ACM Turing Award, Legion d’Honneur and 29 honorary degrees.

Resources

European Commission, Internet of Things Privacy & Security Workshop’s Report, 10/04/2017

Securing the Internet of Things. US Homeland Security, November 16, 2016

Related Posts

Social and Ethical Behavior in the Internet of Things By Francine Berman, Vinton G. Cerf. Communications of the ACM, Vol. 60 No. 2, Pages 6-7, February 2017

Security in the Internet of Things, McKinsey & Company, May 2017

Interview to Vinton G. Cerf. ODBMS Industry Watch, July 27, 2009

Five Challenges to IoT Analytics Success. By Dr. Srinath Perera. ODBMS.org, September 23, 2016

Follow us on Twitter: @odbmsorg

##

Democratizing the use of massive data sets. Interview with Dave Thomas (September 12, 2016)

“Any important data driving a business decision needs to be sanity checked, just as it would if one was using a spreadsheet.”–Dave Thomas.

I have interviewed Dave Thomas, Chief Scientist at Kx Labs.

RVZ

Q1. For many years business users have had their data locked up in databases and data warehouses. What is wrong with that?

Dave Thomas: It isn’t so much an issue of where the data resides, whether it is in files, databases, data warehouses or a modern data lake. The challenge is that modern businesses need access to the raw data, as well as the ability to rapidly aggregate and analyze their data.

Q2. Typical business intelligence (BI) tool users have never seen their actual data. Why?

Dave Thomas: For large corporations, hardware and software both used to be prohibitively expensive, hence much of their data was aggregated before being made available to users. Even today, when machines are very inexpensive, most corporate IT infrastructures are impoverished relative to what one can buy on the street or in the Cloud.
Compounding the problem, IT charge-back mechanisms are biased to reduce IT spending rather than to maximize the value of data delivered to the business.
Traditional technologies are not sufficiently performant to allow processing of large volumes of data.
Many companies have built inexpensive data lakes and realized after the fact that using commodity storage systems, such as HDFS, has severely constrained their performance and limited their utility. Hence more corporations are moving data away from HDFS into high-performance storage or memory.

Q3. What are the limitations of the existing BI and extract, transform and load (ETL) data tools?

Dave Thomas: Traditional BI tools assume that it is possible for DBAs and BI experts to define a priori the best way to structure and query the data. This reduces the whole power of BI to mere reporting. In an attempt to deal with huge BI backlogs, generic query and reporting tools have become popular as a way to shift reporting to self-serve. However, they are often designed for sophisticated BI users rather than for normal business users, and they are often not performant because they depend on the implementation of the underlying data stores.
For the most part, existing ETL tools are constrained by having to move the data to the ETL process and then on to the end user. Many ETL tools only work against one kind of data source. ETL can't be written by normal users, and due to the cost of an incorrect ETL run, such tools are not put in the hands of the data analyst. One of the major topics of discussion in Big Data shops is the complexity and performance of their Big Data pipeline. ETL, or data blending, shouldn't be a separate process or product. It should be something one can do with queries in a single efficient data language, as sketched below.
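As a minimal sketch of that idea (generic SQL with hypothetical table and column names, not Kx's actual q code), a cleanse-and-load step can be expressed as a single query rather than as a separate ETL product:

SQL:
-- Clean and blend raw sensor readings into an analytics table in one query;
-- bad readings are filtered out and units are converted on the way in.
INSERT INTO readings_clean (sensor_id, ts, temp_c)
SELECT sensor_id, ts, (temp_f - 32) * 5.0 / 9.0
FROM readings_raw
WHERE temp_f IS NOT NULL
  AND temp_f BETWEEN -40 AND 250;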

Q4. What are the typical technical challenges in finance, IoT and other time-series applications?

Dave Thomas:
1. Speed, as data volumes and variety are always increasing.
2. Ability to deal with both real-time events and historical events efficiently. Ideally in a single technology.
3. To handle time-series one needs to be able to deal with simultaneous arrival of events. Time with nanosecond precision is our solution. Other solutions are constrained by using milliseconds and event counters that are much less efficient.
4. High-performance operations on time, over days, months and years are essential for time-series. This is why time is a native type in Kx.
5. The essence of time-series is processing sliding time windows of data for both joins and aggregations (see the sketch after this list).
6. In IoT, data is always dirty. Kx's native support for missing data and out-of-band data due to failing sensors allows one to deal with the realities of sensor data.
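As a rough illustration of point 5, here is a sliding-window aggregation in standard SQL window-function syntax rather than in q; the table and column names are hypothetical, and offset-based RANGE windows, while standardized, vary in support across databases:

SQL:
-- 5-minute sliding average price per symbol over a time-ordered trades table.
SELECT symbol, ts, price,
       AVG(price) OVER (
           PARTITION BY symbol
           ORDER BY ts
           RANGE BETWEEN INTERVAL '5' MINUTE PRECEDING AND CURRENT ROW
       ) AS avg_price_5min
FROM trades;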

Q5. Kx offers analysts a language called q. Why not extend standard SQL?

Dave Thomas: I think there is a misunderstanding about q. Q is a complete functional data language that both includes and extends SQL. Selects are easier than in SQL because they provide implicit joins and group-bys, which makes queries roughly half the code of their SQL equivalents. Unlike many flavors of SQL, q lets one put a functional expression in any position in an SQL statement. One can easily extend the aggregation operations available to the end-user.

Q6. Can you show the difference between a query written in q and in standard SQL?

Dave Thomas: Here’s an example of retrieving parts from an orders table with a foreign key join to a parts table, summing by quantity and then sorting by color:

q:
select sum qty by p.color from sp

SQL:
select p.color, sum(sp.qty) from sp, p
where sp.p=p.p group by p.color order by color

Q7. How do queries execute inside the database?

Dave Thomas: Q is native to the database engine. Hence queries and analytics execute in the columns of the Kx database. There is no data shipping between the client and database server.

Q8. Shawn Rogers of Dell said: “A ‘citizen data scientist’ is an everyday, non-technical user that lacks the statistical and analytical prowess of a traditional data scientist, but is equally eager to leverage data in order to uncover insights, and importantly, do so at the speed of business.” What is your take on this?

Dave Thomas: High-performance data technologies, such as Kx, running on modern large-memory hardware, can support data-analyst as well as data-scientist queries. In the product Analyst for Kx, for example, users can work interactively on a sample of data, using visual tools to import, clean, query, transform, analyze and visualize data with minimal, if any, programming or even SQL. Once the operations are shown to be correct on one or more samples, they can then be run against trillions of rows of data. Data analysts today can truly live in their data.

Q9. What are the risks of bringing the power of analytics to users who are non-expert programmers?

Dave Thomas: Clearly any important analysis needs to be validated and cross-checked. Hence any important data driving a business decision needs to be sanity checked, just as it would if one was using a spreadsheet.
In our experience users do make initial mistakes, but as they live in their data they quickly learn.
Visualization really helps, as does the provision of metadata about the data sources. Reducing the cycle time increases understanding and makes mistakes inexpensive to correct.
Runaway queries have long been a concern of DBAs, but for many years frameworks have been in place, such as our smart query router, that ensure ad hoc queries against massive datasets are throttled so they don't run away. Fortunately, recent cost reductions in non-volatile memory make it possible to have high-performance query-only replicas of data that can be made available to different parts of the organization based on their needs.

Q10. How can non-expert programmers understand if the information expressed in visual analytics such as heat maps or in operational dashboard charts, is of good quality or not?

Dave Thomas: In our experience users spot visual anomalies much faster than inconsistencies in a spreadsheet.

Q11. What are the opportunities arising in “democratizing” the use of massive data sets?

Dave Thomas: We are finally living in a world where for many companies it is possible to run a real-time business where everyone can have fast, efficient access to the data they need. Rather than being held hostage to aggregations, spreadsheets and all sorts of variants of the truth, the organization can expediently see new opportunities to improve results in sales, marketing, production and other business operations.

Q12. How important is data query and data semantics?

Dave Thomas: Unfortunately we are not educated in how to express data semantics and data queries.
Even computer scientists typically study less about how to write queries than about how to execute them efficiently.
We need to educate students and employees on how to live in their data. It may well be that the future of programming, for most people, will be writing queries. Given powerful data languages, even compiler optimizations can be expressed as queries.
We need to invest much more in data governance and the use of standard terminology in order to share data within and across companies.

——————-
Dave Thomas, Kx Labs.
As Chief Scientist Dave envisions the future roadmap for Kx tools. Dave has had a long and storied career in computer software development and is perhaps best known as the founder and past CEO of Object Technology International, formerly OTI, now IBM OTI Labs, a pioneer in Agile Product Development. He was the principal visionary and architect for IBM VisualAge Smalltalk and Java tools and virtual machines including the popular open-source, multi-language Eclipse.org IDE. As the cofounder of Bedarra Research Labs he led the creation of the Ivy visual analytics workbench. Dave is a renowned speaker, university lecturer and Chairman of the Australian developer YOW! conferences.

Resources

New Kx release includes encryption, enhanced compression and Tableau integration. ODBMS.org JULY 4, 2016.

Resources for learning more about kdb+ and q benchmarking results.

Kdb+ and the Internet of Things/Big Data. InDetail Paper by Bloor Research. Author: Philip Howard. ODBMS.org, JANUARY 28, 2015

Related Posts

Democratizing fast access to Big Data. By Dave Thomas, chief scientist at Kx Labs. ODBMS.org, April 26, 2016

On Data Governance. Interview with David Saul. ODBMS Industry Watch, Published on 2016-07-23

On the Challenges and Opportunities of IoT. Interview with Steve Graves. ODBMS Industry Watch, Published on 2016-07-06

On Data Analytics and the Enterprise. Interview with Narendra Mulani. ODBMS Industry Watch, Published on 2016-05-24

Follow us on Twitter: @odbmsorg

##

On the Challenges and Opportunities of IoT. Interview with Steve Graves http://www.odbms.org/blog/2016/07/on-the-challenges-and-opportunities-of-iot-interview-with-steve-graves/ http://www.odbms.org/blog/2016/07/on-the-challenges-and-opportunities-of-iot-interview-with-steve-graves/#comments Wed, 06 Jul 2016 09:00:29 +0000 http://www.odbms.org/blog/?p=4172

“Assembling a team with the wide range of skills needed for a successful IoT project presents an entirely different set of challenges. The skills needed to build a ‘thing’ are markedly different than the skills needed to implement the data analytics in the cloud.”–Steve Graves.

I have interviewed Steve Graves, co-founder and CEO of McObject. Main topic of the interview is the Internet of Things and how it relates to databases.

RVZ

Q1. What are in your opinion the main Challenges and Opportunities of the Internet of Things (IoT) seen from the perspective of a database vendor?

Steve Graves: Let’s start with the opportunities.

When we started McObject in 2001, we chose “eXtremeDB, the embedded database for intelligent, connected devices” as our tagline. eXtremeDB was designed from the get-go to live in the “things” comprising what the industry now calls the Internet of Things. The popularization of this term has created a lot of visibility and, more importantly, excitement and buzz for what was previously viewed as the relatively boring “embedded systems.” And that creates a lot of opportunities.

A lot of really smart, creative people are thinking of innovative ways to improve our health, our workplace, our environment, our infrastructure, and more. That means new opportunities for vendors of every component of the technology stack.
The challenges are manifold, and I can’t begin to address all of them. The media is largely fixated on security, which itself is multi-dimensional.
We can talk about protecting IoT-enabled devices (e.g. your car) from being hacked. We can talk about protecting the privacy of your data at rest. And we can talk about protecting the privacy of data in motion.
Every vendor needs to recognize the importance of security. But it isn't enough for a vendor, like McObject, to provide the features to secure the target system; the developer that assembles the stack, along with their own proprietary technology, to create an IoT solution needs to use the available security features, and use them correctly.

After security, scaling IoT systems is the next big challenge. It’s easy enough to prototype something.
But careful planning is needed to leap from prototype to full-blown deployment. Obvious decisions have to be made about connectivity and necessary bandwidth, how many things per gateway, one tier of gateways or more, and how much compute capacity is needed in the cloud. Beyond that, there are less obvious decisions to be made that will affect scalability, like making sure the DBMS used on devices and/or gateways is able to handle the workload (e.g. that the gateway DBMS can scale from 10 input streams to 100 input streams); determining how to divide the analytics workload between gateways and the cloud; and ensuring that the gateway, its DBMS and its communication stack can stream data to the cloud while simultaneously processing its own input streams and analytics.
Assembling a team with the wide range of skills needed for a successful IoT project presents an entirely different set of challenges. The skills needed to build a ‘thing’ are markedly different than the skills needed to implement the data analytics in the cloud. In fact, ‘things’ are usually very much like good ol’ embedded systems, and system engineers that know their way around real-time/embedded operating systems, JTAG debuggers, and so on, have always been at a premium.

Q2. Data management for the IoT: What are the main differences between data management in field-deployed devices and at aggregation points?

Steve Graves: Quite simply: scale. A field-deployed device (or a gateway to field-deployed devices that do not, themselves, have any data management need or capability) has to manage a modest amount of data. But an aggregation point (the cloud being the most obvious example) has to manage many times more data – possibly orders of magnitude more.
At the same time, I have to say that they might not be all that different. Some IoT systems are going to be closed, meaning the nature of the things making up the system is known, and these won’t require much scaling. For example, a building automation system for a small- to mid-size building would have perhaps 100s of sensors and 10s of gateways, and may (or may not) push data up to a central aggregation point. If there are just 10s of gateways, we can create a UI that connects to the database on each gateway where each database is one shard of a single logical database, and execute analytics against that logical database without any need of a central aggregation point. We can extend this hypothetical case to a campus of buildings, or to a landlord with many buildings in a metropolitan area, and then a central aggregation point makes sense.

But the database system would not necessarily be different, only the organization of the physical and logical databases.
The gateways of each building would stream to a database server in the cloud. In the case of 10 buildings, we could have 10 database servers in the cloud that represent 10 shards of that logical database in the cloud. This architecture allows for great scalability. The landlord acquires another building? Great, stand up another database server and the UI connects to 11 shards instead of 10. In this scenario, database servers are software, not hardware. For the numbers we’re talking about (10 or 11 buildings), it could easily be handled by a single hardware server of modest ability.

At the other end of the scale (pun intended) are IoT systems that are wide open. By that, I mean the creators are not able to anticipate the universe of “things” that could be connected, or their quantity. In the first case, the database system should be able to ingest data that was heretofore unknown. This argues for a NoSQL database system, i.e. a database system that is schema-less. In this scenario, the database system on field-deployed devices is probably radically different from the database system in the cloud. Field-deployed devices are purpose-specific, so A) they don’t need and wouldn’t benefit from a NoSQL database system, and B) most NoSQL database systems are too resource-hungry to reside on embedded device nodes.

Q3. If we look at the characteristics of a database system for managing device-based data in the IoT, how do they differ from the characteristics of a database system (typically deployed on a server) for analyzing the “big data” generated by myriad devices?

Steve Graves: Again, let’s recognize that field-deployed devices in the IoT are classic embedded systems. In practical terms, that means relatively modest hardware like an ARM, MIPS, PowerPC or Atom processor running at 100s of megahertz, or perhaps 1 ghz if we’re lucky, and with only enough memory to perform its function. Further, it may require a real-time operating system, or at least an embedded operating system that is less resource hungry than a full-on Linux distro. So, for a database system to run in this environment, it will need to have been designed to run in this environment. It isn’t practical to try to shoehorn in a database system that was written on the assumption that CPU cycles and memory are abundant. It may also be the case that the device has little-to-no persistent storage, which mandates an in-memory database.

So a database system for a field-deployed device is going to:
1. have a small code size
2. use little stack
3. preferably, allocate no heap memory
4. have no, or minimal, external dependencies (e.g. not link in an extra 1 MB of code from the C run-time library)
5. have a built-in ability to replicate data (to a gateway or directly to the cloud), where replication should be “open”, meaning able to replicate to a different database system
6. have built-in security features
7. and, nice to have:
a. built-in analytics to aggregate data prior to replicating it
b. the ability to define the schema
c. the ability to operate entirely in memory

A database system for the cloud might benefit from being schema-less, as described previously. It should certainly have pretty elastic scalability. Servers in the cloud are going to have ample resources and robust operating systems. So a database system for the cloud doesn’t need to have a small code size, use a small amount of stack memory, or worry about external dependencies such as the C run-time library. On the contrary, a database system for the cloud is expected to do much more (handle data at scale, execute analytics, etc.) and will, therefore, need ample resources. In fact, this database system should be able to take maximum advantage of the resources available, including being able to scale horizontally (across cores, CPUs, and servers).
In summary, the edge (device-based) DBMS needs to operate in a constrained environment. A cloud DBMS needs to be able to effectively and efficiently utilize the ample resources available to it.

Q4. Why is the ability to define a database schema important (versus a schema-less DBMS, aka NoSQL) for field-deployed devices?

Steve Graves: Field-deployed devices will normally perform a few specific functions (sometimes just one). For example, a building automation system manages HVAC, lighting, etc. A livestock management system manages feed, output, and so on. In such systems, the data requirements are well known, so the hallmark NoSQL advantage of being able to store data without predefining its structure is unneeded. The other purported hallmark of NoSQL is horizontal scalability, but this is not a need for field-deployed devices.
Walking away from the relational database model (and its implicit use of a database schema) has serious implications.
A great deal of scientific knowledge has been amassed around the relational database model over the last few decades, and without it developers are completely on their own with respect to enforcing sound data management practices.

In the NoSQL sphere, there is nothing comparable to the relational model (e.g. E.F. Codd’s work) and the mathematical foundation (relational calculus) underpinning it.
There should be overwhelming justification for a decision to not use relational.
In my experience, that justification is absent for data management of field-deployed devices.
A database system that “knows” the data design (via a schema) can more intelligently manage the data. For example, it can manage constraints, domain dependencies, events and much more, as the sketch below illustrates. And some of the purported inflexibility imposed by a schema can be eliminated if the DBMS supports dynamic DDL (see more details on this in the answer to question Q6, below).
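As a simple illustration of this point (generic SQL with hypothetical tables, not eXtremeDB's actual DDL), a schema lets the DBMS itself, rather than application code, enforce domains and referential integrity:

SQL:
-- Domain constraints and referential integrity are enforced on every insert.
CREATE TABLE sensor (
    sensor_id INTEGER PRIMARY KEY,
    location  VARCHAR(64) NOT NULL
);
CREATE TABLE reading (
    sensor_id INTEGER NOT NULL REFERENCES sensor(sensor_id),
    ts        TIMESTAMP NOT NULL,
    temp_c    DECIMAL(5,2) CHECK (temp_c BETWEEN -40 AND 85)
);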

Q5. In your opinion, do IoT aggregation points resemble data lakes?

Steve Graves: The term data lake was originally conceived in the context of Hadoop and map-reduce functionality. In more recent times, the meaning of the term has morphed to become synonymous with big data, and that is how I use the term. Insofar as a gateway can also be an aggregation point, I would not say ‘aggregation points resemble data lakes’ because gateway aggregation points, in all likelihood, will not manage Big Data.

Q6. What are the main technical challenges for database systems used to accommodate new and unforeseen data, for example when a new type of device begins streaming data?

Steve Graves: The obvious challenges are
1. The ability to ingest new data that has a previously unknown structure
2. The ability to execute analytics on #1
3. The ability to integrate analytics on #1 with analytics on previously known data

#1 is handled well by NoSQL DBMSs. But it might also be handled well by an RDBMS via “dynamic DDL” (dynamic data definition language), e.g. the ability to execute CREATE TABLE, ALTER TABLE, and/or CREATE INDEX statements against an existing database.
To efficiently execute analytics against any data, the structure of the data must eventually be understood.
RDBMSs handle this through the database dictionary (the binary equivalent of the data definition language).
But some NoSQL DBMSs handle this through other metadata. For example, the MarkLogic DBMS uses JSON metadata to understand the structure of documents in its document store.
NoSQL DBMSs with no metadata whatsoever put the entire burden on the developers. In other words, since the data is opaque to the DBMS, the application code must read and interpret the content. A sketch of the dynamic DDL approach follows.
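To make the dynamic DDL idea concrete, here is a sketch in generic SQL (hypothetical names; exact syntax and online-DDL behavior vary by RDBMS). Accommodating a new device type amounts to ordinary DDL statements executed against the live database:

SQL:
-- A new type of device begins streaming: create a table for its readings...
CREATE TABLE vibration_reading (
    device_id INTEGER NOT NULL,
    ts        TIMESTAMP NOT NULL,
    axis      CHAR(1),
    g_force   DOUBLE PRECISION
);
-- ...index it for the analytics that follow...
CREATE INDEX ix_vibration_device_ts ON vibration_reading (device_id, ts);
-- ...and later extend the table when a firmware update adds a field.
ALTER TABLE vibration_reading ADD COLUMN temperature_c DOUBLE PRECISION;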

Q7. Client/server DBMS architecture vs. in-process DBMSs: which one is more suitable for IoT?

Steve Graves: For edge DBMSs (on constrained devices), an in-process architecture will be more suitable. It requires fewer resources than a client/server architecture, and imposes less latency through the elimination of inter-process communication. For cloud DBMSs, a client/server architecture will be more suitable. In the cloud environment, resources are not scarce, and the advantage of being able to scale horizontally will outweigh the added latency associated with client/server.

Qx Anything else you wish to add?

Steve Graves: We feel that eXtremeDB is uniquely positioned for the Internet of Things. Not only have devices and gateways been in eXtremeDB’s wheelhouse for 15 years with over 25 million real world deployments, but the scalability, time series data management, and analytics built into the eXtremeDB server (big data) offering make it an attractive cloud database solution as well. Being able to leverage a single DBMS across devices, gateways and the cloud has obvious synergistic advantages.

———————
Steve Graves is co-founder and CEO of McObject, a company specializing in embedded Database Management System (DBMS) software. Prior to McObject, Steve was president and chairman of Centura Solutions Corporation and vice president of worldwide consulting for Centura Software Corporation.

Resources

Big Data, Analytics, and the Internet of Things, by Mohak Shah, analytics leader and research scientist at Bosch Research, USA. ODBMS.org, APRIL 6, 2015

 Privacy considerations & responsibilities in the era of Big Data & Internet of Things, by Ramkumar Ravichandran, Director, Analytics, Visa Inc. ODBMS.org January 8, 2015.

Securing Your Largest USB-Connected Device: Your Car, by Shomit Ghose, General Partner, ONSET Ventures. ODBMS.org, MARCH 31, 2016.

eXtremeDB Financial Edition DBMS Sweeps Records in Big Data Benchmark. ODBMS.org, JULY 2, 2016

 eXtremeDB in-memory database

 User Experience Design for the Internet of Things

Related Posts

On the Internet of Things. Interview with Colin Mahony. ODBMS Industry Watch, Published on 2016-03-14

A Grand Tour of Big Data. Interview with Alan Morrison. ODBMS Industry Watch, Published on 2016-02-25

On the Industrial Internet of Things. Interview with Leon Guzenda. ODBMS Industry Watch, January 28, 2016

Follow us on Twitter: @odbmsorg

##

On Data Interoperability. Interview with Julie Lockner. http://www.odbms.org/blog/2016/06/on-data-interoperability-interview-with-julie-lockner/ http://www.odbms.org/blog/2016/06/on-data-interoperability-interview-with-julie-lockner/#comments Tue, 07 Jun 2016 16:47:14 +0000 http://www.odbms.org/blog/?p=4151

“From a healthcare perspective, how can we aggregate all the medical data, in all forms from multiple sources, such as wearables, home medical devices, MRI images, pharmacies and so on, and also blend in intelligence or new data sources, such as genomic data, so that doctors can make better decisions at the point of care?”– Julie Lockner.

I have interviewed Julie Lockner. Julie leads data platform product marketing for InterSystems. Main topics of the interview are Data Interoperability and InterSystems' data platform strategy.

RVZ

Q1. Everybody is talking about Big Data — is the term obsolete?

Julie Lockner: Well, there is no doubt that the sheer volume of data is exploding, especially with the proliferation of smart devices and the Internet of Things (IoT). An overlooked aspect of IoT is the enormous volume of data generated by a variety of devices, and how to connect, integrate and manage it all.

The real challenge, though, is not just processing all that data, but extracting useful insights from the variety of device types. Put another way, not all data is created using a common standard. You want to know how to interpret data from each device, know which data from what type of device is important, and which trends are noteworthy. Better information can create better results when it can be aggregated and analyzed consistently, and that’s what we really care about. Better, higher quality outcomes, not bigger data.

Q2. If not Big Data, where do we go from here?

Julie Lockner: We always want to be focusing on helping our customers build smarter applications to solve real business challenges, such as helping them to better compete on service, roll out high-quality products quicker, simplify processes – not build solutions in search of a problem. A canonical example is in retail. Our customers want to leverage insight from every transaction they process to create a better buying experience online or at the point of sale. This means being able to aggregate information about a customer, analyze what the customer is doing while on the website, and make an offer at transaction time that would delight them. That’s the goal – a better experience – because that is what online consumers expect.

From a healthcare perspective, how can we aggregate all the medical data, in all forms from multiple sources, such as wearables, home medical devices, MRI images, pharmacies and so on, and also blend in intelligence or new data sources, such as genomic data, so that doctors can make better decisions at the point of care? That implies we are analyzing not just more data, but better data that comes in all shapes and sizes, and that changes more frequently. It really points to the need for data interoperability.

Q3. What are the challenges software developers are telling you they have in today’s data-intensive world?

Julie Lockner: That they have too many database technologies to choose from and prefer to have a simple data platform architecture that can support multiple data models and multiple workloads within a single development environment.
We understand that our customers need to build applications that can handle a vast increase in data volume, but also a vast array of data types – static, non-static, local, remote, structured and non-structured. It must be a platform that coalesces all these things, brings services to data, offers a range of data models, and deals with data at any volume to create a more stable, long-term foundation. They want all of these capabilities in one platform – not a platform for each data type.

For software developers today, it's not enough to pick elements that solve some aspect of a problem and build enterprise solutions around them; not all components scale equally. You need a common platform that doesn't sacrifice scalability, security, resilience, or rapid response. Meeting all these demands with the right data platform will create a successful application.
And the development experience is significantly improved and productivity drastically increased when they can use a single platform that meets all these needs. This is why they work with InterSystems.

Q4. Traditionally, analytics is used with structured data, “slicing and dicing” numbers. But the traditional approach also involves creating and maintaining a data warehouse, which can only provide a historical view of data. Does this work also in the new world of Internet of Things?

Julie Lockner: I don’t think so. It is generally possible to take amorphous data and build it into a structured data model, but to respond effectively to rapidly changing events, you need to be able to take data in the form in which it comes to you.

If your data lacks certain fields, or arrives without a schema definition, you need to be able to capitalize on all these forms without generating a static model or going through a refinement process. With a data warehouse approach, it can take days or weeks to create fully cleansed, normalized data.
That's just not fast enough in today's always-on world – especially as machine-generated data is not going to conform to a common format any time soon. It comes back to the need for a data platform that supports interoperability.

Q5. How hard is it to make decisions based on real-time analysis of structured and unstructured data?

Julie Lockner: It doesn’t have to be hard. You need to generate rules that feed rules engines that, in turn, drive decisions, and then constantly update those rules. That is a radical enhancement of the concept of analytics in the service of improving outcomes, as more real-time feedback loops become available.

The collection of changes we describe as Big Data will profoundly transform enterprise applications of the future. Today we can see the potential to drive business in new ways and take advantage of a convergence of trends, but it is not happening yet. Where progress has been made is the intelligence of devices and first-level data aggregation, but not in the area of services that are needed. We’re not there yet.

Q6. What’s next on the horizon for InterSystems in meeting the data platform requirements of this new world?

Julie Lockner: We continually work on our data platform, developing the most innovative ways we can think of to integrate with new technologies and new modes of thinking. Interoperability is a hugely important component. It may seem a simple task to get to the single most pertinent fact, but the means to get there may be quite complex. You need to be able to make the right data available – easily – to construct the right questions.

Data is in all forms and at varying levels of completeness, cleanliness, and accuracy. For data to be consumed as we describe, you need measures of how well you can use it. You need to curate data so it gets cleansed and you can cull what is important. You need flexibility in how you view data, too. Gathering data without imposing an orthodoxy or structure allows you to gain access to more data. Not all data will conform to a schema a priori.

Q7. Recently you conducted a benchmark test of an application based on InterSystems Caché®. Could you please summarize the main results you have obtained?

Julie Lockner: One of our largest customers is Epic Systems, one of the world’s top healthcare software companies.
Epic relies on Caché as the data platform for electronic medical record solutions serving more than half the U.S. patient population and millions of patients worldwide.

Epic tested the scalability and performance improvements of Caché version 2015.1. Almost doubling the scalability of prior versions, Caché delivers what Epic President Carl Dvorak has described as “a key strategic advantage for our user organizations that are pursuing large-scale medical informatics programs as well as aggressive growth strategies in preparation for the volume-to-value transformation in healthcare.”

Qx Anything else you wish to add?

Julie Lockner: The reason why InterSystems has succeeded in the market for so many years is a commitment to the success of those who depend on our technology. A recent Gartner Magic Quadrant report found we had the highest number of customers surveyed – 85% – who would buy from us again. That is the highest number of any vendor participating in that study.

The foundation of the company’s culture is all about helping our customers succeed. When our customers come to us with a challenge, we all pitch in to solve it. Many times our solutions may address an unusual problem that could benefit others – which then becomes the source of many of our innovations. It is one of the ways we are using problem-solving skills as a winning strategy to benefit others. When our customers are successful at using our engine to solve the world’s most important challenges, we all win.

——————-

Julie Lockner leads data platform product marketing for InterSystems. She has more than 20 years of experience in IT product marketing management and technology strategy, including roles at analyst firm ESG as well as Informatica and EMC.

—————–

Resources

“InterSystems Unveils Major New Release of Caché,” Feb. 25, 2015.

“Gartner Magic Quadrant for Operational DBMS,” Donald Feinberg, Merv Adrian, Nick Heudecker, Adam M. Ronthal, and Terilyn Palanca, October 12, 2015, ID: G00271405.

– White Paper: Big Data Healthcare: Data Scalability with InterSystems Caché® and Intel® Processors (LINK to .PDF)

Related Posts

– A Grand Tour of Big Data. Interview with Alan Morrison. ODBMS Industry Watch, February 25, 2016

– RIP Big Data. By Carl Olofson, Research Vice President, Data Management Software Research, IDC. ODBMS.org, JANUARY 6, 2016.

– What is data blending. By Oleg Roderick, David Sanchez, Geisinger Data Science. ODBMS.org, November 2015

Follow us on Twitter: @odbmsorg

##

On Data Analytics and the Enterprise. Interview with Narendra Mulani. http://www.odbms.org/blog/2016/05/on-data-analytics-and-the-enterprise-interview-with-narendra-mulani/ http://www.odbms.org/blog/2016/05/on-data-analytics-and-the-enterprise-interview-with-narendra-mulani/#comments Tue, 24 May 2016 16:31:20 +0000 http://www.odbms.org/blog/?p=4144

“A hybrid technology infrastructure that combines existing analytics architecture with new big data technologies can help companies to achieve superior outcomes.”–Narendra Mulani

I have interviewed Narendra Mulani, Chief Analytics Officer, Accenture Analytics. Main topics of our interview are: Data Analytics, Big Data, the Internet of Things, and their repercussions for the enterprise.

RVZ

Q1. What is your role at Accenture?

Narendra Mulani: I’m the Chief Analytics Officer at Accenture Analytics and I am responsible for building and inspiring a culture of analytics and driving Accenture’s strategic agenda for growth across the business. I lead a team of analytics professionals around the globe that are dedicated to helping clients transform into insight-driven enterprises and focused on creating value through innovative solutions that combine industry and functional knowledge with analytics and technology.

With the constantly increasing amount of data and new technologies becoming available, it truly is an exciting time for Accenture and our clients alike. I'm thrilled to be collaborating with my team and clients and taking part, first-hand, in the power of analytics and the positive disruption it is creating for businesses around the globe.

Q2. What are the main drivers you see in the market for Big Data Analytics?

Narendra Mulani: Companies across industries are fighting to secure or keep their lead in the marketplace.
To excel in this competitive environment, they are looking to exploit one of their growing assets: Data.
Organizations see big data as a catalyst for their transformation into digital enterprises and as a way to secure an insight-driven competitive advantage. In particular, big data technologies are giving companies greater agility, helping them analyze data comprehensively and take more informed actions at a swifter pace. We’ve already passed the transition point with big data – instead of discussing the possibilities, many are already experiencing the actual insight-driven benefits from it, including increased revenues, a larger base of loyal customers, and more efficient operations. In fact, we see our clients looking for granular solutions that leverage big data, advanced analytics and the cloud to address industry-specific problems.

Q3. Analytics and Mobility: how do they correlate?

Narendra Mulani: Analytics and mobility are two digital areas that work hand-in-hand on many levels.
As an example, mobile devices and the increasingly connected world through the Internet of Things (IoT) have become two key drivers for big data analytics. As mobile devices, sensors, and the IoT are constantly creating new data sources and data types, big data analytics is being applied to transform the increasing amount of data into important and actionable insight that can create new business opportunities and outcomes. This view can also be reversed, where analytics feeds insight to mobile devices, such as tablets used by workers in offices or out in the field, enabling them to make real-time decisions that could benefit their business.

Q4. Data explosion: What does it create ? Risks, Value or both?

Narendra Mulani: The data explosion that’s happening today and will continue to happen due to the Internet of Things creates a lot of opportunity for businesses. While organizations recognize the value that the data can generate, the sheer amount of data – internal data, external data, big data, small data, etc – can be overwhelming and create an obstacle for analytics adoption, project completion, and innovation. To overcome this challenge and pursue actionable insights and outcomes, organizations shouldn’t look to analyze all of the data that’s available, but identify the right data needed to solve the current project or challenge at hand to create value.

It’s also important for companies to manage the potential risk associated with the influx of data and take the steps needed to optimize and protect it. They can do this by aligning IT and business leads to jointly develop and maintain data governance and security strategies. At a high level, the strategies would govern who uses the data and how the data is analyzed and leveraged, define the technologies that would manage and analyze the data, and ensure the data is secured with the necessary standards. Suitable governance and security strategies should be requirements for insight-driven businesses. Without them, organizations could experience adverse and counter-productive results.

Q5. You introduced the concept of the “Modern Data Supply Chain”? How does it differ from the traditional Supply Chain?

Narendra Mulani: As companies’ data ecosystems are usually very complex with many data silos, a modern data supply chain helps them to simplify their data environment and generate the most value from their data. In brief, when data is treated as a supply chain, it can flow swiftly, easily and usefully through the entire organization— and also through its ecosystem of partners, including customers and suppliers.

To establish an effective modern data supply chain, companies should create a hybrid technology environment that enables a data service platform with emerging big data technologies. As a result, businesses will be able to access, manage, move, mobilize and interact with broader and deeper data sets across the organization at a much quicker pace than previously possible and place action on the attained analytics insights that could help it to more effectively deliver to its consumers, develop new innovative solutions, and differentiate in its market.

Q6. You talked about “Retooling the Enterprise”. What do you mean by this?

Narendra Mulani: Some businesses today are no longer just using analytics, they are taking the next step by transforming into insight-driven enterprises. To achieve “insight-driven enterprise” status, organizations need to retool themselves for optimization. They can pursue an insight-driven transformation by:

· Establishing a center of gravity for analytics – a center of gravity for analytics often takes the shape of a Center of Excellence or a similar concentration of talent and resources.
· Employing agile governance – build horizontal governance structures that are focused on outcomes and speed to value, and take a “test and learn” approach to rolling out new capabilities. A secure governance foundation could also improve the democratization of data throughout a business.
· Creating an inter-disciplinary high performing analytics team — field teams with diverse skills, organize talent effectively, and create innovative programs to keep the best talent engaged.
· Deploying new capabilities faster – deploy new, modern and agile technologies, as well as hybrid architectures and specifically designed toolsets, to help revolutionize how data has been traditionally managed, curated and consumed, to achieve speed to capability and desired outcomes. When appropriate, cloud technologies should be integrated into the IT mix to benefit from cloud-based usage models.
· Raising the company’s analytics IQ – have a vision of what would be your “intelligent enterprise” and implement an Analytics Academy that provides analytics training for functional business resources in addition to the core management training programs.

Q7. What are the risks from the Internet of Things? And how is it possible to handle such risks?

Narendra Mulani: The IoT is prompting an even greater focus on data security and privacy. As a company’s machines, employees and ecosystems of partners, providers, and customers become connected through the IoT, securing the data that is flowing across the IoT grid can be increasingly complex. Today’s sophisticated cyber attackers are also amplifying this complexity as they are constantly evolving and leveraging data technology to challenge a company’s security efforts.

To establish a strong, effective real-time cyber defense strategy, security teams will need to employ innovative technologies to identify threat behavioral patterns – including artificial intelligence, automation, visualization, and big data analytics – and an agile and fluid workforce to leverage the opportunities presented by technology innovations. They should also establish policies to address privacy issues that arise from all the personal data being collected. Through this combination of efforts, companies will be able to strengthen their approach to cyber defense in today’s highly connected IoT world and empower cyber defenders to help their companies better anticipate and respond to cyber attacks.

Q8. What are the main lessons you have learned in implementing Big Data Analytic projects?

Narendra Mulani: Organizations should explore the entire big data technology ecosystem, take an outcome-focused approach to addressing specific business problems, and establish precise success metrics before an analytics project even begins. The big data landscape is in a constant state of change with new data sources and emerging big data technologies appearing every day that could offer a company a new value-generating opportunity. A hybrid technology infrastructure that combines existing analytics architecture with new big data technologies can help companies to achieve superior outcomes.
An outcome-focused strategy that embraces analytics experimentation and explores the possible data and technology that can help a company meet its goals and has checkpoints for measuring performance will be very valuable, as this strategy will help the analytics team to know if they should continue on course or need to make a course correction to attain the desired outcome.

Q9. Is Data Analytics only good for businesses? What about using (Big) Data for Societal issues?

Narendra Mulani: Analytics is helping businesses across industries and governments as well to make more informed decisions for effective outcomes, whether it might be to improve customer experience, healthcare or public safety.
As an example, we’re working with a utility company in the UK to help them leverage analytics insights to anticipate equipment failures and respond in near real-time to critical situations, such as leaks or adverse weather events. We are also working with a government agency to analyze its video monitoring feeds to identify potential public safety risks.

Qx Anything else you wish to add?

Narendra Mulani: Another area that’s on the rise is Artificial Intelligence – we define it as a collection of multiple technologies that enable machines to sense, comprehend, act and learn, either on their own or to augment human activities. The new technologies include machine learning, deep learning, natural language processing, video analytics and more. AI is disrupting how businesses operate and compete and we believe it will also fundamentally transform and improve how we work and live. When an organization is pursuing an AI project, it’s our belief that it should be business-oriented, people-focused, and technology rich for it to be most effective.

———

As Chief Analytics Officer and Head Geek – Accenture Analytics, Narendra Mulani is responsible for creating a culture of analytics and driving Accenture’s strategic agenda for growth across the business. He leads a dedicated team of 17,000 Analytic professionals that serve clients around the globe, focusing on value creation through innovative solutions that combine industry and functional knowledge with analytics and technology.

Narendra has held a number of leadership roles within Accenture since joining in 1997. Most recently, he was the managing director – Products North America, where he was responsible for creating value for our clients across a number of industries. Prior to that, he was managing director – Supply Chain, Accenture Management Consulting, leading a global practice responsible for defining and implementing supply chain capabilities at a diverse set of Fortune 500 clients.

Narendra graduated from Bombay University in 1978 with a Bachelor of Commerce, and received an MBA in Finance in 1982 as well as a PhD in 1985 focused on Multivariate Statistics, both from the University of Massachusetts.

Outside of work, Narendra is involved with various activities that support education and the arts. He lives in Connecticut with his wife Nita and two children, Ravi and Nikhil.

———-

Resources

– Ducati is Analytics Driven. Analytics takes Ducati around the world at speed and precision.

Accenture Analytics. Launching an insights-driven transformation.  Download the point of view on analytics operating models to better understand how high performing companies are organizing their capabilities.

– Accenture Cyber Intelligence Platform. Analytics helping organizations to continuously predict, detect and combat cyber attacks.

–  Data Acceleration: Architecture for the Modern Data Supply Chain, Accenture

Related Posts

On Big Data and Data Science. Interview with James Kobielus. Source: ODBMS Industry Watch, 2016-04-19

On the Internet of Things. Interview with Colin Mahony. Source: ODBMS Industry Watch, 2016-03-14

A Grand Tour of Big Data. Interview with Alan Morrison. Source: ODBMS Industry Watch, 2016-02-25

On the Industrial Internet of Things. Interview with Leon Guzenda. Source: ODBMS Industry Watch, 2016-01-28

On Artificial Intelligence and Society. Interview with Oren Etzioni. Source: ODBMS Industry Watch, 2016-01-15


Follow us on Twitter: @odbmsorg

##

On the Internet of Things. Interview with Colin Mahony http://www.odbms.org/blog/2016/03/on-the-internet-of-things-interview-with-colin-mahony/ http://www.odbms.org/blog/2016/03/on-the-internet-of-things-interview-with-colin-mahony/#comments Mon, 14 Mar 2016 08:45:56 +0000 http://www.odbms.org/blog/?p=4101

“Frankly, manufacturers are terrified to flood their data centers with these unprecedented volumes of sensor and network data.”– Colin Mahony

I have interviewed Colin Mahony, SVP & General Manager, HPE Big Data Platform. Topics of the interview are: The challenges of the Internet of Things, the opportunities for Data Analytics, the positioning of HPE Vertica and HPE Cloud Strategy.

RVZ

Q1. Gartner says 6.4 billion connected “things” will be in use in 2016, up 30 percent from 2015.  How do you see the global Internet of Things (IoT) market developing in the next years?

Colin Mahony: As manufacturers connect more of their “things,” they have an increased need for analytics to derive insight from massive volumes of sensor or machine data. I see these manufacturers, particularly manufacturers of commodity equipment, needing to offer more value-added services based on their ability to deliver higher levels of service and overall customer satisfaction. Data analytics platforms are key to making that happen. Also, we could see entirely new analytical applications emerge, driven by what consumers want to know about their devices, combining that data with, say, their exercise regimens, health vitals, social activities, and even driving behavior, for full personal insight.
Ultimately, the Internet of Things will drive a need for the Analyzer of Things, and that is our mission.

Q2. What Challenges and Opportunities bring the Internet of Things (IoT)? 

Colin Mahony: Frankly, manufacturers are terrified to flood their data centers with these unprecedented volumes of sensor and network data. The reason? Traditional data warehouses were designed well before the Internet of Things, or, at least before OT (operational technology) like medical devices, industrial equipment, cars, and more were connected to the Internet. So, having an analytical platform to provide the scale and performance required to handle these volumes is important, but customers are taking more of a two- or three-tier approach that involves some sort of analytical processing at the edge before data is sent to an analytical data store. Apache Kafka is also becoming an important tier in this architecture, serving as a message bus, to collect and push that data from the edge in streams to the appropriate database, CRM system, or analytical platform for, as an example, correlation of fault data over months or even years to predict and prevent part failure and optimize inventory levels.

Q3. Big Data: In your opinion, what are the current main demands/needs in the market?

Colin Mahony: All organizations want – and need – to become data-driven organizations. I mean, who wants to make such critical decisions based on half answers and anecdotal data? That said, traditional companies with data stores and systems going back 30-40 years don’t have the same level playing field as the next market disruptor that just received their series B funding and only knows that analytics is the life blood of their business and all their critical decisions.
The good news is that whether you are a 100-year old insurance company or the next Uber or Facebook, you can become a data-driven organization by taking an open platform approach that uses the best tool for the job and can incorporate emerging technologies like Kafka and Spark without having to bolt on or buy all of that technology from a single vendor and get locked in.  Understanding the difference between an open platform with a rich ecosystem and open source software as one very important part of that ecosystem has been a differentiator for our customers.

Beyond technology, we have customers that establish analytical centers of excellence that actually work with the data consumers – often business analysts – that run ad-hoc queries using their preferred data visualization tool to get the insight need for their business unit or department. If the data analysts struggle, then this center of excellence, which happens to report up through IT, collaborates with them to understand and help them get to the analytical insight – rather than simply halting the queries with no guidance on how to improve.

Q4. How do you embed analytics and why is it useful? 

Colin Mahony: OEM software vendors, particularly, see the value of embedding analytics in their commercial software products or software as a service (SaaS) offerings.  They profit by creating analytic data management features or entirely new applications that put customers on a faster path to better, data-driven decision making. Offering such analytics capabilities enables them to not only keep a larger share of their customer’s budget, but at the same time greatly improve customer satisfaction. To offer such capabilities, many embedded software providers are attempting unorthodox fixes with row-oriented OLTP databases, document stores, and Hadoop variations that were never designed for heavy analytic workloads at the volume, velocity, and variety of today’s enterprise. Alternatively, some companies are attempting to build their own big data management systems. But such custom database solutions can take thousands of hours of research and development, require specialized support and training, and may not be as adaptable to continuous enhancement as a pure-play analytics platform. Both approaches are costly and often outside the core competency of businesses that are looking to bring solutions to market quickly.

Because it’s specifically designed for analytic workloads, HPE Vertica is quite different from other commercial alternatives. Vertica differs from OLTP DBMS and proprietary appliances (which typically embed row-store DBMSs) by grouping data together on disk by column rather than by row (that is, so that the next piece of data read off disk is the next attribute in a column, not the next attribute in a row). This enables Vertica to read only the columns referenced by the query, instead of scanning the whole table as row-oriented databases must do. This speeds up query processing dramatically by reducing disk I/O.

You’ll find Vertica as the core analytical engine behind some popular products, including Lancope, Empirix, Good Data, and others as well as many HPE offerings like HPE Operations Analytics, HPE Application Defender, and HPE App Pulse Mobile, and more.

Q5. How do you make a decision when it is more appropriate to “consume and deploy” Big Data on premise, in the cloud, on demand and on Hadoop?

Colin Mahony: The best part is that you don’t need to choose with HPE. Unlike most emerging data warehouses as a service where your data is trapped in their databases when your priorities or IT policies change, HPE offers the most complete range of deployment and consumption models. If you want to spin up your analytical initiative on the cloud for a proof-of-concept or during the holiday shopping season for e-retailers, you can do that easily with HPE Vertica OnDemand.
If your organization finds that due to security or confidentiality or privacy concerns you need to bring your analytical initiative back in house, then you can use HPE Vertica Enterprise on-premises without losing any customizations or disruption to your business. Have petabyte volumes of largely unstructured data where the value is unknown? Use HPE Vertica for SQL on Hadoop, deployed natively on your Hadoop cluster, regardless of the distribution you have chosen. Each consumption model, available in the cloud, on-premise, on-demand, or using reference architectures for HPE servers, is available to you with that same trusted underlying core.

Q6. What are the new class of infrastructures called “composable”? Are they relevant for Big Data?

Colin Mahony: HPE believes that a new architecture is needed for Big Data – one that is designed to power innovation and value creation for the new breed of applications while running traditional workloads more efficiently.
We call this new architectural approach Composable Infrastructure. HPE has a well-established track record of infrastructure innovation and success. HPE Converged Infrastructure, software-defined management, and hyper-converged systems have consistently proven to reduce costs and increase operational efficiency by eliminating silos and freeing available compute, storage, and networking resources. Building on our converged infrastructure knowledge and experience, we have designed a new architecture that can meet the growing demands for a faster, more open, and continuous infrastructure.

Q7. What is HPE's cloud strategy?

Colin Mahony: Hybrid cloud adoption is continuing to grow at a rapid rate and a majority of our customers recognize that they simply can’t achieve the full measure of their business goals by consuming only one kind of cloud.
HPE Helion offers not only private cloud deployments and managed private cloud services; we have also created the HPE Helion Network, a global ecosystem of service providers, ISVs, and VARs dedicated to delivering open, standards-based hybrid cloud services to enterprise customers. Through this ecosystem, our customers gain access to an expanded set of cloud services and improve their ability to meet country-specific data regulations.

In addition to the private cloud offerings, we have a strategic and close alliance with Microsoft Azure, which makes many of our offerings, including Haven OnDemand, available in the public cloud. We also work closely with Amazon because our strategy is not to limit our customers but to ensure that they have the choices they need and the services and support they can depend upon.

Q8. What are the advantages of an offering like Vertica in this space?

Colin Mahony: More and more companies are exploring the possibility of moving their data analytics operations to the cloud. We offer HPE Vertica OnDemand, our data warehouse as a service, for organizations that need high-performance, enterprise-class data analytics for all of their data to make better business decisions now. Designed to drastically improve query performance over traditional relational database systems, HPE Vertica OnDemand is engineered from the same technology that powers the HPE Vertica Analytics Platform. For organizations that want to select Amazon hardware and still maintain control over the installation, configuration, and overall maintenance of Vertica for ultimate performance and control, we offer the Vertica AMI (Amazon Machine Image). The Vertica AMI is a bring-your-own-license model that is ideal for organizations that want the same experience as an on-premises installation, only without procuring and setting up hardware. Regardless of which deployment model you choose, we have you covered with "on demand" or "enterprise cloud" options.
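
As a concrete illustration of the point that the engine is the same across deployment models, here is a minimal Python sketch using the open-source vertica-python client (pip install vertica-python). The host, credentials, and sales table are placeholders, not real endpoints; the same code would target an OnDemand endpoint, an AMI instance, or an on-premises cluster by changing only the connection settings.

# Minimal sketch: querying Vertica with the open-source vertica-python
# client. Host, credentials, and the sales table are placeholders.
import vertica_python

conn_info = {
    "host": "vertica.example.com",  # OnDemand endpoint, AMI host, or on-prem node
    "port": 5433,                   # Vertica's default client port
    "user": "dbadmin",
    "password": "secret",
    "database": "analytics",
}

with vertica_python.connect(**conn_info) as connection:
    cursor = connection.cursor()
    # The SQL itself is unchanged across deployment models.
    cursor.execute(
        "SELECT region, SUM(amount) AS total_sales "
        "FROM sales GROUP BY region ORDER BY total_sales DESC LIMIT 10"
    )
    for region, total_sales in cursor.fetchall():
        print(region, total_sales)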

Q9. What is HPE Vertica Community Edition?

Colin Mahony: We have had tens of thousands of downloads of the HPE Vertica Community Edition, a freemium edition of HPE Vertica with all of the core features and functionality of our enterprise offering. It is completely free for up to 1 TB of data storage across three nodes. Companies of all sizes use the Community Edition to download, install, set up, and configure Vertica very quickly on x86 hardware, or use our Amazon Machine Image (AMI) for a bring-your-own-license approach in the cloud.
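
For keeping an eye on the Community Edition's 1 TB allowance, the sketch below reuses the vertica-python connection from the previous example to call Vertica's license-reporting SQL functions. Treat the function names as assumptions to verify against the documentation for your Vertica release.

# Minimal sketch: checking license type and usage on a Community Edition
# cluster. Assumes the connection settings from the previous example;
# DISPLAY_LICENSE() and GET_COMPLIANCE_STATUS() are assumed to be
# available in your Vertica version (verify against your release docs).
import vertica_python

conn_info = {
    "host": "vertica.example.com",  # placeholder
    "port": 5433,
    "user": "dbadmin",
    "password": "secret",
    "database": "analytics",
}

with vertica_python.connect(**conn_info) as connection:
    cursor = connection.cursor()
    cursor.execute("SELECT DISPLAY_LICENSE()")        # license type and limits
    print(cursor.fetchone()[0])
    cursor.execute("SELECT GET_COMPLIANCE_STATUS()")  # usage vs. licensed limit
    print(cursor.fetchone()[0])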

Q10. Can you tell us how Kiva.org, a non-profit organization, uses on-demand cloud analytics to leverage the internet and a worldwide network of microfinance institutions to help fight poverty? 

Colin Mahony: HPE is a major supporter of Kiva.org, a non-profit organization with a mission to connect people through lending to alleviate poverty. Kiva.org uses the internet and a worldwide network of microfinance institutions to enable individuals to lend as little as $25 to help create opportunity around the world. When the opportunity arose to support Kiva.org with an analytical platform to further that cause, we jumped at it. Kiva.org relies on Vertica OnDemand to reduce capital costs, leverage the SaaS delivery model to adapt more quickly to changing business requirements, and work with more than a million lenders, as well as hundreds of field partners and volunteers, across the world. To watch a recorded webinar with HPE and Kiva.org, see here.

Qx Anything else you wish to add?

Colin Mahony: We appreciate the opportunity to share the features and benefits of HPE Vertica, as well as the bright market outlook for data-driven organizations. However, I always recommend that any organization struggling with how to get started on its analytics initiative meet with peers to learn best practices and avoid potential pitfalls. The best way to do that, in my opinion, is to join the more than 1,000 Big Data experts in Boston from August 29 to September 1 at the HPE Big Data Conference. Click here to learn more and join us for more than 40 technical deep-dive sessions.

————-

Colin Mahony, SVP & General Manager, HPE Big Data Platform

Colin Mahony leads the Hewlett Packard Enterprise Big Data Platform business group, which is responsible for the industry-leading Vertica Advanced Analytics portfolio, the IDOL Enterprise software that provides context and analysis of unstructured data, and Haven OnDemand, a platform that lets developers leverage APIs and on-demand services in their applications.
In 2011, Colin joined Hewlett Packard as part of the highly successful acquisition of Vertica and took on the role of VP and General Manager for HP Vertica, where he guided the business to remarkable annual growth and recognized industry leadership. Colin brings a unique combination of technical knowledge, market intelligence, customer relationships, and strategic partnerships to one of the fastest growing and most exciting segments of HP Software.

Prior to Vertica, Colin was a Vice President at Bessemer Venture Partners, focused primarily on investments in enterprise software, telecommunications, and digital media. He built a strong network and a reputation for assisting in the creation and ongoing operations of companies through his knowledge of technology, markets, and general management in both small startups and larger companies. Prior to Bessemer, Colin worked at Lazard Technology Partners in a similar investor capacity.

Prior to his venture capital experience, Colin was a Senior Analyst at the Yankee Group, serving as an industry analyst and consultant covering databases, BI, middleware, application servers, and ERP systems. He helped build the ERP and Internet Computing Strategies practice at Yankee in the late 1990s.

Colin earned an M.B.A. from Harvard Business School and a bachelor's degree in Economics with a minor in Computer Science from Georgetown University. He is an active volunteer with Big Brothers Big Sisters of Massachusetts Bay and the Joey Fund for Cystic Fibrosis.

Resources

What's in store for Big Data analytics in 2016, Steve Sarsfield, Hewlett Packard Enterprise, ODBMS.org, February 3, 2016

What’s New in Vertica 7.2?: Apache Kafka Integration!, HPE, last edited February 2, 2016

Gartner Says 6.4 Billion Connected “Things” Will Be in Use in 2016, Up 30 Percent From 2015, Press release, November 10, 2015

The Benefits of HP Vertica for SQL on Hadoop, HPE, July 13, 2015

Uplevel Big Data Analytics with Graph in Vertica – Part 5: Putting graph to work for your business, Walter Maguire, Chief Field Technologist, HP Big Data Group, ODBMS.org, November 2, 2015

HP Distributed R, ODBMS.org, February 19, 2015

Understanding ROS and WOS: A Hybrid Data Storage Model, HPE, October 7, 2015

Related Posts

On Big Data Analytics. Interview with Shilpa Lawande. Source: ODBMS Industry Watch, published on December 10, 2015

On HP Distributed R. Interview with Walter Maguire and Indrajit Roy. Source: ODBMS Industry Watch, published on April 9, 2015

Follow us on Twitter: @odbmsorg

##
