Big Data: Three questions to InterSystems.
“The absence of a schema has some flexibility advantages, although for querying the data, the absence of a schema presents some challenges to people accustomed to a classic RDBMS. “–Iran Hutchinson.
I start this new year with a new series of short interviews to leading vendors of Big Data technologies. I call them “Big Data: three questions to“. The first of such interviews is with Iran Hutchinson, Big Data Specialist at InterSystems.
Q1. What is your current “Big Data” products offering?
Iran Hutchinson: InterSystems has actually been in the Big Data business for some time, since 1978, long before anyone called it that. We currently offer an integrated database, integration and analytics platform based on InterSystems Caché®, our flagship product, to enable Big Data breakthroughs in a variety of industries.
Launched in 1997, Caché is an advanced object database that provides in-memory speed with persistence, and the ability to ingest huge volumes of transactional data at insanely high velocity. It is massively scalable, because of its very lean design. Its efficient multidimensional data structures require less disk space and provide faster SQL performance than relational databases. Caché also provides sophisticated analytics, enabling real-time queries against transactional data with minimal maintenance and hardware requirements.
InterSystems Ensemble® is our seamless platform for integrating and developing connected applications. Ensemble can be used as a central processing hub or even as backbone for nationwide networks. By integrating this connectivity with our high-performance Caché database, as well as with new technologies for analytics, high-availability, security, and mobile solutions, we can deliver a rock-solid and unified Big Data platform, not a patchwork of disparate solutions.
We also offer additional technologies built on our integrated platform, such as InterSystems HealthShare®, a health informatics platform that enables strategic interoperability and analytics for action. Our TrakCare unified health information system is likewise built upon this same integrated framework.
Q2. Who are your current customers and how do they typically use your products?
Iran Hutchinson: We continually update our technology to enable customers to better manage, ingest and analyze Big Data. Our clients are in healthcare, financial services, aerospace, utilities – industries that have extremely demanding requirements for performance and speed. For example, Caché is the world’s most widely used database in healthcare. Entire countries, such as Sweden and Scotland, run their national health systems on Caché, as well as top hospitals and health systems around the world. One client alone runs 15 percent of the world’s equity trades through InterSystems software, and all of the top 10 banks use our products.
It is also being used by the European Space Agency to map a billion stars – the largest data processing task in astronomy to date. (See The Gaia Mission One Year Later.)
Our configurable ACID (Atomicity Consistency Isolation Durability) capabilities and ECP-based approach enable us to handle these kinds of very large-scale, very high-performance, transactional Big Data applications.
Q3. What are the main new technical features you are currently working on and why?
Iran Hutchinson: There are several new paradigms we are focusing on, but let’s focus on analytics. Once you absorb all that Big Data, you want to run analytics. And that’s where the three V’s of Big Data – volume, velocity and variety – are critically important.
Let’s talk about the variety of data. Most popular Big Data analytics solutions start with the assumption of structured data – rows and columns – when the most interesting data is unstructured, or text-based data. A lot of our competitors still struggle with unstructured data, but we solved this problem with Caché in 1997, and we keep getting better at it. InterSystems Caché offers both vertical and horizontal scaling, enabling schema-less and schema-based (SQL) querying options for both structured and unstructured data.
As a result, our clients today are running analytics on all their data – and we mean real-time, operational data, not the data that is aggregated a week later or a month later for boardroom presentations.
A lot of development has been done in the area of schema-less data stores or so-called document stores, which are mainly key-value stores. The absence of a schema has some flexibility advantages, although for querying the data, the absence of a schema presents some challenges to people accustomed to a classic RDBMS. Some companies now offer SQL querying on schema-less data stores as an add-on or plugin. InterSystems Caché provides a high-performance key-value store with native SQL support.
The commonly available SQL-based solutions also require a predefinition of what the user is interested in. But if you don’t know the data, how do you know what’s interesting? Embedded within Caché is a unique and powerful text analysis technology, called iKnow, that analyzes unstructured data out of the box, without requiring any predefinition through ontologies or dictionaries. Whether it’s English, German, or French, iKnow can automatically identify concepts and understand their significance – and do that in real-time, at transaction speeds.
iKnow enables not only lightning-fast analysis of unstructured data, but also equally efficient Google-like keyword searching via SQL with a technology called iFind.
And because we married that iKnow technology with another real-time OLAP-type technology we call DeepSee, we make it possible to embed this analytic capability into your applications. You can extract complex concepts and build cubes on both structured AND unstructured data. We blend keyword search and concept discovery, so you can express a SQL query and pull out both concepts and keywords on unstructured data.
Much of our current development activity is focused on enhancing our iKnow technology for a more distributed environment.
This will allow people to upload a data set, structured and/or unstructured, and organize it in a flexible and dynamic way by just stepping through a brief series of graphical representation of the most relevant content in the data set. By selecting, in the graphs, the elements you want to use, you can immediately jump into the micro-context of these elements and their related structured and unstructured information objects. Alternately, you can further segment your data into subsets that fit the use you had in mind. In this second case, the set can be optimized by a number of classic NLP strategies such as similarity extension, typicality pattern parallelism, etc. The data can also be wrapped into existing cubes or into new ones, or fed into advanced predictive models.
So our goal is to offer our customers a stable solution that really uses both structured and unstructured data in a distributed and scalable way. We will demonstrate the results of our efforts in a live system at our next annual customer conference, Global Summit 2014.
We also have a software partner that has built a very exciting social media application, using our analytics technology. It’s called Social Knowledge, and it lets you monitor what people are saying on Twitter and Facebook – in real-time. Mind you, this is not keyword search, but concept analysis – a very big difference. So you can see if there’s a groundswell of consumer feedback on your new product, or your latest advertising campaign. Social Knowledge can give you that live feedback – so you can act on it right away.
In summary, today InterSystems provides SQL and DeepSee over our shared data architecture to do structured data analysis.
And for unstructured data, we offer iKnow semantic analysis technology and iFind, our iKnow-powered search mechanism, to enable information discovery in text. These features will be enabled for text analytics in future versions of our shared-nothing data architectures.
Follow ODBMS.org on Twitter: @odbmsorg