On Apache Ignite v1.0. Interview with Nikita Ivanov.
“Apache Ignite is an incubating Apache project, which provides a high-performance, distributed in-memory data management software layer between various data sources and applications.”–Nikita Ivanov
I have interviewed Nikita Ivanov, founder and CTO of GridGain Systems. Main topic of the interview is the new release of Apache Ignite.
Q1. In your opinion, what are the main differences between an In-Memory Database, an In-Memory Data Grid and an In-Memory Data Fabric?
Nikita Ivanov: The main difference between in-memory databases (IMDB) and in-memory data grids is that IMDBs support only SQL (or some proprietary NoSQL dialect) while most Data Grids (IMDG) support multiple ways to access and process data. In IMDB the only way to access and process data is SQL and SQL store-procedures, while IMDGs typically support at least the following paradigms: SQL, Key/Value, MapReduce, MPP, and MPI-based processing.
Compared with IMDGs, an In-Memory Data Fabric represents the latest generation of in-memory technologies, integrated into a single platform, which eliminates the need for point solutions such as IMDB’s or IMDGs. It is a software layer that sits between applications and data stores, and it allows for high-performance data access and processing across different types of data, such as SQL, NoSQL and Hadoop. All without any rip and replace of existing applications or databases.
Q2. How is it possible to accelerate Hadoop-based deployments with in-memory technology?
Nikita Ivanov: To accelerate Hadoop in a meaningful way one needs to find a way to accelerate two core technologies that define Hadoop: HDFS, a distributed file system where data is stored, and MapReduce, a framework that allows parallel processing of the data stored in HDFS.
At GridGain, we’ve developed a highly optimized in-memory file system that is 100% compatible with HDFS that allows to store data directly in DRAM of computers in a Hadoop cluster. We’ve also developed a specifically optimized YARN-based MapReduce implementation that takes full advantage of the data stored directly in DRAM instead of disks.
The combination of these two innovations allows GridGain to speed up any Hadoop payloads – including Pig, Hive, or hand-written MapReduce jobs in any language – up to 10x without any code change. GridGain provides the first Hadoop accelerator that provide a true plug-and-play acceleration to the existing Hadoop jobs.
Q3. Why did you decide to open source your product?
Nikita Ivanov: Even before October of last year, GridGain already had an open core model: We offered an in-memory data fabric under the Apache 2.0 license, and we also offered a commercial edition with a number of enterprise-grade features, such as enhanced security, data center replication, rolling updates, cross-language portability, and others.
The drivers for our decision to contribute our core open source code base to the Apache Software Foundation (ASF) were of course to ensure continued, broad adoption of in-memory technologies and the long-term viability of the code base. But equally importantly, we also want to build a thriving community that adopts and adapts this code base, and hence will be key in finding new use cases for in-memory computing.
Q4. What is Apache Ignite?
Nikita Ivanov: Apache Ignite (incubating) is an open source, distributed framework for a unified In-Memory Data Fabric, originally developed by GridGain Systems. Apache Ignite is an incubating Apache project, which provides a high-performance, distributed in-memory data management software layer between various data sources and applications. Its code is written mostly in Java and Scala with small amount of C++ code, and it will initially combine an in-memory data grid, in-memory compute grid and in-memory streaming processing in one framework.
Apache Ignite’s large scale, in-memory framework offers transactional and real-time analytics applications performance gains of 100-1,000 times faster throughput and/or lower latencies. It is also a key open source foundation to enable the emerging class of so-called hybrid transactional-analytical workloads.
Q5. What is special about v1.0?
Nikita Ivanov: In October of 2014, the GridGain In-Memory Data Fabric core code base was accepted by the Apache Software Foundation (ASF) into the Incubator program under the name “Apache Ignite”.
Since then, GridGain engineers as well as other contributors have been busy working on migrating the existing code base, documentation, and refactoring of the existing internal build, test & release processes to the “Apache Way”.
Version 1.0 represents the first release that meets these goals, and will include additional enhancements above and beyond the most recent open source In-Memory Data Fabric from GridGain. In fact, Apache Ignite has a large set of features, and one of its coolest new features is its ability to automatically integrate with different RDBMS systems, such as Oracle, MySql, Postgres, DB2, Microsoft SQL, etc. This feature automatically generates the application domain model based on the schema definition of the underlying database, and then loads the data.
Despite the breadth of its feature set, however, Ignite is actually very easy to use: For example, there are no custom installers. The product comes as one ZIP file, which is ready to go once you unzip it. And it has only 1 mandatory dependency – ignite-core.jar. All other dependencies, like integration with Spring for configuration, or with the H2 database for SQL, can be added to the process a la carte. Also, the project is fully mavenized, and is composed of over a dozen of maven artifacts that can be imported and used in any combination. Apache Ignite is based on standard Java APIs, and for distributed caches and data grid functionality Ignite implements the JCache (JSR 107) standard.
The new Apache Ignite v1.0 bits are available for download now from the Apache Ignite web site.
Q6. Who will be using the Apache Ignite In-Memory Data Fabric, and for what?
Nikita Ivanov: We expect developers and software architects of high-performance, hyper-scale on-premise and SaaS applications to take advantage of the following capabilities when building or performance-tuning their new or existing applications: compute grid, data grid, service grid, streaming, clustering, distributed data structures, distributed messaging, distributed events and in-memory file system.
Use cases can be found in software designed for financial services, telecommunications, retail, transportation, social media, online advertising, utilities, biosciences and many other industries.
Q7. What is positioning of the Apache Ignite project?
Nikita Ivanov: As we explained in our blog from last November, we believe Apache Ignite has all the right ingredients to become for the world of Fast Data what Hadoop is for Big Data today. This means that unlike Hadoop, which is a batch process focused on enabling the storage of large amounts of data economically, Ignite will enable extremely fast and ultra-low latency processing of data, allowing its users to derive actionable insights from their data much faster. Unlike Spark, a popular sister project of Ignite in the ASF, which is mainly focused on enhancing analytics and machine-learning for the Hadoop world, Ignite is a data source agnostic processing layer, which can be used for both Hadoop-like computation and many other computing paradigms like MPP, MPI, streaming processing.
In addition to real-time analytics, Ignite’s in-memory framework also offers support for full ACID transactions.
Q8. You have previously posted that Oracle and SAP are missing the point of In-Memory Computing. Could you please elaborate on this?
Nikita Ivanov: We continue to believe that Oracle and SAP are missing the point of in-memory computing for the following reasons: By offering a well-integrated platform of a compute grid, data grid, streaming/CEP and Hadoop acceleration, Apache Ignite (incubating) and the GridGain In-Memory Data Fabric offer a strategic approach to in-memory computing, across both transactional and analytical workloads, that delivers performance, scale and comprehensive capabilities far above and beyond what traditional in-memory databases, data grids or other in-memory-based point solutions can offer by themselves.
Both Apache Ignite and GridGain’s enterprise offering built on Apache Ignite will greatly benefit from a thriving community adapting the code base to new and emerging use cases; therefore, we believe this code base is extremely well positioned to drive superior innovation to the world of Fast Data, just as the Hadoop community has been doing for Big Data.
In addition, unlike Oracle or SAP Hana, Apache Ignite is more affordable, easier-to-access and more transparent open source software running on commodity hardware, which typically increases developers’ and architects’ motivation to explore the potential of in-memory computing. That said, if all the customer is looking for from in-memory technology is faster processing of their (SQL) data, then they may still choose to deploy proprietary software from Oracle or SAP.
Qx Anything else you wish to add?
Nikita Ivanov: I guess I should mention that even though Apache Ignite has been in incubation for less than 4 months only, we are excited to see that the project already has a very vibrant and growing community.
But we always welcome community contributions, so if there are readers that would like to contribute, please send an email to the Apache Ignite dev list, and we will get you started. And even if you are not ready to contribute immediately, we would like to invite everyone to join our dev list. Most of the discussions happen there, and you can find out a lot about where the project is going and also provide your own ideas. Another great way, of course, for people to familiarize themselves with Apache Ignite, is to take a look at the code and see what it can do for thier project. The Ignite bits can be downloaded on the Apache Ignite homepage.
Nikita Ivanov is founder and CTO of GridGain Systems, started in 2007 and funded by RTP Ventures and Almaz Capital. Nikita has led GridGain to develop advanced and distributed in-memory data processing technologies – the top Java in-memory data fabric starting every 10 seconds around the world today.
Nikita has over 20 years of experience in software application development, building HPC and middleware platforms, contributing to the efforts of other startups and notable companies including Adaptec, Visa and BEA Systems. Nikita was one of the pioneers in using Java technology for server side middleware development while working for one of Europe’s largest system integrators in 1996.
He is an active member of Java middleware community, contributor to the Java specification, and holds a Master’s degree in Electro Mechanics from Baltic State Technical University, Saint Petersburg, Russia.
– On Solr and Mahout. Interview with Grant Ingersoll. ODBMS Industry Watch, 2015-01-06
– Big Data: Three questions to McObject. ODBMS Industry Watch, February 14, 2014
Follow ODBMS.org on Twitter: @odbmsorg