Big Data: Three questions to Aerospike.
“Many tools now exist to run database software without installing software. From vagrant boxes, to one click cloud install, to a cloud service that doesn’t require any installation, developer ease of use has always been a path to storage platform success.”–Brian Bulkowski.
The fifth interview in the “Big Data: three questions to “ series of interviews, is with Brian Bulkowski, Aerospike co-founder and CTO.
RVZ
Q1. What is your current product offering?
Brian Bulkowski: Aerospike is the first in-memory NoSQL database optimized for flash or solid state drives (SSDs).
In-memory for speed and NoSQL for scale. Our approach to memory is unique – we have built our own file system to access flash, we store indexes in DRAM and you can configure data sets to be in a combination of DRAM or flash. This gives you close to DRAM speeds, the persistence of rotational drives and the price performance of flash.
As next gen apps scale up beyond enterprise scale to “global scale”, managing billions of rows, terabytes of data and processing from 20k to 2 million read/write transactions per second, scaling costs are an important consideration. Servers, DRAM, power and operations – the costs add up, so even developers with small initial deployments must architect their systems with the bottom line in mind and take advantage of flash.
Aerospike is an operational database, a fast key-value store with ACID properties – immediate consistency for single row reads and writes, plus secondary indexes and user defined functions. Values can be simple strings, ints, blobs as well as lists and maps.
Queries are distributed and processed in parallel across the cluster and results on each node can be filtered, transformed, aggregated via user defined functions. This enables developers to enhance key value workloads with a few queries and some in-database processing.
Q2. Who are your current customers and how do they typically use your products?
Brian Bulkowski: We see two use cases – one as an edge database or real-time context store (user profile store, cookie store) and another as a very cost-effective and reliable cache in front of a relational database like mySQL or DB2.
Our customers are some of the biggest names in real-time bidding, cross channel (display, mobile, video, social, gaming) advertising and digital marketing, including AppNexus, BlueKai, TheTradeDesk and [X+1]. These companies use Aerospike to store real-time user profile information like cookies, device-ids, IP addresses, clickstreams, combined with behavioral segment data calculated using analytics platforms and models run in Hadoop or data warehouses. They choose Aerospike for predictable high performance, where reads and writes consistently, meaning 99% of the time, complete within 2-3 milliseconds.
The second set of customers use us in front of an existing database for more cost-effective and reliable caching. In addition to predictable high performance they don’t want to shard Redis, and they need persistence, high availability and reliability. Some need rack-awareness and cross data center support and they all want to take advantage of Aerospike deployments that are both simpler to manage and more cost-effective than alternative NoSQL databases, in-memory databases and caching technologies.
Q3. What are the main new technical features you are currently working on and why?
Brian Bulkowski: We are focused on ease of use, making development easier – quickly writing powerful, scalable applications – with developer tools and connectors. In our Aerospike 3 offering, we launched indexes and distributed queries, user defined functions for in-database processing, expressive API support, and aggregation queries. Performance continues to improve, with support for today’s highly parallel CPUs, higher density flash arrays, and improved allocators for RAM based in-memory use cases.
Developers love Aerospike because it’s easy to run a service operationally. That scale comes after the developer builds their original applications, so developers want samples and connectors that are tested and work easily. Whether that’s an ETL loader for CSV and JSON that’s parallel and scalable, a Hadoop connector to pour insights directly to Aerospike in order to drive hot interface changes, or improving our Mac OSX client that developers need, or HTTP/REST interfaces, developers need the ability to write their core application code to easily use Aerospike.
Many tools now exist to run database software without installing software. From vagrant boxes, to one click cloud install, to a cloud service that doesn’t require any installation, developer ease of use has always been a path to storage platform success.
Related Posts
– Big Data: Three questions to McObject, ODBMS Industry Watch, February 14, 2014
– Big Data: Three questions to VoltDB. ODBMS Industry Watch, February 6, 2014.
– Big Data: Three questions to Pivotal. ODBMS Industry Watch, January 20, 2014.
–Big Data: Three questions to InterSystems. ODBMS Industry Watch, January 13, 2014.
Resources
##
Brian here again —
We’ve released our Aerospike Hadoop integration, that allows you to easily use Aerospike as your Hadoop datastore. It’s location-aware, so you can run MapReduce without network traffic, or you can real-time emit to Aerospike to use your insights in applications _immediately_.
http://github.com/aerospike/aerospike-hadoop