AsterixDB Big Data Management System (BDMS)
Overview (last updated October 2014)
The AsterixDB BDMS is the result of over four years of R&D involving researchers at UC Irvine, UC Riverside, and UC San Diego. The AsterixDB code base currently consists of nearly 300K lines of Java code that was co-developed at UC Irvine and UC Riverside.
Initiated in 2009, the NSF-sponsored ASTERIX project has been developing new technologies for ingesting, storing, managing, indexing, querying, and analyzing vast quantities of semi-structured information. The project has been combining ideas from three distinct areas—semi-structured data, parallel databases, and data-intensive computing (a.k.a. today’s Big Data platforms)—in order to create a next-generation, open-source software platform that scales by running on large, shared-nothing commodity computing clusters.
The ASTERIX effort has been targeting a wide variety of semi-structured information, from “data” use cases, where information is well-typed and highly regular, to “content” use cases, where data tends to be irregular, much of each datum may be textual, and the ultimate schema for the various data types involved may be hard to anticipate up front. The ASTERIX project has been addressing technical issues including highly scalable data storage and indexing, semi-structured query processing on large clusters, and the merging of time-tested parallel database techniques with modern data-intensive computing techniques, with the goal of supporting performant yet declarative solutions to the problem of storing and analyzing semi-structured information effectively.
The fruits of this labor to date have been captured in the AsterixDB system, which was first released in preliminary (“Beta”) form in mid-2013 and has been refreshed several times since. We hope that the arrival of AsterixDB will mark the beginning of the “BDMS era”, and that both the Big Data community and the database community will find the AsterixDB system interesting and useful for a much broader class of problems than can be addressed with any one of today’s Big Data platforms and related technologies (e.g., Hadoop, Pig, Hive, HBase, MongoDB, and so on). One of our project mottos has been “one size fits a bunch”; at least that has been our aim.