Google BigQuery: Querying massive datasets can be time-consuming and expensive without the right hardware and infrastructure. Google BigQuery solves this problem by enabling super-fast, SQL-like queries against append-only tables, using the processing power...
Sort Benchmark. New: We will soon release the specification for a new sort benchmark, CloudSort, that measures the total cost of ownership for external sorts performed in a cloud environment. In order to encourage...
Stream processing: S4 is a general-purpose, distributed, scalable, fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data.
List of companies providing products that include Apache Hadoop, a derivative work thereof, commercial support, and/or tools and utilities related to Hadoop.
Distributed system: Apache Kafka is a distributed publish-subscribe messaging system. It is designed to support the following: persistent messaging with O(1) disk structures that provide constant time...
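To make the publish-subscribe idea concrete, here is a minimal sketch in plain Python. It is not Kafka's API: `TopicLog`, `publish`, and `poll` are hypothetical names, and the log lives in memory rather than on disk. It only illustrates the general pattern of an append-only message log where each subscriber tracks its own read position, so appending a message and looking up where a subscriber left off are both constant-time operations.

```python
from collections import defaultdict


class TopicLog:
    """Hypothetical in-memory stand-in for one topic's append-only log."""

    def __init__(self):
        self._messages = []                # append-only: entries are never rewritten
        self._offsets = defaultdict(int)   # next offset to read, per subscriber

    def publish(self, message):
        """Append a message to the end of the log and return its offset."""
        self._messages.append(message)
        return len(self._messages) - 1

    def poll(self, subscriber, max_messages=10):
        """Return unread messages for one subscriber and advance its offset."""
        start = self._offsets[subscriber]
        batch = self._messages[start:start + max_messages]
        self._offsets[subscriber] = start + len(batch)
        return batch


log = TopicLog()
for event in ("signup", "login", "purchase"):
    log.publish(event)

# Each subscriber reads the same log independently of the others.
first_for_analytics = log.poll("analytics", max_messages=2)
rest_for_analytics = log.poll("analytics")
all_for_billing = log.poll("billing")
```

Because messages stay in the log after being read, a new subscriber (here `"billing"`) can still consume everything from the start, which is the practical benefit of persistent messaging.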
Stream processing: Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop is for batch processing.
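As a rough illustration of what processing an unbounded stream means, the sketch below chains Python generators into a small pipeline. It does not use Storm or its API; `sentence_source`, `split_words`, and `running_counts` are hypothetical names, and the source is a simulated endless stream. The point is only that each stage handles one item at a time and emits results continuously, instead of waiting for a complete dataset as a batch job would.

```python
import itertools
from collections import Counter


def sentence_source():
    """Hypothetical unbounded source: repeats two sentences forever."""
    return itertools.cycle(["the cow jumped", "the moon rose"])


def split_words(sentences):
    """Stage 1: turn each incoming sentence into individual words."""
    for sentence in sentences:
        yield from sentence.split()


def running_counts(words):
    """Stage 2: emit (word, count so far) as each word arrives."""
    counts = Counter()
    for word in words:
        counts[word] += 1
        yield word, counts[word]


pipeline = running_counts(split_words(sentence_source()))

# The stream never ends, so take only the first few results.
first_seven = list(itertools.islice(pipeline, 7))
```

Nothing here is distributed or fault-tolerant; those are the parts a system such as Storm or S4 adds on top of this basic dataflow shape.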
Distributed system: Apache Drill is a distributed system for interactive analysis of large-scale datasets. Drill is similar to Google’s Dremel, with the additional flexibility needed to support a broader range of query languages,...
Stratosphere is an open-source system for Big Data Analytics that can be deployed in a local cluster using HDFS or in the Amazon cloud. The platform is jointly developed by TU Berlin, HU Berlin,...