Kafka: a Distributed Messaging System for Log Processing

Jay Kreps, Neha Narkhede, Jun Rao, LinkedIn Corp.

ABSTRACT: Log processing has become a critical component of the data pipeline for consumer internet companies. We introduce Kafka, a distributed messaging system that we developed for collecting and delivering high volumes of log data with low latency. Our system incorporates ideas from existing log aggregators and messaging systems, and is suitable for both offline and online message consumption. We made quite a few unconventional yet practical design choices in Kafka to make our system efficient and scalable. Our experimental results show that Kafka has superior performance when compared to two popular messaging systems. We have been using Kafka in production for some time and it is processing hundreds of gigabytes of new data each day.

NetDB’11, Jun. 12, 2011, Athens, Greece.

Copyright 2011 ACM
