We have a large distributed system here at Rackspace, which we’re scaling and which currently processes 10 million metrics per minute. Part of the scaling effort required us to switch to using Apache Kafka for our message queues. We’ll be looking at a few possible clients to explore our options and benchmarking their throughput.
We operate a monitoring system, which resides in 3 main Rackspace datacenters. We use a lightweight monitoring agent to keep track of things. The agent is written in Lua+luvit and currently resides on over 125,000 individual servers.
Our system already has a few different programming languages in it, so we figured it made sense to look at a few different clients. We have kafka wired up to our ingestors and have 64 partitions, three brokers, and snappy compression enabled.
First we ran benchmarks against the Node.js client, because a very large part of the system is written in it. The most popular kafka client for node.js is https://github.com/SOHU-Co/kafka-node.
We didn’t get the best results from it, and it averaged 6,000–8,000 messages per second. Using a node.js cluster module we could scale this up at the cost of CPU and memory overhead. We have a decent amount of memory on our servers and 12 CPU cores each.
With the SimpleConsumer from kafka-node and 1 worker per core, we managed to get about 70,000 messages per second. Using the HighLevelConsumer, we got about 40,000 messages per second.
The Mailgun team at Rackspace also uses kafka and has written an excellent HTTP aggregating proxy. It translates kafka to simple HTTP get and put requests. Kafka-pixy is written in Go and uses Shopify’s Sarama kafka client library.
With the HTTP overhead on a single thread, this performed significantly worse, managing 700–800 messages per second. But it handles quite a few implementation details that need to be taken care of and provides a language agnostic interface to kafka.
Almost satirical, but you can wire up logstash to kafka. We investigated this since our metrics storage database blueflood has a logstash input plugin. We wanted to see if this was a viable option, but it wasn’t, managing around 250 messages per second.
Go is a fairly new language that’s been rapidly gaining popularity, and one of the better kafka consumer implementations exist in this language. Built and opensourced by Shopify, it’s called Sarama. We got fairly decent throughput, with a single thread being able to handle 28k-30k messages per second.
Last but not least, we got benchmarks using Java. Java performed better than any other library. After all, Kafka’s native API is java, so this is the only language where you’re not using a third-party library.
We managed to comfortably get between 40,000 to 50,000 messages per second. Perhaps the only library that could compare, or even beat this is the C++ library librdkafka, which has been cited by Confluent as being capable of 85,000 messages per second. We did not benchmark this ourselves, though.
Node.js isn’t optimized for high throughput applications such as kafka. So if you need the high processing rates that come standard on kafka, perhaps Go, C++ or Java are your best friends.
In the end we went with Go, because it provides a nice middle ground between developer time spent building our components and performance.
|Java||40,000 - 50,0000|
|Go||28,000 - 30,0000|
|Node||6,000 - 8,0000|
|Kafka-pixy||700 - 800|