Better to Be Wrong Than Vague: Apache Kafka and Software Architecture Predictions for 2021

By Tim Berglund

On a recent episode of Streaming Audio, Gwen Shapira, Michael Noll, and Ben Stopford joined me to hold forth about the near future of Apache Kafka® and software architecture in general. House rules were that predictions could cover any topic, but they had to be “precise” in the spirit of Bob Metcalfe, who, back in the 90s, famously predicted a particular day the dot-com bubble would pop, under the theory that it was better to be wrong than vague. At least we all felt pretty sure that our predictions couldn’t possibly be worse than the ones being made at the same time in 2019.

10 million partitions in a single production Kafka cluster

We began, fairly enough, with Kafka itself. Gwen started us off by predicting that by the end of the year it will be possible to run a Kafka cluster with 10 million partitions, facilitated by some consequential architectural changes: KIP-405 (Tiered Storage) and KIP-500 (ZooKeeper removal). These KIPs enable growth in the number of partitions by moving data out of the cluster proper and enabling metadata to be managed in a more scalable and robust way.

Double the size of a Kafka cluster in seconds

Ben predicted being able to double the size of a Kafka cluster in seconds, a task specifically enabled by Tiered Storage. Tiered Storage is often understood simply as a cost-savings and storage play, which is fair enough. People using Kafka as a system of record tend to want longer retention periods, and Tiered Storage is an obvious enough improvement to the economics of that architecture, but it doesn’t stop there. You also get quick autoscaling. Because so much state gets offloaded to your friendly neighborhood cloud object store, when you go to scale brokers, there is significantly less data to move around.
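To put rough numbers on that intuition, here is a back-of-the-envelope sketch in Python. It is not Kafka code, and everything in it is an assumption made for illustration: the function name `bytes_to_move`, the idea that an ideal rebalance copies only the share of broker-local data destined for the new brokers, and the example figures (100 TB of retained data, 5% of it kept on local disk).

```python
def bytes_to_move(total_bytes: float, local_fraction: float,
                  old_brokers: int, new_brokers: int) -> float:
    """Estimate bytes copied between brokers when a cluster grows.

    Hypothetical model, not a Kafka API: partitions are assumed to be spread
    evenly, and an ideal rebalance moves only the share of broker-local data
    that has to land on the newly added brokers.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    if old_brokers <= 0 or new_brokers < old_brokers:
        raise ValueError("expected 0 < old_brokers <= new_brokers")

    # Share of all data that ends up on the new brokers after an even spread.
    moved_share = (new_brokers - old_brokers) / new_brokers
    # Only data still held on broker disks needs to be copied; the rest
    # already lives in the object store and stays where it is.
    return total_bytes * local_fraction * moved_share


if __name__ == "__main__":
    total = 100e12  # assumed: 100 TB retained in the cluster
    without_tiering = bytes_to_move(total, 1.0, old_brokers=6, new_brokers=12)
    with_tiering = bytes_to_move(total, 0.05, old_brokers=6, new_brokers=12)
    print(f"all data local:  {without_tiering / 1e12:.1f} TB to move")
    print(f"5% of data local: {with_tiering / 1e12:.1f} TB to move")
```

Under these assumptions, the amount of data to copy shrinks in direct proportion to how much of the log has been offloaded, which is the whole point: the smaller the local tier, the closer a doubling gets to being a metadata operation.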

Another architectural benefit is a potential performance boost: data in the object store tier is accessed over the network, with the presumption that it is accessed less frequently than data still on disk. If one plays one’s cards right, that local hotset can fit entirely into the broker’s page cache, making I/O on the hotset a vastly faster proposition. Remember, when it comes to data access patterns, the power law works for you; you don’t work for it.
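As a toy illustration of why the power law is your friend here, the sketch below estimates what fraction of reads would be served from cache if read popularity across log segments followed a Zipf-like curve. This is an assumed model, not a measurement of any real Kafka workload, and the name `cache_hit_fraction` and its parameters are hypothetical.

```python
def cache_hit_fraction(num_segments: int, cached_segments: int,
                       exponent: float = 1.0) -> float:
    """Fraction of reads that hit the hottest `cached_segments` segments.

    Assumed model: the segment with popularity rank r (1 = hottest) receives
    reads in proportion to 1 / r**exponent, and the page cache holds exactly
    the hottest segments.
    """
    if num_segments <= 0:
        raise ValueError("num_segments must be positive")
    if not 0 <= cached_segments <= num_segments:
        raise ValueError("cached_segments must be between 0 and num_segments")

    # Relative read weight per segment, ordered from hottest to coldest.
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_segments + 1)]
    return sum(weights[:cached_segments]) / sum(weights)


if __name__ == "__main__":
    # Assumed example: 1,000 segments, the hottest 100 fit in page cache.
    print(f"{cache_hit_fraction(1000, 100):.2%} of reads served from cache")
```

With a skewed distribution like this, caching a small slice of the data covers a disproportionately large share of the reads, which is why keeping only the hotset on local disk can pay off.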

Streaming everywhere

Michael’s prediction, for which there was consensus (see what I did there, KIP-595?), was the continued growth of Kafka-like streaming features in products across the data landscape: from relational stalwarts like Oracle, to Redis, to traditional messaging systems like RabbitMQ. Users have increasingly come to expect features that will let them work with real-time, unbounded datasets, and vendors tend to notice things that users expect.

Given that this transition to understanding systems “events first” is well underway and already looks rooted in the emerging software architecture consensus, I will see Michael’s prediction for the year and raise him another couple of decades: I predict event streaming will be seen as the dominant paradigm of this generation’s software architectures.

Multi-paradigm products

So it’s clear that event streaming is happening, but another question is how it can best be added to existing database products, since many existing tools came to life before this paradigm was yet a thing. To begin with, companies each have their own idea for how streaming should even be defined, as Michael has seen firsthand with his work on the committee writing the SQL standard’s streaming extension. And it can be hard to retrofit an existing product built under batch- or state-oriented assumptions, particularly when one wants to operate it at scale. As Gwen pointed out, it may even require completely new data structures to make a truly successful multi-paradigm solution.

As a side note, the broader Kafka ecosystem is making its own claim on multi-paradigm status, since it began with streaming and later added ksqlDB, which brings database concepts and SQL itself into an event-driven system.

Conclusion

So our money is on the table. I must say that I would be surprised if, when I’m starting to roll into my Christmas playlist in October of 2021, streaming isn’t even more on the minds of those in the industry than it is now. I don’t know that it will be completely mainstream, but if you’re not doing it already by then, or at least thinking seriously about it, you might start to feel a bit behind the zeitgeist. It would also be surprising if by that same time the effects of KIP-500 and the rest of the gang haven’t started to make their mark in the community’s collective imagination, as we continue to think about what we might build with Kafka next.

Interested in more?

If you want to hear the episode for yourself, have a listen to Streaming Audio and make sure to subscribe through Apple Podcasts or wherever fine podcasts are sold.

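  • Tim Berglund is a teacher, author, and developer relations leader at StarTree. He frequently speaks at conferences in the United States and around the world. He is also the co-presenter of O'Reilly training videos on topics ranging from Git to Distributed Systems, and the author of Gradle Beyond the Basics. Tim is active on X (formerly Twitter) as @tlberglund, blogs very occasionally at http://timberglund.com, and co-hosts the http://devrelrad.io podcast. With two grown children now out on their own, he lives in Littleton, Colorado, USA, with his wife, whom he met in his youth, and their youngest child.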
