It has only been two weeks since the inaugural Kafka Summit London, and I’m sure many of you who attended are still internalizing what you learned there. But what about the many tens of thousands in the Apache Kafka® Community who didn’t attend? Well, I have good news for you: session videos are available!
But first, let’s pause to remember a few stats that made this Summit so unforgettable.
Now then. About the videos. A complete playlist, including both keynotes and sessions, is available here. Go there and check it out!
If you want to get started on Summit videos, here’s Neha’s keynote entitled The Present and Future of the Streaming Platform:
That keynote defines a five-step adoption journey we observe organizations going through as they build streaming systems. (It’s interesting to follow along in the talk and ask yourself where you fit in.) First, you become aware of streaming as an architectural paradigm and build a pilot system. Second, after that pilot succeeds, streaming goes live with the first production system. Third, that small production system spreads to a mission-critical application. Fourth, as the organization gains competence with the new platform and comes to depend more and more on the advantages it confers, that mission-critical application is expanded to a global use case, where streaming data is produced and consumed across geographies. Finally, the organization learns that a streaming platform is the proper core on which to layer all of its applications and services, and the “central nervous system” metaphor of comprehensive streaming adoption becomes true. Streaming adoption, having begun with an idea and a pilot project, has finally transformed a business and its entire information architecture.
Data, as the cliché tells us, is the new oil. Now, Stefan Bauer of Audi would know a thing or two about oil, and he disagrees with that analogy: he says data is the new DNA. His talk describes the back-end systems Audi is building to capture and analyze the data generated by conventional connected cars, plus the 4TB/day generated by its current self-driving prototypes. One must remind oneself that cutting-edge automotive manufacturers like this still do, in fact, manufacture automobiles, because the real growth edge in the business is no longer metal, glass, and rubber, but data. And because their products are a transportation medium—that is, they are fundamentally dynamic things—that data must be modeled as streaming data. Kafka is at the heart of the whole endeavor.
Piyush Vijay and Charly Molter of Apple shared their organization’s significant experience running Kafka at scale. They have centralized Kafka operations across the company and offer Kafka as a service to groups within Apple, which lets them centralize developer tooling and security configuration, both essential at scale. Like many large enterprises, they maintain a thin wrapper layer around the standard Kafka client library, which lets them offer a not-inconsiderable set of standard Kafka features to internal developers without requiring those developers to learn too much about Kafka administration or architecture. Their solution to end-to-end encryption of topic data was particularly interesting, but for the details, you’ll have to watch the whole thing.
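If you’re curious what that wrapper pattern looks like in practice, here’s a minimal sketch. To be clear, this is not Apple’s actual code: the class name, endpoint, and defaults are all illustrative. The idea is simply a factory that bakes centrally managed connection and security settings into every client an application team creates:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

/**
 * Illustrative "thin wrapper" around the standard Kafka client library:
 * a factory that applies platform-standard settings so application teams
 * never touch broker addresses or TLS configuration themselves.
 */
public final class ManagedKafkaClients {

    // Centrally managed settings; in a real deployment these would likely
    // come from a config service rather than being hard-coded.
    private static Properties platformDefaults() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka.internal.example.com:9093");
        props.put("security.protocol", "SSL");
        props.put("ssl.truststore.location", "/etc/kafka/platform-truststore.jks");
        return props;
    }

    /** Returns a producer preconfigured with the platform's standards. */
    public static KafkaProducer<String, String> newStringProducer(String clientId) {
        Properties props = platformDefaults();
        props.put(ProducerConfig.CLIENT_ID_CONFIG, clientId);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ACKS_CONFIG, "all"); // a safe default teams can't forget
        return new KafkaProducer<>(props);
    }

    private ManagedKafkaClients() {}
}
```

The payoff of the pattern is that an application’s Kafka code shrinks to something like `ManagedKafkaClients.newStringProducer("orders-service")`, and a security or endpoint change rolls out by updating the wrapper once rather than every team’s codebase.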
A recurring theme of the ops-oriented talks at the Summit was that operating Kafka is not easy, and I suppose we at Confluent have long agreed with this. But there’s still no substitute for listening to seasoned operators like Gwen Shapira and Xavier Léauté tell us how to monitor Kafka like pros. They gave authoritative advice on bread-and-butter operational tasks like rolling broker upgrades, client upgrades, and slow replication. There is so much practical advice in this talk about what to monitor (server logs, JVM GC logs, heap dumps, JMX metrics) and how to monitor it (gceasy.io, Flame Graphs, Confluent Control Center) that all you can really do is go watch the talk. Bonus question: how many 0.9 Consumer clients did it take to take down a 1.0 cluster, and what was the mechanism? Report back once you know the answer.
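If you’ve never wired up that last item yourself, here’s a minimal sketch of polling a single broker health metric over JMX. It assumes the broker exposes JMX on port 9999 (e.g., via the JMX_PORT environment variable), and the hostname is illustrative; the MBean name, though, is the standard one a Kafka broker registers:

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/** Illustrative one-shot check of a broker's under-replicated partition count. */
public class UnderReplicatedCheck {
    public static void main(String[] args) throws Exception {
        // Assumes the broker was started with JMX enabled on port 9999.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://broker1.example.com:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // A standard Kafka broker metric; its Value attribute is a gauge.
            ObjectName urp = new ObjectName(
                    "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions");
            Object value = mbs.getAttribute(urp, "Value");
            System.out.println("UnderReplicatedPartitions = " + value);
        }
    }
}
```

On a healthy cluster that gauge sits at zero; a sustained nonzero value is exactly the slow-replication signal the talk warns you to alert on.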
It should come as no surprise that streaming data would be the theme of the Kafka Summit, but we should not let this observation pass us by unremarked. Kafka in its origins was a front end to a large, static block store: people used it to ingest big event streams into Hadoop. One does not need to keep one’s ear too close to the ground in today’s Kafka Community—and indeed, in the software and data architecture communities at large—to see that streaming is now the architectural paradigm dominating new designs. Not data ingestion to a static analytics platform, but a true substrate on which a dynamic, evolvable, real-time enterprise can be built.
In other words, there’s some really important content in here that is not just helpful to individual Kafka developers and admins, but also helps drive the Kafka community forward. This is not a platform standing still, solving the problems of five years ago. It is a rapidly evolving platform focused on solving the problems you’re about to have—skating, as the line so widely quoted as to risk cliché has it, to where the hockey puck is going to be. To keep up, we all need to contribute, listen, and amplify what the community is saying. So please share these videos in your own networks. They’re helpful to each of us as individuals, and helpful to us as a community as we continue to grow Kafka as the definitive streaming platform.
And join us, if you can, in San Francisco this October.
If you’d like to know more, here are some resources for you: