
Presentation

Go Big or Go Home: Approaching Kafka Replication at Scale

Current 2023

Processing a lot of data with Kafka means knowing how and when to scale horizontally and vertically. When you've exhausted the boundaries of scaling inside a single cluster, replication becomes critical, but standard replication is sometimes not enough.

New Relic once earned the dubious title of “World’s Largest Kafka Cluster”, and in our journey to break this cluster into dozens of smaller clusters, we needed to route events between clusters and topics based on headers.

At the time, this meant we had to do it ourselves. Starting out, our goal was fan-out (one-to-many) replication. Since then, our needs have expanded to include many-to-one and many-to-many replication.
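The core of fan-out replication is a consume-then-produce loop that copies each source record to every destination topic. The sketch below illustrates the idea with in-memory stand-ins; the topic names and the `replicate_fan_out` helper are hypothetical, and a real replicator would use a Kafka client (e.g. confluent-kafka) for the consume and produce calls.

```python
# Illustrative one-to-many (fan-out) replication loop. Cluster/topic names
# are placeholders; with a real client, `produce` would publish to brokers
# instead of appending to an in-memory list.

def replicate_fan_out(records, destinations, produce):
    """Copy every source record to each destination topic (one-to-many)."""
    for record in records:
        for topic in destinations:
            produce(topic, record)

# In-memory stand-in for a producer, to show the shape of the output:
sent = []
replicate_fan_out(
    records=[b"event-1", b"event-2"],
    destinations=["cluster-a.events", "cluster-b.events"],
    produce=lambda topic, rec: sent.append((topic, rec)),
)
# `sent` now holds four (topic, record) pairs: each record copied to both topics.
```

Many-to-one and many-to-many replication generalize the same loop by consuming from multiple source clusters and selecting destinations per record rather than broadcasting to all of them.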

In this talk we'll discuss the bottlenecks we hit as we scaled out and the measures we took to remove them, such as:

  • Replicating data based on Kafka Headers
  • Connecting to many source and destination Kafka clusters
  • Managing the replication of Kafka topics of varying traffic
  • The use of an intermediary Kafka cluster
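Routing on Kafka headers amounts to inspecting each record's header list and mapping a header value to a destination topic. The following is a minimal sketch of that lookup; the header name (`"nr-destination"`), the route table, and the fallback topic are all illustrative assumptions, not New Relic's actual scheme.

```python
# Hypothetical header-based routing table; header name and topic names
# are placeholders for illustration only.
ROUTES = {
    b"alerts": "alerts-cluster-ingest",
    b"metrics": "metrics-cluster-ingest",
}
DEFAULT_TOPIC = "catch-all"

def route(headers):
    """Pick a destination topic from a record's headers.

    `headers` is a list of (key, value) pairs, as most Kafka client
    libraries expose them; values are raw bytes.
    """
    for key, value in headers or []:
        if key == "nr-destination":
            return ROUTES.get(value, DEFAULT_TOPIC)
    return DEFAULT_TOPIC
```

For example, `route([("nr-destination", b"alerts")])` returns `"alerts-cluster-ingest"`, while a record with no recognized header falls through to the catch-all topic.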

At the end of this talk you will understand how we have scaled replication and routing to support New Relic's ever-growing data ingestion, and all the mitigations it took to get us there.
