
Presentation

Go Big or Go Home: Approaching Kafka Replication at Scale

Current 2023

Processing a lot of data with Kafka means knowing how and when to scale horizontally and vertically. When you've exhausted the boundaries of scaling inside a single cluster, replication becomes critical, but standard replication is sometimes not enough.

New Relic once earned the dubious title of “World’s Largest Kafka Cluster”, and in our journey to break this cluster into dozens of smaller clusters, we needed to route events between clusters and topics based on headers.

At the time, this meant we had to do it ourselves. Starting out, our goal was fan-out (one-to-many) replication. Since then, our needs have expanded to include many-to-one and many-to-many replication.
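The core of fan-out replication is a consume-then-produce loop that copies each source record to every destination topic. The sketch below illustrates the idea with in-memory stand-ins; the topic names and the `replicate_fan_out` helper are hypothetical, and a real replicator would use a Kafka client (e.g. confluent-kafka) for the consume and produce calls.

```python
# Illustrative one-to-many (fan-out) replication loop. Cluster/topic names
# are placeholders; with a real client, `produce` would publish to brokers
# instead of appending to an in-memory list.

def replicate_fan_out(records, destinations, produce):
    """Copy every source record to each destination topic (one-to-many)."""
    for record in records:
        for topic in destinations:
            produce(topic, record)

# In-memory stand-in for a producer, to show the shape of the output:
sent = []
replicate_fan_out(
    records=[b"event-1", b"event-2"],
    destinations=["cluster-a.events", "cluster-b.events"],
    produce=lambda topic, rec: sent.append((topic, rec)),
)
# `sent` now holds four (topic, record) pairs: each record copied to both topics.
```

Many-to-one and many-to-many replication generalize the same loop by consuming from multiple source clusters and selecting destinations per record rather than broadcasting to all of them.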

In this talk we'll discuss the bottlenecks we hit as we scaled out and the measures we took to remove them, such as:

  • Replicating data based on Kafka Headers
  • Connecting to many source and destination Kafka clusters
  • Managing the replication of Kafka topics of varying traffic
  • The use of an intermediary Kafka cluster
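Routing on Kafka headers amounts to inspecting each record's header list and mapping a header value to a destination topic. The following is a minimal sketch of that lookup; the header name (`"nr-destination"`), the route table, and the fallback topic are all illustrative assumptions, not New Relic's actual scheme.

```python
# Hypothetical header-based routing table; header name and topic names
# are placeholders for illustration only.
ROUTES = {
    b"alerts": "alerts-cluster-ingest",
    b"metrics": "metrics-cluster-ingest",
}
DEFAULT_TOPIC = "catch-all"

def route(headers):
    """Pick a destination topic from a record's headers.

    `headers` is a list of (key, value) pairs, as most Kafka client
    libraries expose them; values are raw bytes.
    """
    for key, value in headers or []:
        if key == "nr-destination":
            return ROUTES.get(value, DEFAULT_TOPIC)
    return DEFAULT_TOPIC
```

For example, `route([("nr-destination", b"alerts")])` returns `"alerts-cluster-ingest"`, while a record with no recognized header falls through to the catch-all topic.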

At the end of this talk you will understand how we have scaled replication and routing to support New Relic's ever-growing data ingestion, and all the mitigations it took to get us there.
