In a typical deployment of Kafka with many topics and partitions, scaling the Kafka consumer efficiently is one of the important tasks in maintaining overall smooth Kafka operations. The traditional Kubernetes Horizontal Pod Scaling (HPA) that uses basic CPU and/or memory metrics is not suitable for scaling Kafka consumers. The more appropriate workload metric for Kafka consumer is the number of messages in Kafka broker queue. More specifically, the message production rate of a specific topic would be the right workload metric for a Kafka consumer.
While using message production rate is a better way to decide the number of consumer replicas, this is still a reaction based auto-scaling. Using machine-learning based forecasting, it is possible for predict the upcoming increase or decrease of message production rate. With predicted workload, scaling the Kafka consumers could be achieved in a more timely manner, resulting with better performance KPI's.