Tom Quiggle, Sr. Staff Software Engineer, LinkedIn
The initial deployment of Espresso relies on MySQL’s built-in mechanism for Master-Slave replication. Storage hosts running MySQL masters service HTTP requests to store and retrieve documents, while hosts running slave replicas remain mostly idle. Since replication is at the MySQL instance level, masters and slaves must contain the exact same partitions – precluding flexible and dynamic partition placement and migration within the cluster.
Espresso is migrating to a new deployment topology where each Storage Node may host a combination of master and slave partitions; thus distributing the application requests equally across all available hardware resources. This topology requires per-partition replication between master and slave nodes. Kafka will be used as the transport for replication between partitions.
For use as the replication stream for the source-of-truth data store for LinkedIn’s most valuable data, Kafka must be as reliable as MySQL replication. The session will cover Kafka configuration options to ensure highly reliable, in-order message delivery. Additionally, the application logic maintains state both within the Kafka event stream and externally to detect message re-delivery, out of order delivery, and messages inserted out-of-band. These application protocols to guarantee high fidelity will be discussed.