Robinhood uses Kafka in every line of its business, from stock and crypto trading to its self-clearing system and online data analytics. Robinhood’s fleet of microservices uses Apache Kafka to build an event-driven architecture in which services communicate with each other asynchronously. The producers and consumers of a Kafka topic almost always belong to completely different teams, so the schema of the events in Kafka is the only API that downstream services can rely on. Over time, we have seen multiple ways in which a downstream Kafka consumer can fail to process an event successfully. The causes range from deserialization failures to upstream code changes that produce bad data.
This talk discusses how we built libraries, templated microservices, and tooling that leverage Postgres and Kafka to deal with dead letters safely, inspect and query them, and republish them to retry Kafka topics for safe reprocessing at a later time. We also dive deeper into how this improved operability and on-call health for all of our Kafka application developers.
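
To make the dead-letter flow concrete, here is a minimal sketch of the general pattern, not the libraries presented in the talk. Everything in it is an assumption made for illustration: the table layout, the function names, and the `<topic>.retry` naming are hypothetical, `sqlite3` stands in for Postgres, and a plain callable stands in for a Kafka producer so that the example runs without any external services.

```python
import json
import sqlite3
from typing import Callable

# Hypothetical sketch: sqlite3 stands in for Postgres, and `publish` stands in
# for a Kafka producer. None of these names come from the talk.


def open_store() -> sqlite3.Connection:
    """Create an in-memory dead-letter table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE dead_letters (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               topic TEXT NOT NULL,
               payload BLOB NOT NULL,
               error TEXT NOT NULL,
               republished INTEGER NOT NULL DEFAULT 0
           )"""
    )
    return conn


def consume(conn: sqlite3.Connection, topic: str, payload: bytes,
            handler: Callable[[dict], None]) -> bool:
    """Process one event; on any failure, store it as a dead letter."""
    try:
        handler(json.loads(payload))  # deserialization can fail here too
        return True
    except Exception as exc:
        conn.execute(
            "INSERT INTO dead_letters (topic, payload, error) VALUES (?, ?, ?)",
            (topic, payload, repr(exc)),
        )
        conn.commit()
        return False


def republish(conn: sqlite3.Connection, topic: str,
              publish: Callable[[str, bytes], None]) -> int:
    """Send pending dead letters for `topic` to its retry topic."""
    rows = conn.execute(
        "SELECT id, payload FROM dead_letters "
        "WHERE topic = ? AND republished = 0 ORDER BY id",
        (topic,),
    ).fetchall()
    for row_id, payload in rows:
        publish(f"{topic}.retry", payload)  # assumed retry-topic naming
        conn.execute(
            "UPDATE dead_letters SET republished = 1 WHERE id = ?", (row_id,)
        )
    conn.commit()
    return len(rows)
```

Because failed events land in a relational table, they can be inspected with ordinary SQL (for example, grouping by `error`) before anyone decides to republish them. Marking rows as republished, rather than deleting them, keeps a record of what was retried.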