
Presentation

Isolating Streaming Ingest and Queries Using RocksDB

Current 2023

In a real-time analytics architecture, streaming data ingestion (from a source such as Kafka) and query serving run on the same compute unit so that queries can reflect newly ingested data. These two functions compete for the same compute resources, so an unexpected burst of either streaming ingestion or queries can slow down the whole system. We will examine common approaches to this compute contention, such as scaling, replication, and querying from shared storage, and discuss their tradeoffs and why each remains an incomplete solution.

In this talk, we present a real-time analytics architecture, implemented in the Rockset database on top of RocksDB, that isolates streaming data ingestion from query serving. RocksDB is a popular storage engine based on a log-structured merge-tree (LSM tree): it buffers writes in an in-memory memtable and periodically flushes them to disk.
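To make the memtable-and-flush behavior concrete, here is a minimal toy LSM tree. It is an illustrative sketch, not RocksDB's implementation: the class name `TinyLSM` and the `memtable_limit` parameter are hypothetical, "disk" is just a list of sorted in-memory runs, and real RocksDB adds a write-ahead log, SST files, bloom filters, levels, and background compaction.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class TinyLSM:
    """Toy LSM tree (hypothetical): a mutable memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit: int = 4) -> None:
        self.memtable_limit = memtable_limit
        # Latest value per key; None is a tombstone marking a deletion.
        self.memtable: Dict[str, Optional[str]] = {}
        # Flushed runs, oldest first; each run is sorted by key.
        self.runs: List[List[Tuple[str, Optional[str]]]] = []

    def put(self, key: str, value: str) -> None:
        self.memtable[key] = value
        self._maybe_flush()

    def delete(self, key: str) -> None:
        self.memtable[key] = None
        self._maybe_flush()

    def _maybe_flush(self) -> None:
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # Freeze the memtable into an immutable, key-sorted run (stands in for an on-disk file).
        if self.memtable:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key: str) -> Optional[str]:
        # Newest data wins: check the memtable first, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None


db = TinyLSM(memtable_limit=2)
db.put("a", "1")
db.put("b", "2")  # memtable reaches the limit and is flushed to a run
db.put("a", "3")  # newer value lives only in the memtable
```

The point of the sketch is that the freshest writes exist only in memory until the next flush, which is why another compute unit reading only the flushed files would miss them.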

At the core of our architecture is the separation of compute and storage, which allows multiple RocksDB instances to read from the same shared storage. We use cloud object storage for durability and SSDs as a shared hot storage tier for low-latency reads. On the compute side, we designed our query processing engine to be completely separate from all the modules that perform data ingestion.
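The storage tiering described above can be sketched as a read-through cache in front of durable storage. This is a minimal illustration under stated assumptions: `ObjectStore` and `HotTier` are hypothetical in-memory stand-ins for cloud object storage and the shared SSD tier, and LRU eviction is an assumed policy, not necessarily the one Rockset uses.

```python
from collections import OrderedDict
from typing import Dict


class ObjectStore:
    """Durable shared storage; hypothetical in-memory stand-in for a cloud object store."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}
        self.reads = 0  # counts slow, remote reads

    def put(self, name: str, data: bytes) -> None:
        self.objects[name] = data

    def get(self, name: str) -> bytes:
        self.reads += 1
        return self.objects[name]


class HotTier:
    """Hypothetical LRU cache standing in for the shared SSD tier in front of object storage."""

    def __init__(self, backing: ObjectStore, capacity: int) -> None:
        self.backing = backing
        self.capacity = capacity
        self.cache: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, name: str) -> bytes:
        if name in self.cache:
            self.cache.move_to_end(name)  # mark as most recently used
            return self.cache[name]
        data = self.backing.get(name)  # cache miss: fetch from durable storage
        self.cache[name] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return data


# Ingestion compute writes immutable files to shared storage; query compute only reads them.
store = ObjectStore()
hot = HotTier(store, capacity=2)
store.put("sst-000001", b"flushed by the ingester")
query_node_a = hot.get("sst-000001")  # miss: read from object storage
query_node_b = hot.get("sst-000001")  # hit: served from the hot tier
```

Because flushed files are immutable, any number of query-side instances can share them without coordinating with the ingester; only data still in the ingester's memtable is missing from this picture.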

For fresh data to be visible on multiple compute units, the in-memory state of the ingester's RocksDB memtable must be replicated to the other RocksDB instances. We built a RocksDB memtable replicator that propagates changes to remote instances within single-digit milliseconds. This architecture provides compute isolation, so real-time streaming ingestion does not interfere with queries, while the most recent data remains queryable.
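One way to picture memtable replication is a leader that tags each write with a sequence number and followers that apply writes strictly in sequence order. The sketch below is hypothetical: `WriteRecord`, `IngesterMemtable`, and `ReplicaMemtable` are illustrative names, the "network" is a direct method call, and the talk does not describe the replicator's actual protocol.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class WriteRecord:
    seq: int  # monotonically increasing sequence number assigned by the ingester
    key: str
    value: Optional[str]  # None marks a deletion


class ReplicaMemtable:
    """Hypothetical follower: applies writes in sequence order, buffering gaps."""

    def __init__(self) -> None:
        self.memtable: Dict[str, Optional[str]] = {}
        self.applied_seq = 0
        self.pending: Dict[int, WriteRecord] = {}

    def receive(self, rec: WriteRecord) -> None:
        if rec.seq <= self.applied_seq:
            return  # duplicate delivery; already applied
        self.pending[rec.seq] = rec
        # Apply every record that is now contiguous with what we have applied.
        while self.applied_seq + 1 in self.pending:
            r = self.pending.pop(self.applied_seq + 1)
            self.memtable[r.key] = r.value
            self.applied_seq = r.seq


class IngesterMemtable:
    """Hypothetical leader: writes locally, then ships each change to subscribers."""

    def __init__(self) -> None:
        self.memtable: Dict[str, Optional[str]] = {}
        self.next_seq = 1
        self.subscribers: List[ReplicaMemtable] = []

    def write(self, key: str, value: Optional[str]) -> WriteRecord:
        rec = WriteRecord(self.next_seq, key, value)
        self.next_seq += 1
        self.memtable[key] = value
        for replica in self.subscribers:
            replica.receive(rec)  # stands in for a network send
        return rec


leader = IngesterMemtable()
replica = ReplicaMemtable()
leader.subscribers.append(replica)
leader.write("user:1", "alice")
leader.write("user:1", "alicia")
```

Ordering by sequence number lets a replica tolerate reordered or duplicated deliveries while still converging to the ingester's memtable state, so queries on the replica see the same fresh data without running ingestion work themselves.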
