Kafka Streams applications can process fast-moving, unbounded streams of data, which lets us process and react to events from many sources in near real time as they converge in Kafka. However, if these events have a spatial component, and their spatial relationships with one another determine how they should be processed or reacted to, this raises some fundamental challenges. Determining, for example, that a person is within an area or that two routes intersect requires geospatial operations that are not readily available in Kafka Streams.
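As an illustration of the kind of operation meant here, the sketch below checks whether a point lies within a circular area using the haversine great-circle distance. It is a plain-Java stand-in rather than anything from the talk. It assumes a spherical Earth with a mean radius of 6371.0088 km and an area shaped as a circle. A geospatial library such as Spatial4j would normally provide this kind of calculation. The class `WithinRadius` and its methods are hypothetical names.

```java
// Hypothetical helper: point-in-circle test using the haversine formula.
// Assumption: spherical Earth model; a library such as Spatial4j would usually do this.
final class WithinRadius {
    // Mean Earth radius in kilometres (assumed spherical model).
    static final double EARTH_RADIUS_KM = 6371.0088;

    // Great-circle distance in km between two (lat, lon) points given in degrees.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // True if (lat, lon) lies within radiusKm of the circle's centre.
    static boolean isWithin(double lat, double lon,
                            double centreLat, double centreLon, double radiusKm) {
        return haversineKm(lat, lon, centreLat, centreLon) <= radiusKm;
    }
}
```

For instance, a passenger about 55 km north of a pickup zone's centre would be inside a 100 km zone but outside a 50 km one. Real areas are often polygons, which need a point-in-polygon test instead.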
In this talk, we will first set the scene with a geospatial 101. Then, using a simplified taxi-hailing use case, we will look at two approaches to processing spatial data with Kafka Streams. The first is a naive approach that uses the Kafka Streams DSL, geohashing and the Java Spatial4j library. The second is a prototype that replaces the RocksDB state store with Apache Lucene (an embedded storage engine with powerful indexing, search and geospatial capabilities) and implements a stateful spatial join with the Transformer API.
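To make the geohashing idea concrete, here is a minimal sketch of standard geohash encoding in plain Java. It is not taken from the talk's implementation. A geohash interleaves longitude and latitude bits, starting with longitude, by repeatedly halving each coordinate range. It then maps every 5 bits to a base-32 character, so points that share a prefix lie in the same rectangular cell. The class `Geohash` and method `encode` are hypothetical names.

```java
// Hypothetical sketch of standard geohash encoding (not the talk's code).
final class Geohash {
    // Geohash base-32 alphabet (omits a, i, l, o).
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    // Encode (lat, lon) in degrees to a geohash of `precision` characters.
    static String encode(double lat, double lon, int precision) {
        double minLat = -90, maxLat = 90, minLon = -180, maxLon = 180;
        StringBuilder hash = new StringBuilder(precision);
        boolean lonBit = true; // bits alternate, starting with longitude
        int bits = 0, ch = 0;
        while (hash.length() < precision) {
            if (lonBit) {
                double mid = (minLon + maxLon) / 2;
                if (lon >= mid) { ch = (ch << 1) | 1; minLon = mid; } // upper half
                else            { ch = ch << 1;       maxLon = mid; } // lower half
            } else {
                double mid = (minLat + maxLat) / 2;
                if (lat >= mid) { ch = (ch << 1) | 1; minLat = mid; }
                else            { ch = ch << 1;       maxLat = mid; }
            }
            lonBit = !lonBit;
            if (++bits == 5) { // every 5 bits become one base-32 character
                hash.append(BASE32.charAt(ch));
                bits = 0;
                ch = 0;
            }
        }
        return hash.toString();
    }
}
```

For example, `Geohash.encode(42.6, -5.6, 5)` yields `"ezs42"`. In a DSL-based pipeline one might re-key passenger and driver events by a fixed-length geohash prefix so that nearby events share a key and can be grouped or joined. Because two nearby points can fall on either side of a cell boundary, a join on a single cell can miss matches unless adjacent cells are also considered, which is one reason such an approach is naive.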
This talk will give you an appreciation of geospatial use cases and how Kafka Streams could enable them. You will see the role the state store plays in stateful processing and what that implies for geospatial processing, as well as what is involved in integrating a custom state store with Kafka Streams. Overall, you will come away with an understanding of how you might build custom processing capabilities on top of Kafka Streams for your own use cases.