Building a real-time pipeline from scratch that is able to handle billion+ transactions per day, store, analyze and visualize it all in real-time has never been easier. In this build-as-we-go talk, we’ll create a front-to-back architecture that does exactly that.
* we’ll start with a simple producer emitting a few messages and publishing them onto a Kafka queue
* on consuming end of the queue a Spark-based Streamliner process will pick them up and store in MemSQL
* ZoomData will connect to MemSQL for real-time visualization where we’ll be able to ask various questions and see answers change as data is flowing through the system
* we’ll quickly make the entire pipeline more complex by increasing the amount of data as well as complexity of the data, until reaching 100K transactions per second
As we walk through this demo, we will touch on cross data-center Kafka and MemSQL set-ups, speed limitations if any as well as echo back to real-life use cases of a similar set-up used in Goldman’s Asset Management division for the purposes of Portfolio Management & Trading.
|Anton Gorshkov, Managing Director, Goldman Sachs|