The following post is a guest blog by Paige Roberts, Product Manager, Syncsort. Paige spent 19 years in the data management industry in a wide variety of roles – programmer, analyst, trainer, technician, content creator, consultant, and evangelist. Now, she is the product manager for Syncsort’s big data software, keeping a close eye on market and technology trends in big data integration.
I talk to a lot of folks in the big data industry about the challenges they’re tackling with big data technology. I’m endlessly fascinated with how each company chooses a unique architecture based on their distinctive needs, and yet, there are always patterns in the chaos. One thing that has jumped out at me in the past few years is how much Kafka is becoming the backbone, the central spine, of the big data nervous system for companies in wildly unrelated industries with vastly different use cases.
Streaming data, little message bits flowing in a continuous series, is no longer some exotic anomaly. It’s the new way to deal with many kinds of data. The fundamental concept of Kafka and the Confluent platform is that every type of data is, or can be, a stream. Each message is like an individual nerve impulse that needs to flow through the system. Confluent and Kafka form the central nervous system that brings that information from every part of the business to the big data nerve centers where decisions are made.
The traditional data sets that make up the “long-term memory” of most businesses sit at rest in old-fashioned databases and file systems. This data, typically accessed with batch technologies, holds the key to context, the essential factor needed to make sense of data streams.
Syncsort helps forge the connections between these two types of data, to make sense of each tiny message impulse in the larger context of historical data.
Syncsort and Confluent have been partnering to solve some interesting and widely varied use cases where streaming and batch data processing mix and mingle. The surprising thing is that this architecture, a nervous system of batch and streaming working together, works wonders for very different use cases in very different industries.
As a simplified example, consider a fraud detection workflow. Your ATM transactions flow into a central data hub in real time through a Kafka topic. Syncsort can pick up that data and link it with at-rest data, for example by matching your account number to your name and where you live. If you live in California and someone makes a transaction with your card in Texas, that transaction can be flagged as potentially fraudulent. The flag triggers an SMS message that is sent to you seconds later, saying, “Someone in Texas is using your card. Is that a valid transaction?” If you’re not in Texas, the fraud is stopped before it can do any damage. If you are in Texas for the week, your transaction flows smoothly. Either way, it’s a good outcome.
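To make that workflow a little more concrete, here is a minimal Python sketch of the matching step. It is an illustration only, not how Syncsort or Confluent implement it: a plain list stands in for the Kafka stream, a dictionary stands in for the at-rest account data, and every name in it (`Transaction`, `Account`, `flag_suspicious`, the sample account and phone number) is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass(frozen=True)
class Transaction:
    """One streaming message: a card was used somewhere (hypothetical shape)."""
    account_id: str
    amount: float
    state: str  # state where the card was used


@dataclass(frozen=True)
class Account:
    """One at-rest record: who owns the account and where they live."""
    account_id: str
    owner: str
    home_state: str
    phone: str


def flag_suspicious(
    transactions: Iterable[Transaction],
    accounts: Dict[str, Account],
) -> Iterator[str]:
    """Join each streaming transaction with at-rest account data and
    yield an SMS-style alert when the card is used outside the home state."""
    for tx in transactions:
        account = accounts.get(tx.account_id)
        if account is None:
            # No historical context for this account, so nothing to compare.
            continue
        if tx.state != account.home_state:
            yield (
                f"To {account.phone}: Someone in {tx.state} is using your card. "
                "Is that a valid transaction?"
            )


# At-rest data: in practice this would live in a database or file system.
accounts = {"A1": Account("A1", "Pat", "California", "555-0100")}

# Streaming data: in practice these messages would arrive continuously.
stream = [
    Transaction("A1", 40.0, "California"),  # matches home state, no alert
    Transaction("A1", 200.0, "Texas"),      # differs from home state, alert
]

alerts = list(flag_suspicious(stream, accounts))
for alert in alerts:
    print(alert)
```

The sketch uses a generator so that each message is checked as it arrives rather than after the whole batch is collected, which mirrors the streaming side of the picture. A real system would also need a way to record the customer's reply so that a confirmed trip to Texas does not keep triggering alerts.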
Other, completely different examples include real-time reporting that lets a hotel chain make better facility and inventory decisions with up-to-date data, and a healthcare provider analyzing hospital data as it is generated, rather than after the fact, to improve patient outcomes and save lives.
Modern data doesn’t just sit around in a musty old data warehouse feeding backwards-looking reports. It’s alive. It flows and reacts within the enterprise like the nervous system of a healthy body. Old batch systems are now connected and working in tandem with modern streaming data sources. Syncsort and Confluent working together can accomplish great things in virtually any industry.
Check out our joint webinar and see how you can put your data to work.