The world runs in real time, generating massive amounts of data as events. Every action, issue, or update produces new data: the swipe of an app, a package delivery, a stock trade, or a plane departure. Complex event processing (CEP) turns that real-time data into real-time benefits. Learn how CEP works, along with its benefits, use cases, and the modern technologies that can help you start leveraging event streaming.
Similar to event stream processing, complex event processing (CEP) is a technology for aggregating, processing, and analyzing massive streams of data in order to gain real-time insights from events as they occur.
Today, companies are flooded with facts that they don’t know how to use. CEP separates the wheat from the chaff by transforming low-level data into the high-level business information that companies care about. In this way, CEP enables companies to take charge of external events as they happen in real time.
CEP is a toolkit for extracting meaningful information from data streams. Often two streams describe the same external reality in two different ways. First, we hear the sound of thunder. Then we feel a drop of water falling on our cheek. We combine these two events with our knowledge of local weather to conclude that it is starting to rain.
CEP works the same way. CEP applies domain knowledge across multiple sources of data to understand what is happening in terms of high-level concepts and complex events. In our example, we combined the sound of thunder with the feeling of a raindrop on our skin to infer the concept of rain. CEP is designed to infer such complex events from raw data using business patterns and concepts. The aim is to identify meaningful facts that can be used to make informed decisions.
Complex event processing is a generalization of traditional stream processing. Traditional stream processing is concerned with finding low-level patterns in data, such as the number of mouse clicks within a fifteen-minute window. CEP promises much more. Using models of causality and conceptual hierarchies, CEP can make high-level inferences about complex events within the business domain.
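To make the contrast concrete, the low-level pattern mentioned above (counting mouse clicks within a fifteen-minute window) can be sketched in a few lines of plain Python. This is only an illustration: the function name, the sample timestamps, and the choice of fixed, non-overlapping windows are assumptions, not something taken from a particular stream processing library.

```python
from collections import Counter

WINDOW_SECONDS = 15 * 60  # the fifteen-minute window from the text


def clicks_per_window(click_timestamps, window_seconds=WINDOW_SECONDS):
    """Count clicks in fixed, non-overlapping (tumbling) windows.

    Returns a dict mapping each window's start time (in seconds) to the
    number of clicks whose timestamp falls inside that window.
    """
    counts = Counter()
    for ts in click_timestamps:
        # Round the timestamp down to the start of its window.
        window_start = (ts // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(sorted(counts.items()))


# Hypothetical click times, in seconds since the start of a session.
clicks = [5, 120, 899, 900, 1500, 2700]
print(clicks_per_window(clicks))  # {0: 3, 900: 2, 2700: 1}
```

A count like this says nothing about why the clicks happened or what they mean for the business, which is the gap CEP aims to fill.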
One of the distinguishing features of CEP is the use of conceptual hierarchies. We all know that events in the real world come at different levels of abstraction. At one level, we can talk about a customer’s intentions and feelings. At a lower level, we can talk about her GPS trail or the actions of her mouse.
When we are dealing with two or more levels at once, we cannot expect to process a simple time-ordered stream of events. Instead, we must be able to consume “an event cloud”. The difference between an event cloud and an event stream is that the event cloud contains data from many streams at different levels of abstraction. The CEP toolkit can identify complex patterns in such a multi-level system.
Another distinguishing feature of CEP is the use of causal relationships. Suppose we are searching for a business pattern in which a certain combination of GPS movements and mouse clicks is expected to cause a business event such as a purchase or a cancellation. With CEP, we can choose to flag this combination as a complex event. However, we must not assume that the data will always arrive in the correct time sequence. When we search for the complex event, the mouse-click data may arrive earlier than the GPS data. In this case, we must be able to remember the first part of the complex event while continuing to search for the remaining parts.
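The idea of remembering the first part of a complex event while waiting for the rest can be sketched as a small in-memory matcher. This is a minimal sketch under stated assumptions: the `Event` and `PairMatcher` names, the `likely_purchase` label, and the 60-second maximum gap are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    user: str
    kind: str        # "gps" or "click" in this sketch
    event_time: int  # when the event happened, in seconds


class PairMatcher:
    """Emit a complex event once both parts of a pattern have been seen."""

    def __init__(self, required=("gps", "click"), max_gap=60):
        self.required = frozenset(required)
        self.max_gap = max_gap
        self.pending = {}  # user -> {kind: Event}: the remembered partial matches

    def on_event(self, event):
        if event.kind not in self.required:
            return None
        parts = self.pending.setdefault(event.user, {})
        parts[event.kind] = event
        if set(parts) != self.required:
            return None  # still waiting for the remaining parts
        times = [e.event_time for e in parts.values()]
        if max(times) - min(times) > self.max_gap:
            # The parts are too far apart: forget the oldest and keep waiting.
            oldest = min(parts.values(), key=lambda e: e.event_time)
            del parts[oldest.kind]
            return None
        del self.pending[event.user]
        return {"type": "likely_purchase", "user": event.user, "at": max(times)}


matcher = PairMatcher()
# The click arrives first, although it happened after the GPS reading.
first = matcher.on_event(Event("alice", "click", event_time=130))
second = matcher.on_event(Event("alice", "gps", event_time=100))
print(first)   # None
print(second)  # {'type': 'likely_purchase', 'user': 'alice', 'at': 130}
```

The matcher compares event times rather than arrival order, so the pattern is found regardless of which stream delivers its data first.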
The key benefit of CEP is that actions can be triggered by a combination of events happening at different times and in different contexts.
CEP gives companies the ability to synthesize meaningful business information out of raw data and domain knowledge. CEP has the ability to organize information into high-level concepts by considering different time-frames, contexts, and causal relationships within the data. Using CEP, companies can respond to business opportunities and threats in a consistent and rule-based way. As real-time data becomes more and more abundant, CEP is a key component of event-driven architectures.
Fraud Prevention and Detection: Banks can use CEP to inspect and identify fraudulent transactions by tracking real-time events against various patterns. A login from a new device can be combined with a password change and other account activity to create a complex event that flags the possibility of fraud. Multiple fraud alerts can be combined into a higher-level event that identifies a system-wide breach.
Real-Time Marketing: E-commerce retailers can use CEP to offer personalized recommendations based on a combination of GPS data, social network activity, holidays, and previous shopping habits. The ability to combine different data sources along with historical data is one of the key strengths of CEP.
Predictive Analytics: By combining events generated by pharmacy sales, social networking sites, Twitter, and GPS streams, we can predict the emergence of new coronavirus clusters. Almost all forms of prediction rely on finding complex patterns in massive amounts of data from numerous sources, so CEP is a natural part of the predictive analytics landscape.
Hardware Design: CEP was originally invented to design computer chips, allowing engineers to make sense of low-level events happening in the physical hardware in terms of the register-level design and the instruction set of the chip.
IoT: CEP can combine IoT sensor streams with other sources of information for real-time monitoring, analytics, and troubleshooting. For example, by combining distributed data from lighting, alarms, and other devices with real-time weather, date, and time, a smart building can predict the behavior of its occupants and optimize the use of lights and heating while providing automated services to occupants. Such a system can also identify guests or intruders and take appropriate action.
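As an illustration of the fraud prevention use case above, the two levels it describes (a per-account fraud alert, then a system-wide breach built from several alerts) can be sketched as two small functions. Everything here is an assumption made for the example: the event kinds, the function names, the ten-minute and one-hour windows, and the threshold of three accounts.

```python
def detect_fraud_alerts(events, window=600):
    """Flag accounts where a new-device login is followed by a password change.

    `events` is an iterable of (timestamp, account, kind) tuples sorted by time.
    """
    last_new_device_login = {}
    alerts = []
    for ts, account, kind in events:
        if kind == "new_device_login":
            last_new_device_login[account] = ts
        elif kind == "password_change":
            login_ts = last_new_device_login.get(account)
            if login_ts is not None and ts - login_ts <= window:
                alerts.append((ts, account, "fraud_alert"))
    return alerts


def detect_breach(alerts, min_accounts=3, window=3600):
    """Combine time-sorted fraud alerts into one higher-level event.

    Returns a breach event if at least `min_accounts` distinct accounts raised
    an alert within `window` seconds of an earlier alert, otherwise None.
    """
    for i, (start_ts, _, _) in enumerate(alerts):
        accounts = {acct for ts, acct, _ in alerts[i:] if ts - start_ts <= window}
        if len(accounts) >= min_accounts:
            return (start_ts, "system_wide_breach", sorted(accounts))
    return None


# Hypothetical account activity, sorted by timestamp (seconds).
events = [
    (0, "a1", "new_device_login"),
    (30, "a1", "password_change"),
    (40, "a2", "password_change"),  # no new-device login, so not flagged
    (100, "a3", "new_device_login"),
    (160, "a3", "password_change"),
    (200, "a4", "new_device_login"),
    (260, "a4", "password_change"),
]
alerts = detect_fraud_alerts(events)
print(alerts)
print(detect_breach(alerts))  # (30, 'system_wide_breach', ['a1', 'a3', 'a4'])
```

The second function consumes only the output of the first, which mirrors how a higher-level complex event is built from lower-level ones.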
While the terms event stream processing (ESP) and complex event processing (CEP) are often used interchangeably, they are not entirely synonymous. Traditional event streaming applications deal with a single stream of data arriving in the correct time order. An example would be algorithmic trading, where a simple ESP application could analyze a stream of pricing data and decide whether to buy or sell a stock. ESP applications do not normally include event causality or event hierarchies. This is precisely why CEP was invented. In effect, CEP is a more sophisticated version of ESP.
Used by over 30% of Fortune 500 companies, Apache Kafka is a scalable, fault-tolerant, low-latency, open-source event streaming platform that can process millions of events per second. It has numerous use cases, including distributed streaming, event-driven applications, big data ingestion pipelines, and pub/sub messaging.
At the basic level, Kafka provides an excellent infrastructure for CEP applications because of its high-performance messaging pipeline and immutable commit log. Kafka also offers the Streams API and the KSQL interface for connecting to data stores and for capturing and manipulating data streams in real time across numerous use cases.
Apache Kafka’s distributed architecture is especially well suited to building the analytical engines that complex event processing requires. In Kafka, all messages are published directly to the distributed event log. There’s no need for centralized control. Different CEP services can subscribe to the same data streams without blocking each other.
Most importantly, each CEP service can publish its own output as a stream of complex events on the same log. This means that a hierarchy of CEP layers can grow naturally within the Kafka ecosystem, each publishing its output of complex events for other services to consume. Apache Kafka’s open architecture is the underlying reason that Kafka and CEP are a winning combination.
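The layering described above can be illustrated with a toy in-memory log. This sketch does not use Kafka or any of its client APIs: `EventLog` and `CepService` are hypothetical stand-ins, and the topic names and rules are invented for the example. It only shows the shape of the idea, in which services read from a shared log at their own pace and publish their results back to it.

```python
from collections import defaultdict


class EventLog:
    """A toy append-only log with named topics (a stand-in, not Kafka itself)."""

    def __init__(self):
        self._topics = defaultdict(list)

    def publish(self, topic, event):
        self._topics[topic].append(event)

    def read(self, topic, offset):
        """Return all events in `topic` from `offset` onward."""
        return self._topics[topic][offset:]


class CepService:
    """Reads one topic, applies a rule, and publishes results to another."""

    def __init__(self, log, source, sink, rule):
        self.log, self.source, self.sink, self.rule = log, source, sink, rule
        self.offset = 0  # each service tracks its own position in the log

    def poll(self):
        events = self.log.read(self.source, self.offset)
        self.offset += len(events)
        for event in events:
            result = self.rule(event)
            if result is not None:
                self.log.publish(self.sink, result)


log = EventLog()
# Two independent services subscribe to the same raw stream.
overheat = CepService(log, "sensor_readings", "overheat_events",
                      lambda e: {"device": e["device"]} if e["temp"] > 90 else None)
audit = CepService(log, "sensor_readings", "audit", lambda e: e["device"])
# A higher layer consumes the complex events produced by the first service.
escalate = CepService(log, "overheat_events", "maintenance_tickets",
                      lambda e: f"inspect {e['device']}")

log.publish("sensor_readings", {"device": "d1", "temp": 70})
log.publish("sensor_readings", {"device": "d2", "temp": 95})
for service in (overheat, audit, escalate):
    service.poll()

print(log.read("maintenance_tickets", 0))  # ['inspect d2']
```

Because each service keeps its own offset, adding the `audit` subscriber does not affect what `overheat` reads, and new layers can be added on top of `overheat_events` without changing the services below them.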