At Confluent, we've been talking to customers about Streaming Agents: intelligent systems that work in the background, responding to system events instead of to human chat inputs. The key question has been: What events should trigger them? The signals have to be high quality for an agent to act on them, since feeding raw events to an agent is wasteful and overwhelming.
A powerful pattern has emerged: agentic investigation and remediation of anomalies. "Anomalies" here refers to outliers in the general sense, found in all kinds of event streams, from customer orders and inventory data to financial transactions and Internet of Things (IoT) sensor data.
Typically, customers face a choice between setting up reliable but non-intelligent automation or having humans dig through anomalous events, gather data from multiple sources, and figure out what to do. Both approaches have significant limitations. The problem is twofold:
Building an agentic system that works at scale
Finding anomalies as early as possible so that action can be taken sooner
Streaming Agents have you covered on the first problem: building scalable, event-driven agents. For the second, the traditional route is to train a model, deploy it, and embed it in your streams. What if anomaly detection were available out of the box instead?
Confluent provides two machine learning (ML) functions—ml_forecast() and ml_detect_anomalies()—that run statistical models under the hood, offering continuous training and continuous forecasting capabilities. They enable developers to extract more value from data streams without requiring ML expertise.
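As a rough sketch of how these functions surface in a Flink SQL query (the invocation syntax here is simplified and illustrative; consult the Confluent documentation for the exact signature), ML_FORECAST can continuously predict the next values of a metric directly in-stream:

```sql
-- Hypothetical sketch: continuously forecast a metric in-stream.
-- Table and column names are illustrative, and the exact ML_FORECAST
-- signature may differ; see the Confluent Cloud for Apache Flink docs.
SELECT
  ts,
  cpu_pct,
  ML_FORECAST(cpu_pct, ts) OVER (
    ORDER BY ts
    RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW
  ) AS forecast
FROM host_metrics;
```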
Built on forecasting, anomaly detection identifies unexpected deviations in real time, helping improve data quality, detect operational issues instantly, and enable faster decision-making. The traditional approach aggregates data into a database first and then runs anomaly detection jobs against historical data. So by the time you discover an issue, it's already impacted your business. Confluent's continuous anomaly detection monitors data as it arrives in the stream, catching problems while the data is still in motion.
Confluent Cloud for Apache Flink® integrates anomaly detection capabilities on your streams by using the ML_DETECT_ANOMALIES function. This function leverages a popular ML algorithm, autoregressive integrated moving average (ARIMA), that is optimized for real-time performance and enables accurate and reliable anomaly detection.
ARIMA is a powerful statistical technique used for time-series analysis and forecasting. It combines three key components to model and predict future values based on past data:
Autoregressive (AR): Uses past values of the time series to predict future values, assuming current observations are influenced by previous ones
Integrated (I): Refers to differencing raw observations to make the time series stationary, ensuring consistent statistical properties over time
Moving Average (MA): Incorporates the relationship between observations and residual errors from previous observations
The model is typically written as ARIMA(p, d, q), where p is the number of autoregressive terms, d is the degree of differencing, and q is the order of the moving average. The beauty of Confluent's implementation is that it uses auto-ARIMA by default, automatically determining the best parameters for your specific data patterns.
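Putting the three components together: the series is differenced d times to make it stationary, and the result is modeled with p autoregressive and q moving-average terms. A standard textbook formulation (not specific to Confluent's implementation) is:

```latex
% Difference the series d times to make it stationary, where B is the
% backshift operator (B y_t = y_{t-1}):
%   y'_t = (1 - B)^d y_t
% Then model the stationary series with AR(p) and MA(q) terms:
\[
  y'_t = c + \sum_{i=1}^{p} \phi_i \, y'_{t-i}
           + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}
           + \varepsilon_t
\]
% phi_i: autoregressive coefficients; theta_j: moving-average
% coefficients; epsilon_t: white-noise error at time t.
```

Auto-ARIMA searches over candidate (p, d, q) values and picks the combination that best fits the training window, which is why no manual tuning is required.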
The ML_DETECT_ANOMALIES function offers several powerful configuration options.
Seasonal Pattern Detection: Enable STL—Seasonal-Trend decomposition using LOESS (locally estimated scatterplot smoothing)—to separate your time series into trend, seasonal, and remainder components. This helps improve forecasting accuracy by enabling the ARIMA model to better analyze and predict the underlying patterns in your data.
Adaptive Training Windows: Configure minTrainingSize (default: 128 data points) and maxTrainingSize (default: 512 data points) to balance model accuracy with computational efficiency. The model continuously retrains as new data arrives, with updateInterval controlling how frequently this happens.
Confidence-Based Detection: Set your desired confidence level for computing anomaly bounds with the confidencePercentage parameter (default: 99.0%). The function returns TRUE if the actual value falls below the lower_bound or above the upper_bound of the forecasted value; otherwise FALSE.
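Pulling these options together, a configured call might look like the following sketch. The option-passing syntax shown here (a JSON_OBJECT argument) and the table and column names are assumptions for illustration; check the Confluent documentation for the precise signature.

```sql
-- Hypothetical sketch: anomaly detection on an order-count stream,
-- overriding the defaults described above. Exact parameter-passing
-- syntax may differ; see the Confluent Cloud for Apache Flink docs.
SELECT
  ts,
  order_count,
  ML_DETECT_ANOMALIES(
    order_count, ts,
    JSON_OBJECT(
      'confidencePercentage' VALUE 95.0,  -- default: 99.0
      'minTrainingSize'      VALUE 128,   -- default: 128 data points
      'maxTrainingSize'      VALUE 512,   -- default: 512 data points
      'enableSTL'            VALUE TRUE   -- seasonal-trend decomposition
    )
  ) OVER (
    ORDER BY ts
    RANGE BETWEEN INTERVAL '1' DAY PRECEDING AND CURRENT ROW
  ) AS anomaly_result
FROM orders_per_minute;
```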
This built-in anomaly detection opens up powerful patterns across industries:
Fraud detection: Detect unusual transaction patterns in real-time payment streams. Instead of waiting for end-of-day batch processing, identify and prevent potential fraud attempts as they happen, triggering immediate investigation workflows.

Retail and e-commerce: Monitor sudden spikes or drops in order volumes, inventory levels, or customer behavior patterns. Automatically alert merchandising teams when product demand deviates significantly from forecasts.

Predictive maintenance: Track sensor data from manufacturing equipment, detecting when temperature, pressure, or vibration readings indicate potential equipment failure before it causes downtime.

Platform observability: Monitor API request volumes, error rates, and system performance metrics in real time. Detect when usage patterns indicate potential system issues or unusual user behavior.

Supply chain: Track shipment volumes, delivery times, and inventory movements to identify disruptions or inefficiencies as they occur rather than discovering them in weekly reports.
Your data must include a timestamp column and a target column representing some quantity of interest at each timestamp. Here's a simple example:
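A minimal sketch, assuming an input stream with a `ts` timestamp column and an `order_count` target column (names are illustrative, and the exact invocation syntax should be checked against the Confluent docs):

```sql
-- Input stream: one row per minute with an event timestamp and a metric.
CREATE TABLE orders_per_minute (
  ts          TIMESTAMP(3),
  order_count DOUBLE,
  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
);

-- Minimal anomaly check with default settings (auto-ARIMA, 99% bounds).
SELECT
  ts,
  order_count,
  ML_DETECT_ANOMALIES(order_count, ts) OVER (
    ORDER BY ts
    RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW
  ) AS anomaly_result
FROM orders_per_minute;
```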
The function returns a rich set of information, including the forecasted value, confidence bounds, root mean squared error (RMSE) and Akaike information criterion (AIC) metrics for model quality assessment, and, most importantly, the is_anomaly boolean flag that triggers your downstream actions. Visit the docs page for more information.
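A common way to drive those downstream actions is to filter on the anomaly flag and route hits to an alert stream. A sketch, assuming the function's output is available as a row-typed column with fields like those described above (all table, column, and field names here are illustrative):

```sql
-- Route anomalies to an alerts stream for agents or humans to act on.
-- Assumes `scored_orders` holds ML_DETECT_ANOMALIES output in a column
-- `res` with fields such as `forecast`, `lower_bound`, `upper_bound`,
-- and `is_anomaly` (field names are illustrative; see the docs).
INSERT INTO order_anomaly_alerts
SELECT
  ts,
  order_count,
  res.forecast,
  res.lower_bound,
  res.upper_bound
FROM scored_orders
WHERE res.is_anomaly = TRUE;
```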
Traditional monitoring relies on static thresholds: Alert when CPU usage exceeds 80% or when error rates go above 5%. But business metrics rarely follow such predictable patterns. Customer behavior changes seasonally, usage patterns evolve, and what's normal for Black Friday looks very different from a typical Tuesday in February.
Statistical anomaly detection adapts to these natural patterns, learning what's normal for your specific data and alerting only when something truly unexpected occurs. This dramatically reduces false positives while catching the subtle anomalies that static thresholds miss.
Anomaly detection in Confluent’s fully managed Flink service is billed in Compute Flink Units (CFUs) as part of your compute pool usage, making it cost-effective to deploy across multiple streams and use cases.
The combination of built-in anomaly detection and Streaming Agents represents a fundamental shift from reactive to proactive data operations. Instead of discovering issues in reports and dashboards, your systems can detect, investigate, and respond to anomalies automatically.
What makes this particularly exciting is how it enables the event-driven agent pattern that was mentioned at the beginning of this post. Instead of sifting through noisy raw events, agents can now respond to high-quality anomaly signals. Picture an agent that:
Receives an anomaly detection event from your ecommerce platform showing unusual order volumes
Investigates by querying inventory levels, supplier delivery schedules, marketing campaign data, and historical seasonal patterns
Decides what this represents—a supply chain disruption, a viral product moment, or a data quality issue
Acts automatically, adjusting inventory forecasts, triggering emergency supplier orders, alerting merchandising teams, or scaling up fulfillment capacity
Currently, our ARIMA-based anomaly detection works on single variables, analyzing one metric at a time. But we're not stopping there. We're planning to add more robust algorithms that work on multivariate data, enabling you to detect complex anomalies that emerge from the relationships between multiple metrics. Imagine detecting fraud patterns that only become apparent when you analyze transaction amounts, frequencies, geographic locations, and user behavior patterns together in real time.
Ready to detect the unexpected? The ML_DETECT_ANOMALIES function is available now in Confluent Cloud for Apache Flink®, requiring no additional setup or model training. Your streams just got a lot smarter.
Apache®, Apache Kafka®, Apache Flink®, Flink®, and the Flink logo are trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by the Apache Software Foundation is implied by using these marks. All other trademarks are the property of their respective owners.