
Detecting the Unexpected: Built-in Real-Time Anomaly Detection With Confluent Cloud for Apache Flink®

Author: Mayank

At Confluent, we've been talking to customers about Streaming Agents: intelligent systems that work in the background, responding to system events instead of human chat inputs. The key question has been: What triggers these events? An agent needs high-quality signals to act on, because feeding it raw events is wasteful and overwhelming.

A powerful pattern has emerged: agentic investigation and remediation of anomalies. “Anomalies” here means outliers in the general sense, which can be found in all kinds of event streams, from customer orders and inventory data to financial transactions and Internet of Things (IoT) sensor data.

Typically, customers face a choice between setting up reliable but non-intelligent automation or having humans dig through anomalous events, gather data from multiple sources, and figure out what to do. Both approaches have significant limitations. The problem is twofold:

  1. Building an agentic system that works at scale

  2. Finding anomalies as early as possible so that action can be taken sooner

Streaming Agents have you covered for solving the first problem and building scalable, event-driven agents. For the second problem, the old way is to train a model, deploy it, and embed it in your streams. What if this were an out-of-the-box solution?

How Built-In Anomaly Detection in Flink Powers Agentic Investigation

Confluent provides two machine learning (ML) functions, ML_FORECAST() and ML_DETECT_ANOMALIES(), that run statistical models under the hood, offering continuous training and continuous forecasting capabilities. They enable developers to extract more value from data streams without requiring ML expertise.

Built on forecasting, anomaly detection identifies unexpected deviations in real time, helping improve data quality, detect operational issues instantly, and enable faster decision-making. The traditional approach aggregates data into a database first and then runs anomaly detection jobs against historical data. So by the time you discover an issue, it's already impacted your business. Confluent's continuous anomaly detection monitors data as it arrives in the stream, catching problems while the data is still in motion.

Figure: Anomaly detection pattern with event-driven Streaming Agents

Under the Hood – Confluent’s Anomaly Detection Capabilities

Confluent Cloud for Apache Flink® integrates anomaly detection capabilities on your streams by using the ML_DETECT_ANOMALIES function. This function leverages a popular ML algorithm, autoregressive integrated moving average (ARIMA), that is optimized for real-time performance and enables accurate and reliable anomaly detection.

ARIMA: The Statistical Engine

ARIMA is a powerful statistical technique used for time-series analysis and forecasting. It combines three key components to model and predict future values based on past data:

  • Autoregressive (AR): Uses past values of the time series to predict future values, assuming current observations are influenced by previous ones

  • Integrated (I): Refers to differencing raw observations to make the time series stationary, ensuring consistent statistical properties over time

  • Moving Average (MA): Incorporates the relationship between observations and residual errors from previous observations

The model is typically represented as ARIMA(p,d,q), where p is the number of autoregressive terms, d is the degree of differencing, and q is the order of the moving average. The beauty of Confluent's implementation is that it uses auto-ARIMA by default, automatically determining the best parameters for your specific data patterns.
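To make the components more concrete, here is a minimal Python sketch of the "I" and "AR" pieces: first-order differencing (d = 1) followed by a one-step AR(1) forecast fitted by least squares. This is purely illustrative and is not Confluent's implementation; the function names are hypothetical, and the MA term and automatic parameter selection are left out for brevity.

```python
def difference(series, d=1):
    """Apply d rounds of first-order differencing (the 'I' in ARIMA)."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


def fit_ar1(series):
    """Least-squares estimate of phi in x[t] = phi * x[t-1] (no intercept)."""
    num = sum(prev * cur for prev, cur in zip(series, series[1:]))
    den = sum(prev * prev for prev in series[:-1])
    return num / den if den else 0.0


def forecast_next(series):
    """Difference once, predict the next change with AR(1), then undo the differencing."""
    diffs = difference(series, d=1)
    phi = fit_ar1(diffs)
    return series[-1] + phi * diffs[-1]


# Hypothetical series with a steady upward trend: each value is 2 higher than the last.
orders = [10, 12, 14, 16, 18]
print(difference(orders))     # [2, 2, 2, 2]
print(forecast_next(orders))  # 20.0
```

Differencing turns the trending series into a constant one, which is what makes the simple autoregressive step workable here.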

Intelligent Configuration Options

The ML_DETECT_ANOMALIES function offers several powerful configuration options.

Seasonal Pattern Detection: Enable STL—Seasonal-Trend decomposition using LOESS (locally estimated scatterplot smoothing)—to separate your time series into trend, seasonal, and remainder components. This helps improve forecasting accuracy by enabling the ARIMA model to better analyze and predict the underlying patterns in your data.
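As a rough illustration of why removing the seasonal component helps, the sketch below subtracts a per-phase seasonal profile from a series so that an injected spike stands out. Note the assumptions: this is not STL. It uses per-phase medians instead of LOESS smoothing, assumes the period is known, and ignores trend. The function names and data are hypothetical.

```python
from statistics import median


def seasonal_profile(series, period):
    """Typical deviation from the overall median at each position in the cycle."""
    overall = median(series)
    return [median(series[phase::period]) - overall for phase in range(period)]


def deseasonalize(series, period):
    """Subtract the seasonal profile, leaving the level plus the remainder."""
    profile = seasonal_profile(series, period)
    return [value - profile[i % period] for i, value in enumerate(series)]


# Hypothetical order counts with a repeating 4-step cycle; the 95 is an injected spike.
orders = [10, 20, 30, 20, 10, 20, 30, 20, 10, 20, 95, 20]
adjusted = deseasonalize(orders, period=4)
print(adjusted)  # every point is 20.0 except the spike, which becomes 85.0
```

In the raw series the regular peaks of 30 and the spike of 95 are both "high" values; after the seasonal profile is removed, only the spike deviates from the flat baseline.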

Adaptive Training Windows: Configure minTrainingSize (default: 128 data points) and maxTrainingSize (default: 512 data points) to balance model accuracy with computational efficiency. The model continuously retrains as new data arrives, with updateInterval controlling how frequently this happens.
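The interplay between these three settings can be sketched as a bounded buffer that reports when a retrain is due. This is a hypothetical Python model of the behavior described above, not Confluent's code; in particular, it assumes updateInterval counts data points between retrains, and the class and method names are made up for illustration.

```python
from collections import deque


class TrainingWindow:
    """Bounded buffer of recent points that signals when a (re)train is due."""

    def __init__(self, min_training_size, max_training_size, update_interval):
        self.min_training_size = min_training_size
        self.update_interval = update_interval
        # deque(maxlen=...) drops the oldest point once the cap is reached.
        self.points = deque(maxlen=max_training_size)
        self.since_last_train = 0
        self.trained = False

    def add(self, value):
        """Store a point; return True when the model should be (re)trained."""
        self.points.append(value)
        self.since_last_train += 1
        if len(self.points) < self.min_training_size:
            return False  # not enough history yet
        if not self.trained or self.since_last_train >= self.update_interval:
            self.trained = True
            self.since_last_train = 0
            return True
        return False


# Small hypothetical values so the behavior is easy to follow.
window = TrainingWindow(min_training_size=3, max_training_size=5, update_interval=2)
retrain_at = [i for i, value in enumerate(range(10)) if window.add(value)]
print(retrain_at)           # [2, 4, 6, 8]
print(list(window.points))  # [5, 6, 7, 8, 9]
```

A smaller minimum gets you results sooner at the cost of a less reliable early model, while the maximum caps both memory use and how far back the model looks.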

Confidence-Based Detection: Set your desired confidence level for computing anomaly bounds with the confidencePercentage parameter (default: 99.0%). The is_anomaly flag in the result is TRUE if the actual value falls below the lower_bound or above the upper_bound computed around the forecasted value; otherwise it is FALSE.
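Here is a small Python sketch of how a confidence percentage can translate into bounds around a forecast. How Confluent computes its bounds internally isn't covered in this post, so treat the details as assumptions: the sketch assumes normally distributed forecast errors and uses the RMSE as their standard deviation. The function names are hypothetical.

```python
from statistics import NormalDist


def anomaly_bounds(forecast, rmse, confidence_percentage=99.0):
    """Two-sided interval around the forecast, assuming normally distributed errors."""
    tail = (1 - confidence_percentage / 100) / 2
    z = NormalDist().inv_cdf(1 - tail)
    return forecast - z * rmse, forecast + z * rmse


def is_anomaly(actual, forecast, rmse, confidence_percentage=99.0):
    """True when the actual value falls outside the bounds."""
    lower, upper = anomaly_bounds(forecast, rmse, confidence_percentage)
    return actual < lower or actual > upper


# Hypothetical numbers: forecast of 100 orders with an RMSE of 10.
print(is_anomaly(122, forecast=100, rmse=10, confidence_percentage=95.0))  # True
print(is_anomaly(122, forecast=100, rmse=10, confidence_percentage=99.0))  # False
```

The same observation is flagged at 95% but not at 99%: a higher confidence percentage widens the bounds and makes the detector more conservative.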

Real-World Use Cases

This built-in anomaly detection opens up powerful patterns across industries:

Financial Services

Detect unusual transaction patterns in real-time payment streams. Instead of waiting for end-of-day batch processing, identify and prevent potential fraud attempts as they happen, triggering immediate investigation workflows.

Retail and eCommerce

Monitor sudden spikes or drops in order volumes, inventory levels, or customer behavior patterns. Automatically alert merchandising teams when product demand deviates significantly from forecasts.

IoT and Manufacturing

Track sensor data from manufacturing equipment, detecting when temperature, pressure, or vibration readings indicate potential equipment failure before it causes downtime.

Software-as-a-Service (SaaS) and Platform Operations

Monitor API request volumes, error rates, and system performance metrics in real time. Detect when usage patterns indicate potential system issues or unusual user behavior.

Supply Chain and Logistics

Track shipment volumes, delivery times, and inventory movements to identify disruptions or inefficiencies as they occur rather than discovering them in weekly reports.

Getting Started With Anomaly Detection

Your data must include a timestamp column and a target column representing some quantity of interest at each timestamp. Here's a simple example:

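SELECT
  ML_DETECT_ANOMALIES(
    total_orders,      -- target column: the quantity to monitor
    order_timestamp,   -- timestamp column
    JSON_OBJECT(
      'minTrainingSize' VALUE 50,         -- minimum data points needed to train
      'confidencePercentage' VALUE 95.0,  -- confidence level for the anomaly bounds
      'enableStl' VALUE true              -- turn on seasonal-trend decomposition
    )
  ) OVER (
    ORDER BY order_timestamp
    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS anomaly_result
FROM order_stream;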

The function returns a rich set of information, including the forecasted value, confidence bounds, root mean squared error (RMSE) and Akaike information criterion (AIC) metrics for model quality assessment, and, most importantly, the is_anomaly boolean flag that triggers your downstream actions. Visit the docs page for more information.

Beyond Traditional Monitoring

Traditional monitoring relies on static thresholds: Alert when CPU usage exceeds 80% or when error rates go above 5%. But business metrics rarely follow such predictable patterns. Customer behavior changes seasonally, usage patterns evolve, and what's normal for Black Friday looks very different from a typical Tuesday in February.

Statistical anomaly detection adapts to these natural patterns, learning what's normal for your specific data and alerting only when something truly unexpected occurs. This can substantially reduce false positives while catching the subtle anomalies that static thresholds miss.
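The difference is easy to see with a toy comparison in Python. The sketch below contrasts a fixed threshold with a baseline learned from recent history. It uses a plain mean and standard deviation rather than ARIMA, and all names, thresholds, and numbers are hypothetical.

```python
from statistics import mean, stdev


def static_alert(value, threshold=500):
    """Classic fixed-threshold rule."""
    return value > threshold


def adaptive_alert(value, history, z_limit=3.0):
    """Flag values more than z_limit standard deviations from the recent mean."""
    mu, sigma = mean(history), stdev(history)
    return abs(value - mu) > z_limit * sigma


# Hypothetical hourly order counts for a quiet week and a peak-season week.
quiet_week = [100, 110, 95, 105, 90, 100, 108]
peak_week = [880, 920, 900, 940, 870, 910, 930]

# 900 orders during peak season is business as usual.
print(static_alert(900), adaptive_alert(900, peak_week))   # True False
# 900 orders in a quiet week is genuinely unusual.
print(static_alert(900), adaptive_alert(900, quiet_week))  # True True
# A drop to 300 during peak season slips under the static threshold.
print(static_alert(300), adaptive_alert(300, peak_week))   # False True
```

The static rule produces a false alarm in the first case and misses the drop in the third, while the learned baseline handles both.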

Anomaly detection in Confluent’s fully managed Flink service is billed in Compute Flink Units (CFUs) as part of your compute pool usage, making it cost-effective to deploy across multiple streams and use cases.

The Agent-Ready Future

The combination of built-in anomaly detection and Streaming Agents represents a fundamental shift from reactive to proactive data operations. Instead of discovering issues in reports and dashboards, your systems can detect, investigate, and respond to anomalies automatically.

What makes this particularly exciting is how it enables the event-driven agent pattern that was mentioned at the beginning of this post. Instead of sifting through noisy raw events, agents can now respond to high-quality anomaly signals. Picture an agent that:

  • Receives an anomaly detection event from your ecommerce platform showing unusual order volumes

  • Investigates by querying inventory levels, supplier delivery schedules, marketing campaign data, and historical seasonal patterns

  • Decides what this represents—a supply chain disruption, a viral product moment, or a data quality issue

  • Acts automatically, adjusting inventory forecasts, triggering emergency supplier orders, alerting merchandising teams, or scaling up fulfillment capacity
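The four steps above can be sketched as a small Python pipeline. Everything here is hypothetical: the event fields, the context lookups, and the action names are invented for illustration, and the rule-based decide step is a stand-in for what would be an LLM-driven decision in a real Streaming Agent.

```python
from dataclasses import dataclass


@dataclass
class AnomalyEvent:
    """Hypothetical shape of an anomaly signal arriving from the stream."""
    metric: str
    actual: float
    forecast: float


def investigate(event, context):
    """Gather supporting facts; `context` stands in for inventory, campaign, and supplier lookups."""
    return {
        "direction": "spike" if event.actual > event.forecast else "drop",
        "campaign_running": context.get("campaign_running", False),
        "supplier_delayed": context.get("supplier_delayed", False),
    }


def decide(findings):
    """Rule-based stand-in for the agent's reasoning step."""
    if findings["direction"] == "spike" and findings["campaign_running"]:
        return "viral_demand"
    if findings["direction"] == "drop" and findings["supplier_delayed"]:
        return "supply_disruption"
    return "possible_data_quality_issue"


# Hypothetical action names mapped to each diagnosis.
ACTIONS = {
    "viral_demand": ["raise_inventory_forecast", "scale_fulfillment"],
    "supply_disruption": ["trigger_emergency_supplier_order", "alert_merchandising"],
    "possible_data_quality_issue": ["alert_data_team"],
}


def handle(event, context):
    """Receive -> investigate -> decide -> act."""
    return ACTIONS[decide(investigate(event, context))]


event = AnomalyEvent(metric="order_volume", actual=950, forecast=400)
print(handle(event, {"campaign_running": True}))
# ['raise_inventory_forecast', 'scale_fulfillment']
```

The point of the structure is that the agent only wakes up for events that have already been flagged as anomalous, so each investigation starts from a meaningful signal.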

Currently, our ARIMA-based anomaly detection works on single variables, analyzing one metric at a time. But we're not stopping there. We're planning to add more robust algorithms that work on multivariate data, enabling you to detect complex anomalies that emerge from the relationships between multiple metrics simultaneously. Imagine detecting fraud patterns that only become apparent when you analyze transaction amounts, frequencies, geographic locations, and user behavior patterns together in real time.

Ready to detect the unexpected? The ML_DETECT_ANOMALIES function is available now in Confluent Cloud for Apache Flink®, requiring no additional setup or model training. Your streams just got a lot smarter.


Apache®, Apache Kafka®, Apache Flink®, Flink®, and the Flink logo are trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by the Apache Software Foundation is implied by using these marks. All other trademarks are the property of their respective owners.

  • Mayank is a Product Manager for Stream Processing at Confluent. He has extensive experience building and launching enterprise software products, with stints at VMware, Amazon, and growth-stage startups Livspace and Bidgely.

    Mayank holds an MBA with a specialization in Artificial Intelligence from Northwestern University, and a Computer Science degree from BITS Pilani, India.
