The evolution of artificial intelligence (AI) in the enterprise has reached an inflection point. While the early days of generative AI focused on chatbots responding to human prompts, today's enterprise AI agents are fundamentally different—they're event-driven, autonomous systems that continuously process streams of business data, make real-time decisions, and take actions at scale.
After speaking with dozens of enterprise customers about their AI agent implementations, one question keeps coming up: "Why do we need stream processing for AI agents?" The answer lies in understanding what modern enterprise agents actually do and why traditional approaches fall short.
The journey of agentic AI has been remarkable. It started with large language models (LLMs) answering general knowledge questions, but this had limited enterprise value due to the lack of private, domain-specific data. The emergence of retrieval-augmented generation (RAG) patterns made it possible to augment LLMs with fresh contextual data using streaming technologies, typically through chatbot interfaces.
Now, in the era of agentic AI, we're seeing a fundamental shift. LLMs can enter "thinking loops," use tools, and tackle complex tasks like code generation. But more importantly, a new wave of enterprise agents that operate very differently from their chatbot predecessors has emerged.
These modern enterprise agents:
Respond to system-generated events rather than human chat instructions
Run continuously in the background without human intervention
Solve well-defined problems at massive scale
Process streams of business events in real time
There are increasingly high-volume use cases that require joining multiple inputs. Examples include patient intake processing that responds to electronic health record updates, product review analysis that processes continuous streams of customer feedback, and observability for power plants, going beyond anomaly detection and alerts to interpretation, triage, and resolution.
Most enterprise agents follow a remarkably consistent pattern that aligns perfectly with stream processing paradigms:
Continuous Event Processing: Enterprise workflows aren't synchronous or prompt-based. They're asynchronous, stateful, and continuous. Agents need to consume and respond to continuous streams of system-generated events, from transaction records to sensor telemetry to customer interactions.
Fresh, Contextual Data: Agents can't do anything useful without the right data. Whether detecting fraud, generating a recommendation, or planning a response, agents need a problem-specific view of live, accurate, and relevant context. Apache Flink® and streaming storage such as Apache Kafka® together form the ideal substrate to capture, process, and retain that data in motion. This enables agents to access timely context on demand, at the moment a decision needs to be made, without relying on stale snapshots or brittle polling mechanisms.
Scalable Operations With Fault Tolerance: Production agents must handle high-throughput scenarios with strong consistency guarantees. They need to process thousands of events per second while maintaining exactly-once semantics and recovering gracefully from failures.
Rich System Integration: Modern agents must connect to numerous enterprise systems to gather context and take action. They need extensive connector ecosystems to integrate seamlessly with existing infrastructure.
Replayability for Iteration and Safety: Event-driven systems enable replay of input data. This allows agents to be developed and evaluated using real data without invoking live side effects. It supports local testing, dark launches, A/B testing, and faster iteration.
Powerful Data Transformation: Before agents can make intelligent decisions, they often need to clean, enrich, and transform incoming data streams. This requires declarative APIs for writing complex transformations at scale.
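The pattern the points above describe can be made concrete with a small sketch. The following is plain Python, not Flink: it only illustrates the shape of a stateful, event-driven agent whose input is a replayable log, with the reasoning step (in practice, model inference) swapped for a simple rule.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Event:
    key: str
    payload: dict

@dataclass
class Agent:
    """A minimal event-driven agent: stateful, continuous, replayable."""
    decide: Callable[[dict, Event], Optional[dict]]  # reasoning step (an LLM call in practice)
    state: dict = field(default_factory=dict)

    def process(self, log: list) -> list:
        # Replaying the same event log reproduces the same decisions,
        # which is what enables dark launches and offline evaluation.
        actions = []
        for event in log:
            action = self.decide(self.state, event)
            if action is not None:
                actions.append(action)
        return actions

# Example rule: flag customers whose running transaction total gets large.
def flag_large_totals(state: dict, event: Event) -> Optional[dict]:
    total = state.get(event.key, 0) + event.payload["amount"]
    state[event.key] = total
    if total > 1000:
        return {"customer": event.key, "action": "review", "total": total}
    return None

log = [Event("c1", {"amount": 600}), Event("c1", {"amount": 500})]
agent = Agent(decide=flag_large_totals)
print(agent.process(log))  # [{'customer': 'c1', 'action': 'review', 'total': 1100}]
```

Flink provides exactly these ingredients as managed infrastructure: durable replayable input (Kafka), checkpointed state, and a scalable runtime for the continuous loop.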
Apache Flink addresses these needs natively. Its high-performance, low-latency runtime for continuous processing, extensive connector ecosystem, and declarative APIs make it the ideal foundation for enterprise AI agents. But more importantly, Flink enables us to think about agents as event-driven microservices.
Microservices architecture evolved from tightly coupled, request-response communication to event-driven design. This pattern has proven successful for scalable software architecture over decades. At their core, agents are microservices with a brain, functioning as independent units that execute specific tasks. Event-driven architecture allows agents to communicate asynchronously and collaborate without rigid dependencies—moving beyond static workflows to adaptive, scalable, and resilient multi-agent systems.
With stream processing, agents can tap into real-time, contextualized data for reasoning and optimal decision-making.
While Flink provides an excellent foundation, there are specific gaps when it comes to building AI agents. That's why we're announcing Flink Agents, a new Flink sub-project proposed in FLIP-531 that's a collaborative effort between engineering teams from Confluent and Alibaba.
Flink Agents are built, tested, and run within Flink's event-driven runtime. They address four critical gaps in the current ecosystem:
We're evolving Flink's language and existing APIs to include first-class agent semantics. This means developers can define agents using familiar Flink constructs while accessing powerful AI capabilities like model inference, tool invocation, and contextual search.
Unlike traditional data processing pipelines that follow sequential flows, agents require loops, conditional branching, and dynamic paths based on different inputs. Flink Agents introduce support for these dynamic topologies, enabling agents to implement complex reasoning patterns like ReAct (reasoning and acting) workflows.
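To make the ReAct pattern concrete, here is an illustrative, self-contained Python sketch (the model and tools are toy stand-ins, not any real API): the loop alternates between a reasoning step and a tool invocation, and which branch runs next depends on the model's output rather than on a fixed pipeline.

```python
# Illustrative ReAct-style loop; "model" and "tools" are stand-ins.
def react_loop(question, model, tools, max_steps=5):
    """Alternate reasoning (model) and acting (tool calls) until a final answer."""
    scratchpad = [f"Question: {question}"]
    for _ in range(max_steps):
        step = model(scratchpad)          # decide the next thought or action
        if step["type"] == "final":
            return step["answer"]
        # Dynamic branching: which tool runs depends on the model's output.
        observation = tools[step["tool"]](step["input"])
        scratchpad.append(f"Action: {step['tool']}({step['input']}) -> {observation}")
    raise RuntimeError("agent did not converge")

# Toy model: look up a price first, then produce a final answer.
def toy_model(scratchpad):
    if not any("Action:" in line for line in scratchpad):
        return {"type": "act", "tool": "lookup_price", "input": "widget"}
    price = scratchpad[-1].split("-> ")[1]
    return {"type": "final", "answer": f"The widget costs {price}."}

tools = {"lookup_price": lambda item: "$5"}
print(react_loop("How much is a widget?", toy_model, tools))
# The widget costs $5.
```

A traditional Flink job graph is a DAG fixed at submission time; supporting loops and data-dependent paths like this inside the runtime is precisely the topology gap Flink Agents target.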
While Flink's current observability focuses on data processing operators, agents need visibility into their decision-making processes. Flink Agents add observability for agent state, tool invocations, model inference calls, and decision traces—critical for debugging and optimizing agent behavior in production.
The Model Context Protocol (MCP) has rapidly become the universal language for AI tool calling. Flink Agents provide native support for invoking tools via MCP, enabling agents to seamlessly integrate with the growing ecosystem of MCP-compatible tools and services.
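At the wire level, an MCP tool call is a JSON-RPC 2.0 request. The sketch below builds the `tools/call` message an agent would send to an MCP server; the transport and server are omitted, and in practice an MCP client library handles both. The tool name and arguments here are made up for illustration.

```python
import json

def mcp_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Build the JSON-RPC 2.0 "tools/call" request defined by the MCP spec."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

# Hypothetical tool exposed by some MCP server:
msg = mcp_tool_call(1, "get_weather", {"city": "Berlin"})
print(msg)
```

Because the protocol is uniform, an agent runtime only needs to implement this one calling convention to reach any MCP-compatible tool.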
The Developer Experience: Familiar Yet Powerful
One of our core principles is that "every engineer is an AI engineer." Rather than requiring specialized AI expertise, Flink Agents extend familiar Flink APIs that Java and Python developers already know.
Here's a glimpse of what building an agent looks like with Flink's Table API:
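The concrete Flink Agents API is still being shaped in FLIP-531, so rather than quote unfinalized syntax, the snippet below is an illustrative stand-in: a self-contained Python sketch (every class and function name is hypothetical, and the tiny `Table` class only mimics the fluent Table API style) showing how an agent inference step can be chained with ordinary relational transformations in one pipeline.

```python
# Hypothetical, self-contained stand-in for a Table-API-style agent pipeline.
# None of these classes are real Flink APIs; they only mirror the shape.
class Table:
    def __init__(self, rows):
        self.rows = rows
    def filter(self, pred):
        return Table([r for r in self.rows if pred(r)])
    def select(self, fn):
        return Table([fn(r) for r in self.rows])

def review_agent(row):
    # In a real pipeline this step would call model inference and tools;
    # here we fake a sentiment label so the sketch stays runnable.
    label = "negative" if "broken" in row["review"] else "positive"
    return {**row, "sentiment": label}

reviews = Table([
    {"id": 1, "review": "arrived broken"},
    {"id": 2, "review": "works great"},
])

result = (reviews
          .filter(lambda r: len(r["review"]) > 0)  # ordinary stream transform
          .select(review_agent))                   # agent inference step
print([r["sentiment"] for r in result.rows])  # ['negative', 'positive']
```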
The beauty of this approach is that it integrates seamlessly with existing Flink data processing. You can perform complex stream joins, aggregations, and windowing operations alongside agent inference—all within the same runtime with end-to-end consistency guarantees.
The fundamental challenge with AI agents isn't model quality; it's infrastructure. Agents need access to live data, robust toolchains, and integration with multiple systems. They must operate continuously, share outputs asynchronously, and handle failures gracefully.
Most existing approaches require stitching together disparate systems: separate runtimes for stream processing, model inference, and orchestration. This creates operational complexity, limited visibility, and slow iteration cycles.
Flink Agents solve this by treating agents as first-class citizens in the stream processing runtime. This means:
Unified Infrastructure: One runtime for data processing and agent execution
End-to-End Consistency: Flink's checkpointing ensures consistency across data transformations and agent decisions
Built-in Fault Tolerance: Agents inherit Flink's exactly-once processing guarantees
Seamless Integration: Natural connection between streaming data and agent reasoning
Replayability: Event-driven architecture enables replay for testing, debugging, and compliance
We're taking a pragmatic approach to Flink Agents by focusing on delivering core capabilities that address real enterprise needs.
Our immediate focus is on the foundational elements that make event-driven agents possible. This includes robust model inference capabilities, seamless tool invocation through MCP, contextual search integration, and proper life cycle management. We want to get these core building blocks right before expanding into more advanced features.
While single agents can solve many problems, the most interesting enterprise use cases often involve multiple specialized agents working together. Enabling multi-agent scenarios is key, and Kafka's event streaming capabilities naturally allow for reliable, asynchronous coordination between agents.
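The coordination style described above can be sketched in a few lines. This is plain Python with an in-memory stand-in for Kafka topics (all names illustrative): two agents never call each other directly; one publishes its verdict to a topic and the other subscribes, which is what keeps the collaboration asynchronous and loosely coupled.

```python
from collections import defaultdict, deque

class Broker:
    """Minimal in-memory stand-in for Kafka topics."""
    def __init__(self):
        self.topics = defaultdict(deque)
    def publish(self, topic, event):
        self.topics[topic].append(event)
    def poll(self, topic):
        q = self.topics[topic]
        return q.popleft() if q else None

broker = Broker()

# Agent 1: triages incoming tickets and publishes its verdict.
def triage_agent():
    ticket = broker.poll("tickets")
    if ticket:
        priority = "high" if "outage" in ticket else "low"
        broker.publish("triaged", {"ticket": ticket, "priority": priority})

# Agent 2: acts only on high-priority work, without knowing about agent 1.
def escalation_agent():
    item = broker.poll("triaged")
    if item and item["priority"] == "high":
        broker.publish("pages", f"page on-call: {item['ticket']}")

broker.publish("tickets", "outage in region A")
triage_agent()
escalation_agent()
print(list(broker.topics["pages"]))  # ['page on-call: outage in region A']
```

With real Kafka topics in place of the in-memory broker, the same decoupling also buys durability, replay, and independent scaling of each agent.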
Beyond the core functionality, we're investing heavily in making Flink Agents production-ready. This means comprehensive observability tools that give operators visibility into agent decision-making, debugging capabilities that work with event replay, and integration patterns that fit naturally into existing enterprise architectures.
We're also committed to maintaining the open source nature of this project. All development happens in the open, and we actively encourage contributions from the broader Flink community. The goal is to build something that serves the entire ecosystem, not just the companies involved in the initial development.
The shift toward event-driven agents represents a fundamental change in how we build autonomous AI systems. By bringing agents natively into stream processing, we're not just adding AI capabilities to Flink. We're enabling a new class of applications that can operate continuously, at scale, with the reliability that enterprises demand.
Ready to get started? Check out the Flink Agents proposal and join our community discussions to help shape the future of event-driven AI agents.
Apache®, Apache Kafka®, Apache Flink®, Flink®, and the Flink logo are trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by the Apache Software Foundation is implied by using these marks. All other trademarks are the property of their respective owners.