Not long ago, I wrote about a growing problem in enterprise AI: agents that don’t talk to each other.
You’ve got a customer relationship management (CRM) agent doing its thing, a data warehouse agent crunching numbers, a knowledge bot quietly surfacing documents—but none of them are sharing what they know. Instead of a smart, connected ecosystem, we’re stuck with isolated pockets of intelligence: an island of agents.
Turns out, I wasn’t the only one thinking about this.
Google recently introduced Agent2Agent (A2A), an open protocol aimed at helping agents, regardless of who built them, work together. It gives agents a shared language to actually collaborate. Late last year, Anthropic launched the Model Context Protocol (MCP), which standardizes how agents use tools and access context.
Together, these standards tackle two sides of the problem: how agents think and how they talk. It’s a solid step toward fixing artificial intelligence (AI) silos.
But there’s still one thing missing: a communication layer built for scale.
Right now, both A2A and MCP are built on traditional web patterns—HTTP, JSON-RPC, and server-sent events (SSE)—which work well for simple point-to-point interactions. But as agent ecosystems grow more complex, so does the need for a shared, event-driven backbone. That’s where Apache Kafka® and event streaming come in.
In this post, we’ll dig into how A2A works, how it fits alongside MCP, and why Kafka, and event streaming more broadly, may be the key to turning a scattered network of agents into a truly connected AI ecosystem.
Before the internet became the seamless web of apps and services we rely on today, it was a mess of incompatible systems. Each one had its own way of serving content, its own interface, and its own assumptions. Then HTTP came along and changed everything.
HTTP gave us a universal way for clients and servers to communicate—a lingua franca that abstracted away the messy details and let any browser talk to any server, anywhere. It didn’t matter what operating system was used or how the backend was built. If it spoke HTTP, it could be part of the web.
A2A is trying to do the same thing, but for AI agents.
Right now, the agent ecosystem looks a lot like the pre-HTTP web: fragmented and siloed. Agents are built using different frameworks (e.g., LangGraph, CrewAI, Google’s Agent Development Kit [ADK], Microsoft AutoGen) and often can’t talk to each other. Each one is powerful, but they’re speaking different languages, locked inside their own vendor ecosystems.
A2A is Google’s proposed open protocol for fixing this. It provides a shared language for agents to:
Announce their capabilities (via an Agent Card)
Negotiate how to interact (through structured formats like text, forms, and files)
Collaborate on tasks (by sending and tracking requests across agent boundaries)
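To make the discovery step concrete, here's a sketch of what an Agent Card might look like. The A2A spec defines the Agent Card as a JSON document an agent publishes to announce itself; the exact field names and values below are illustrative, not copied from the spec.

```python
# A sketch of an A2A Agent Card: the JSON document an agent publishes
# to announce its capabilities during discovery. Field names follow the
# spirit of the A2A spec; treat the exact keys and URLs as illustrative.
import json

agent_card = {
    "name": "sales-insights-agent",
    "description": "Summarizes CRM pipeline data and answers sales questions.",
    "url": "https://agents.example.com/sales-insights",  # A2A endpoint (hypothetical)
    "capabilities": {"streaming": True, "pushNotifications": False},
    "skills": [
        {
            "id": "pipeline-summary",
            "description": "Summarize open opportunities for a given region.",
            "inputModes": ["text"],           # structured formats the agent accepts
            "outputModes": ["text", "file"],  # ...and produces
        }
    ],
}

# A client fetches a card like this during discovery and uses it to decide
# which agent to hand a task to, and in what format to send it.
print(json.dumps(agent_card, indent=2))
```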
A2A builds on existing web standards, using HTTP, JSON-RPC, and SSE, so that agents can plug into today’s infrastructure without needing entirely new plumbing. That makes adoption easier and integration more familiar for developers.
What A2A does for inter-agent communication, Anthropic’s MCP does for tool use. MCP provides a standard for how agents access external tools, APIs, and context. It helps agents reason and act based on external capabilities. Meanwhile, A2A focuses on the protocols that agents use to talk to each other, to coordinate, delegate, and collaborate.
Together, they form two halves of the same vision:
MCP empowers agents to take meaningful action
A2A enables them to work together while doing it
If successful, A2A could become the foundational fabric for a new kind of digital ecosystem, one where agents, like web servers before them, don’t need to know anything about how the others were built in order to collaborate. They just need to speak the same protocol.
In other words, if AI agents are the new apps, A2A is trying to be their HTTP—the connective tissue that makes a network of agents possible.
A2A takes a big step forward by giving agents a shared language. But the way agents communicate using that language still follows a familiar pattern: direct, point-to-point connections. One agent discovers another, sends it a task over HTTP, and maybe listens for updates via SSE or a webhook.
This approach works fine when a handful of agents are working together. But in real enterprise environments, where agents span CRMs, data warehouses, security systems, and support platforms, point-to-point doesn’t scale.
Here’s why:
Too many connections: Every agent has to know how to talk to every other agent it might interact with. That creates an explosion of up to N² − N possible integrations for N agents. The more agents you add, the more brittle and complex the web becomes.
Tight coupling: Each agent needs to know the exact endpoint, format, and availability of its peers. If one goes down or changes, others break. This kills resilience and slows down development.
Limited visibility: Point-to-point communication is inherently private, with messages going directly from A to B. But in most real-world systems, you don’t want to just send a message to another agent. You also want to log it, store it in a data warehouse, monitor it for anomalies, trace how commands flow through the agent topology, or maybe even replay it later for debugging or compliance. Point-to-point makes all that much harder, often requiring bolt-on systems and duplicate effort.
Hard to orchestrate: Multi-agent workflows often require coordination across systems. With direct connections, you need a separate orchestrator or control plane to manage the flow, adding another layer of complexity.
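The connection-count arithmetic behind the first point is easy to verify: with direct links, every agent may need a channel to every other agent, while a broker reduces each agent's footprint to a single connection.

```python
# Directed point-to-point links needed if every one of N agents may call
# every other agent: N * (N - 1) = N^2 - N.
def p2p_links(n_agents: int) -> int:
    return n_agents * (n_agents - 1)

# With a shared broker, each agent needs exactly one connection: to the broker.
def broker_links(n_agents: int) -> int:
    return n_agents

for n in (5, 20, 100):
    print(f"{n} agents: {p2p_links(n)} p2p links vs {broker_links(n)} broker links")
# At 100 agents: 9900 possible point-to-point integrations vs 100 broker connections.
```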
In short, A2A gives agents the right words, but the current transport makes it hard to build a conversation at scale.
What’s missing is a shared, asynchronous backbone where agents can publish what they know and subscribe to what they need. A way to decouple producers and consumers of intelligence. A system where agents don’t need to know who will use their insights, just that they’ll be available when it matters.
This is where event-driven architecture comes in—and Kafka starts to look like a natural next step.
If you’ve worked in software over the last decade, this story might sound familiar.
We started with monoliths—big, tightly coupled applications where all functionality lived in a single codebase. They were simple to build at first, but quickly became bottlenecks. Every update risked breaking something, deployments were slow, and scaling required scaling everything.
To fix this, we moved to microservices—independent, focused services that handled specific responsibilities. They could be developed and deployed independently, scaled separately, and maintained by small, autonomous teams. But early microservice systems still leaned heavily on synchronous communication, REST, gRPC, and direct service-to-service calls. As systems grew, this approach brought back fragility: A slow or failing service could block others, and direct dependencies created complex webs of coordination.
The real breakthrough came with event-driven microservices.
Instead of calling each other directly, services started publishing events to a shared broker (such as Kafka), and others subscribed to the events they cared about. This shift decoupled services, improved fault tolerance, and made real-time responsiveness easier to achieve at scale. It took microservices from a quadratic explosion of dependencies, N×M, down to a linear number, N + M, in a system with N producers and M consumers.
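The decoupling that makes this N + M shape possible can be shown with a toy in-memory event bus: producers publish to a topic without knowing who, if anyone, is listening, and every subscriber sees the event. (Kafka adds durability, partitioning, and replay on top of this basic pattern; the class below is only a sketch of the shape.)

```python
# A toy in-memory pub/sub bus illustrating the N+M dependency shape:
# each producer and each consumer touches only the broker, never one another.
from collections import defaultdict


class Bus:
    def __init__(self):
        self.subs = defaultdict(list)  # topic -> list of subscriber handlers

    def subscribe(self, topic, handler):
        self.subs[topic].append(handler)

    def publish(self, topic, event):
        # Fan out to every subscriber; the producer never names a recipient.
        for handler in self.subs[topic]:
            handler(event)


bus = Bus()
seen = []
bus.subscribe("orders", lambda e: seen.append(("billing", e)))
bus.subscribe("orders", lambda e: seen.append(("analytics", e)))
bus.publish("orders", {"id": 1})  # one publish, two independent consumers
```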
Now we’re seeing the same architectural shift begin again, this time with AI agents.
In the near-future world of enterprise AI, companies aren’t deploying just one agent. They’re deploying dozens—e.g., a sales agent, a data analysis agent, a support triage agent—each built using different frameworks or platforms, often with no clean way to talk to one another.
Google’s A2A protocol is an important step toward solving that. It gives agents a common language and structure to collaborate. And if you squint a little, A2A servers start to look a lot like microservices. Each one exposes specific capabilities. Each one handles requests, performs tasks, and returns results. And just like microservices, they need to communicate with other components in a scalable, reliable way.
But A2A’s default communication model is fundamentally point-to-point. It’s tightly coupled. And at enterprise scale, that becomes a problem in the same way it became a problem for microservices.
To build a real enterprise agent ecosystem, we need more than just a protocol. We need an architecture that supports:
Loose coupling between agents
Multiple consumers of the same output (e.g., other agents, logging systems, warehouses)
Durable communication that survives restarts and outages
Real-time flow of events across systems
In other words, we need to bring event-driven architecture to A2A, just like we did for microservices.
And Kafka is the natural foundation for that shift.
Instead of sending task requests directly over HTTP, an A2A client could publish the request as an event to a Kafka topic. The A2A server (i.e., the agent receiving the task) subscribes to that topic, processes the request, and publishes status updates and results to a reply topic.
Other systems, like monitoring tools, data warehouses, or even other agents, can also subscribe to these topics. This allows:
Multiple consumers to act on the same message (not just the target agent).
Decoupled communication, where agents don’t need to know each other’s endpoint or availability. This makes dynamic or even self-driven agent topologies far more viable.
Durable storage of agent interactions, enabling auditing, tracing, replay, and debugging.
Real-time orchestration, where downstream agents react instantly to upstream outputs.
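As a sketch of what that flow could look like on the client side, here's one way to wrap an A2A `tasks/send` JSON-RPC request in an event envelope for publishing to a Kafka topic. The topic names, the choice of key, and the envelope fields are assumptions for illustration; neither the A2A spec nor Kafka prescribes them.

```python
# Sketch: wrapping an A2A tasks/send JSON-RPC request in a record envelope
# suitable for a Kafka producer. Topic names, key choice, and the
# reply-topic header are hypothetical conventions, not part of either spec.
import json
import uuid


def to_kafka_record(a2a_request: dict, reply_topic: str) -> dict:
    """Return the key, value, and headers a producer would send."""
    task_id = a2a_request["params"]["id"]
    return {
        "topic": "a2a.tasks",  # request topic (hypothetical name)
        "key": task_id,        # key by task ID so updates for a task stay ordered
        "value": json.dumps(a2a_request),
        "headers": {"reply-topic": reply_topic},  # where the server publishes results
    }


request = {
    "jsonrpc": "2.0",
    "id": str(uuid.uuid4()),
    "method": "tasks/send",
    "params": {
        "id": "task-42",
        "message": {"role": "user",
                    "parts": [{"type": "text", "text": "Qualify this lead."}]},
    },
}
record = to_kafka_record(request, reply_topic="a2a.tasks.replies")
```

The receiving agent would consume from `a2a.tasks`, process the task, and publish status updates and artifacts keyed by the same task ID to the reply topic, where the original client, and any other interested system, can pick them up.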
There are a few ways this could be supported in practice:
Kafka as a Transport Layer: A2A messages (e.g., tasks/send, tasks/status) are wrapped and sent via Kafka topics instead of HTTP. This requires minimal changes to the message format but shifts the infrastructure underneath.
Kafka for Task Routing and Fan-Out: Keep direct A2A communication for core task execution, but publish all task submissions, updates, and artifacts to Kafka in parallel. Other systems can then react to these events in real time. Note: This setup isn’t ideal given that dual writes can create the usual consistency and atomicity challenges.
Hybrid Orchestration Pattern: Use Kafka to drive workflows involving multiple agents. An orchestrator listens for events (e.g., “lead-qualified”) and sends A2A tasks to downstream agents. This combines A2A’s structured interactions with Kafka’s scalable event backbone.
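The hybrid pattern can be sketched as a small routing step: the orchestrator consumes a domain event from Kafka and turns it into A2A tasks for downstream agents. The event shape, routing table, and agent URLs below are all hypothetical.

```python
# Sketch of the hybrid orchestration pattern: an orchestrator consumes a
# domain event (e.g., "lead-qualified") from Kafka and fans it out as A2A
# tasks/send requests. Routing table and agent endpoints are hypothetical.
ROUTES = {
    # event type -> A2A endpoints of agents that should receive a task
    "lead-qualified": [
        "https://agents.example.com/sales-outreach",
        "https://agents.example.com/crm-updater",
    ],
}


def plan_tasks(event: dict) -> list[dict]:
    """Turn one consumed event into zero or more A2A tasks/send requests."""
    tasks = []
    for agent_url in ROUTES.get(event["type"], []):
        tasks.append({
            "agent": agent_url,  # where an A2A client would send the task
            "method": "tasks/send",
            "params": {
                "id": f'{event["type"]}-{event["lead_id"]}',
                "message": {
                    "role": "user",
                    "parts": [{"type": "text",
                               "text": f'Handle qualified lead {event["lead_id"]}'}],
                },
            },
        })
    return tasks


tasks = plan_tasks({"type": "lead-qualified", "lead_id": 7})
```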
This isn’t about replacing A2A. It’s about extending it.
In modern enterprises, agent communication isn’t just about completing a task. It’s about being trackable, shareable, and composable across systems. Kafka extends A2A from point-to-point collaboration into a fully integrated, scalable agent ecosystem.
The Agent2Agent protocol is a major step toward making AI agents interoperable. It gives them a shared language and a standard way to collaborate, much like HTTP did for the early web. But language alone isn’t enough. For agents to operate at enterprise scale, they need a communication backbone that’s resilient, decoupled, and built for many-to-many collaboration.
That’s where event-driven architecture and Kafka come in.
By combining A2A’s structured protocol with Kafka’s powerful event streaming capabilities, we can shift from brittle, point-to-point integrations to a dynamic ecosystem where agents publish insights, subscribe to context, and coordinate in real time. Observability, auditability, and orchestration become native features, not afterthoughts.
This isn’t just a technical upgrade. It’s a shift in how we think about agent infrastructure.
If A2A is the shared language for agents, Kafka is the communication fabric that lets that language scale. Together, they give us the tools to move from scattered agents to a truly collaborative AI ecosystem.
Apache®, Apache Kafka®, Kafka®, Apache Flink®, and Flink® are registered trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by the Apache Software Foundation is implied by using these marks. All other trademarks are the property of their respective owners.