Modern companies generate large volumes of data, but the internal users who need that data to do their jobs—data engineers, managers, business analysts, and developers—often find it challenging to get quick answers to their questions. Apache Kafka® is a powerhouse for real-time processing of high-throughput workloads, and many organizations use Kafka to enable self-service access to data streams. But getting specific insights out of those streams often requires specialized knowledge.
Developers may choose to write queries using engines such as Flink SQL to transform and analyze streaming data. While powerful, Flink SQL isn't always intuitive, especially for quick explorations or for users less familiar with the syntax. But what if they could just use plain language to ask the Kafka topic for the data they need? Imagine that the manager of a retail business wants to know how many orders of a certain product were placed yesterday, or how many orders were placed each day. Or a developer wants to check whether a message with a particular key appears in each of the topics of a Kafka Streams pipeline. Being able to query the relevant Kafka topics with natural language would make this kind of information much more accessible to a variety of users.
Recently, I explored doing just that by leveraging Cursor, an artificial intelligence (AI) coding assistant, and Model Context Protocol (MCP), an open standard for integrating large language model (LLM) applications and data sources. This combination made it easy to interact with a Kafka topic hosted on Confluent Cloud. The same can be done with other AI chatbots like Claude Desktop and Goose. See how it worked in this demo video:
Want to try it yourself? Get started with Confluent Cloud for free.
Let’s look at a typical scenario for retail use cases like real-time order management or demand forecasting. Orders arrive in a Kafka topic named “sample_data_orders,” which continuously receives order events. Within the Confluent Cloud user interface (UI), the message viewer looks like this:
Each event is a JSON message containing details such as orderid, itemid, ordertime, and an address object with fields such as city, state, and zip code.
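For reference, here's a sketch of what one of these events could look like. Only the fields mentioned above are shown, and the values (and exact field spellings) are illustrative rather than actual records from the topic:

```json
{
  "ordertime": 1497014222380,
  "orderid": 18,
  "itemid": "Item_184",
  "address": {
    "city": "City_61",
    "state": "State_94",
    "zipcode": 78205
  }
}
```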
Let’s say that I want to look at order volume for a specific customer, region, or time period. To do that, I need to identify records based on their field values. For this demo, I want to see orders headed to Missouri—so I need to find all the order records where the state field within the address object is equal to "State_94."
One way to achieve this is to write a Flink SQL query.
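Here's a minimal sketch of what that query could look like, assuming the topic is exposed to Flink as a table named sample_data_orders and that address is a nested row:

```sql
-- Filter the streaming orders down to a single state;
-- nested fields are referenced with dot notation.
SELECT *
FROM sample_data_orders
WHERE address.state = 'State_94';
```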
This works perfectly, but only if the following is known:
The exact topic name (sample_data_orders)
The structure of the JSON message (i.e., that state is nested under address)
The correct Flink SQL syntax
This is where Cursor and MCP come into play. Think of MCP as a structured way for the AI agent within Cursor to discover and use available tools or APIs. The specific tools enabling interaction with Confluent Cloud, as shown in this example, are available in the Confluent MCP repository on GitHub.
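To make this concrete, here's a rough sketch of what registering the Confluent MCP server with Cursor could look like (for example, in a ~/.cursor/mcp.json file). The package name, the -e flag, and the .env path are assumptions for illustration; the repository's README documents the exact command and credentials:

```json
{
  "mcpServers": {
    "confluent": {
      "command": "npx",
      "args": ["-y", "@confluentinc/mcp-confluent", "-e", "/path/to/.env"]
    }
  }
}
```

The referenced .env file would hold the Confluent Cloud credentials (Kafka, Schema Registry, and Flink API keys) that the tools need in order to act on the cluster.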
Here’s the process I followed:
Establishing Context Awareness: First, I needed the AI to understand the structure of my data. The Confluent MCP server provides a tool that lets the AI interact with the Schema Registry instance associated with my Kafka cluster. Asking it to list all schemas allows the AI to fetch this structural information.
Making the Request: With the schema context established, I simply made my request in natural language directly within Cursor's chat interface: “list all orders whose state is State_94.”
Translating the Request to Flink SQL: Using the MCP tools, Cursor's AI understood my request and correctly identified the target topic, the filtering condition, and the required query language (Flink SQL). It then generated the precise Flink SQL statement.
Executing the Flink SQL Statement via MCP: The AI didn't just generate the query; it also recognized that another MCP tool from the same Confluent MCP repository could execute Flink SQL statements against my Confluent Cloud environment. It invoked this tool, passing in the generated SQL query.
Getting the Results: The MCP tool executed the query against the Kafka topic via Confluent's Flink SQL service. The results were passed back through MCP to the Cursor AI, which then presented them in a summarized, readable format, indicating that 175 orders were found.
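For reference, a query that returns that same answer directly could look like the sketch below (the statement the AI actually generated may have listed the rows and summarized them instead):

```sql
-- Continuously maintained count of orders whose address.state is 'State_94'.
SELECT COUNT(*) AS order_count
FROM sample_data_orders
WHERE address.state = 'State_94';
```

The same pattern extends to the other questions from the introduction; "how many orders were placed each day," for example, would translate into a GROUP BY over the order timestamp.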
Combining AI assistants like Cursor with integration frameworks like MCP makes it possible to build intuitive interfaces over powerful backend systems. Translating natural-language requests into executable queries like Flink SQL makes Kafka data far more accessible. This workflow demonstrates a powerful shift in how we interact with complex data systems like Kafka:
Accessibility: Users don't need to be Flink SQL experts to explore data.
Speed: It dramatically reduces the time from request to insight.
Leveraging Existing Infrastructure: It seamlessly integrates with Confluent Cloud features.
Extensibility: The MCP framework allows for adding more tools.
For those interested in the technical implementation, the tools used here can be found in the Confluent MCP repository. It's an exciting glimpse into a future where interacting with data is as simple as making a request.
Try it for yourself and get started with Confluent Cloud. Or learn more about how MCP and data streaming enable agentic AI.
Explore more generative and agentic AI resources and use cases.
Apache®, Apache Kafka®, Kafka®, Apache Flink®, and Flink® are registered trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by the Apache Software Foundation is implied by using these marks. All other trademarks are the property of their respective owners.