
How to Query Apache Kafka® Topics With Natural Language

Modern companies generate large volumes of data, but the internal users who need that data to do their jobs—data engineers, managers, business analysts, and developers—often find it challenging to get answers to their questions quickly. Apache Kafka® is a powerhouse for real-time data processing of high-throughput workloads, and many organizations use Kafka to enable self-service access to data streams. But getting specific insights out of those streams often requires specialized knowledge.

Developers may choose to write queries using engines such as Flink SQL to transform and analyze streaming data. While powerful, Flink SQL isn't always intuitive, especially for quick explorations or for users less familiar with the syntax. But what if they could just use plain language to ask the Kafka topic for the data they need? Imagine that a manager of a retail business wants to know how many orders of a certain product were placed yesterday or how many orders were placed each day. Or a developer wants to check whether a message with a certain key is present in all the topics of a Kafka Streams pipeline. Being able to query the relevant Kafka topics with natural language would make this kind of information much more accessible to a variety of users.
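
To make the manager's question concrete: it maps to a short aggregation query. Here is a minimal Flink SQL sketch, assuming the sample_data_orders topic used later in this post and an ordertime field that is (or has been cast to) a timestamp:

-- A minimal sketch, not the demo's output: count orders per day.
-- Assumes ordertime is a TIMESTAMP; cast it first if the topic
-- stores epoch milliseconds instead.
SELECT
  DATE_FORMAT(ordertime, 'yyyy-MM-dd') AS order_day,
  COUNT(*) AS orders_placed
FROM sample_data_orders
GROUP BY DATE_FORMAT(ordertime, 'yyyy-MM-dd');

Even a query this small requires knowing the topic name, the field names, and the syntax, which is exactly the friction that natural language querying removes.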

Recently, I explored doing just that by leveraging Cursor, an artificial intelligence (AI) coding assistant, and Model Context Protocol (MCP), an open standard for integrating large language model (LLM) applications and data sources. This combination made it easy to interact with a Kafka topic hosted on Confluent Cloud. The same can be done with other AI chatbots like Claude Desktop and Goose. See how it worked in this demo video:

Want to try it yourself? Get started with Confluent Cloud for free.

A Retail Scenario: Querying Order Records in Kafka

Let’s look at a typical scenario for retail use cases like real-time order management or demand forecasting. Orders come into a Kafka topic named “sample_data_orders,” which continuously receives order events. Within the Confluent Cloud user interface (UI), the message viewer looks like this:

The Confluent Cloud UI showing the “sample_data_orders” topic receiving data

Each event is a JSON message containing details such as orderid, itemid, ordertime, and an address object with fields such as city, state, and zip code.

Example JSON message structure within the topic, highlighting the nested “state” field
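
In Confluent Cloud for Apache Flink, a topic like this automatically surfaces as a queryable table, with the nested address object mapped to a ROW type. A hypothetical DDL equivalent of that inferred table might look roughly like the following (the exact field types come from the registered schema, so treat these as assumptions):

-- Hypothetical sketch of the inferred table; you don't run this yourself,
-- since Confluent Cloud derives it from the topic's registered schema.
CREATE TABLE sample_data_orders (
  ordertime BIGINT,   -- assumption: epoch milliseconds
  orderid   INT,
  itemid    STRING,
  address   ROW<city STRING, state STRING, zipcode BIGINT>
);

The ROW type is what makes the dotted path address.state in the queries below work.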

Let’s say that I want to look at order volume for a specific customer, region, or time period. To do that, I need to identify records based on their field values. For this demo, I want to see orders headed to Missouri—so I need to find all the order records where the state field within the address object is equal to "State_94."

The Traditional Approach: Querying Kafka Topics With Flink SQL

Using Flink SQL's capabilities, one way to achieve this is by writing a Flink SQL query:

SELECT *
FROM sample_data_orders
WHERE address.state = 'State_94';

This works perfectly, but only if the following is known:

  • The exact topic name (sample_data_orders)

  • The structure of the JSON message (i.e., that state is nested under address)

  • The correct Flink SQL syntax
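
Without an AI assistant, each of those prerequisites is a manual discovery step. In a Flink SQL workspace, that exploration might look something like this (a sketch, using the table from this demo):

-- Manual discovery, sketched: find the table, then inspect its structure.
SHOW TABLES;                  -- locate the exact topic-backed table name
DESCRIBE sample_data_orders;  -- confirm that state is nested under address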

The Natural Language Approach With AI and MCP

This is where Cursor and MCP come into play. Think of MCP as a structured way for the AI agent within Cursor to discover and use available tools or APIs. The specific tools enabling interaction with Confluent Cloud, as shown in this example, are available in the Confluent MCP repository on GitHub.

Here’s the process I followed:

  1. Establishing Context Awareness: First, I needed the AI to understand the structure of my data. Confluent Cloud provides an MCP tool for the AI to interact with the Schema Registry instance associated with my Kafka cluster. Asking to list all schemas allows the AI to fetch this structural information.

  2. Making the Request: With the schema context established, I simply made my request in natural language directly within Cursor's chat interface: “list all orders whose state is State_94.”

    Making the request in plain English within the Cursor AI chat

  3. Translating the Request to Flink SQL: With the MCP tool, Cursor's AI was able to understand my request and then correctly identify the target topic, the filtering condition, and the required query language (Flink SQL). It then generated the precise Flink SQL statement.

    The AI translating the natural language query into the corresponding Flink SQL statement

  4. Executing the Flink SQL Statement via MCP: The AI didn't just generate the query; it also recognized that another MCP tool (also available in the Confluent MCP repository) was available to execute Flink SQL statements against my Confluent Cloud environment. It invoked this tool, passing the generated SQL query.

  5. Getting the Results: The MCP tool executed the query against the Kafka topic via Confluent's Flink SQL service. The results were passed back through MCP to the Cursor AI, which then presented them in a summarized, readable format, indicating that 175 orders were found. (An equivalent count query is sketched just after this list.)

    The AI executing the query via MCP and presenting the summarized results
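
For reference, the 175-order figure corresponds to a simple count over the same filter. A query like the following, which the AI could equally have generated, is a hedged equivalent rather than a capture of the demo's actual statement:

-- An equivalent count query over the same filter; on an unbounded
-- stream, Flink keeps this result updated as new orders arrive.
SELECT COUNT(*) AS matching_orders
FROM sample_data_orders
WHERE address.state = 'State_94';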

Why Lowering the Barrier to Kafka Topic Querying Matters

Combining AI assistants like Cursor with integration standards like MCP makes it possible to build intuitive interfaces over powerful backend systems. Translating natural language requests into executable queries like Flink SQL unlocks data accessibility. This workflow demonstrates a significant shift in how users interact with complex data systems like Kafka:

  • Accessibility: Users don't need to be Flink SQL experts to explore data.

  • Speed: It dramatically reduces the time from request to insight.

  • Leveraging Existing Infrastructure: It seamlessly integrates with Confluent Cloud features.

  • Extensibility: The MCP framework allows for adding more tools.

For those interested in the technical implementation, the tools used here can be found in the Confluent MCP repository. It's an exciting glimpse into a future where interacting with data is as simple as making a request. 

Try it for yourself and get started with Confluent Cloud. Or learn more about how MCP and data streaming enable agentic AI.

Explore more generative and agentic AI resources and use cases.


Apache®, Apache Kafka®, Kafka®, Apache Flink®, and Flink® are registered trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by the Apache Software Foundation is implied by using these marks. All other trademarks are the property of their respective owners.

  • Rahul Bhattacharya is a seasoned consultant, a passionate developer, and a Kafka development thought leader. His Kafka technical leadership spans retail, financial services, and transportation, among other industries. Rahul currently works for Confluent to help customers adapt and use Kafka to its maximum potential. Before Confluent, he worked at several companies, including Target, Cisco, and SAP. He has a degree in computer science and an MBA from Boston University and is pursuing a Master's in Artificial Intelligence from UT Austin. In his free time, he enjoys spending time with his daughter, playing open-world video games, and traveling.
