4 Steps for Building Event-Driven GenAI Applications

Data Streaming for Real-time Artificial Intelligence

Build next-generation, data-intensive AI applications with a next-generation data streaming platform. Tap into continuously enriched, trustworthy data streams to quickly build and scale real-time AI applications.

This article was originally published on The New Stack on Dec 27, 2023.

I’ve worked with artificial intelligence for nearly 20 years, applying technologies spanning predictive modeling, knowledge engineering, and symbolic reasoning. AI’s tremendous potential has always felt evident, but its widespread application always seemed to be just a few more years away. With the current embodiment of Generative AI (GenAI) technologies, however, this time it feels different.

A significant barrier in the past was that designing and training models required expertise that was in scarce supply. Now, we have foundation models, like LLMs, powering GenAI that are reusable and generalized—making the application of the technology far more democratized than it has ever been.

Companies worldwide are experimenting with building GenAI-enabled applications and tools to drive greater efficiency and innovation. A new forecast from IDC shows that enterprises will invest nearly $16 billion worldwide in GenAI solutions in 2023. But these investments won’t disproportionately benefit just a few firms as past iterations of AI have.

While a promising approach for building a GenAI-enabled application begins with zero-shot learning or few-shot learning to generate better outputs, most non-trivial use cases require prompts to be contextualized with domain-specific data that was not available when the LLM was trained. From semantic search to recommendation engines, most of the valuable use cases for GenAI-enabled applications require prompts to be paired with relevant, timely, and accurate corporate data to generate usable outputs, typically applying a pattern commonly known as Retrieval Augmented Generation (RAG).

Building these data-driven GenAI applications entails developing complex applications spanning many skill sets. Further, the goal isn’t to build a single GenAI-enabled application. For GenAI to truly transform your business, your team will deliver tens or hundreds of specialized applications over time that likely use the same foundation models but pull from different sources of truth across the enterprise.

Most modern enterprises will find building and deploying AI-based applications challenging, as their data is locked in siloed, heterogeneous operational data stores. Ultimately, bringing GenAI apps to market requires a common operating model and platform for data integration. 

Based on insights drawn from our team’s discussions with hundreds of customers who are building GenAI applications, we have found that the best way to build GenAI apps is by embracing event-driven patterns. We’ve identified four general steps that these applications tend to have. As we describe next, each step is ideally implemented as an event-driven application.

Critical steps to build LLM-driven applications with event streaming

LLM-driven applications usually have four steps—data augmentation, inference, workflows, and post-processing. Using an event-driven approach for each makes development and operations much more manageable.

Let’s see how:

Step 1. Data augmentation

This step prepares the data for contextualization in LLM queries with activities such as:

  • Chunking, where you break data up into semantically meaningful pieces 

  • Creating embeddings, which are mathematical representations of information that preserve meaning and relationships. Embeddings make it possible for AI models to understand and reason about information otherwise intended to be consumed by humans

  • Storing the embeddings in a vector store, which supports the retrieval of high-dimensional vector representations that large language models (LLMs) depend on

This step pulls unstructured data from disparate operational data sources across the enterprise (Amazon S3 and Salesforce, for example) with the help of source connectors or native integrations. It then organizes embeddings of that unstructured data into vector stores, where they can later be retrieved and engineered into a prompt. This relies on one of data streaming’s well-established advantages: integrating disparate operational data across the enterprise in real time for reliable, trustworthy use.
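The three activities above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: `chunk_text`, `toy_embed`, and `InMemoryVectorStore` are hypothetical names, and the bag-of-words "embedding" is a stand-in assumption for a real embedding model and vector database.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Group whole sentences into chunks of roughly max_words words."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        words = sentence.split()
        # Start a new chunk rather than splitting a sentence in half.
        if current and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


def toy_embed(text: str) -> dict[str, float]:
    """Toy embedding (assumption): a unit-length bag-of-words vector."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {token: c / norm for token, c in counts.items()}


class InMemoryVectorStore:
    """Hypothetical stand-in for a real vector store."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, dict[str, float]]] = []

    def add(self, chunk: str) -> None:
        self.rows.append((chunk, toy_embed(chunk)))

    def search(self, query: str, k: int = 1) -> list[str]:
        q = toy_embed(query)

        def cosine(vec: dict[str, float]) -> float:
            # Both vectors are unit length, so the dot product is the cosine.
            return sum(w * vec.get(token, 0.0) for token, w in q.items())

        ranked = sorted(self.rows, key=lambda row: cosine(row[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]
```

A real pipeline would swap `toy_embed` for a trained embedding model, since word overlap does not capture meaning the way learned embeddings do.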

The benefit of embracing an event-driven approach is that the vector store, which stages the information used later to contextualize prompts in an LLM-enabled application, stays consistent with changes in the operational data stores. Brittle ETL pipelines, by contrast, can have cascading batch operations that leave the LLM acting on stale data. The vector store acts as a durable cache that denormalizes enterprise knowledge to deliver the sophisticated, reactive experiences that consumers have come to expect.

This pattern is seen below, where an Apache Kafka consumer group pulls data from a topic populated by a source connector, processes the data, and creates embeddings that are passed through a sink connector or native integration into an appropriate vector store.
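To make the consistency argument concrete, here is a minimal sketch of a consumer applying change events to a vector index. It assumes a simplified event shape (`ChangeEvent`, hypothetical) and uses a plain dict in place of a vector store; a real deployment would read these events from Kafka topics rather than a Python list.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    key: str             # primary key of the source record
    text: Optional[str]  # None marks a delete (a tombstone)


Index = dict[str, tuple[str, list[float]]]


def apply_changes(
    events: Iterable[ChangeEvent],
    embed: Callable[[str], list[float]],
    index: Index,
) -> Index:
    """Apply each change in order so the index mirrors the source system."""
    for event in events:
        if event.text is None:
            # The record was deleted upstream, so drop its embedding too.
            index.pop(event.key, None)
        else:
            # Insert or overwrite: re-embed the latest version of the record.
            index[event.key] = (event.text, embed(event.text))
    return index
```

Because every insert, update, and delete flows through the same ordered stream, the index never has to wait for a nightly batch to catch up.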

Step 2. Inference

The next step is inference, which involves engineering prompts with the data prepared in the previous step and handling responses from the LLM.

When a prompt from a user comes in, the application can gather relevant context from the augmented vector store or similar enterprise service to generate the best possible prompt. 
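As a rough sketch of that prompt-engineering step, the snippet below retrieves matching context and folds it into a prompt. The function names and the prompt wording are assumptions for illustration, and keyword overlap stands in for a real vector-store similarity search.

```python
import re


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Return up to k documents that share at least one term with the question."""
    q = tokens(question)
    scored = [(len(q & tokens(doc)), doc) for doc in documents]
    relevant = [pair for pair in scored if pair[0] > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in relevant[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Combine the retrieved context and the user question into one prompt."""
    context_block = "\n".join(f"- {chunk}" for chunk in context) or "- (no context found)"
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {question}"
    )
```

The resulting string is what would be sent to the LLM service; the model itself is unchanged, only the prompt carries the enterprise data.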

Now, let’s see how an event-driven approach can help.

If you look at the image below, you’ll see a web application on the left. Web applications are typically built by a full-stack team mostly focused on how data flows in and out of the ORM and on managing sessions. With this pattern, they can work independently from the consumer group you see on the right, which could be built by a backend team specializing in AI application development. The consumer group queries the vector store to engineer a prompt and then calls out to the LLM service.

If you think about LLM calls when you're using something like ChatGPT, those calls can take seconds, which is an eternity for distributed systems. With this approach, you don't need your web application team to manage that concern. The teams can just treat all this as async communication, which is a really wonderful pattern for organizing one's teams and scaling them independently. 

Further, by having decomposed, specialized services rather than monoliths, these applications can be deployed and scaled independently. This can help with time to market given that new inference steps are consumer groups, and the organization can template infrastructure for instantiating these quickly.
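The asynchronous hand-off between the web tier and the inference service might look roughly like the sketch below. It is an in-process analogy only: `queue.Queue` objects stand in for Kafka topics, a thread stands in for a consumer group, and `fake_llm` is a hypothetical placeholder for a real LLM call.

```python
import queue
import threading
import uuid


def fake_llm(prompt: str) -> str:
    """Stand-in for a slow LLM call; a real call could take seconds."""
    return f"answer to: {prompt}"


def inference_worker(requests: queue.Queue, responses: queue.Queue) -> None:
    """Consume prompt events until a None sentinel arrives."""
    while True:
        event = requests.get()
        if event is None:
            break
        completion = fake_llm(event["prompt"])
        responses.put({"request_id": event["request_id"], "completion": completion})


def submit_prompt(requests: queue.Queue, prompt: str) -> str:
    """Web tier: publish the prompt and return at once with a correlation ID."""
    request_id = str(uuid.uuid4())
    requests.put({"request_id": request_id, "prompt": prompt})
    return request_id


def demo(prompt: str) -> tuple[str, dict]:
    requests, responses = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=inference_worker, args=(requests, responses))
    worker.start()
    request_id = submit_prompt(requests, prompt)
    reply = responses.get(timeout=5)  # a real web tier would subscribe instead
    requests.put(None)                # tell the worker to stop
    worker.join()
    return request_id, reply
```

The correlation ID is the only contract between the two sides, which is what lets each team deploy and scale its half independently.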

Step 3. Workflows

Workflows are a common conceptual model (e.g., the chains in LangChain) for composing reasoning agents and inference steps to form GenAI-enabled applications. The intuition with agents is that we often need something that automates an action, such as the next LLM call, based on what the previous response was. LLMs can be suitable intelligent agents for some uses, but agents are often specialized, more traditional models that can rely on domain-specific knowledge.

Consider the design of an insurance underwriting application: GenAI models often don’t make underwriting decisions (yet). Instead, the LLM powers a natural language interface that calls a traditional model to provide a prediction based on peril-specific risk modeling. The other reason we often decompose LLM agents into chains of calls is that state-of-the-art LLMs (as of this writing) tend to return better results when we ask multiple, simple questions rather than larger compound questions, though this characterization is rapidly evolving.

Now let’s look at the image below. As before, web application developers can work independently. The full-stack engineers can build web apps, and the backend system engineers can build consumer groups that do natural language search over operational data, such as data in a relational database management system. This is something that SQLBuilder and LangChain allow. These consumer groups can use reasoning agents and contextualize prompts based on what's in the vector store, and they can make as many subsequent calls to the LLM as necessary to answer whatever query the web application requires.
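A chain of small steps, loosely modeled on the underwriting example, can be sketched as follows. Every function here is a hypothetical stand-in: `extract_fields` plays the role of an LLM call that parses the request, `score_risk` plays the traditional peril-specific model, and the risk values are made up for illustration.

```python
from typing import Callable

Step = Callable[[dict], dict]


def extract_fields(state: dict) -> dict:
    """Stand-in for an LLM call that turns free text into structured fields."""
    peril = "flood" if "flood" in state["request"].lower() else "fire"
    return {**state, "peril": peril}


def score_risk(state: dict) -> dict:
    """Stand-in for a traditional model; the numbers are illustrative only."""
    base_risk = {"flood": 0.3, "fire": 0.1}
    return {**state, "risk": base_risk[state["peril"]]}


def draft_reply(state: dict) -> dict:
    """Stand-in for an LLM call that phrases the prediction for the user."""
    reply = f"Estimated {state['peril']} risk: {state['risk']:.0%}"
    return {**state, "reply": reply}


def run_chain(steps: list[Step], state: dict) -> dict:
    """Feed each step's output into the next, one simple question at a time."""
    for step in steps:
        state = step(state)
    return state
```

In an event-driven deployment each step could be its own consumer group, with the state dictionary carried in the events that pass between them.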

Step 4. Post-processing

Hallucinations happen, and businesses must independently validate LLM outputs and enforce business logic to prevent the application from being counterproductive.

How can embracing an event-driven methodology help here? If you look at the image below, you’ll see an independent post-processing consumer group. Once again, this decouples post-processing from the rest of the application. 

This approach is useful as LLM workflows and dependencies evolve much more rapidly than the business logic that determines acceptability.

Usually, a different business group, such as a compliance team, will define these rules and build these applications. Event-driven microservices eliminate unnecessary out-of-band coordination as each microservice just produces and consumes well-governed events.
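A post-processing consumer of this kind could be sketched as a set of rule functions plus a router. The rules and the topic names (`llm.responses.approved`, `llm.responses.rejected`) are assumptions chosen for illustration; a compliance team would define its own.

```python
import re
from typing import Callable, Optional

# A rule returns a violation message, or None when the text is acceptable.
Rule = Callable[[str], Optional[str]]


def no_guarantees(text: str) -> Optional[str]:
    found = re.search(r"\bguarantee", text, re.IGNORECASE)
    return "promises a guarantee" if found else None


def max_length(limit: int) -> Rule:
    def check(text: str) -> Optional[str]:
        return f"longer than {limit} characters" if len(text) > limit else None

    return check


def route(completion: str, rules: list[Rule]) -> tuple[str, list[str]]:
    """Pick the output topic for an LLM completion based on rule violations."""
    violations = [msg for rule in rules if (msg := rule(completion)) is not None]
    topic = "llm.responses.rejected" if violations else "llm.responses.approved"
    return topic, violations
```

Because the rules live in their own service, they can change on the compliance team's schedule without touching the inference or workflow code.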

Ultimately, GenAI applications run on data—and providing these applications with the volume and quality of data they need to generate reliable results can be a challenge for most companies. And this is where a data streaming platform can help. Such platforms let you quickly build and scale these data-intensive real-time applications by enabling you to tap into continuously enriched, trustworthy, and contextualized data streams. 

Embracing a data streaming platform 

One of the core value propositions of data streaming is that you are not constrained by where your data lives. Data streaming enables businesses to route relevant data streams to anywhere they're needed in real time, making data easily available and accessible to GenAI-enabled applications.

Data streaming platforms enable real-time generative applications at scale by:

  • Integrating disparate operational data across the enterprise in real time for reliable, trustworthy use

  • Organizing unstructured enterprise data with embeddings into vector stores, which can then help engineer prompts

  • Decoupling customer-facing applications from LLM call management to provide reliable, reactive experiences that scale horizontally

  • Enabling LLMs, vector stores, and embedding models to be treated as modular components that can be substituted as technology improves
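The last point, treating models as substitutable components, can be illustrated with a small interface sketch. The `LLM` protocol and both implementations are hypothetical; the idea is only that calling code depends on an interface rather than on a specific provider.

```python
from typing import Protocol


class LLM(Protocol):
    def complete(self, prompt: str) -> str: ...


class EchoLLM:
    """Placeholder model that echoes its prompt."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class ShoutLLM:
    """A second placeholder, swapped in without changing the caller."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


def answer(question: str, llm: LLM) -> str:
    """Application code depends only on the LLM interface."""
    return llm.complete(f"Question: {question}")
```

The same approach applies to vector stores and embedding models: each sits behind a narrow interface and exchanges events with the rest of the system.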

Ultimately, data streaming helps decouple systems, teams, and technologies. It facilitates data products that are well contextualized, trustworthy, and discoverable so teams can work confidently and independently to scale their applications, which is imperative for GenAI-enabled applications.

A data streaming platform ensures you can bring real-time, well formatted, and highly governed data streams to power your GenAI applications and promote data reusability, engineering agility, and greater trust. This allows businesses to quickly deliver the reactive, sophisticated experiences that consumers have come to expect. Access our AI resource hub to learn how Confluent can power your GenAI journey. 

  • Andrew Sellers leads Confluent’s Technology Strategy Group, a team supporting strategy development, competitive analysis, and thought leadership.
