Real-time AI is the future, and AI/ML models have demonstrated incredible potential for predicting and generating media in various business domains. For the best results, these models must be informed by relevant data. AI-powered applications almost always need access to real-time data to deliver accurate results with the responsive user experience that the market has come to expect. Stale and siloed data can limit the potential value of AI to your customers and your business.
Confluent and Rockset power a critical architecture pattern for real-time AI. In this post, we’ll discuss why Confluent Cloud’s data streaming and Rockset’s vector search capabilities work so well to enable real-time AI app development and explore how an e-commerce innovator is using this pattern.
AI application designers follow one of two patterns when they need to contextualize models:
Extending models with real-time data: Many AI models, such as the deep learning models behind generative AI applications like ChatGPT, are expensive to train with the current state of the art. Often, domain-specific applications work well enough when the models are only periodically retrained. More generally applicable models, such as the Large Language Models (LLMs) powering ChatGPT-like applications, can work better with appropriate new information that was unavailable when the model was trained. As smart as ChatGPT appears to be, it can’t summarize current events accurately if it was last trained a year ago and has not been told what’s happening now. Because new information is generated constantly, application developers can’t expect to retrain models every time something changes. Rather, they enrich inputs with a finite context window of the most relevant information at query time.
Feeding models with real-time data: Other models, however, can be dynamically retrained as new information is introduced. Real-time information can enhance the query’s specificity or the model’s configuration. Regardless of the algorithm, your favorite music streaming service can only give the best recommendations if it knows all of your recent listening history and what everyone else has played when it generalizes categories of consumption patterns.
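To make the first pattern concrete, here is a minimal sketch of enriching a model input with a bounded context window at query time. It assumes the fresh documents and their embeddings are already available in memory; the function names (`cosine`, `build_prompt`), the `max_chars` budget, and the sample data are all hypothetical and chosen only for illustration. A production system would fetch the candidates from a serving layer rather than a Python list.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_prompt(question, query_vec, documents, k=2, max_chars=500):
    """Prepend the k most similar documents to the question, within a size budget."""
    ranked = sorted(
        documents,
        key=lambda d: cosine(query_vec, d["embedding"]),
        reverse=True,
    )
    context, used = [], 0
    for doc in ranked[:k]:
        if used + len(doc["text"]) > max_chars:
            break  # the context window is finite, so stop when the budget is spent
        context.append(doc["text"])
        used += len(doc["text"])
    lines = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{lines}\n\nQuestion: {question}"

# Hypothetical fresh facts with toy 3-dimensional embeddings.
documents = [
    {"text": "Auction 17 for vintage cards started two minutes ago.", "embedding": [0.9, 0.1, 0.0]},
    {"text": "Sneaker livestream ended yesterday.", "embedding": [0.1, 0.9, 0.0]},
    {"text": "A card giveaway is running in show 42.", "embedding": [0.8, 0.2, 0.1]},
]

prompt = build_prompt("Which card shows are live right now?", [1.0, 0.0, 0.0], documents)
print(prompt)
```

The model itself is untouched here; only the input changes, which is why this pattern depends on how fresh the retrieved documents are.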
The challenge is that no matter what type of AI model you are working with, the model can only produce valuable output relevant to this moment in time if it knows about the relevant state of the world at this moment in time. Models may need to know about events, computed metrics, and embeddings based on locality. We aim to coherently feed these diverse inputs into a model with low latency and without a complex architecture. Traditional approaches rely upon cascading batch-oriented data pipelines, meaning data takes hours or even days to flow through the enterprise.
Whatnot is an organization that faced this challenge. Whatnot is a social marketplace that connects sellers with buyers via live auctions. At the heart of their product lies their home feed where users see recommendations for livestreams. As Whatnot states, "What makes our discovery problem unique is that livestreams are ephemeral content — We can’t recommend yesterday’s livestreams to today’s users and we need fresh signals."
Ensuring that recommendations are based on real-time livestream data is critical for this product. The recommendation engine needs user, seller, livestream, computed metrics, and embeddings as a diverse set of real-time inputs.
"First and foremost, we need to know what is happening in the livestreams — livestream status changed, new auctions started, engaged chats and giveaways in the show, etc. Those things are happening fast and at a massive scale."
Whatnot chose a real-time stack based on Confluent and Rockset to handle this challenge. Using Confluent and Rockset together provides reliable infrastructure that delivers low data latency, ensuring that data generated anywhere in the enterprise is rapidly available to contextualize machine learning applications.
Confluent is a data streaming platform enabling real-time data movement across the enterprise at any arbitrary scale, forming a central nervous system of data to fuel the business. Rockset is a real-time analytics database capable of low-latency, high-concurrency queries on heterogeneous data supplied by Confluent to inform AI algorithms.
Data streaming is the most flexible and powerful paradigm for connecting, processing, and using data to drive business value. At Confluent, we have proven that the benefits of data streaming are transformational to business success, but the most benefit is realized when data streams are used broadly throughout the entire enterprise. By moving business operational data through data streams, we can bridge disparate systems and break data silos, bringing important data to where it’s needed most. These data streams power operations and inform real-time and batch-based analytics, integrating with existing technologies and systems.
Confluent Cloud is a cloud-native and complete data streaming platform powered by our new Kora Engine. It offers 70+ managed connectors to integrate operational databases, analytic systems, SaaS applications, and streaming systems like Rockset. Confluent Cloud also supports multiple technologies for event stream processing to transform, normalize, and clean data to prepare it for consumption by downstream users. Confluent’s Stream Governance features track the provenance of data and its path through the enterprise, assure data integrity, and make data streams discoverable to other users while adhering to security and access control policies.
Confluent Cloud is ideal for efficiently integrating real-time data and transforming it to empower real-time AI when coupled with the vector search capabilities in Rockset.
The other half of the real-time AI equation is a serving layer capable of handling stringent latency and scale requirements. In applications powered by real-time AI, two performance metrics are top of mind:
Data latency measures the time from when data is generated to when it is queryable. In other words, how fresh is the data on which the model is operating? For a recommendations example, this could manifest in how quickly vector embeddings for newly added content can be added to the index or whether the most recent user activity can be incorporated into recommendations.
Query latency is the time taken to execute a query. In this case, we are running an ML model to generate user recommendations, so the ability to return results in milliseconds under heavy load is essential to a positive user experience.
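The two metrics can be expressed in a few lines. The sketch below is an assumption-laden illustration rather than a benchmark harness: `data_latency_ms`, `timed_query`, and `percentile` are hypothetical helper names, timestamps are assumed to be in seconds, and the sample latencies are made up to show why tail percentiles matter under load.

```python
import math
import time

def data_latency_ms(event_created_at, queryable_at):
    """Milliseconds between an event being generated and becoming queryable."""
    return (queryable_at - event_created_at) * 1000.0

def timed_query(run_query, *args):
    """Run a query callable and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = run_query(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms

def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Made-up query latencies in milliseconds; two slow outliers dominate the tail.
query_latencies_ms = [12, 15, 11, 48, 14, 13, 16, 12, 95, 14]
p50 = percentile(query_latencies_ms, 50)
p95 = percentile(query_latencies_ms, 95)
print(f"p50={p50} ms, p95={p95} ms")
```

Tracking a high percentile alongside the median is a common choice because a recommendation feed is only as responsive as its slowest typical request.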
With these considerations in mind, what makes Rockset an ideal complement to Confluent Cloud for real-time AI? Rockset offers vector search capabilities that make it possible to use streaming data as input to semantic search and generative AI. Rockset users implement ML applications such as real-time personalization and chatbots today, and while vector search is a necessary component, it is by no means sufficient.
Beyond support for vectors, Rockset retains the core performance characteristics of a real-time analytics database, providing a solution to some of the hardest challenges of running real-time AI at scale:
Real-time updates are what enable low data latency, so that ML models can use the most up-to-date embeddings and metadata. Data freshness is typically an issue because most analytical databases do not handle incremental updates efficiently, often requiring batching of writes or occasional reindexing. Rockset supports efficient upserts because it is mutable at the field level, making it well-suited to ingesting streaming data, CDC from operational databases, and other constantly changing data.
Metadata filtering is a useful, perhaps even essential, companion to vector search that restricts nearest-neighbor matches based on specific criteria. Commonly used strategies, such as pre-filtering and post-filtering, have their respective drawbacks. In contrast, Rockset’s Converged Index accelerates many types of queries, regardless of the query pattern or shape of the data, so vector search and filtering can run efficiently in combination on Rockset.
Rockset’s cloud architecture, with compute-compute separation, also enables streaming ingest to be isolated from queries along with seamless concurrency scaling, without replicating or moving data.
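The drawbacks of pre-filtering and post-filtering mentioned above are easy to see in a toy example. The sketch below is a brute-force, in-memory illustration and says nothing about how Rockset's Converged Index works internally; the function names and the sample livestream rows are hypothetical. Post-filtering can return fewer than k results when the nearest neighbors fail the filter, while pre-filtering always fills the result set but, in this naive form, has to score every surviving row.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def post_filter(query_vec, rows, predicate, k):
    """Take the k nearest neighbors first, then drop rows that fail the filter."""
    nearest = sorted(
        rows, key=lambda r: cosine(query_vec, r["embedding"]), reverse=True
    )[:k]
    return [r for r in nearest if predicate(r)]

def pre_filter(query_vec, rows, predicate, k):
    """Apply the filter first, then rank only the surviving rows."""
    candidates = [r for r in rows if predicate(r)]
    return sorted(
        candidates, key=lambda r: cosine(query_vec, r["embedding"]), reverse=True
    )[:k]

def is_live(row):
    return row["status"] == "live"

# Hypothetical livestream rows with toy 2-dimensional embeddings.
livestreams = [
    {"id": "a", "status": "ended", "embedding": [1.0, 0.0]},
    {"id": "b", "status": "ended", "embedding": [0.9, 0.1]},
    {"id": "c", "status": "live", "embedding": [0.7, 0.3]},
    {"id": "d", "status": "live", "embedding": [0.2, 0.8]},
]
query = [1.0, 0.0]

post_ids = [r["id"] for r in post_filter(query, livestreams, is_live, k=2)]
pre_ids = [r["id"] for r in pre_filter(query, livestreams, is_live, k=2)]
print("post-filter:", post_ids)
print("pre-filter:", pre_ids)
```

Here the two closest livestreams have already ended, so post-filtering returns nothing even though two live shows exist, which is exactly the failure a live-commerce feed cannot afford.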
Let’s dig deeper into Whatnot’s story featuring both products.
Whatnot is a fast-growing e-commerce startup innovating in the livestream shopping market, which is estimated to reach $32B in the US in 2023 and double over the next 3 years. They’ve built a live-video marketplace for collectors, fashion enthusiasts, and superfans that allows sellers to go live and sell products directly to buyers through their video auction platform.
Whatnot’s success depends on effectively connecting buyers and sellers through its auction platform for a positive experience. It gathers intent signals in real time from its audience: the videos they watch, the comments and social interactions they leave, and the products they buy. Whatnot uses this data in its ML models to rank the most popular and relevant videos, which it then presents to users in the Whatnot product home feed.
To further drive growth, they needed to personalize their suggestions in real time to ensure users see interesting and relevant content. This evolution of their personalization engine required significant use of streaming data and buyer and seller embeddings, as well as the ability to deliver sub-second analytical queries across sources. With plans to grow usage 4x in a year, Whatnot required a real-time architecture that could scale efficiently with their business.
Whatnot uses Confluent as the backbone of their real-time stack, where streaming data from multiple backend services is centralized and processed before being consumed by downstream analytical and ML applications. After evaluating various Kafka solutions, Whatnot chose Confluent Cloud for its low management overhead, ability to use Terraform to manage its infrastructure, ease of integration with other systems, and robust support.
The need for high performance, efficiency, and developer productivity is why Whatnot selected Rockset for its serving infrastructure. Whatnot’s previous data stack, including AWS-hosted Elasticsearch for retrieval and ranking of features, required time-consuming index updates and builds to handle constant upserts to existing tables and the introduction of new signals. In the current real-time stack, Rockset indexes all ingested data without manual intervention and stores and serves events, features, and embeddings used by Whatnot’s recommendation service, which runs vector search queries with metadata filtering on Rockset. That frees up developer time and ensures users have an engaging experience, whether buying or selling.
With Rockset’s real-time update and indexing capabilities, Whatnot achieved the data and query latency needed to power real-time home feed recommendations.
“Rockset delivered true real-time ingestion and queries, with sub-50 millisecond end-to-end latency…at much lower operational effort and cost,” said Emmanuel Fuentes, head of machine learning and data platforms at Whatnot.
Confluent and Rockset are helping more and more customers deliver on the potential of real-time AI on streaming data with a joint solution that’s easy to use yet performs well at scale.
If you’re looking for the most efficient end-to-end solution for real-time AI and analytics without any compromises on performance or usability, we hope you’ll give both Confluent Cloud and Rockset a try.