
Jun 23, 2026 | Read Time: 19 min

Real-Time Hyper-Personalization in 2026: Architecture Guide

Written By

Manveer Chawla


Executive Summary

Hyper-personalization in 2026 is the ability to act on a user's current intent within the current session, using signals from across the journey. Batch customer data platforms (CDPs) can't do this. They can't capture intent as it forms, can't hold session state, and can't activate inside the intent window.

A streaming-native real-time data engine, on the other hand, can capture every event, hold session state, join live operational data, enable decisions in flight, and activate on whatever cadence the use case demands.

Latency is a property of that intent moment, not a universal target. Real-time bidding (RTB) has a sub-100 ms budget. Web re-rank gets 200–500 ms. Push triggers run on second-to-minute cadences. Email and cross-channel orchestration play out over hour-to-day windows. A single streaming-native data engine handles all of them; only the budget tightens or relaxes.

That architecture has four jobs: connect, stream, process, and govern. An AI-native layer sits on top to support agents, generative inference, and contextual retrieval. CDPs, recommendation tools, and feature stores are complements that sit above the data engine and its AI-native layer; their effectiveness is bounded by how fresh those layers keep their inputs.

When you evaluate a real-time data engine, score it on five capability areas (streaming, connectors, processing, governance, and AI primitives) rather than on a generic checklist of headline latency and total cost of ownership (TCO). A unified vendor avoids the integration debt and silent data-sync failures that compound across a fragmented stack.

This guide is written for digital product managers in retail, media, and consumer apps who are moving from segment-based marketing to event-triggered personalization. The concrete examples use Confluent's stack (Apache Kafka® powered by the Kora engine, Apache Flink®, Tableflow delivering streaming data as Apache Iceberg™ and Delta Lake tables, and Confluent Intelligence), but the patterns apply to any streaming-native foundation.

Why Batch CDPs Fail at Real-Time Hyper-Personalization

Intent has a half-life. Every click, view, watch completion, and abandonment is a signal of current intent that decays in seconds to minutes, and batch architectures don't act on that signal until long after it has decayed. Nightly cohort recomputes, batch-loaded search indexes, and weekly lookalike refreshes all process yesterday's data. Web performance research consistently links faster page response to conversion lift, but the deeper structural problem is that by the time a batch sync completes, the intent moment is gone.

The cart abandonment example illustrates the failure mode. A user adds an item, closes the browser, and receives a recovery email hours later. With global cart abandonment rates remaining stubbornly high, delayed outreach consistently underperforms immediate, in-session interventions. Every minute of recovery latency lowers the odds of winning the sale back, and a batch architecture cannot shrink that delay below its sync interval.

Tighter batch syncs don't change the math. Any batch mechanism (scheduled CDP exports, warehouse pushes, reverse-ETL jobs) runs on intervals rather than events, and the activation APIs it ultimately hits enforce strict payload caps and rate limits. HTTP-based activation APIs reject oversized payloads with HTTP 400 (Bad Request) errors and answer throughput spikes during critical events such as holiday sales with HTTP 429 (Too Many Requests). The cumulative latency of the sync interval, the API call, and any retry queueing puts intent windows shorter than a few minutes out of reach.
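To make that arithmetic concrete, here is a minimal Python sketch of worst-case activation latency for a single event under a batch sync. Every number in it (a 15-minute sync interval, 0.5 s per API call, four rate-limited retries with exponential backoff, a push window of about five minutes) is a hypothetical assumption, not a benchmark, and the function names are illustrative.

```python
def backoff_delays(retries: int, base_s: float = 1.0, cap_s: float = 60.0) -> list[float]:
    """Exponential backoff waits (no jitter) after successive HTTP 429 responses."""
    return [min(cap_s, base_s * 2**attempt) for attempt in range(retries)]


def worst_case_latency_s(sync_interval_s: float, api_call_s: float, retries: int) -> float:
    """Worst case for one event: it arrives just after a sync cutoff, waits a full
    interval, then pays for the first API call plus every backoff wait and re-send."""
    calls_s = api_call_s * (retries + 1)
    return sync_interval_s + calls_s + sum(backoff_delays(retries))


# Hypothetical inputs: 15-minute sync, 0.5 s per call, 4 rate-limited retries.
latency_s = worst_case_latency_s(sync_interval_s=15 * 60, api_call_s=0.5, retries=4)
push_window_s = 5 * 60  # assumed push intent window of about five minutes
print(f"worst case: {latency_s:.1f} s; fits push window: {latency_s <= push_window_s}")
```

With these assumptions the worst case is 917.5 s. The sync interval alone already exceeds the assumed window, so tuning the API path or the retry policy cannot recover it.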

Real-Time Personalization Architecture: From Event to Experience

A real-time data engine for hyper-personalization has four jobs: connect to the operational systems that hold customer profiles, inventory, and user actions; stream every event the moment it happens; process the data in flight to update session state and make decisions; and govern every step so the decision is correct, auditable, and compliant. An AI-native layer sits above the four for agentic and generative workloads. The intent moment determines the end-to-end budget for those four jobs; the jobs themselves do not change.

Intent Moments and Their Windows

Hyper-personalization runs across a spectrum of intent moments. The architecture is the same across all of them; only the budget tightens or relaxes.

Intent moment	Window	Examples
Real-time bidding (RTB)	Sub-100 ms	Programmatic ad auctions
Web re-rank	200–500 ms	Product carousels, search re-rank, feed reorder before the next page paint
Push and in-app	Seconds to minutes	Cart abandonment push, geofence trigger, in-app nudge
Email and cross-channel	Hours to days	Win-back campaigns, lifecycle marketing, multi-touch journeys

The next subsections walk through the four jobs in three stages, with streaming and connecting handled together in Stage 1. Where the budget matters, each stage calls out its share of the RTB and web personalization budgets inline, since those are the cases where every millisecond is contested.

Stage 1: Stream and Connect

A personalized experience starts the millisecond a user interacts with your product, whether that's a product view, a cart addition, or a media pause. The event must be captured and routed into a durable streaming backbone immediately, and at the same instant, the system needs to pull current operational state from the systems of record. Stage 1 does both concurrently, within a tight budget: 15–35 ms in the RTB profile (assuming co-located infrastructure) and 50–100 ms in web personalization.

The streaming substrate has to absorb peak-event spikes without manual capacity work and migrate cleanly off existing open source Apache Kafka clusters. The Kora engine is one option that delivers this with Kafka-protocol compatibility, so existing producers and consumers carry over unchanged. For zero-downtime migration off open source Kafka, Cluster Linking handles the underlying real-time replication, and Kafka Copy Paste (KCP) automates the migration workflow end-to-end.

Personalization also requires context. A clickstream event is useless without instant knowledge of the user's loyalty status or the current inventory level of the clicked item. Confluent ships more than 120 pre-built connectors, more than 80 of them fully managed, including PostgreSQL Debezium, Oracle change data capture (CDC), Oracle XStream, Snowflake, and Amazon S3. They pull operational data from systems of record into the same Kafka topics the rest of the platform reads from. Each managed connector can save an estimated three to six engineering months compared with building and stabilizing your own.
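As a rough illustration of what a CDC connector feeds downstream, the sketch below applies simplified Debezium-style change events (the `op`, `before`, and `after` fields of Debezium's change-event envelope, with the schema wrapper and source metadata omitted) to an in-memory profile table, then enriches a click with the user's loyalty tier. The `loyalty_tier` column, the key names, and both helper functions are hypothetical; in production the materialized table would live in Flink state or a downstream store, not a Python dict.

```python
def apply_change(table: dict, event: dict, key: str = "id") -> None:
    """Apply one Debezium-style change event to a materialized table.

    op codes handled: "c" = create, "u" = update, "r" = snapshot read, "d" = delete.
    """
    op = event["op"]
    if op in ("c", "u", "r"):
        row = event["after"]
        table[row[key]] = row
    elif op == "d":
        table.pop(event["before"][key], None)
    else:
        raise ValueError(f"unsupported op: {op!r}")


def enrich(click: dict, profiles: dict) -> dict:
    """Attach the user's current loyalty tier to a clickstream event."""
    profile = profiles.get(click["user_id"], {})
    return {**click, "loyalty_tier": profile.get("loyalty_tier", "unknown")}


profiles: dict = {}
apply_change(profiles, {"op": "c", "before": None,
                        "after": {"id": "u1", "loyalty_tier": "silver"}})
apply_change(profiles, {"op": "u", "before": {"id": "u1", "loyalty_tier": "silver"},
                        "after": {"id": "u1", "loyalty_tier": "gold"}})
print(enrich({"user_id": "u1", "item_id": "sku-9"}, profiles))
```

Because the table is updated from the change stream rather than queried on demand, the enrichment step always sees the latest committed profile without a synchronous database read.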

Stage 2: Process

Once raw events and database changes stream concurrently, the system transforms them into a coherent decision. Instead of dumping data into a warehouse and querying it later, computation happens continuously on data while it's still in motion. This is the heaviest stage in the synchronous path. The RTB profile gives it 20–45 ms; web personalization, 100–250 ms.

Confluent Cloud for Apache Flink is the computational core. Engineering teams can write transforms in ANSI SQL for the common path, Python for custom chunking or hybrid retrieval logic, and Java when SQL isn't expressive enough. One engine spans the team's existing skill mix. Flink runs in-flight temporal joins that merge live session events with customer profiles and real-time inventory catalogs. Because Flink is stateful, it remembers the user's past actions within the session and calculates sliding-window aggregations (such as the number of times a user viewed a specific category in the last five minutes) without running a synchronous read against an external database. That capability is the lever that lets you re-rank the next page paint on in-session behavior alone.
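The sliding-window feature described above (views of one category by one user in the last five minutes) can be sketched in plain Python. Flink keeps this kind of state in managed, checkpointed keyed state; here a dictionary of deques stands in for it, and the sketch assumes events arrive in event-time order (Stage 3 covers out-of-order arrival). The class and method names are illustrative.

```python
from collections import defaultdict, deque

WINDOW_MS = 5 * 60 * 1000  # five-minute sliding window


class CategoryViewCounter:
    """Per-(user, category) sliding-window view counts held in local state,
    so computing the feature needs no synchronous external database read."""

    def __init__(self, window_ms: int = WINDOW_MS):
        self.window_ms = window_ms
        self.views = defaultdict(deque)  # (user, category) -> view timestamps

    def on_view(self, user: str, category: str, ts_ms: int) -> int:
        q = self.views[(user, category)]
        q.append(ts_ms)
        # Evict views outside (ts_ms - window, ts_ms]; assumes in-order events.
        while q and q[0] <= ts_ms - self.window_ms:
            q.popleft()
        return len(q)


counter = CategoryViewCounter()
for ts in (0, 60_000, 200_000, 301_000):
    latest = counter.on_view("u1", "shoes", ts)
print(latest)
```

At 301,000 ms the view at 0 ms has aged out of the window, so the count is 3 rather than 4; a re-ranker could read this value directly on the next event.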

The recommendation engine also runs at this stage. ML_PREDICT and AI_COMPLETE, part of the Built-in ML Functions in Confluent Intelligence (covered in the AI-native layer section below), ship inside Confluent Cloud for Apache Flink and turn a SQL statement into an inference call without a separate Python worker tier. The four recommendation-engine patterns built on these functions are detailed in their own section below.

Stage 3: Govern

The shift from batch to continuous real-time processing introduces hidden complexities that, if unmanaged, will silently corrupt recommendation algorithms and violate compliance mandates. When data moves at gigabytes per second, out-of-order events, network retries, and schema mutations are inevitable. Governance protects your personalization architecture and gives you a defensible answer to the question "why did this user see this offer?" It runs alongside processing rather than after it, so the budget overhead stays low (5–10 ms in the RTB profile, 20–40 ms in web personalization).
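Summing the per-stage budgets quoted in Stages 1 through 3 shows how each profile fits its intent window. Treating governance as a sequential step is a conservative assumption, since governance runs alongside processing; the window ceilings come from the intent-moment table, and the dictionary names are illustrative.

```python
# Per-stage budgets in milliseconds, as (low, high), quoted in Stages 1-3.
STAGE_BUDGETS_MS = {
    "rtb": {"stream_connect": (15, 35), "process": (20, 45), "govern": (5, 10)},
    "web": {"stream_connect": (50, 100), "process": (100, 250), "govern": (20, 40)},
}
# Upper bound of each intent window, from the intent-moment table.
WINDOW_CEILING_MS = {"rtb": 100, "web": 500}


def total_budget_ms(profile: str) -> tuple[int, int]:
    """Sum stage budgets as if sequential (a conservative end-to-end bound)."""
    stages = STAGE_BUDGETS_MS[profile].values()
    return sum(lo for lo, _ in stages), sum(hi for _, hi in stages)


for profile, ceiling in WINDOW_CEILING_MS.items():
    low, high = total_budget_ms(profile)
    print(f"{profile}: {low}-{high} ms end to end, within {ceiling} ms: {high <= ceiling}")
```

For RTB the stages sum to 40 to 90 ms, under the sub-100 ms ceiling; for web personalization they sum to 170 to 390 ms, inside the 500 ms ceiling.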

Real-time processing requires a strict delineation between event-time and processing-time. If a mobile user loses signal and their clicks arrive at the server five minutes late, processing that data based on server receipt time will destroy the chronological sequence of their session and produce nonsensical recommendations. Flink manages out-of-order events through watermarking strategies that ensure temporal joins respect the actual time the user took the action.
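Here is a minimal, self-contained sketch of the bounded-out-of-orderness idea: the watermark trails the highest event time seen by a fixed delay, buffered events are released in event-time order once the watermark passes them, and anything already older than the watermark on arrival is flagged as late. It illustrates the concept rather than Flink's API (Flink configures this through WatermarkStrategy and also offers allowed lateness and side outputs for late data); the five-minute delay, the class name, and the event names are hypothetical.

```python
import heapq


class BoundedOutOfOrderness:
    """Event-time reordering buffer driven by a watermark.

    watermark = (max event time seen) - max_delay_ms
    """

    def __init__(self, max_delay_ms: int):
        self.max_delay_ms = max_delay_ms
        self.max_ts = None
        self.buffer = []  # min-heap of (event_ts, arrival_seq, payload)
        self.seq = 0
        self.late = []

    @property
    def watermark(self):
        return None if self.max_ts is None else self.max_ts - self.max_delay_ms

    def add(self, event_ts: int, payload: str) -> list[str]:
        wm = self.watermark
        if wm is not None and event_ts <= wm:
            self.late.append(payload)  # the watermark already passed this time
            return []
        self.max_ts = event_ts if self.max_ts is None else max(self.max_ts, event_ts)
        heapq.heappush(self.buffer, (event_ts, self.seq, payload))
        self.seq += 1
        released = []
        while self.buffer and self.buffer[0][0] <= self.watermark:
            released.append(heapq.heappop(self.buffer)[2])
        return released


buf = BoundedOutOfOrderness(max_delay_ms=5 * 60 * 1000)  # tolerate 5 minutes
arrivals = [  # (event_time_ms, action) in server-arrival order
    (1_000, "view:shoes"),
    (400_000, "view:boots"),
    (150_000, "cart:shoes"),   # sent while offline; late, but within tolerance
    (50_000, "view:socks"),    # older than the watermark on arrival
    (800_000, "purchase:boots"),
]
for ts, action in arrivals:
    print(action, "->", buf.add(ts, action))
print("late:", buf.late)
```

Note that "cart:shoes" arrives after "view:boots" but is released before it, restoring the order in which the user actually acted.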

Equally critical: exactly-once semantics. In a distributed network, failure recovery often involves replaying messages. Without exactly-once processing, a single purchase event might be counted twice, artificially inflating an item's popularity in your recommendation model or, worse, charging a customer's ledger incorrectly. The streaming backbone must guarantee that no matter how many times a system component fails and restarts, every event affects the results exactly once, even when it is physically replayed during recovery.
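The core idea behind checkpoint-based exactly-once state can be simulated in a few lines: the input position and the operator state are snapshotted together and restored together, so events replayed after a crash are counted once in the final state. This is a toy, single-process model (Flink takes distributed snapshots, and end-to-end exactly-once also needs a transactional or idempotent sink); the `run` function, its parameters, and the SKU names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    offset: int = 0                             # next input position to read
    counts: dict = field(default_factory=dict)  # operator state at that position


def run(events, crash_at=None, checkpoint_every=2, restore_state=True):
    """Count purchases per item, snapshotting (offset, state) together.

    On one simulated crash, input replays from the last checkpoint's offset.
    restore_state=True rolls the counts back to the same checkpoint (correct);
    restore_state=False keeps in-memory counts during replay (double counts).
    """
    ckpt, counts, pos, crashed = Checkpoint(), {}, 0, False
    while pos < len(events):
        if pos == crash_at and not crashed:
            crashed = True
            pos = ckpt.offset
            if restore_state:
                counts = dict(ckpt.counts)
            continue
        item = events[pos]
        counts[item] = counts.get(item, 0) + 1
        pos += 1
        if pos % checkpoint_every == 0:
            ckpt = Checkpoint(offset=pos, counts=dict(counts))
    return counts


events = ["sku-1", "sku-2", "sku-1", "sku-3", "sku-1"]
print(run(events, crash_at=3))                       # consistent restore
print(run(events, crash_at=3, restore_state=False))  # replay without rollback
```

In the second run, the purchase of "sku-1" at position 2 is processed before the crash and again during replay, which is exactly the popularity inflation the paragraph above warns about.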

Beyond mechanical correctness, privacy and data contracts must be enforced in motion. Stream Governance provides three components for this:

Schema Registry and Data Contracts enforce semantic rules at the broker, rejecting malformed events before they enter the pipeline. A profile-table column rename does not silently corrupt your recommendation features.
Stream Catalog organizes Kafka topics as discoverable data products with metadata tagging, search, and self-service access requests, so marketers and growth engineers can adopt trusted streams without bottlenecking on a central data engineering team.
Stream Lineage visualizes data flows from source to destination, letting you audit every AI agent's context source. It answers compliance questions such as "where did this retrieval-augmented generation (RAG) document come from, and what schema version produced its embedding?"
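To show the check-and-reject pattern that data contracts enable, here is a stdlib-only sketch that validates events against a hypothetical clickstream contract (required fields, exact types, and one semantic rule) and routes failures aside. It does not use or imitate the Schema Registry API, which works with registered Avro, Protobuf, or JSON Schema definitions; the field names and the rule are illustrative.

```python
# Hypothetical contract for a clickstream event: field name -> required type.
CLICK_CONTRACT = {"event_id": str, "user_id": str, "item_id": str, "event_ts_ms": int}


def validate(event: dict) -> list[str]:
    """Return contract violations; an empty list means the event conforms."""
    errors = []
    for name, expected in CLICK_CONTRACT.items():
        if name not in event:
            errors.append(f"missing field: {name}")
        elif type(event[name]) is not expected:  # exact type, so True is not an int
            errors.append(f"{name}: expected {expected.__name__}, got {type(event[name]).__name__}")
    # A semantic rule layered on top of the structural checks.
    ts = event.get("event_ts_ms")
    if type(ts) is int and ts <= 0:
        errors.append("event_ts_ms must be positive")
    return errors


def route(events: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split events into (accepted, rejected) so bad data never reaches features."""
    accepted, rejected = [], []
    for event in events:
        (rejected if validate(event) else accepted).append(event)
    return accepted, rejected


good = {"event_id": "e1", "user_id": "u1", "item_id": "sku-1", "event_ts_ms": 1_700_000_000_000}
renamed = {"event_id": "e2", "userId": "u1", "item_id": "sku-1", "event_ts_ms": 1_700_000_000_000}
print(validate(renamed))  # an upstream column rename surfaces as a violation
```

The renamed event is rejected with a clear reason instead of flowing through with a silently missing user ID.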

Layered on top, client-side field-level encryption (CSFLE) encrypts personally identifiable information (PII) at the producer the moment it's generated, keeps it encrypted throughout the streaming infrastructure, and lets only authorized applications decrypt it at the final destination. Sensitive fields never reach a downstream recommendation vendor in clear text.

The AI-Native Layer: Confluent Intelligence

The four jobs above keep the deterministic pipeline correct and current. Confluent Intelligence is the AI-native layer that sits on top of that pipeline, turning the streaming foundation into a runtime for agents, contextual retrieval, and generative inference. Three components live here:

Streaming Agents. Agents run as Flink jobs with always-on session state, tool calling via Model Context Protocol (MCP) and agent-to-agent (A2A) protocols, and replayable event flows. For personalization, this means an agent can interpret user intent, call ranking models, query live inventory, trigger downstream actions in applications and other agents, and adapt the recommendation set continuously, with every decision auditable through Stream Lineage.
Real-Time Context Engine. A managed service that serves structured user context to AI apps and downstream large language models (LLMs) over MCP, with built-in authentication, role-based access control (RBAC), and audit logging. It removes the need to build a separate context-aggregation tier in front of every model call.
Built-in ML Functions. Native Flink SQL functions for embedding, anomaly detection, fraud prevention, forecasting, and sentiment analysis. The ML_PREDICT and AI_COMPLETE calls referenced throughout this article are part of this surface.

Because all three run as Flink jobs against the same Kafka topics that feed the deterministic pipeline, agent decisions inherit exactly-once and lineage guarantees by default. The AI-native layer isn't a sidecar bolted onto the streaming engine; it is the streaming engine, surfaced for agentic and generative workloads.

Closing the Loop: Tableflow for Offline Training

The synchronous online decision path covers in-session experiences. The asynchronous half of a machine learning (ML) stack, offline training, runs on the same Kafka topics. Tableflow converts the live event topics directly into Apache Iceberg™ or Delta Lake tables, forming the bronze and silver layers of an analytics medallion stack. The same interaction data that your online models score against also feeds offline retraining, eliminating the duplicate ETL pipelines most teams maintain today. Tomorrow's model is built from the same events that powered today's decisions, then ships back into the live decision path.

Building a Real-Time Recommendation Engine on the Streaming Backbone

A real-time recommendation engine isn't a separate piece of infrastructure. It's the same streaming-native stack, specialized along four capabilities (vector search, ranking model inference, contextual bandits, and generative AI copy), all running inside Apache Flink against the live event stream.

Vector Search for Semantic Recommendations

Embeddings of product catalogs, content libraries, or user histories sit in a vector database (Pinecone, Weaviate, Milvus, or pgvector). Confluent Cloud for Apache Flink calls an embedding model with ML_PREDICT on the live event, computes the query embedding, and the vector database returns nearest neighbors. The whole loop fits inside the synchronous decision budget for most workloads (low hundreds of milliseconds for web personalization), with no separate embedding-worker tier to provision and scale.
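For intuition about what the vector database returns, here is brute-force cosine-similarity nearest-neighbor search over a toy catalog. The three-dimensional vectors and SKU names are made up, and the query vector stands in for the embedding an embedding model would produce for the live event; at real catalog scale, a vector database replaces this linear scan with an approximate index.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_neighbors(query: list[float], catalog: dict, k: int = 2) -> list[str]:
    """Exact top-k items by cosine similarity via a linear scan."""
    ranked = sorted(catalog, key=lambda item: cosine(query, catalog[item]), reverse=True)
    return ranked[:k]


catalog = {  # toy 3-dimensional "embeddings"
    "trail-runner": [0.9, 0.1, 0.0],
    "road-runner": [0.8, 0.3, 0.0],
    "wool-socks": [0.4, 0.4, 0.2],
    "rain-jacket": [0.1, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]  # stands in for the live event's embedding
print(nearest_neighbors(query, catalog, k=2))
```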

Ranking Model Inference Inside the Stream

Trained ranking models (XGBoost, LightGBM, neural rerankers) score candidates in flight. ML_PREDICT invokes a remote model endpoint or a custom function for self-hosted models, so the candidates retrieved from vector search arrive at the UI already ranked, with no separate Python inference cluster to provision and no extra service between the stream processor and the model.

Contextual Bandits for Exploration vs Exploitation

Bandit algorithms balance exploration (serving new or uncertain items) against exploitation (serving high-confidence winners). Flink's stateful operators hold the bandit's memory (per-user reward histories, sliding-window aggregations, and Thompson-sampling parameters) in checkpointed state. ML_PREDICT provides a single surface for invoking the bandit policy on each event; Flink guarantees exactly-once semantics for that state, and Stream Lineage tracks the decision flow.
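A Beta-Bernoulli Thompson sampler is small enough to sketch directly. Each arm keeps a pair of click and miss counts, which is the kind of per-key state Flink would hold in checkpointed state (keyed per user or segment); here a dictionary stands in for it. The arm names and click-through rates in the simulation are hypothetical.

```python
import random


class ThompsonBandit:
    """Beta-Bernoulli Thompson sampling over a fixed set of arms."""

    def __init__(self, arms, seed=None):
        # Beta(1, 1) prior per arm, stored as [alpha, beta] = [1 + clicks, 1 + misses].
        self.params = {arm: [1, 1] for arm in arms}
        self.rng = random.Random(seed)

    def choose(self):
        # Draw a plausible click-through rate for each arm; serve the highest draw.
        draws = {arm: self.rng.betavariate(a, b) for arm, (a, b) in self.params.items()}
        return max(draws, key=draws.get)

    def update(self, arm, clicked: bool) -> None:
        self.params[arm][0 if clicked else 1] += 1


# Simulated traffic with hypothetical true click-through rates.
true_ctr = {"new-arrival": 0.02, "bestseller": 0.10}
bandit = ThompsonBandit(true_ctr, seed=7)
env = random.Random(42)
served = {arm: 0 for arm in true_ctr}
for _ in range(2000):
    arm = bandit.choose()
    served[arm] += 1
    bandit.update(arm, env.random() < true_ctr[arm])
print(served)
```

Early on both arms get traffic because their posteriors are wide (exploration); as evidence accumulates, the higher-rate arm wins most draws (exploitation) without a hand-tuned exploration rate.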

Generative AI Copy and Creative

Large language model (LLM) calls for generative copy are the slowest inference type in this stack. Full responses from GPT, Claude, or other models served through Amazon Bedrock take seconds, and even distilled models take hundreds of milliseconds. They don't fit a synchronous in-session decision path for ranked feeds. Production systems run them in one of three patterns:

Pre-generation and cache. For high-traffic SKUs, content, or user segments, generate the copy ahead of time, key it by segment, item ID, and signals like device type or time of day, and look it up in under 10 ms. The streaming pipeline keeps the cache current as products and segments change.
Async overlay. Render the ranked carousel from the synchronous path, then overlay LLM-generated rationale or copy when it arrives, typically a few hundred milliseconds later.
Streaming-first response. For chat-style copilots and conversational assistants where the user is waiting on a single answer, partial output begins within 200–500 ms while the full response completes asynchronously.

In all three, AI_COMPLETE is the call surface; what differs is when it runs.
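The pre-generation pattern comes down to a keyed lookup with a cheap fallback. In the sketch below, generation (the AI_COMPLETE step) is assumed to happen offline and populate the cache; the key combines segment, item ID, device type, and a coarse time-of-day bucket, and a product-change event invalidates that item's entries so they can be regenerated. The bucket boundaries, fallback copy, and class name are hypothetical.

```python
from datetime import datetime


def daypart(ts: datetime) -> str:
    """Coarse time-of-day bucket used as part of the cache key."""
    if 5 <= ts.hour < 12:
        return "morning"
    if 12 <= ts.hour < 18:
        return "afternoon"
    return "evening"


class CopyCache:
    def __init__(self, fallback: str = "Recommended for you"):
        self.store: dict[tuple, str] = {}
        self.fallback = fallback

    @staticmethod
    def key(segment: str, item_id: str, device: str, ts: datetime) -> tuple:
        return (segment, item_id, device, daypart(ts))

    def put(self, segment: str, item_id: str, device: str, ts: datetime, copy: str) -> None:
        self.store[self.key(segment, item_id, device, ts)] = copy

    def get(self, segment: str, item_id: str, device: str, ts: datetime) -> str:
        # A miss returns static copy instead of blocking on a live LLM call.
        return self.store.get(self.key(segment, item_id, device, ts), self.fallback)

    def invalidate_item(self, item_id: str) -> None:
        # Triggered by a product-change event so stale copy gets regenerated.
        for k in [k for k in self.store if k[1] == item_id]:
            del self.store[k]


cache = CopyCache()
cache.put("runners", "sku-1", "mobile", datetime(2026, 6, 23, 9, 30), "Your next long run starts here")
print(cache.get("runners", "sku-1", "mobile", datetime(2026, 6, 23, 11, 0)))
```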

The point of routing all four capabilities through Flink is consolidation: no separate inference cluster, no separate bandit service, no separate LLM gateway. Fewer moving parts let the synchronous decision path stay tight and keep the asynchronous generative path coherent with the rest of the pipeline.

Three Real-Time Personalization Blueprints

Three customer-validated patterns (retail product recommendations, media feed personalization, and cross-channel orchestration) show what the architecture above looks like in production.

Retail Blueprint: Inventory-Aware Product Recommendations in Real Time

The intent moment in e-commerce is the user's current browsing session, with clicks and category visits accumulating across minutes. The personalization decision needs to render before the next page paint, a 200–500 ms server-side budget for most web flows. Recommending an item that has just gone out of stock destroys trust within that same window. Batch architectures struggle because their search indexes and recommendation models are often hours behind actual warehouse inventory.

The fix is to execute in-flight temporal joins between the live clickstream and real-time inventory CDC feeds. When a user navigates to a category page, the streaming engine cross-references their session context against live stock levels. If an item drops below a critical threshold, you either dynamically adjust pricing or suppress it from the recommendation carousel before the page renders.
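An event-time temporal join looks up each click against the version of the inventory row that was valid when the click happened. The sketch below models that versioned table in memory and suppresses candidates below a stock threshold. It is illustrative rather than Flink SQL (where this would be a `FOR SYSTEM_TIME AS OF` join against a versioned table); the threshold, the SKUs, and the choice to treat unknown items as unavailable are assumptions.

```python
import bisect
from collections import defaultdict


class VersionedInventory:
    """Per-item stock levels as versions: (valid_from_ms, stock) rows from CDC."""

    def __init__(self):
        self.valid_from = defaultdict(list)
        self.levels = defaultdict(list)

    def apply_change(self, item_id: str, ts_ms: int, stock: int) -> None:
        # Assumes changes for a given item arrive in timestamp order.
        self.valid_from[item_id].append(ts_ms)
        self.levels[item_id].append(stock)

    def as_of(self, item_id: str, ts_ms: int):
        """Stock from the latest version with valid_from <= ts_ms, or None."""
        i = bisect.bisect_right(self.valid_from[item_id], ts_ms) - 1
        return self.levels[item_id][i] if i >= 0 else None


def filter_in_stock(candidates, inventory, click_ts_ms, min_stock=1):
    """Suppress candidates whose stock at the click's event time is below min_stock."""
    kept = []
    for item in candidates:
        level = inventory.as_of(item, click_ts_ms)
        if level is not None and level >= min_stock:
            kept.append(item)
    return kept


inventory = VersionedInventory()
inventory.apply_change("boots", 0, 12)
inventory.apply_change("boots", 5_000, 0)  # sells out at t = 5 s
inventory.apply_change("sandals", 0, 3)
print(filter_in_stock(["boots", "sandals", "hat"], inventory, click_ts_ms=4_000))
print(filter_in_stock(["boots", "sandals", "hat"], inventory, click_ts_ms=6_000))
```

A click at 4 s still sees the boots in stock; a click at 6 s does not, so the carousel never shows an item that was already sold out when the user acted.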

Instacart runs the same shape of architecture across 59,000 U.S. retail locations. With Confluent powering the streaming layer, they scaled their data pipelines to handle 10 years of growth in six weeks, added 500,000 new customers in weeks rather than quarters, and freed engineering capacity for product work instead of Kafka operations. As an Instacart engineer put it: "Things like spinning up Kafka clusters and getting prototypes up and running very quickly, Confluent has been really helpful there. Those sort of exercises might take a long time for my team to do if we were doing this on vanilla open source Kafka. With Confluent, we can turn that around very quickly."

Media Blueprint: Real-Time Feed Personalization With Streaming and Embeddings

In digital media, watch-completion events are among the highest-intent signals: the moment a user finishes content and decides what to watch next. The feed has a short window, from when the credits roll to the user's next interaction (seconds to minutes), in which to surface a recommendation. If a user finishes watching a thriller, recommending another thriller an hour later is useless. The signal needs to be in the model before the next paint.

The fix is continuous in-session feed re-ranking on watch-completion events and semantic embeddings. As the user scrolls and interacts, Flink processes each event and queries a vector database for semantically similar content.

Notion uses the Confluent Data Streaming Platform to keep its AI search and content discovery tools fed with up-to-the-second context, the exact freshness this blueprint depends on. As a Notion engineer noted:

"Confluent's platform allows us to stream changes as they happen, ensuring that our AI tools always provide the most relevant and timely information."

Orchestration Blueprint: In-App and Cross-Channel Triggers in Seconds

The intent moment for cart abandonment is broader than a single render window. Push notifications are most effective within seconds to a few minutes of abandonment, while the user's phone is still in their hand. Email plays on hour-to-day cadences. Retargeting ads run on multi-day windows. The streaming engine's job is to detect the abandonment event the instant it happens and trigger the right intervention on the right channel at the right time.

A unified event stream lets the streaming engine fire multi-touchpoint interventions on a single signal. If a user builds a cart on the web and closes the browser, the engine detects the session termination event instantly, evaluates notification preferences, and fires a personalized push to their mobile device with a limited-time incentive.
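A minimal sketch of the detection step: per-user session state tracks the cart, an explicit session-end event or an idle timeout triggers the check, and the user's preferences pick the channel. The 30-minute idle timeout, the email fallback for users without push opt-in, and all names are hypothetical; in Flink, the timeout would typically be a timer on keyed state rather than a `tick` method.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    cart: set = field(default_factory=set)
    last_seen_ms: int = 0
    notified: bool = False


class AbandonmentDetector:
    """Per-user session state that fires one recovery action per abandoned cart."""

    def __init__(self, push_opt_in: dict, idle_timeout_ms: int = 30 * 60 * 1000):
        self.push_opt_in = push_opt_in  # user -> accepts push notifications?
        self.idle_timeout_ms = idle_timeout_ms
        self.sessions: dict[str, Session] = {}
        self.outbox: list[tuple] = []  # (user, channel, items) to send downstream

    def on_event(self, user: str, kind: str, ts_ms: int, item=None) -> None:
        s = self.sessions.setdefault(user, Session())
        s.last_seen_ms = ts_ms
        if kind == "add_to_cart":
            s.cart.add(item)
            s.notified = False
        elif kind == "checkout":
            s.cart.clear()
        elif kind == "session_end":
            self._maybe_trigger(user, s)

    def tick(self, now_ms: int) -> None:
        """Idle-timeout fallback for sessions that end without an explicit event."""
        for user, s in self.sessions.items():
            if now_ms - s.last_seen_ms >= self.idle_timeout_ms:
                self._maybe_trigger(user, s)

    def _maybe_trigger(self, user: str, s: Session) -> None:
        if s.cart and not s.notified:
            channel = "push" if self.push_opt_in.get(user) else "email"
            self.outbox.append((user, channel, sorted(s.cart)))
            s.notified = True


detector = AbandonmentDetector(push_opt_in={"u1": True, "u2": False})
detector.on_event("u1", "add_to_cart", 0, "boots")
detector.on_event("u1", "session_end", 1_000)   # explicit end: fires immediately
detector.on_event("u2", "add_to_cart", 0, "hat")
detector.tick(31 * 60 * 1000)                    # u2 idles past the timeout
print(detector.outbox)
```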

Cross-channel orchestration at this speed is only possible when the data foundation moves fast enough to keep up with customer behavior.

Healthcare technology vendor Henry Schein One frames the underlying constraint that orchestration depends on:

"Everyone wants AI, but the hard part is getting high-quality data moving in real time. The Confluent data streaming platform makes that possible for us. It's the foundation that gets our data moving and gets it where it needs to be."

The Vendor Categories That Stack on Top of the Streaming Foundation

A personalization stack has five vendor layers. The real-time data engine sits at the foundation; an AI context layer, CDPs, recommendation tools, and feature stores all sit above it. Each upper layer reads from the data engine, so understanding what each layer owns clarifies where to invest and where to let the foundation do the work.

Layer 1: Real-Time Data Engine (Streaming Ingestion, Processing, and Governance)

The data engine handles high-throughput ingestion, stateful stream processing, and strict data governance. Several vendors compete here with different scopes. Confluent ships a unified streaming platform with an AI layer. Snowplow specializes in event collection and behavioral data pipelines. Tinybird, Materialize, and RisingWave focus on streaming SQL databases for real-time analytics. ClickHouse is commonly used alongside this layer as a real-time analytical database, especially when teams want to query streamed event data for high-concurrency dashboards, product analytics, and join-heavy investigative workloads. Redpanda provides a high-performance Kafka-compatible broker without a built-in stream processor. AWS MSK, Glue, and Lambda offer a hyperscaler-native broker with bolt-on processing.

If you need ingestion, processing, and governance under a single operating model, a unified vendor such as Confluent can offer lower TCO and faster time to production than assembling the pieces yourself. The others play focused roles in fragmented stacks and need integration work to cover the full pipeline.

Layer 2: Real-Time Context for AI Apps and Agents

AI applications, copilots, and agents need fresh, structured user context at inference time: current session state, profile attributes, recent behavior, entitlements, and policy. Building that context-aggregation tier in front of every model call is one of the most expensive parts of putting agents into production. Confluent Intelligence's Real-Time Context Engine is the managed-service answer: it serves structured context over the Model Context Protocol (MCP) with built-in authentication, RBAC, and audit logging, and integrates natively with LangChain, Amazon Bedrock, Salesforce Agentforce, and Anthropic Claude. Adjacent products in this layer include vector databases used for embedding-based retrieval (Pinecone, Weaviate, Milvus) and emerging LLM-orchestration services that route prompts and assemble context across multiple sources.

Layer 3: Real-Time Activation (CDPs and Event-Driven Channels)

The activation layer turns data into action: emails, push notifications, ad audiences, in-app messages, and downstream workflows. Two patterns share this layer. CDPs like Segment, Hightouch, Amperity, and mParticle handle profile-based activation: assemble a unified user profile, push audiences to marketing APIs, and manage syndication, suppression rules, frequency capping, and cross-channel identity. Event-driven activation routes individual events from the streaming engine directly to applications and agents. Confluent's sink connectors deliver to webhooks and application APIs, and Streaming Agents (introduced in the architecture section) trigger actions in apps and other agents the moment the event lands.

Layer 4: Recommendation and Personalization Tools (Ranking and Merchandising)

Recommendation and personalization tools decide what the user sees on screen. Algolia, Bloomreach, Dynamic Yield, and Adobe Target ingest the real-time context that the data engine pushes to them and run final re-ranking, merchandising rules, and UI rendering. They own the A/B testing surface, visual merchandising, and contextual bandit execution at the presentation layer.

Layer 5: Feature Stores for Online Inference and Offline Training

Feature stores bridge online inference and offline training. Tecton and Feast maintain the same feature transformations across both: the transformation applied to historical data in the lakehouse is the same as the one applied to live events at inference time. They materialize stream-processed aggregates into low-latency databases so recommendation models can fetch features in single-digit milliseconds.
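The feature-store principle of one transformation for both training and serving can be shown with a single feature function applied to an offline log and to a live buffer. The feature, event shape, and timestamps are hypothetical, and the sketch does not use Tecton's or Feast's APIs; it only demonstrates that point-in-time computation with shared code yields the same value in both paths.

```python
from datetime import datetime, timedelta


def views_last_hour(events: list[dict], as_of: datetime) -> int:
    """Shared feature definition: product views in the hour before `as_of`."""
    start = as_of - timedelta(hours=1)
    return sum(1 for e in events if e["type"] == "view" and start <= e["ts"] < as_of)


# Hypothetical event log for one user.
history = [
    {"type": "view", "ts": datetime(2026, 6, 23, 9, 10)},
    {"type": "view", "ts": datetime(2026, 6, 23, 9, 40)},
    {"type": "cart", "ts": datetime(2026, 6, 23, 9, 45)},
    {"type": "view", "ts": datetime(2026, 6, 23, 10, 5)},
]
request_time = datetime(2026, 6, 23, 10, 0)

# Offline (training): point-in-time value from the full log, which also
# contains events that happened after request_time.
offline_value = views_last_hour(history, as_of=request_time)

# Online (serving): at request_time the live buffer holds only events seen so far.
live_buffer = [e for e in history if e["ts"] <= request_time]
online_value = views_last_hour(live_buffer, as_of=request_time)

print(offline_value, online_value)
```

Had training and serving each reimplemented the window logic, a small difference such as an inclusive versus exclusive bound could make the two values disagree; sharing one definition removes that class of training-serving skew.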

How to Evaluate Your Personalization Stack and Build a Real-Time Roadmap

Moving from batch to streaming-native personalization requires two things: a clear evaluation framework for the vendor decision and a phased rollout roadmap.

Evaluation Criteria for a Real-Time Data Engine

Generic vendor checklists (latency, integrations, TCO) miss what actually matters for hyper-personalization. Score real-time data engines on five capability areas:

Streaming. Verifiable throughput in the GBps range and p99 tail-latency benchmarks under sustained load, backed by a 99.99% uptime service-level agreement (SLA), with Kafka-protocol compatibility so existing producers and consumers carry over. Averages hide tail behavior; demand p99 numbers measured during peak.
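A quick worked example of why averages hide tail behavior: with hypothetical latencies where 2% of requests hit a slow path, the mean stays low while p99 exposes the stall. The percentile uses the nearest-rank method; benchmarking tools may interpolate differently.

```python
import math


def percentile_nearest_rank(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies (ms) for 1,000 requests during a peak: 98% are fast,
# 2% hit a slow path such as a rebalance or a GC pause.
latencies = [8] * 980 + [400] * 20
mean = sum(latencies) / len(latencies)
p99 = percentile_nearest_rank(latencies, 99)
print(f"mean={mean:.2f} ms, p99={p99} ms")
```

The mean works out to 15.84 ms, which looks comfortably inside an RTB budget, while p99 is 400 ms, which means one request in a hundred misses the auction entirely.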
Connectors. Breadth of fully managed CDC and SaaS connectors, especially for the operational systems that feed personalization (transactional databases, inventory, profiles, email service providers (ESPs), and ad platforms). Each connector you have to build and maintain yourself is engineering capacity you aren't spending on AI features.
Stream processing. Serverless processing with stateful joins, exactly-once semantics, and in-flight ML inference (ML_PREDICT and AI_COMPLETE or equivalents). Multi-language support (ANSI SQL, Python, Java) matters: SQL-only locks out custom retrieval logic, and Java-only locks out engineers who don't write Java.
Governance. Schema registry with data contracts enforced at the broker, a stream catalog that lets growth engineers self-serve trusted streams as data products, end-to-end lineage so you can answer "why did this user see this offer," and field-level encryption so PII never reaches downstream vendors in clear text.
AI primitives. First-class agent runtime (agents that run as Flink jobs, not bolt-on services), MCP-served context for LangChain, Bedrock, Agentforce, and Claude, and embedding-as-a-stream functions that eliminate a separate Python embedding tier.

A vendor that ships all five collapses the integration surface and operational footprint that point-tool stacks accumulate at scale.

Real-Time Personalization Maturity Model (Phases 1–3)

Phase 1: Single-Channel Session Re-Ranking (Prove Streaming-Native Basics)

Start by getting the event streaming backbone in place on one surface. The objective is not full coverage; it is proof. Capture real-time behavioral data on a single critical surface, such as the mobile homepage or checkout flow.

Reference implementation on Confluent: Kora-powered Kafka topics for clickstream ingestion, three managed connectors (PostgreSQL Debezium for the user profile table, your inventory CDC source, and a Snowflake connector to fold in offline customer-profile context), and Confluent Cloud for Apache Flink with simple sliding-window aggregates in ANSI SQL. Success in Phase 1 means re-ranking a single product feed entirely on in-session behavior, with no nightly batch job involved.

Phase 2: Cross-Channel Orchestration With Unified Event Streams and an AI-Native Layer

Once the streaming backbone is proven, shift focus to unification and the AI layer. This phase integrates real-time CDPs, unifies offline historical context with the live event stream, and introduces the AI-native primitives that Phase 3 leans on.

Reference implementation on Confluent:

A wider connector portfolio (CDC for transactional and inventory systems, SaaS connectors for ESPs and ad platforms).
The full Stream Governance (Schema Registry, Data Contracts, Stream Catalog, Stream Lineage) so growth engineers can self-serve trusted streams as data products.
Tableflow to extend the same Kafka topics to Apache Iceberg or Delta Lake for offline ML training, eliminating duplicate ETL pipelines.
The full Confluent Intelligence layer (introduced earlier), now deployed end-to-end: Streaming Agents running as Flink jobs with replayable, governed event flows and the same exactly-once and lineage guarantees as the rest of the pipeline; the Real-Time Context Engine serving structured user context over MCP to LangChain, Amazon Bedrock, Salesforce Agentforce, and Anthropic Claude; and Built-in ML Functions that can invoke remote AI/ML models or custom self-hosted ones.

Success in Phase 2 means triggering a personalized action in a secondary channel in response to an event that occurred milliseconds earlier in the primary channel, with agentic primitives in place for Phase 3.

Phase 3: Agentic Personalization With Streaming Agents and LLMs

The final maturity stage moves beyond deterministic rules into generative and autonomous experiences. Streaming Agents from Phase 2 now orchestrate end-to-end personalization journeys, calling generative models via AI_COMPLETE for adaptive copy and creative, and the Real-Time Context Engine supplies the freshest user context to those calls.

Reference flow on Confluent: A Streaming Agent ingests customer interactions in real time, pulls context from purchase history, clickstream, and inventory through Flink temporal joins, has an LLM interpret intent, and adapts the recommendation set continuously. Every step runs as a Flink job inside the same governed pipeline, so the agent's decisions inherit Stream Lineage and exactly-once semantics by default.

Success in Phase 3 means content, copy, and UI all adapt to the user's real-time intent, generated and ranked by agents that share the same lineage and exactly-once guarantees as the rest of the pipeline. The same principle holds across industries: agents are only as good as the data they run on.

The Palmerston North City Council, a public-sector adopter, frames the dependency clearly:

"Good AI needs good data. Confluent is our trusted source of truth. The data streaming platform provides context and orchestration for our AI agents to automate workflows and accelerate our smart city transformation."

Conclusion

Your CDP, your recommendation vendor, and your feature store all do useful work, but each sits downstream of the real-time data engine. Pick that foundation right, and the rest of the stack gets faster, cheaper, and easier to reason about. Pick it wrong, and every system above it is constrained by what the foundation delivers: stale segments in your CDP, stale catalogs in your recommendation engine, stale features in your feature store. The streaming layer is not an implementation detail of the personalization stack; it is the foundation.

Next Steps

Explore Confluent's ML_PREDICT and AI_COMPLETE model-inference functions inside Confluent Cloud for Apache Flink to consolidate your embedding and ranking tiers into one engine.
Model your workload with the Confluent Cloud cost estimator to compare a unified data streaming platform against a fragmented build across Kafka, Flink, schema registry, lineage, and embedding-worker tiers.

Frequently Asked Questions

Why can't batch CDPs deliver true real-time personalization?

Batch CDPs can't capture intent as it forms. Their inputs lag minutes to hours, they can't hold session state across continuous behavior, and they can't fire activation inside the intent window. User intent changes faster than batch syncs can refresh.

How does stateful stream processing (such as Apache Flink) improve personalization?

Flink maintains session and user state continuously, enabling real-time joins and sliding-window features without synchronous database reads that add latency and bottlenecks.

What is "exactly-once" processing, and why does it matter for recommendations?

Exactly-once ensures events aren't double-counted during retries or failures. Without it, purchases and clicks can be duplicated and skew ranking, attribution, and even billing or ledger outcomes.

How do you handle late or out-of-order events in real time?

Use event-time processing with watermarks so joins and aggregations reflect when the user action occurred, not when the server received the event.

What data should be streamed for hyper-personalization?

Clickstream and session events, plus CDC from transactional systems (profiles, inventory, pricing, entitlements), so decisions are made with current behavioral and operational context.

Do I still need a CDP if I have a real-time data engine?

Often yes. CDPs are strong activation layers, but they depend on the streaming foundation to supply fresh, unified profiles and context.

What should I measure when evaluating a real-time personalization platform?

Prioritize p99 latency under load, operational overhead (managed/serverless), connector ecosystem, and end-to-end governance (schema, contracts, encryption, exactly-once).

Manveer Chawla is the co-founder of Zenith AI, where he helps technical companies optimize for AI search and answer engines. He was previously a Director of Engineering at Confluent leading the Kafka Storage organization and held engineering leadership roles at Dropbox and Facebook.
