Real-Time Data Streaming for Smart Warehouses

In the bustling world of retail, the concept of a smart warehouse is a game-changer. Picture this: automated systems seamlessly managing inventory counts and fulfillment orders, monitoring shelf weights, and orchestrating robot fleets for lightning-fast retrievals. Welcome to smart warehousing, where companies are revolutionizing traditional warehousing practices through cutting-edge technologies. According to Gartner, “By 2027, more than 75% of companies will have adopted some form of cyber-physical automation in their warehouse operations.”

At the heart of this transformation is a convergence of IoT, robotics, cloud-native tools and platforms, machine learning, and data streaming, all aimed at enhancing warehouse efficiency and accuracy. Real-time data powers innovations ranging from live dashboards and IoT sensors that monitor humidity and temperature to algorithms that dictate optimal shelf stocking and automated reporting and fulfillment. These innovations not only streamline operations and cut costs but also offer a competitive edge, particularly in the era of same-day (and same-hour) delivery expectations.

Successfully implementing data streaming for smart warehouses brings significant day-to-day benefits to global retailers that combine e-commerce with brick-and-mortar stores:

  • Machine learning optimizes time for forklifts to stock shelves, with algorithms for stocking heavy or popular items toward the front of the shelf to enable easier access and accelerated order fulfillment.

  • FIFO (first in, first out) warehousing for perishable items where older inventory units are sold first or in order of their expiration dates—preventing obsolescence and spoilage. 

  • Automation for greater efficiency and accuracy, with robots automating order picking and minimizing rate of error. (Gartner estimates that “96% of businesses they spoke to say they either are—or are planning to—look at robotics” and “over 90% of customers say they plan to increase the size of their fleet.”)

  • Time savings and productivity boost for operators who no longer have to perform menial tasks such as walking around warehouses to count shelf space or add up weights in a given area. Data streaming puts this information at their fingertips so they know exactly whether a shelf can handle a given weight and how much space remains in dry and refrigerated areas, for example.

  • Predictive maintenance for robots that run on batteries, preventing downtime and extending equipment lifespan.

Today’s business and technical challenges 

In reality, implementing a smart warehouse and ensuring that it operates smoothly end to end is no easy feat. Consider FIFO, which is highly challenging to do at scale for millions of SKUs with varying product sizes, shapes, weights, and storage requirements. Each new pallet is tagged with an RFID chip that gives it a unique identifier. Pallets are then scanned before leaving to determine whether an older pallet is still in the warehouse. Without access to real-time data, retailers that aim for FIFO instead find themselves with FILO (first in, last out), where older perishable or seasonal inventory leaves shelves later, resulting in warehouse inefficiencies and higher costs. Other business challenges include:

  • Lack of real-time inventory analysis and reporting, with unreliable status of products on shelves

  • Higher rate of error due to inventory inaccuracies, undetected faulty IoT sensors or equipment, and delayed communication between different systems and devices

  • Waste (products expire before they can be sold) or overordering, both of which result in financial losses

  • Out-of-stock items and delayed restocking, leading to canceled orders, unhappy customers, and greater risk of churn
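The FIFO check described above amounts to asking, for each product, which pallet has been in the warehouse the longest. Here is a minimal sketch of that lookup in Flink SQL. The `pallets_in_warehouse` table and its columns are hypothetical names used for illustration, and the sketch assumes the table holds only pallets that have not yet left the warehouse.

```sql
-- Hypothetical table: one row per pallet currently in the warehouse.
CREATE TABLE pallets_in_warehouse (
  `pallet_rfid` STRING NOT NULL
  , `sku` STRING NOT NULL
  , `received_time` TIMESTAMP_LTZ(3)
  , PRIMARY KEY(`pallet_rfid`) NOT ENFORCED
);

-- Top-N pattern with N = 1: for each SKU, keep only the pallet
-- with the earliest received_time, i.e. the one that should ship first.
SELECT
  sku
  , pallet_rfid
  , received_time
FROM (
  SELECT
    sku
    , pallet_rfid
    , received_time
    , ROW_NUMBER() OVER (PARTITION BY sku ORDER BY received_time ASC) AS row_num
  FROM pallets_in_warehouse
)
WHERE row_num = 1;
```

A pallet scanned at the dock can then be compared against this result: if its RFID does not match the oldest pallet for its SKU, an older pallet is still on the shelf.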

At the same time, there are infrastructure and data challenges that hinder smart warehouses and divert development teams from building innovative new features, instead entangling them in the complexities of low-level integration tasks:

  • Batch-based ETL/ELT pipelines with hours-long or overnight processing, impeding same-day grocery delivery, for example

  • Inability to harness real-time sensor data and use of manual data entry, which is more error-prone and leads to slower decision-making

  • Point-to-point integrations adding to data pipeline sprawl

  • Legacy technologies such as messaging queues or third-party integration tools that don't allow for real-time data ingestion, relying on queries to pull inventory data from warehouses in different regions

  • Siloed data across operational databases, mainframe, and other systems

  • Difficulties joining inventory data with dimensional data in real time

How Confluent brings real time to smart warehouses

With Confluent, smart warehouses can run on real-time data to power greater automation, efficiency, and cost savings. To overcome the above challenges, retailers can leverage Confluent Data Streaming Platform to stream, connect, process, and govern data at scale:

  • Stream: Confluent is available on-premises and in any cloud, making it easy to stream across any environment to support warehouse operations on a global scale. 

  • Connect: Bring together all product inventory data, warehouse infrastructure data (e.g., shelf spacing, aisle width, rack configuration, floor plans, concrete vs. gravel floors, utilities), location data, logistics data, and more. Build streaming data pipelines to break down data silos and unlock real-time data flow (and integrate with acquired companies). Simplify and future-proof your data architecture away from point-to-point integrations, leveraging Confluent’s 120+ pre-built connectors to quickly connect source and destination systems. 

  • Process: Leverage stream processing with Apache Flink to join, enrich, transform, and normalize warehouse data in real time, all through simple SQL syntax. Teams can apply simple business logic such as stateless filtering or stateful aggregations, or design complex business rules with user-defined functions. Create materialized views to serve real-time inventory dashboards and reporting without needing to move data to another system and gain the ability to make operational decisions in real time. 

  • Govern: Guarantee data quality so that teams can focus on accelerating order fulfillment for customers, while ensuring trust and compliance in the data you depend on to detect sensor or equipment anomalies, flag inventory fraud, or trigger immediate action on real-time business-impacting events (e.g., shipping delays due to inclement weather).
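To make the difference between stateless filtering and stateful aggregation concrete, here is a short Flink SQL sketch. The `shelf_weight_readings` table and its columns are assumptions made for illustration (the sketch supposes each reading carries the shelf's rated capacity); they are not part of any Confluent schema.

```sql
-- Hypothetical table of shelf weight sensor readings.
CREATE TABLE shelf_weight_readings (
  `shelf_id` STRING NOT NULL
  , `storage_area` STRING
  , `weight_kg` DECIMAL(10,2)
  , `max_weight_kg` DECIMAL(10,2)
  , `event_time` TIMESTAMP_LTZ(3)
  , WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
);

-- Stateless filter: each reading is evaluated on its own,
-- so no state is kept between events.
SELECT
  shelf_id
  , weight_kg
  , max_weight_kg
  , event_time
FROM shelf_weight_readings
WHERE weight_kg > max_weight_kg;

-- Stateful aggregation: readings are grouped into 5-minute tumbling
-- windows per shelf, which requires keeping state until a window closes.
SELECT
  window_start
  , window_end
  , shelf_id
  , AVG(weight_kg) AS avg_weight_kg
  , MAX(weight_kg) AS peak_weight_kg
FROM TABLE(
  TUMBLE(TABLE shelf_weight_readings, DESCRIPTOR(event_time), INTERVAL '5' MINUTE))
GROUP BY window_start, window_end, shelf_id;
```

The first query can flag an overloaded shelf as soon as a reading arrives, while the second emits one summary row per shelf once each window ends.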

Having a data streaming foundation in place also unlocks artificial intelligence (AI) and machine learning (ML) use cases such as FIFO with predictive analytics or building a GenAI warehouse chat assistant that can instantly answer questions such as:

  • Which products are expiring in the next 30 days? 

  • How much of this product is left? 

  • Where is this product on the shelves? 

  • Is this product in stock within 100 miles? 

  • Based on seasonal demand and how quickly it’s expected to sell, when should this product be restocked?
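A question such as "How much of this product is left?" ultimately resolves to a lookup against continuously maintained state. As a rough sketch, assuming a hypothetical `inventory_movements` table in which every receipt or pick is recorded as a signed quantity change, the on-hand count could be kept current with a Flink SQL aggregation like this:

```sql
-- Hypothetical table: one row per stock movement.
-- quantity_change is positive for receipts and negative for picks.
CREATE TABLE inventory_movements (
  `sku` STRING NOT NULL
  , `warehouse_id` STRING NOT NULL
  , `quantity_change` INT
  , `event_time` TIMESTAMP_LTZ(3)
);

-- Continuously updated on-hand count per SKU and warehouse.
-- Each new movement updates the result row for its key.
SELECT
  sku
  , warehouse_id
  , SUM(quantity_change) AS units_on_hand
FROM inventory_movements
GROUP BY sku, warehouse_id;
```

An assistant or dashboard would read the latest `units_on_hand` value for a given SKU rather than recomputing it from the raw movements on every request.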

Abstracting away low-level data integration and processing, Confluent enables development teams to focus on building new features such as back-in-stock notifications and accelerating their time to market.

Solution implementation

This diagram provides an overview of the deployment architecture for a real-time smart warehouse in Confluent Platform and Confluent Cloud.

Inventory data in SQL Server and PostgreSQL as well as battery status and picker robot status from custom producers is written to topics on an on-prem cluster in Confluent Platform. At the same time, IoT sensor and equipment data from the warehouse is written to topics in Confluent Cloud via an MQTT source connector. Cluster linking connects these on-prem and cloud clusters, mirroring topics so that data can be seamlessly shared.

In Confluent Cloud, stream processing with Flink or ksqlDB joins and enriches data streams to create ready-to-use data products: Alert for Low Battery, Inventory Count, and Robots Needs Attn. These data products are shared with downstream systems such as custom apps for battery and robot alerts, Google BigQuery to create inventory dashboards, and GCS to train ML models for optimizing robot routes within the warehouse.

Here are sample Flink SQL queries for this use case—creating smart warehouse alerting for on-site maintenance crews when picker robot batteries are low or are decreasing at a rate that's too fast:

-- Flink SQL Example:
-- Create alerts when picking robots have batteries below the low charge threshold OR
-- when picking robots have batteries that are losing charge at a higher rate than acceptable.

-- Thresholds per battery type.
CREATE TABLE battery_thresholds (
  `battery_type_id` INT NOT NULL
  , `battery_type` STRING
  , `min_charge_percentage_allowed_for_active_pickers` DECIMAL(10,2)
  , `max_rate_of_decreasing_charge_per_hour` DECIMAL(10,2)
  , PRIMARY KEY(`battery_type_id`) NOT ENFORCED
);

-- Insert a record to set thresholds.
INSERT INTO battery_thresholds
VALUES (111, 'robot_picker_battery', 0.10, 0.05);

-- Battery readings reported by each picker robot.
CREATE TABLE picker_robot_battery_status (
  `picker_robot_id` INT NOT NULL
  , `battery_type_id` INT NOT NULL
  , `battery_charge` DECIMAL(10,2)
  , `event_time` TIMESTAMP_LTZ(3)
  , WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
);

-- Alert when a robot picker battery is below the minimum threshold.
SELECT
  prbs.picker_robot_id
  , 'LOW BATTERY' AS alert
  , prbs.battery_charge
  , prbs.event_time
FROM picker_robot_battery_status prbs
LEFT JOIN battery_thresholds bt
  ON prbs.battery_type_id = bt.battery_type_id
WHERE prbs.battery_charge < bt.min_charge_percentage_allowed_for_active_pickers;

-- Alert when a robot picker battery is losing a charge faster than the maximum rate threshold.
-- First, store the charge lost by each robot in every one-hour window.
CREATE TABLE picker_robot_battery_charge_loss (
  `window_start` TIMESTAMP_LTZ(3)
  , `window_end` TIMESTAMP_LTZ(3)
  , `picker_robot_id` INT
  , `battery_type_id` INT
  , `charge_loss_per_hour` DECIMAL(10,2)
);

INSERT INTO picker_robot_battery_charge_loss (
  window_start
  , window_end
  , picker_robot_id
  , battery_type_id
  , charge_loss_per_hour
)
SELECT
  window_start
  , window_end
  , picker_robot_id
  , battery_type_id
  , MAX(battery_charge) - MIN(battery_charge) AS charge_loss_per_hour
FROM TABLE(
  TUMBLE(TABLE picker_robot_battery_status, DESCRIPTOR(event_time), INTERVAL '1' HOUR))
GROUP BY window_start, window_end, picker_robot_id, battery_type_id;

-- Then, compare the hourly charge loss against the threshold for the battery type.
SELECT
  prbcl.picker_robot_id
  , prbcl.battery_type_id
  , prbcl.charge_loss_per_hour
  , bt.max_rate_of_decreasing_charge_per_hour
FROM picker_robot_battery_charge_loss prbcl
LEFT JOIN battery_thresholds bt
  ON prbcl.battery_type_id = bt.battery_type_id
WHERE prbcl.charge_loss_per_hour > bt.max_rate_of_decreasing_charge_per_hour;


Leveraging Confluent in smart warehouses brings greater automation, efficiency, and productivity—providing real-time self-service data access for warehouse management systems and teams to work together more effectively.

Confluent Data Streaming Platform helps reduce waste for perishable products, thanks to real-time inventory and better resource allocation (e.g., reducing time off shelf, allowing for faster loading of trucks, and optimizing routes for forklifts that navigate warehouses). Additionally, real-time data can be used to prioritize forklift tasks based on urgency (same-hour delivery vs. next-day deliveries), ensuring that customer demands are met promptly and efficiently. Real-time dashboards and IoT data integration improve communication in the warehouse and allow employees to access up-to-the-minute data on mobile devices to make informed decisions on the go.

Providing real-time data for route optimization algorithms enables substantial cost savings, while innovative packing strategies also reduce labor costs. By detecting real-time patterns within the warehouse and leveraging analytics with ML, teams can continuously streamline operations.

The benefits of Confluent for smart warehouses extend beyond efficiency gains—it helps retailers shift toward more agile and more sustainable warehouse operations, supporting continued innovation in the ever-evolving logistics landscape.

  • Dillon is an Advisory Solutions Engineer at Confluent. Coming from a background in data engineering and architecture, he helps customers with a need for streaming data align a viable architecture to solve their business needs. Guiding customers on their streaming journey, he provides consulting services in the areas of data engineering best practices, application lifecycle, and cloud development.
