The Data Streaming Glossary | Confluent

a-d

Agentic Artificial Intelligence

Agentic AI

Agentic AI refers to advanced artificial intelligence systems with autonomous and adaptive decision-making capabilities. An agent can set objectives, devise strategies, and execute multistep tasks with minimal human supervision.

MSK vs. Confluent Cloud

Amazon MSK vs. Confluent Cloud

This in-depth Confluent Cloud and Amazon MSK comparison will show you how each stacks up on scalability, resilience, and platform capabilities.

Apache Iceberg

What is Apache Iceberg?

Apache Iceberg is a high-performance open table format for large analytic datasets.

Data Streaming Platform

Apache Kafka®

Kafka is a distributed data streaming or event streaming engine, used to power event-driven architectures, real-time data pipelines, and much more. Learn what Kafka is, how it works, its benefits and challenges, and popular use cases.

Apache Kafka

Apache Kafka: Benefits and Use Cases

Apache Kafka is an open-source distributed streaming platform that's incredibly popular due to being reliable, durable, and scalable. Created at LinkedIn in 2011 to handle real-time data feeds, today, it's used by over 80% of the Fortune 100 today to build streaming data pipelines, integrate data, enable event-driven architecture, and more.

Kafka Fundamentals

Apache ZooKeeper

Learn about ZooKeeper’s role in managing critical metadata for Kafka, why it’s being replaced by KRaft, the deadline for migration for different Kafka platforms, and how to migrate your clusters safely with Confluent.

API

Application Programming Interface (API)

An application programming interface (API) is a set of protocols that help computer programs interact with one another. Learn how APIs work, with examples, an introduction to each API type, and the best tools to use.

Security

Application Security (AppSec)

Application security refers to the different sets of processes, practices, and tools maintaining the security of the software application against any external threat or vulnerability.

Software Development

Application Integration

Application integration ensures interdependent components of software applications can share data and workflows in real time. Explore the key benefits of different app integration approaches for business efficiency, how it differs from data integration, and the common challenges involved.

Process Improvement

Automotive SPICE

ASPICE is a framework designed to assess and enhance the software development processes within the automotive industry.

Batch Processing

Batch processing is when the processing and analysis happens on a set of data that have already been stored over a period of time. An example is payroll and billing systems that have to be processed weekly or monthly. Learn how batch processing works, when to use it, common tools, and alternatives.

Apache Beam

Beam: Unified Data Pipelines, Batch Processing, and Streaming

Apache Beam is a unified model that defines and executes batch and stream data processing pipelines. Learn Beam architecture, its benefits, examples, and how it works.

Analytics

Big Data

Learn the fundamental definition of big data, including key characteristics known as the "3 V's" (volume, velocity, and variety). You'll discover why the overall concept is so important in enterprise analytics and review its common challenges, benefits, and use cases.

BYOC

Bring Your Own Cloud

Bring Your Own Cloud (BYOC) involves deploying a vendor's software in a customer's cloud environment, typically within their own VPC (Virtual Private Cloud), while data resides in that customer’s cloud environment.

Software Development

Building Real-Time Applications

Learn how data flows through real-time applications and best practices for building them with event-driven architectures that process data as it's created. You'll also explore common use cases, like real-time fraud detection, and understand why Kafka and Flink are essential technologies for powering these systems.

Change Data Capture

Change Data Capture (CDC)

Change Data Capture (CDC) is a software process that identifies, processes, and tracks changes in a database. Ultimately, CDC allows for low-latency, reliable, and scalable data movement and replication between all your data sources.

Data Integration

CI/CD

In today’s fast-paced environment, success in software development depends significantly on development speed, reliability, and security.

Cloud Adoption

Cloud Computing

A comprehensive guide to cloud computing, explaining what it is, how it works, and its various pros and cons. Learn about the different types of cloud services, compare the top cloud providers, and get tips on how to choose the right service for your needs.

Cloud Adoption

Cloud Migration Strategies

Discover six effective cloud migration strategies to transform your business. Learn how to optimize costs, boost scalability, and ensure a smooth transition to the cloud.

Cloud Adoption

Cloud Migrations

There are plenty of benefits for moving to the cloud, however cloud migrations are not a simple, one-time project. Learn how cloud migrations work, and the best way to undergo this complex process.

CQRS

Command Query Responsibility Segregation (CQRS)

CQRS is an architectural design pattern that helps handle commands to read and write data in a scalable way. Learn how it works, its benefits, use cases, and how to get started.

Introduction to CEP

Complex Event Processing (CEP)

Similar to event stream processing, complex event processing (CEP) is a technology for aggregating, processing, and analyzing massive streams of data in order to gain real-time insights from events as they occur.

Compare

Confluent vs. Amazon Kinesis

Compare Confluent and Amazon Kinesis (Data Streams, Firehose, KDA). Understand differences in architecture, use cases, Kafka compatibility, and cloud-native streaming.

Data Fabric

Data fabric architectures enable consistent data access and capabilities across distributed systems. Learn how it’s used, examples, benefits, and common solutions.

Data Flow

Also known as dataflow or data movement, data flow refers to how information moves through a system. Learn how it works, its benefits, and modern dataflow solutions.

Real-Time Data Governance

Data Governance

Data governance is a process to ensure data access, usability, integrity, and security for all the data enterprise systems, based on internal data standards and policies that also control data usage. Effective data governance ensures that data is consistent and trustworthy and doesn't get misused. It's increasingly critical as organizations face new data privacy regulations and rely more and more on data analytics to help optimize operations and drive business decision-making.

Real-Time Data Streaming

Data in Motion

Also known as data in transit or data in flight, data in motion is a process in which digital information is transported between locations either within or between computer systems. The term can also be used to describe data within a computer's RAM that is ready to be read, accessed, updated or processed. Data in motion is one of the three different states of data; the others are data at rest and data in use.

Data Ingestion

Data ingestion is the extraction of data from multiple sources into a data store for further processing and analysis. Learn about ingestion architectures, processes, and the best tools.

Streaming Data Integration

Data Integration

Data integration works by unifying data across disparate sources for a complete view of your business. Learn how data integration works with benefits, examples, and use cases.

Data Storage and Analytics

Data Lakes, Databases, and Data Warehouses

Learn the most common types of data stores: the database, data lake, relational database, and data warehouse. You'll also learn the difference, commonalities, and which to choose.

Data Mesh

Data Mesh Basics, Principles and Architecture

Data mesh is a decentralized approach for data management, data federation, governance designed to enhance data sharing and scalability within organizations.

Streaming Data Pipelines

Data Pipeline

A data pipeline is a set of data processing actions to move data from source to destination. From ingestion and ETL, to streaming data pipelines, learn how it works with examples.

Data Streaming

Data Products

Learn how data products power data streaming, their importance for real-time analytics, and best practices for building effective data integration solutions.

Data Integration

Data Routing

If computer networks were cities, routing would be the interstates and freeways connecting them all, and vehicles would be the data packets traveling along those routes.

Beginner's Guide

Data Serialization

Data serialization can be defined as the process of converting data objects to a sequence of bytes or characters to preserve their structure in an easily storable and transmittable format.

Data Management

Data Strategy

You data strategy should unify your operational and analytical systems, effectively breaking down the "slow, brittle ETL wall." Discover how to create a practical roadmap that aligns with your existing investments and focuses on high-impact value streams to get started.

What is Data Streaming?

Data Streaming

Streaming Data is the continuous, simultaneous flow of data generated by various sources, which are typically fed into a data streaming platform for real-time processing, event-driven applications, and analytics.

Data Streaming

Data Streaming Platform

Learn how a data streaming platform (DSP) enables organizations to capture, store, and process data as a continuous flow of real time events.

Guide to Databases & DBMS

Databases & DBMS

A database is a collection of structured data (or information) stored electronically, which allows for easier access, data management, and retrieval. Learn the different types of databases, how they're used, and how to use a database management system to simplify data management.

Process Automation

Distributed Control System

A Distributed Control System (DCS) is a control system used in industrial processes to manage and automate complex operations.

DISTRIBUTED COMPUTING

Distributed Systems

Also known as distributed computing, a distributed system is a collection of independent components on different machines that aim to operate as a single system.

Dynamic Content Creation

Dynamic content creation is the key to creating personalized experiences that resonate with your audience

Data Management

Data Asset Management

Data asset management is an approach that treat an organization's data as a valuable business asset instead of just bits of information to store and secure. Learn the key pillars for transform raw data into insightful, reliable data assets, including governance, discovery, lineage, and quality.

Compare

Confluent vs. Google Pub/Sub

Read this comparison of Confluent vs. Google Pub/Sub to learn where each technology fits within a modern data architecture and understand how to evaluate your need for strict ordering and replayability of a log-based platform vs. the lightweight, serverless flexibility of a GCP-native messaging queue.

e-l

Enterprise Service Bus (ESB)

An ESB is an architectural pattern that centralizes integrations between applications.

Event Stream Processing

Event Streaming

Event streaming (similar to event sourcing, stream processing, and data streaming) allows for events to be processed, stored, and acted upon as they happen in real-time.

Event-Driven Architecture

Event-driven architecture is a software design pattern that can detect, process, and react to real-time events as they happen. Learn how it works, benefits, use cases, and examples.

Event Sourcing

Event sourcing tracks the current state of the system, and how it evolves over time. Learn how event sourcing works, its benefits and use cases, and how to get started.

Data Integration

Extract Transform Load (ETL)

Extract, Transform, Load (ETL) is a three-step process used to consolidate data from multiple sources. Learn how it works, and how it differs from ELT and Streaming ETL.

Apache Flink

Flink: Unified Streaming and Batch Processing

Apache Flink is an open-source framework that unifies real-time distributed streaming and batch processing. Learn about Flink architecture, how it works, and how it's used.

Apache Flume

Flume: Log Collection, Aggregation, and Processing

Apache Flume is an open-source distributed system designed for efficient data extraction, aggregation, and movement from various sources to a centralized storage or processing system.

Generative Artificial Intelligence

Generative AI (GenAI)

GenAI refers to deep-learning models that generate text, images, audio, and videos from trained data in real time. Learn how GenAI works with use case examples.

Data Streaming

How to Build Streaming Data Pipelines

Building a streaming data pipeline might seem complex, but it can be broken down into manageable steps.

Infrastructure

Infrastructure as Code (IaC)

IaC is a transformative approach to managing IT infrastructure by allowing organizations to define and provision their resources through code rather than manual processes.

Interoperability

Interoperability is when disparate systems, devices, and software can communicate and exchange data in order to accomplish tasks. Here’s why it’s important, how it works, and how to get started.

Data Integration

Integrating Legacy Systems

Legacy systems can often be dusty relics of a bygone era, leaving you with outdated technology.

Kafka Fundamentals

Kafka Auto Offset Reset

Learn what the auto.offset.reset configuration is in Apache Kafka® and how to use this setting to dictate when Kafka consumers start reading messages when the designated consumer group has no consumption history and a previously committed offset is not found.

Apache Kafka

Kafka Backup

Apache Kafka's backup mechanisms are essential components of a robust data infrastructure strategy.

Apache Kafka

Kafka Benefits and Use Cases

Apache Kafka is the most commonly used stream processing / data streaming system. Learn how Kafka benefits companies big and small, why it's so popular, and common use cases.

Kafka Fundamentals

Kafka Cheat Sheet

This Kafka cheat sheet cover essential terminology and core architectural concepts you need to know to succeed with data streaming. Learn the key Kafka CLI commands needed to manage topics, producers, and consumers in this quick-reference guide.

Kafka Fundamentals

Kafka Console

Learn the difference between Kafka's command-line interance (CLI) tools and web UI's like Confluent Cloud Console and see which fits better for your job role and needs, your organizational data challenges, and your business requirements.

Apache Kafka

Kafka Dead Letter Queue

A Kafka Dead Letter Queue (DLQ) is a special type of Kafka topic where messages that fail to be processed by downstream consumers are routed.

Kafka Strategies

Kafka – Horizontal vs. Vertical Scaling

Learn the key differences between horizontal scaling (adding more servers to your cluster), and vertical scaling (increasing the compute resources) in Kafka. You'll understand why Kafka is especially designed to excel at horizontal scaling and how you can use this strategy to handle massive data loads, ensure high availability, and lower your Kafka costs.

Kafka Fundamentals

Kafka Issues in Production

In production, Kafka developers often run into issues like consumer lag, under-provisioned partitions, and broker disk exhaustion. Learn how to diagnose these common Kafka issues and preventative best practices to help you troubleshoot, monitor, and configure your cluster for high performance and reliability.

Apache Kafka

Kafka Message Key

A Kafka message key is an attribute that you can assign to a message in a Kafka topic. Each Kafka message consists of two primary components: a key and a value.

Apache Kafka

Kafka Message Size Limit

The Kafka message size limit is the maximum size a message can be to be successfully produced to a Kafka topic.

Apache Kafka

Kafka MirrorMaker

Learn how Kafka MirrorMaker enables cross-cluster replication. Explore its architecture, setup, use cases, best practices, and troubleshooting tips for seamless Kafka data mirroring.

Apache Kafka

Kafka Partition Key

A partition key in Apache Kafka is a fundamental concept that plays a critical role in Kafka's partitioning mechanism.

Apache Kafka

Kafka Partition Strategy

Apache Kafka partition strategy revolves around how Kafka divides data across multiple partitions within a topic to optimize throughput, reliability, and scalability.

Apache Kafka

Kafka Performance Testing

Learn how to conduct effective Kafka performance testing with this guide. Explore best practices, essential tools, and key metrics to optimize Kafka's throughput, latency, and scalability.

Kafka Strategies

Kafka Rebalancing

Kafka rebalancing is a necessary process where topic partitions are automatically redistributed among consumers in a group to ensure fault tolerance and balance the workload. Learn what triggers a rebalance, its significant impact on performance and latency, and discover strategies to manage and minimize its disruptions.

Kafka Strategies

Kafka Retention

Kafka's retention capabilities allow you to configure topics to automatically delete data after a certain time period or once a specific storage size is reached. Learn about how to configure Kafka retention and the log compaction policy, an alternative mechanism for retaining only the most recent value for each message key.

Kafka Strategies

Kafka Scaling Best Practices

Kafka strategically uses partitions to parallelize processing and achieve elastic horizontal scalability. Discover key best practices for scaling Kafka workloads, including how to select the right number of partitions, monitor consumer lag, and manage both stateful and stateless applications effectively.

Kafka Strategies

Kafka Security Vulnerabilities

Learn about the common security vulnerabilities and misconfigurations that can impact your Kafka environment. You'll read about essential Kafka security best practices, including proper encryption, authentication, and authorization, to help you mitigate these risks and secure your data.

Apache Kafka

Kafka Topic Naming Convention

Kafka Topic Naming convention keeps your data organized and makes it easier to understand, scale, and maintain.

Kafka Strategies

Kafka Tradeoffs

There are critical design tradeoffs (e.g., cost, complexity, feature velocity) that you need to consider when configuring and operating Kafka environments. You'll understand how to balance competing factors like performance and throughput against durability, latency, and overall cost to fit your specific use case.

Compare

Kafka Streams vs. Apache Spark

Learn the fundamental differences between Kafka Streams and Apache Spark, including how their architectures and processing models compare. You'll get get a clear breakdown of their different approaches to state management and see which specific real-time or large-scale analytical use cases are ideal for each technology.

Compare

Kafka vs. Amazon Kinesis

Compare Kafka and Amazon Kinesis and learn architecture and performance differences, ideal use cases, and how Confluent extends Kafka hybrid and multicloud environments.

Compare

Kafka vs. Azure Event Hubs

Learn the differences between Apache Kafka®, Azure Event Hubs, Confluent Cloud so you can understand how well the architecture, integrated platform capabilities, and deployment & configuration options fit your technical requirements and business needs for data streaming use cases.

Compare

Kafka vs. Confluent

Learn the differences between Kafka, an open source streaming engine, and Confluent, an enterprise-grade data streaming platforms, built by the original co-creators of Kafka and complete with pre-built integrations, governance, stream processing, and analytics & AI capabilities.

Compare

Kafka vs. Google Pub/Sub

See how Kafka and Google Pub/Sub compare for asynchronous event communication. Learn architecture, ideal use cases, and the pros and cons of each, as well as how Confluent extends Kafka beyond managed event streaming.

Data Processing

Event Processing

Learn about event processing, when and where to use it, and how it differs from event streaming. You'll understand the differences between simple and complex event processing, as well as how they fit into event-driven architectures built with Apache Kafka and enable you to capture, analyze, and act on data in real time.

m-r

Message Brokers

Message brokers facilitate communication between applications, systems, and services. Learn how they work, their use cases, and how to get started.

Data Architecture

Message Brokers vs. Message Queues

While message brokers and message queues are often used interchangeably, understanding their distinct characteristics and use cases can help you make better architectural decisions.

What are microservices?

Microservices Architecture

Microservices refers to an architectural approach where software applications are composed of small, independently deployable services that communicate with each other over a network.

Middleware Messaging

Middleware

Middleware is a type of messaging that simplifies integration between applications and systems. Learn how middleware works, its benefits, use cases, and common solutions.

Apache Nifi

Nifi - Data Processing, Routing, and Distribution

Apache NiFi is an integrated data logistics platform for automating the movement of data between disparate systems. It can handle various data types and support various protocols. Learn how it's used, and how it works.

Security

NIST SSDF

The National Institute of Standards and Technology's Secure Software Development Framework NIST SSDF is a set of guidelines that are intended to assist organizations in developing their software securely.

Monitoring & Management

Observability

Observability is the ability to measure the current state or condition of your system based on the data it generates. With the adoption of distributed systems, cloud computing, and microservices, observability has become more critical, yet complex.

Data Integration

Operational vs. Analytical Data

There's a massive divide between operational data, which powers real-time daily business activities, and analytical data, which often relies on historical information used for busienss insights and decision-making? Learn the challenges of bridging the gap between these two data types and discover modern solutions for integrating them effectively.

Security

OWASP

OWASP is a non-profit organization dedicated to improving the security of software.

Artificial Intelligence (AI)

Prompts vs. Workflows vs. Agents

Learn the differences between prompts, workflows, and agents. See a breakdown of each, with example use cases and how to select an approach.

PUB/SUB

Publish-Subscribe Messaging

Pub/sub is a messaging framework commonly used for inter-service communication and data integration pipelines. Learn how it works, with examples, benefits, and use cases.

RabbitMQ

RabbitMQ is a message broker that routes messages between two applications. Learn how RabbitMQ works, common use cases, pros, cons, and best alternatives.

RabbitMQ vs Apache Kafka

RabbitMQ and Apache Kafka are both open-source distributed messaging systems, but they have different strengths and weaknesses.

Retrieval-Augmented Generation

RAG

RAG leverages real-time, domain-specific data to improve the accuracy of LLM-generated responses and prevent hallucinations. Learn how RAG works with use case examples.

real-time data (RTD)

Real-Time Data & Analytics

Real-time data (RTD) refers to data that is processed, consumed, and/or acted upon immediately after it's generated. While data processing is not new, real-time data streaming is a newer paradigm that changes how businesses run.

Redpanda

Redpanda vs Kafka

A complete comparison of Kafka vs Redpanda and two cloud Kafka services - Confluent vs Redpanda. Learn how each works, the pros and cons, and how their features stack up.

Data Quality

Refactoring

Refactoring is an important part of software development that optimizes the code's internal structure without changing how the application works on the outside.

Data Integration

Rest API

REST API stands for Representational State Transfer. Learn more about REST API, how it simplifies server communication, and how it leverages large-scale data.

Data Architecture

Real-Time Streaming Architectures

Learn core concepts of real-time streaming architectures as you explore real-world examples from industries like finance, e-commerce, and healthcare to understand how Apache Kafka is used to build these powerful, event-driven systems.

s-z

Security

Security Information and Event Management (SIEM)

SIEM involves aggregating and analyzing log data to detect threats based on security event data. You'll learn the key benefits of implementing a SIEM solution, such as preventing security breaches, enabling rapid incident response, and ensuring regulatory compliance.

Kafka Fundamentals

Serverless Kafka

Learn what serverless Apache Kafka® is, how it works, and how it's helping accelerating streaming innovation. And explore use cases to try with serverless Kafka on Confluent Cloud.

Data Integration

Shift Left

Shift Left in data integration is derived from the software engineering principles of Shift Left Testing where tests are performed earlier in the software development lifecycle to improve quality of software, accelerate time to market, and identify issues earlier.

Apache Kafka

The Data Glossary

a-d

e-l

m-r

s-z