
The Data Glossary

Get introductions to real-time data technologies and concepts, including data streaming, ETL pipelines, IT architecture, and event-driven systems, from Confluent, the original creators of Apache Kafka.

a-d

Agentic Artificial Intelligence

Agentic AI

Agentic AI refers to advanced artificial intelligence systems with autonomous and adaptive decision-making capabilities. An agent can set objectives, devise strategies, and execute multistep tasks with minimal human supervision.

Apache Kafka

Apache Kafka: Benefits and Use Cases

Apache Kafka is an open-source distributed streaming platform that's hugely popular for its reliability, durability, and scalability. Created at LinkedIn in 2011 to handle real-time data feeds, it's now used by over 80% of the Fortune 100 to build streaming data pipelines, integrate data, enable event-driven architecture, and more.
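As a rough illustration, here is a minimal producer sketch using the confluent-kafka Python client; the broker address and the "orders" topic are placeholders, not a prescribed setup.

```python
# Minimal Kafka producer sketch (assumes a broker at localhost:9092
# and an "orders" topic -- both are placeholders).
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

def on_delivery(err, msg):
    # Called once the broker acknowledges (or rejects) the message.
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [{msg.partition()}]")

# Publish one event; the key controls partitioning, the value is the payload.
producer.produce("orders", key="order-123", value='{"amount": 42.50}',
                 callback=on_delivery)
producer.flush()  # Block until outstanding messages are delivered.
```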

API

Application Programming Interface (API)

An application programming interface (API) is a set of protocols that help computer programs interact with one another. Learn how APIs work, with examples, an introduction to each API type, and the best tools to use.
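For instance, here is a small sketch that calls a hypothetical REST endpoint with Python's requests library; the URL and response fields are placeholders.

```python
# Calling a hypothetical REST API endpoint with the requests library.
# The URL and the fields in the JSON response are illustrative only.
import requests

response = requests.get("https://api.example.com/v1/users/42", timeout=10)
response.raise_for_status()    # Raise an exception on an error status code.
user = response.json()         # Parse the JSON response body into a dict.
print(user.get("name"), user.get("email"))
```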

Security

Application Security (AppSec)

Application security refers to the processes, practices, and tools that keep software applications secure against external threats and vulnerabilities.

Process Improvement

Automotive SPICE

ASPICE is a framework designed to assess and enhance the software development processes within the automotive industry.

Batch Processing

Batch Processing

Batch processing is the processing and analysis of a set of data that has been collected and stored over a period of time. Examples include payroll and billing systems that are processed weekly or monthly. Learn how batch processing works, when to use it, common tools, and alternatives.
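As a simple illustration, the sketch below aggregates previously stored usage records into per-customer totals in one scheduled run; the record shape and file name are made up.

```python
# Illustrative batch job: aggregate stored usage records into per-customer
# totals. The file name and record fields are placeholders.
import json
from collections import defaultdict

def run_billing_batch(path="usage_records.jsonl"):
    totals = defaultdict(float)
    with open(path) as f:                  # Records accumulated over the billing period.
        for line in f:
            record = json.loads(line)
            totals[record["customer_id"]] += record["amount"]
    # A real system would write invoices to a database or downstream queue.
    for customer_id, amount in totals.items():
        print(f"Invoice {customer_id}: ${amount:.2f}")

if __name__ == "__main__":
    run_billing_batch()
```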

Apache Beam

Beam: Unified Data Pipelines, Batch Processing, and Streaming

Apache Beam is a unified model that defines and executes batch and stream data processing pipelines. Learn Beam architecture, its benefits, examples, and how it works.
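For a feel of the programming model, here is a minimal word-count pipeline sketch; it assumes the apache-beam package is installed and runs on the default local runner.

```python
# Minimal Apache Beam pipeline (word count) on the local DirectRunner.
import apache_beam as beam

with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Create" >> beam.Create(["stream and batch", "batch or stream"])
        | "Split" >> beam.FlatMap(str.split)            # One element per word.
        | "PairWithOne" >> beam.Map(lambda w: (w, 1))
        | "CountPerWord" >> beam.CombinePerKey(sum)
        | "Print" >> beam.Map(print)
    )
```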

BYOC

Bring Your Own Cloud

Bring Your Own Cloud (BYOC) is a deployment model in which a vendor's software runs in the customer's own cloud environment, typically within the customer's Virtual Private Cloud (VPC), so data remains in the customer's environment.

Change Data Capture

Change Data Capture (CDC)

Change Data Capture (CDC) is a software process that identifies, processes, and tracks changes in a database. Ultimately, CDC allows for low-latency, reliable, and scalable data movement and replication between all your data sources.
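As a loose illustration, the sketch below shows how a downstream consumer might apply a change event to a replica table; the op/before/after envelope only approximates the format real CDC tools emit, and all names are placeholders.

```python
# Illustrative CDC change event and how a downstream replica might apply it.
# The envelope (op / before / after) is a simplified, made-up example.
change_event = {
    "op": "u",                                   # c = insert, u = update, d = delete
    "before": {"id": 7, "email": "old@example.com"},
    "after":  {"id": 7, "email": "new@example.com"},
}

replica = {7: {"id": 7, "email": "old@example.com"}}   # Downstream copy of the table.

def apply_change(event, table):
    if event["op"] in ("c", "u"):
        row = event["after"]
        table[row["id"]] = row                   # Upsert the new row image.
    elif event["op"] == "d":
        table.pop(event["before"]["id"], None)   # Remove the deleted row.

apply_change(change_event, replica)
print(replica)  # {7: {'id': 7, 'email': 'new@example.com'}}
```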

Data Integration

CI/CD

CI/CD (continuous integration and continuous delivery/deployment) automates how software is built, tested, and released. In today’s fast-paced environment, success in software development depends significantly on development speed, reliability, and security.

Infrastructure

Cloud Computing vs Distributed Systems

Learn about the key differences between cloud computing and distributed systems, their benefits, use cases, and how to choose the best fit for your IT strategy.

Cloud Adoption

Cloud Migration Strategies

Discover six effective cloud migration strategies to transform your business. Learn how to optimize costs, boost scalability, and ensure a smooth transition to the cloud.

Cloud Adoption

Cloud Migrations

There are plenty of benefits to moving to the cloud; however, cloud migrations are not a simple, one-time project. Learn how cloud migrations work, and the best way to undergo this complex process.

CQRS

Command Query Responsibility Segregation (CQRS)

CQRS is an architectural design pattern that helps handle commands to read and write data in a scalable way. Learn how it works, its benefits, use cases, and how to get started.
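Here is a minimal in-memory sketch of the idea, with illustrative names: commands go to a write model, while queries read from a separately maintained projection.

```python
# Minimal CQRS sketch: commands mutate the write model, queries hit a
# separately maintained read model (a denormalized projection).
# All class, method, and field names are illustrative.
class OrderWriteModel:
    def __init__(self):
        self.orders = {}                       # Authoritative state, keyed by order id.

    def handle_place_order(self, order_id, items, read_model):
        self.orders[order_id] = {"items": items, "status": "placed"}
        read_model.project(order_id, len(items))   # Keep the query-side view in sync.

class OrderReadModel:
    def __init__(self):
        self.summary = {}                      # Denormalized view optimized for reads.

    def project(self, order_id, item_count):
        self.summary[order_id] = {"item_count": item_count, "status": "placed"}

    def get_order_summary(self, order_id):     # Query side: reads only, no writes.
        return self.summary.get(order_id)

reads = OrderReadModel()
writes = OrderWriteModel()
writes.handle_place_order("o-1", ["sku-1", "sku-2"], reads)
print(reads.get_order_summary("o-1"))          # {'item_count': 2, 'status': 'placed'}
```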

Introduction to CEP

Complex Event Processing (CEP)

Similar to event stream processing, complex event processing (CEP) is a technology for aggregating, processing, and analyzing massive streams of data in order to gain real-time insights from events as they occur.

Data Fabric

Data Fabric

Data fabric architectures enable consistent data access and capabilities across distributed systems. Learn how it’s used, examples, benefits, and common solutions.

Data Flow

Data Flow

Also known as dataflow or data movement, data flow refers to how information moves through a system. Learn how it works, its benefits, and modern dataflow solutions.

Data Architecture

Data Flow Design

Good data flow design enhances the efficiency, scalability, responsiveness, and reliability of systems.

Real-Time Data Governance

Data Governance

Data governance is a process to ensure data access, usability, integrity, and security for all the data in enterprise systems, based on internal data standards and policies that also control data usage. Effective data governance ensures that data is consistent and trustworthy and doesn't get misused. It's increasingly critical as organizations face new data privacy regulations and rely more and more on data analytics to help optimize operations and drive business decision-making.

Real-Time Data Streaming

Data in Motion

Also known as data in transit or data in flight, data in motion is a process in which digital information is transported between locations either within or between computer systems. The term can also be used to describe data within a computer's RAM that is ready to be read, accessed, updated or processed. Data in motion is one of the three different states of data; the others are data at rest and data in use.

Data Ingestion

Data Ingestion

Data ingestion is the extraction of data from multiple sources into a data store for further processing and analysis. Learn about ingestion architectures, processes, and the best tools.

Streaming Data Integration

Data Integration

Data integration works by unifying data across disparate sources for a complete view of your business. Learn how data integration works with benefits, examples, and use cases.

Data Integration

Data Integration Best Practices

This guide explores essential data integration best practices that will help you streamline your processes and maximize the value of your data assets.

Data Integration

Data Integration Security

Data integration security protects sensitive information during the integration process.

Data Storage and Analytics

Data Lakes, Databases, and Data Warehouses

Learn the most common types of data stores: the database, data lake, relational database, and data warehouse. You'll also learn their differences, commonalities, and which to choose.

Data Mesh

Data Mesh Basics, Principles and Architecture

Data mesh is a decentralized approach to data management, federation, and governance, designed to enhance data sharing and scalability within organizations.

Streaming Data Pipelines

Data Pipeline

A data pipeline is a set of data processing actions to move data from source to destination. From ingestion and ETL, to streaming data pipelines, learn how it works with examples.

Data Integration

Data Routing

If computer networks were cities, routing would be the interstates and freeways connecting them all, and vehicles would be the data packets traveling along those routes.

Beginner's Guide

Data Serialization

Data serialization can be defined as the process of converting data objects to a sequence of bytes or characters to preserve their structure in an easily storable and transmittable format.
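For example, here is a small Python sketch (with an illustrative Order type) that serializes an object to JSON bytes and restores it with its structure intact.

```python
# Serializing a Python object to bytes (JSON here) and restoring it.
# The Order class and its field values are illustrative.
import json
from dataclasses import dataclass, asdict

@dataclass
class Order:
    order_id: str
    amount: float

order = Order(order_id="o-42", amount=19.99)

# Serialize: object -> JSON string -> UTF-8 bytes (storable and transmittable).
payload = json.dumps(asdict(order)).encode("utf-8")

# Deserialize: bytes -> dict -> object, preserving the original structure.
restored = Order(**json.loads(payload.decode("utf-8")))
print(restored == order)  # True
```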

What is Data Streaming?

Data Streaming

Streaming Data is the continuous, simultaneous flow of data generated by various sources, which are typically fed into a data streaming platform for real-time processing, event-driven applications, and analytics.

Data Streaming

Data Streaming Platform

Learn how a data streaming platform (DSP) enables organizations to capture, store, and process data as a continuous flow of real time events.

Guide to Databases & DBMS

Databases & DBMS

A database is a collection of structured data (or information) stored electronically, which allows for easier access, data management, and retrieval. Learn the different types of databases, how they're used, and how to use a database management system to simplify data management.

Process Automation

Distributed Control System

A Distributed Control System (DCS) is a control system used in industrial processes to manage and automate complex operations.

Distributed Computing

Distributed Systems

Also known as distributed computing, a distributed system is a collection of independent components on different machines that aim to operate as a single system.

Dynamic Content Creation

Dynamic Content Creation

Dynamic content creation is the key to creating personalized experiences that resonate with your audience.

e-l

Enterprise Service Bus (ESB)

Enterprise Service Bus (ESB)

An ESB is an architectural pattern that centralizes integrations between applications.

Event Stream Processing

Event Streaming

Event streaming (similar to event sourcing, stream processing, and data streaming) allows for events to be processed, stored, and acted upon as they happen in real-time.

Event-Driven Architecture

Event-Driven Architecture

Event-driven architecture is a software design pattern that can detect, process, and react to real-time events as they happen. Learn how it works, benefits, use cases, and examples.

Event Sourcing

Event Sourcing

Event sourcing records every change to the state of a system as a sequence of events, capturing both its current state and how it evolved over time. Learn how event sourcing works, its benefits and use cases, and how to get started.
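As a rough sketch with made-up event names: state is never stored directly; it is rebuilt by replaying the append-only event log, which also makes earlier states recoverable.

```python
# Event-sourcing sketch: current state is derived by replaying the event log.
# Event types and fields are illustrative.
events = [
    {"type": "AccountOpened", "balance": 0},
    {"type": "MoneyDeposited", "amount": 100},
    {"type": "MoneyWithdrawn", "amount": 30},
]

def replay(event_log):
    balance = 0
    for event in event_log:                      # Apply each event in order.
        if event["type"] == "AccountOpened":
            balance = event["balance"]
        elif event["type"] == "MoneyDeposited":
            balance += event["amount"]
        elif event["type"] == "MoneyWithdrawn":
            balance -= event["amount"]
    return balance

print(replay(events))        # 70  -- current state derived from history
print(replay(events[:2]))    # 100 -- state at an earlier point in time
```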

Data Integration

Extract Load Transform (ELT)

ELT (Extract, Load, Transform) is a data integration process where raw data is loaded first and transformation happens after.

Data Integration

Extract Transform Load (ETL)

Extract, Transform, Load (ETL) is a three-step process used to consolidate data from multiple sources. Learn how it works, and how it differs from ELT and Streaming ETL.
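As a hedged illustration of the three steps, here is a tiny ETL sketch in Python; the CSV file, column names, and SQLite table are placeholders.

```python
# Minimal ETL sketch: extract rows from a CSV, transform them, load into SQLite.
# The file name, columns, and target table are illustrative.
import csv
import sqlite3

def extract(path):
    with open(path, newline="") as f:
        yield from csv.DictReader(f)             # One dict per source row.

def transform(rows):
    for row in rows:
        # Convert dollars to integer cents during the transform step.
        yield (row["customer_id"], int(float(row["amount_usd"]) * 100))

def load(records, conn):
    conn.execute("CREATE TABLE IF NOT EXISTS payments (customer_id TEXT, amount_cents INTEGER)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", records)
    conn.commit()

if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load(transform(extract("payments.csv")), connection)
```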

Data Integration

Extract Transform Load Examples

Fast-paced businesses rely on smooth data exchange and analysis between systems.