
The Data Glossary

Get an introduction to all real-time data technologies and concepts, including data streaming, ETL pipelines, IT architecture, and event-driven systems, from Confluent, the original creators of Apache Kafka.

a-d

Apache Kafka

Apache Kafka: Benefits and Use Cases

Apache Kafka is an open-source distributed streaming platform that's incredibly popular for its reliability, durability, and scalability. Created at LinkedIn in 2011 to handle real-time data feeds, it's used today by over 80% of the Fortune 100 to build streaming data pipelines, integrate data, enable event-driven architectures, and more.
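
As a rough illustration, here's a minimal sketch of producing events to a Kafka topic with the confluent-kafka Python client; the broker address (localhost:9092) and the "orders" topic are assumptions for the example.

```python
# Minimal sketch: publishing events to a Kafka topic with the confluent-kafka
# Python client. The broker address and topic name are assumptions.
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

def on_delivery(err, msg):
    # Called once the broker acknowledges (or rejects) the message.
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [partition {msg.partition()}]")

# produce() is asynchronous; flush() waits until all messages are delivered.
for order_id in range(3):
    producer.produce("orders", key=str(order_id), value=f"order-{order_id}",
                     callback=on_delivery)

producer.flush()
```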

API

Application Programming Interface (API)

An application programming interface (API) is a set of protocols that help computer programs interact with one another. Learn how APIs work, with examples, an introduction to each API type, and the best tools to use.
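
As a quick illustration, the sketch below calls a REST API over HTTP using Python's requests library; the endpoint URL, auth header, and response fields are hypothetical.

```python
# Minimal sketch of calling a REST API over HTTP with the requests library.
# The endpoint URL and response shape are hypothetical.
import requests

response = requests.get(
    "https://api.example.com/users/42",           # hypothetical endpoint
    headers={"Authorization": "Bearer <token>"},  # many APIs require auth
    timeout=10,
)
response.raise_for_status()   # raise an error for 4xx/5xx status codes

user = response.json()        # parse the JSON response body
print(user.get("name"))
```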

Batch vs Real-Time Streams

Batch Processing

Batch processing is the processing and analysis of a set of data that has already been stored over a period of time. Examples include payroll and billing systems that have to be processed weekly or monthly. Learn how batch processing differs from stream processing, and the best tools to get started.
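
To make the idea concrete, here's a minimal sketch of a batch job that processes a stored set of records in one pass; the payroll records and hourly rate are made up for the example.

```python
# Minimal sketch contrasting batch processing with streaming: records are
# collected first, then processed together in a single run (e.g., a monthly
# payroll job). The record shape and rate are hypothetical.
from collections import defaultdict

# Records accumulated over a pay period (normally read from storage).
stored_records = [
    {"employee": "alice", "hours": 160},
    {"employee": "bob", "hours": 152},
    {"employee": "alice", "hours": 8},   # overtime logged later in the period
]

def run_payroll_batch(records, hourly_rate=30):
    """Process the whole stored data set in one pass."""
    totals = defaultdict(float)
    for record in records:
        totals[record["employee"]] += record["hours"] * hourly_rate
    return dict(totals)

print(run_payroll_batch(stored_records))
# {'alice': 5040.0, 'bob': 4560.0}
```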

Change Data Capture

Change Data Capture (CDC)

Change Data Capture (CDC) is a software process that identifies, processes, and tracks changes in a database. Ultimately, CDC allows for low-latency, reliable, and scalable data movement and replication between all your data sources.
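
As a simplified illustration of the idea, the sketch below applies a stream of captured change events (like those a CDC connector would emit from a database log) to keep a replica in sync; the event shapes and table names are hypothetical.

```python
# Minimal sketch of the CDC idea: consume a stream of change events captured
# from a source database and replay them against a replica to keep it in
# sync. Event fields and table names are hypothetical.
change_events = [
    {"op": "insert", "table": "customers", "key": 1, "row": {"name": "Ada"}},
    {"op": "update", "table": "customers", "key": 1, "row": {"name": "Ada L."}},
    {"op": "delete", "table": "customers", "key": 1, "row": None},
]

replica = {}  # key -> row, standing in for a downstream copy of the table

def apply_change(event, target):
    """Replay one captured change against the replica."""
    if event["op"] in ("insert", "update"):
        target[event["key"]] = event["row"]
    elif event["op"] == "delete":
        target.pop(event["key"], None)

for event in change_events:
    apply_change(event, replica)
    print(event["op"], "->", replica)
```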

Cloud Adoption

Cloud Migrations

There are plenty of benefits to moving to the cloud; however, cloud migrations are not a simple, one-time project. Learn how cloud migrations work, and the best way to approach this complex process.

Introduction to CEP

Complex Event Processing (CEP)

Similar to event stream processing, complex event processing (CEP) is a technology for aggregating, processing, and analyzing massive streams of data in order to gain real-time insights from events as they occur.
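
For a flavor of what CEP looks like in practice, here's a minimal sketch that watches a stream of login events for a pattern (three failures from the same user within 60 seconds) and emits a higher-level alert; the event fields and thresholds are assumptions.

```python
# Minimal sketch of complex event processing: detect a pattern that spans
# multiple low-level events and emit a derived, higher-level event.
from collections import defaultdict, deque

WINDOW_SECONDS = 60
FAILURE_THRESHOLD = 3

recent_failures = defaultdict(deque)  # user -> timestamps of recent failures

def process(event):
    if event["type"] != "login_failed":
        return None
    window = recent_failures[event["user"]]
    window.append(event["ts"])
    # Drop failures that fall outside the time window.
    while window and event["ts"] - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) >= FAILURE_THRESHOLD:
        return {"type": "suspicious_activity", "user": event["user"]}
    return None

events = [
    {"type": "login_failed", "user": "ada", "ts": 0},
    {"type": "login_failed", "user": "ada", "ts": 20},
    {"type": "login_ok",     "user": "bob", "ts": 25},
    {"type": "login_failed", "user": "ada", "ts": 45},
]

for e in events:
    alert = process(e)
    if alert:
        print(alert)  # {'type': 'suspicious_activity', 'user': 'ada'}
```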

Real-Time Data Governance

Data Governance

Data governance is a process for ensuring data access, usability, integrity, and security across all the data in enterprise systems, based on internal data standards and policies that also control data usage. Effective data governance ensures that data is consistent and trustworthy and doesn't get misused. It's increasingly critical as organizations face new data privacy regulations and rely more and more on data analytics to help optimize operations and drive business decision-making.

Real-Time Data Streaming

Data in Motion

Also known as data in transit or data in flight, data in motion is a process in which digital information is transported between locations either within or between computer systems. The term can also be used to describe data within a computer's RAM that is ready to be read, accessed, updated or processed. Data in motion is one of the three different states of data; the others are data at rest and data in use.

Streaming Data Integration

Data Integration

Data integration works by unifying data across disparate sources for a complete view of your business. Learn how data integration works with benefits, examples, and use cases.

Data Storage and Analytics

Data Lakes, Databases, and Data Warehouses

Learn about the most common types of data stores: databases, data lakes, relational databases, and data warehouses. You'll also learn their differences and commonalities, and which to choose.

Streaming Data Pipelines

Data Pipeline

A data pipeline is a set of data processing actions to move data from source to destination. From ingestion and ETL, to streaming data pipelines, learn how it works with examples.
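
As a rough sketch of the idea, the example below chains a source, a processing step, and a sink as plain Python generators standing in for real systems; all names and fields are hypothetical.

```python
# Minimal sketch of a data pipeline: records flow from a source through a
# processing stage to a destination. The stages are plain Python generators
# standing in for real systems (a topic, a transform, a sink).
def source():
    # Stand-in for ingesting from an application, log file, or Kafka topic.
    for i in range(3):
        yield {"order_id": i, "amount": 10.0 * (i + 1)}

def enrich(records):
    # Processing stage: add a derived field to each record in flight.
    for record in records:
        record["amount_with_tax"] = round(record["amount"] * 1.2, 2)
        yield record

def sink(records, destination):
    # Stand-in for delivering to a warehouse, database, or downstream topic.
    for record in records:
        destination.append(record)

warehouse = []
sink(enrich(source()), warehouse)
print(warehouse)
```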

What is Data Streaming?

Data Streaming

Streaming data is the continuous, simultaneous flow of data generated by various sources, typically fed into a data streaming platform for real-time processing, event-driven applications, and analytics.
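
For a concrete example, here's a minimal sketch of consuming a data stream with the confluent-kafka Python client; the broker address, consumer group, and "orders" topic are assumptions.

```python
# Minimal sketch of consuming a data stream with the confluent-kafka Python
# client. Broker address, group id, and topic name are assumptions.
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "demo-consumer",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["orders"])

try:
    while True:
        msg = consumer.poll(1.0)       # wait up to 1 second for a record
        if msg is None:
            continue
        if msg.error():
            print(f"Consumer error: {msg.error()}")
            continue
        # Each record is processed as soon as it arrives, not in batches.
        print(f"{msg.key()} -> {msg.value().decode('utf-8')}")
except KeyboardInterrupt:
    pass
finally:
    consumer.close()
```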

Guide to Databases & DBMS

Databases & DBMS

A database is a collection of structured data (or information) stored electronically, which allows for easier access, data management, and retrieval. Learn the different types of databases, how they're used, and how to use a database management system to simplify data management.
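
As a small illustration, the sketch below stores and queries structured data with Python's built-in sqlite3 module; the table and column names are made up.

```python
# Minimal sketch of structured storage and retrieval using Python's built-in
# sqlite3 module. Table and column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
cur = conn.cursor()

cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
cur.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Ada", "London"), ("Grace", "New York")],
)
conn.commit()

# Structured queries are what distinguish a database from raw file storage.
for row in cur.execute("SELECT name FROM customers WHERE city = ?", ("London",)):
    print(row)   # ('Ada',)

conn.close()
```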

Distributed Computing

Distributed System

Also known as distributed computing, a distributed system is a collection of independent components on different machines that aim to operate as a single system.

e-l

Event Stream Processing

Event Streaming

Event streaming (similar to event sourcing, stream processing, and data streaming) allows for events to be processed, stored, and acted upon as they happen in real-time.

Data Pipelines/Integration

Extract Transform Load (ETL)

Extract, Transform, Load (ETL) is a three-step process used to consolidate data from multiple sources. Learn how it works, and how it differs from ELT and Streaming ETL.
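
To make the three steps concrete, here's a minimal sketch that extracts rows from a small made-up CSV snippet, transforms them, and loads them into a list standing in for a warehouse table.

```python
# Minimal sketch of the three ETL steps on a tiny, made-up data set: extract
# raw CSV text, transform it into clean records, and load the result into a
# target list standing in for a warehouse table.
import csv
import io

RAW_CSV = """user,amount
ada,19.99
bob,5.00
"""

def extract(raw_text):
    """Extract: pull raw rows out of the source."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def transform(rows):
    """Transform: normalize names and convert amounts to cents."""
    return [{"user": r["user"].upper(),
             "amount_cents": int(float(r["amount"]) * 100)}
            for r in rows]

def load(records, target):
    """Load: write the cleaned records into the destination."""
    target.extend(records)

warehouse_table = []
load(transform(extract(RAW_CSV)), warehouse_table)
print(warehouse_table)
# [{'user': 'ADA', 'amount_cents': 1999}, {'user': 'BOB', 'amount_cents': 500}]
```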

Apache Kafka

Kafka Benefits and Use Cases

Learn how Kafka benefits companies big and small, why it's so popular, and common use cases.

m-r

Monitoring & Management

Observability

Observability is the ability to measure the current state or condition of your system based on the data it generates. With the adoption of distributed systems, cloud computing, and microservices, observability has become more critical, yet complex.

Pub/Sub

Publish-Subscribe Messaging

Pub/sub is a messaging framework commonly used for inter-service communication and data integration pipelines. Learn how it works, with examples, benefits, and use cases.
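
As a simplified illustration of the pattern, the sketch below implements an in-memory broker where publishers send to a named topic without knowing who is listening, and every subscriber to that topic receives each message; all names are hypothetical.

```python
# Minimal sketch of publish/subscribe with an in-memory broker: the publisher
# and subscribers are decoupled and only share a topic name.
from collections import defaultdict

class Broker:
    def __init__(self):
        self.subscribers = defaultdict(list)   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Fan the message out to every subscriber of the topic.
        for callback in self.subscribers[topic]:
            callback(message)

broker = Broker()
broker.subscribe("orders", lambda msg: print("billing got:", msg))
broker.subscribe("orders", lambda msg: print("shipping got:", msg))

broker.publish("orders", {"order_id": 42, "item": "book"})
# Both subscribers receive the same event, independently of the publisher.
```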

Real-Time Data (RTD)

Real-Time Data & Analytics

Real-time data (RTD) refers to data that is processed, consumed, and/or acted upon immediately after it's generated. While data processing is not new, real-time data streaming is a newer paradigm that changes how businesses run.

Redpanda

Redpanda vs Kafka

A complete comparison of Kafka vs Redpanda, as well as the two cloud services, Confluent vs Redpanda. Learn how each works, the pros and cons, and how their features stack up.

s-z

Streaming vs Batch Processing

Stream Processing

Stream processing allows for data to be ingested, processed, and managed in real-time, as it's generated. Learn how streaming differs from batch processing, how it works, and the best technologies to get started.
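
For a flavor of how this differs from batch, here's a minimal sketch that keeps a running per-window total as events arrive, emitting each window's result as soon as an event crosses the window boundary; the event fields and 60-second window are assumptions.

```python
# Minimal sketch of stream processing: maintain a running aggregate over a
# tumbling window as events arrive, instead of waiting for a stored batch.
class TumblingWindowSum:
    """Running per-window sum, emitted when an event crosses the window edge."""
    def __init__(self, window_seconds=60):
        self.window_seconds = window_seconds
        self.window_start = None
        self.total = 0.0

    def process(self, event):
        if self.window_start is None:
            self.window_start = event["ts"]
        # Close any windows the new event has moved past.
        while event["ts"] - self.window_start >= self.window_seconds:
            print(f"window starting at {self.window_start}s: total={self.total}")
            self.window_start += self.window_seconds
            self.total = 0.0
        self.total += event["amount"]

agg = TumblingWindowSum()
for event in [{"ts": 1, "amount": 10.0},
              {"ts": 42, "amount": 5.0},
              {"ts": 75, "amount": 7.5}]:
    agg.process(event)
# The first window's total (15.0) is printed as soon as the 75s event arrives.
```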

Streaming

Streaming Data Pipelines

Streaming data pipelines move data from multiple sources to multiple target destinations in real time. Learn how they work, with examples and demos.

ETL vs ELT vs Streaming ETL

Streaming ETL vs ELT vs ETL

What are ETL and ELT, and how do they differ from streaming ETL pipelines? Learn the differences between these data pipeline and integration approaches, their processes, and which to choose.