Hands-on Workshop: ZooKeeper to KRaft Without the Hassle | Secure Your Spot
Traditionally, compliance teams have had to rely on batch exports for their audit logs, a method that, while functional, is proving to be woefully inadequate in today's fast-paced digital landscape. The truth is, waiting hours, or even days, for batch exports of your audit data leaves your organization vulnerable.
We’re long past the point where the ability to monitor and audit system activity is a luxury—it's a critical necessity, whether you’re an enterprise organization in a highly regulated field like healthcare or finance or in a local small to mid-size business. Threats are everywhere, and the stakes are high. That’s why many organizations have turned to real-time data streaming to achieve immediate visibility into all system events, ensuring the integrity of their compliance posture.
TL;DR: Build an Apache Kafka®-based real-time audit pipeline with Avro + Schema Registry, Kafka Connect for ingestion, a stream processor for normalization/enrichment, and immutable/WORM object storage or Tiered Storage for long-term retention. Enforce role-based access control (RBAC), lineage, and schema validation; surface real-time alerts and auditor-friendly access.
Ready to get started? Start building with Confluent Cloud, a serverless data streaming platform that combines the best of Kafka, Apache Flink®, streaming governance, and more.
Any delay between security events and time to insight represent significant blind spots, hindering your ability to detect anomalies, respond to security incidents, or prove compliance promptly. In a world where minutes and even seconds matter, batch processing is simply too slow to meet modern compliance requirements.
Real-time pipelines transform audit logging from a reactive, retrospective task into a proactive, continuous process, empowering compliance teams with the timely insights needed to safeguard data, uphold regulatory standards, and mitigate risks effectively.
Building a pipeline for compliance and audit logging isn't just about collecting data; it's about building a system of trust and accountability. To meet the stringent demands of regulations like HIPAA, PCI DSS, and SOC 2, your architecture must be flawless. Here are the five non-negotiable requirements for a system you can depend on.
Audit logs serve as the definitive record of “who did what, and when”. These records must be completely accurate and, most importantly, immutable. Once an event is written, it cannot be altered or deleted. This tamper-proof characteristic is fundamental to a trustworthy audit trail. Streaming platforms like Apache Kafka excel here, as their underlying architecture is an append-only log, providing a natural safeguard against data tampering.
Regulations often mandate that audit data be retained for extended periods, often seven years or more. Your system must be designed for cost-effective, long-term retention without sacrificing accessibility. Modern data streaming platforms solve this by using tiered storage, which automatically moves older data to cheaper object storage while keeping it fully queryable, ensuring you can meet regulatory obligations without breaking the bank.
Audit logs often contain highly sensitive information. Protecting this data requires a comprehensive security strategy and compliance-ready pipeline:
Encryption: Data must be encrypted both in transit and at rest.
Authentication: Every access request must be securely authenticated.
Authorization: Granular, role-based access control (RBAC) is essential to ensure that only authorized personnel can view or manage audit data.
Waiting for batch processes is no longer sufficient. For effective compliance and security monitoring, you need to know about critical events as they happen. A real-time pipeline must ingest, process, and deliver data with extremely low latency. This enables immediate alerting on suspicious activities, helping security teams move from reactive investigation to proactive threat detection and mitigation.
Auditors need proof that your data is trustworthy. Data lineage provides a clear map of where your data came from, what transformations it underwent, and where it went. Strong data governance practices are critical here. Using a tool like a Schema Registry ensures that your audit data conforms to a strict, predefined structure, preventing data quality issues and making the entire pipeline transparent and auditable. This structured approach is foundational for building reliable and trustworthy compliance systems.
Together, these five requirements form the foundation of a reliable, secure, and compliant audit logging pipeline, ensuring organizations can meet regulatory obligations while maintaining operational trust.
Building a real-time compliance and audit logging pipeline requires a well-designed architecture that can handle high throughput, ensure data integrity, and provide timely access to information. Below is a block diagram illustrating a common, robust pattern, highlighting key components and considerations for each layer.
A robust pattern for real-time compliance and audit logging pipelines built with Kafka, a stream processor, and connectors with external source, storage, and analytics systems
This is where all audit-relevant events originate and are first captured.
Sources: Events can come from various origins:
Applications: Internal services and microservices generate application-specific audit logs (e.g., user logins, data modifications, API calls).
Infrastructure: Cloud provider logs (AWS CloudTrail, Azure Monitor, GCP Audit Logs), Kubernetes events, network device logs, security appliance logs, and operating system logs.
Databases: Change Data Capture (CDC) from transactional databases for critical data changes.
Connectors: Kafka Connectors (available on Confluent Hub) are crucial here. They provide pre-built, fault-tolerant, and scalable means to ingest data from a vast array of sources into Kafka topics with minimal custom code. This includes sources like S3, JDBC databases, various log files, and even proprietary systems.
Once events are in Kafka, the processing layer transforms raw data into compliance-ready audit records. This layer leverages Kafka's stream processing capabilities.
Technology: Kafka Streams or Apache Flink® are ideal for this layer due to their ability to process events with low latency and high throughput.
Functions:
Normalization: Raw audit events often come in disparate formats. This step standardizes them into a consistent schema (e.g., using Avro with Schema Registry).
Enrichment: Adding context to events, such as linking user IDs to full user profiles, adding geo-location data, or cross-referencing with threat intelligence feeds.
Validation: Ensuring that events meet predefined rules and compliance policies. Flagging or routing non-compliant events for immediate attention.
Filtering: Dropping irrelevant events to reduce storage costs and noise.
Real-Time Aggregation: Creating summary events for high-level dashboards (e.g., "login failures per minute").
This layer is designed for durable, immutable, and long-term retention of audit data. That’s especially important for financial services organizations subject to strict regulatory requirements that require them to always stay audit-ready.
Primary Storage: Kafka topics themselves provide immediate, durable storage for a configurable retention period (e.g., a few days or weeks).
Immutable/WORM Object Store: For long-term retention (7+ years), processed audit logs are sunk from Kafka into a cost-effective, Write Once, Read Many (WORM) object storage solution like Amazon S3, Google Cloud Storage, or Azure Blob Storage. This offers high durability, scalability, and immutability crucial for compliance.
Tiered Storage: Kafka's tiered storage capabilities allow older data to be automatically offloaded to cheaper object storage while still being discoverable via Kafka, balancing cost and accessibility.
This layer provides the interfaces for various stakeholders to access and analyze the audit data.
Real-Time Monitoring Dashboards: Tools like Grafana, Kibana, or custom dashboards fed by Kafka streams provide immediate visualization of system activity, security alerts, and compliance metrics.
REST APIs for Auditors: Secure APIs allow compliance officers and auditors to programmatically query specific audit trails, generate reports, and demonstrate adherence to regulations.
SQL Query Engine: For deeper, ad-hoc analysis, the data in the object store can be queried using tools like Presto, Apache Hive, or cloud-native data warehouses that integrate with object storage.
For this architecture, governance shouldn’t be treated as a discrete layer but instead as a set of practices and technologies woven throughout the entire pipeline.
Role-Based Access Control (RBAC): Granular permissions are applied across Kafka topics, connectors, and access APIs to ensure only authorized personnel can view, produce, or consume audit data.
Data Masking or Anonymization: Sensitive information within audit logs (e.g., PII, passwords) is masked or anonymized during processing to comply with privacy regulations.
Data Lineage and Audit Trails: Metadata management tools track the journey of each audit event from its source through all transformations to its final resting place, providing a complete audit trail for the audit data itself. This helps answer "where did this data come from?" and "how was it processed?".
This reference design provides a practical, step-by-step guide to building a robust, real-time compliance pipeline using the Confluent ecosystem. We'll cover everything from defining a structured data format to processing, storing, and querying your audit events.
Before you ingest any data, you must define a clear and consistent schema. A schema acts as a formal contract for your audit data, ensuring every event is structured correctly, which is essential for reliable processing, querying, and long-term governance. We recommend using Apache Avro for its strong typing, schema evolution capabilities, and compact serialization.
Here's a sample Avro schema for a typical audit event:
This schema is centrally managed using Confluent Schema Registry, which validates that all data written to a topic conforms to the defined structure.
With your schema in place, the next step is to ingest data from your sources. Kafka Connect is a framework for scalably and reliably streaming data between Kafka and other systems. You can use pre-built connectors to pull from virtually any source without writing custom code.
Syslog Events: Use the Syslog NG Source Connector to capture logs directly from your network devices and servers.
Cloud Provider Audit Logs: Use the Amazon S3 Source Connector to stream logs from services like AWS CloudTrail, which are often written to S3 buckets. Similar connectors exist for Azure and GCP.
Database Changes: Use a JDBC or Debezium CDC connector to capture audit trails from your databases.
Raw logs are often noisy and lack context. Use a stream processing framework to transform them into valuable, compliance-ready records.
Technology: Apache Flink or Kafka Streams are ideal for this. Both integrate seamlessly with Kafka and allow for stateful processing at scale.
Tasks:
Normalization: Parse raw log lines (e.g., from syslog) and map them to your Avro schema.
Enrichment: Add valuable context. For example, join the user_id with an employee database to add the user's name and role, or use the source_ip to look up geolocation data.
Filtering: Drop events that are not relevant to your compliance objectives to reduce noise and storage costs.
Raw logs are often noisy and lack context. Use a stream processing framework to transform these data streams into valuable, compliance-ready records.
Technology: Flink or Kafka Streams are ideal for this. Both integrate seamlessly with Kafka and allow for stateful processing at scale.
Tasks:
Normalization: Parse raw log lines (e.g., from syslog) and map them to your Avro schema.
Enrichment: Add valuable context. For example, join the user_id
with an employee database to add the user's name and role, or use the source_ip
to look up geolocation data.
Filtering: Drop events that are not relevant to your compliance objectives to reduce noise and storage costs.
If you need processed audit logs to be stored securely for years, the storage layer needs to be cost-effective, durable, and immutable.
Strategy: Use Kafka Connect to sink your processed audit topic into a long-term object store.
Immutable Storage: Sink the data to an Amazon S3 bucket with Object Lock enabled or a similar WORM-compliant storage solution. This cryptographically guarantees that once the data is written, it cannot be modified or deleted for its entire retention period.
Tiered Storage: Confluent Cloud's infinite retention capabilities automatically moves older data from Kafka's performance-optimized storage to cheaper, long-term object storage. Crucially, this data remains fully queryable through Kafka, seamlessly blending real-time and historical data.
Auditors and security teams need immediate access to this data.
For Streaming Queries: Use Flink SQL or ksqlDB to query the stream of audit events in real-time. This is perfect for building live dashboards or creating alerts based on complex event patterns (e.g., "alert me if a user fails to log in more than 5 times in one minute from two different countries").
For Historical Analysis: Data in S3 can be queried using tools like Amazon Athena, Presto, or Snowflake. If using Tiered Storage, you can query historical data directly through Kafka APIs and ksqlDB.
Finally, turn your passive audit log into an active defense mechanism.
Monitoring: Use Flink SQL or ksqlDB to define jobs that continuously monitor for specific conditions, such as a user accessing a resource they've never touched before or an unexpected change in administrative privileges.
Alerting: Integrate your monitoring jobs with tools like PagerDuty or Slack to send real-time alerts to your security operations center (SOC) when an anomaly is detected.
Choosing the right platform for real-time use cases is critical, but comparing Kafka vs. Confluent isn’t as straightforward as it might seem. While open source Kafka provides the core streaming capabilities, Confluent Cloud offers fully managed services for Kafka, Flink, and other key capabilities that accelerate development and reduce operational overhead, which is especially important for mission-critical compliance use cases.
Capability | Self-Managed Kafka | Confluent Cloud |
---|---|---|
Ingestion | Requires manual setup and management of Kafka Connect workers and connectors. | Fully managed connectors with a simple UI/API setup. Vast connector portfolio with enterprise support. |
Processing | Requires manual integration and scaling of Kafka Streams or Flink clusters. | Fully managed Flink SQL and integrations for serverless stream processing. |
Schema Governance | Requires self-hosting and managing a Schema Registry cluster | Hosted Schema Registry included, with features like schema validation and evolution rules. Docs here. |
Storage | Limited to broker disk retention. Long-term storage requires building a custom pipeline to an external store like S3 | Infinite Storage / Tiered Storage built-in. Seamlessly retains data long-term in object storage while keeping it queryable. |
Querying | Requires self-hosting and managing a Flink cluster | Serverless Flink for building real-time applications and queries without managing infrastructure. |
Security | Requires manual configuration of access control lists (ACLs), encryption, and authentication (SASL/mTLS). Complex to manage | Enterprise-grade security out-of-the-box: RBAC, private networking, managed keys, and audit logs for the platform itself. |
Cost & Operations | No license fee, but high total cost of ownership (TCO) due to engineering time for setup, scaling, patching, and 24/7 monitoring | Consumption-based pricing. Eliminates operational burden, allowing teams to focus on building value instead of managing infrastructure. |
Building a real-time audit pipeline is a high-stakes endeavor. While the benefits are immense, several common pitfalls can compromise data integrity, blow up your budget, or fail to meet compliance requirements. Here’s how to navigate the most common challenges with proven best practices:
Building a real-time audit pipeline is a high-stakes endeavor. While the benefits are immense, several common pitfalls can compromise data integrity, blow up your budget, or fail to meet compliance requirements.
Here’s how to navigate the most common challenges with proven best practices:
Data Loss and Delivery Failures: To prevent losing critical audit events, you must guarantee data durability. Configure your Kafka producers with acks=all
to ensure that a write is only confirmed after the data is replicated across multiple brokers. For consumers and connectors, monitor consumer lag vigilantly and use dead-letter queues to isolate and handle malformed or problematic events without halting the entire pipeline.
Performance Bottlenecks at High Volume: An audit pipeline can quickly become overwhelmed as event volume grows. The key to scaling is proper partitioning. Choose a strategic partitioning key (e.g., service_name
or user_id
) to distribute the workload evenly across brokers. This allows you to scale horizontally by simply adding more partitions and brokers as your ingestion rate increases, ensuring low latency is maintained.
Compromised Data Integrity: The core value of an audit log is its integrity; it must be a tamper-proof source of truth. Ensure immutability by sinking your processed data from Kafka to a WORM-compliant storage system, such as Amazon S3 with Object Lock. For an even stronger guarantee, you can introduce cryptographic hashing during your stream processing stage to create a verifiable chain of custody for your data.
Improper Access Control: It's critical to separate operational duties from data access. Implement strict role-based access control (RBAC) to manage permissions. Engineers who manage the pipeline should have operational permissions on Kafka topics but be restricted from reading sensitive message data. Conversely, auditors and security analysts should be granted read-only access through dedicated, secure APIs or query interfaces.
Runaway Storage Costs: Storing years of audit data on high-performance disks is not financially viable. Implement a cost-efficient tiered storage strategy. Solutions like Confluent Platform's Tiered Storage automatically move older data segments from expensive broker storage to low-cost object storage. This significantly reduces costs while keeping all your data—both real-time and historical—fully queryable from a single interface.
Moving from slow, batch-based reporting to an instant, event-driven approach allows businesses to proactively meet stringent compliance mandates. Here’s how it works in the real world.
In the world of finance, every transaction and data access event must be logged and monitored. Regulations like the Payment Card Industry Data Security Standard (PCI DSS) and the Sarbanes-Oxley Act (SOX) demand strict controls over financial data.
How it works: A financial institution streams every event related to its Cardholder Data Environment (CDE) into Kafka. This includes every database query, every API call, and every administrative login. A real-time processing job continuously scans this stream for suspicious patterns, such as a single user accessing an unusual number of credit card records in a short period or an administrator modifying financial reporting data outside of a designated change window.
Compliance Impact: If an anomaly is detected, an alert is instantly sent to the security team. This allows them to investigate and respond in minutes, not days. For auditors, they can be given secure, read-only access to a live dashboard that proves continuous monitoring is in place, satisfying a core requirement of PCI DSS and SOX. This proactive stance is essential for preventing fraud and ensuring the integrity of financial reports. You can see how companies in the financial sector leverage this technology in our industry case studies.
The Health Insurance Portability and Accountability Act (HIPAA) mandates strict privacy and security rules around Protected Health Information (PHI). Healthcare organizations must be able to prove who accessed patient records and why.
How it works: A hospital system uses Kafka Connect to ingest audit logs from its Electronic Health Record (EHR) system, applications, and infrastructure in real time. As doctors, nurses, and administrators access patient files, each event is published to a Kafka topic. A Flink SQL query runs continuously, looking for violations of policy—for example, a hospital employee accessing the records of a patient they are not assigned to treat, or an employee accessing the PHI of a high-profile individual.
Compliance Impact: When a potential violation is detected, the system can automatically trigger an alert to the hospital's compliance officer. This immediate notification allows them to investigate the incident promptly, mitigate any potential breach, and fulfill HIPAA's stringent reporting requirements. This creates an auditable, immutable record of all data access, which is crucial for demonstrating compliance during an audit.
For modern software-as-a-service (SaaS) companies, achieving compliance with standards like SOC 2 is essential for earning customer trust. A key part of this is proving that you have controls in place to secure customer data and providing customers with visibility into how their own data is being used.
How it works: A SaaS provider streams all application-level events (e.g., user_login, document_created, permission_changed) into a central Kafka pipeline. Each event is tagged with the tenant_id of the customer it belongs to. The pipeline then uses stream processing to split the main audit stream into separate, secure topics for each tenant.
Compliance Impact: This architecture allows the SaaS company to offer a "self-serve" audit trail feature. Customers can access a dashboard or API that shows a real-time log of all activity within their own account, giving them full transparency and control. For SOC 2 auditors, the company can easily demonstrate robust data segregation and monitoring controls, proving that they can track and audit events on a per-tenant basis, which is a critical security and privacy requirement.
The shift to real-time audit logging is just the beginning. As data streaming platforms become the central nervous system for modern enterprises, the capabilities built on top of them are evolving at a rapid pace. Here are three key trends that will define the future of compliance and security monitoring.
Traditional alerting systems rely on predefined rules (e.g., "alert on five failed logins in a minute"). While useful, they can't catch novel or sophisticated attacks. The future lies in applying artificial intelligence and machine learning (AI/ML) directly to your real-time audit streams.
Instead of static rules, AI models can learn the normal patterns of behavior for every user, service, and system in your organization. They can then identify subtle deviations from this baseline that would be invisible to a human analyst. Imagine a system that automatically flags when a developer's account starts accessing production data at 3 AM from an unusual location, even if their credentials are valid. This moves security from a reactive, rule-based posture to a proactive, predictive one. You can explore more on how Confluent is powering AI with data streaming to unlock these kinds of use cases.
While using a WORM (Write-Once, Read-Many) object store provides strong guarantees of immutability, the next frontier is creating a cryptographically verifiable audit trail. Inspired by blockchain technology, future pipelines will incorporate techniques like cryptographic hashing to chain events together.
In this model, each new audit log entry would contain a hash of the previous entry. This creates an unbroken, tamper-evident chain. Any attempt to alter or delete a past event would invalidate the hashes of all subsequent events, making tampering immediately obvious. This provides mathematical proof of your audit log's integrity, offering the highest possible level of assurance to auditors and regulators.
Security information and event management (SIEM) and security orchestration, automation, and response (SOAR) platforms are the command centers for most security operations teams. Historically, they've ingested data through slow, batch-based log shippers.
The future is a streaming-first integration. Instead of polling for logs, modern SIEMs will subscribe directly to curated Kafka topics. This provides them with normalized, enriched, and pre-processed events in real time. The result is a dramatic reduction in the time from event to detection. This shift will also enable SOAR platforms to trigger automated responses—like blocking an IP address or disabling a user account—in milliseconds, based on events processed in the streaming pipeline. This tightens the feedback loop and transforms security from a passive analysis task into an automated, real-time defense system.
As we've explored, shifting from retrospective analysis to a proactive, real-time approach is crucial for maintaining security and integrity. By leveraging Kafka along with connectors, governance, and stream processing, you can build a powerful, end-to-end pipeline that provides immediate visibility into every critical event across your organization.
Here is a summary of the essential steps to start building your own compliance-ready pipeline:
Define a Contract: Start by creating a strong, centrally-managed Avro schema using a Schema Registry. This ensures all your audit data is consistent, structured, and validated from the start.
Ingest Everything: Use Kafka Connect to reliably stream audit events from all your sources—applications, infrastructure, cloud services, and databases—into a central Kafka topic.
Process in Real-Time: Employ a stream processing framework like Apache Flink or ksqlDB to normalize, enrich, and filter events as they arrive, turning raw data into valuable, compliance-ready records.
Store for the Long Haul: Sink your processed events into a cost-effective, immutable long-term store like an S3 bucket with Object Lock or leverage Confluent's built-in Tiered Storage for infinite, queryable retention.
Query, Alert, and Monitor: Use streaming queries to build live dashboards and automated alerts for immediate threat detection and provide secure access for auditors to query historical data.
Building a compliance pipeline from scratch can be complex, but Confluent Cloud provides the managed, enterprise-grade components to accelerate your journey. You can focus on your data logic while leaving the operational burden of managing a distributed system to us.
Explore Managed Connectors: Browse the Confluent Hub to instantly connect to your critical data sources.
Enforce Data Governance: Get started with the fully-managed Schema Registry to ensure data quality from day one.
See it in Action: Learn how Confluent uses this very architecture to provide comprehensive Audit Logs for Confluent Cloud itself.
Start your free Confluent Cloud trial today and begin building your real-time compliance pipeline in minutes.
What is the difference between audit logging and observability logs?
Audit logs are designed for compliance and record every action taken on systems and data (i.e., who, what, when, where). Observability logs focus on system health and troubleshooting, not regulatory evidence.
How can I make audit logs tamper-proof?
Use immutable storage options (e.g., S3 Object Lock, WORM storage) and cryptographic signing. Real-time pipelines also reduce the risk of manipulation by capturing events as they happen.
How long should audit logs be retained for compliance?
Retention depends on regulations: SOC 2 often requires at least 1 year, HIPAA 6 years, and financial standards (like SOX) 7 years or more. Real-time pipelines built with Confluent make it easier to enforce retention policies automatically.
Do I need real-time audit logging for all industries?
While highly regulated industries like finance, healthcare, or the public sector require it, even SaaS providers and ecommerce platforms benefit from real-time audit trails for customer trust, fraud detection, and faster incident response.
What technologies are best for building a real-time compliance pipeline?
Both Apache Kafka and Confluent's cloud-native Kafka engine can provide the event streaming backbone, combined with Flink or Kafka Streams for processing, Schema Registry for consistency, and immutable object stores (like S3 with Object Lock) for long-term retention.
Apache®, Apache Kafka®, Apache Flink®, Flink®, and the Flink logo are trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by the Apache Software Foundation is implied by using these marks. All other trademarks are the property of their respective owners.
Learn how to design real-time alerts with Apache Kafka® using alerting patterns, anomaly detection, and automated workflows for resilient responses to critical events.
Learn how to handle data transformation, schema evolution, and security in Kafka Connect with best practices for consistency, enrichment, and format conversions.