Data breaches and cyber attacks are a growing concern among many companies today. With the ever-increasing cost of data breaches, businesses are constantly searching for ways to secure their data and infrastructure better.
The Problem with Secure Data Management
According to IBM's annual Cost of a Data Breach report, data breach costs reached an all-time high in 2022, averaging USD 4.35 million, and 83% of organizations surveyed had experienced more than one data breach. Yet with the sheer volume of data generated by security tools and infrastructure, businesses struggle to manage and secure their workloads, applications, and data, especially in the face of increasingly sophisticated bad actors. That's where Confluent and Amazon Security Lake, a new purpose-built data lake for security-related data, come in. This blog post will explore how this partnership can help businesses tackle the challenges of securing their data and infrastructure.
Amazon Security Lake is a new purpose-built data lake for security-related data. It can automatically aggregate data from cloud and on-premises infrastructure, firewalls, and endpoint security solutions. It helps enterprises centralize all of their security data in a single data lake, using a standards-based format, and manage the life cycle of this data.
Amazon Security Lake aggregates data from AWS services like CloudTrail and Lambda, from AWS security tools like AWS Security Hub, GuardDuty, and AWS Firewall Manager, and from many third-party SaaS and on-premises log sources. It supports the new Open Cybersecurity Schema Framework (OCSF), which provides a common way to store telemetry and makes it far easier to integrate tools. Because the schema is consistent, tools can pass information to one another, and data flows seamlessly into data lakes and analytics tools. Confluent helps you quickly aggregate data, wherever it is and at any scale, and send it to Amazon Security Lake.
The Challenges of Moving Data to a Streaming Model
Amazon Security Lake is a powerful security platform that ingests data from AWS native services as well as custom enterprise data with the help of partners. Confluent offers data governance features, massive scaling, and a connector ecosystem that complement Amazon Security Lake, making it easier to ingest and process data from various locations like on-prem, at the edge, or in a co-location, into S3, ensuring a streamlined and efficient data pipeline. As a data streaming platform, Confluent can scale to millions of events per second, making it an ideal layer for enterprises with massive data estates.
So how should you get data to Confluent?
The first option is to produce events directly using one of our client libraries (Java, C/C++, Python, Go, .NET) to send relevant events to a topic, which is a logical collection of events. For Amazon Security Lake, this might be a microservice that pulls events from network devices, then generates Kafka events.
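As a rough illustration of the first option, the sketch below shows what such a microservice might do with the Python client. The topic name, the event fields, and the connection placeholders are all hypothetical; only `Producer`, `produce`, and `flush` come from the `confluent-kafka` package.

```python
import json
import time


def build_event(device_id, message):
    """Wrap a raw device log line in a small JSON envelope (hypothetical shape)."""
    return {
        "device_id": device_id,
        "message": message,
        "observed_at_ms": int(time.time() * 1000),
    }


def publish_events(producer, topic, events):
    """Send each event to `topic`, keyed by device ID.

    Events with the same key land in the same partition, so events
    from one device keep their relative order.
    """
    for event in events:
        producer.produce(
            topic,
            key=event["device_id"].encode("utf-8"),
            value=json.dumps(event).encode("utf-8"),
        )
    # Block until every buffered event has been delivered or has failed.
    producer.flush()


def connect(bootstrap_servers, api_key, api_secret):
    """Create a producer for a SASL/PLAIN-secured cluster (values are placeholders)."""
    from confluent_kafka import Producer  # requires `pip install confluent-kafka`

    return Producer({
        "bootstrap.servers": bootstrap_servers,
        "security.protocol": "SASL_SSL",
        "sasl.mechanisms": "PLAIN",
        "sasl.username": api_key,
        "sasl.password": api_secret,
    })
```

A caller would then run something like `publish_events(connect(...), "network-events", [build_event("fw-01", "deny tcp 10.0.0.5:443")])`, where `network-events` is an assumed topic name.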
Another option would be to use our connector ecosystem. Confluent, with help from our partners, has over 120 connectors that will allow you to pull data from various disparate sources. You can also sink Confluent events into data destinations, but we will cover that a bit later. So if you are interested in security events in any flavor of relational database (MSSQL, MySQL, PostgreSQL, Oracle, etc.), you can use connectors to pull data directly from these incumbent systems to generate new events in Confluent.
Once your data is in Confluent, you must ensure that it conforms to the Open Cybersecurity Schema Framework (OCSF), as Amazon Security Lake requires. OCSF is a collaborative, open-source effort by AWS and leading partners in the cybersecurity industry. It provides a standard schema for common security events, defines versioning criteria to facilitate schema evolution, and includes a self-governance process for security log producers and consumers. Confluent can help with OCSF conformity in two ways. The first is through our Data Governance features, which include Schema Registry. Schema Registry allows you to set up and enforce specific schemas, like OCSF, at a topic level, so events that do not conform to OCSF are rejected. The second is transformation: you can format events into OCSF using ksqlDB or our future Flink offering (more on that in a future blog post). The last step is getting that data to an S3 bucket managed by Amazon Security Lake.
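To make the idea of OCSF conformity concrete, here is a plain-Python sketch of mapping a raw login record to an OCSF-style event and checking for required attributes. It is a stand-in for illustration only, not how Schema Registry or ksqlDB do the work. The raw field names (`timestamp_ms`, `source`, `username`) are hypothetical, and the numeric IDs and required-attribute list reflect our reading of the OCSF base event and Authentication class, so confirm them against the schema version you target.

```python
# Attributes assumed to be required on every OCSF event (check your schema version).
REQUIRED_FIELDS = (
    "activity_id",
    "category_uid",
    "class_uid",
    "metadata",
    "severity_id",
    "time",
    "type_uid",
)


def to_ocsf_authentication(raw, schema_version):
    """Map a raw login record (hypothetical shape) to an OCSF-style event."""
    class_uid = 3002   # Authentication class (assumed value)
    activity_id = 1    # Logon activity (assumed value)
    return {
        "activity_id": activity_id,
        "category_uid": 3,  # Identity & Access Management (assumed value)
        "class_uid": class_uid,
        # OCSF derives the type from the class and the activity.
        "type_uid": class_uid * 100 + activity_id,
        "severity_id": 1,   # Informational (assumed value)
        "time": raw["timestamp_ms"],
        "metadata": {
            "version": schema_version,
            "product": {"name": raw["source"]},
        },
        "user": {"name": raw["username"]},
    }


def missing_fields(event):
    """Return the required attributes that are absent from `event`."""
    return [name for name in REQUIRED_FIELDS if name not in event]
```

An event for which `missing_fields` returns a non-empty list is the kind of record that schema enforcement at the topic level would reject.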
The Solution to Siloed Data
Remember when we were talking about Confluent sink connectors? Our S3 sink connector is one of our most popular, and it does everything Amazon Security Lake requires. Here’s how to get started. First, deploy a Confluent Connect worker somewhere in AWS. Many options exist, including EC2, ECS, and even EKS on Fargate. In this example, we’ll use EC2 to simplify the initial setup. Use Amazon Security Lake to set up a source S3 bucket and its associated IAM role. The EC2 instance needs an instance profile that allows it to assume the role that Amazon Security Lake created. Next, use this example Connect worker configuration:
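As a minimal sketch of what a distributed Connect worker configuration for a SASL-secured cluster might contain, the Python helper below assembles the properties and renders them as a `.properties` file. The group ID, internal topic names, and converter choices are assumptions for illustration, and Schema Registry authentication settings are left out.

```python
def worker_properties(bootstrap, api_key, api_secret, schema_registry_url,
                      group_id="security-lake-connect"):
    """Build distributed Connect worker settings (illustrative values only)."""
    jaas = (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="{api_key}" password="{api_secret}";'
    )
    security = {
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        "sasl.jaas.config": jaas,
    }
    props = {
        "bootstrap.servers": bootstrap,
        "group.id": group_id,
        # Internal topics where the worker stores connector state.
        "config.storage.topic": f"{group_id}-configs",
        "offset.storage.topic": f"{group_id}-offsets",
        "status.storage.topic": f"{group_id}-status",
        "config.storage.replication.factor": "3",
        "offset.storage.replication.factor": "3",
        "status.storage.replication.factor": "3",
        "key.converter": "org.apache.kafka.connect.storage.StringConverter",
        # Parquet output needs schema-aware records; Avro is an assumed choice.
        "value.converter": "io.confluent.connect.avro.AvroConverter",
        "value.converter.schema.registry.url": schema_registry_url,
    }
    props.update(security)
    # The worker's embedded producer and consumer need the same credentials.
    for prefix in ("producer.", "consumer."):
        for key, value in security.items():
            props[prefix + key] = value
    return props


def render_properties(props):
    """Render the settings as the text of a worker .properties file."""
    return "\n".join(f"{key}={value}" for key, value in props.items())
```

Writing the output of `render_properties(worker_properties(...))` to a file gives something you could pass to the distributed worker start script after filling in real values.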
Once you have the worker up and running, the next step is to deploy an S3 Sink task to the worker. The S3 Sink will watch all of the topics that contain your OCSF events and send those events to S3 as Parquet objects. Amazon Security Lake has specific settings and partitioning requirements; the following S3 Sink task configuration includes those requirements to get you started:
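A hedged sketch of such an S3 Sink configuration is shown below as a Python helper that builds the request body for the Connect REST API (`POST /connectors`). The partition layout (`ext/<source>/region=.../accountId=.../eventDay=...`) is our understanding of what Amazon Security Lake expects for custom sources, so verify it against the current Security Lake documentation. The sketch also assumes the topic is named after the custom source, because the connector writes under `<topics.dir>/<topic>/`. The flush size is arbitrary, and role-assumption credential settings are left out.

```python
import json
import urllib.request


def s3_sink_config(topics, bucket, region, account_id):
    """Build S3 sink settings that write daily-partitioned Parquet objects."""
    return {
        "connector.class": "io.confluent.connect.s3.S3SinkConnector",
        "tasks.max": "1",
        "topics": ",".join(topics),
        "s3.bucket.name": bucket,
        "s3.region": region,
        "storage.class": "io.confluent.connect.s3.storage.S3Storage",
        "format.class": "io.confluent.connect.s3.format.parquet.ParquetFormat",
        "flush.size": "1000",  # records per object; arbitrary illustrative value
        # Objects land under ext/<topic>/<partition path>/.
        "topics.dir": "ext",
        "partitioner.class":
            "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
        "partition.duration.ms": "86400000",  # one partition per day
        # Quoted text is literal; YYYYMMdd is filled in from the timestamp.
        "path.format":
            f"'region={region}/accountId={account_id}/eventDay='YYYYMMdd",
        "locale": "en-US",
        "timezone": "UTC",
        "timestamp.extractor": "Record",  # partition by the Kafka record timestamp
    }


def connector_request(name, config):
    """Wrap the settings in the body the Connect REST API expects."""
    return {"name": name, "config": config}


def submit(worker_url, body):
    """POST the connector definition to a Connect worker (not called here)."""
    request = urllib.request.Request(
        worker_url.rstrip("/") + "/connectors",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `submit("http://localhost:8083", connector_request("security-lake-sink", s3_sink_config([...], ...)))` would register the connector on a worker listening on the default REST port; the connector name here is hypothetical.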
And you’re ready to go!
In addition to sending these events to Amazon Security Lake, you could also have native Confluent consumers read from these topics for notifications, business logic, or even firing off AWS Lambda functions to kick off remediation actions. With Confluent and Amazon Security Lake, any organization at any scale can start deriving security insights in near real-time.
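As one possible shape for such a consumer, the sketch below polls a topic and hands high-severity events to a remediation callback. The severity threshold and the `invoke` callback are assumptions: we take OCSF `severity_id` values 4 through 6 to mean High, Critical, and Fatal, and `invoke` stands in for whatever triggers your Lambda function.

```python
import json

HIGH_SEVERITY = 4   # assumed OCSF severity_id for "High"
MAX_SEVERITY = 6    # assumed OCSF severity_id for "Fatal"; 99 ("Other") is skipped


def needs_remediation(event, threshold=HIGH_SEVERITY):
    """Return True when the event's severity is at or above the threshold."""
    severity = event.get("severity_id", 0)
    return threshold <= severity <= MAX_SEVERITY


def process(consumer, invoke, max_messages):
    """Poll up to `max_messages` times and call `invoke` for severe events.

    `consumer` is expected to behave like confluent_kafka.Consumer:
    poll() returns None on timeout, or a message with error() and value().
    """
    handled = 0
    for _ in range(max_messages):
        message = consumer.poll(1.0)
        if message is None or message.error():
            continue  # nothing received, or a broker/partition-level error
        event = json.loads(message.value())
        if needs_remediation(event):
            invoke(event)
            handled += 1
    return handled
```

In practice `invoke` might wrap a boto3 call such as `lambda_client.invoke(FunctionName=..., InvocationType="Event", Payload=json.dumps(event))`, with the function name supplied by you.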
Additional contributions from Michael Worthington, Sr. Product Marketing Manager