Data management systems must break new ground to keep pace with the scaling requirements of modern business operations. For example:
Wix, a platform that serves around 1 billion unique users each month, hosts roughly 7% of all websites on the internet and maintains over 2,000 microservices publishing 70 billion business events daily.
Instacart, which experienced exponential growth during the pandemic, built new systems that kept food flowing across 59,000 store locations in the U.S., absorbing 10 years' worth of growth in six weeks.
These are Confluent customers, and they operate at massive scale. That scale comes with risk: you need to trust that your technology stack will always be up and running, scale to meet peak demand, run securely, and much more. So what made these companies choose Confluent as their trusted data streaming platform? Our platform's reliability, durability, scalability, and security. In the following sections, we explore each of these areas with some remarkable statistics and a look at the engineering behind them.
Confluent designs its platform with reliability in mind. We've applied lessons from managing 30,000+ Apache Kafka® clusters (the largest fleet in the world) across all environments around the globe to build a cloud-native Kafka service that can handle both common and esoteric failures in the cloud. Want more? Here's a fun fact: as of today, more than 3 trillion messages are written to Confluent every day! Confluent operates at that scale while offering a 99.99% ("four 9s") uptime availability SLA, the highest and most comprehensive SLA in the industry.
Our world-class reliability is enabled by exceptional cloud monitoring. Confluent invests heavily in proactive monitoring to learn about production problems before they impact workloads. Today, Confluent runs over 7,700 monitors on our platform for early detection and mitigation of issues. In addition, Confluent offers multi-zone availability, which replicates your data across multiple availability zones, and self-healing automation that can detect and automatically mitigate certain cloud service failures, for example by moving topic leadership from affected zones to healthy ones. It also auto-rebalances data to maintain an even load across the cluster. On top of these capabilities, we follow robust, battle-tested operational processes, including unit and integration tests, soak tests, scale tests, and failure injection tests, to emulate faults and verify that each release does not regress in performance or reliability.
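Platform-side guarantees pair with client-side settings: applications still choose how strongly they want writes acknowledged. As a minimal sketch, the dictionary below shows standard Apache Kafka producer configuration keys that favor reliability (the bootstrap server is a placeholder, and credentials are omitted):

```python
# Reliability-oriented producer settings using standard Apache Kafka config keys.
# The bootstrap server is a placeholder; security settings omitted for brevity.
producer_config = {
    "bootstrap.servers": "<bootstrap-server>",
    "acks": "all",                 # wait for all in-sync replicas to acknowledge
    "enable.idempotence": True,    # broker-side dedup on producer retries
    "delivery.timeout.ms": 120000, # total retry budget before a send fails
}
```

A dict in this shape can be passed directly to clients such as the confluent-kafka Python producer.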
To further minimize downtime and data loss in a regional outage, we introduced Cluster Linking and Schema Linking to help organizations architect a multi-region disaster recovery plan for Kafka. These features enable Confluent customers to keep data and metadata in sync across 80+ regions within all three major cloud providers, improving the resiliency of in-cloud deployments.
Durability is the other side of resiliency and is a measure of data integrity. Our customers use Confluent Cloud for their most critical data, and they expect it to remain intact and available when they need it, in the same form in which it was written. Today, Confluent proactively detects data integrity issues by running durability checks on well over 80 trillion Kafka messages per day! That staggering statistic reflects 10x growth in less than two years.
We achieve durability standards that go far beyond Apache Kafka through three layers: comprehensive design and architecture reviews that identify potential durability violations (prevention), a full-featured auditing service that monitors for data loss in real time (detection), and repair strategies that let us proactively fix data integrity issues at scale across the entire fleet (mitigation).
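Confluent's auditing service is internal, but the detection idea can be sketched in a few lines: attach a checksum to each record at write time, then recompute and compare it on audit. Everything below (the record shape and function names) is a hypothetical illustration of the concept, not Confluent's implementation:

```python
import zlib

def seal(payload: bytes) -> dict:
    """Attach a CRC32 checksum at write time (hypothetical record shape)."""
    return {"payload": payload, "crc": zlib.crc32(payload)}

def audit(record: dict) -> bool:
    """Recompute the checksum on read; a mismatch signals corruption."""
    return zlib.crc32(record["payload"]) == record["crc"]

record = seal(b"order-42:shipped")
assert audit(record)                    # intact record passes the audit

record["payload"] = b"order-42:lost"    # simulate silent corruption
assert not audit(record)                # the audit catches the mismatch
```

Running such a check continuously across every message is what turns durability from a design goal into a measurable property.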
To meet the demands of modern data pipelines, we took Apache Kafka’s horizontal scalability to the next level and made Confluent Cloud 10x more elastic than Apache Kafka.
Confluent Cloud offers faster scaling as well as the ability to shrink clusters when demand drops, lowering your total cost of ownership (TCO) by ensuring you don't pay for unused infrastructure or waste capacity on over-provisioning. Today, Confluent Cloud customers perform over 2,500 cluster expansions and contractions a year, ensuring they neither run out of capacity and lose critical messages nor overpay for capacity they don't need. With Confluent Cloud's ability to scale up to 100 CKUs, our customers can confidently handle their biggest data events, like Black Friday/Cyber Monday sales, large sporting competitions, or other major holidays.
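The economics of expand-and-contract can be sketched with a toy sizing rule: grow the cluster when utilization runs hot, shrink it when capacity sits idle. The function, thresholds, and CKU step size below are all illustrative assumptions, not Confluent Cloud's actual autoscaling logic:

```python
# Hypothetical sizing helper: pick a CKU count from observed peak utilization,
# illustrating why elastic shrink/expand avoids paying for idle capacity.
def target_ckus(current_ckus: int, utilization: float,
                low: float = 0.4, high: float = 0.7) -> int:
    """Expand when busy, contract when idle (thresholds are illustrative)."""
    if utilization > high:
        return current_ckus + 1          # approaching capacity: scale out
    if utilization < low and current_ckus > 1:
        return current_ckus - 1          # mostly idle: scale in, save TCO
    return current_ckus                  # steady state: hold

assert target_ckus(4, 0.9) == 5   # Black Friday spike: scale out
assert target_ckus(4, 0.2) == 3   # quiet period: scale in
assert target_ckus(4, 0.5) == 4   # steady state: no change
```

The point of the sketch is the shape of the decision, not the numbers: without the contraction branch, every peak permanently ratchets spend upward.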
Confluent Cloud provides a complete set of tools to build and launch apps faster while meeting security and compliance requirements. These include private networking, Bring Your Own Key (BYOK) encryption, SSO authentication support, OAuth support, role-based access control (RBAC), and many more. Confluent provides these security tools while operating at massive scale: for example, our services perform over 2 billion RBAC authorization checks per day across Confluent Cloud.
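At its core, each of those authorization checks maps a principal and its role bindings to an allow/deny decision for an operation on a resource. The sketch below is a deliberately simplified model of that idea; the data shapes and role-to-operation mapping are hypothetical, not Confluent's API (though DeveloperRead and DeveloperWrite are real Confluent Cloud role names):

```python
# Simplified RBAC model: a binding grants a role on a resource to a principal;
# an operation is allowed if some binding's role covers that operation.
ROLE_OPERATIONS = {
    "DeveloperRead": {"read"},
    "DeveloperWrite": {"read", "write"},
}

bindings = [
    {"principal": "User:alice", "role": "DeveloperWrite", "resource": "Topic:orders"},
]

def authorize(principal: str, operation: str, resource: str) -> bool:
    """Return True if any binding grants the operation on the resource."""
    return any(
        b["principal"] == principal
        and b["resource"] == resource
        and operation in ROLE_OPERATIONS[b["role"]]
        for b in bindings
    )

assert authorize("User:alice", "write", "Topic:orders")       # granted
assert not authorize("User:bob", "read", "Topic:orders")      # no binding: denied
```

Evaluating this kind of check 2 billion times a day is what makes centralized, role-based policy practical at cloud scale.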
We also provide audit logging capabilities that can help with anomaly detection and incident response activities by leveraging a rich trail of audit records for key events pertaining to the Confluent ecosystem.
In addition to these security controls, Confluent implements internal security practices to protect and secure customer data. These practices span employee access management, vulnerability scanning, incident response, and various other areas outlined in our security whitepaper. Confluent's built-in compliance covers many federal and international regulations as well as industry-specific mandates. We rely on industry-standard information security best practices and compliance frameworks, such as NIST 800-53, the ISO 27000 series, PCI DSS, and SSAE 18, to support our security initiatives. With Confluent Cloud, customers can rest assured that their data is safe and secure.
Ready to get started? Sign up for a free trial of Confluent Cloud. New sign-ups receive $400 to spend within Confluent Cloud during their first 30 days. Use the code CL60BLOG for an additional $60 of free usage.*