View sessions and slides from Kafka Summit San Francisco 2017
In this presentation, I will talk about my firsthand experience dealing with the unique challenges of running Kafka at a massive scale. If you ever thought that running Kafka is difficult, this talk may change your mind and provide you with valuable insights into how to configure a Kafka cluster efficiently, how to manage Kafka for enterprise customers and how to measure, monitor and maintain the Quality of Kafka Service. Our production Kafka cluster runs over 1500+ VMs, and serves over 10 GBPS data spread across hundreds of topics for multiple teams across Microsoft. We built a self-serve Kafka management service to make the process manageable and scalable across many teams. In this talk, I will also share insights about running Kafka in Private vs multi-tenant mode, supporting failover and disaster recovery requirements, and how to make Kafka Compliant with regulatory certifications such as ISO, SOC, FEDRAMP, etc.
LINE is a messaging service with 200+ million active users. I will introduce why we feed 100+ billion daily messages into Kafka and how various systems such as data sync, abuse detection and analysis are depending on and leveraging it. It will be also introduced how we leverage dynamic tracing tools like SystemTap to inspect broker’s performance on production system, which led me to fix KAFKA-4614.
Kafka makes so many things easier to do, from managing metrics to processing streams of data. Yet it seems that so many things we have done to this point in configuring and managing it have been object studies in how to make our lives, as the plumbers who keep the data flowing, more difficult than they have to be. What are some of our favorites?
We’ve made a lot of progress over the last few years improving the situation, in part by focusing some of this incredibly talented community towards operational concerns. We’ll talk about the big mistakes you can avoid when setting up multi-tenant Kafka, and some that you still can’t. And we will talk about how to continue down the path of marrying the hot, new features with operational stability so we can all continue to come back here every year to talk about it.
Apache Kafka is recognized as the world\u2019s leading real-time, fault tolerant, highly scalable stream platform. It is adopted very widely across thousands of companies worldwide from web giants like LinkedIn, Netflix, Uber to large enterprises like Apple, Cisco, Goldman Sachs and more. In this talk, we will look at what Confluent has done along with the help from the community to enable running Kafka as a fully managed service. The engineers at Confluent spent multiple years running Kafka as a service and learnt very valuable lessons in that process. They understood how things are very different when you run in a controlled environment inside a single company vs running Kafka for thousands of companies. This talk will go over those valuable lessons and what we have built in Kafka as a result which is available to all Kafka users as part of Confluent Cloud.
An Overview of the Kafka clients ecosystem. APIs – wire protocol clients – higher level clients (Streams) – REST Languages (with simple snippets – full examples in GitHub) – the most developed clients – Java and C/C++ – the librdkafka wrappers node-rdkafka, python, GO, C# – why use wrappers Shell scripted Kafka ( e.g. custom health checks) kafkacat Platform gotchas (e.g. SASL on Win32)
Docker containers provide an ideal foundation for running Kafka-as-a-Service on-premises or in the public cloud. However, using Docker containers in production environments poses some challenges – including container management, scheduling, network configuration and security, and performance. In this session, we’ll share lessons learned from implementing Kafka-as-a-Service with Docker containers.
ChatWork is a worldwide communication service, which holds 110k+ of customer organizations. In 2016, we have developed a new scalable infrastructure based on the idea of CQRS and Event Sourcing using Kafka and Kafka Streams combined with Akka and HBase. In this session, we talk about the concept of this architecture and lessons learned in production use cases.
In this talk, we present the recent additions to Kafka to achieve exactly-once semantics within its Streams API for stream processing use cases. This is achieved by leveraging the underlying idempotent and transactional client features. The main focus will be the specific semantics that Kafka distributed transactions enable in Streams and the underlying mechanics to let Streams scale efficiently.
Event Driven Services come in many shapes and sizes from tiny event driven functions that dip into an event stream, right through to heavy, stateful services which can facilitate request response. This practical talk makes the case for building this style of system using Stream Processing tools. We also walk through a number of patterns for how we actually put these things together.
<3 Python & want to process data from Kafka? This talk will look how to make this awesome. In many systems the traditional approach involves first reading the data into the JVM and then passing the data to Python, which can be a little slow, and on a bad day results in almost impossible to debug. This talk will look at how to be more awesome in Spark & how to do this in Kafka Streams.
The rapidly expanding world of stream processing can be daunting, with new concepts (various types of time semantics, windowed aggregates, changelogs, and so on) and programming frameworks to master. KSQL is a new open-source project which aims to simplify all this and make stream processing available to everyone.
Much as SQL stands as a lingua franca for declarative data analysis, Apache Beam aims to provide a portable standard for expressing robust, out-of-order data processing pipelines in a variety of languages across a variety of platforms. By cleanly separating the user’s processing logic from details of the underlying execution engine, the same pipelines will run on any Apache Beam runtime environment, whether it’s on-premise or in the cloud, on open source frameworks like Apache Spark or Apache Flink, or on managed services like Google Cloud Dataflow. In this talk, I will:
The HITS algorithm creates a score for documents; one is “hubbiness”, the other is “authority”. Usually this is done as a batch operation, working on all the data at once. However, with careful consideration, this can be implemented in a streaming architecture using KStreams and KTables, allowing efficient real time sampling of rankings at a frequency appropriate to the specific use case.
We are migrating one of the top 3 consumer packaged goods companies from a batch-oriented systems architecture to a streaming micro services platform. In this talk I’ll explain how we leverage the Lightbend reactive stack and Kafka to achieve this and how the 4 Kafka APIs fit in our architecture. Also I explain why Kafka Streams <3 Enterprise Integration Patterns.
Kafka Streams allows to build scalable streaming apps without a cluster. This “Cluster-to-go” approach is extended by a “DB-to-go” feature: Interactive Queries allows to directly query app internal state, eliminating the need for an external DB to access this data. This avoids redundantly stored data and DB update latency, and simplifies the overall architecture, e.g., for micro-services.
At Funding Circle, we are building a global lending platform with Apache Kafka and Kafka Streams to handle high volume, real-time processing with rapid clearing times similar to a stock exchange. In this talk, we will provide an overview of our system architecture and summarize key results in edge service connectivity, idempotent processing, and migration strategies.
Apache Avro allows data to be self-describing, but carries an overhead when used with message queues such as Apache Kafka. Confluent’s open source Schema Registry integrates with Kafka to allow Avro schemas to be passed ‘by reference’, minimizing overhead, and can be used with any application that uses Avro. Learn about Schema Registry, using it with Kafka, and leveraging it in your application.
You have made the transition from single machines and one-off solutions to distributed infrastructure in your data center powered by Apache Kafka. But what if one data center is not enough? In this session, we review resilient data pipelines with Apache Kafka that span multiple data centers. We provide an overview of best practices and common patterns including key areas such as architecture and data replication as well as disaster scenarios and failure handling.
As organizations increasingly adopt streaming platforms such as kafka, the need for visibility and discovery has become paramount. Increasingly, with the advent of self-service streaming and analytics, a need to increase on overall speed, not only on time-to-signal, but also on reducing times to production is becoming the difference between winners and losers. Beyond Kafka being at the core of successful streaming platforms, there is a need for a stream registry. Come to this session to find out how HomeAway is solving this with a “just right” approach to governance.
On the events pipeline team at New Relic, Kafka is the thread that stitches our micro-service architecture together. We receive billions of monitoring events an hour, which customers rely on us to alert on in real-time. Facing a ten fold+ growth in the system, learn how we avoided a costly scaling nightmare by switching to a streaming system, based on Kafka. We follow a DevOps philosophy at New Relic. Thus, I have a personal stake in how well our systems perform. If evaluation deadlines are missed, I loose sleep and customers loose trust. Without necessarily setting out to from the start, we’ve gone all in, using Kafka as the backbone of an event-driven pipeline, as a datastore, and for streaming updates to the system. Hear about what worked for us, what challenges we faced, and how we continue to scale our applications.
This talk will review the Kafka Connect Framework and discuss building data pipelines using the library of available Connectors. We’ll deploy several data integration pipelines and demonstrate :
When Blizzard started sending gameplay data to Hadoop in 2013, we went through several iterations before settling on Flumes in many data centers around the world reading from RabbitMQ and writing to central flumes in our Los Angeles datacenter. While this worked at first, by 2015 we were hitting problems scaling to the number of events required. This is how we used Kafka to save our pipeline.
Yelp moved quickly into building out a comprehensive service oriented architecture, and before long had over 100 data-owning production services. Distributing data across an organization creates a number of issues, particularly around the cost of joining disparate data sources, dramatically increasing the complexity of bulk data applications. Straightforward solutions like bulk data APIs and sharing data snapshots have significant drawbacks. Yelp’s Data Pipeline makes it easier for these services to communicate with each other, provides a framework for real-time data processing, and facilitates high-performance bulk data applications – making large SOAs easier to work with. The Data Pipeline provides a series of guarantees that makes it easy to create universal data producers and consumers that can be mashed up into interesting real-time data flows. We’ll show how a few simple services at Yelp lay the foundation that powers everything from search to our experimentation framework.
We show a way to make Kafka end-to-end encrypted. It means that data is ever decrypted only at the side of producers and consumers of the data. The data is never decrypted broker-side. Importantly, all Kafka clients have their own encryption keys. There is no pre-shared encryption key. Our approach can be compared to TLS implemented for more than two parties connected together.
Kafka is easy to set up as a messaging service and serves the purpose well. However, it gets complicated in a multi-tenant environment, where users have different SLA on availability, durability and latency. As traffic grows, managing a huge and monolithic Kafka cluster in a cloud environment has been proved to be problematic and hard to scale.
At Netflix, our Kafka messaging system evolves into a multi-cluster and hierarchical service where it can serve over a trillion messages per day. Topics are allocated in either shared or dedicated clusters according to SLA requirements and can be migrated across clusters. Infrastructure routers connect Kafka clusters and provide hierarchical access to data. With the help of enhanced client libraries and proxies, clients interact with the service using higher level APIs and abstracted access points. Kafka deployments are transparent from clients. Enabled by our client libraries and Netflix cloud infrastructure, we are able to mitigate Kafka cluster level failures with our Kafka failover which is also transparent to the clients.
In this talk, we are going to discuss why this architecture is necessary and how we have implemented it with essential components including management and self-service tools, infrastructure routers, client libraries, proxies and monitoring service.
CERN uses the world’s largest and most complex scientific instruments to prove the fundamental structure of the universe. The organization is deploying a data streaming infrastructure, based on Kafka and IaaS cloud, to make its operations more scalable and efficient. This session expounds the motivations, selected architecture, challenging use cases and shares ours lessons learned and future plans.
This talk will focus on how database streaming is essential to WePay’s infrastructure, and the many functions that database streaming serves. It will also provide information on how the database streaming infrastructure was created and is managed so that others can leverage WePay’s work to develop their own database streaming solutions.
With over 100 million monthly players and presence in over 20 data centers globally, League of Legends generates an immense amount of operational and analytical data that has, until recently, been siloed where it was generated or delayed via slow ETLs. In this talk, Riot Games will share their challenges and victories of rolling out a globally aggregated Kafka pipeline to overcome these limitations
The focus of this session is to share our experience of using Kafka in reinventing CapitalOne auto finance customer communications infrastructure. We will share our technology selection process, system architecture to build a highly resilient and available system, cloud native design and lessons learnt. We will also review our journey of moving from batch to real time event driven system.
Shopify is a leading e-commerce platform with 325+ thousand merchants and peak traffic of over 4 million rpm. We enable our merchants to run flash sales, which can double or triple traffic for a short period of time. This talk will take you through the engineering challenges of enabling Shopify to run flash sales, and how Apache Kafka plays a crucial role in supporting this feature.
We created specialized infrastructure for Kafka Streams in each DC. It allows us fast and easy bootstrap stream applications for everyone in Criteo. Its includes next parts: Replication based on Kafka Connect Kafka Connect on Mesos Kafka Streams Application on Mesos Monitoring of Kafka Streams application Kafka Configuration as Code Protobuf schemas and deployment
Should you containerize your Kafka Streams or Kafka Connect apps? I’ll answer this popular question by describing the evolution of streaming platforms at Etsy, which we’ve run on both Docker and bare metal, and what we learned on the way. Attendees will learn about the benefits and drawbacks of each approach, plus some tips and best practices for running your Kafka apps in production.