View sessions and slides from Kafka Summit Americas 2021.
How can you scale the use of data across a company? The trend towards distributed data mesh architectures offers a solution. This keynote covers some of the technologies and practices that are useful for building a data mesh with Apache Kafka.
Join Dustin Pearce, Instacart’s first Vice President of Infrastructure, who oversees the company’s Infrastructure Engineering, Security, and Data Warehousing teams, as he discusses the role that data in motion plays in delivering best-in-class customer experiences.
Join Will LaForest, Confluent Public Sector CTO, and Jason Schick, Confluent Public Sector GM, for an introduction to the Government Track and its highlighted sessions.
Join us for a panel discussion among a diverse group of contributors to and members of the Apache Kafka® community. The panel covers a variety of topics around building community and mentoring current and prospective contributors and members of the Apache Kafka community.
In this session I’ll explain the most common levers we use to combat increased latency in stretch clusters.
We will cover operating-system-level changes, broker-side socket and buffer sizes, and replication-level tuning, and touch on client optimizations.
This talk will demonstrate how to use Kafka and Flink together for "unified analytics": analytics that seamlessly combine processing of real-time data and historic data.
The audience will learn how Kafka and Flink together make combining real-time and historic data increasingly convenient.
In this talk, I will present ways to achieve end-to-end and granular visibility into complex event-sourcing transactions using distributed tracing. I will use open-source tools like OpenTelemetry, Jaeger, and Zipkin to showcase a complex Node.js system using Kafka.
The presentation will cover various techniques that administrators of a Kafka service can employ to reduce data transfer and save on operational costs: reducing cross-AZ traffic, optimizing batching with the DumpLogSegments tool, using Kafka metrics to shut down unused data streams, and more.
I'll demonstrate how the data was ingested from a raw TCP feed, unified with reference data from CSV files, and then processed to spot patterns with the resulting real-time stream of matches written to a new Kafka topic for validation and analysis.
Join us as we take an illustrated tour of the Kafka ecosystem with clear, concise, and approachable explanations of the most important components, features, and concepts to get started with Kafka. This presentation will be suitable for all audiences.
In this talk, we share our recommendations and picks of what every developer should know about building a streaming data mesh with Kafka. We introduce the four principles of the data mesh: domain-driven decentralization, data as a product, self-service data platform, and federated governance.
In this talk, I will briefly walk you through these possibilities, presenting real-life use cases and some sample code using open source tools.
In this session we will:
This talk describes the architectures available to you when planning for an outage. We will examine configurations as well as availability zones and debate the benefits and limitations of each. We will also cover how to set up each configuration using the tools in Kafka.
I will address hurdles such as scaling, warm standbys, schema evolution, and batch replay strategies, highlighting issues prevalent in any Kappa-based streaming architecture.
In this session, you will learn powerful Kafka concepts and techniques to level up your skills in a fun way, combining Kafka and a Raspberry Pi in an LED strip game.
In our session, we’ll discuss the details described in the IT@Intel white paper of the same title, published in November 2020. We’ll share some stream processing techniques, such as filtering and enriching in Kafka, to deliver contextually rich data to Splunk and many of our security controls.
In addition to sharing architecture recommendations to enable high performance, they will be demoing a system that leans on Confluent Cloud and Kafka Streams for its data in motion.
In this talk, we will present how Kafka adoption has evolved over the last couple of years in our space and deep dive into how we approached providing Managed Kafka Connect, the newest addition to our service portfolio.
This session is presented from the Microsoft Solution Architect perspective by Israel Ekpo, Microsoft Cloud Solution Architect, and Alicia Moniz, Microsoft MVP. They will cover use cases and scenarios, along with key Azure integration points and architecture patterns.
In this talk, I would like to ease the onboarding process for anyone starting with Kafka and help destroy (yeah, destroy!) any preconceived barriers in our minds about Kafka.
In this session, I will talk about a highly resilient streaming architecture that processes billions of events every day, and then about strategies and best practices for building highly available, fault-tolerant systems with Kafka and cloud environments.
This talk introduces you to the strangler fig pattern, which aids a smooth and step-wise migration of monolithic applications into separate services.
In this session, we'll explore the ways that Apache Kafka and Micronaut work together to enable us to build fast, efficient, event-driven applications. Then we'll see it in action, using the AWS Lambda Sink Connector for Confluent Cloud.
For medium- and large-scale systems, an event-first perspective is very likely the most helpful one to take, but it's early days, and it's still possible to get it wrong. Come to this talk for a survey of mistakes not to make.
Attendees will learn how to both stand up their own OSS offering as well as how to be a good internal consumer of other such offerings. Come ready to learn and laugh about my journey to offering OSS to thousands of people!
We will focus on recipes for effective use of MirrorMaker event replication to power platform distribution, including the challenges of managing a 'fan in' event replication workflow: pulling events created in satellite clusters back to a mothership cluster for processing.
This talk will introduce the most popular formats for documenting events that flow through Kafka, such as AsyncAPI, Avro, CloudEvents, JSON Schema, and Protobuf.
In this session, we will share our journey to build a real-time monitoring platform based on Confluent and Kafka and how we’ve been able to improve customer satisfaction ratings and boost referral-based sales as a result.
We show how teams can easily leverage the power of Kafka and scale their applications with the right architectural building blocks. We also offer insights from our own experience of building Node.js-based Kafka applications.
This session covers how we interpret and apply the data mesh paradigm, the role of Kafka as the backbone for a mesh of connectivity, the role of Kafka Connect to generate and consume data events, and the use of KSQL to perform minor transformations for consumers.
In this session, we will discuss both architectures: 1. the built-in Kafka Connect framework within the TigerGraph database; 2. using a Kafka cluster for cloud-native integration with other popular data sources. A demo will be provided for both data streaming processes.
In this session, you will learn the strategies, techniques and tools the PayPal Kafka team has utilized for managing the migration process. You will also learn the lessons and pitfalls they experienced during this exercise, as well as the secret sauce of making the migration successful.
In this talk, we share our experience of running Kafka Connect at scale. We will walk through our decision to use one cluster versus many, and how improvements in the Connect ecosystem such as incremental rebalancing have allowed us to scale to thousands of connectors.
In this session, we’ll start with some basic use cases of how standard SQL can be effectively used over events in Kafka, including how these SQL engines can help teams that are brand new to streaming data get started. From there, we’ll cover a series of more advanced functions and their implications.
Our focus on zero negative impact to existing ingestion pipelines, scalability, and cost efficiency drove a series of design decisions that eventually let us roll out auditing to every pipeline with zero downtime and fundamentally improve data ingestion quality at Pinterest.
This technology will enable the VA to provide accurate, real-time information on a claim, appeal or rating for our Veterans.
In this talk, I want to show how you can solve those challenges by embracing Apache Kafka as a foundation of your data pipeline and leveraging modern stream-processing frameworks like Apache Kafka Streams and Apache Flink.
GraphQL is a powerful way to bridge the gap between frontend and backend, providing a typed API with introspection that can be used for code generation or code completion.
This session will cover how large volumes of streaming messages can be received by parallel Kafka consumers, and turned into action by network operations teams, dramatically reducing downtime and improving performance.
In this talk we will discuss all the nuances and considerations around using Avro Schemas for your JSON event payloads. From developer tools, to DevOps approaches, versioning, governance and some “gotchas” we found when working with Avro Schemas and the Confluent Schema Registry.
During this session, the audience will learn about the satellite communications chain, and best practices and lessons learned in creating a data pipeline with Kafka for high throughput and scalability while displaying high quality situational awareness to mission operators.
This session walks through the different steps some companies have already gone through. Technical options like Change Data Capture (CDC), MQ, and third-party tools for mainframe integration, offloading, and replacement are explored.
This talk will outline strategies for low-code and model-driven development based on Event Modeling. We'll explore how event-driven application architecture provides a simple yet robust framework for generating DevSecOps-friendly code for the UI, for the web services layer, and for event-processing.
The aim of this talk is to provide insight into how we built such a scalable system while embracing a full-blown Kappa architecture, with Kafka at the heart of it all.
We will share our lessons learnt, the patterns and practices to modernize both our underlying runtime platforms and our applications with highly performing and resilient event-driven architectures.
This session will describe this architecture and the lessons learned while building it, with a focus on the internally built, multi-tenant, multi-cluster, Kafka-as-a-Service platform that enables it.
In this talk you will learn the tips and tricks I wish I had known at the beginning of my Apache Kafka journey. We’ll discuss topics like producer acknowledgments and server and consumer parameters (auto.offset.reset, anyone?) that are commonly overlooked and cause developers a lot of pain.
In this talk, we will take you on a journey through the theoretical foundations of stream processing and discuss the underlying principles and unique problems that need to be addressed.
Using JDBC Kafka Connect with custom transformation, we’re able to push configuration changes into Kafka topics with minimal coding. Kafka Streams provides a way to model live market data as a stream and configuration data as a table.
Our solution utilizes Kafka’s metadata to keep track of blocks that we intend to send to ClickHouse, and later uses this metadata information to deterministically re-produce ClickHouse blocks for re-tries in case of failures.
In our proposed presentation, we will provide a live demonstration that consists of two consumers subscribing to the same Kafka topic, but receiving different messages based on the rules specified in Open Policy Agent.
Through this session, we will highlight various aspects of design, architecture, deployment strategy, Kafka settings, optimization techniques, and more that were paramount to achieving this processing rate, with no accidents.
Here, we review and compare methods for connecting Kafka to Superset to enable streaming analytics use cases including anomaly detection, operational monitoring, and online data integration.
In this talk, we introduce a Kafka Connect Sink Connector for Apache Hudi, which writes data straight into Hudi's log format, making the data immediately queryable, while Hudi's table services like indexing, compaction, clustering work behind the scenes.
In this session, hear from the teams at Salesforce that manage Kafka as a service, running over a hundred clusters across on-premise and public cloud environments with over 99.9% availability.
In this talk, we cover the public API changes to the TimeWindows, SessionWindows, JoinWindows and SlidingWindows as well as the new guidance going forward.
With shelter-in-place orders taking immediate effect, they needed to quickly set up a robust online learning platform - one with powerful analytics to track student success. And, for the times students and staff are on campus, a contact tracing application was essential for their safety.
This talk will focus on how to use Kafka events as a database. We will talk about using KTables vs GlobalKTables, and how to apply them to patterns we use with traditional databases. We will go over a real-world example of joining events against existing data and some issues to be aware of.
In this session we will explore how Kafka Connect and its various connectors satisfied this need. We will review the two disparate tech stacks we needed to integrate, and the strategies and connectors we used to exchange information.
This talk outlines why Van Oord requires data governance and enterprise architecture models integrated with Confluent Kafka, and demonstrates how an open-source data governance tool is integrated with Confluent Kafka to fulfil these requirements.
In this talk I will discuss the creation and demonstrate the usage of geospatial UDFs in ksqlDB. I will also talk through the advantages of doing geospatial processing directly in Apache Kafka.
In this talk you will hear about 5 lessons that Wix has learned in order to successfully meet this challenge.
Join this talk to learn how SmartBear is building the world's first universal and protocol-agnostic API platform and the lessons learned along the way.
In this talk, we discuss the variety of issues that black box machine learning models present and ways in which we can open them up. These include conducting in-depth ablation studies.
In this talk, we present the end-to-end scalable system developed to democratize the use of contextual bandits at EG. The architecture comprises an online inference component as well as a continuous feedback loop that tracks users’ affinity towards certain content or page layouts.
In this talk, we'll discuss how VillageMD is able to use Kafka topic compaction for rapidly scaling our reprocessing pipelines to encompass hundreds of feeds. Within healthcare data ecosystems, privacy and data minimalism are key design priorities.
Previously, Reddit would catch these bad actors using hourly Airflow jobs, which allowed undesirable content to remain on the site long enough to impact other users. With KSQL, however, we are able to reduce the time to catch these bad actors from hours to minutes.
In this talk we'll present some of the problems we've run into with Kafka Connect, and how we've engineered around them.
In this use case-driven talk, we are going to demonstrate how our team at UnitedHealth Group leveraged existing transformers to extract data from the message metadata in the topic as well as how we developed our customized transformers to minimize the amount of duplicated data in each message.
Learn how we tackled this problem by implementing Event Tracking, a technique that allows our customers in the financial sector to precisely locate and visualize an event's route within complicated Kafka flows.
This session will provide an introduction on how to use Kotlin and Ktor to build an application that shares geographical coordinates among clients. Viktor will give an introduction to Ktor, the Kotlin framework for building connected applications.
Many organizations have chosen to go with a hybrid cloud architecture to give them the best of both worlds: the scalability and ease of deployment of cloud, and the security, latency & egress benefits of local storage.
The more appropriate workload metric for a Kafka consumer is the number of messages in the Kafka broker queue. More specifically, the message production rate of a specific topic would be the right workload metric for a Kafka consumer.
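The idea of scaling consumers on a topic's production rate can be sketched as a small sizing calculation. This is a minimal sketch, assuming the production rate is measured in messages per second and that each consumer's sustainable throughput is known in advance; the function name `desired_consumers` and its parameters are hypothetical, not the speakers' method. The cap at the partition count reflects that consumers in one group beyond that number receive no partitions.

```python
import math


def desired_consumers(production_rate: float,
                      per_consumer_rate: float,
                      partitions: int,
                      min_consumers: int = 1) -> int:
    """Estimate how many consumers a group needs (hypothetical sizing rule).

    production_rate:   messages/s currently produced to the topic
    per_consumer_rate: messages/s one consumer can sustainably process
    partitions:        topic partition count; extra consumers would sit idle
    """
    if per_consumer_rate <= 0:
        raise ValueError("per_consumer_rate must be positive")
    # Enough consumers to keep up with the incoming rate...
    needed = math.ceil(production_rate / per_consumer_rate)
    # ...but never more than the partition count, and never below the floor.
    return max(min_consumers, min(needed, partitions))


# 12,000 msgs/s produced, ~2,500 msgs/s per consumer, 8 partitions
print(desired_consumers(12_000, 2_500, partitions=8))  # 4.8 rounds up to 5
```

An autoscaler built on this rule would react to the producer side directly, rather than waiting for CPU or lag to rise on the consumers.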
In this talk, we explore how Kafka is being used in cutting-edge connected and automated vehicle research.
In this session, we will learn about an internal tool developed at Reddit to QA events in real-time. This KSQL-powered web app streams events from our pipeline, allowing developers to filter events they care about using criteria like User ID, Device ID or the type of user interaction.
This session is the story of how we learned the hard way about mitigating cluster failures with the proper configurations in place.
In this talk we’ll give you the tools and metrics to decide which solution you should apply when, and show you a real life example with cost & time comparisons. To highlight the differences, we’ll dive into a project we’ve done, transitioning from reading Kafka in a stream to reading it in batch.
In this talk, we describe this journey, and how we leverage Debezium’s Outbox pattern to transactionally emit events following the Event-Carried State Transfer pattern, where all the needed information is sent to Kafka at once, therefore avoiding consumer requests to enrich the event data.
This talk discusses our investments in Kafka infrastructure for a large-scale Python-based environment:
We'll also present challenges we encountered along the way and share our learnings with the audience.
In this session, we discuss how you can use Amazon Kinesis Data Analytics Studio (KDA Studio) and Amazon Managed Streaming for Apache Kafka (Amazon MSK) to interactively build serverless stream processing applications for Kafka using SQL, Python, or Scala with a serverless notebook interface.
In this talk, we cover these use cases in more detail along with a deep dive into the architecture of the source and sink Kafka Connectors for Cosmos DB.
In this session, we'll explore how making changes to the JVM design can eliminate the problems of garbage collection pauses and raise the throughput of applications. For cloud-based Kafka applications, this can deliver both lower latency and reduced infrastructure costs.
This session provides a quick overview of Couchbase, describes the Couchbase Kafka Connector, and showcases a demo of how it can be used as both a source and a sink for building real-time data processing pipelines for mission-critical applications.
We will present several continuous intelligence applications in use today that depend on real-time analysis, learning, and prediction to power automation and deliver responses that are in sync with the real world.
In this talk, we’ll share and demonstrate different approaches for developers to safely create Kafka Topics whilst sharing a few war stories of what can go wrong along the way.
In this talk we'll cover how to extend existing applications with WebAssembly, allowing developers to change the shape of data at runtime, per application, without creating additional topics.
If you’re wondering why Kafka makes sense for a digital thread, join us to learn how a real-time event streaming platform enables core strategies around ML/AI, microservices, model-based system engineering, and continuous improvement.
This talk shows real-world experiences building out real-time analytics stacks powering next-generation observability, customer insights, and insights as a service.
This talk will explore the integration points and various capabilities of Spring Cloud Stream with Kafka Streams, and how to build event streaming applications using Spring’s programming model built on top of Kafka Streams.
This session shows how Kong Konnect Enterprise can complement Kafka Event Streaming, exposing it to new and external consumers while applying specific and critical policies to control its consumption.
Learn how a fintech startup is using Kafka to turn trades worth billions of dollars into streams. Sachin Kumar, CTO at Clear Street, will walk attendees through how the team leverages Kafka to manage order state, change data capture, interoperability, and more.
Join us for a talk with Confluent's Head of Education, Mario Sanchez, as he discusses how we've successfully transformed business through a prescriptive approach to enablement. We invite you to join the live Q&A that follows, to discuss how enablement can benefit your organization.
In this talk, Fran Mendez, founder of AsyncAPI, and Jonathan Schabowsky, Solace CTO Architect, will introduce you to the AsyncAPI specification and show you two different methods to define and share your event APIs, quickly get up to speed, and more.
In this talk, we will discuss how to address these issues and look into ways to bridge on-premise Kafka deployments with the GCP stack for different use cases and personas. This will be followed by architecture examples showing how to deploy Kafka and integrate it with the rest of the GCP stack.
In this session, Russ Savage, Director of Product Management at InfluxData, will discuss basic concepts of integrating Kafka and InfluxDB while highlighting how companies are creating fault-tolerant, scalable, and fast data pipelines with the power of InfluxDB and Kafka.
In this session, we will show you how easy we have made streaming data, with a great user experience and flexible resource management, thanks to our new secret weapon in the Apache Camel project: Kamelet.
Join Joe Niemiec, Sr. Product Manager at Cloudera, as he shares these insights in this session, which covers topics such as the many ways Kafka has been deployed in the field: standalone clusters, multiple clusters in a single data center, and more.
Learn how to stream massive amounts of data which used to be impossible to handle from Kafka, to serve real-time applications using lake-scale optimized approaches to storage and indexing.
In this session, you will learn how to architect and build the fastest data processing applications that scale linearly, and combine streaming data and reference data (data in motion and data at rest) with machine learning.
In this breakout session you’ll hear data streaming success stories from Generali and Skechers that leverage Qlik Data Integration and Confluent. You’ll discover how Qlik’s data integration platform lets organizations automatically produce real-time transaction streams into Kafka, Confluent Platform
This session introduces the Kafka Connector for Redis Enterprise. During the presentation we will discover how Kafka in combination with the multi-model database platform from Redis Labs opens up new possibilities for developers.
DriveOhio digitally links sensors, cameras, speed monitoring equipment, and smart highway assets in real time, to dynamically adjust the surface road network to maximize the safety and efficiency for travelers.
In this session, you’ll learn how to leverage modern ML-augmented data management solutions to automatically find, identify, and classify sensitive data across Spark, Databricks, and beyond - and how to apply policies for compliance and risk mitigation to get the most value from our data.
APIs have become ubiquitous as a way of exposing the capabilities of the enterprise both internally and externally. However, are APIs alone enough? There is a strong resurgence in interest in asynchronous communication and event driven architecture.
In this session, you will learn how Kafka and SingleStore enable a modern yet simple data architecture to analyze both fast-paced incoming data and large historical datasets. In particular, you will understand why SingleStore is well suited to processing data streams coming from Kafka.
In this session you will learn how to set up and configure Confluent Cloud with MongoDB Atlas. We'll start the journey by learning about the basic connectivity between the two cloud services and end with a brief discovery of what you can do with data once it is in MongoDB Atlas.
In this talk, we examine an indexing approach that enables fast SQL analytics on data from Kafka, without data flattening or denormalization. Rockset is the real-time indexing database that builds an inverted index, a columnar index and a row index on all fields of your Kafka messages.
In this session Mike & David will walk you through how to navigate and govern your data in motion with a new suite of products from Confluent.
In our talk, we’ll tackle this question and many more when it comes to the testing of Apache Kafka endpoints and your services architecture. We’ll cover what makes testing in EDA difficult; technologies that can help you; and how we at SmartBear are thinking about these testing problems.
During this presentation we will explore how Moogsoft used Aiven for Kafka to manage and scale their data in the cloud.
Join us to hear how Precisely Connect can help you use the power of Apache Kafka to eliminate data silos and make cloud-based, event-driven data architectures a reality. Start your cloud transformation journey today, knowing you don’t need to leave essential transaction data behind!
Join experts from Confluent and AWS to learn how to build Apache Kafka®-based streaming applications backed by machine learning models. Adopting the recommendations will help you establish repeatable patterns for high performing event-based apps.
We will demonstrate how easy it is to use Confluent Cloud as the data source of your Beam pipelines. You will learn how to process the information that comes from Confluent Cloud in real time, make transformations on such information and feed it back to your Kafka topics.
This workshop presents a solution using Confluent Cloud on Azure, Azure Cosmos DB, and Azure Synapse Analytics, which can be connected securely within an Azure VNet using Azure Private Link configured on Kafka clusters.