Confluent is proud to participate in the following conferences, trade shows, and meetups.
In this workshop, we will show how Kafka Connect and Kafka Streams can be used together to build real-world, real-time streaming data pipelines. Using Kafka Connect, we will ingest data from a relational database into Kafka topics as the data is being generated. We will then process and enrich the data in real time using Kafka Streams, before writing it out for further analysis.
We’ll see how easy it is to use Connect to ingest and export data (no code is required), and how the Kafka Streams Domain Specific Language (DSL) means that developers can concentrate on business logic without worrying about the low-level plumbing of streaming data processing. Because Streams is a Java library, developers can build real-time applications without needing a separate cluster to run an external stream processing framework.
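As a taste of the no-code approach, a Kafka Connect source is configured entirely through properties. A minimal sketch for Confluent's JDBC source connector follows; the connector name, connection URL, and column name here are hypothetical placeholders, not values from the workshop:

```
name=jdbc-source-demo
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:postgresql://localhost:5432/orders_db
# Poll for new rows using a monotonically increasing column
mode=incrementing
incrementing.column.name=id
# Each ingested table becomes a topic with this prefix
topic.prefix=postgres-
```

With a configuration along these lines, each new row in the database lands in a Kafka topic without writing any code.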
In this session we talk about how Apache Kafka helps you to radically simplify your data processing architectures. We cover how you can now build normal applications to serve your real-time processing needs — rather than building clusters or similar special-purpose infrastructure — and still benefit from properties such as high scalability, distributed computing, and fault-tolerance, which are typically associated exclusively with cluster technologies. We discuss common use cases to realize that stream processing in practice often requires database-like functionality, and how Kafka allows you to bridge the worlds of streams and databases when implementing your own core business applications, for example in the form of event-driven, containerized microservices. Notably, we cover Kafka’s Streams API, its abstractions for streams and tables, and its recently introduced Interactive Queries functionality. As we will see, Kafka makes such architectures equally viable for small, medium, and large scale use cases. As a running example we will walk through the architecture of a typical use case, which we will successively simplify during the session.
When you’re storing petabytes of data in a large distributed system, moving data from machine to machine can be an arduous and expensive operation. The problem has two parts: working out when and where data should move, and limiting the bandwidth used by data transfers. In a multi-tenant system, where each machine has a different load profile, this can be a tricky problem. Be too restrictive and progress can be starved. Be too open and users will encounter problems. This talk will look at the algorithms added to the latest Kafka release for handling dynamic data distribution and throttling the data transfer between machines. The result: a multi-tenant streaming platform that can scale elastically in response to your very own usage profile.
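For reference, the replication throttling described above surfaces as a small set of broker- and topic-level configs; the rates below are illustrative values in bytes per second, not recommendations:

```
# Broker-level dynamic configs: cap replication traffic (bytes/sec)
leader.replication.throttled.rate=10485760
follower.replication.throttled.rate=10485760
# Topic-level configs: which replicas the caps apply to ("*" = all)
leader.replication.throttled.replicas=*
follower.replication.throttled.replicas=*
```

These are typically set dynamically while a partition reassignment is in flight and removed once it completes.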
Join Tim Berglund to discuss topics from his tutorial, Real-time data pipelines with Apache Kafka, or ask any other questions you have.
Couchbase and Kafka are both technologies that address high throughput, distributed data management challenges. In fact, they are often deployed together, each one solving a particular need. In this session, we’ll explore how Couchbase and Kafka complement each other and examine some real-world use case architectures.
Additionally, you’ll hear directly from the folks who built Kafka: Confluent.io. Since being open sourced, Apache Kafka has been widely adopted by organizations ranging from web companies like Uber, Netflix, and LinkedIn to more traditional enterprises like Cerner, Goldman Sachs, and Cisco. These companies use Kafka in a variety of ways: 1) as a pipeline for collecting high-volume log data to load into Hadoop, 2) as a means of collecting operational metrics to feed monitoring/alerting applications, 3) for low-latency messaging use cases, and 4) to power near real-time stream processing. In this talk, you will hear how companies are using Apache Kafka, learn how its unique architecture enables it to be used for both real-time processing and as a bus for feeding batch systems like Hadoop, and explore where it fits in the Big Data ecosystem.
Chris will outline the Apache Kafka platform, and how companies in a wide range of areas are using it to solve complex data integration challenges.
In Tim's talk, we’ll explore the basics of Kafka as a stream processing system, learning the core concepts of topic, producer, consumer, broker, and the streams API. We’ll look at how topics are partitioned among brokers and see the simple Java APIs for getting data in and out. But more than that, we’ll look at how we can extend this scalable messaging system into a streaming data processing system—one that offers significant advantages in scalability and deployment agility, while locating computation in your data pipeline in precisely the places it belongs: in your microservices and applications, and out of costly, high-density systems.
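To make the partitioning idea concrete, here is a simplified, self-contained sketch of how a record key maps to a partition. Kafka's default partitioner hashes the key bytes with murmur2; a standard-library hash stands in below purely for illustration, so the partition numbers it produces will not match Kafka's:

```java
import java.util.Arrays;

public class PartitionSketch {
    // Map key bytes to a partition in [0, numPartitions).
    // Kafka uses murmur2 here; Arrays.hashCode is a stand-in for illustration.
    static int partitionFor(byte[] keyBytes, int numPartitions) {
        int hash = Arrays.hashCode(keyBytes);
        // Mask the sign bit so the modulo result is never negative.
        return (hash & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        byte[] key = "user-42".getBytes();
        int p = partitionFor(key, 6);
        System.out.println("key 'user-42' -> partition " + p);
        // The same key always lands on the same partition:
        System.out.println(p == partitionFor(key, 6)); // prints true
    }
}
```

The important property is the one the test of determinism shows: all records with the same key land on the same partition, which is what gives Kafka per-key ordering.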
And of course, we’ll look at the Kafka Connect API and how we can use it to get data in and out of MongoDB easily and efficiently.
In the last year, multicluster and cross-data center deployments of Apache Kafka have become the norm rather than an exception.
The reasons are many and include:
- Different groups in the same company using Kafka in different ways
- Collecting information from many geographical regions and branches to a centralized analytics cluster
- Planning for cases where an entire cluster or data center is not available
- Using Kafka to assist in cloud migration
Gwen Shapira offers an overview of several use cases, including real-time analytics and payment processing, that may require multicluster solutions and discusses real-world examples with their specific requirements. Gwen outlines the pros and cons of several common architecture patterns, including:
- Multitenant Kafka clusters
- Active-active multiclusters
- Failover clusters
- Stretching a single cluster between multiple data centers
- Using Kafka to bridge between clouds or between on-premises and the cloud
Along the way, Gwen explores the features of Apache Kafka and demonstrates how to use this understanding of Kafka to choose the right architecture for use cases from the financial, retail, and media industries.
Streaming systems have begun to move past trivial implementations of distributed message queues into more comprehensive architectures that model changed data as a first-class citizen, rather than trying to hide changes behind update-in-place databases. This is not an uncommon way of modeling behavior inside a single application, but it is a new paradigm in the distributed systems thinking of most engineers. In this talk, we’ll consider four real-world streaming architectures based on Apache Kafka. We’ll look at the particular problems posed by the four different use cases, and how the systems solved them, with reference to features of Kafka, Kafka Streams, and, where appropriate, open-source extensions like Kafka Connect and schema features.
Many developers have already wrapped their minds around the basic architecture and APIs of Kafka as a message queue and a streaming platform. But can they keep it running in production? This talk contains real-world troubleshooting and optimization scenarios culled from the logs of Confluent technical support. We’ll talk about the trade-offs between optimizing for the always-desirable outcomes of throughput, latency, durability, and availability. How many partitions should you use for a given topic? How much message batching should you configure in the producer? How many replicas should be required to acknowledge a write? What do you do when you see a partition growing inexplicably? When should you rearchitect your application to use the streaming API? We’ll answer these questions and more in this overview of common Kafka production issues.
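As a flavor of the knobs involved in these trade-offs, a producer tuned for durability over raw throughput might use settings along these lines; the values are illustrative starting points, not recommendations:

```
# Producer: require acknowledgement from all in-sync replicas
acks=all
# Producer: trade a little latency for better batching
linger.ms=20
batch.size=65536
compression.type=snappy
# Topic/broker: a write needs at least this many in-sync replicas
min.insync.replicas=2
```

Each of these moves one of the four dials (throughput, latency, durability, availability) at the expense of another, which is exactly the tension the talk explores.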
Big Data and Machine Learning are key for innovation in many industries today. The first part of this session explains how to build analytic models with R, Python or Scala leveraging open source machine learning / deep learning frameworks like Apache Spark, TensorFlow or H2O.ai. The second part discusses the deployment of these built analytic models to your own applications or microservices; leveraging the Apache Kafka cluster and Kafka Streams instead of setting up a new, complex stream processing cluster. The session focuses on live demos and teaches lessons learned for executing analytic models in a highly scalable and performant way. The last part explains how Apache Kafka can help to move from a manual build and deployment of analytic models to continuous online model improvement in real time.
Apache Kafka’s rise in popularity as a streaming platform has demanded a revisit of its traditional at-least-once message delivery semantics. In this talk, we present the recent additions to Apache Kafka that achieve exactly-once semantics. We shall discuss the newly introduced transactional APIs and use Kafka Streams as an example to show how these APIs are leveraged for streams tasks.
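At the configuration level, the exactly-once machinery shows up as a handful of client settings; the transactional id below is a hypothetical example:

```
# Producer: enable idempotent, transactional writes
enable.idempotence=true
transactional.id=payments-processor-1
# Consumer: only read messages from committed transactions
isolation.level=read_committed
# Kafka Streams: one switch enables end-to-end exactly-once processing
processing.guarantee=exactly_once
```

Under the covers, the transactional APIs tie a batch of writes (and consumer offset commits) together so they become visible atomically or not at all.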
Data integration is a really difficult problem. We know this because 80% of the time in every project is spent getting the data you want the way you want it. We know this because this problem remains challenging despite 40 years of attempts to solve it. All we want is a service that will be reliable, handle all kinds of data and integrate with all kinds of systems, be easy to manage and scale as our systems grow. Oh, and it should be super low latency too. Is it too much to ask?
In this presentation, we’ll discuss the basic challenges of data integration and introduce a few design and architecture patterns that are used to tackle these challenges. We will then explore how these patterns can be implemented using Apache Kafka. Difficult problems are difficult and we offer no silver bullets, but we will share pragmatic solutions that have helped many organizations build fast, scalable, and manageable data pipelines.
Let's talk some SMACK! Join us for a fantastic evening with some of the experts of the SMACK stack technologies. What is the SMACK stack and why is it important?
In today’s always-connected economy, businesses need to provide real-time services to customers that utilize vast amounts of data. Successful businesses are changing how they build applications—from monolithic architectures to cloud native architectures: distributed systems of microservices, containers, and data services.
The SMACK stack is a common combination of technologies for building data-rich applications. It is composed of Apache Spark (batch/stream processing), Apache Mesos via Mesosphere (cluster manager), Akka (JVM-based actor framework), Apache Cassandra (storage layer), and Apache Kafka (message queue).
For this meetup, we'll be doing things a little differently and running it like a live reddit AMA (ask me anything). We'll have a panel of experts on the technologies in the SMACK stack to answer any and all questions you may have, so please come prepared with some good questions!
In a recent survey, 54% of respondents said that a streaming platform enables more accurate and/or faster decision making for their business. With stream processing built in, Apache Kafka has emerged as the leading streaming platform to quickly and easily deliver data to every corner of your business. In this session, learn how the Streams API in Apache Kafka allows you to develop next-generation applications with proven reliability, scalability, and low latency, enabling you to deliver on the promise of real-time analytics.
Typically when we build service-based apps, microservices, SOA, and the like, we use REST or some RPC framework. But building such applications becomes tricky as they get larger, more complex, and share more data. We can trace this trickiness back to a dichotomy that underlies the way systems interact: data systems are designed to expose data, to make it freely accessible. But services instead focus on encapsulation, restricting the data each service exposes. These two forces inevitably compete as such systems evolve.
This talk will look at a different approach: one where a distributed log holds the data shared between services. Stateful stream processors are then embedded right in each service, providing facilities for joining and reacting to the shared streams. The result is a very different way to architect and build service-based applications, but one with some unique benefits as we scale.
Modern businesses have data at their core, and this data is changing continuously. How can we harness this torrent of information in real time? The answer is stream processing, and the technology that has become the core platform for streaming data is Apache Kafka. Among the thousands of companies that use Kafka to transform and reshape their industries are the likes of Netflix, Uber, PayPal, and Airbnb, but also established players such as Goldman Sachs, Cisco, and Oracle.
Unfortunately, today’s common architectures for real-time data processing at scale suffer from complexity: there are many technologies that need to be stitched and operated together, and each individual technology is often complex by itself. This has led to a strong discrepancy between how we, as engineers, would like to work vs. how we actually end up working in practice.
In this session, we talk about how Apache Kafka helps you to radically simplify your data architectures. We cover how you can now build normal applications to serve your real-time processing needs — rather than building clusters or similar special-purpose infrastructure — and still benefit from properties such as high scalability, distributed computing, and fault-tolerance, which are typically associated exclusively with cluster technologies. We discuss common use cases to realize that stream processing in practice often requires database-like functionality, and how Kafka allows you to bridge the worlds of streams and databases when implementing your own core business applications (inventory management for large retailers, patient monitoring in healthcare, fleet tracking in logistics, etc), for example in the form of event-driven, containerized microservices.
In this talk we'll examine how stateful stream processing can be used to build event-driven services, using a distributed log like Apache Kafka. In doing so, the dichotomy between data and services is balanced with an architecture that exhibits demonstrably better scaling properties, whether the growth is in complexity, team size, data volume, or velocity.
In the financial industry, losing data is unacceptable. Financial firms are adopting Kafka in their critical applications to provide the low latency, high throughput, high availability, and scale that these applications require. But can it also provide complete reliability? As a system architect, when asked, “Can you guarantee that we will always get every transaction?” you want to be able to say “Yes” with total confidence.
Gwen Shapira and Jeff Holoman walk you through everything that happens to a message, from producer to consumer, and pinpoint all the places where data can be lost if you’re not careful. You’ll learn how developers and operation teams can work together to build a bulletproof data pipeline with Kafka—and how to prove that the system you’ve built is reliable.
Join DataStax, Mesosphere and Confluent for Data 'n' Drinks in Chicago at Ace Bounce and learn about relevant real-world use cases and best practices for how you can leverage the always-on data platform, DataStax Enterprise (DSE).
You will also hear how you can use the SMACK Stack to build scalable, always-on, real time, intelligent applications on a data layer that lives seamlessly in any public cloud, on prem or hybrid. This event will address next generation architecture that will power future cloud applications. Network with others who have delivered a flexible, scalable and intelligent personalized experience to all users in real time.
For some, Kafka is simply a conduit for low-latency analytics. But stream processing is an increasingly popular tool for building transactional systems that run business logic for banks, telcos, consumer companies, and more.
This talk walks through how stateful stream processing can be used as a backbone to share state between services, breaking the chains that tie typical request-driven architectures. We’ll look at the benefits of using a message bus that can retain state, then layer stream processing tools on top to balance the tension between consistency and the independence services need to iterate and get things done.
Learn how the three realities of modern programming – the explosion of data and data systems, building business processes as microservices instead of monolithic applications, and the rise of the public cloud – affect how developers and companies operate today, and why companies across all industries are turning to streaming data and Apache Kafka for mission-critical applications.
In the last year, multicluster and cross-data center deployments of Apache Kafka have become the norm rather than an exception. The reasons are many and include:
- Different groups in the same company using Kafka in different ways
- Collecting information from many geographical regions and branches to a centralized analytics cluster
- Planning for cases where an entire cluster or data center is not available
- Using Kafka to assist in cloud migration

Robin Moffatt offers an overview of several use cases, including real-time analytics and payment processing, that may require multicluster solutions and discusses real-world examples with their specific requirements. Robin outlines the pros and cons of several common architecture patterns, including:
- Multitenant Kafka clusters
- Active-active multiclusters
- Failover clusters
- Stretching a single cluster between multiple data centers
- Using Kafka to bridge between clouds or between on-premises and the cloud

Along the way, Robin explores the features of Apache Kafka and demonstrates how to use this understanding of Kafka to choose the right architecture for use cases from the financial, retail, and media industries.
In the last few years, Apache Kafka has emerged as a streaming platform used extensively in enterprises for real-time data collection, delivery, and processing. This talk will provide a deep dive into some of the key internals that help make Kafka popular and provide strong reliability guarantees. Companies like LinkedIn are now sending more than 1 trillion messages per day to Kafka. Learn about the underlying design in Kafka that leads to such high throughput. Many companies (e.g., financial institutions) are now storing mission-critical data in Kafka. Learn how Kafka supports high reliability through its built-in replication mechanism. One common use case of Kafka is propagating updatable database records. Learn how a unique feature called compaction in Apache Kafka is designed to solve this kind of problem more naturally.
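To illustrate the compaction idea, here is a small, self-contained sketch — plain Java, independent of Kafka's actual implementation — of the retention rule compaction applies: for each key, only the most recent value survives, and a null value (a "tombstone") deletes the key:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CompactionSketch {
    // Given a log of (key, value) records in offset order, return what a
    // compacted log retains: the latest value per key, with tombstoned
    // keys removed entirely.
    static Map<String, String> compact(List<String[]> log) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String[] record : log) {
            String key = record[0], value = record[1];
            if (value == null) {
                latest.remove(key);      // tombstone: delete the key
            } else {
                latest.put(key, value);  // a later value supersedes earlier ones
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        List<String[]> log = List.of(
            new String[]{"user1", "addr=A"},
            new String[]{"user2", "addr=B"},
            new String[]{"user1", "addr=C"},  // supersedes addr=A
            new String[]{"user2", null}       // tombstone deletes user2
        );
        System.out.println(compact(log)); // prints {user1=addr=C}
    }
}
```

This is why compaction suits propagating updatable database records: the topic converges to a snapshot of the latest state per key, rather than growing without bound.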
The number of deployments of Apache Kafka at enterprise scale has greatly increased in the years since Kafka’s original development in 2010. Along with this rapid growth has come a wide variety of use cases and deployment strategies that transcend what Kafka’s creators imagined when they originally developed the technology. As the scope and reach of streaming data platforms based on Apache Kafka has grown, the need to understand monitoring and troubleshooting strategies has as well. Topics include: - Effective use of JMX for Kafka - Tools for preventing small problems from becoming big ones - Efficient architectures proven in the wild - Finding and storing the right information when it all goes wrong
Hailing from the Persian city of Ephesus in around 500 BC, the Greek philosopher Heraclitus is famous for his trenchant analysis of big data stream processing systems, saying “You never step into the same river twice.” Central to his philosophy was the idea that all things change constantly. His close readers also know him as the Weeping Philosopher—perhaps because dealing with constantly changing data at low latency is actually pretty hard. It doesn’t need to be that way. Almost as famous as Heraclitus is Apache Kafka, the de facto standard open-source distributed stream processing system. Many of us know Kafka’s architectural and API particulars as well as we know the philosophy of Heraclitus, but that doesn’t mean we know how the most successful deployments of Kafka work. In this talk, I’ll present several real-world systems built on Kafka, not just as a giant message queue, but as a platform for distributed stream computation. The talk will include a brief summary of Kafka architecture and (probably Java) APIs, then a detailed description of several architectures drawn from live customer deployments. The role of stream processing will be featured in each, with attention given to what computation gets done in the stream, how Kafka fills the role of persistence rather than merely a message queue, and what other persistence and computational technologies are present in the system.