Get Ready! Field Notes on Converting Daily Batch Jobs to a Real-Time Architecture

Current 2022

Are you considering converting your daily batch ETLs into a new and exhilarating realtime framework? We’ll help you look before you leap as we take a deep dive into the unique operational challenges entailed in transitioning data processing paradigms.

Because batch data pipelines consume data from well-defined time intervals and write results to partitioned data storage, batch jobs are often idempotent, so failure recovery is simply a matter of rerunning the faulty job instances. Batch processes are triggered at a fixed frequency (e.g. daily or hourly), so data latency is determined by both the job scheduler and the job run time. As a result, many advanced data use cases, such as frequency capping, require event streaming to enable real-time data insights. Event streaming applications process unbounded input data in real time and append output to message queues and/or tables for further processing. However, real-time data insights are no free lunch: event streaming comes with many unique engineering challenges, such as handling late-arriving and duplicate events, implementing event-time partitioning, and backfilling historical data after failures. In addition, batch processing and event streaming are not incompatible with each other and often work better together, as the Delta and Kappa Architectures are commonly adopted in modern data systems.
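To make the contrast concrete, here is a minimal Python sketch, not the speakers' implementation. The names `run_batch`, `StreamSink`, and `allowed_lateness`, along with the event shape (`id`, `ts`), are hypothetical and chosen only for illustration. The batch function overwrites a whole day partition, so rerunning it after a failure is safe; the streaming sink appends one event at a time and therefore has to deal with duplicates, late arrivals, and event-time partitioning itself.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def run_batch(store, events, day):
    """Recompute one day's partition from scratch and overwrite it.

    Overwriting (rather than appending) is what makes a rerun idempotent.
    """
    ids = {e["id"] for e in events if e["ts"].date().isoformat() == day}
    store[day] = sorted(ids)


class StreamSink:
    """Hypothetical append-only sink with dedup and a simple watermark."""

    def __init__(self, allowed_lateness):
        self.allowed_lateness = allowed_lateness
        self.max_event_time = None            # highest event time seen so far
        self.seen = set()                     # unbounded here; real systems expire this state
        self.partitions = defaultdict(list)   # keyed by event-time date
        self.dropped = []                     # events that arrived too late

    def process(self, event):
        ts = event["ts"]
        if self.max_event_time is None or ts > self.max_event_time:
            self.max_event_time = ts
        watermark = self.max_event_time - self.allowed_lateness

        if event["id"] in self.seen:
            return "duplicate"
        if ts < watermark:
            # Too late to accept; a backfill job would have to pick it up.
            self.dropped.append(event)
            return "too_late"

        self.seen.add(event["id"])
        # Partition by event time, not by arrival time.
        self.partitions[ts.date().isoformat()].append(event["id"])
        return "accepted"
```

In this sketch a late event that still falls within `allowed_lateness` lands in the partition of the day it occurred, while anything older than the watermark is set aside, which is one reason streaming pipelines often keep a batch backfill path alongside them.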

In this session, we will demystify the operational complexity of event streaming in the real data engineering world and share best practices learned from developing and maintaining web-scale data systems at Netflix. After attending the session, you will have a comprehensive understanding of the trade-offs between batch data processing and event streaming and be able to make better data system design decisions for your business or research use cases.

Speaker

Valerie Burchby

Netflix

Speaker

Xinran Waibel

Netflix

Xinran Waibel (She/Her) is a Senior Data Engineer on the Personalization Data Engineering team at Netflix. She is also the founder of Data Engineer Things on Medium. Check out her blog posts: medium.com/data-engineer-things

