Windowing and Join Operations on Streaming Data with Apache Spark on Databricks

Course Features

Duration: 122 minutes
Delivery Method: Online
Available on: Downloadable Courses
Accessibility: Mobile, Desktop, Laptop
Language: English
Subtitles: English
Level: Beginner
Teaching Type: Self Paced
Video Content: 122 minutes

Course Description

Apache Spark Structured Streaming treats real-time data as a table that is continuously appended to. This model shifts the burden of stream processing from the user to Spark, making it straightforward to process streaming data. Spark supports a variety of windowing and join operations on streaming datasets, using either processing time or event time. This course, Windowing and Join Operations on Streaming Data with Apache Spark on Databricks, first explains the difference between stateless operations, which operate on a single streaming entity, and stateful operations, which operate on multiple entities accumulated in a stream. Next, you will explore the types of windows that Apache Spark supports, including tumbling windows, sliding windows, and global windows. You will then learn the differences between event time, ingestion time, and processing time, and see how windowing operations can be performed using both processing time and event time. Along the way, you will connect to an HDInsight Kafka cluster to read records for your input stream. You will use watermarking to handle late-arriving data and to limit the state that Apache Spark stores. Finally, you will explore Spark's support for static-stream and stream-stream joins, and see how to connect to Azure Event Hubs to read records.
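To make the difference between tumbling and sliding windows concrete, here is a minimal plain-Python sketch of how a single event timestamp could be assigned to windows. It is not Spark code and not taken from the course: the helper names `tumbling_window` and `sliding_windows` are hypothetical, and the sketch assumes windows are aligned to the Unix epoch and are half-open, [start, end). In Spark itself, the comparable grouping is expressed with the `window` function from `pyspark.sql.functions`.

```python
from datetime import datetime, timedelta

# Assumption: windows are aligned to the Unix epoch.
EPOCH = datetime(1970, 1, 1)


def tumbling_window(event_time, size):
    """Return the single (start, end) window of length `size` containing event_time.

    Tumbling windows do not overlap, so every event belongs to exactly one.
    """
    offset = (event_time - EPOCH) % size  # how far the event is into its window
    start = event_time - offset
    return (start, start + size)


def sliding_windows(event_time, size, slide):
    """Return every (start, end) window containing event_time.

    Windows have length `size` and a new one starts every `slide`, so when
    slide < size an event can belong to more than one window.
    """
    windows = []
    # Latest window start that is at or before the event.
    start = event_time - ((event_time - EPOCH) % slide)
    # Walk backwards while the window [start, start + size) still covers the event.
    while start + size > event_time:
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)
```

With a 10-minute window, an event at 12:07 lands in the single tumbling window 12:00 to 12:10. With a 10-minute window sliding every 5 minutes, the same event lands in two windows: 12:00 to 12:10 and 12:05 to 12:15.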

Course Overview

International Faculty

Post Course Interactions

Hands-On Training, Instructor-Moderated Discussions

Skills You Will Gain

What You Will Learn

You will learn the difference between stateless operations that operate on a single streaming entity and stateful operations that operate on multiple entities accumulated in a stream

Then, you will explore the different kinds of windows supported by Apache Spark, which include tumbling windows, sliding windows, and global windows

Next, you will understand the differences between event time, ingestion time, and processing time and see how you can perform windowing operations using both processing time and event time

Along the way, you will connect to an HDInsight Kafka cluster to read records for your input stream

You will then use watermarking to deal with late-arriving data and see how you can use watermarks to limit the state that Apache Spark stores

Finally, you will perform join operations using streams and explore the types of joins that Spark supports for static-stream joins and stream-stream joins

You will also see how you can connect to Azure Event Hubs to read records

When you are finished with this course, you will have the skills and knowledge of windowing and join operations needed to identify when these powerful transformations should be performed and how they are performed
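As a rough illustration of the watermarking item in the list above, the following plain-Python sketch counts events per tumbling window and uses a watermark both to drop data that arrives too late and to evict finished windows from state. It is a simplified model, not Spark's implementation and not material from the course: the function name `run_batches`, the integer timestamps, and the rule "watermark = largest event time seen so far minus the allowed delay, updated after each batch" are assumptions made for the example. In Spark, the delay threshold is declared with `DataFrame.withWatermark`.

```python
def run_batches(batches, window_size, delay):
    """Count events per tumbling window, using a watermark to bound state.

    batches: list of batches, each a list of integer event times (seconds).
    Returns (finalized, open_state, dropped).
    """
    state = {}        # window start -> running count (the state that must be kept)
    finalized = {}    # windows emitted and evicted once the watermark passed them
    dropped = []      # events whose window had already closed when they arrived
    max_event_time = None
    watermark = None  # undefined until a batch containing events has been processed

    for batch in batches:
        for t in batch:
            start = t - (t % window_size)
            # Too late: the watermark has already passed the end of this window.
            if watermark is not None and start + window_size <= watermark:
                dropped.append(t)
                continue
            state[start] = state.get(start, 0) + 1
            if max_event_time is None or t > max_event_time:
                max_event_time = t

        # Assumption: the watermark only advances between batches.
        if max_event_time is not None:
            watermark = max_event_time - delay

        # Evict every window the watermark has passed, so state stays bounded.
        for start in sorted(state):
            if start + window_size <= watermark:
                finalized[start] = state.pop(start)

    return finalized, state, dropped
```

For example, `run_batches([[1, 4, 12], [3, 14, 27], [8, 31]], window_size=10, delay=5)` still counts the event at time 3, which arrives out of order but before the watermark passes the end of its window, while the event at time 8 arrives after that window has been finalized and is dropped.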
