Windowing and Join Operations on Streaming Data with Apache Spark on Databricks

Artificial Intelligence & Data Science

Hands On Training

via

Pluralsight

Course Features

Duration

122 minutes

Delivery Method

Online

Available on

Downloadable Courses

Accessibility

Mobile, Desktop, Laptop

Language

English

Subtitles

English

Level

Beginner

Teaching Type

Self Paced

Video Content

122 minutes

Course Description

Apache Spark Structured Streaming treats real-time data as a table that is constantly being appended to. This model shifts the burden of stream processing from the user to Spark, making it easy and intuitive to process streaming data. Spark supports a variety of windowing and join operations on streaming data, using either processing time or event time. In this course, Windowing and Join Operations on Streaming Data with Apache Spark on Databricks, you will first learn the difference between stateless operations, which operate on a single streaming entity, and stateful operations, which operate on multiple entities accumulated in a stream. Next, you will explore the types of windows that Spark supports, including tumbling, sliding, and global windows. You will then learn the differences between event time, ingestion time, and processing time, and see how windowing operations can be performed using both processing time and event time. Along the way, you will connect to an HDInsight Kafka cluster to read records for your input stream. You will then use watermarking to handle late-arriving data and see how watermarks limit the state that Spark stores. Finally, you will explore Spark's support for stream-stream and static-stream joins, and see how to connect to Azure Event Hubs to read records.
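To make the difference between tumbling and sliding windows concrete, here is a minimal plain-Python sketch of how an event timestamp could be assigned to windows. It is not Spark's API or implementation: the function names `tumbling_window` and `sliding_windows` are hypothetical, and aligning window boundaries to the Unix epoch is an assumption of this sketch.

```python
from datetime import datetime, timedelta

# Assumption of this sketch: window boundaries are aligned to the Unix epoch.
EPOCH = datetime(1970, 1, 1)


def tumbling_window(event_time, size):
    """Return the single [start, end) window of length `size` containing event_time."""
    start = event_time - ((event_time - EPOCH) % size)
    return (start, start + size)


def sliding_windows(event_time, size, slide):
    """Return every [start, end) window of length `size`, started every `slide`,
    that contains event_time. Windows overlap whenever slide < size."""
    # Latest window start that is not after the event.
    start = event_time - ((event_time - EPOCH) % slide)
    windows = []
    # Walk backwards while the window still covers the event.
    while start + size > event_time:
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)
```

With a 10-minute window, an event falls into exactly one tumbling window; with a 10-minute window sliding every 5 minutes, the same event falls into two overlapping windows, so it contributes to two aggregates.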

Course Overview

International Faculty

Post Course Interactions

Hands-On Training, Instructor-Moderated Discussions

Skills You Will Gain

What You Will Learn

You will learn the difference between stateless operations that operate on a single streaming entity and stateful operations that operate on multiple entities accumulated in a stream

Then, you will explore the different kinds of windows supported by Apache Spark, which include tumbling windows, sliding windows, and global windows

Next, you will understand the differences between event time, ingestion time, and processing time and see how you can perform windowing operations using both processing time and event time

Along the way, you will connect to an HDInsight Kafka cluster to read records for your input stream

You will then use watermarking to deal with late-arriving data and see how you can use watermarks to limit the state that Apache Spark stores

Finally, you will perform join operations using streams and explore the types of joins that Spark supports for static-stream joins and stream-stream joins

You will also see how you can connect to Azure Event Hubs to read records

When you are finished with this course, you will have the skills and knowledge of windowing and join operations needed to identify when these powerful transformations should be performed and how they are performed
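The watermarking idea in the list above can be sketched in plain Python. This is a simplified model and not Spark's implementation: the class `WatermarkedWindowCounter` and its fields are hypothetical, and the rule used here (watermark = latest event time seen so far minus an allowed lateness, applied to the next batch) is an assumption loosely modeled on how watermarks are usually described.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption of this sketch: window boundaries are aligned to the Unix epoch.
EPOCH = datetime(1970, 1, 1)


class WatermarkedWindowCounter:
    """Counts events per tumbling window and uses a watermark to bound state."""

    def __init__(self, window_size, allowed_lateness):
        self.window_size = window_size
        self.allowed_lateness = allowed_lateness
        self.max_event_time = None           # latest event time seen so far
        self.open_counts = defaultdict(int)  # window start -> count (kept state)
        self.finalized = {}                  # windows evicted from state
        self.dropped = 0                     # events that arrived too late

    def watermark(self):
        """Latest event time seen minus the allowed lateness, or None if no data yet."""
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.allowed_lateness

    def process_batch(self, event_times):
        # Events are checked against the watermark from earlier batches.
        watermark = self.watermark()
        for event_time in event_times:
            start = event_time - ((event_time - EPOCH) % self.window_size)
            if watermark is not None and start + self.window_size <= watermark:
                self.dropped += 1  # window already closed: too late to count
                continue
            self.open_counts[start] += 1
            if self.max_event_time is None or event_time > self.max_event_time:
                self.max_event_time = event_time

        # Advance the watermark, then evict windows that ended at or before it.
        watermark = self.watermark()
        if watermark is not None:
            expired = [s for s in self.open_counts
                       if s + self.window_size <= watermark]
            for s in expired:
                self.finalized[s] = self.open_counts.pop(s)
```

In this model, state stays bounded because only windows that can still receive on-time data are kept in `open_counts`; anything older than the watermark is finalized, and later records for those windows are dropped rather than reopening them.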

Course Content

Module 1: Performing Windowing Operations on Data

Module 2: Exploring Aggregations Using Watermarks

Module 3: Performing Join Operations on Data
