Windowing and Join Operations on Streaming Data with Apache Spark on Databricks
Course Features
Duration
122 minutes
Delivery Method
Online
Available on
Downloadable Courses
Accessibility
Mobile, Desktop, Laptop
Language
English
Subtitles
English
Level
Beginner
Teaching Type
Self Paced
Video Content
122 minutes
Course Description
Course Overview
International Faculty
Post Course Interactions
Hands-On Training, Instructor-Moderated Discussions
Skills You Will Gain
What You Will Learn
You will learn the difference between stateless operations, which operate on a single streaming entity, and stateful operations, which operate on multiple entities accumulated in a stream
Then, you will explore the different kinds of windows supported by Apache Spark, which include tumbling windows, sliding windows, and global windows
Next, you will understand the differences between event time, ingestion time, and processing time and see how you can perform windowing operations using both processing time and event time
You will then use watermarking to deal with late-arriving data and see how you can use watermarks to limit the state that Apache Spark stores
Along the way, you will connect to an HDInsight Kafka cluster to read records for your input stream
You will also see how you can connect to Azure Event Hubs to read records
Finally, you will perform join operations using streams and explore the types of joins that Spark supports for static-stream joins and stream-stream joins
When you are finished with this course, you will have the skills and knowledge of windowing and join operations needed to identify when these transformations should be applied and how to perform them
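As a rough illustration of the window types and watermarking named in the outline above, the following plain-Python sketch shows how an event timestamp could be assigned to tumbling and sliding windows, and how a watermark could mark a window as closed. It is not Spark code and not the course's material: the function names (`tumbling_window`, `sliding_windows`, `window_is_closed`) are hypothetical, and the closing rule is a simplification assumed here for clarity.

```python
from datetime import datetime, timedelta

# Assumption: windows are aligned to the Unix epoch.
EPOCH = datetime(1970, 1, 1)


def tumbling_window(event_time, size):
    """Return the single [start, end) window of length `size` containing event_time."""
    start = event_time - ((event_time - EPOCH) % size)
    return (start, start + size)


def sliding_windows(event_time, size, slide):
    """Return every [start, end) window of length `size`, starting every `slide`,
    that contains event_time. Overlapping windows mean one event can land in several."""
    start = event_time - ((event_time - EPOCH) % slide)  # latest window start <= event_time
    windows = []
    while start > event_time - size:  # window still covers the event
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)


def window_is_closed(window, max_event_time_seen, delay):
    """Simplified watermark rule (an assumption for this sketch):
    watermark = latest event time seen - allowed delay, and a window whose end
    is at or before the watermark no longer needs its state kept."""
    watermark = max_event_time_seen - delay
    return window[1] <= watermark


event = datetime(2024, 1, 1, 12, 7, 30)
ten_min, five_min = timedelta(minutes=10), timedelta(minutes=5)

print(tumbling_window(event, ten_min))            # one window
print(sliding_windows(event, ten_min, five_min))  # two overlapping windows
```

With a 10-minute window sliding every 5 minutes, each event falls into two windows, which is why sliding windows tend to hold more state than tumbling ones. The watermark check shows how bounding lateness lets old window state be discarded.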