Getting Started with Apache Spark on Databricks
Course Features
Duration
112 minutes
Delivery Method
Online
Available on
Downloadable Courses
Accessibility
Desktop, Laptop
Language
English
Subtitles
English
Level
Beginner
Teaching Type
Self Paced
Video Content
112 minutes
Course Description
Course Overview
International Faculty
Post Course Interactions: Hands-On Training, Instructor-Moderated Discussions
Skills You Will Gain
What You Will Learn
You will learn the components of the Apache Spark analytics engine, which allows you to process batch as well as streaming data using a unified API.
First, you will learn how the Spark architecture is configured for big data processing; you will then see how the Databricks Runtime on Azure makes it very easy to work with Apache Spark on the Azure Cloud Platform, and you will explore the basic concepts involved.
Next, you will learn the workings and nuances of Resilient Distributed Datasets, also known as RDDs, the core data structure used for big data processing in Apache Spark (see the RDD sketch after this list).
You will see that RDDs are the data structures on top of which Spark DataFrames are built.
You will study the two types of operations that can be performed on DataFrames, namely transformations and actions, and understand the difference between them (a short sketch contrasting the two follows this list).
Along the way, you will learn how you can read data from an external source such as Azure Cloud Storage and how you can use built-in functions in Apache Spark to transform your data (see the read-and-transform sketch below).
You’ll also learn how Databricks allows you to explore and visualize your data using the display() function, which leverages native Python libraries for visualizations (see the display() example below).
Finally, you will get hands-on experience with big data processing operations such as projection, filtering, and aggregation.
When you are finished with this course, you will have the skills and ability to work with basic transformations, visualizations, and aggregations using Apache Spark on Azure Databricks.
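Since RDDs come up in several of the points above, here is a minimal sketch of working with an RDD directly; the numbers are arbitrary, and in a Databricks notebook the spark session created below is already provided for you.

# A minimal RDD sketch: parallelize a local collection, apply a transformation, collect the result.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already defined as `spark` in a Databricks notebook

rdd = spark.sparkContext.parallelize([1, 2, 3, 4, 5])   # distribute a local list as an RDD
squared = rdd.map(lambda x: x * x)                       # transformation: lazily describes the work
print(squared.collect())                                 # action: runs the job, prints [1, 4, 9, 16, 25]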
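To make the transformation-versus-action distinction concrete, the sketch below builds a tiny DataFrame from made-up rows: transformations such as filter and select only add to the query plan, while actions such as count and show trigger the actual computation. The rdd attribute at the end also shows that a DataFrame is built on top of an RDD.

# A small sketch contrasting lazy transformations with actions (illustrative data only).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("Alice", 34), ("Bob", 45), ("Carol", 29)],
    ["name", "age"],
)

# Transformations are lazy: no data is processed yet, Spark only builds a plan.
adults = df.filter(df.age > 30).select("name")

# Actions trigger execution on the cluster.
print(adults.count())   # runs the job and returns 2
adults.show()           # runs the job and prints the remaining rows

# A DataFrame is backed by an RDD of Row objects.
print(type(df.rdd))     # <class 'pyspark.rdd.RDD'>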
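As a rough illustration of the read-and-transform workflow, this sketch reads a CSV file from Azure storage and then applies a projection, a filter, and an aggregation using Spark's built-in functions. The abfss URI, storage account, container, file path, and column names are all placeholders, and the snippet assumes the cluster is already configured with access to the storage account.

# A sketch under stated assumptions: read a CSV from Azure storage, then project, filter, and aggregate.
# The storage account, container, path, and column names below are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

sales = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("abfss://sales-data@examplestorageacct.dfs.core.windows.net/raw/sales.csv")
)

# Built-in functions transform columns without any user-defined code.
cleaned = sales.withColumn("city", F.upper(F.col("city")))

# Projection (select), filtering (filter), and aggregation (groupBy/agg).
summary = (
    cleaned
    .select("city", "amount")
    .filter(F.col("amount") > 0)
    .groupBy("city")
    .agg(F.sum("amount").alias("total_amount"))
)

summary.show()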
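The display() helper is part of the Databricks notebook environment rather than open-source PySpark, so the call below only works inside a Databricks notebook; summary here refers to the aggregated DataFrame from the previous sketch.

# In a Databricks notebook, display() renders a DataFrame as an interactive table
# and exposes built-in charting options backed by Python visualization libraries.
display(summary)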