Getting Started with Apache Spark on Databricks

Course Features

Duration: 112 minutes
Delivery Method: Online
Available on: Downloadable Courses
Accessibility: Desktop, Laptop
Language: English
Subtitles: English
Level: Beginner
Teaching Type: Self Paced
Video Content: 112 minutes

Course Description

Azure Databricks lets you process and query large volumes of data using Apache Spark's unified analytics engine. It allows you to set up an Apache Spark environment in minutes, autoscale your processing, and collaborate with other users in an interactive workspace.

You will first learn about the Spark architecture for big data processing. Next, you'll learn how the Databricks Runtime makes it easy to use Apache Spark on the Azure Cloud Platform. You'll then study the workings of Resilient Distributed Datasets (RDDs), the core data structure used for big data processing in Apache Spark, and see that Spark DataFrames are built on top of RDDs. The two types of operations you can perform on DataFrames, namely transformations and actions, will be covered, along with how Databricks lets you explore and visualize your data using the display() function, which leverages native Python libraries for visualizations. Finally, you will get hands-on experience with big data processing operations such as projection, filtering, and aggregation, and you will learn how to read data from external sources such as Azure Cloud Storage and how to use built-in functions in Apache Spark to transform your data.
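
As a taste of the DataFrame concepts described above, here is a minimal sketch, not taken from the course materials, assuming a Databricks notebook where the SparkSession (spark) and the display() helper are already available; the sample data and column names are invented for illustration.

```python
# Minimal sketch of DataFrame transformations vs. actions, assuming a
# Databricks notebook where `spark` (the SparkSession) and display()
# are already provided; the sample data is invented.
data = [("alice", 34, "us"), ("bob", 29, "uk"), ("carol", 41, "us")]
df = spark.createDataFrame(data, ["name", "age", "country"])

# Transformations such as filter() and select() are lazy: they only
# build a logical plan and return a new DataFrame.
adults = df.filter(df.age > 30).select("name", "age")

# Actions such as count() and show() trigger execution on the cluster.
print(adults.count())
adults.show()

# In a Databricks notebook, display() renders the result as an
# interactive table with built-in charting options.
display(adults)
```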

Course Overview

International Faculty

Post Course Interactions

Hands-On Training, Instructor-Moderated Discussions

What You Will Learn

You will learn the components of the Apache Spark analytics engine, which allows you to process batch as well as streaming data using a unified API

First, you will learn how the Spark architecture is configured for big data processing; you will then see how the Databricks Runtime on Azure makes it easy to work with Apache Spark on the Azure Cloud Platform and explore its basic concepts

Next, you will learn the workings and nuances of Resilient Distributed Datasets, also known as RDDs, which are the core data structure used for big data processing in Apache Spark

You will see that RDDs are the data structure on top of which Spark DataFrames are built

You will study the two types of operations that can be performed on DataFrames, namely transformations and actions, and understand the difference between them

You'll also learn how Databricks allows you to explore and visualize your data using the display() function, which leverages native Python libraries for visualizations

Finally, you will get hands-on experience with big data processing operations such as projection, filtering, and aggregation (see the sketches after this list)

Along the way, you will learn how to read data from an external source such as Azure Cloud Storage and how to use built-in functions in Apache Spark to transform your data

When you are finished with this course, you will have the skills and ability to work with basic transformations, visualizations, and aggregations using Apache Spark on Azure Databricks
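
To make the RDD outcomes above more concrete, here is a small illustrative sketch, again assuming a Databricks notebook with spark available; the values are placeholders, not course data.

```python
# Illustrative RDD sketch, assuming `spark` is available as in a
# Databricks notebook. Values are placeholders.
sc = spark.sparkContext

# Create an RDD from a local Python collection and apply lazy
# transformations (map, filter); collect() is the action that runs them.
numbers = sc.parallelize([1, 2, 3, 4, 5])
big_squares = numbers.map(lambda x: x * x).filter(lambda x: x > 4)
print(big_squares.collect())  # [9, 16, 25]

# DataFrames are built on top of RDDs: you can build one from an RDD of
# tuples, and reach the underlying RDD of Row objects via .rdd.
people_rdd = sc.parallelize([("alice", 34), ("bob", 29)])
people_df = spark.createDataFrame(people_rdd, ["name", "age"])
print(people_df.rdd.take(1))  # e.g. [Row(name='alice', age=34)]
```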
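
The next sketch illustrates the external-source, projection, filtering, and aggregation outcomes. The storage account, container, column names, and file path are hypothetical placeholders, and it assumes access to the storage location is already configured (for example via a mount or storage credentials).

```python
from pyspark.sql import functions as F

# Hypothetical ABFSS path: the account, container, and file names are
# placeholders, and storage access is assumed to be configured already.
path = "abfss://data@examplestorageacct.dfs.core.windows.net/raw/sales.csv"

sales = (spark.read
         .option("header", True)
         .option("inferSchema", True)
         .csv(path))

# Projection and filtering, with a built-in function (upper) used to
# transform a column.
cleaned = (sales
           .select("region", F.upper(F.col("product")).alias("product"), "amount")
           .filter(F.col("amount") > 0))

# Aggregation: total and average amount per region.
summary = cleaned.groupBy("region").agg(
    F.sum("amount").alias("total_amount"),
    F.avg("amount").alias("avg_amount"),
)
summary.show()
```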
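
For the exploration and visualization outcome, display() is the Databricks notebook helper mentioned in the description, and a small aggregated result can also be pulled into pandas and plotted with a native Python library such as matplotlib. This sketch continues from the hypothetical summary DataFrame in the previous one.

```python
import matplotlib.pyplot as plt

# display() renders an interactive table with built-in chart options in
# a Databricks notebook.
display(summary)

# For native Python plotting, convert the (small) aggregated result to
# pandas and use matplotlib directly.
summary_pd = summary.toPandas()
summary_pd.plot(kind="bar", x="region", y="total_amount", legend=False)
plt.ylabel("total amount")
plt.title("Total sales amount per region (illustrative)")
plt.show()
```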
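
Finally, to illustrate processing batch as well as streaming data with a unified API: the same DataFrame operations can be expressed over a batch read and a structured streaming read. The directory path and schema here are made-up placeholders.

```python
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# Made-up schema and path for illustration.
schema = StructType([
    StructField("region", StringType()),
    StructField("amount", DoubleType()),
])
events_path = "/tmp/events/"

# Batch: read everything currently in the directory and aggregate it.
batch_counts = spark.read.schema(schema).json(events_path).groupBy("region").count()
batch_counts.show()

# Streaming: the same transformation expressed over a streaming read;
# the query runs continuously and maintains the counts incrementally.
stream_counts = spark.readStream.schema(schema).json(events_path).groupBy("region").count()
query = (stream_counts.writeStream
         .outputMode("complete")
         .format("memory")
         .queryName("region_counts")
         .start())
```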
