Serverless Data Processing with Dataflow

Learn Path Description

It is becoming harder and harder to maintain a technology stack that can keep up with the growing demands of a data-driven business. Every Big Data practitioner is familiar with the three V's of Big Data: volume, velocity, and variety. What if there were a scale-proof technology designed to meet these demands?

Skills You Will Gain

Courses In This Learning Path

Total Duration: 46 minutes

Level: Intermediate

Learn Type: Certifications

Serverless Data Processing with Dataflow: Foundations

This is the first part of a 3-course series on Serverless Data Processing with Dataflow. We start the course with refreshers on what Apache Beam is and how it relates to Dataflow, and on the Apache Beam vision and the benefits of the Beam Portability Framework, which enables developers to use their preferred programming language with their preferred execution backend. We then cover how Dataflow allows you to separate compute and storage while saving money, and how identity, access, and management tools interact with your Dataflow pipelines. Finally, we discuss how to implement the right security model for your Dataflow use case.
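As a taste of the Beam-to-Dataflow relationship the course describes, here is a minimal sketch using the Apache Beam Python SDK. The element values are illustrative, not from the course; the point is that the same pipeline code runs locally or on Dataflow depending only on the runner option.

```python
# A minimal Apache Beam pipeline (Python SDK). The same code runs locally
# on the DirectRunner or on Google Cloud via the DataflowRunner -- only
# the pipeline options change, which is the portability the course covers.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Swap "DirectRunner" for "DataflowRunner" (and supply project, region,
# and temp_location options) to execute the same pipeline on Dataflow.
options = PipelineOptions(runner="DirectRunner")

with beam.Pipeline(options=options) as p:
    (
        p
        | "Create" >> beam.Create(["alpha", "beta", "gamma"])
        | "Uppercase" >> beam.Map(str.upper)
        | "Print" >> beam.Map(print)
    )
```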

Total Duration: 118 minutes

Level: Advanced

Learn Type: Certifications

Serverless Data Processing with Dataflow: Develop Pipelines

This is the second installment of the Dataflow course series, in which we go deeper into developing pipelines using the Beam SDK. We begin with a review of Apache Beam concepts. Next, we discuss processing streaming data using windows, watermarks, and triggers, and then cover the options for sources and sinks in your pipelines, schemas for expressing your structured data, and stateful transformations using the State and Timer APIs. We move on to best practices for maximizing pipeline performance. In the last part of the course, we introduce SQL and DataFrames, which let you represent your business logic in Beam, and show how to develop pipelines iteratively using Beam notebooks.
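To illustrate the windowing and triggering concepts this course teaches, the sketch below applies fixed one-minute windows with an early-firing trigger in the Beam Python SDK. The keys, values, and timestamps are made up for illustration; on a bounded in-memory source the early firings are mostly a no-op, but the same transform applies to a real streaming source.

```python
# Sketch: per-key sums over fixed 60-second event-time windows, with an
# early-firing trigger. Values and timestamps are illustrative only.
import apache_beam as beam
from apache_beam.transforms import trigger, window

with beam.Pipeline() as p:
    (
        p
        | beam.Create([("user1", 1), ("user2", 1), ("user1", 1)])
        # Attach an event-time timestamp (here, epoch second 0) to each element.
        | beam.Map(lambda kv: window.TimestampedValue(kv, 0))
        | beam.WindowInto(
            window.FixedWindows(60),  # 60-second event-time windows
            # Emit early results every 10s of processing time, then a
            # final result when the watermark passes the end of the window.
            trigger=trigger.AfterWatermark(
                early=trigger.AfterProcessingTime(10)),
            accumulation_mode=trigger.AccumulationMode.ACCUMULATING,
        )
        | beam.CombinePerKey(sum)
        | beam.Map(print)
    )
```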

Total Duration: 115 minutes

Level: Advanced

Learn Type: Certifications

Serverless Data Processing with Dataflow: Operations

The final installment of the Dataflow course series presents the components of Dataflow's operational model. We examine tools and techniques for troubleshooting and optimizing pipeline performance, then review best practices for testing, deploying, and maintaining reliable Dataflow pipelines. We end with a discussion of Templates, which make it easy to scale Dataflow pipelines to organizations with hundreds of users. These lessons will help ensure that your data platform remains resilient to unexpected circumstances.
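As an example of the testing practices this course covers, here is a small sketch of a unit test for a Beam transform using the SDK's built-in testing utilities. The transform and its expected output are hypothetical, invented for this example.

```python
# Sketch: unit-testing a Beam transform with TestPipeline and assert_that.
import apache_beam as beam
from apache_beam.testing.test_pipeline import TestPipeline
from apache_beam.testing.util import assert_that, equal_to

def test_double_it():
    with TestPipeline() as p:
        result = (
            p
            | beam.Create([1, 2, 3])
            | "DoubleIt" >> beam.Map(lambda x: x * 2)  # hypothetical transform
        )
        # assert_that checks the PCollection contents when the pipeline runs.
        assert_that(result, equal_to([2, 4, 6]))
```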
