Building ETL and Data Pipelines with Bash, Airflow and Kafka

Course Features

Duration: 5 weeks

Delivery Method: Online

Available on: Limited Access

Accessibility: Mobile, Desktop, Laptop

Language: English

Subtitles: English

Level: Beginner

Effort: 4 hours per week

Teaching Type: Self Paced

Course Description

Well-designed, automated data pipelines and ETL processes are the foundation of a successful Business Intelligence platform. Defining your data workflows, pipelines, and processes early in the platform design ensures the right raw data is collected, transformed, loaded into the desired storage layers, and made available for processing and analysis as and when required.

This course is designed to provide you with the critical knowledge and skills needed by Data Engineers and Data Warehousing specialists to create and manage ETL, ELT, and data pipeline processes.

Upon completing this course, you'll gain a solid understanding of Extract, Transform, Load (ETL) and Extract, Load, Transform (ELT) processes; practice extracting, transforming, and loading data into a staging area; create an ETL data pipeline using Bash shell scripting; build a batch ETL workflow using Apache Airflow; and build a streaming data pipeline using Apache Kafka.
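To give a flavor of the shell-scripting approach covered in the course, a Bash ETL step can be sketched along these lines (the file names, fields, and staging layout here are illustrative assumptions, not the course's actual lab materials):

```shell
#!/usr/bin/env bash
# Minimal ETL sketch in Bash. All file names and fields are hypothetical.
set -euo pipefail

# Extract: simulate pulling a raw CSV from a source system.
cat > raw_data.csv <<'EOF'
id,city,temp
1,toronto,12
2,ottawa,9
EOF

# Transform: keep only the id and city columns, upper-case the text.
cut -d',' -f1,2 raw_data.csv | tr '[:lower:]' '[:upper:]' > transformed.csv

# Load: copy the transformed data into a staging area.
mkdir -p staging
cp transformed.csv staging/cities.csv
```

In a production pipeline, a script like this would typically be scheduled (for example with cron, or as a task inside an Airflow DAG) rather than run by hand.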

You'll gain hands-on experience through practice labs throughout the course and work on a real-world-inspired project, building data pipelines with several technologies. The completed project can be added to your portfolio to demonstrate your ability to perform as a Data Engineer.

This course requires prior experience working with datasets, SQL, relational databases, and Bash shell scripts.

Course Overview

International Faculty

Post Course Interactions

Instructor-Moderated Discussions

Skills You Will Gain

Prerequisites/Requirements

Computer and IT literacy.

What You Will Learn

Create batch ETL processes using Apache Airflow and streaming data pipelines using Apache Kafka.

Define data pipeline components, processes, tools, and technologies.

Demonstrate an understanding of how shell scripting is used to implement an ETL pipeline.

Describe and differentiate between Extract, Transform, Load (ETL) and Extract, Load, Transform (ELT) processes.

Course Instructors

Jeff Grossman

Data Science and Engineering SME

Jeff has a background in Pure Mathematics, Geophysical Signal & Image Processing, Medical Imaging, and Data Science & Engineering. Jeff has built his career around data and algorithm developm...

Rav Ahuja

AI and Data Science Program Director

Rav Ahuja is a Global Program Director at IBM. He leads growth strategy, curriculum creation, and partner programs for the IBM Skills Network. Rav co-founded Cognitive Class, an IBM led initiative to...

Yan Luo

Ph.D., Data Scientist and Developer

Yan Luo, Ph.D., is a data scientist and developer at IBM Canada. Yan has been building innovative AI and cognitive applications in various areas such as mining software repositories, personalized hea...