Fundamentals of Scalable Data Science

Rating: 5 (8 reviews)

Course Features

Duration: 20 hours
Delivery Method: Online
Available on: Limited Access
Accessibility: Desktop, Laptop
Language: English
Subtitles: English
Level: Beginner
Teaching Type: Self Paced
Video Content: 20 hours

Course Description

Apache Spark is the standard for large-scale data processing. This course is the first in a series of courses leading to the IBM Advanced Data Science Specialization. It is essential to learn how to build a scalable platform for data science because memory and CPU limitations are the most important limiting factors in building advanced machine learning models.

This course will teach you how to use Python with Apache Spark. In the first two weeks, we will introduce Apache Spark and then learn how to use it to perform basic exploratory and pre-processing tasks. This exercise will also introduce you to data visualization and statistical methods. This will give you the knowledge necessary to assume the role of a data engineer in any modern setting. It also gives you the foundation for your data science career. Please have a look at the full specialization curriculum: https://www.coursera.org/specializations/advanced-data-science-ibm

If you choose to take this course and earn the Coursera course certificate, you will also earn an IBM digital badge. For more information about IBM digital badges, visit ibm.biz/badging.

This course will help you to:

* Recognize data patterns, deviations, inconsistencies, and outliers
* Identify useful techniques for working with big data such as dimension reduction and feature selection methods
* Use advanced tools and charting libraries to:
  o Improve efficiency of analysis of big data with partitioning and parallel analysis
  o Visualize the data in a number of 2D and 3D formats (Box Plot, Run Chart, Scatter Plot, Pareto Chart, and Multidimensional Scaling)

For successful completion of the course, the following prerequisites are recommended:

* Basic programming skills in Python
* Basic math
* Basic SQL (you can get it easily from https://www.coursera.org/learn/sql-data-science if needed)

In order to complete this course, the following technologies will be used: (These technologies are introduced in the course as necessary so no previous knowledge is required.)

Some learners find parts of the course material too complex. If you feel the same, please take a look at the materials below before you start this course; we've heard that it really helps. Alternatively, you can try this course first and then go to these materials if you feel the need. They are free:

https://cognitiveclass.ai/learn/spark
https://dataplatform.cloud.ibm.com/analytics/notebooks/v2/f8982db1-5e55-46d6-a272-fd11b670be38/view?access_token=533a1925cd1c4c362aabe7b3336b3eae2a99e0dc923ec0775d891c31c5bbbc68

This course takes four weeks, at 4-6 hours per week.
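As a rough illustration of the "partitioning and parallel analysis" idea mentioned in the description, the sketch below computes a mean and standard deviation from per-partition partial results. It is plain Python, not the Apache Spark API and not taken from the course material; the helper names (`partition`, `summarize`, `combine`, `mean_and_std`) and the sample values are hypothetical and chosen only for this example.

```python
from functools import reduce
from math import sqrt


def partition(data, n_parts):
    """Split a list into at most n_parts chunks (hypothetical helper)."""
    if not data or n_parts < 1:
        raise ValueError("need non-empty data and n_parts >= 1")
    size = -(-len(data) // n_parts)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def summarize(chunk):
    """Partial aggregate for one partition: (count, sum, sum of squares)."""
    return (len(chunk), sum(chunk), sum(x * x for x in chunk))


def combine(a, b):
    """Merge two partial aggregates; the order of merging does not matter."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def mean_and_std(data, n_parts=4):
    """Mean and population standard deviation from per-partition summaries."""
    partials = map(summarize, partition(data, n_parts))
    n, s, ss = reduce(combine, partials)
    mean = s / n
    variance = ss / n - mean * mean  # population variance
    return mean, sqrt(max(variance, 0.0))


# Illustrative sample values, not course data.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, std = mean_and_std(values, n_parts=3)
print(mean, std)
```

Because each partition is reduced to a small summary that can be merged in any order, the per-partition step could run on separate workers; here it runs sequentially to keep the sketch self-contained.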

Course Overview

International Faculty

Post Course Interactions

Instructor-Moderated Discussions

Skills You Will Gain

What You Will Learn

Tools that support BigData solutions

Scaling Math for Statistics on Apache Spark

Data Visualization of Big Data

Course Instructors

Romeo Kienzler

Chief Data Scientist, Course Lead

Romeo Kienzler holds an M.Sc. (ETH) in Information Systems, Bioinformatics & Applied Statistics (Swiss Federal Institute of Technology). He has nearly two decades of experience in Software Engineering...

Course Reviews

Average Rating Based on 8 reviews

4.9

88%

13%
