
Big Data with PySpark

Total Duration
24 Hours

Advance your data skills by mastering Apache Spark. Using PySpark, the Spark Python API, you will leverage parallel computation on large datasets and get ready for high-performance machine learning. From cleaning data to creating features and implementing machine learning models, you'll execute end-to-end workflows with Spark. The track ends with building a recommendation engine using the popular MovieLens and Million Songs datasets.

Courses in this Learning Path
1
Introduction to PySpark
DataCamp Course via DataCamp
Duration: 4 hours
Price: ₹1,093
Level: Beginner
Learn Type: Certification

This course will show you how to use Spark from Python. Spark lets you perform parallel computations on large datasets, and it integrates easily with Python through PySpark, the Python package that makes all of this possible. You will use PySpark to work with data on flights from Portland and Seattle. The course will show you how to manage the data and …

2
Big Data Fundamentals with PySpark
DataCamp Course via DataCamp
Duration: 4 hours
Price: ₹1,093
Level: Intermediate
Learn Type: Certification

Big Data has attracted a lot of attention in the past few years, and it is now a mainstream topic for many companies. What is Big Data? This course introduces you to the basics of Big Data with PySpark. Spark is a framework for "lightning fast cluster computing" on Big Data. It is a data processing engine that can run programs up to 100x faster than Hadoop in memory and 10x …

3
Cleaning Data with PySpark
DataCamp Course via DataCamp
Duration: 4 hours
Price: ₹1,093
Level: Intermediate
Learn Type: Certification

Working with data can be difficult, and working with millions or even billions of rows can be frustrating. You may have received data processing code that was written on a laptop against very clean data, or you may have been responsible for moving basic data processing from prototype to production. You may also have worked with real-world data, which can include missing fields, unusual …

4
Feature Engineering with PySpark
DataCamp Course via DataCamp
Duration: 4 hours
Price: ₹1,093
Level: Intermediate
Learn Type: Certification

Your job is to find meaning in chaos. Toy datasets like MTCars and Iris are the product of careful curation; real data must be transformed before it becomes useful to machine learning algorithms that predict, extract, classify, cluster, and so on. This course covers the details that data scientists spend between 70 and 80% of their time dealing with, such as feature engineering and data …

5
Machine Learning with PySpark
DataCamp Course via DataCamp
Duration: 4 hours
Price: ₹1,093
Level: Intermediate
Learn Type: Certification

Spark is a powerful tool for Big Data. It transparently manages the allocation of compute tasks across a cluster, which keeps operations fast and lets you concentrate on the analysis rather than the technical details. In this course you'll learn how to get data into Spark and then delve into three fundamental Spark machine learning algorithms: Linear Regression, Logistic …

6
Building Recommendation Engines with PySpark
DataCamp Course via DataCamp
Duration: 4 hours
Price: ₹1,093
Level: Intermediate
Learn Type: Certification

This course will show you how to build recommendation engines in PySpark using Alternating Least Squares (ALS). It uses both the MovieLens and Million Songs datasets, and it contains the code required to train, test, and implement ALS models on various types of customer data.
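The alternating idea behind ALS can be sketched in a few lines. In PySpark the estimator is `pyspark.ml.recommendation.ALS` (with parameters such as `userCol`, `itemCol`, `ratingCol`, `rank`, `maxIter` and `regParam`); the plain-Python sketch below is not that API. It only illustrates the alternating updates under simplifying assumptions: rank-1 factors, a tiny in-memory ratings dictionary, and hypothetical helper names `als_rank1` and `rmse`. Each pass holds the item factors fixed and solves a regularized least-squares problem for every user factor, then swaps the roles.

```python
# Minimal rank-1 ALS sketch (hypothetical helpers; NOT PySpark's ALS API).
# Observed ratings are stored sparsely as {(user, item): rating}.

def als_rank1(ratings, n_users, n_items, reg=0.1, iters=20):
    """Approximate r[u, i] by p[u] * q[i] using alternating least squares."""
    p = [1.0] * n_users  # user factors
    q = [1.0] * n_items  # item factors
    for _ in range(iters):
        # Fix q and solve each user's regularized least-squares problem:
        # p[u] = sum_i r_ui * q[i] / (sum_i q[i]^2 + reg), over items u rated.
        num = [0.0] * n_users
        den = [reg] * n_users
        for (u, i), r in ratings.items():
            num[u] += r * q[i]
            den[u] += q[i] * q[i]
        p = [n / d for n, d in zip(num, den)]
        # Fix p and solve for each item factor in the same way.
        num = [0.0] * n_items
        den = [reg] * n_items
        for (u, i), r in ratings.items():
            num[i] += r * p[u]
            den[i] += p[u] * p[u]
        q = [n / d for n, d in zip(num, den)]
    return p, q


def rmse(ratings, p, q):
    """Root-mean-square error of p[u] * q[i] over the observed ratings."""
    err = sum((r - p[u] * q[i]) ** 2 for (u, i), r in ratings.items())
    return (err / len(ratings)) ** 0.5


if __name__ == "__main__":
    # Toy ratings built from hypothetical user tastes a and item scores b.
    a, b = [1.0, 2.0, 3.0], [1.0, 2.0, 1.0]
    ratings = {(u, i): au * bi for u, au in enumerate(a) for i, bi in enumerate(b)}
    p, q = als_rank1(ratings, n_users=3, n_items=3)
    print("RMSE:", rmse(ratings, p, q))
    print("predicted r[2, 1]:", p[2] * q[1])
```

On a toy matrix that is exactly rank 1, the fitted products `p[u] * q[i]` land close to the true ratings, shrunk slightly by the `reg` term. Spark's implementation performs these per-user and per-item solves in parallel across a cluster and with higher-rank factors.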