Course Features

Duration

5 weeks

Delivery Method

Online

Available on

Lifetime Access

Accessibility

Desktop, Laptop

Language

English

Subtitles

English

Level

Intermediate

Effort

6 hours per week

Teaching Type

Self Paced

Course Description

You will learn how to use Hadoop to analyze and manage big data.

This course gives you access to a virtual environment that includes R, Hadoop, and RStudio, along with hands-on experience in big data management. You will test and learn from several examples of statistical learning, together with the related R code expressed as map-reduce operations.

Basic knowledge of statistical learning and R is helpful for understanding the methods and how to run them in parallel using map/reduce functions and Hadoop storage. At the end of this course, you will have access to RHadoop at the University of Ljubljana.

Course Overview

International Faculty

Post Course Interactions

Instructor-Moderated Discussions

Skills You Will Gain

Prerequisites/Requirements

All software needed to actively participate in the course is provided within a virtual machine that participants download and run on their local machine. No extra software is needed. You will need a modest local machine with 15GB free disk spa

What You Will Learn

Explore the basic functionality of Apache Hadoop and RHadoop

Experiment with how to achieve the performance of modern supercomputing

Experiment with regression, clustering and classification using RHadoop

Investigate the basic functionality of the Bash terminal window

Apply knowledge of statistical learning to instances of data provided by the educators

Do big data management with RHadoop on a real supercomputer provided by the University of Ljubljana

Target Students

This course is designed for people who are interested in data science, computational statistics and machine learning and have basic experience with them. It will also be useful for advanced undergraduate students and first-year PhD students in data analysis, stat

Course Instructors

Janez Povh

Instructor

I am an active researcher in mathematical optimization, which has many applications in data science and where HPC is an indispensable tool.

Biljana Mileva Boshkoska

Instructor

Biljana Mileva Boshkoska is an assistant professor in computer science. Her interests include decision support systems, data mining and working with big data.

Leon Kos

Instructor

Leon Kos is a veteran of 25+ years of using the Linux desktop on a daily basis to build digital relationships for research, teaching, and getting the job done by programming.