Apache Kudu, the revolutionary storage technology, is often used with other Hadoop ecosystem frameworks for data ingestion and processing. This course is practical and hands-on. It demonstrates how Kudu works in conjunction with four frameworks: Apache Spark, Spark SQL, MLlib, and Apache Flume.
You will use the Kudu-Spark module with Spark and Spark SQL to create, move, and update data seamlessly between Kudu and Spark. Next, you will use Apache Flume to stream events into Kudu tables and then query them using Apache Impala. This course is for learners who have limited experience with Hadoop ecosystem components such as HDFS, Hive, and Spark.
- Gain hands-on experience with Kudu and add another tool to your Big Data toolbox.
- Learn how to move data from Kudu tables to Spark applications using Kudu-Spark.
- Use Flume and Kudu to stream and analyze data in real time.
- Use Spark MLlib to predict movie ratings and save the predicted values to Kudu.
- Combine these open-source tools to build data engineering pipelines that are simple and fast.
Ryan Bosshart, a Principal Systems Engineer at Cloudera, leads a specialized team that focuses on Hadoop ecosystem storage technologies such as HDFS, HBase, and Kudu. He is a co-chair of the Twin Cities Spark and Hadoop User Group and has been building and architecting large-scale distributed systems since 2006. Ryan speaks about Hadoop technologies at conferences across North America and holds a degree in computer science from Augsburg College.