HDP Developer: Quick Start - Hortonworks Official Curriculum



COURSE OVERVIEW

This 4-day training course is designed for developers who need to create applications that analyze Big Data stored in Apache Hadoop using Apache Pig and Apache Hive, and to develop applications on Apache Spark. Topics include an essential understanding of HDP and its capabilities; Hadoop, YARN, HDFS, and MapReduce/Tez; data ingestion; using Pig and Hive to perform data analytics on Big Data; and an introduction to Spark Core, Spark SQL, Apache Zeppelin, and additional Spark features.


COURSE CONTENT

DAY 1: AN INTRODUCTION TO APACHE HADOOP AND HDFS


OBJECTIVES



The Case for Hadoop



The Hadoop Ecosystem



The HDFS Architecture



Ingesting Data Into HDFS



Parallel Processing Fundamentals



YARN Architecture



Introduction to Apache Pig




LABS



Starting an HDP Cluster



Using HDFS Commands



Demonstration: Understanding Apache Pig



Getting Started with Apache Pig



Exploring Data with Pig



DAY 2: ADVANCED APACHE PIG PROGRAMMING


OBJECTIVES



Advanced Apache Pig Programming



Introduction to Apache Hive



Using HCatalog




LABS



Splitting a Dataset



Joining Datasets



Preparing Data for Apache Hive



Understanding Apache Hive Tables



Demonstration: Understanding Partitions and Skew



Analyzing Big Data with Apache Hive



Demonstration: Computing NGrams



Joining Datasets in Apache Hive



Computing NGrams of Emails in Avro Format



Using HCatalog with Apache Pig



DAY 3: ADVANCED APACHE HIVE PROGRAMMING


OBJECTIVES



Advanced Apache Hive Programming



An Overview of Apache Zeppelin and Apache Spark



An Introduction to RDD Programming



An Introduction to Pair RDDs




LABS



Advanced Apache Hive Programming



Introduction to Apache Spark REPLs and Apache Zeppelin



Creating and Manipulating RDDs



Creating and Manipulating Pair RDDs



DAY 4: WORKING WITH PAIR RDDS AND BUILDING YARN APPLICATIONS


OBJECTIVES



An Introduction to Pair RDDs (Continued)



An Introduction to Spark SQL



Caching and Persisting



Building and Submitting Applications to YARN





LABS


Creating and Saving DataFrames and Tables

Working with DataFrames

Building and Submitting Applications to YARN



Organizer

Agilitics Singapore