HDP Analyst: Data Science - Hortonworks Official Curriculum
- 11 June 2018 09:00 - 13 June 2018 17:00
- Science and technology
- Noida, Uttar Pradesh, India.
- Ticket price starts from: USD 2,124.95
This course provides instruction on the processes and practice of data science, including machine learning and natural language processing. It covers tools and programming languages (Python, IPython, Mahout, Pig, NumPy, pandas, SciPy, scikit-learn), the Natural Language Toolkit (NLTK), and Spark MLlib.
DAY 1: AN INTRODUCTION TO HADOOP AND DATA SCIENCE
OBJECTIVES
Using Hadoop for Data Science
The Hadoop Distributed File System
The MapReduce Framework
Hadoop 2 and YARN
Machine Learning from Data
LABS
Setting up the Lab Environment
Using HDFS Commands
Demonstration: Understanding MapReduce
Using Apache Mahout for Machine Learning
DAY 2: AN INTRODUCTION TO APACHE PIG AND PYTHON
OBJECTIVES
Introduction to Apache Pig
Python Programming
Analyzing Data with Python
Running Python on Hadoop
Machine Learning Algorithms
LABS
Getting Started with Apache Pig
Using the IPython Notebook
Demonstration: Understanding the NumPy Package
Demonstration: The Pandas Library
Performing Data Analysis with Python
Interpolating Data Points
Defining User-Defined Functions in Python
Streaming Python with Apache Pig
Exploring Data with Apache Pig
Demonstration: Classification with Scikit-Learn
Computing K-Nearest Neighbor
Generating a K-Means Clustering
DAY 3: MACHINE LEARNING ALGORITHMS
OBJECTIVES
Machine Learning Algorithms Continued
Natural Language Processing
Apache Spark MLlib
Taking Data Science to Production
LABS
Demonstration: POS Tagging Using a Decision Tree
Using the Python Natural Language Toolkit
Classifying Text Using Naïve Bayes
Using Spark Transformations and Actions
Using Spark MLlib
Creating a Spam Classifier Using Spark MLlib