Introduction to Big Data

  • Characteristics of Big Data
  • Why parallel computing is important
  • Discuss Big Data products developed by various vendors

Introducing Hadoop

  • Components of Hadoop
  • Starting Hadoop
  • Identify the various Hadoop processes (daemons)
  • Hands-on
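Running processes are usually identified with the JDK's `jps` tool, which prints one `<pid> <main class>` line per Java process. The sketch below parses that output and reports which expected daemons are missing. The helper names are hypothetical. The expected daemon set is also an assumption: it lists the HDFS and YARN daemons of a Hadoop 2.x or later single-node setup. A Hadoop 1.x cluster runs JobTracker and TaskTracker instead of ResourceManager and NodeManager.

```python
"""Check which Hadoop daemons appear in `jps` output (hypothetical helpers)."""

# Assumed daemon set for a Hadoop 2.x+ pseudo-distributed (single-node) setup.
# Hadoop 1.x runs JobTracker/TaskTracker instead of the two YARN daemons.
EXPECTED_DAEMONS = {
    "NameNode",
    "SecondaryNameNode",
    "DataNode",
    "ResourceManager",
    "NodeManager",
}


def parse_jps(output: str) -> dict[str, int]:
    """Map each Java main-class name listed by `jps` to its process ID."""
    running = {}
    for line in output.splitlines():
        parts = line.split()
        # Each jps line is "<pid> <main class>"; skip blank or malformed lines.
        if len(parts) >= 2 and parts[0].isdigit():
            running[parts[1]] = int(parts[0])
    return running


def missing_daemons(output: str, expected=EXPECTED_DAEMONS) -> set[str]:
    """Return the expected daemons that do not appear in the jps output."""
    return set(expected) - set(parse_jps(output))


if __name__ == "__main__":
    sample = """\
4211 NameNode
4390 DataNode
4602 SecondaryNameNode
4877 ResourceManager
5120 Jps
"""
    print(sorted(missing_daemons(sample)))  # ['NodeManager']
```

On a live node you would pass in the text printed by `jps`. The `Jps` entry is the tool itself and is simply ignored because it is not in the expected set.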

Working with HDFS

  • Basic file commands
  • Web-based user interface
  • Reading & writing files
  • Run a word count program
  • View jobs in the Web UI
  • Hands-on
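The word count program is normally run from the examples jar shipped with Hadoop. The plain-Python sketch below simulates that job's three phases locally, as a mental model of what it computes. The map phase emits `(word, 1)` pairs. The shuffle phase groups those pairs and sorts them by key. The reduce phase sums each group. It is not the Hadoop MapReduce API, and the function names are hypothetical. Like the basic WordCount example, it splits on whitespace and is case-sensitive.

```python
"""Local simulation of the map, shuffle/sort and reduce phases of a word count.

Plain Python for illustration only; this is not the Hadoop MapReduce API.
"""
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Emit (word, 1) for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group values by key and order the keys, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines: Iterable[str]) -> dict[str, int]:
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    text = ["the quick brown fox", "the lazy dog", "The end"]
    print(word_count(text))
    # {'The': 1, 'brown': 1, 'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```

In a real job, many mappers and reducers run these steps in parallel on different blocks of the input. The framework performs the shuffle across the network.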

Installation & Configuration of Hadoop

  • Types of installation (RPMs & tar files)
  • Set up ‘ssh’ for the Hadoop cluster
  • Directory tree structure of a Hadoop installation
  • XML configuration files, masters and slaves files
  • Checking system health
  • Discuss block size and replication factor
  • Benchmarking the cluster
  • Hands-on
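Block size and replication factor determine how many blocks a file is split into and how much raw disk space it consumes. The sketch below reads both settings from an `hdfs-site.xml` string and estimates those two figures for one file.

It makes several assumptions:

- It uses the Hadoop 2.x and later property names `dfs.blocksize` and `dfs.replication`. Hadoop 1.x used `dfs.block.size`.
- It falls back to the stock defaults of 128 MB and 3 when a property is absent.
- It handles only plain byte counts and the `k`/`m`/`g`/`t` suffixes.
- The function names are hypothetical.

```python
"""Read block size and replication from hdfs-site.xml text and estimate storage.

Assumptions: Hadoop 2.x+ property names, 128 MB / 3 defaults when absent,
and only plain byte counts or k/m/g/t suffixes in dfs.blocksize.
"""
import xml.etree.ElementTree as ET

DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024  # assumed Hadoop 2.x+ default
DEFAULT_REPLICATION = 3

_SUFFIXES = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_size(value: str) -> int:
    """Parse '134217728' or a suffixed value such as '128m' into bytes."""
    value = value.strip().lower()
    if value and value[-1] in _SUFFIXES:
        return int(value[:-1]) * _SUFFIXES[value[-1]]
    return int(value)


def read_hdfs_settings(xml_text: str) -> tuple[int, int]:
    """Return (block_size_bytes, replication) from hdfs-site.xml content."""
    props = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    block_size = parse_size(props.get("dfs.blocksize", str(DEFAULT_BLOCK_SIZE)))
    replication = int(props.get("dfs.replication", DEFAULT_REPLICATION))
    return block_size, replication


def storage_estimate(file_size: int, block_size: int, replication: int) -> tuple[int, int]:
    """Return (number_of_blocks, raw_bytes_across_all_replicas) for one file."""
    blocks = (file_size + block_size - 1) // block_size  # integer ceiling
    # HDFS does not pad the last block, so raw usage is simply size x replication.
    return blocks, file_size * replication


if __name__ == "__main__":
    config = """<configuration>
  <property><name>dfs.replication</name><value>2</value></property>
  <property><name>dfs.blocksize</name><value>64m</value></property>
</configuration>"""
    block_size, replication = read_hdfs_settings(config)
    # A 300 MB file with 64 MB blocks: 4 full blocks plus one 44 MB block.
    print(storage_estimate(300 * 1024**2, block_size, replication))  # (5, 629145600)
```

Because HDFS does not pad a file's last block, the example file takes five blocks. At replication 2 it consumes 600 MB of raw space, not 640 MB.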

Advanced Administration Activities

  • Adding and decommissioning nodes
  • Purpose of the Secondary NameNode
  • Recovery from a failed NameNode
  • Managing quotas
  • Enabling trash
  • Hands-on
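Name and space quotas are set per directory, for example with `hdfs dfsadmin -setQuota` and `hdfs dfsadmin -setSpaceQuota`. The sketch below is a simplified model of the two accounting rules:

- A directory counts against its own name quota.
- Every replica counts against the space quota.

The `Directory` class and `check_quotas` function are hypothetical. The model assumes one replication factor for every file and ignores how HDFS enforces quotas during an in-progress write.

```python
"""Simplified, illustrative model of HDFS name and space quota accounting."""
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Directory:
    name: str
    files: dict[str, int] = field(default_factory=dict)  # file name -> size in bytes
    subdirs: List["Directory"] = field(default_factory=list)

    def name_count(self) -> int:
        """Count this directory itself plus every file and directory beneath it."""
        return 1 + len(self.files) + sum(d.name_count() for d in self.subdirs)

    def logical_bytes(self) -> int:
        """Total file size before replication."""
        return sum(self.files.values()) + sum(d.logical_bytes() for d in self.subdirs)


def check_quotas(root: Directory, replication: int,
                 name_quota: Optional[int] = None,
                 space_quota: Optional[int] = None) -> list[str]:
    """Return the violated quotas; an empty list means the tree is within limits."""
    problems = []
    names = root.name_count()
    if name_quota is not None and names > name_quota:
        problems.append(f"name quota exceeded: {names} > {name_quota}")
    raw = root.logical_bytes() * replication  # every replica counts against the quota
    if space_quota is not None and raw > space_quota:
        problems.append(f"space quota exceeded: {raw} > {space_quota}")
    return problems


if __name__ == "__main__":
    logs = Directory("logs", files={"a.log": 100, "b.log": 50})
    root = Directory("/user/alice", files={"notes.txt": 10}, subdirs=[logs])
    print(root.name_count())  # 5: /user/alice, notes.txt, logs, a.log, b.log
    print(check_quotas(root, replication=3, name_quota=4, space_quota=1000))
    # ['name quota exceeded: 5 > 4']
```

Because the directory counts against its own name quota, a name quota of 1 leaves no room for any files. The space check uses 480 bytes (160 bytes × 3 replicas), not 160.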

Monitoring the Hadoop Cluster

  • Hadoop infrastructure monitoring
  • Hadoop specific monitoring
  • Install and configure Nagios / Ganglia
  • Capture metrics
  • Hands-on
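Hadoop daemons expose their metrics as JSON through a `/jmx` endpoint on their web UI. This makes it straightforward to feed Hadoop-specific metrics into Nagios. The sketch below turns a NameNode JMX payload into a Nagios-style result using the standard plugin exit codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.

Several details are assumptions to check against your Hadoop version:

- the bean name `Hadoop:service=NameNode,name=FSNamesystem`;
- its `CapacityTotal`, `CapacityUsed` and `MissingBlocks` attributes;
- the 80% and 90% thresholds.

`check_capacity` is a hypothetical helper.

```python
"""Nagios-style capacity check on NameNode JMX metrics (a sketch).

Assumed: the FSNamesystem bean name and its CapacityTotal, CapacityUsed and
MissingBlocks attributes; verify them against your Hadoop version.
"""
import json

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

FSNAMESYSTEM = "Hadoop:service=NameNode,name=FSNamesystem"


def check_capacity(jmx_json: str, warn_pct: float = 80.0,
                   crit_pct: float = 90.0) -> tuple[int, str]:
    """Return (exit_code, message) in the style Nagios plugins commonly use."""
    try:
        beans = json.loads(jmx_json)["beans"]
        bean = next(b for b in beans if b.get("name") == FSNAMESYSTEM)
        total = bean["CapacityTotal"]
        used = bean["CapacityUsed"]
        missing = bean.get("MissingBlocks", 0)
    except (ValueError, KeyError, TypeError, AttributeError, StopIteration):
        return UNKNOWN, "UNKNOWN - could not read FSNamesystem metrics"
    if total <= 0:
        return UNKNOWN, "UNKNOWN - CapacityTotal is not positive"
    pct = 100.0 * used / total
    detail = f"HDFS used {pct:.1f}%, missing blocks {missing}"
    # Missing blocks mean data loss is possible, so treat them as critical.
    if missing > 0 or pct >= crit_pct:
        return CRITICAL, f"CRITICAL - {detail}"
    if pct >= warn_pct:
        return WARNING, f"WARNING - {detail}"
    return OK, f"OK - {detail}"


if __name__ == "__main__":
    sample = json.dumps({"beans": [{
        "name": FSNAMESYSTEM,
        "CapacityTotal": 1000, "CapacityUsed": 850, "MissingBlocks": 0,
    }]})
    code, message = check_capacity(sample)
    print(code, message)  # 1 WARNING - HDFS used 85.0%, missing blocks 0
```

A real plugin would fetch the payload, print the message, and exit with the returned code. The payload could come from `http://<namenode>:9870/jmx` on Hadoop 3.x, or port 50070 on Hadoop 2.x. The same numbers could also be pushed to Ganglia as custom metrics.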

Other Components of the Hadoop Ecosystem

  • Discuss Hive, Sqoop, Pig, HBase, Flume
  • Use cases of each
  • Use Hadoop Streaming to write MapReduce code in Perl / Python
  • Hands-on
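Hadoop Streaming runs any executable as the mapper or reducer. It feeds input lines on standard input and reads tab-separated key/value lines from standard output, and each reducer receives its input sorted by key. The Python script below is a minimal word-count sketch under those assumptions. The script name `wordcount.py` and its `map`/`reduce` arguments are hypothetical choices made for this example.

```python
#!/usr/bin/env python3
"""Word-count mapper and reducer for Hadoop Streaming (a sketch).

Hypothetical usage: `wordcount.py map` or `wordcount.py reduce`, reading stdin
and writing "key<TAB>value" lines to stdout.
"""
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'word<TAB>1' for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Sum counts per word; relies on the input being sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    # Sorted input keeps identical keys adjacent, so groupby sees each word once.
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Only touch stdin when invoked explicitly as a map or reduce step.
if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

You can test the script without a cluster:

`cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce`

On a cluster, a submission might look like this:

`hadoop jar /path/to/hadoop-streaming.jar -files wordcount.py -mapper "python3 wordcount.py map" -reducer "python3 wordcount.py reduce" -input <input dir> -output <output dir>`

The jar's location varies by version and distribution, and `python3` must be available on every worker node. A Perl version would follow the same stdin/stdout contract.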
