Learn Hadoop Administration

Apache Hadoop Pseudo-Distributed Mode Installation on CentOS


→ Local Mode and Pseudo-Distributed Mode can both be installed on a single machine.

→ An operating system should be installed. Hadoop supports Windows, Linux, and UNIX operating systems; here we use CentOS 7.

→ Java should be installed. Java 8 or a later version is recommended.


Steps to install Hadoop:-

There are mainly three phases. They are:

Pre Installation Steps

Step1: Install Java (Recommended is Java 1.8, i.e. Java 8)

Step2: Setup Passwordless SSH.


Installation Steps

Step3: Install Hadoop


Post-Installation Steps

Step4: Configure Hadoop

Format NameNode

→ To apply our configuration changes


After the Pre-Installation and Installation steps are done, we can say Local Mode (Standalone Mode) setup is complete.


Pre Installation Steps

Step1: Install Java (Recommended is Java 1.8, i.e. Java 8)

Step2: Setup Passwordless SSH.


Install Java:- (Recommended is Java 1.8, i.e. Java 8)

Step1:- Download Java

Step2:- Install Java

Step3:- Set JAVA_HOME and PATH Variable

Step4:- Verify that Java is installed


Download Java 8:-

$ cd /opt

$ sudo wget <jdk_url>    (generic form)

$ sudo wget -c --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.rpm


Install Java:-

$ sudo yum install jdk-8u131-linux-x64.rpm -y


Set JAVA_HOME and PATH Variable

Installation path =  /usr/java/jdk1.8.0_131



Java binaries (commands) are present in /usr/java/jdk1.8.0_131/bin


Why do we need to set JAVA_HOME?

→ Other software (such as Hadoop) reads JAVA_HOME to locate the Java installation directory.


Why do we need to set the PATH variable?

→ So that the Java commands in /usr/java/jdk1.8.0_131/bin (java, javac, etc.) can be run from any directory.



In Windows, we use Environment variables to set JAVA_HOME and PATH variable.


In Linux, we use user profile files like .bashrc, .profile, etc..


$nano ~/.bashrc


export JAVA_HOME=/usr/java/jdk1.8.0_131

export PATH=$PATH:$JAVA_HOME/bin

Ctrl+X, then Y → to save and exit the nano editor.


Reload the .bashrc file

$ source ~/.bashrc
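The effect of adding a directory to PATH can be sketched with a throwaway stub command (demo-java is a made-up name used here only so the demo needs no real JDK):

```shell
#!/bin/sh
# Create a temporary bin directory containing a stub executable.
bindir="$(mktemp -d)"
printf '#!/bin/sh\necho "stub java 1.8.0_131"\n' > "$bindir/demo-java"
chmod +x "$bindir/demo-java"

# Before the directory is on PATH, the bare name is not found.
before="$(command -v demo-java || echo not-found)"

# Append the directory to PATH -- same idea as PATH=$PATH:$JAVA_HOME/bin.
PATH="$PATH:$bindir"
after="$(command -v demo-java)"

echo "before: $before"   # before: not-found
echo "after:  $after"    # after:  <tmpdir>/demo-java
demo-java                # stub java 1.8.0_131

rm -rf "$bindir"
```

Exactly the same lookup happens when we later type `java` from any directory: the shell walks PATH left to right until it finds an executable with that name.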


Verify that Java is installed

$java -version




Setup Passwordless SSH

SSH – Secure Shell

[cfamily@cfamily bin]$ ssh localhost

The authenticity of host 'localhost (::1)' can't be established.

ECDSA key fingerprint is SHA256:fDsvpLgsscfdNdB/fztpx4PS0NxsBZb9WlqX8jVhhcc.

ECDSA key fingerprint is MD5:1c:2b:98:42:ad:fa:4e:12:2a:10:0d:a2:5f:76:69:1a.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.

cfamily@localhost's password:

Last login: Thu Jul 22 15:51:35 2021 from

[cfamily@cfamily ~]$ exit



By default, SSH prompts for a password on every login.
Why are we setting up Passwordless SSH?

→ Hadoop's start/stop scripts use SSH to launch daemons (even on localhost), so passwordless SSH lets them run without prompting for a password every time.


$ ssh-keygen -t rsa    (generate an RSA key pair; press Enter to accept the defaults and an empty passphrase)

$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys    (authorize our own public key for login)

$ chmod og-wx ~/.ssh/authorized_keys    (sshd rejects the file if group/others can write to it)


Verify passwordless SSH is working:

$ ssh localhost – It should no longer ask for a password.

Hadoop Installation

Steps to Install Hadoop 3.x:-

Step1: Download Hadoop 3.x latest version

Step2: Install Hadoop 3.x

Step3: Set HADOOP_HOME and PATH variables

Step4: Verify that Hadoop is installed


Download Hadoop 3.x latest version



$cd /opt

$sudo wget https://mirrors.sonic.net/apache/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz


Install Hadoop 3.x

a) Extract the tar file

b) Remove the tar file – optional

c) Rename the Hadoop directory

d) Change directory permissions

$ sudo tar -xvzf hadoop-3.3.1.tar.gz

[cfamily@cfamily opt]$ ls

hadoop-3.3.1  hadoop-3.3.1.tar.gz

[cfamily@cfamily opt]$ sudo rm -rf hadoop-3.3.1.tar.gz

[cfamily@cfamily opt]$ ls

hadoop-3.3.1

[cfamily@cfamily opt]$ sudo mv hadoop-3.3.1 hadoop3

[cfamily@cfamily opt]$ ls

hadoop3


[cfamily@cfamily opt]$ sudo chmod -R 777 hadoop3    (777 is convenient for a single-user learning setup; use stricter permissions in production)

[cfamily@cfamily opt]$ ls -l

total 0

drwxrwxrwx. 10 cfamily cfamily 215 Jun 15 11:22 hadoop3


Set HADOOP_HOME and PATH variables

Why do we need to set HADOOP_HOME and the PATH variable?

→ So that the Hadoop commands in bin/ and the scripts in sbin/ can be run from any directory.

[cfamily@cfamily opt]$ cd hadoop3/

[cfamily@cfamily hadoop3]$ ls

bin      lib             licenses-binary  NOTICE.txt  share

etc      libexec         LICENSE.txt      README.txt

include  LICENSE-binary  NOTICE-binary    sbin

[cfamily@cfamily hadoop3]$ cd bin/

[cfamily@cfamily bin]$ ls

container-executor  hdfs      mapred.cmd               yarn

hadoop              hdfs.cmd  oom-listener             yarn.cmd

hadoop.cmd          mapred    test-container-executor

[cfamily@cfamily bin]$ cd ..

[cfamily@cfamily hadoop3]$ cd sbin/

[cfamily@cfamily sbin]$ ls

distribute-exclude.sh    start-all.sh         stop-balancer.sh

FederationStateStore     start-balancer.sh    stop-dfs.cmd

hadoop-daemon.sh         start-dfs.cmd        stop-dfs.sh

hadoop-daemons.sh        start-dfs.sh         stop-secure-dns.sh

httpfs.sh                start-secure-dns.sh  stop-yarn.cmd

kms.sh                   start-yarn.cmd       stop-yarn.sh

mr-jobhistory-daemon.sh  start-yarn.sh        workers.sh

refresh-namenodes.sh     stop-all.cmd         yarn-daemon.sh

start-all.cmd            stop-all.sh          yarn-daemons.sh


To make the above script files available from anywhere in the system, we add Hadoop's bin and sbin directories to PATH.


In Windows, we use Environment variables. In Linux, we use user profile files like .bashrc, .profile etc..


$nano ~/.bashrc


export HADOOP_HOME=/opt/hadoop3

export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin


Ctrl+X, then Y → to save and quit nano.


Reload the .bashrc file

$ source ~/.bashrc


Verify that Hadoop is installed

$ hadoop version

Hadoop 3.3.1


With this, we can say Hadoop is successfully installed.


Local Mode or Standalone Mode = Pre-Installation Steps + Installation Steps.



Post Installation Steps:-

→ In Post-Installation we perform the configuration of Hadoop.

→ All configuration files are present in $HADOOP_HOME/etc/hadoop (here, /opt/hadoop3/etc/hadoop).


→ There are mainly 4 configuration files:

1. core-site.xml

2. hdfs-site.xml

3. mapred-site.xml

4. yarn-site.xml – introduced in Hadoop 2.x

[cfamily@cfamily ~]$ cd /opt/hadoop3/etc/hadoop/

[cfamily@cfamily hadoop]$ ls



→ Here we specify the "filesystem type"; the default is the local filesystem.

→ Here we specify the "namenode" host and its RPC port.

$nano core-site.xml
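The file contents are not reproduced above; a typical minimal core-site.xml for pseudo-distributed mode looks like the following. The hdfs:// scheme selects HDFS instead of the local filesystem, localhost is the NameNode host, and 9000 is a commonly used RPC port (8020 is another common choice):

```xml
<configuration>
  <!-- fs.defaultFS sets the filesystem type (hdfs://) plus the
       NameNode host and its RPC port -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```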









→ In this configuration file we provide/configure:

    block size -- the default block size until Hadoop 1.x was 64MB; from Hadoop 2.x it is 128MB.

    replication factor -- the default replication factor is "3".


$nano hdfs-site.xml
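A minimal hdfs-site.xml sketch for this single-node setup. Replication is lowered to 1 because only one DataNode exists (the default of 3 assumes a real cluster); dfs.blocksize is included only to show where block size would be configured and can be omitted to keep the 128MB default:

```xml
<configuration>
  <!-- only one DataNode exists in pseudo-distributed mode -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <!-- optional: block size in bytes (default is 128MB in Hadoop 2.x/3.x) -->
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>
</configuration>
```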









→ Until Hadoop 1.x we had JobTracker and TaskTracker.

→ So only mapred-site.xml was used until Hadoop 1.x.

→ This is also called the MapReduce 1 Architecture.

→ From Hadoop 2.x, the YARN Architecture (MapReduce 2 Architecture) was introduced.

→ In it, ResourceManager replaces JobTracker and NodeManager replaces TaskTracker, so we additionally have yarn-site.xml.

→ In mapred-site.xml we specify which MapReduce architecture to use.


$nano mapred-site.xml
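A minimal sketch: setting mapreduce.framework.name to yarn selects the MapReduce 2 (YARN) architecture; the default value, local, runs jobs inside a single JVM. (On Hadoop 3.x, a job may additionally need the MapReduce classpath configured; check the official single-node setup docs for your exact version.)

```xml
<configuration>
  <!-- run MapReduce jobs on YARN (MapReduce 2) instead of the
       default "local" single-JVM runner -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```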









→ Here we provide the YARN daemons' information.

$nano yarn-site.xml
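A minimal sketch: the shuffle auxiliary service lets the NodeManager serve map outputs to reducers, and it is the one setting YARN needs for MapReduce jobs to run:

```xml
<configuration>
  <!-- auxiliary service the NodeManager runs so reducers can
       fetch map outputs (required for MapReduce on YARN) -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```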


Format NameNode:-

$ hdfs namenode -format    (initializes the NameNode's metadata directory; run this only once – reformatting erases existing HDFS metadata)


Note:- Once Post-Installation is completed, we can say Pseudo-Distributed Mode setup is complete.