Learn Hadoop Administration

Apache Hadoop Pseudo-Distributed Installation on Ubuntu

Apache Hadoop Installation on a Single Node on Ubuntu

In this guide we are going to install Apache Hadoop in:

1.     Standalone/Local Mode

2.     Pseudo-Distributed Mode



Requirements:

1.     A physical machine or a virtual machine

2.     Any operating system


In this setup:

1.     Oracle VM VirtualBox virtualization software

2.     A VM created with the Ubuntu Server operating system


In Local Mode or Pseudo-Distributed Mode, there are three steps to install Hadoop. They are:

1.     Pre-Installation steps

2.     Installation steps

3.     Post-Installation steps.


Once the Pre-Installation and Installation steps are completed, we can say the "Standalone/Local"

installation is completed.


Once Pre-Installation + Installation + Post-Installation are completed, we can say the "Pseudo-Distributed" installation is completed.


Note: -

--> Make sure virtualization software and an operating system are installed.


--> In this, I have already installed "Oracle VM VirtualBox".

    CentOS 7 (CUI, GUI) -- CUI.


Pre-Installation steps: -

1.     Java 8 (also known as Java 1.8 / JDK 8 / JDK 1.8, recommended) or later

2.     Set up passwordless SSH (required for Pseudo-Distributed Mode)


Note: -

To perform the Pre-Installation, Installation, or Post-Installation

steps, we need either the "root" user or any user with "sudo" permissions.


In My VM, I have two users

1. Main Admin User

  un: root

  pwd: cfamily


2. Hadoop Admin User

  un: cfamily (sudoer)

  pwd: cfamily  


Recommended: any user with "sudoer" permissions.


Step1:  Java 8 Installation on the Linux Operating System (Ubuntu)

$sudo apt update

$java -version

Command not found


Steps to install Java 8:

1: Download Java 8

2: Install Java 8

3: Set JAVA_HOME and PATH variable

4: Verify that Java is installed


Download Java 8:

a) Download the tar file on Windows and then upload it to the Ubuntu server


b) Download it directly on the Ubuntu server

$cd /opt 

$sudo wget https://download.oracle.com/otn/java/jdk/8u281-b09/89d678f2be164786b292527658ca1605/jdk-8u281-linux-x64.tar.gz


Install Java 8: -

a)  extract tar file

$sudo tar -xvzf jdk-8u281-linux-x64.tar.gz

b) remove tar.gz file 

$ sudo rm -rf jdk-8u281-linux-x64.tar.gz

c) rename

$ ls


cfamily@cfamily:/opt$ sudo mv jdk1.8.0_281 java

cfamily@cfamily:/opt$ ls



Set JAVA_HOME and PATH variable: -

$ nano ~/.bashrc


export JAVA_HOME=/opt/java

export PATH=$PATH:$JAVA_HOME/bin


--> reload .bashrc

$ source ~/.bashrc


Verify that Java is installed: -



$java -version 




Step2: Passwordless SSH

--> Required for Pseudo-Distributed Mode.

--> In Pseudo-Distributed Mode, all daemons think they are running on

    different machines.

--> So for daemon-to-daemon communication Hadoop uses the SSH protocol.

--> SSH -- Secure Shell

    By default, it requires a username and password.


$ ssh localhost

The authenticity of host 'localhost (::1)' can't be established.


cfamily@localhost's password:

Last login: Sat May 30 11:55:03 2020 from


[cfamily@cfamily ~]$ ssh localhost

cfamily@localhost's password:

Last login: Sat May 30 12:22:24 2020 from localhost

[cfamily@cfamily ~]$ exit


Connection to localhost closed.

[cfamily@cfamily ~] $ exit


Connection to localhost closed.


--> To let all daemons communicate with each other without a password, we


Set Passwordless SSH: -      

$ssh-keygen -t rsa

$cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

$chmod og-wx ~/.ssh/authorized_keys


Then Verify: -

$ ssh localhost

Now it will not ask for any password.

II) Installation Steps

Step1: Download Latest Apache Hadoop

Step2: Install Apache Hadoop

Step3: Set HADOOP_HOME and PATH Variable

Step4: Verify that Hadoop is installed


Step1: Download Latest Apache Hadoop

Hadoop is available in three major versions:

Hadoop 1.x – outdated

Hadoop 2.x – still used in industry

Hadoop 3.x – used for new applications


$cd /opt

$sudo wget https://mirrors.estointernet.in/apache/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz


Step2: Install Apache Hadoop

a)      Extract tar file

b)      Remove tar file

c)      Rename Hadoop

$ ls

hadoop-3.3.0.tar.gz  java

cfamily@cfamily:/opt$ sudo tar -xvzf hadoop-3.3.0.tar.gz

$ ls

hadoop-3.3.0  hadoop-3.3.0.tar.gz  java

cfamily@cfamily:/opt$ sudo rm -rf *.gz

cfamily@cfamily:/opt$ ls

hadoop-3.3.0  java

cfamily@cfamily:/opt$ sudo mv hadoop-3.3.0 hadoop

cfamily@cfamily:/opt$ ls

hadoop  java


Step3: Set HADOOP_HOME and PATH Variable

a) Understanding Hadoop Installation

cfamily@cfamily:/opt$ cd hadoop/

cfamily@cfamily:/opt/hadoop$ pwd


cfamily@cfamily:/opt/hadoop$ ls

bin      lib             licenses-binary  NOTICE.txt  share

etc      libexec         LICENSE.txt      README.txt

include  LICENSE-binary  NOTICE-binary    sbin

cfamily@cfamily:/opt/hadoop$ cd bin

cfamily@cfamily:/opt/hadoop/bin$ ls

All user and admin commands (hadoop, hdfs, mapred, yarn, etc.)

cfamily@cfamily:/opt/hadoop/bin$ cd ..

cfamily@cfamily:/opt/hadoop$ cd sbin/

cfamily@cfamily:/opt/hadoop/sbin$ ls

All daemon start/stop scripts (start-all.sh, stop-all.sh, etc.)

cfamily@cfamily:/opt/hadoop/sbin$ cd ..

cfamily@cfamily:/opt/hadoop$ cd etc/

cfamily@cfamily:/opt/hadoop/etc$ ls


cfamily@cfamily:/opt/hadoop/etc$ cd hadoop/

All configuration files


--> In Linux, every user has startup files such as .bashrc, .profile, etc.

--> These files execute automatically when the user logs in.

--> We are going to configure the HADOOP_HOME and PATH variables in the .bashrc file.

$nano ~/.bashrc


export HADOOP_HOME=/opt/hadoop

export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
reload .bashrc

$ source ~/.bashrc



Step4: Verify that Hadoop is installed

$ hadoop version

Hadoop 3.3.0


Note: - Now we can say Local/Standalone Mode is installed successfully.


III) Post-Installation steps

Step 1) In Post-Installation we will configure Hadoop

cfamily@cfamily:/opt/hadoop/etc/hadoop$ pwd


cfamily@cfamily:/opt/hadoop/etc/hadoop$ ls

All configuration files are available here.


We will configure 4 important files. They are:

1.      core-site.xml

2.      hdfs-site.xml

3.      mapred-site.xml

4.      yarn-site.xml



core-site.xml: -

--> We will specify the "filesystem type"; the default is the local filesystem.

--> We will specify the "namenode" host and its RPC port.

$sudo nano core-site.xml
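As a sketch, for a pseudo-distributed setup core-site.xml typically points the default filesystem at HDFS served by a NameNode on this machine (the host name "localhost" and RPC port 9000 here are common choices, not values from this guide; adjust them for your setup):

```xml
<configuration>
  <!-- Default filesystem: HDFS served by the NameNode on this host -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```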








hdfs-site.xml: -

--> In this configuration file we will provide/configure:

    block size -- the default block size was 64 MB until Hadoop 1.x; it is 128 MB from Hadoop 2.x onwards

            replication-factor -- the default replication factor is "3"


$sudo nano hdfs-site.xml
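Since this single node runs every daemon, a minimal hdfs-site.xml would drop the replication factor from the default of 3 to 1 (a sketch using the standard Hadoop property name):

```xml
<configuration>
  <!-- Only one DataNode exists, so keep a single replica per block -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```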







Why am I setting the replication factor to "1"?

Because we have only one machine.



mapred-site.xml: -

--> Until Hadoop 1.x -- MapReduce Architecture 1:

    JobTracker and TaskTracker

--> From Hadoop 2.x -- MapReduce Architecture 2 -- YARN Architecture:

    ResourceManager and NodeManager


$sudo nano mapred-site.xml
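A minimal sketch of mapred-site.xml for this setup tells MapReduce to run on YARN (the standard framework property for Hadoop 2.x/3.x):

```xml
<configuration>
  <!-- Run MapReduce jobs on YARN (ResourceManager/NodeManager) -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```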








yarn-site.xml: -

--> We will specify the YARN configuration.


$sudo nano yarn-site.xml
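As a sketch, yarn-site.xml usually enables the shuffle auxiliary service on the NodeManager so MapReduce jobs can move map outputs to reducers (the standard property name for this):

```xml
<configuration>
  <!-- Auxiliary service that serves map outputs to reducers -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```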



















Step 2: Format NameNode

Why are we formatting?

--> Once the configuration is complete, we need to format the NameNode so that it creates fresh metadata with the latest configuration.


Note: - If you format the NameNode, you lose the old metadata. So, take a backup of the metadata before formatting an existing cluster.



$hdfs namenode -format


Accessing Hadoop

--> In the previous session, we successfully installed Apache Hadoop on a Single

    Node -- Pseudo-Distributed Mode.

1. Starting and Stopping Hadoop Daemons

2. Enable required ports

3. Understanding NameNode Web UI


Starting and Stopping Hadoop Daemons: -

--> The bin directory contains commands such as:

     hadoop, hdfs, mapred, yarn, etc.

--> The sbin directory contains daemon start/stop scripts such as:

            start-all.sh, stop-all.sh, start-dfs.sh, stop-dfs.sh, start-yarn.sh,

            stop-yarn.sh, etc.

These provide Hadoop commands for both Hadoop admins and normal users.


Define the daemon users:

$nano ~/.bashrc

export HDFS_NAMENODE_USER="cfamily"

export HDFS_DATANODE_USER="cfamily"

export HDFS_SECONDARYNAMENODE_USER="cfamily"

export YARN_RESOURCEMANAGER_USER="cfamily"

export YARN_NODEMANAGER_USER="cfamily"


restart .bashrc file:-


$ source ~/.bashrc


Note: Configure JAVA_HOME in $HADOOP_HOME/etc/hadoop/hadoop-env.sh
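A minimal sketch of that change, assuming Java was extracted to /opt/java as in this guide's earlier installation step:

```shell
# Line to add in $HADOOP_HOME/etc/hadoop/hadoop-env.sh so the
# Hadoop scripts can find the JDK (path from the Java install step above)
export JAVA_HOME=/opt/java
```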


Start all daemons at once: -

$ start-all.sh

To check whether the daemons have started: -

$ jps

3986 ResourceManager

4457 Jps

3707 SecondaryNameNode

3532 DataNode

4092 NodeManager

3437 NameNode


To stop all daemons at once: -

$ stop-all.sh


Understanding NameNode Web UI

--> start all hadoop daemons


--> open any web browser and use url:

 http://ipaddress:9870  --> hadoop 3.x

 http://ipaddress:50070 --> hadoop 2.x


Note: - Linux is secure by default; we can either stop the firewall or enable just the required ports.


Enable required ports: -

$ sudo ufw allow 9870


NameNode Web UI: -

        |-- browse the filesystem

        / -- root is the main filesystem