Learn Hadoop Administration

Apache Hadoop Pseudo-Distributed Installation on CentOS

Introduction

→ Local Mode and Pseudo-Distributed Mode can both be installed on a single machine.

→ An operating system must be installed. Hadoop supports Windows, Linux, and UNIX; this guide uses CentOS 7.

→ Java must be installed. Java 8 or a later version is recommended.

 
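A quick way to check both prerequisites before starting (commands assume CentOS 7; the release string and Java status on your machine will differ):

$ cat /etc/centos-release
$ java -version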

Steps to install Hadoop:-

There are mainly 3 phases. They are:

Pre Installation Steps

Step1: Install Java (Java 1.8, i.e. Java 8, is recommended)

Step2: Set up passwordless SSH.

 

Installation Steps

Step3: Install Hadoop

 

Post-Installation

Step 4: Configure Hadoop

 

Format NameNode

→ Formatting the NameNode applies our configuration changes.

 

After the Pre-Installation and Installation steps are done, the Local Mode (Standalone Mode) setup is complete.

 

Pre-Installation Steps

Step1: Install Java (Java 1.8, i.e. Java 8, is recommended)

Step2: Set up passwordless SSH.

 

Install Java:- (Java 1.8, i.e. Java 8, is recommended)

Step1:- Download Java

Step2:- Install Java

Step3:- Set JAVA_HOME and PATH Variable

Step4:- Verify that Java is installed

 

Download Java 8:-

$cd /opt

$ sudo wget <jdk_url> – generic form; the concrete command used here is:

$ sudo wget -c --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.rpm

 
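Note:- Oracle's JDK download links change over time and now generally require an Oracle login. If the link above no longer works, an OpenJDK 8 build from the CentOS repositories is one alternative (the installation path will then differ from the Oracle RPM path used in the next steps):

$ sudo yum install -y java-1.8.0-openjdk-devel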

Install Java:-

$ sudo yum install jdk-8u131-linux-x64.rpm -y

 

Set JAVA_HOME and PATH Variable

Installation path =  /usr/java/jdk1.8.0_131

JAVA_HOME=/usr/java/jdk1.8.0_131

 

The Java binaries (commands) are located in /usr/java/jdk1.8.0_131/bin

 

Why do we need to set JAVA_HOME? Applications such as Hadoop locate the Java installation through this variable.

JAVA_HOME=/usr/java/jdk1.8.0_131

 

Why do we need to set the PATH variable? Adding the bin directory to PATH lets us run the Java commands from any directory without typing the full path.

PATH=/usr/java/jdk1.8.0_131/bin

 

In Windows, we use Environment Variables to set JAVA_HOME and the PATH variable.

 

In Linux, we use user profile files like .bashrc, .profile, etc.

 

$nano ~/.bashrc

#JAVA_HOME

export JAVA_HOME=/usr/java/jdk1.8.0_131

export PATH=$PATH:$JAVA_HOME/bin

Ctrl+X, then Y → to save and exit the nano editor.

 

Reload the .bashrc file

$ source ~/.bashrc

 
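After reloading, the variables should be visible in the current shell (paths assume the Oracle RPM install above):

$ echo $JAVA_HOME
/usr/java/jdk1.8.0_131
$ which java – prints the path of the java command found via PATH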

Verify that Java is installed

$java -version

$java

$javac

 
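For the JDK 8u131 RPM installed above, java -version reports something like the following (exact build strings vary by release):

$ java -version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)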

Setup Passwordless SSH

SSH – Secure Shell

[cfamily@cfamily bin]$ ssh localhost

The authenticity of host 'localhost (::1)' can't be established.

ECDSA key fingerprint is SHA256:fDsvpLgsscfdNdB/fztpx4PS0NxsBZb9WlqX8jVhhcc.

ECDSA key fingerprint is MD5:1c:2b:98:42:ad:fa:4e:12:2a:10:0d:a2:5f:76:69:1a.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.

cfamily@localhost's password:

Last login: Thu Jul 22 15:51:35 2021 from 192.168.1.11

[cfamily@cfamily ~]$ exit

logout

 

By default, every SSH login asks for a password.

Why do we set up passwordless SSH? Hadoop's start/stop scripts log in over SSH to launch and stop the daemons; without passwordless SSH we would be prompted for a password for every daemon, every time.

 

$ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa     (-P '' creates the key with an empty passphrase, so there are no prompts)

$cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

$chmod og-wx ~/.ssh/authorized_keys

 

Verify that passwordless SSH works

$ ssh localhost – it should not ask for a password.

 
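A successful passwordless login looks like the following – note there is no password prompt (host names and timestamps are illustrative):

[cfamily@cfamily ~]$ ssh localhost
Last login: Thu Jul 22 15:55:40 2021 from ::1
[cfamily@cfamily ~]$ exit
logout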

Hadoop Installation

Steps to Install Hadoop 3.x:-

Step1: Download the latest Hadoop 3.x version

Step2: Install Hadoop 3.x

Step3: Set HADOOP_HOME and PATH variables

Step4: Verify that Hadoop is installed

 

Download the latest Hadoop 3.x version

https://mirrors.sonic.net/apache/hadoop/common/hadoop-3.3.1/

 

$cd /opt

$sudo wget https://mirrors.sonic.net/apache/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz

 
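Optionally, verify the download against the published SHA-512 checksum. The hash file is published alongside the release on the Apache site (older releases move to archive.apache.org, so the exact URL may need adjusting):

$ sha512sum hadoop-3.3.1.tar.gz

Compare the printed hash with the value in https://archive.apache.org/dist/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz.sha512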

Install Hadoop 3.x

a)     Extract the tar file

b)    Remove the tar file – optional

c)     Rename the Hadoop directory

d)    Change directory permissions.

$ sudo tar -xvzf hadoop-3.3.1.tar.gz

[cfamily@cfamily opt]$ ls

hadoop-3.3.1  hadoop-3.3.1.tar.gz

[cfamily@cfamily opt]$ sudo rm -rf hadoop-3.3.1.tar.gz

[cfamily@cfamily opt]$ ls

hadoop-3.3.1

[cfamily@cfamily opt]$ sudo mv hadoop-3.3.1 hadoop3

[cfamily@cfamily opt]$ ls

hadoop3

[cfamily@cfamily opt]$ sudo chmod -R 777 hadoop3

[cfamily@cfamily opt]$ ls -l

total 0

drwxrwxrwx. 10 cfamily cfamily 215 Jun 15 11:22 hadoop3

 

Set HADOOP_HOME and PATH variables

Why do we need to set HADOOP_HOME and the PATH variable?

[cfamily@cfamily opt]$ cd hadoop3/

[cfamily@cfamily hadoop3]$ ls

bin      lib             licenses-binary  NOTICE.txt  share

etc      libexec         LICENSE.txt      README.txt

include  LICENSE-binary  NOTICE-binary    sbin

[cfamily@cfamily hadoop3]$ cd bin/

[cfamily@cfamily bin]$ ls

container-executor  hdfs      mapred.cmd               yarn

hadoop              hdfs.cmd  oom-listener             yarn.cmd

hadoop.cmd          mapred    test-container-executor

[cfamily@cfamily bin]$ cd ..

[cfamily@cfamily hadoop3]$ cd sbin/

[cfamily@cfamily sbin]$ ls

distribute-exclude.sh    start-all.sh         stop-balancer.sh

FederationStateStore     start-balancer.sh    stop-dfs.cmd

hadoop-daemon.sh         start-dfs.cmd        stop-dfs.sh

hadoop-daemons.sh        start-dfs.sh         stop-secure-dns.sh

httpfs.sh                start-secure-dns.sh  stop-yarn.cmd

kms.sh                   start-yarn.cmd       stop-yarn.sh

mr-jobhistory-daemon.sh  start-yarn.sh        workers.sh

refresh-namenodes.sh     stop-all.cmd         yarn-daemon.sh

start-all.cmd            stop-all.sh          yarn-daemons.sh

 

To make the above binaries (bin) and scripts (sbin) available from anywhere in the system, we add both directories to the PATH variable.

 

In Windows, we use Environment Variables. In Linux, we use user profile files like .bashrc, .profile, etc.

 

$nano ~/.bashrc

#HADOOP_HOME

export HADOOP_HOME=/opt/hadoop3

export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Ctrl+X, then Y → to save and exit the nano editor.

 

Reload the .bashrc file

$ source ~/.bashrc

 
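After reloading, a quick check (paths assume /opt/hadoop3 as set above):

$ echo $HADOOP_HOME
/opt/hadoop3
$ which hdfs
/opt/hadoop3/bin/hdfs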

Verify that Hadoop is installed

$ hadoop version

Hadoop 3.3.1

 

With this, we can say Hadoop is successfully installed.

 

Local Mode or Standalone Mode = Pre-Installation Steps + Installation Steps.

 

 

Post Installation Steps:-

→ In Post-Installation we configure Hadoop.

→ All configuration files are present in $HADOOP_HOME/etc/hadoop

→ There are mainly 4 configuration files:

1.     core-site.xml

2.     hdfs-site.xml

3.     mapred-site.xml

4.     yarn-site.xml – introduced in Hadoop 2.x

[cfamily@cfamily ~]$ cd /opt/hadoop3/etc/hadoop/

[cfamily@cfamily hadoop]$ ls

 
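A trimmed listing – the directory contains more files than shown here (names from the Hadoop 3.3.1 distribution):

core-site.xml   hadoop-env.sh     mapred-site.xml  workers
hdfs-site.xml   log4j.properties  yarn-env.sh      yarn-site.xml  ...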

core-site.xml:-

→ we specify the default "filesystem type"; if not set, the default is the local filesystem.

→ we specify the "namenode" host and its RPC port.

$nano core-site.xml

<configuration>

        <property>

            <!-- fs.defaultFS is the current property name; fs.default.name is its deprecated alias -->

            <name>fs.defaultFS</name>

            <value>hdfs://localhost:9000</value>

        </property>

</configuration>

 
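Once saved, the value can be read back without starting any daemons:

$ hdfs getconf -confKey fs.defaultFS
hdfs://localhost:9000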

hdfs-site.xml:-

→ In this configuration file we provide/configure:

    block size -- the default was 64 MB until Hadoop 1.x and is 128 MB from Hadoop 2.x

    replication-factor -- the default replication factor is "3"

    etc.

$nano hdfs-site.xml

<configuration>

        <property>

            <!-- a single machine runs only one DataNode, so replication is set to 1 -->

            <name>dfs.replication</name>

            <value>1</value>

        </property>

</configuration>

 
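As with core-site.xml, the value can be verified immediately:

$ hdfs getconf -confKey dfs.replication
1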

mapred-site.xml

→ Until Hadoop 1.x we had JobTracker and TaskTracker,

→ so until Hadoop 1.x we used only mapred-site.xml.

→ This is also called the MapReduce 1 Architecture.

→ From Hadoop 2.x, the YARN Architecture (also called the MapReduce 2 Architecture) was introduced.

→ In it, ResourceManager replaces JobTracker and NodeManager replaces TaskTracker. So we additionally have yarn-site.xml.

→ In mapred-site.xml we specify which MapReduce architecture (framework) to use.

 

$nano mapred-site.xml

<configuration>

    <property>

            <name>mapreduce.framework.name</name>

            <value>yarn</value>

    </property>

</configuration>

 
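Note:- On Hadoop 3.x, MapReduce jobs may additionally need the MapReduce classpath configured in mapred-site.xml. The property below appears in the official single-node setup guide; verify the value against the documentation for your exact version:

    <property>
            <name>mapreduce.application.classpath</name>
            <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>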

yarn-site.xml

→ We provide information about the YARN daemons (ResourceManager and NodeManager)

$nano yarn-site.xml

<configuration>

    <property>

            <name>yarn.acl.enable</name>

            <value>0</value>

    </property>

    <property>

            <name>yarn.resourcemanager.hostname</name>

            <value>localhost</value>

    </property>

 

    <property>

            <name>yarn.nodemanager.aux-services</name>

            <value>mapreduce_shuffle</value>

    </property>

</configuration>

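Note:- The official Hadoop 3.x single-node guide also whitelists some environment variables for YARN containers in yarn-site.xml; worth adding if MapReduce jobs cannot find their environment (value copied from the 3.3.x documentation, verify for your version):

    <property>
            <name>yarn.nodemanager.env-whitelist</name>
            <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
    </property>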

Format NameNode:-

$hdfs namenode -format

 
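The format command prints a long log. On success it contains a line like the one below; the storage directory depends on hadoop.tmp.dir, which defaults to /tmp/hadoop-<user>:

INFO common.Storage: Storage directory /tmp/hadoop-cfamily/dfs/name has been successfully formatted.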

Note:- Once Post-Installation is completed, we can say the Pseudo-Distributed Mode setup is complete.