Learn Cloudera Hadoop Administration

CCA 131 Perform OS-level configuration for Hadoop installation

Note: This post is part of the CCA Administrator Exam (CCA131) objectives series

In the last post we set up the local CDH repository. Before performing any installation, and even before setting up the repository, you must complete a few OS-level configuration tasks.


The OS-level configuration includes:

1.      Enabling NTP

2.      Configuring Network Names (hostnames/FQDNs)

3.      Disabling SELinux

4.      Disabling the Firewall


1. Enabling NTP

Step 1. To install NTP on CentOS/RHEL 7 systems:

# yum install ntp


Step 2. Edit the ntp configuration file /etc/ntp.conf and add the available NTP servers in your setup.

# vi /etc/ntp.conf





Step 3. Start the ntpd service and enable it to start automatically on boot.

# systemctl start ntpd

# systemctl enable ntpd
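
Before starting ntpd it can help to confirm which time sources the configuration file really lists. The following is a minimal sketch, not part of the ntp package: list_ntp_servers is a hypothetical helper name, and it assumes active entries begin with the keyword server or pool.

```shell
# Print the time sources configured in an ntp.conf-style file.
# list_ntp_servers is a hypothetical helper, not a command shipped with ntp.
list_ntp_servers() {
    local conf="${1:-/etc/ntp.conf}"
    # Keep only active "server" or "pool" lines and print the host field.
    # Commented-out entries such as "#server ..." do not match.
    awk '$1 == "server" || $1 == "pool" { print $2 }' "$conf"
}
```

Calling list_ntp_servers with no argument reads /etc/ntp.conf. Once ntpd is running, ntpq -p shows the peers it is polling.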


Note: In CentOS/RHEL 7, chronyd replaces ntpd as the default network time protocol daemon. ntpd is still available in the yum repository for those who need to run an NTP service. Only one of the two should manage the system clock at a time, so stop and disable chronyd if you run ntpd. I have covered the ntpd configuration here because it is part of the CCA 131 exam objectives.


2. Configuring Network Names (hostnames/FQDNs)

All the nodes in the CDH cluster must be able to communicate with each other, and each node's FQDN must resolve to its IP address.


Step 1. Add the IP address of each node of the CDH cluster and their respective FQDN in the file /etc/hosts:

# vi /etc/hosts

<IP address>   master.localdomain
<IP address>   node01.localdomain
<IP address>   node02.localdomain
<IP address>   node03.localdomain

Note: If you are using DNS, storing this information in /etc/hosts is not required, but it is good practice.


Step 2. Verify that you can get the FQDN of each node of the cluster using the command:

# hostname -f
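
To check the file itself rather than one node at a time, a small helper can report which FQDNs have no entry. This is only a sketch under stated assumptions: missing_hosts is a hypothetical name, and it looks at the hosts file only, so names that resolve through DNS alone would still be reported as missing.

```shell
# Print every name from the argument list that has no entry in a hosts-format file.
# missing_hosts is a hypothetical helper; the first argument is the file to check.
missing_hosts() {
    local hosts_file="$1" name
    shift
    for name in "$@"; do
        # Skip comment lines, then compare the name with each field after the IP
        # address, stopping at an inline comment.
        if ! awk -v n="$name" '
            $1 ~ /^#/ { next }
            { for (i = 2; i <= NF; i++) { if ($i ~ /^#/) break; if ($i == n) found = 1 } }
            END { exit !found }' "$hosts_file"; then
            echo "$name"
        fi
    done
}
```

For example, missing_hosts /etc/hosts master.localdomain node01.localdomain prints nothing when both names are present.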


3. Disabling SELinux

Step 1. To get the current status of SELinux:

# sestatus

SELinux status:                 enabled

SELinuxfs mount:                /sys/fs/selinux

SELinux root directory:         /etc/selinux

Loaded policy name:             targeted

Current mode:                   enforcing

Mode from config file:          enforcing

Policy MLS status:              enabled

Policy deny_unknown status:     allowed

Max kernel policy version:      31


# getenforce



In my case, SELinux is enabled and in enforcing mode. You can skip disabling SELinux if the mode is already “permissive”.


Step 2. Edit the /etc/selinux/config file and change the parameter “SELINUX=enforcing” to “SELINUX=permissive”. (/etc/sysconfig/selinux is a symbolic link to this file.)

# vi /etc/selinux/config



Step 3. You can either reboot the system or run the command below to switch to permissive mode immediately.

# setenforce 0


I usually disable SELinux completely using “SELINUX=disabled” and reboot the system after all the OS-level configuration is complete. Unlike the switch to permissive mode, this setting only takes effect after a reboot.


Step 4. Verify the status again:

# getenforce
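
When the same change has to be made on many nodes, editing the file by hand gets tedious. Here is a rough sketch of a scripted edit: set_selinux_mode is a hypothetical helper name, and it assumes the file contains a single active SELINUX= line. It writes back through the existing file instead of replacing it, so it also behaves sensibly if it is pointed at a symbolic link.

```shell
# Rewrite the SELINUX= line of an SELinux config file.
# set_selinux_mode is a hypothetical helper, not part of any SELinux package.
set_selinux_mode() {
    local conf="$1" mode="$2" tmp status
    case "$mode" in
        enforcing|permissive|disabled) ;;
        *) echo "invalid mode: $mode" >&2; return 1 ;;
    esac
    tmp=$(mktemp) || return 1
    # Only the active SELINUX= line matches; SELINUXTYPE= and comments are left alone.
    sed "s/^SELINUX=.*/SELINUX=$mode/" "$conf" > "$tmp" && cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return $status
}
```

A call such as set_selinux_mode /etc/selinux/config permissive changes the persistent setting only; the running mode is still switched with setenforce or a reboot.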



4. Disabling the Firewall

Last but not least, disable the firewall on the system. The default firewall in CentOS/RHEL 6 is iptables, whereas CentOS/RHEL 7 uses firewalld.

For CentOS/RHEL 6

# chkconfig iptables off

# service iptables stop


For CentOS/RHEL 7

# systemctl disable firewalld

# systemctl stop firewalld
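
If a cluster mixes both releases, a script can pick the right pair of commands from the release file. The sketch below only prints the commands rather than running them. release_major and firewall_disable_cmds are hypothetical helper names, and the parsing assumes the usual "... release N.N ..." wording of /etc/redhat-release.

```shell
# Extract the major version from a redhat-release style file.
# release_major is a hypothetical helper name.
release_major() {
    local f="${1:-/etc/redhat-release}"
    sed -n 's/.*release \([0-9][0-9]*\).*/\1/p' "$f"
}

# Print (do not run) the commands that stop and disable the default firewall.
# firewall_disable_cmds is a hypothetical helper name.
firewall_disable_cmds() {
    case "$1" in
        6) printf '%s\n' 'service iptables stop' 'chkconfig iptables off' ;;
        7) printf '%s\n' 'systemctl stop firewalld' 'systemctl disable firewalld' ;;
        *) echo "unsupported release: $1" >&2; return 1 ;;
    esac
}
```

Running firewall_disable_cmds "$(release_major)" on a node shows what would be executed there, which is a safe way to review the commands before running them.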


Other Recommended Settings

Along with the four OS-level configurations above, it is recommended to disable “transparent hugepage” and to set “vm.swappiness” to the value recommended by Cloudera.


Verify if THP is enabled

Transparent Huge Pages (THP) are enabled by default in RHEL 6 for all applications. The kernel attempts to allocate hugepages whenever possible and any Linux process will receive 2MB pages if the mmap region is 2MB naturally aligned.


To verify whether THP is enabled or disabled:

# cat /sys/kernel/mm/transparent_hugepage/enabled

[always] madvise never

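Note: The kernel boot parameter described below requires a reboot to take effect. THP can also be switched off on a running system by writing “never” to /sys/kernel/mm/transparent_hugepage/enabled, but that change does not survive a reboot.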
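
The active mode is the word in square brackets in the output above. For a scripted check across nodes, a minimal sketch could extract just that word. thp_mode is a hypothetical helper name, and the default path is the sysfs file used above.

```shell
# Print the active THP mode, i.e. the value shown in square brackets.
# thp_mode is a hypothetical helper name.
thp_mode() {
    local f="${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
    sed -n 's/.*\[\([a-z]*\)\].*/\1/p' "$f"
}
```

A check such as [ "$(thp_mode)" = "never" ] then succeeds only on nodes where THP is disabled.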


Disabling THP

Step 1. Add the “transparent_hugepage=never” kernel parameter to the GRUB_CMDLINE_LINUX option in the /etc/default/grub file. Append it if it is missing, or change the existing value.

# vi /etc/default/grub





GRUB_CMDLINE_LINUX="nomodeset crashkernel=auto rd.lvm.lv=vg_os/lv_root rd.lvm.lv=vg_os/lv_swap rhgb quiet transparent_hugepage=never"



Step 2. Rebuild the /boot/grub2/grub.cfg file by running the grub2-mkconfig -o command. Before rebuilding the GRUB2 configuration file, take a backup of the existing /boot/grub2/grub.cfg.

# grub2-mkconfig -o /boot/grub2/grub.cfg


Step 3. Reboot the system so that the new option takes effect.

# shutdown -r now


Step 4. Verify the parameter is set correctly

# cat /proc/cmdline

BOOT_IMAGE=/vmlinuz-3.10.0-514.10.2.el7.x86_64 root=/dev/mapper/vg_os-lv_root ro nomodeset crashkernel=auto rd.lvm.lv=vg_os/lv_root rd.lvm.lv=vg_os/lv_swap rhgb quiet transparent_hugepage=never LANG=en_US.UTF-8
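
The edit from Step 1 can also be scripted so that it is safe to run more than once. This is only a sketch: add_grub_param is a hypothetical helper name, and it assumes the GRUB_CMDLINE_LINUX value is on one line ending with a double quote and that the parameter contains no slash. The grub2-mkconfig step is still needed afterwards.

```shell
# Append a kernel parameter to GRUB_CMDLINE_LINUX unless it is already present.
# add_grub_param is a hypothetical helper name.
add_grub_param() {
    local conf="$1" param="$2" tmp status
    # Nothing to do if the parameter is already on the GRUB_CMDLINE_LINUX line.
    if grep -q "^GRUB_CMDLINE_LINUX=.*$param" "$conf"; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    # Insert the parameter just before the closing double quote of that line.
    sed "/^GRUB_CMDLINE_LINUX=/ s/\"\$/ $param\"/" "$conf" > "$tmp" && cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return $status
}
```

For example, add_grub_param /etc/default/grub transparent_hugepage=never leaves the file unchanged on a second run.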

For more information on disabling THP, refer to the posts below:


CentOS / RHEL 7 : How to disable Transparent Huge pages (THP)

CentOS / RHEL 6 : How to disable Transparent Huge pages (THP)


VM swappiness

Swappiness is a property for the Linux kernel that changes the balance between swapping out runtime memory, as opposed to dropping pages from the system page cache. Swappiness can be set to values between 0 and 100, inclusive. A low value means the kernel will try to avoid swapping as much as possible where a higher value instead will make the kernel aggressively try to use swap space.


Step 1. Cloudera recommends setting swappiness to 10 or lower. To view the value configured in the tuned virtual-guest profile:

# grep vm.swappiness /usr/lib/tuned/virtual-guest/tuned.conf

vm.swappiness = 30


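Step 2. Let’s set the value of vm.swappiness to 10. Replace only that line, because redirecting echo output into the file with “>” would overwrite the rest of the profile.

# sed -i 's/^vm.swappiness.*/vm.swappiness = 10/' /usr/lib/tuned/virtual-guest/tuned.conf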


Step 3. Verify the value of vm.swappiness again.

# grep vm.swappiness /usr/lib/tuned/virtual-guest/tuned.conf

vm.swappiness = 10
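
The grep above shows what the tuned profile will apply. The value the running kernel is using is exposed in /proc/sys/vm/swappiness. A minimal sketch for checking it against the recommended limit follows. swappiness_ok is a hypothetical helper name, and the default limit of 10 is taken from the recommendation in Step 1.

```shell
# Succeed when the swappiness value stored in a file is at or below a limit.
# swappiness_ok is a hypothetical helper name.
swappiness_ok() {
    local f="${1:-/proc/sys/vm/swappiness}" limit="${2:-10}" value
    value=$(cat "$f") || return 1
    [ "$value" -le "$limit" ]
}
```

With no arguments, swappiness_ok checks the running kernel value. The same number is printed by sysctl vm.swappiness.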