Configuring Hadoop and starting cluster services using an Ansible playbook

🔰The Apache Hadoop🔰

The Apache Hadoop software is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.

It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

🔰The Hadoop architecture:🔰

Let’s see how to write an Ansible playbook to configure Hadoop and start the cluster services:

This is the Ansible inventory file, where we define one group of hosts as datanodes and another group as masternode.

Here I have used two systems as datanodes and one system as the masternode, and listed their IP addresses, usernames, and passwords in the inventory file.

Inventory file
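
For reference, an inventory along these lines could look as follows. This is only a sketch in Ansible's YAML inventory format: the file name, the addresses (taken from the 192.0.2.0/24 documentation range), the root user, and the placeholder password are all assumptions, not the values from my setup. Password-based SSH logins also need sshpass installed on the control node.

```yaml
# inventory.yml (hypothetical file name and values)
all:
  children:
    masternode:
      hosts:
        192.0.2.10:
          ansible_user: root
          ansible_password: changeme   # placeholder, replace with the real password
    datanodes:
      hosts:
        192.0.2.11:
          ansible_user: root
          ansible_password: changeme   # placeholder
        192.0.2.12:
          ansible_user: root
          ansible_password: changeme   # placeholder
```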

Ansible configuration file: in ansible.cfg, specify the path to the inventory file.

ansible.cfg file

Check the connectivity to all the nodes using the ping module:

ansible all -m ping

✔ The copy module can be used to copy the Hadoop and Java software from the local system to all the nodes.

Install the Hadoop and Java software
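
A minimal sketch of the copy and install steps is shown below. It assumes RPM-based nodes and two package files sitting next to the playbook; the file names jdk.rpm and hadoop.rpm and the variables jdk_rpm and hadoop_rpm are hypothetical placeholders, so substitute the packages you actually downloaded.

```yaml
- name: Copy and install Java and Hadoop
  hosts: all
  vars:
    # Hypothetical package file names
    jdk_rpm: jdk.rpm
    hadoop_rpm: hadoop.rpm
  tasks:
    - name: Copy the packages from the control node to /root on each node
      ansible.builtin.copy:
        src: "{{ item }}"
        dest: /root/
      loop:
        - "{{ jdk_rpm }}"
        - "{{ hadoop_rpm }}"

    # Java goes first because Hadoop needs a JVM to run.
    # --force lets the task be re-run when the package is already installed.
    - name: Install the JDK
      ansible.builtin.command: "rpm -i --force /root/{{ jdk_rpm }}"

    - name: Install Hadoop
      ansible.builtin.command: "rpm -i --force /root/{{ hadoop_rpm }}"
```

Because the command module cannot tell whether rpm changed anything, these install tasks report "changed" on every run.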

The file module can be used to create the directory that Hadoop will use for storage.
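
As a small sketch, the directory step could look like this on the datanodes. The variable datanode_dir and the path /dn are assumptions chosen for illustration.

```yaml
- name: Create the Hadoop storage directory
  hosts: datanodes
  vars:
    datanode_dir: /dn   # hypothetical path
  tasks:
    - name: Ensure the directory exists
      ansible.builtin.file:
        path: "{{ datanode_dir }}"
        state: directory
        mode: "0755"
```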

The blockinfile module inserts, updates, or removes a block of multi-line text surrounded by customizable marker lines.

We use the blockinfile module to write the Hadoop configuration files, i.e., hdfs-site.xml and core-site.xml, on the namenode and the datanodes.
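
Here is a hedged sketch of the namenode side. It assumes a Hadoop 1.x layout with the configuration files under /etc/hadoop and each file already containing a `<configuration>` line; the paths, the address, and the port are hypothetical values. The property names dfs.name.dir and fs.default.name are the Hadoop 1.x ones (newer releases call them dfs.namenode.name.dir and fs.defaultFS). The marker is changed to an XML comment so that the marker lines do not end up as stray text inside the XML file.

```yaml
- name: Configure HDFS on the namenode
  hosts: masternode
  vars:
    # Hypothetical values
    hadoop_conf_dir: /etc/hadoop
    namenode_dir: /nn
    namenode_ip: 192.0.2.10
    namenode_port: 9001
  tasks:
    - name: Set the metadata directory in hdfs-site.xml
      ansible.builtin.blockinfile:
        path: "{{ hadoop_conf_dir }}/hdfs-site.xml"
        insertafter: "<configuration>"
        marker: "<!-- {mark} ANSIBLE MANAGED BLOCK -->"
        block: |
          <property>
            <name>dfs.name.dir</name>
            <value>{{ namenode_dir }}</value>
          </property>

    - name: Set the namenode address in core-site.xml
      ansible.builtin.blockinfile:
        path: "{{ hadoop_conf_dir }}/core-site.xml"
        insertafter: "<configuration>"
        marker: "<!-- {mark} ANSIBLE MANAGED BLOCK -->"
        block: |
          <property>
            <name>fs.default.name</name>
            <value>hdfs://{{ namenode_ip }}:{{ namenode_port }}</value>
          </property>
```

On the datanodes the same two tasks apply, with dfs.data.dir and the datanode directory in place of dfs.name.dir, and the same core-site.xml entry pointing at the namenode.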

On the namenode, use the command module to format the namenode.
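
One way to sketch the format step is shown below. The path /nn is hypothetical and must match the directory configured in hdfs-site.xml. Two assumptions are baked in: that Hadoop 1.x asks for a "Y" confirmation when the directory already exists (answered here through the command module's stdin parameter), and that a formatted directory contains current/VERSION, which the creates parameter uses to skip the task on later runs.

```yaml
- name: Format the namenode
  hosts: masternode
  vars:
    namenode_dir: /nn   # hypothetical; must match dfs.name.dir
  tasks:
    - name: Format the HDFS metadata unless it has been formatted already
      ansible.builtin.command:
        cmd: hadoop namenode -format
        stdin: "Y"                                      # answers the re-format prompt
        creates: "{{ namenode_dir }}/current/VERSION"   # skip if this file exists
```

Skipping the task on later runs matters because re-formatting wipes the namenode metadata.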

Finally, we can use the shell module to start the NameNode service on the namenode and the DataNode service on all the datanodes.
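
A minimal sketch of the start-up plays, assuming the hadoop-daemon.sh script that ships with Hadoop is on the PATH of the remote user:

```yaml
- name: Start the NameNode service
  hosts: masternode
  tasks:
    - name: Start the namenode daemon
      ansible.builtin.shell: hadoop-daemon.sh start namenode

- name: Start the DataNode service
  hosts: datanodes
  tasks:
    - name: Start the datanode daemon
      ansible.builtin.shell: hadoop-daemon.sh start datanode
```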

namenode.yml playbook:

datanode.yml playbook:

This is the variable file, which lists all the variables used in the playbooks.
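
As an illustration only, a variable file for such playbooks might look like the following. Every name and value here is a hypothetical placeholder rather than the content of my actual file.

```yaml
# vars.yml (hypothetical file name, variable names, and values)
jdk_rpm: jdk.rpm
hadoop_rpm: hadoop.rpm
hadoop_conf_dir: /etc/hadoop
namenode_dir: /nn
datanode_dir: /dn
namenode_ip: 192.0.2.10
namenode_port: 9001

# A play loads this file with:
#   vars_files:
#     - vars.yml
```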

!! Now, let’s run the playbook !!

By running the jps command on each node, you can verify that the NameNode and DataNode services are running.
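
The same check can be run from the control node. The sketch below assumes that jps (part of the JDK) is on the PATH of every node; the variable name jps_out is arbitrary.

```yaml
- name: Verify the Hadoop daemons
  hosts: all
  tasks:
    - name: List the running Java processes
      ansible.builtin.command: jps
      register: jps_out
      changed_when: false   # read-only check, never reports a change

    - name: Show the jps output for each node
      ansible.builtin.debug:
        var: jps_out.stdout_lines
```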

We have successfully created the cluster. The following command prints a report on the cluster, including its capacity and the connected datanodes:

hadoop dfsadmin -report