Integrating LVM with Hadoop and providing Elasticity to DataNode Storage


❗❗ Basic knowledge of Logical Volume Management (LVM)

❗❗ Basic knowledge about Hadoop


Here, I have used two RHEL 8 systems: one configured as the Master/NameNode and the other as a DataNode.

Let me describe how you can provide elasticity to DataNode storage, i.e., increase or decrease the storage capacity on the fly using the concept of Logical Volume Management (LVM).

Step 1: Attaching hard disks to the DataNode

Add two hard disks to the system you have configured as a DataNode.

You can use the following command to view the hard disks you have attached.

Here, I have attached two hard disks: /dev/sdb of size 10 GiB and /dev/sdc of size 20 GiB.
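A minimal way to list the attached disks (the exact device names depend on your system):

```shell
# List all block devices with their sizes; the new disks
# should appear as /dev/sdb and /dev/sdc
lsblk

# Alternatively, print the partition tables of the new disks (requires root)
fdisk -l /dev/sdb /dev/sdc
```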

Step 2: Creation of physical volumes

A physical volume (PV) is a physical storage device, such as a hard disk drive (HDD), solid-state drive (SSD), or partition, that has been initialized for use by LVM.

You can use the following command to create the physical volumes:

You can use the pvdisplay command to view the details of the physical volumes created.
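A sketch of the commands, using the two disks attached earlier (both require root):

```shell
# Initialize both disks as LVM physical volumes
pvcreate /dev/sdb /dev/sdc

# Verify the physical volumes that were created
pvdisplay /dev/sdb /dev/sdc
```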

Step 3: Creation of Volume Group

When multiple physical volumes are combined into a single pool of storage, the result is called a Volume Group (VG).

You can use the vgdisplay command to view the details of volume groups created
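A sketch of the volume-group creation; the name hadoop_vg is my own example, not from the original setup:

```shell
# Combine the two physical volumes into one volume group
# (~30 GiB total in this setup)
vgcreate hadoop_vg /dev/sdb /dev/sdc

# Verify the volume group that was created
vgdisplay hadoop_vg
```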

Step 4: Creation of Logical Volume

Now, from the volume group created in the previous step, we can create a logical volume of the required size.

Here, I have created a logical volume named hadoop_lv1 of size 5 GiB.
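A sketch of the command, assuming the volume group is named hadoop_vg (my example name):

```shell
# Carve a 5 GiB logical volume named hadoop_lv1 out of the volume group
lvcreate --size 5G --name hadoop_lv1 hadoop_vg

# Verify the logical volume
lvdisplay hadoop_vg/hadoop_lv1
```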

Step 5: Formatting the logical volume

The next step is to format the logical volume with a filesystem. This step is necessary to create the filesystem metadata, including the inode table.
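For example, with ext4 (assuming the hadoop_vg/hadoop_lv1 names used above; requires root):

```shell
# Create an ext4 filesystem on the logical volume; this writes the
# filesystem metadata, including the inode table
mkfs.ext4 /dev/hadoop_vg/hadoop_lv1
```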

Step 6: Creating a directory and mounting the logical volume

Create a directory to serve as the mount point.

Mount the logical volume created in the previous step onto that directory.
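A sketch of both steps; the mount point /dn1 is my example path, not from the original:

```shell
# Create a mount point
mkdir /dn1

# Mount the logical volume onto it
mount /dev/hadoop_vg/hadoop_lv1 /dn1

# Confirm the mount and its size
df -h /dn1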

Step 7: Configuration of the DataNode

Navigate to the folder /etc/hadoop

This is the configuration file hdfs-site.xml on the DataNode:
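A sketch of a typical Hadoop 1.x hdfs-site.xml for the DataNode, pointing dfs.data.dir at the mounted directory (the path /dn1 is my example):

```xml
<configuration>
  <!-- Directory where the DataNode stores HDFS blocks;
       this is the mount point of the logical volume -->
  <property>
    <name>dfs.data.dir</name>
    <value>/dn1</value>
  </property>
</configuration>
```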

This is the configuration file core-site.xml on the DataNode:
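A sketch of a typical Hadoop 1.x core-site.xml for the DataNode; the IP address and port here are placeholders, not values from the original setup:

```xml
<configuration>
  <!-- Replace 192.168.1.100 with your NameNode's IP; port 9001 is an example -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.1.100:9001</value>
  </property>
</configuration>
```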

Step 8:

Now, finally, start the DataNode service.
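With the Hadoop 1.x daemon scripts, this looks like:

```shell
# Start the DataNode daemon
hadoop-daemon.sh start datanode

# Confirm the DataNode JVM is running
jps
```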

Step 9:

Meanwhile, on the NameNode, configure and start the services.

This is the configuration file hdfs-site.xml on the NameNode:
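A sketch of a typical Hadoop 1.x hdfs-site.xml for the NameNode; the metadata directory /nn is my example path:

```xml
<configuration>
  <!-- Directory where the NameNode stores the HDFS namespace metadata -->
  <property>
    <name>dfs.name.dir</name>
    <value>/nn</value>
  </property>
</configuration>
```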

This is the configuration file core-site.xml on the NameNode:
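A sketch of the matching core-site.xml on the NameNode; 0.0.0.0 makes it listen on all interfaces, and port 9001 is the same example port assumed above:

```xml
<configuration>
  <!-- Listen on all interfaces; port must match the DataNode's core-site.xml -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9001</value>
  </property>
</configuration>
```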

Step 10:

Format the NameNode using the following command:
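In Hadoop 1.x this is:

```shell
# Format the HDFS namespace -- run once, on the NameNode only
# (this erases any existing HDFS metadata)
hadoop namenode -format
```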

Step 11:

Finally, start the NameNode service.
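Again using the Hadoop 1.x daemon scripts:

```shell
# Start the NameNode daemon
hadoop-daemon.sh start namenode

# Confirm the NameNode JVM is running
jps
```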

You can use the following command to check the details of the DataNodes and the cluster's storage capacity.
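This is the report command the article relies on later as well:

```shell
# Report total cluster capacity and per-DataNode storage details
hadoop dfsadmin -report
```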

You can see that the total storage capacity is 5 GiB.

Increase/Decrease DataNode storage dynamically

To increase the logical volume size on the fly:

We can use the lvextend command to increase the size of the DataNode storage on the fly.

Use the resize2fs command to resize the file system in place, without unmounting it and without losing the data already stored.
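A sketch of both commands, assuming the hadoop_vg/hadoop_lv1 names from the earlier steps (requires root):

```shell
# Grow the logical volume by 10 GiB (5 GiB -> 15 GiB) while it is mounted
lvextend --size +10G /dev/hadoop_vg/hadoop_lv1

# Grow the ext4 filesystem online to fill the enlarged volume;
# existing data is untouched
resize2fs /dev/hadoop_vg/hadoop_lv1
```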

By running the hadoop dfsadmin -report command, you can verify that the total DataNode storage size has been increased to 15 GiB.

Similarly, we can also reduce the DataNode storage capacity using the lvreduce command. Note that, unlike growing, shrinking an ext4 filesystem cannot be done online: the volume must be unmounted and the filesystem shrunk before the logical volume is reduced.
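A sketch of a safe shrink sequence, again assuming the hadoop_vg/hadoop_lv1 and /dn1 names used throughout (all steps require root; stop the DataNode service first):

```shell
# Unmount the volume -- ext4 cannot be shrunk while mounted
umount /dn1

# Check the filesystem before resizing (resize2fs requires this)
e2fsck -f /dev/hadoop_vg/hadoop_lv1

# Shrink the filesystem first, then the logical volume, to the same size
resize2fs /dev/hadoop_vg/hadoop_lv1 10G
lvreduce --size 10G /dev/hadoop_vg/hadoop_lv1

# Remount and resume the DataNode service
mount /dev/hadoop_vg/hadoop_lv1 /dn1
```

Getting the order wrong (reducing the volume below the filesystem size) destroys data, which is why the filesystem is shrunk before lvreduce runs.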

That’s all !!