Integrating LVM with Hadoop and providing Elasticity to DataNode Storage

Vijaya Madhuri T A
5 min read · Mar 15, 2021

Prerequisites:

❗❗ Basic knowledge of Logical Volume Management (LVM)

❗❗ Basic knowledge about Hadoop

Description:

Here, I have used two RHEL 8 systems: one configured as the MasterNode (NameNode) and the other as a DataNode.

Let me describe how you can provide elasticity to DataNode storage, i.e., increase or decrease its capacity on the fly, using the concept of Logical Volume Management (LVM).

Step 1: Attaching hard disks to the DataNode

Add two hard disks to the system you have configured as the DataNode.

You can use the following command to view the disks you have attached:

fdisk -l

Here, I have attached two hard disks: /dev/sdb of size 10 GiB and /dev/sdc of size 20 GiB.

Step 2: Creation of physical volumes

A physical volume (PV) is a storage device, such as a hard disk drive (HDD) or solid-state drive (SSD), or a partition, that has been initialized for use by LVM.

You can use the following commands to create the physical volumes:

pvcreate /dev/sdb
pvcreate /dev/sdc

You can use the pvdisplay command to view the details of the physical volumes created.

Step 3: Creation of Volume Group

When we combine multiple physical volumes into a single pool of storage, the result is called a Volume Group (VG):

vgcreate <vg_name> /dev/sdb /dev/sdc
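For example, since the logical volume created later in this walkthrough lives at /dev/hadoop_vg/hadoop_lv1, the volume group here is evidently named hadoop_vg, so the concrete command would be:

vgcreate hadoop_vg /dev/sdb /dev/sdc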

You can use the vgdisplay command to view the details of the volume group created.

Step 4: Creation of Logical Volume

Now, from the volume group created in the previous step, we can create a logical volume of the required size:

lvcreate --size <size> --name <lv_name> <vg_name>

Here, I have created a logical volume named hadoop_lv1 of size 5 GiB.
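With the names used in this article (volume group hadoop_vg, logical volume hadoop_lv1), the concrete command is:

lvcreate --size 5G --name hadoop_lv1 hadoop_vg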

Step 5: Formatting the logical volume

The next step is to format the logical volume. Formatting creates the file system, including its inode table:

mkfs.ext4 /dev/hadoop_vg/hadoop_lv1

Step 6: Creating a mount point and mounting the volume

Create a directory to serve as the mount point:

mkdir /data_dir

Mount the logical volume created in the previous steps onto this directory:

mount /dev/hadoop_vg/hadoop_lv1 /data_dir
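You can verify the mount with df -h; the new file system should appear on /data_dir with close to 5 GiB of capacity (slightly less after file-system overhead):

df -h /data_dir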

Step 7: Configuring the DataNode

Navigate to the folder /etc/hadoop:

cd /etc/hadoop

Two configuration files need to be edited here: hdfs-site.xml, which tells the DataNode which directory to contribute as storage, and core-site.xml, which tells it where to reach the NameNode.
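The original post shows these files as screenshots. A minimal sketch of their contents, assuming Hadoop 1.x property names (which match the hadoop-daemon.sh commands used here); the NameNode IP and port are placeholders, so replace them with your own:

<!-- hdfs-site.xml on the DataNode: contribute /data_dir as DataNode storage -->
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/data_dir</value>
  </property>
</configuration>

<!-- core-site.xml on the DataNode: the NameNode's address and port -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://<namenode_ip>:9001</value>
  </property>
</configuration>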

Step 8: Starting the DataNode service

Now start the DataNode service:

hadoop-daemon.sh start datanode
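You can confirm that the daemon came up with the jps command, which lists the running Java processes and should show a DataNode entry:

jps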

Step 9: Configuring the NameNode

Meanwhile, on the NameNode, edit the same two configuration files and then start the service: hdfs-site.xml specifies the directory where the NameNode keeps its metadata, and core-site.xml specifies the address and port on which it listens.
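Again, the originals are screenshots. A minimal sketch, assuming Hadoop 1.x property names; the metadata directory /nn_dir and the port are illustrative placeholders, not taken from the original post:

<!-- hdfs-site.xml on the NameNode: where the NameNode stores its metadata -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/nn_dir</value>
  </property>
</configuration>

<!-- core-site.xml on the NameNode: listen on all interfaces -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9001</value>
  </property>
</configuration>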

Step 10: Formatting the NameNode

Format the NameNode using the following command:

hadoop namenode -format

Step 11: Starting the NameNode service

Finally, start the NameNode service:

hadoop-daemon.sh start namenode

You can use the following command to check the details of the DataNodes and the storage they report:

hadoop dfsadmin -report

You can see that the reported storage capacity is about 5 GiB, matching the size of the logical volume.

Increase/Decrease DataNode storage dynamically

To increase the storage on the fly, use the lvextend command to grow the logical volume by the desired amount:

lvextend --size +10G /dev/hadoop_vg/hadoop_lv1

Then use the resize2fs command to grow the file system over the newly extended volume; this resizes the file system in place, without losing the data already stored. When the size argument is omitted, resize2fs grows the file system to fill the logical volume:

resize2fs /dev/hadoop_vg/hadoop_lv1

By running the hadoop dfsadmin -report command again, you can verify that the total DataNode storage has increased to 15 GiB.
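As an aside, lvextend can also grow the file system in the same step through its --resizefs (-r) flag, which invokes the appropriate file-system resize after extending the volume:

lvextend --resizefs --size +10G /dev/hadoop_vg/hadoop_lv1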

Similarly, we can reduce the DataNode storage capacity using the lvreduce command. One caution, though: unlike growing, an ext4 file system cannot be shrunk while it is mounted, so running lvreduce alone on a live volume risks corrupting the data. Unmount the volume, shrink the file system first, and only then reduce the logical volume, as shown below.
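A sketch of the safe sequence for shrinking the volume by the same 5 GiB (from 15 GiB down to 10 GiB):

# unmount so the file system can be shrunk
umount /data_dir
# check the file system (required before shrinking)
e2fsck -f /dev/hadoop_vg/hadoop_lv1
# shrink the file system to the target size first
resize2fs /dev/hadoop_vg/hadoop_lv1 10G
# now it is safe to reduce the logical volume
lvreduce --size -5G /dev/hadoop_vg/hadoop_lv1
# remount the shrunken volume
mount /dev/hadoop_vg/hadoop_lv1 /data_dir

lvreduce also offers a --resizefs flag that performs the file-system steps for you.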

That’s all!!
