Integrating LVM with Hadoop and providing Elasticity to DataNode Storage
Pre-requisites:
❗❗ Basic knowledge of Logical Volume Management (LVM)
❗❗ Basic knowledge about Hadoop
Description:
Here, I have used 2 RHEL8 systems and configured one system as the MasterNode (NameNode) and the other as a DataNode.
Let me describe how you can provide elasticity to the DataNode storage, i.e., increase or decrease the storage capacity on the fly, by using the concept of Logical Volume Management (LVM)
Step 1: Attaching hard disks to the DataNode
Add 2 hard disks to the system you have configured as a DataNode
You can use the following command to view the hard disks you have attached:
fdisk -l
Here, I have attached two hard disks /dev/sdb of size 10 GiB and /dev/sdc of size 20 GiB
Step 2: Creation of physical volumes
A physical volume is a physical storage device, such as a Hard Disk Drive (HDD), Solid State Drive (SSD), or a partition, that has been initialized for use by LVM.
You can use the following commands to create the physical volumes:
pvcreate /dev/sdb
pvcreate /dev/sdc
You can use the pvdisplay command to view the details of physical volumes created
Step 3: Creation of Volume Group
When we combine multiple physical volumes into a single pool of storage, the resulting structure is called a Volume Group. You can create one with:
vgcreate <vg_name> /dev/sdb /dev/sdc
Here, I have created a volume group named hadoop_vg (this name appears in the device paths used later):
vgcreate hadoop_vg /dev/sdb /dev/sdc
You can use the vgdisplay command to view the details of the volume groups created
Step 4: Creation of Logical Volume
Now, from the volume group created in the previous step, we can create a logical volume of the required size
lvcreate --size <size> --name <lv_name> <vg_name>
Here, I have created a logical volume named hadoop_lv1 of size 5 GiB:
lvcreate --size 5G --name hadoop_lv1 hadoop_vg
Step 5: Formatting the logical volume
The next step is to format the logical volume with a file system. This step is necessary in order to create the inode table
mkfs.ext4 /dev/hadoop_vg/hadoop_lv1
Step 6: Creating a directory and mounting the logical volume
Create a directory
mkdir /data_dir
Mount the logical volume created in the previous step onto the directory created
mount /dev/hadoop_vg/hadoop_lv1 /data_dir
You can verify the mount with the df -h command
Step 7 : Configuration of DataNode
Navigate to the folder /etc/hadoop
cd /etc/hadoop
Edit the two configuration files in the DataNode: hdfs-site.xml and core-site.xml
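The exact properties depend on your Hadoop version; as a minimal sketch for a Hadoop 1.x-style setup (the NameNode IP and port shown are placeholders you must replace with your own), the two DataNode files might look like:

```xml
<!-- hdfs-site.xml: point the DataNode at the LVM-backed directory -->
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/data_dir</value>
  </property>
</configuration>

<!-- core-site.xml: tell the DataNode where the NameNode is -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode_ip:9001</value>
  </property>
</configuration>
```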
Step 8: Starting the DataNode service
Now, finally, start the DataNode service
hadoop-daemon.sh start datanode
Step 9: Configuration of the NameNode
Meanwhile, on the NameNode, edit the corresponding configuration files: hdfs-site.xml and core-site.xml
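Again as a hedged sketch for a Hadoop 1.x-style setup (the metadata directory /nn_dir and the port are illustrative placeholders, not values from this walkthrough):

```xml
<!-- hdfs-site.xml: local directory for the NameNode's metadata -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/nn_dir</value>
  </property>
</configuration>

<!-- core-site.xml: the address DataNodes and clients connect to -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9001</value>
  </property>
</configuration>
```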
Step 10: Formatting the NameNode
Format the NameNode using the following command
hadoop namenode -format
Step 11: Starting the NameNode service
Finally, start the NameNode service
hadoop-daemon.sh start namenode
You can use the following command to check the connected DataNodes and their storage details
hadoop dfsadmin -report
You can see that the total storage capacity is 5 GiB, the size of hadoop_lv1
✨Increase/Decrease DataNode storage dynamically✨
To increase the logical volume size on the fly:
We can use the lvextend command to increase the size of the DataNode storage on the fly, while it remains mounted
lvextend --size +10G /dev/hadoop_vg/hadoop_lv1
Use the resize2fs command to grow the file system to the new size of the volume, online and without losing the data already stored
resize2fs /dev/hadoop_vg/hadoop_lv1
When the size argument is omitted, resize2fs grows the file system to fill the underlying logical volume.
By running the hadoop dfsadmin -report command, you can verify that the total DataNode storage size has been increased to 15 GiB
Similarly, we can also reduce the DataNode storage capacity using the lvreduce command. Note, however, that unlike extending, an ext4 file system cannot be shrunk while it is mounted: running lvreduce alone will corrupt the file system unless the file system has been shrunk first
lvreduce --size -5G /dev/hadoop_vg/hadoop_lv1
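A safer shrink sequence, sketched below under the assumption that the DataNode can be taken offline briefly, shrinks the file system before the logical volume (continuing the example: from 15 GiB down to 10 GiB):

```shell
# Stop the DataNode and unmount: ext4 cannot be shrunk online
hadoop-daemon.sh stop datanode
umount /data_dir

# resize2fs refuses to shrink until the file system has been checked
e2fsck -f /dev/hadoop_vg/hadoop_lv1

# Shrink the file system first, then the logical volume, in that order
# (alternatively, `lvreduce -r` performs both steps together)
resize2fs /dev/hadoop_vg/hadoop_lv1 10G
lvreduce --size -5G /dev/hadoop_vg/hadoop_lv1

# Remount and restart the DataNode
mount /dev/hadoop_vg/hadoop_lv1 /data_dir
hadoop-daemon.sh start datanode
```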
That’s all!