Integrating LVM with Hadoop and providing Elasticity to DataNode Storage

Pre-requisites:

❗❗ Basic knowledge of Logical Volume Management (LVM)

Description:

Here, I have used two RHEL8 systems: one configured as the master node (NameNode) and the other as a DataNode.

Step 1 : Attaching hard disks to the DataNode

Add two hard disks to the system you have configured as the DataNode, then verify that they are detected:

fdisk -l
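If the disks are attached correctly, they appear as new block devices; this walkthrough assumes they show up as /dev/sdb and /dev/sdc, which you can confirm with:

lsblk /dev/sdb /dev/sdc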

Step 2 : Creation of physical volumes

A physical volume is any physical storage device, such as a Hard Disk Drive (HDD), Solid State Drive (SSD), or partition.

pvcreate /dev/sdb
pvcreate /dev/sdc

You can use the pvdisplay command to view the details of the physical volumes created.
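For example, to inspect just the two volumes created above:

pvdisplay /dev/sdb /dev/sdc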

Step 3: Creation of Volume Group

When multiple physical volumes are combined into a single storage pool, the result is called a Volume Group.

vgcreate <vg_name> /dev/sdb /dev/sdc

You can use the vgdisplay command to view the details of the volume groups created.
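For example, using the volume-group name that the rest of this walkthrough assumes (hadoop_vg):

vgcreate hadoop_vg /dev/sdb /dev/sdc
vgdisplay hadoop_vg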

Step 4: Creation of Logical Volume

Now, from the volume group created in the previous step, we can create a logical volume of the required size.

lvcreate --size <size> --name <lv_name> <vg_name>
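For example, to carve out a 20G volume named hadoop_lv1 (the size is illustrative; the names match the device path used in the next step):

lvcreate --size 20G --name hadoop_lv1 hadoop_vg
lvdisplay /dev/hadoop_vg/hadoop_lv1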

Step 5 : Formatting the logical volume

The next step is to format the logical volume; formatting creates the filesystem (including its inode table) so the volume can store files.

mkfs.ext4 /dev/hadoop_vg/hadoop_lv1

Step 6: Creating a mount point and mounting the logical volume

Create a directory to serve as the mount point, then mount the logical volume on it:

mkdir /data_dir
mount /dev/hadoop_vg/hadoop_lv1 /data_dir
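You can confirm the mount with df; the new filesystem should appear mounted on /data_dir:

df -h /data_dir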

Step 7 : Configuration of DataNode

Navigate to the /etc/hadoop directory and edit the two configuration files listed below (a sketch of both follows the list):

cd /etc/hadoop
hdfs-site.xml
core-site.xml
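A minimal sketch of the two files on the DataNode, assuming Hadoop 1.x-style property names (consistent with the hadoop-daemon.sh commands used later) and a hypothetical NameNode address of 192.168.1.10:9001; replace the address with your NameNode's IP and port:

hdfs-site.xml

<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/data_dir</value>
  </property>
</configuration>

core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.1.10:9001</value>
  </property>
</configuration>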

Step 8 : Starting the DataNode service

Now start the DataNode service:

hadoop-daemon.sh start datanode
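You can verify that the DataNode process is running with jps (part of the JDK):

jps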

Step 9 : Configuration of NameNode

Meanwhile, on the NameNode, edit the same two configuration files (a sketch of both follows the list); the services are started in the next steps.

hdfs-site.xml
core-site.xml
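A minimal sketch of the NameNode-side files, again assuming Hadoop 1.x property names; /nn_dir is a hypothetical directory for the NameNode metadata, and the port (9001) is an assumption that must match the address in the DataNode's core-site.xml:

hdfs-site.xml

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/nn_dir</value>
  </property>
</configuration>

core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9001</value>
  </property>
</configuration>

Create the metadata directory (mkdir /nn_dir) before formatting the NameNode in the next step.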

Step 10 : Formatting the NameNode

Format the NameNode using the following command:

hadoop namenode -format

Step 11 : Starting the NameNode service

Finally, start the NameNode service and check the cluster report; the configured capacity shown should roughly match the size of the logical volume mounted on the DataNode:

hadoop-daemon.sh start namenode
hadoop dfsadmin -report 

Increase/Decrease DataNode storage dynamically

To increase the logical volume size on the fly, extend the LV and then grow the filesystem to fill it (with no size argument, resize2fs grows the ext4 filesystem to the full size of the device):

lvextend --size +10G /dev/hadoop_vg/hadoop_lv1
resize2fs /dev/hadoop_vg/hadoop_lv1

Decreasing the size is more delicate: the ext4 filesystem must be unmounted and shrunk with resize2fs before the logical volume is reduced with lvreduce, otherwise data will be lost. A sketch of the full sequence is shown below.
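A minimal sketch of a safe shrink, assuming the illustrative sizes used above (20G created, extended by 10G to 30G, now being reduced by 5G to 25G); adjust the numbers to your own volume. ext4 cannot be shrunk while mounted, so stop the DataNode service and unmount first:

hadoop-daemon.sh stop datanode
umount /data_dir
e2fsck -f /dev/hadoop_vg/hadoop_lv1
resize2fs /dev/hadoop_vg/hadoop_lv1 25G
lvreduce --size -5G /dev/hadoop_vg/hadoop_lv1
mount /dev/hadoop_vg/hadoop_lv1 /data_dir
hadoop-daemon.sh start datanode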