When I first built this server, I was focused on getting the OS up and running and never got around to configuring hardware RAID 1. Now that the system has been in production for a few months, moving the root disk into a hardware RAID set is no longer an option, since doing so would destroy the data on the disk. The only way left to make it redundant is to set up software RAID.
Here are the steps I followed to protect the server by setting up RAID 1 (mirroring) on the boot and root disks.
- Gather the partition information from your main disk /dev/sda.
NOTE: partition 1 is /boot and partition 2 is the LVM physical volume holding / (root).
Code: Select all
/root%parted /dev/sda u s p
Model: ATA Hitachi HDS72101 (scsi)
Disk /dev/sda: 1953525168s
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:

Number  Start     End          Size         Type     File system  Flags
 1      2048s     2099199s     2097152s     primary  xfs          boot
 2      2099200s  1953523711s  1951424512s  primary               lvm
Also, check the current partitioning on /dev/sdb. Note: both disks have the same number of sectors - 1953525168s.
Code: Select all
/root%parted /dev/sdb u s p
Model: ATA ST1000DM003-1CH1 (scsi)
Disk /dev/sdb: 1953525168s
Sector size (logical/physical): 512B/4096B
Partition Table: msdos
Disk Flags:

Number  Start   End          Size         Type     File system  Flags
 1      32130s  1953503999s  1953471870s  primary               boot
- Using the start and end sectors, reproduce the partitioning scheme from the previous command on the new unused disk.
I found that I had to remove the existing partition on /dev/sdb before I could recreate the layout properly.
Code: Select all
/root%parted /dev/sdb mklabel msdos   --> partition table is already msdos, this step can be skipped
/root%parted /dev/sdb mkpart primary 2048s 2099199s
Warning: You requested a partition from 1049kB to 1075MB (sectors 2048..2099199).
The closest location we can manage is 1049kB to 16.5MB (sectors 2048..32129).
Is this still acceptable to you?
Yes/No? No
So I removed the existing partition first, using interactive parted:
Code: Select all
/root%parted /dev/sdb
GNU Parted 3.1
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) help
  align-check TYPE N                        check partition N for TYPE(min|opt) alignment
  help [COMMAND]                            print general help, or help on COMMAND
  mklabel,mktable LABEL-TYPE                create a new disklabel (partition table)
  mkpart PART-TYPE [FS-TYPE] START END      make a partition
  name NUMBER NAME                          name partition NUMBER as NAME
  print [devices|free|list,all|NUMBER]      display the partition table, available devices, free space, all found partitions, or a particular partition
  quit                                      exit program
  rescue START END                          rescue a lost partition near START and END
  rm NUMBER                                 delete partition NUMBER
  select DEVICE                             choose the device to edit
  disk_set FLAG STATE                       change the FLAG on selected device
  disk_toggle [FLAG]                        toggle the state of FLAG on selected device
  set NUMBER FLAG STATE                     change the FLAG on partition NUMBER
  toggle [NUMBER [FLAG]]                    toggle the state of FLAG on partition NUMBER
  unit UNIT                                 set the default unit to UNIT
  version                                   display the version number and copyright information of GNU Parted
(parted) rm 1
(parted) u s p
Model: ATA ST1000DM003-1CH1 (scsi)
Disk /dev/sdb: 1953525168s
Sector size (logical/physical): 512B/4096B
Partition Table: msdos
Disk Flags:

Number  Start  End  Size  Type  File system  Flags

(parted) q
Information: You may need to update /etc/fstab.
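For reference, the same partition removal can also be done non-interactively; a one-line sketch (assuming partition 1 is the one to drop):
Code: Select all
# -s (script mode) runs the command without prompting
parted -s /dev/sdb rm 1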
After that, the mkpart command ran successfully. A quick check:
Code: Select all
/root%parted /dev/sdb mkpart primary 2048s 2099199s
Information: You may need to update /etc/fstab.
Looking good. Status check:
Code: Select all
root%parted /dev/sdb u s p
Model: ATA ST1000DM003-1CH1 (scsi)
Disk /dev/sdb: 1953525168s
Sector size (logical/physical): 512B/4096B
Partition Table: msdos
Disk Flags:

Number  Start  End       Size      Type     File system  Flags
 1      2048s  2099199s  2097152s  primary
Move on to the next partition:
Code: Select all
# parted /dev/sdb mkpart primary 2099200s 1953523711s
Status check:
Code: Select all
/root%parted /dev/sdb u s p
Model: ATA ST1000DM003-1CH1 (scsi)
Disk /dev/sdb: 1953525168s
Sector size (logical/physical): 512B/4096B
Partition Table: msdos
Disk Flags:

Number  Start     End          Size         Type     File system  Flags
 1      2048s     2099199s     2097152s     primary
 2      2099200s  1953523711s  1951424512s  primary
- Add the RAID flag on all partitions that will be mirrored.
Code: Select all
/root%parted /dev/sda set 1 raid on
Information: You may need to update /etc/fstab.
/root%parted /dev/sda set 2 raid on
Information: You may need to update /etc/fstab.
/root%parted /dev/sdb set 1 raid on
Information: You may need to update /etc/fstab.
/root%parted /dev/sdb set 2 raid on
Information: You may need to update /etc/fstab.
- Create a degraded RAID device on the first partition of the new disk. This will be used for your boot partition (/boot).
NOTE: Use the --metadata=1.0 option for the array that holds /boot; with this format the RAID superblock is stored at the end of the device, so the bootloader can still read the filesystem. With the default metadata, the bootloader cannot.
NOTE: --level=1 means it is creating a RAID 1 (mirror) array.
Code: Select all
/root%mdadm --create /dev/md0 --level=1 --raid-disks=2 missing /dev/sdb1 --metadata=1.0
mdadm: array /dev/md0 started.
- Create the same filesystem as on /dev/sda1 on the new degraded RAID array /dev/md0. XFS is the default filesystem in Red Hat Enterprise Linux 7.
Code: Select all
/root%mkfs.xfs /dev/md0
meta-data=/dev/md0               isize=512    agcount=4, agsize=65532 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=0, sparse=0
data     =                       bsize=4096   blocks=262128, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=1605, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
- Mount the new RAID array and copy over the files from /boot.
Code: Select all
/root%mkdir /mnt/md0
/root%mount /dev/md0 /mnt/md0
/root%rsync -a /boot/ /mnt/md0/
/root%sync
/root%umount /mnt/md0
/root%rmdir /mnt/md0
- Unmount the current /boot, and mount the new RAID volume there.
Code: Select all
/root%umount /boot
/root%mount /dev/md0 /boot
- Add the old partition (/dev/sda1) to the new array to complete the mirror.
Code: Select all
/root%mdadm /dev/md0 -a /dev/sda1
mdadm: added /dev/sda1
- Monitor the RAID status and wait for the recovery to complete.
Code: Select all
/root%mdadm -D /dev/md0
/dev/md0:
        Version : 1.0
  Creation Time : Wed Aug 30 15:30:37 2017
     Raid Level : raid1
     Array Size : 1048512 (1023.94 MiB 1073.68 MB)
  Used Dev Size : 1048512 (1023.94 MiB 1073.68 MB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Wed Aug 30 15:34:15 2017
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : hsiao.net:0  (local to host hsiao.net)
           UUID : a15cbc22:c27202f8:31c08617:3def65df
         Events : 33

    Number   Major   Minor   RaidDevice State
       2       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
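If you prefer to watch the resync progress without rerunning mdadm -D, /proc/mdstat can also be polled; a small sketch:
Code: Select all
# Refresh the kernel's RAID status every 5 seconds; Ctrl-C to stop.
watch -n 5 cat /proc/mdstat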
- Find UUID of /dev/md0 using blkid.
Code: Select all
/root%blkid |grep md0
/dev/md0: UUID="6767ec6d-5437-45ed-a04b-208aef0c4c55" TYPE="xfs"
- Update /etc/fstab with the new UUID for /boot. (Comment out the old line and add the new UUID line.)
Code: Select all
/root%grep boot /etc/fstab
#UUID=c46975ae-27fa-4785-b024-4ceec84f9f61 /boot    xfs    defaults    0 0
UUID=6767ec6d-5437-45ed-a04b-208aef0c4c55 /boot    xfs    defaults    0 0
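An optional quick check that the new fstab entry resolves to the RAID device before rebooting (assuming nothing is holding /boot open):
Code: Select all
# Remount /boot purely from the fstab entry, then confirm the source device.
umount /boot
mount /boot
findmnt /boot    # SOURCE should show /dev/md0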
- Create a degraded RAID device on the second partition of the new disk. This will be used for your LVM partition (/).
NOTE: With metadata 1.2, the usable size dropped from 930.51 GiB (238209 extents) to 930.39 GiB (238178 extents), so 'pvmove /dev/sda2 /dev/md1' failed with "Insufficient free space".
Code: Select all
/root%mdadm --create /dev/md1 --level=1 --raid-disks=2 missing /dev/sdb2 --metadata=1.2
mdadm: array /dev/md1 started.
Instead, I had to use metadata 1.0 to get md1 to the correct size.
NOTE 1: --level=1 means it is creating a RAID 1 (mirror) array.
Code: Select all
/root%mdadm --create /dev/md1 --level=1 --raid-disks=2 missing /dev/sdb2 --metadata=1.0
mdadm: /dev/sdb2 appears to be part of a raid array:
    level=raid1 devices=2 ctime=Wed Aug 30 15:38:05 2017
Continue creating array? y
mdadm: array /dev/md1 started.
NOTE 2: The default metadata version for --create is 1.2. In my case, 1.0 was needed.
- Add this new array to your LVM stack by extending your existing volume group with it.
Code: Select all
/root%vgextend cl /dev/md1
  Physical volume "/dev/md1" successfully created.
  Volume group "cl" successfully extended
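Before starting the pvmove, it may be worth a quick sanity check that the new PV is at least as large (in extents) as the old one; a sketch:
Code: Select all
# Compare the size and physical extent count of the old and new PVs.
pvs -o pv_name,pv_size,pv_pe_count /dev/sda2 /dev/md1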
- Move the physical extents from the old partition to the new array.
For reference, this is how pvmove failed when md1 had been created with metadata 1.2:
Code: Select all
/root%pvmove /dev/sda2 /dev/md1
  Insufficient free space: 238209 extents needed, but only 238178 available
  Unable to allocate mirror extents for pvmove0.
  Failed to convert pvmove LV to mirrored
After recreating md1 with metadata 1.0, pvmove ran successfully. This will take some time to complete. It started around 13:15 on 08/30/2017 and progressed at roughly 0.01% per second or slower, so the whole pvmove took about 3 hours.
Code: Select all
/root%pvmove /dev/sda2 /dev/md1
  /dev/sda2: Moved: 0.00%
  /dev/sda2: Moved: 0.15%
  /dev/sda2: Moved: 0.30%
  /dev/sda2: Moved: 0.42%
  /dev/sda2: Moved: 0.56%
  /dev/sda2: Moved: 0.71%
  /dev/sda2: Moved: 0.86%
  /dev/sda2: Moved: 1.01%
  /dev/sda2: Moved: 1.16%
  /dev/sda2: Moved: 1.31%
  /dev/sda2: Moved: 1.46%
  /dev/sda2: Moved: 1.61%
  .....
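If you want to follow progress from another terminal, the temporary pvmove0 LV reports a copy percentage in lvs, and once pvmove finishes it is worth confirming that no extents remain allocated on the old PV before running vgreduce. A sketch (extra checks, not part of my original run):
Code: Select all
# Watch pvmove progress (the internal pvmove0 LV shows Cpy%Sync).
lvs -a -o lv_name,copy_percent,devices cl

# After pvmove completes: the allocated PE count on /dev/sda2 should be 0.
pvs -o pv_name,pv_pe_count,pv_pe_alloc_count /dev/sda2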
- Remove the old partition from the volume group and LVM stack.
Code: Select all
/root%vgreduce cl /dev/sda2
  Removed "/dev/sda2" from volume group "cl"
/root%pvremove /dev/sda2
  Labels on physical volume "/dev/sda2" successfully wiped.
- At the time of writing there are a number of known issues with the lvmetad cache in RHEL 7, so it is better to disable it.
Modify use_lvmetad in /etc/lvm/lvm.conf as shown below, then stop the lvm2-lvmetad service.
Code: Select all
vi /etc/lvm/lvm.conf
# Changed from 1 to 0 to disable lvmetad cache - CAH 08/30/2017
#use_lvmetad = 1
use_lvmetad = 0
Code: Select all
/boot%systemctl stop lvm2-lvmetad.service
Warning: Stopping lvm2-lvmetad.service, but it can still be activated by:
  lvm2-lvmetad.socket
/boot%systemctl disable lvm2-lvmetad.service
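Since the warning notes that the service can still be re-activated by its socket, it may also make sense to stop and disable the socket unit (an extra step beyond what I originally ran):
Code: Select all
systemctl stop lvm2-lvmetad.socket
systemctl disable lvm2-lvmetad.socket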
- Add the old partition to the degraded array to complete the mirror.
Check the /dev/md1 status first:
Code: Select all
/root%mdadm -D /dev/md1
/dev/md1:
        Version : 1.0
  Creation Time : Wed Aug 30 16:13:10 2017
     Raid Level : raid1
     Array Size : 975712064 (930.51 GiB 999.13 GB)
  Used Dev Size : 975712064 (930.51 GiB 999.13 GB)
   Raid Devices : 2
  Total Devices : 1
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Wed Aug 30 19:19:48 2017
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           Name : hsiao.net:1  (local to host hsiao.net)
           UUID : 73c0656c:37edea8c:1fd11ddb:d2683a2e
         Events : 114

    Number   Major   Minor   RaidDevice State
       -       0        0        0      removed
       1       8       18        1      active sync   /dev/sdb2
Then add /dev/sda2 to the array:
Code: Select all
/root%mdadm /dev/md1 -a /dev/sda2
mdadm: added /dev/sda2
- Monitor the RAID status and wait for the recovery to complete.
The rebuild will take some time, again roughly 2 to 3 hours.
Code: Select all
/root%mdadm -D /dev/md1
/dev/md1:
        Version : 1.0
  Creation Time : Wed Aug 30 16:13:10 2017
     Raid Level : raid1
     Array Size : 975712064 (930.51 GiB 999.13 GB)
  Used Dev Size : 975712064 (930.51 GiB 999.13 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Wed Aug 30 19:20:04 2017
          State : clean, degraded, recovering
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1

 Rebuild Status : 0% complete

           Name : hsiao.net:1  (local to host hsiao.net)
           UUID : 73c0656c:37edea8c:1fd11ddb:d2683a2e
         Events : 123

    Number   Major   Minor   RaidDevice State
       2       8        2        0      spare rebuilding   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2
Here is the status after the spare rebuilding is done:
Code: Select all
/root%mdadm -D /dev/md1
/dev/md1:
        Version : 1.0
  Creation Time : Wed Aug 30 16:13:10 2017
     Raid Level : raid1
     Array Size : 975712064 (930.51 GiB 999.13 GB)
  Used Dev Size : 975712064 (930.51 GiB 999.13 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Thu Aug 31 00:15:42 2017
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : hsiao.net:1  (local to host hsiao.net)
           UUID : 73c0656c:37edea8c:1fd11ddb:d2683a2e
         Events : 4429

    Number   Major   Minor   RaidDevice State
       2       8        2        0      active sync   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2
- Scan the mdadm metadata and write the RAID information to /etc/mdadm.conf.
Code: Select all
/root%ls -l /etc/mdadm.conf
ls: cannot access /etc/mdadm.conf: No such file or directory
/root%mdadm --examine --scan >/etc/mdadm.conf
/root%ls -l /etc/mdadm.conf
-rw-r--r-- 1 root root 176 Aug 30 19:25 /etc/mdadm.conf
/root%cat /etc/mdadm.conf
ARRAY /dev/md/0  metadata=1.0 UUID=a15cbc22:c27202f8:31c08617:3def65df name=hsiao.net:0
ARRAY /dev/md/1  metadata=1.0 UUID=73c0656c:37edea8c:1fd11ddb:d2683a2e name=hsiao.net:1
- Update /etc/default/grub with the MD device UUIDs (this information is available once md0/md1 have been created).
Code: Select all
/boot%mdadm -D /dev/md* |grep UUID
           UUID : a15cbc22:c27202f8:31c08617:3def65df
           UUID : 73c0656c:37edea8c:1fd11ddb:d2683a2e
/boot%grep GRUB_CMDLINE_LINUX /etc/default/grub
#GRUB_CMDLINE_LINUX="ipv6.disable=1 crashkernel=auto rd.lvm.lv=cl/root rd.lvm.lv=cl/swap rhgb quiet"
GRUB_CMDLINE_LINUX="rd.md.uuid=a15cbc22:c27202f8:31c08617:3def65df rd.md.uuid=73c0656c:37edea8c:1fd11ddb:d2683a2e ipv6.disable=1 crashkernel=auto rd.lvm.lv=cl/root rd.lvm.lv=cl/swap rhgb quiet"
- Regenerate /boot/grub2/grub.cfg.
Code: Select all
/boot/grub2%cp -p grub.cfg grub.cfg.20170309
/boot/grub2%grub2-mkconfig -o /boot/grub2/grub.cfg
Generating grub configuration file ...
  WARNING: Not using lvmetad because config setting use_lvmetad=0.
  WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).
  WARNING: Not using lvmetad because config setting use_lvmetad=0.
  WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).
/usr/sbin/grub2-probe: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
Found linux image: /boot/vmlinuz-3.10.0-514.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-514.el7.x86_64.img
/usr/sbin/grub2-probe: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
/usr/sbin/grub2-probe: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
/usr/sbin/grub2-probe: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
Found linux image: /boot/vmlinuz-0-rescue-acac49b8c0944ac189fdf455a5e7d7c0
Found initrd image: /boot/initramfs-0-rescue-acac49b8c0944ac189fdf455a5e7d7c0.img
  WARNING: Not using lvmetad because config setting use_lvmetad=0.
  WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).
  WARNING: Not using lvmetad because config setting use_lvmetad=0.
  WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).
done
- Verify that both of your disks are listed in /boot/grub2/device.map. Add them if needed.
Code: Select all
/boot/grub2%cat /boot/grub2/device.map
# this device map was generated by anaconda
(hd0)      /dev/sda
/boot/grub2%cp -p device.map device.map.20170220
/boot/grub2%vi device.map
/boot/grub2%cat /boot/grub2/device.map
# this device map was generated by anaconda
(hd0)      /dev/sda
(hd1)      /dev/sdb
- Re-install GRUB on both disks.
NOTE: If the command below fails with an error like "cannot find /dev/md0 in /dev/sd* device", recreate the boot array with --metadata=0.9.
Code: Select all
/boot/grub2%grub2-install /dev/sda
Installing for i386-pc platform.
grub2-install: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
grub2-install: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
Installation finished. No error reported.
/boot/grub2%grub2-install /dev/sdb
Installing for i386-pc platform.
grub2-install: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
grub2-install: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
Installation finished. No error reported.
- Rebuild the initramfs image with the --mdadmconf option.
It is recommended that you make a backup copy of the initramfs in case the new version has an unexpected problem:
Code: Select all
/boot%cp -p initramfs-3.10.0-514.el7.x86_64.img initramfs-3.10.0-514.el7.x86_64.img.20170220
/boot%dracut -f --mdadmconf
-rw------- 1 root root 31407076 Aug 30 19:45 initramfs-3.10.0-514.el7.x86_64.img
-rw------- 1 root root 31446130 Feb 20 2017 initramfs-3.10.0-514.el7.x86_64.img.20170220
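To double-check that the rebuilt initramfs actually picked up the mdadm configuration, lsinitrd (part of dracut) can list its contents; an optional check:
Code: Select all
# The embedded etc/mdadm.conf should appear in the listing.
lsinitrd /boot/initramfs-3.10.0-514.el7.x86_64.img | grep -i mdadm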
- Reboot the machine to make sure everything is correctly utilizing the new software RAID devices.
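After the reboot, a few quick checks (optional; simply what I would look at) confirm that /boot is on md0 and the LVM volume group is sitting on md1:
Code: Select all
cat /proc/mdstat    # both md0 and md1 should be active with [UU]
findmnt /boot       # SOURCE should be /dev/md0
pvs                 # the only PV in volume group "cl" should be /dev/md1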
As a further test, one may pull the original /dev/sda out and boot up to see whether the system can still start from the degraded /dev/md0 (now running only on the second disk). It should. After pulling out the original /dev/sda, the status will show one member missing, and the remaining disk that was /dev/sdb is re-enumerated as /dev/sda (so /dev/sdb1 shows up as /dev/sda1):
The pulled disk will then need to be added back to the RAID arrays (/dev/md0 and /dev/md1). First check the degraded state:
Code: Select all
# mdadm -D /dev/md0
.....
/dev/md0:
        Version : 1.0
  Creation Time : Thu Aug 17 18:10:36 2017
     Raid Level : raid1
     Array Size : 511936 (499.94 MiB 524.22 MB)
  Used Dev Size : 511936 (499.94 MiB 524.22 MB)
   Raid Devices : 2
  Total Devices : 1
    Persistence : Superblock is persistent

    Update Time : Thu Aug 17 19:33:32 2017
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

Consistency Policy : unknown

           Name : test.lab.msp.redhat.com:0  (local to host test.lab.msp.redhat.com)
           UUID : 78046e00:70413e71:4b0009b7:4ee5e2f9
         Events : 50

    Number   Major   Minor   RaidDevice State
       -       0        0        0      removed
       1       8        1        1      active sync   /dev/sda1
.....
NOTE: Be careful: remember that sdb1 is for /boot while sdb2 is for /.
Code: Select all
# mdadm /dev/md0 -a /dev/sdb1
# mdadm /dev/md1 -a /dev/sdb2
Check the status of the raid rebuild:
Code: Select all
<snip>
[root@test ~]# mdadm -D /dev/md0
/dev/md0:
        Version : 1.0
  Creation Time : Thu Aug 17 18:10:36 2017
     Raid Level : raid1
     Array Size : 511936 (499.94 MiB 524.22 MB)
  Used Dev Size : 511936 (499.94 MiB 524.22 MB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Thu Aug 17 19:35:11 2017
          State : clean, degraded, resyncing (DELAYED)
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1

Consistency Policy : unknown

           Name : test.lab.msp.redhat.com:0  (local to host test.lab.msp.redhat.com)
           UUID : 78046e00:70413e71:4b0009b7:4ee5e2f9
         Events : 52

    Number   Major   Minor   RaidDevice State
       2       8       17        0      spare rebuilding   /dev/sdb1
       1       8        1        1      active sync   /dev/sda1
</snip>
When completed, it should look normal again:
Code: Select all
<snip>
/dev/md0:
        Version : 1.0
  Creation Time : Thu Aug 17 18:10:36 2017
     Raid Level : raid1
     Array Size : 511936 (499.94 MiB 524.22 MB)
  Used Dev Size : 511936 (499.94 MiB 524.22 MB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Thu Aug 17 19:36:39 2017
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

Consistency Policy : unknown

           Name : test.lab.msp.redhat.com:0  (local to host test.lab.msp.redhat.com)
           UUID : 78046e00:70413e71:4b0009b7:4ee5e2f9
         Events : 69

    Number   Major   Minor   RaidDevice State
       2       8       17        0      active sync   /dev/sdb1
       1       8        1        1      active sync   /dev/sda1
</snip>
For reference, to remove an existing RAID device, first deactivate it by running the following command as root:
Code: Select all
mdadm --stop <raid_device>
Then remove the device:
Code: Select all
mdadm --remove <raid_device>
Finally, zero the mdadm superblock on each of the component devices:
Code: Select all
mdadm --zero-superblock <component_device…>
For example, to remove a three-disk array /dev/md3:
Code: Select all
~]# mdadm --detail /dev/md3 | tail -n 4
Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
1 8 17 1 active sync /dev/sdb1
2 8 33 2 active sync /dev/sdc1
Code: Select all
~]# mdadm --stop /dev/md3
mdadm: stopped /dev/md3
Code: Select all
~]# mdadm --remove /dev/md3
Code: Select all
~]# mdadm --zero-superblock /dev/sda1 /dev/sdb1 /dev/sdc1
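If the removed array was referenced in /etc/mdadm.conf or /etc/fstab, those entries should be cleaned up as well so the system does not try to assemble or mount it at boot; a quick way to find them (an extra step, not part of the write-up above):
Code: Select all
# List any remaining references to the removed array.
grep md3 /etc/mdadm.conf /etc/fstab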