Scrubbing is an online operation (meaning you can keep performing other tasks while it runs) that reads the data block by block and performs a consistency check to make sure that the data on one mirrored disk in the array is the same as the data stored on the other. For a RAID 1 mirror it compares the two copies directly; if they differ, it records a mismatch, and a repair pass can then attempt to correct it.
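To make the idea concrete, here is a small illustration. It is not the md driver's code: it compares two image files standing in for the two halves of a mirror and counts how many fixed-size blocks differ. The function name count_mismatched_blocks and the 4096-byte default block size are assumptions made for this sketch.

```shell
#!/bin/sh
# Illustration only (hypothetical helper): count how many fixed-size blocks
# differ between two "mirror" image files, roughly what a RAID 1 check looks for.
# The 4096-byte default block size is an assumption for this sketch.
count_mismatched_blocks() {
    img_a=$1
    img_b=$2
    bs=${3:-4096}
    # cmp -l prints one line per differing byte: 1-based offset, then both byte values.
    # Map each offset to its block number and count the distinct blocks.
    cmp -l "$img_a" "$img_b" 2>/dev/null |
        awk -v bs="$bs" '{ blk[int(($1 - 1) / bs)] = 1 }
            END { n = 0; for (b in blk) n++; print n }'
}
```

For example, count_mismatched_blocks copy_a.img copy_b.img prints 0 when the two files are identical.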
The problem is that, for RAID 1 volumes, this operation is only performed during the initial RAID creation and is never repeated afterwards. This means your bits could be rotting away if you are not reading your data frequently and giving the disk a chance to perform corrections.
Sometimes on traditional RAID arrays this operation is called "patrol read".
For volumes other than RAID 1, you can schedule and run these scrubbing operations via Storage Manager > Volumes > Manage, but this option is grayed out for RAID 1.
However, there is a way to run these checks from the command-line interface:
First, find the volume you wish to scrub:
DS213> df -h
Filesystem                Size      Used Available Use% Mounted on
/dev/md0                  2.3G    513.8M      1.7G  22% /
/tmp                    249.7M    664.0K    249.1M   0% /tmp
/dev/vg1000/lv          912.5G    674.1G    238.3G  74% /volume1
/dev/sdq1                 1.8T      1.3T    488.4G  74% /volumeUSB3/usbshare
/dev/sdt1               916.9G    702.5G    214.3G  77% /volumeUSB2/usbshare

In my case I want to scrub /volume1, which is an LVM logical volume in the volume group vg1000.
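If you would rather not read the table by eye, the lookup can be scripted. This is a minimal sketch, assuming POSIX-format output from df -P (device in the first field, mount point in the last); device_for_mount is a hypothetical helper name, and mount points containing spaces are not handled.

```shell
#!/bin/sh
# Sketch (hypothetical helper): print the device that backs a mount point,
# reading POSIX-format "df -P" output on stdin.
device_for_mount() {
    mnt=$1
    # Skip the header line; the mount point is the last field, the device the first.
    awk -v m="$mnt" 'NR > 1 && $NF == m { print $1; exit }'
}
```

For example, df -P | device_for_mount /volume1 should print /dev/vg1000/lv on the system above.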
Next, find out which physical volume (and therefore which md array) is associated with this logical volume:
DS213> pvdisplay
  --- Physical volume ---
  PV Name               /dev/md2
  VG Name               vg1000
  PV Size               927.00 GB / not usable 1.75 MB
  Allocatable           yes (but full)
  PE Size (KByte)       4096
  Total PE              237311
  Free PE               0
  Allocated PE          237311
  PV UUID               1605qY-qT39-71lP-qWHk-3ww5-MqGJ-gYZ4SN

Since there is only one PV attached to vg1000, it is easy to determine that /dev/md2 is the RAID array I want to check.
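With several physical volumes, the same answer can be pulled out of the pvdisplay output. A minimal sketch, assuming the default pvdisplay layout in which each "--- Physical volume ---" section lists PV Name before VG Name; pvs_in_vg is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch (hypothetical helper): list the physical volumes that belong to a
# volume group, reading default pvdisplay output on stdin.
pvs_in_vg() {
    vg=$1
    # In each "--- Physical volume ---" section, "PV Name" comes before "VG Name".
    awk -v vg="$vg" '
        $1 == "PV" && $2 == "Name" { pv = $3 }
        $1 == "VG" && $2 == "Name" && $3 == vg { print pv }
    '
}
```

Here, pvdisplay | pvs_in_vg vg1000 should print /dev/md2. Where the pvs command is available, pvs -o pv_name,vg_name shows the same mapping more compactly.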
Now I can follow the steps outlined at http://boomkicker.wordpress.com/2013/02/14/scrub-synology-raid-disks/ to scrub this array.
Start the scrub:
echo check > /sys/block/md2/md/sync_action
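A check on a large array can take hours. The sketch below wraps the start command and waits until the pass has finished by polling sync_action until it reads idle again. The function names and the MD_SYSFS variable (defaulting to /sys/block, and present only so the functions can be pointed at a test directory) are hypothetical; run it as root.

```shell
#!/bin/sh
# Sketch with hypothetical names. MD_SYSFS defaults to /sys/block and exists
# only so these functions can be pointed at a test directory.
md_dir() {
    echo "${MD_SYSFS:-/sys/block}/$1/md"
}

md_start_check() {
    # Writing "check" asks the md driver to read and compare every block.
    echo check > "$(md_dir "$1")/sync_action"
}

md_wait_idle() {
    # sync_action shows the running pass (e.g. "check") and "idle" when done.
    while [ "$(cat "$(md_dir "$1")/sync_action")" != "idle" ]; do
        sleep "${2:-60}"
    done
}
```

For example, md_start_check md2 && md_wait_idle md2 300 polls every five minutes. While the check runs, cat /proc/mdstat shows its progress.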
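Once the check has finished (cat /sys/block/md2/md/sync_action prints idle again), see whether any data is mismatched, meaning one of the RAID 1 disks differs from the other. A value of 0 means no mismatches were found: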
cat /sys/block/md2/md/mismatch_cnt
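Here is a sketch that turns the raw counter into a one-line verdict. It assumes the check has already finished; the helper names and MD_SYSFS (default /sys/block) are hypothetical. According to the kernel's md documentation, the counter is in sectors and may overstate the number of actual errors, because md compares data in units larger than a single sector.

```shell
#!/bin/sh
# Sketch with hypothetical names; MD_SYSFS defaults to /sys/block.
md_mismatch_count() {
    cat "${MD_SYSFS:-/sys/block}/$1/md/mismatch_cnt"
}

md_report_mismatches() {
    n=$(md_mismatch_count "$1")
    if [ "$n" -eq 0 ]; then
        echo "$1: copies are consistent"
    else
        # The counter is in sectors, not bytes or filesystem blocks.
        echo "$1: $n mismatched sectors"
    fi
}
```

For example, md_report_mismatches md2 prints a single line for the array.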
If the count is not 0, tell the array to attempt to correct these errors:
echo repair > /sys/block/md2/md/sync_action
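After a repair pass, mismatch_cnt reports what was rewritten rather than what is still wrong, so one way to confirm the fix is to run a fresh check afterwards and expect 0. The sketch below runs both passes in sequence; md_sync, md_repair_and_verify and MD_SYSFS (default /sys/block) are hypothetical names, and the 60-second default polling interval is arbitrary. Keep in mind that on RAID 1 md has no way to tell which copy is correct, so a repair makes the copies agree rather than guaranteeing that the right data survives.

```shell
#!/bin/sh
# Sketch with hypothetical names; MD_SYSFS defaults to /sys/block.
md_sync() {
    dir="${MD_SYSFS:-/sys/block}/$1/md"
    echo "$2" > "$dir/sync_action"
    # Wait for the requested pass ("repair" or "check") to finish.
    while [ "$(cat "$dir/sync_action")" != "idle" ]; do
        sleep "${3:-60}"
    done
}

md_repair_and_verify() {
    md_sync "$1" repair "$2"
    # A fresh check after the repair should leave mismatch_cnt at 0.
    md_sync "$1" check "$2"
    cat "${MD_SYSFS:-/sys/block}/$1/md/mismatch_cnt"
}
```

For example, md_repair_and_verify md2 300 prints the mismatch count from the verification check.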
If the errors do not go away, or cannot be corrected by this final command, you may want to consider replacing your disks. You can check the read-error counter of each member disk with:
cat /sys/block/md2/md/rd?/errors
Here rd? matches the entries rd0, rd1 and so on, one per member disk, where the number is that disk's slot in the array.
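To see which physical disk each counter belongs to, you can follow the rdN symlinks, which point at the member's dev-XXX directory (for example dev-sda5). A minimal sketch; md_member_errors and MD_SYSFS (default /sys/block) are hypothetical names.

```shell
#!/bin/sh
# Sketch with hypothetical names; MD_SYSFS defaults to /sys/block.
# Prints one line per member disk: slot entry, device directory, error count.
md_member_errors() {
    for rd in "${MD_SYSFS:-/sys/block}/$1"/md/rd*; do
        # Skip the unexpanded pattern when the array has no rdN entries.
        [ -e "$rd/errors" ] || continue
        # rdN is a symlink to the member's directory, e.g. dev-sda5.
        dev=$(basename "$(readlink "$rd")")
        echo "$(basename "$rd") $dev $(cat "$rd/errors")"
    done
}
```

For example, md_member_errors md2 lists both halves of the mirror with their error counts.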
You can also perform an offline fsck using the instructions on the following website: http://www.cyberciti.biz/faq/synology-complete-fsck-file-system-check-command/
RAID 1 does not employ parity; therefore, data scrubbing (parity checking) does not make sense for RAID 1 volumes.
For example, in RAID 5, say a drive fails, so you replace the bad drive and start a rebuild. During the rebuild, the RAID controller reads the corresponding blocks from each of the remaining hard drives in order to regenerate the data block that was stored on the failed drive. If during this process it hits a URE (Unrecoverable Read Error, meaning that a block of data cannot be read because the underlying sector has failed for some reason), the RAID controller assumes that not one but two drives have failed, and the rebuild stops: all data is lost.
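The parity arithmetic behind this example can be sketched briefly. RAID 5 parity is the XOR of the data blocks in a stripe, so one missing block can be rebuilt by XOR-ing the surviving blocks with the parity; if a second block in the same stripe is unreadable, there is no longer enough information to do so. In this illustration single byte values stand in for whole blocks, and xor_all is a hypothetical helper name.

```shell
#!/bin/sh
# Illustration with a hypothetical helper: XOR a list of integers
# (decimal or 0x-prefixed hex), standing in for the blocks of one stripe.
xor_all() {
    acc=0
    for b in "$@"; do
        acc=$(( acc ^ $b ))
    done
    echo "$acc"
}
```

With data bytes 0x3C, 0xA5 and 0x0F, xor_all 0x3C 0xA5 0x0F gives the parity 150, and xor_all 0x3C 0x0F 150 rebuilds the lost middle byte as 165, which is 0xA5.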
With RAID 1, the RAID controller simply mirrors everything onto the second drive, including bad blocks, so in the worst-case scenario some data loss occurs.
Synology data scrubbing cannot detect bad data blocks, as it does not perform a disk check but only a parity check. Furthermore, on a RAID 1 system, data on a bad block cannot be regenerated, even if a disk-check tool identifies the bad block, and must be restored from a backup.
Currently, with the Btrfs file system, scrubbing a RAID 1 volume should make sense, since Btrfs keeps checksums of its data and metadata that a scrub can verify.
Thanks for the offline fsck link!