
2014-08-09

Synology RAID Scrubbing

By default, Synology does not allow scrubbing of RAID 1 volumes.

Scrubbing is an online operation, meaning you can keep using the volume while it runs. It reads the array block by block and checks that the data on one disk matches the copy stored on the other mirrored disk. For RAID 1 the Linux md driver does this by comparing the two copies directly rather than with a separate checksum. A mismatch is counted and, if you ask for a repair, corrected. Read errors hit along the way are fixed by rewriting the bad block from the good copy.

The problem is that this operation is only performed during the initial RAID creation and is never run again afterwards. That means your bits could be rotting away unnoticed if you do not read your data often enough for the disks to detect and correct errors.

On traditional hardware RAID controllers this operation is sometimes called "patrol read".
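To make the mirror comparison concrete, here is a toy shell sketch. It is not how md is implemented. It treats two equally sized files as hypothetical mirror images and counts the 512-byte sectors in which they differ. The function name count_mismatched_sectors is made up for illustration.

```sh
#!/bin/sh
# Toy illustration only: compare two equally sized files that stand in for
# the two halves of a mirror (hypothetical images, not real md members) and
# count how many 512-byte sectors differ between them.
count_mismatched_sectors() {
    # cmp -l prints one line per differing byte: the 1-based byte offset,
    # then the two byte values in octal. Errors (e.g. missing files) are hidden.
    # awk turns each byte offset into a sector index; offsets ascend, so uniq
    # collapses repeats within a sector; tr strips padding some wc builds add.
    cmp -l "$1" "$2" 2>/dev/null |
        awk '{ print int(($1 - 1) / 512) }' |
        uniq |
        wc -l |
        tr -d ' '
}
```

For example, `count_mismatched_sectors mirror-a.img mirror-b.img` (hypothetical file names). The kernel's own mismatch_cnt is also reported in sectors but is accumulated in coarser chunks, so do not expect the two numbers to line up exactly.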

For other RAID types you can schedule and run scrubbing from Storage Manager > Volumes > Manage, but this option is grayed out for RAID 1.

However, there is a way to run a scrub from the command-line interface:

First, find the volume you wish to scrub:
DS213> df -h     
Filesystem                Size      Used Available Use% Mounted on
/dev/md0                  2.3G    513.8M      1.7G  22% /
/tmp                    249.7M    664.0K    249.1M   0% /tmp
/dev/vg1000/lv          912.5G    674.1G    238.3G  74% /volume1
/dev/sdq1                 1.8T      1.3T    488.4G  74% /volumeUSB3/usbshare
/dev/sdt1               916.9G    702.5G    214.3G  77% /volumeUSB2/usbshare
In my case I want to scrub /volume1, which is an LVM logical volume in the volume group vg1000.
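If you would rather not read the df table by eye, a small helper can print the device behind a mount point. This is a sketch: backing_device is a hypothetical name, and it assumes your df accepts the POSIX -P flag, which the BusyBox build on a given DSM version may not.

```sh
#!/bin/sh
# Print the block device that backs a mount point, e.g. /volume1 -> /dev/vg1000/lv.
# Assumes df supports the POSIX -P flag, which keeps each filesystem on one line.
backing_device() {
    # Line 1 of df output is the header; line 2 describes the requested mount.
    df -P "$1" | awk 'NR == 2 { print $1 }'
}
```

On the box above, `backing_device /volume1` would be expected to print /dev/vg1000/lv.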

Next, find out which physical volume this logical volume lives on:
DS213> pvdisplay
  --- Physical volume ---
  PV Name               /dev/md2
  VG Name               vg1000
  PV Size               927.00 GB / not usable 1.75 MB
  Allocatable           yes (but full)
  PE Size (KByte)       4096
  Total PE              237311
  Free PE               0
  Allocated PE          237311
  PV UUID               1605qY-qT39-71lP-qWHk-3ww5-MqGJ-gYZ4SN

Since vg1000 contains only one physical volume (PV), it is easy to see that /dev/md2, the md software RAID device behind the volume, is the array I want to check.
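On a box with several PVs, the same lookup can be scripted by parsing the pvdisplay output shown above. A minimal sketch, assuming the "PV Name" / "VG Name" layout above; pvs_in_vg is a hypothetical helper name.

```sh
#!/bin/sh
# Read `pvdisplay` output on stdin and print every PV Name that belongs to
# the volume group given as $1. Assumes the "PV Name" line precedes the
# "VG Name" line within each "--- Physical volume ---" stanza, as above.
pvs_in_vg() {
    awk -v vg="$1" '
        # Remember the most recent PV Name seen.
        $1 == "PV" && $2 == "Name" { pv = $3 }
        # When the following VG Name matches, report that PV.
        $1 == "VG" && $2 == "Name" && $3 == vg { print pv }
    '
}
```

Usage: `pvdisplay | pvs_in_vg vg1000`. Where the LVM tools include pvs, `pvs -o pv_name,vg_name` gives the same mapping in a more compact form.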

Now I can follow the steps outlined at http://boomkicker.wordpress.com/2013/02/14/scrub-synology-raid-disks/ to scrub this array.

Start the scrub (these commands must be run as root):
echo check > /sys/block/md2/md/sync_action
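Starting a check while the array is already resyncing or rebuilding is not what you want, so a cautious wrapper can first confirm that sync_action reads idle. A sketch: start_check and the MD_SYSFS override (there only so it can be tried against a fake directory) are hypothetical.

```sh
#!/bin/sh
# Start a read-only "check" pass on an md array, but only if the array is idle
# (not already resyncing, recovering, checking or repairing).
# MD_SYSFS defaults to /sys/block; overriding it is only meant for testing.
start_check() {
    action_file="${MD_SYSFS:-/sys/block}/$1/md/sync_action"
    current=$(cat "$action_file") || return 1
    if [ "$current" != "idle" ]; then
        echo "$1 is busy: $current" >&2
        return 1
    fi
    echo check > "$action_file"
}
```

Usage: `start_check md2`.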

Once the check has finished (cat /proc/mdstat shows its progress), see whether any mismatches were found, i.e. places where one RAID 1 disk differs from the other:
cat /sys/block/md2/md/mismatch_cnt
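The count is only final once the pass is complete; while it runs, sync_action reads check instead of idle. The sketch below waits for that before printing the count. wait_and_report, MD_SYSFS and POLL_SECS (poll interval, default 60 seconds) are hypothetical names.

```sh
#!/bin/sh
# Wait until the current check/repair on an md array has finished
# (sync_action back to "idle"), then print its mismatch_cnt.
# MD_SYSFS (default /sys/block) and POLL_SECS (default 60) are hypothetical knobs.
wait_and_report() {
    dir="${MD_SYSFS:-/sys/block}/$1/md"
    if [ ! -r "$dir/sync_action" ]; then
        echo "no md array named $1" >&2
        return 1
    fi
    while [ "$(cat "$dir/sync_action")" != "idle" ]; do
        sleep "${POLL_SECS:-60}"
    done
    cat "$dir/mismatch_cnt"
}
```

Usage: `wait_and_report md2`.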

If mismatch_cnt is non-zero, ask the md driver to attempt to correct these errors:
echo repair > /sys/block/md2/md/sync_action
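On RAID 1 the md driver cannot tell which copy is correct; according to the kernel's md documentation, repair simply overwrites the other copies with one chosen copy. It is therefore worth running a fresh check afterwards (each pass starts its count from zero) to confirm that mismatch_cnt has dropped to 0. A hedged sketch of that sequence; repair_then_verify, MD_SYSFS and POLL_SECS are hypothetical names.

```sh
#!/bin/sh
# Run a "repair" pass, wait for it to finish, then run a fresh "check" and
# print the resulting mismatch_cnt (0 means the mirrors now agree).
# MD_SYSFS (default /sys/block) and POLL_SECS (default 60) are hypothetical knobs.
repair_then_verify() {
    dir="${MD_SYSFS:-/sys/block}/$1/md"
    if [ "$(cat "$dir/sync_action" 2>/dev/null)" != "idle" ]; then
        echo "$1 is missing or busy" >&2
        return 1
    fi
    for action in repair check; do
        echo "$action" > "$dir/sync_action"
        # Give md time to pick up the request, then poll until the pass is done.
        sleep "${POLL_SECS:-60}"
        while [ "$(cat "$dir/sync_action")" != "idle" ]; do
            sleep "${POLL_SECS:-60}"
        done
    done
    cat "$dir/mismatch_cnt"
}
```

Usage: `repair_then_verify md2`. A non-zero result after the second pass points at the kind of persistent problem described next.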

If the errors do not go away, or cannot be repaired by this final command, you may want to consider replacing your disks. The per-disk error counters can help show which disk is at fault:
cat /sys/block/md2/md/rd?/errors

Here rd? matches the entries rd0, rd1 and so on, where the number is the disk's slot (role) in the array.
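The glob above prints bare numbers without saying which disk each belongs to. Each rdN entry is a symlink to a dev-XXX directory (for example dev-sda5), and its errors file holds what the kernel's md documentation describes as an approximate count of read errors detected on that device that did not cause it to be ejected from the array. A sketch that prints slot, device and count on one line; member_errors and MD_SYSFS are hypothetical names.

```sh
#!/bin/sh
# For each member of an md array, print: slot (rdN), device name, errors count.
# Each rdN entry is a symlink to a dev-XXX directory that holds an "errors" file.
# MD_SYSFS defaults to /sys/block; overriding it is only meant for testing.
member_errors() {
    base="${MD_SYSFS:-/sys/block}/$1/md"
    for rd in "$base"/rd*; do
        # Skips the unexpanded pattern when there are no rd entries.
        [ -r "$rd/errors" ] || continue
        dev=$(basename "$(readlink "$rd")")
        printf '%s %s %s\n' "$(basename "$rd")" "${dev#dev-}" "$(cat "$rd/errors")"
    done
}
```

Usage: `member_errors md2` prints lines of the form `rd0 <device> <count>`.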

You can also perform an offline fsck using the instructions on the following website: http://www.cyberciti.biz/faq/synology-complete-fsck-file-system-check-command/


1 comment:

  1. RAID 1 does not employ parity and therefore data scrubbing (parity checking) does not make sense for RAID 1 volumes.
    For example, in RAID 5, let's say you have a drive failure so you replace the bad drive and start a rebuild. What is essentially happening during the rebuild is that the RAID controller reads corresponding blocks from each of the remaining hard drives in order to generate the data block that was stored on the failed hard drive. If during this process, it hits a URE (Unrecoverable Read Error - meaning that block of data cannot be read because the underlying sector failed for various reasons) then the RAID controller assumes that not 1 but 2 drives have failed and the rebuild stops - all data is lost.
    With RAID 1, the RAID controller simply mirrors everything onto the second drive, including bad blocks, so in the worst-case scenario some data loss occurs.
    Synology data scrubbing cannot detect bad data blocks, as it does not perform a disk check but only a parity check. Furthermore, on a RAID 1 system, data on a bad block, even if the bad block is identified by a disk check tool, cannot be regenerated and must be restored from a backup.
