ZFS checksum error

Post by **cah** » Fri May 25, 2012 4:03 pm

When I checked the ZFS pool status today, I saw the following:

%zpool status
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
  scan: scrub repaired 47.5K in 1h3m with 0 errors on Sun May 20 04:03:55 2012
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c5d0s0  ONLINE       0     0     0
            c6d0s0  ONLINE       0     0     3

errors: No known data errors

I then checked http://www.sun.com/msg/ZFS-8000-9P for details. The scrub had already repaired the affected data and there are no known data errors, but it is hard to tell whether the disk is failing or the errors were transient.
So I cleared the errors first.
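Since 'zpool clear' resets the READ/WRITE/CKSUM counters, it may be worth saving which device had errors before clearing. Below is a minimal sketch that reads 'zpool status' output and prints each device with a non-zero counter. The function name zfs_error_devices is made up for illustration, and it assumes the NAME STATE READ WRITE CKSUM column layout shown above.

```shell
# Hypothetical helper (not part of ZFS): zfs_error_devices
# Reads 'zpool status' text on stdin and prints "NAME READ WRITE CKSUM"
# for every row of the device table where any counter is non-zero.
# Assumes the column layout: NAME STATE READ WRITE CKSUM.
zfs_error_devices() {
  awk '
    # The device table starts after the header line
    $1 == "NAME" && $3 == "READ" { in_table = 1; next }
    # A blank line ends the table (the "errors:" line follows it)
    in_table && NF == 0 { in_table = 0; next }
    # Device rows have at least 5 fields; compare counters as strings
    in_table && NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") {
      print $1, $3, $4, $5
    }
  '
}
```

For example, running /sbin/zpool status rpool | zfs_error_devices against the output above should print the single line c6d0s0 0 0 3, which could be appended to a file before running 'zpool clear'.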

%zpool clear rpool c6d0s0

That reset the CKSUM counter for c6d0s0 to 0:

%zpool status            
  pool: rpool
 state: ONLINE
  scan: scrub repaired 47.5K in 1h3m with 0 errors on Sun May 20 04:03:55 2012
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c5d0s0  ONLINE       0     0     0
            c6d0s0  ONLINE       0     0     0

errors: No known data errors

'zpool status -x', which only reports pools that have problems, also confirms it:

%zpool status -x         
all pools are healthy

If the checksum errors come back more often, I will replace that disk.

I also wrote a ZFS status-checking script that runs daily from cron. It emails me whenever 'zpool status -x' reports a pool that is not healthy, so that I know how often this happens.

Script:

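#!/bin/ksh
# Email the 'zpool status -x' report whenever any pool is not healthy.

ZPOOL=/sbin/zpool
MAILX=/usr/bin/mailx
RECIPIENT="[recipient]"   # placeholder: set to the address that should get alerts

status=$($ZPOOL status -x)
if [ "$status" != "all pools are healthy" ]
then
  # Put the report in the message body so the mail shows which pool/device is affected
  printf '%s\n' "$status" | $MAILX -s "Check ZFS Pool Status" "$RECIPIENT"
fi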
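An email per incident makes it awkward to count occurrences over several months. One option is to also keep a local log and count entries with grep. This is a minimal sketch only; the function name log_if_unhealthy and the log path are hypothetical, and it relies on the same "all pools are healthy" string the script above compares against.

```shell
# Hypothetical helper: log_if_unhealthy STATUS_TEXT LOGFILE
# If STATUS_TEXT is not the healthy message from 'zpool status -x',
# append a timestamped entry to LOGFILE and return 1; otherwise return 0.
log_if_unhealthy() {
  lif_status=$1
  lif_log=$2
  if [ "$lif_status" = "all pools are healthy" ]; then
    return 0
  fi
  {
    # Each entry starts with a "===" marker line so entries are easy to count
    echo "=== $(date '+%Y-%m-%d %H:%M:%S')"
    printf '%s\n' "$lif_status"
  } >> "$lif_log"
  return 1
}

# Example (paths are assumptions):
#   log_if_unhealthy "$(/sbin/zpool status -x)" /var/tmp/zfs_check.log
#   grep -c '^===' /var/tmp/zfs_check.log    # number of unhealthy checks so far
```

The return code lets the caller decide whether to send mail, so the logging and the email alert can share one 'zpool status -x' call.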

Crontab:

# ZFS status checking
0 4 * * * /export/home/cah/bin/script/zfs_check.sh
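For reference, the five schedule fields in a crontab entry are minute, hour, day of month, month and day of week, so this entry runs the check once a day at 04:00. An annotated copy of the same entry (the script also needs to be executable, e.g. via chmod +x):

```
# Field order: minute  hour  day-of-month  month  day-of-week  command
# "0 4 * * *" = minute 0 of hour 4, every day of every month -> daily at 04:00
# cron normally mails any stdout/stderr of the job to the crontab owner
0 4 * * * /export/home/cah/bin/script/zfs_check.sh
```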