In the beginning, c6d0s0 was showing write errors. I ran 'zpool clear rpool' to clear them, and rpool was fine for a couple of days.
Code: Select all
pool: rpool
state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
repaired.
scan: scrub repaired 0 in 1h55m with 0 errors on Wed Sep 10 02:04:21 2014
config:
NAME STATE READ WRITE CKSUM
rpool DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
c5d0s0 ONLINE 0 0 0
c6d0s0 FAULTED 0 242 0 too many errors
errors: No known data errors
Code: Select all
pool: rpool
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-9P
scan: resilvered 207G in 1h39m with 0 errors on Wed Sep 10 15:05:52 2014
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
c5d0s0 ONLINE 0 488K 0
c6d0s0 ONLINE 0 0 0
errors: No known data errors
Code: Select all
pool: rpool
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-4J
scan: resilvered 4.54G in 7h58m with 0 errors on Sun Sep 14 21:28:27 2014
config:
NAME STATE READ WRITE CKSUM
rpool DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
c5d0s0 FAULTED 0 0 0 corrupted data
c6d0s0 ONLINE 0 0 0
errors: No known data errors
I was no longer able to boot from the latest BE (solaris-3)!!
Working BE:
Code: Select all
%beadm list
BE Active Mountpoint Space Policy Created
-- ------ ---------- ----- ------ -------
solaris - - 1.15G static 2012-03-25 22:47
solaris-1 - - 34.71M static 2012-06-22 18:32
solaris-2 - - 141.13M static 2012-11-21 18:47
solaris-3 NR / 28.40G static 2012-11-21 19:05
I took out the broken c5d0s0 drive; it made strange noises. I then found a Seagate Barracuda 1 TB SATA drive on Amazon and ordered it.
It arrived on 09/18/2014, the day I returned from MSP.
I thought everything would just work once I replaced the drive, but it still didn't.
On 09/19/2014, I decided to detach c5d0s0 from the mirror, and I was able to boot the solaris-3 BE!!!
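For the record, the detach was along these lines (from memory; standard zpool syntax):
Code: Select all
% zpool detach rpool c5d0s0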
That confirmed the faulted c5d0s0 was what hung the BE during boot-up.
When I tried to attach the new c5d0s0 back to the rpool mirror, it complained that c5d0s0 did not exist.
From 'format', I know for sure the new Seagate Barracuda drive is at c5d0.
However, it just didn't want to take c5d0s0. I then tried c5d0p0, and it accepted that into the mirror and resilvered it. The 'format' output:
Code: Select all
0. c5d0 <ST1000DM-Z1D4MWF-0001 cyl 60797 alt 2 hd 255 sec 126>
/pci@0,0/pci-ide@8,1/ide@0/cmdk@0,0
1. c6d0 <WDC WD50- WD-WCASU404856-0001 cyl 60797 alt 2 hd 255 sec 63>
/pci@0,0/pci-ide@8,1/ide@1/cmdk@0,0
However, installgrub didn't work with the p0 device. It requires a raw device that is a root slice (e.g. c5d0s0).
Code: Select all
cahtoh02:/root%/sbin/installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c5d0p0
raw device must be a root slice (not s2)
Unable to gather device information for /dev/rdsk/c5d0p0
Partition table of the new c5d0:
Code: Select all
Part Tag Flag Cylinders Size Blocks
0 unassigned wm 0 0 (0/0/0) 0
1 unassigned wm 0 0 (0/0/0) 0
2 backup wu 0 - 60796 931.46GB (60797/0/0) 1953407610
3 unassigned wm 0 0 (0/0/0) 0
4 unassigned wm 0 0 (0/0/0) 0
5 unassigned wm 0 0 (0/0/0) 0
6 unassigned wm 0 0 (0/0/0) 0
7 unassigned wm 0 0 (0/0/0) 0
8 boot wu 0 - 0 15.69MB (1/0/0) 32130
9 alternates wm 1 - 2 31.38MB (2/0/0) 64260
For comparison, the partition table of the other disk in the mirror:
Code: Select all
Part Tag Flag Cylinders Size Blocks
0 root wm 1 - 60796 465.72GB (60796/0/0) 976687740
1 unassigned wu 0 0 (0/0/0) 0
2 backup wu 0 - 60798 465.74GB (60799/0/0) 976735935
3 unassigned wu 0 0 (0/0/0) 0
4 unassigned wu 0 0 (0/0/0) 0
5 unassigned wu 0 0 (0/0/0) 0
6 unassigned wu 0 0 (0/0/0) 0
7 unassigned wu 0 0 (0/0/0) 0
8 boot wu 0 - 0 7.84MB (1/0/0) 16065
9 unassigned wu 0 0 (0/0/0) 0
I then configured slice 0 on c5d0 to the following and labeled the disk.
Code: Select all
Part Tag Flag Cylinders Size Blocks
0 root wm 3 - 60752 930.74GB (60750/0/0) 1951897500
1 unassigned wm 0 0 (0/0/0) 0
2 backup wu 0 - 60796 931.46GB (60797/0/0) 1953407610
3 unassigned wm 0 0 (0/0/0) 0
4 unassigned wm 0 0 (0/0/0) 0
5 unassigned wm 0 0 (0/0/0) 0
6 unassigned wm 0 0 (0/0/0) 0
7 unassigned wm 0 0 (0/0/0) 0
8 boot wu 0 - 0 15.69MB (1/0/0) 32130
9 alternates wm 1 - 2 31.38MB (2/0/0) 64260
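For anyone repeating this: the slice setup was done in format's partition menu. Roughly, from memory (prompts and values matching the table above):
Code: Select all
% format c5d0
format> partition
partition> 0
Enter partition id tag[unassigned]: root
Enter partition permission flags[wm]: wm
Enter new starting cyl[0]: 3
Enter partition size[0b, 0c, 0.00mb, 0.00gb]: 60750c
partition> label
partition> quit
format> quit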
After this, I was able to attach c5d0s0 to the rpool mirror!
Code: Select all
% zpool attach rpool c6d0s0 c5d0s0
Code: Select all
%zpool status
pool: rpool
state: ONLINE
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Fri Sep 20 05:53:26 2014
24.1G scanned out of 207G at 35.1M/s, 1h28m to go
24.1G resilvered, 11.64% done
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
c6d0s0 ONLINE 0 0 0
c5d0s0 ONLINE 0 0 0 (resilvering)
errors: No known data errors
Code: Select all
cahtoh02:/root%/sbin/installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c5d0s0
stage2 written to partition 0, 282 sectors starting at 50 (abs 32180)
stage1 written to partition 0 sector 0 (abs 32130)
To verify booting from the new disk:
1. Shut down the system
2. Power up the machine
3. Press 'ESC'
4. Enter 'Boot' selection screen
5. Choose "ST1000DM-Z1D4MWF" (new Seagate 1 TB disk)
6. Select "solaris-3" BE if necessary
Everything is back to normal as of 09/20/2014.
Hopefully, it will be up for some time.