Mats texted me and I took a quick look.
It was the system MB voltage fault again, like the May 11 incident.
I cleared the fault and started orazone01 from ILOM.
After it came up, I booted it from the ok prompt.
I then used the unmount script to unmount all ZFS file systems.
I then added zoneadm -z <ngz> boot commands to the mount script so that it starts all NGZs and mounts all file systems afterwards. All went as expected.
I then put both scripts together, moved the result to /etc/init.d/ngz, and made a symbolic link to it at /etc/rc3.d/S40ngz.
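For anyone wanting to do the same, here is a rough sketch of what a combined script like this could look like. It is not the actual /etc/init.d/ngz from this box: the zone names ngz1 and ngz2 are placeholders, the DRY_RUN switch is something I added so the script can be tried safely, and the order (unmount, boot zones, mount) simply mirrors the manual steps above.

```shell
#!/bin/sh
# Sketch of a combined NGZ boot script (assumed layout, not the real one).
# ZONES holds placeholder names; replace them with the real non-global zones.
ZONES="ngz1 ngz2"

# DRY_RUN=1 (the default here) only prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

ngz_start() {
    # Unmount all ZFS file systems first, as done by hand above.
    run zfs unmount -a
    # Boot every non-global zone.
    for z in $ZONES; do
        run zoneadm -z "$z" boot
    done
    # Mount all ZFS file systems once the zones are up.
    run zfs mount -a
}

case "${1:-}" in
start)
    ngz_start
    ;;
esac
```

Running it as `DRY_RUN=1 sh ngz start` prints the commands without touching anything; with DRY_RUN=0 it would execute them. The rc3.d link would be created with something like `ln -s /etc/init.d/ngz /etc/rc3.d/S40ngz`.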
Hopefully this script will kick in automatically on the next boot.
I then took some time to look through Event Logs from ILOM and I found:
666 Fault Repair minor Wed Jun 12 09:06:57 2013 Component /SYS/MB repaired
665 Fault Repair minor Wed Jun 12 09:06:57 2013 Fault fault.chassis.voltage.fail on component /SYS/MB cleared
662 Fault Fault critical Tue Jun 11 21:57:00 2013 Fault detected at time = Tue Jun 11 21:57:00 2013. The suspect component: /SYS/MB has fault.chassis.voltage.fail with probability=100. Refer to http://www.sun.com/msg/SPT-8000-DH for details.
https://support.oracle.com/epmos/faces/DocumentDisplay?alias=EVENT%3ASPT-8000-DH&_afrLoop=5978266352605&_afrWindowMode=0&_adf.ctrl-state=yz47vwji1_4
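If the event log has been saved to a text file, a small filter like the one below can pull out just the critical entries. This is only a sketch: it assumes the column layout seen in the lines above (ID, class, type, severity, then the date), and the function name critical_events is my own.

```shell
#!/bin/sh
# Print the ID and timestamp of every critical entry read from stdin.
# Assumes the columns are: ID, class, type, severity, then a 5-field date.
critical_events() {
    awk '$4 == "critical" { print $1, $5, $6, $7, $8, $9 }'
}
```

For example, `critical_events < events.txt` (events.txt being a hypothetical file holding the pasted log) would list entry 662 with its timestamp and skip the two minor repair entries.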
Oracle support asked for another Explorer file and a snapshot from ILOM.
Here is what I did on ILOM to collect the snapshot:
-> set /SP/diag/snapshot dataset=full
Set 'dataset' to 'full'
-> set /SP/diag/snapshot dump_uri=ftp://hsiaoc1:<PWD>@10.0.33.4/ilom_snapshot
Collecting a "full" dataset may reset the host. Are you sure (y/n)? n
Command aborted.
-> set /SP/diag/snapshot dataset=normal
Set 'dataset' to 'normal'
-> set /SP/diag/snapshot dump_uri=ftp://hsiaoc1:<PWD>@10.0.33.4/ilom_snapshot
Set 'dump_uri' to 'ftp://hsiaoc1:<PWD>@10.0.33.4/ilom_snapshot'
I just sent over an email asking when we can stop the snapshot.