Snapshot on ILOM

Post by **cah** » Wed Jun 12, 2013 7:14 pm

T3-2 was down again this morning.
Mats texted me and I took a quick look.

It was the system MB voltage fault again, like the incident on May 11.

I cleared the fault from ILOM and started orazone01 from ILOM.
After it came up, I booted it from the ok prompt.
I then used the unmount script to unmount all ZFS file systems.
I then added zoneadm -z <ngz> boot commands to the mount script so that it starts all NGZs and then mounts all file systems. It all went as expected.

I then combined both scripts into one, moved it to /etc/init.d/ngz, and made a symbolic link to it at /etc/rc3.d/S40ngz.
Hopefully this script will kick in automatically on the next boot.
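
For reference, a combined script along these lines might look like the sketch below. It is not the actual /etc/init.d/ngz: the zone names in NGZ_LIST, the RUN dry-run prefix, and the function names are placeholders, and it assumes the unmount and mount scripts boil down to zfs unmount -a and zfs mount -a.

```shell
#!/bin/sh
# Hypothetical sketch of a combined /etc/init.d/ngz script.
# Order follows the manual recovery: unmount ZFS, boot the NGZs, remount ZFS.

# NGZ_LIST: space-separated non-global zone names (placeholders, not the real zones).
NGZ_LIST="${NGZ_LIST:-ngz1 ngz2}"
# RUN: command prefix; RUN=echo prints the commands instead of running them.
RUN="${RUN:-}"

ngz_start() {
    # Assumption: the original unmount script is equivalent to this.
    $RUN zfs unmount -a
    for z in $NGZ_LIST; do
        $RUN zoneadm -z "$z" boot
    done
    # Assumption: the original mount script is equivalent to this.
    $RUN zfs mount -a
}

ngz_main() {
    case "$1" in
        start) ngz_start ;;
        *)     echo "Usage: ngz start" >&2; return 1 ;;
    esac
}

# Dispatch only when an argument is given (rc3.d calls S* scripts with "start").
if [ $# -gt 0 ]; then
    ngz_main "$1"
fi
```

Running it as RUN=echo /etc/init.d/ngz start prints the commands without executing them, which is a cheap way to check the order before the next reboot.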

I then took some time to look through the ILOM event log and found:

Code:

666 Fault Repair minor Wed Jun 12 09:06:57 2013 Component /SYS/MB repaired 
665 Fault Repair minor Wed Jun 12 09:06:57 2013 Fault fault.chassis.voltage.fail on component /SYS/MB cleared 
662 Fault Fault critical Tue Jun 11 21:57:00 2013 Fault detected at time = Tue Jun 11 21:57:00 2013. The suspect component: /SYS/MB has fault.chassis.voltage.fail with probability=100. Refer to http://www.sun.com/msg/SPT-8000-DH for details.
 
https://support.oracle.com/epmos/faces/DocumentDisplay?alias=EVENT%3ASPT-8000-DH&_afrLoop=5978266352605&_afrWindowMode=0&_adf.ctrl-state=yz47vwji1_4

The page at that URL recommends replacing the faulty FRU soon, so I contacted Terix again.
They asked for another explorer file and a snapshot from ILOM.

Here is what I did on ILOM to collect the snapshot:

Code:

-> set /SP/diag/snapshot dataset=full  
Set 'dataset' to 'full'

-> set /SP/diag/snapshot dump_uri=ftp://hsiaoc1:<PWD>@10.0.33.4/ilom_snapshot
Collecting a "full" dataset may reset the host. Are you sure (y/n)? n
Command aborted.

-> set /SP/diag/snapshot dataset=normal
Set 'dataset' to 'normal'

-> set /SP/diag/snapshot dump_uri=ftp://hsiaoc1:<PWD>@10.0.33.4/ilom_snapshot
Set 'dump_uri' to 'ftp://hsiaoc1:<PWD>@10.0.33.4/ilom_snapshot'

Orazone01_10.0.33.5_2013-06-12T22-39-13.zip then appeared in /export/home/hsiaoc1/ilom_snapshot on orazone01 (10.0.33.4).
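
To confirm on the FTP host that an archive has landed, something like the following could be used. It is only a sketch: latest_snapshot is a made-up helper name, and it assumes the snapshots are the only .zip files in the target directory.

```shell
#!/bin/sh
# latest_snapshot DIR: print the most recently modified *.zip in DIR.
# Returns non-zero if DIR contains no .zip file. (Hypothetical helper.)
latest_snapshot() {
    dir="$1"
    # ls -t sorts newest first; errors from an unmatched glob are discarded.
    newest=`ls -t "$dir"/*.zip 2>/dev/null | head -n 1`
    [ -n "$newest" ] || return 1
    echo "$newest"
}
```

For example, latest_snapshot /export/home/hsiaoc1/ilom_snapshot would print the path of the newest archive in that directory.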

I just sent over an email asking when we can stop the snapshot.