Oracle T3-1 SP fault
Posted: Wed May 25, 2016 2:13 pm
We got an SP (service processor) fault on a T3-1 server.
The vendor, NCE, suggested replacing the motherboard (MB).
After the first MB replacement, the server couldn't power up at all; the voltage was out of the normal range. This was on Thursday (05/19/2016).
The second MB replacement took place on Monday (05/23/2016). The server could power up but couldn't see the disks.
The third attempt was yesterday (05/24/2016), with the same result. Finally, NCE support figured out that the RAID volume needs to be activated before probe-scsi-all can find the device. After that, boot-device needs to point to the right path and auto-boot? needs to be set to true.
First, select the SCSI controller:
Code:
{0} ok select /pci@400/pci@1/pci@0/pci@4/scsi@0
Then run the show-volumes command to find the inactive volume:
Code:
{0} ok show-volumes
Volume 0 Target 389 Type RAID1 (Mirroring)
Name root_volume WWID 0c7892682c764144
Optimal Enabled Inactive
2 Members 583983104 Blocks, 298 GB
Disk 1
Primary Optimal
Target 9 HITACHI H103030SCSUN300G A2A8 PhyNum 0
Disk 0
Secondary Optimal
Target a HITACHI H103030SCSUN300G A2A8 PhyNum 2
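To make the "find the inactive volume" step concrete, here is a minimal Python sketch that scans text shaped like the show-volumes output above and reports which volume numbers are flagged Inactive. The function name `inactive_volumes` and the assumption that the status line carries the literal word "Inactive" are mine, based only on the one sample in this post; other firmware versions may print things differently.

```python
import re

# Sample text copied from the show-volumes output in this post.
SHOW_VOLUMES = """\
Volume 0 Target 389 Type RAID1 (Mirroring)
Name root_volume WWID 0c7892682c764144
Optimal Enabled Inactive
2 Members 583983104 Blocks, 298 GB
"""


def inactive_volumes(text):
    """Return volume numbers whose entry contains the word 'Inactive'.

    Hypothetical helper: assumes each entry starts with a
    'Volume <n> Target <id>' line, as in the sample above.
    """
    inactive = []
    current = None  # volume number of the entry currently being read
    for line in text.splitlines():
        header = re.match(r"\s*Volume\s+(\d+)\s+Target\s+(\w+)", line)
        if header:
            current = int(header.group(1))
        elif current is not None and "Inactive" in line.split():
            if current not in inactive:
                inactive.append(current)
    return inactive


print(inactive_volumes(SHOW_VOLUMES))
```

The number it reports is the one to pass to activate-volume in the next step.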
Activate the volume using the volume number:
Code:
ok 0 activate-volume
Probe all SCSI devices:
Code:
{0} ok probe-scsi-all
/pci@400/pci@2/pci@0/pci@f/pci@0/usb@0,2/hub@2/hub@3/storage@2
Unit 0 Removable Read Only device AMI Virtual CDROM 1.00
/pci@400/pci@2/pci@0/pci@4/scsi@0
FCode Version 1.00.62, MPT Version 2.00, Firmware Version 5.00.17.00
Target a
Unit 0 Removable Read Only device TSSTcorp CDDVDW TS-T633A SR00
SATA device PhyNum 6
/pci@400/pci@1/pci@0/pci@4/scsi@0
FCode Version 1.00.62, MPT Version 2.00, Firmware Version 5.00.17.00
Target 389 Volume 0
Unit 0 Disk LSI Logical Volume 3000 583983104 Blocks, 298 GB
VolumeDeviceName 3c7892682c764144 VolumeWWID 0c7892682c764144
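The boot path used in this post is just the controller path from probe-scsi-all with /disk@ and the volume's target appended. As a rough sketch of that rule (the helper name `volume_boot_paths` is made up, and the parsing assumes the output looks like the sample above), in Python:

```python
import re

# Trimmed copy of the probe-scsi-all output from this post.
PROBE_OUTPUT = """\
/pci@400/pci@2/pci@0/pci@4/scsi@0
FCode Version 1.00.62, MPT Version 2.00, Firmware Version 5.00.17.00
Target a
Unit 0 Removable Read Only device TSSTcorp CDDVDW TS-T633A SR00
SATA device PhyNum 6
/pci@400/pci@1/pci@0/pci@4/scsi@0
FCode Version 1.00.62, MPT Version 2.00, Firmware Version 5.00.17.00
Target 389 Volume 0
Unit 0 Disk LSI Logical Volume 3000 583983104 Blocks, 298 GB
"""


def volume_boot_paths(text):
    """Return '<controller path>/disk@<target>' for every RAID volume listed.

    Hypothetical helper: a line starting with '/' is treated as a
    controller path, and 'Target <id> Volume <n>' as a volume under it.
    """
    paths = []
    controller = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("/"):
            controller = line
            continue
        volume = re.match(r"Target\s+(\w+)\s+Volume\s+\d+", line)
        if volume and controller:
            paths.append(f"{controller}/disk@{volume.group(1)}")
    return paths


print(volume_boot_paths(PROBE_OUTPUT))
```

For the sample output this gives the same path that the root-volume alias points to below.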
Set up the boot disk path:
Code:
ok nvramrc=devalias root-volume /pci@400/pci@1/pci@0/pci@4/scsi@0/disk@389 (old OBP syntax)
{0} ok nvalias root-volume /pci@400/pci@1/pci@0/pci@4/scsi@0/disk@389 (new OBP syntax)
ok setenv boot-device root-volume
Set up auto-boot:
Code:
ok setenv auto-boot? true
auto-boot? = true
Boot up the server:
Code:
{0} ok boot
or
{0} ok boot root-volume
The server finally booted up with the right disk.
However, it shows the old fault message again today:
Code:
appzone01:/%fmadm faulty
--------------- ------------------------------------ -------------- ---------
TIME EVENT-ID MSG-ID SEVERITY
--------------- ------------------------------------ -------------- ---------
Feb 26 2013 7e85742b-44d6-616a-ef3e-90c3471ee4d3 SUN4V-8002-US Critical
Host : appzone01
Platform : sun4v Chassis_id : 1047BDR269
Product_sn : 1047BDR269
Fault class : fault.sp.failed
Problem in : "/SYS/MB/SP" (hc://:product-id=sun4v:product-sn=1047BDR269:server-id=appzone01:chassis-id=1047BDR269/chassis=0/sp=0)
faulted but still in service
FRU : "/SYS/MB/SP" (hc://:product-id=sun4v:product-sn=1047BDR269:server-id=appzone01:chassis-id=1047BDR269/chassis=0/sp=0)
faulty
Description : The Service Processor failed.
Response : No automated response.
Impact : Some services such as Fault Diagnosis may be degraded as a
result.
Action : Use 'fmadm faulty' to provide a more detailed view of this event.
Please refer to the associated reference document at
http://sun.com/msg/SUN4V-8002-US for the latest service
procedures and policies regarding this diagnosis.
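One thing worth checking in that output is the date on the fault record. Below is a small Python sketch that pulls the date, event ID, message ID, and severity out of the header line and compares the date with the first MB replacement. The function name `parse_fault_header` is hypothetical, and the pattern only handles the "Mon DD YYYY" form shown here; it says nothing about why the record came back.

```python
import re
from datetime import date

# Header line copied from the fmadm faulty output in this post.
HEADER_LINE = "Feb 26 2013 7e85742b-44d6-616a-ef3e-90c3471ee4d3 SUN4V-8002-US Critical"

MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def parse_fault_header(line):
    """Split a 'TIME EVENT-ID MSG-ID SEVERITY' line into its fields.

    Hypothetical helper: returns None when the line does not match the
    'Mon DD YYYY <uuid> <msg-id> <severity>' layout seen in this post.
    """
    match = re.match(
        r"(\w{3})\s+(\d{1,2})\s+(\d{4})\s+([0-9a-f-]{36})\s+(\S+)\s+(\S+)",
        line.strip(),
    )
    if match is None or match.group(1) not in MONTHS:
        return None
    month, day, year, event_id, msg_id, severity = match.groups()
    return {
        "date": date(int(year), MONTHS[month], int(day)),
        "event_id": event_id,
        "msg_id": msg_id,
        "severity": severity,
    }


fault = parse_fault_header(HEADER_LINE)
first_mb_replacement = date(2016, 5, 19)  # from the timeline in this post
print(fault["event_id"], fault["date"] < first_mb_replacement)
```

Here the record is dated well before the first MB replacement, which fits with it being the old message rather than a newly diagnosed fault.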