Manually reset SUN X4100 LED lights

Moderator: cah

cah
General of the Army / Fleet Admiral / General of the Air Force
Posts: 1342
Joined: Sun Aug 17, 2008 5:05 am

Manually reset SUN X4100 LED lights

Post by cah »

Sun x86/x64 servers come with a maintenance tool, ipmitool ("utility for controlling IPMI-enabled devices"), that provides a lot of information for system maintenance and troubleshooting.

On the SPARC platform, "prtdiag" provides useful troubleshooting information. x86/x64 servers also have prtdiag, but its output is of little use for troubleshooting:

Code: Select all

%/usr/platform/i86pc/sbin/prtdiag -v
System Configuration: Sun Microsystems Sun Fire X4100 M2
BIOS Configuration: American Megatrends Inc. 0ABJX102 11/03/2008
BMC Configuration: IPMI 1.5 (KCS: Keyboard Controller Style)

==== Processor Sockets ====================================

Version                          Location Tag
-------------------------------- --------------------------
Dual-Core AMD Opteron(tm) Processor 2210 CPU 1
Dual-Core AMD Opteron(tm) Processor 2210 CPU 2

==== Memory Device Sockets ================================

Type    Status Set Device Locator      Bank Locator
------- ------ --- ------------------- --------------------
unknown empty  0   DIMM0                NODE0
unknown empty  0   DIMM1                NODE0
DDR2    in use 0   DIMM2                NODE0
DDR2    in use 0   DIMM3                NODE0
unknown empty  0   DIMM0                NODE1
unknown empty  0   DIMM1                NODE1
DDR2    in use 0   DIMM2                NODE1
DDR2    in use 0   DIMM3                NODE1

==== On-Board Devices =====================================
LSI serial-SCSI #1
Gigabit Ethernet #1
ATI Rage XL VGA

==== Upgradeable Slots ====================================

ID  Status    Type             Description
--- --------- ---------------- ----------------------------
1   available PCI Express      PCIExp SLOT0
2   available PCI Express      PCIExp SLOT1
3   available PCI-X            PCIX SLOT2
4   available PCI Express      PCIExp SLOT3
5   available PCI Express      PCIExp SLOT4
ipmitool, on the other hand, reports the chassis power and fault status:

Code: Select all

%/usr/sfw/bin/ipmitool chassis status
System Power         : on
Power Overload       : false
Power Interlock      : inactive
Main Power Fault     : true
Power Control Fault  : false
Power Restore Policy : always-off
Last Power Event     : 
Chassis Intrusion    : inactive
Front-Panel Lockout  : inactive
Drive Fault          : false
Cooling/Fan Fault    : false
It also lets admins query and control the devices and LEDs on the server:

Code: Select all

%/usr/sfw/bin/ipmitool sunoem led get all        
sys.psfail.led   | ON
sys.tempfail.led | OFF
sys.fanfail.led  | OFF
bp.power.led     | ON
bp.locate.led    | OFF
bp.alert.led     | ON
fp.power.led     | ON
fp.locate.led    | OFF
fp.alert.led     | ON
io.hdd0.led      | OFF
io.hdd1.led      | OFF
io.hdd2.led      | OFF
io.hdd3.led      | OFF
p0.led           | OFF
p0.d0.led        | OFF
p0.d1.led        | OFF
p0.d2.led        | OFF
p0.d3.led        | OFF
p1.led           | OFF
p1.d0.led        | OFF
p1.d1.led        | OFF
p1.d2.led        | OFF
p1.d3.led        | OFF
ft0.fm0.led      | OFF
ft0.fm1.led      | OFF
ft0.fm2.led      | OFF
ft1.fm0.led      | OFF
ft1.fm1.led      | OFF
ft1.fm2.led      | OFF
The following commands turn the lit fault LEDs off manually:

Code: Select all

%/usr/sfw/bin/ipmitool sunoem led set fp.alert.led off
fp.alert.led     | OFF
%/usr/sfw/bin/ipmitool sunoem led set bp.alert.led off
bp.alert.led     | OFF
%/usr/sfw/bin/ipmitool sunoem led set sys.psfail.led off
sys.psfail.led   | OFF
%/usr/sfw/bin/ipmitool sunoem led get all    
sys.psfail.led   | OFF
sys.tempfail.led | OFF
sys.fanfail.led  | OFF
bp.power.led     | ON
bp.locate.led    | OFF
bp.alert.led     | OFF
fp.power.led     | ON
fp.locate.led    | OFF
fp.alert.led     | OFF
io.hdd0.led      | OFF
io.hdd1.led      | OFF
io.hdd2.led      | OFF
io.hdd3.led      | OFF
p0.led           | OFF
p0.d0.led        | OFF
p0.d1.led        | OFF
p0.d2.led        | OFF
p0.d3.led        | OFF
p1.led           | OFF
p1.d0.led        | OFF
p1.d1.led        | OFF
p1.d2.led        | OFF
p1.d3.led        | OFF
ft0.fm0.led      | OFF
ft0.fm1.led      | OFF
ft0.fm2.led      | OFF
ft1.fm0.led      | OFF
ft1.fm1.led      | OFF
ft1.fm2.led      | OFF
The LEDs did go off. If they come back on shortly afterwards, that probably indicates a real problem rather than a glitch.
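The three "sunoem led set" commands above can be wrapped in a small loop. This is only a sketch: the function name clear_leds and the IPMITOOL variable are made up for illustration, and the path is the one used in this post.

```shell
#!/bin/sh
# Path taken from the commands above; override IPMITOOL if it lives elsewhere.
IPMITOOL=${IPMITOOL:-/usr/sfw/bin/ipmitool}

# clear_leds NAME...  (hypothetical helper)
# Runs one "sunoem led set NAME off" per LED name given.
# Returns non-zero if any of the calls failed.
clear_leds() {
    rc=0
    for led in "$@"; do
        "$IPMITOOL" sunoem led set "$led" off || rc=1
    done
    return "$rc"
}
```

For example, "clear_leds fp.alert.led bp.alert.led sys.psfail.led" would repeat the three commands shown above. It only switches the indicators off; it does not fix whatever lit them.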
CAH, The Great

X4100 Rear PS amber LED

Post by cah »

While checking Sun hardware health, I saw that webint01 (X4100 M2) had the amber light on for "Rear PS", indicating an issue with the power supply.

I sent Terix the output of prtdiag -v, ipmitool chassis status, and ipmitool sunoem led get all.
Terix came back asking for the following reports:

Code: Select all

From the ipmitool:
# ipmitool sel elist
# ipmitool fru print
# ipmitool sensor list
They also asked for the output of the following ILOM commands, but only 'show /SP/logs/event/list' worked:

Code: Select all

show -o table -level all /SYS
show /SP/faultmgmt
show /SP/logs/event/list
They could not identify the cause from that output either.
They then decided to send the field engineer to replace the power supply to see if that would clear the LED.
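The three ipmitool reports Terix asked for can be gathered in one pass before opening a ticket. A minimal sketch, assuming the ipmitool path used elsewhere in this thread; the function name and the output file names are hypothetical.

```shell
#!/bin/sh
# Path taken from the earlier posts; override IPMITOOL if it lives elsewhere.
IPMITOOL=${IPMITOOL:-/usr/sfw/bin/ipmitool}

# collect_ipmi_reports DIR  (hypothetical helper)
# Saves the output of the three requested commands as text files in DIR.
# File names are made up for illustration.
collect_ipmi_reports() {
    outdir=$1
    mkdir -p "$outdir" || return 1
    "$IPMITOOL" sel elist   > "$outdir/sel-elist.txt"   2>&1
    "$IPMITOOL" fru print   > "$outdir/fru-print.txt"   2>&1
    "$IPMITOOL" sensor list > "$outdir/sensor-list.txt" 2>&1
}
```

Running "collect_ipmi_reports /tmp/webint01-ipmi" would leave three files that can be attached to the support case.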

X4100 Rear PS amber LED

Post by cah »

Bill Korlowicz from Terix came by and replaced and swapped power supplies but the LED remained lit.

I had to run the following ipmitool commands again to turn the Rear PS light off:

Code: Select all

webint01:%/usr/sfw/bin/ipmitool sunoem led set fp.alert.led off
fp.alert.led     | OFF
webint01:%/usr/sfw/bin/ipmitool sunoem led set bp.alert.led off
bp.alert.led     | OFF
webint01:%/usr/sfw/bin/ipmitool sunoem led set sys.psfail.led off
sys.psfail.led   | OFF
webint01:%/usr/sfw/bin/ipmitool sunoem led get all  
sys.psfail.led   | OFF
sys.tempfail.led | OFF
sys.fanfail.led  | OFF
bp.power.led     | ON
bp.locate.led    | OFF
bp.alert.led     | OFF
fp.power.led     | ON
fp.locate.led    | OFF
fp.alert.led     | OFF
io.hdd0.led      | OFF
io.hdd1.led      | OFF
io.hdd2.led      | OFF
io.hdd3.led      | OFF
p0.led           | OFF
p0.d0.led        | OFF
p0.d1.led        | OFF
p0.d2.led        | OFF
p0.d3.led        | OFF
p1.led           | OFF
p1.d0.led        | OFF
p1.d1.led        | OFF
p1.d2.led        | OFF
p1.d3.led        | OFF
ft0.fm0.led      | OFF
ft0.fm1.led      | OFF
ft0.fm2.led      | OFF
ft1.fm0.led      | OFF
ft1.fm1.led      | OFF
ft1.fm2.led      | OFF
The light went off, but we will have to wait and see whether it comes back.

Based on the last occurrence 11 months ago, it should not come back.
One thing we found out is that the ILOM firmware version is old.
scmint01 has newer firmware; I believe I upgraded it in 2008 per Sun's recommendation, when it was still under the manufacturer's warranty.

SUN X4100 Rear PS failure

Post by cah »

I went to the data center and saw amber lights on webdev01 and webint01.
Further investigation showed a rear power supply failure reported on each server.

Code: Select all

webdev01%/usr/sfw/bin/ipmitool sunoem led get all | grep ON
sys.psfail.led   | ON
bp.power.led     | ON
fp.power.led     | ON
p0.led           | ON

Code: Select all

webint01%/usr/sfw/bin/ipmitool sunoem led get all | grep ON
sys.psfail.led   | ON
bp.power.led     | ON
fp.power.led     | ON
p1.led           | ON
Also confirmed from ILOM:

Code: Select all

S/N - 0745BD294E (Webdev01)

646    Mon Feb  4 11:41:57 2013  IPMI      Log       critical
       ID =  159 : 02/04/2013 : 11:41:57 : Processor : p0.fail : Predictive Failure Asserted
645    Mon Feb  4 11:41:54 2013  IPMI      Log       critical
       ID =  158 : 02/04/2013 : 11:41:54 : Voltage : p0.v_vtt : Upper Non-critical going high : reading 1.90 > threshold 1.00 Volts

Code: Select all

S/N - 0745BD2950 (Webint01)

513    Sat Jan 26 16:35:53 2013  IPMI      Log       critical
       ID =  140 : 01/26/2013 : 16:35:53 : Processor : p1.fail : Predictive Failure Asserted
512    Sat Jan 26 16:35:53 2013  IPMI      Log       critical
       ID =  13f : 01/26/2013 : 16:35:53 : Voltage : p1.v_vdd : Lower Non-critical going low  : reading 0.01 < threshold 1.00 Volts
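To see at a glance which voltage sensors are tripping, the event lines above can be reduced to sensor name plus reading. This is a rough sketch that assumes the log text looks exactly like the excerpts in this post (fields separated by " : "); the function name is hypothetical, and on Solaris it may need nawk or /usr/xpg4/bin/awk instead of the default awk.

```shell
#!/bin/sh
# voltage_events  (hypothetical helper)
# Reads ILOM event-log text on stdin and prints "sensor: detail" for each
# line whose fourth " : "-separated field is "Voltage".
# Assumes the line layout shown in the excerpts above.
voltage_events() {
    awk -F' +: +' '$4 == "Voltage" { print $5 ": " $7 }'
}
```

Saving the event list to a file and running "voltage_events < events.txt" prints one line per voltage event, which makes it easier to compare the reported readings against the thresholds.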

SUN X4100 Rear PS failure - II

Post by cah »

Terix believes the firmware is misleading us and suggests that we upgrade it:
Chang-An,

That is definitely an older firmware revision. The Release Notes for FW version ILOM 2.0.2.5 states that it resolves some of these voltage threshold errors and the newest version of the firmware is ILOM 2.0.2.41. I would recommend that you upgrade these servers to the latest revision of the ILOM firmware and see if that resolves these errors. I will leave the ticket opened for monitoring until we've confirmed the issue is resolved. Please keep us posted and let us know if you need any assistance.
Thanks,
---
Todd Alloway
Senior Field Support Engineer
Direct: 408.990.1354
Cell: 720.371.6443
talloway@ops.terix.com
Since the power supply is not the problem, I tried turning off the LEDs manually again, and it seems to work fine.

Code: Select all

# webint01

/usr/sfw/bin/ipmitool sunoem led set p1.led off
/usr/sfw/bin/ipmitool sunoem led set sys.psfail.led off
/usr/sfw/bin/ipmitool sunoem led set fp.alert.led off   # front panel alert LED
/usr/sfw/bin/ipmitool sunoem led set bp.alert.led off   # back panel alert LED

Code: Select all

# webdev01

/usr/sfw/bin/ipmitool sunoem led set p0.led off
/usr/sfw/bin/ipmitool sunoem led set sys.psfail.led off
/usr/sfw/bin/ipmitool sunoem led set fp.alert.led off   # front panel alert LED
/usr/sfw/bin/ipmitool sunoem led set bp.alert.led off   # back panel alert LED
After that, all the amber lights were gone.

Manually reset SUN X4100 LED lights - II

Post by cah »

logdev01 still had the amber warning LED blinking after I manually set some LEDs to OFF.
I checked again and found that the blinking LEDs have "SLOW" as their status, which is why grepping for "ON" didn't find them.

After I set them to OFF, the amber lights went out.
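Because a blinking LED reports "SLOW" rather than "ON", it may be safer to list everything that is not OFF instead of grepping for "ON". A minimal sketch, assuming the "name | STATE" layout shown in the earlier output; the function name is hypothetical.

```shell
#!/bin/sh
# leds_not_off  (hypothetical helper)
# Reads "ipmitool sunoem led get all" output on stdin and prints the name
# and state of every LED whose state is anything other than OFF,
# so ON, SLOW and any other value are all caught.
leds_not_off() {
    awk '$2 == "|" && $3 != "OFF" { print $1, $3 }'
}
```

Usage would be "/usr/sfw/bin/ipmitool sunoem led get all | leds_not_off". The power LEDs (bp.power.led, fp.power.led) will normally show up as ON on a running server, so only the other entries need attention.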