SUN FIRE V210

Post by cah

One of our servers (SUN FIRE V210) has been having problems since last week.

It shut itself down and the ALOM was broken.
It went into a loop for half an hour and then magically came back to life (OS level).
It was then accessible again.

I sent the error messages to technical support and they determined it was a system board problem.
/var/adm/messages wrote:

Sep 25 14:26:03 ols-webtest01 rmclomv: [ID 292366 kern.crit] SC initiating hard host system shutdown due to fault at MB.T_ENC.
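For anyone who wants to pull entries like this out of /var/adm/messages without scrolling through the whole file, here is a minimal Python sketch. It assumes every line follows the same layout as the one quoted above; the function name `find_sc_shutdowns` and the regular expressions are my own illustration, not part of any Sun tool.

```python
import re

# Layout assumed from the single line quoted above:
# "<Mon> <day> <time> <host> <tag>: [ID <n> <facility.level>] <message>"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s(?P<tag>[^:]+):\s"
    r"\[ID\s(?P<msgid>\d+)\s(?P<prio>[\w.]+)\]\s(?P<msg>.*)$"
)
# Wording copied from the rmclomv message in this thread.
FAULT_RE = re.compile(r"hard host system shutdown due to fault at (?P<where>[\w.]+)")


def find_sc_shutdowns(lines):
    """Return (timestamp, host, fault location) for each SC-initiated shutdown."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group("tag") != "rmclomv":
            continue
        f = FAULT_RE.search(m.group("msg"))
        if f:
            # Drop the sentence-ending period that follows the sensor name.
            hits.append((m.group("stamp"), m.group("host"), f.group("where").rstrip(".")))
    return hits


sample = [
    "Sep 25 14:26:03 ols-webtest01 rmclomv: [ID 292366 kern.crit] "
    "SC initiating hard host system shutdown due to fault at MB.T_ENC.",
]
print(find_sc_shutdowns(sample))
```

On a real system you would pass the open file instead of `sample`, for example `find_sc_shutdowns(open("/var/adm/messages"))`.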
In the meantime, I was preparing a new boot disk for the server.
For some reason, the people before me set up a 146 GB HD using just 36 GB of its disk space while taking up all the partitions. That means 100+ GB of disk space is wasted!

I will write up how I made the new boot disk in the next post.

After the field engineer came in yesterday and replaced the system board, the system came up only partially. The boot disk's mount points were crossed between the 2 internal disks, which left a file system read-only. The system logs could not be created, and therefore the system did not come up all the way.
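A quick way to spot this kind of crossed mount point is to check whether the boot disk's /etc/vfstab points at slices on the other disk. Below is a rough Python sketch of that check. The vfstab contents and the disk names (`c1t0d0`, `c1t1d0`) are made-up examples, not copied from this server, and `foreign_mounts` is a hypothetical helper name.

```python
import re

# Matches Solaris block device names such as /dev/dsk/c1t0d0s4
# and captures the disk part (c1t0d0) without the slice number.
DISK_RE = re.compile(r"^/dev/dsk/(c\d+t\d+d\d+)s\d+$")


def foreign_mounts(vfstab_text, boot_disk):
    """List (mount point, device) entries that live on a disk other than boot_disk."""
    foreign = []
    for line in vfstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 7:  # a vfstab entry has seven fields
            continue
        device, mount_point = fields[0], fields[2]
        m = DISK_RE.match(device)
        if m and m.group(1) != boot_disk:
            foreign.append((mount_point, device))
    return foreign


# Hypothetical vfstab from a boot disk on c1t1d0 with /var left on c1t0d0.
sample_vfstab = """\
#device            device              mount  FS    fsck  mount    mount
#to mount          to fsck             point  type  pass  at boot  options
/dev/dsk/c1t1d0s1  -                   -      swap  -     no       -
/dev/dsk/c1t1d0s0  /dev/rdsk/c1t1d0s0  /      ufs   1     no       -
/dev/dsk/c1t0d0s4  /dev/rdsk/c1t0d0s4  /var   ufs   1     no       -
/proc              -                   /proc  proc  -     no       -
"""
print(foreign_mounts(sample_vfstab, "c1t1d0"))
```

Any entry the function returns is a mount point that still refers to the other internal disk and should be corrected before booting from the new disk.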

After I fixed the issue, the server came up from either boot disk.
Everything looked good.

However, after 30 to 40 minutes, the system crashed again.
It appeared to be the same error:
console error wrote:

Sun Fire V210/V240,Netra 240, No Keyboard
Copyright 1998-2003 Sun Microsystems, Inc.  All rights reserved.
OpenBoot 4.11.4, 2048 MB memory instal
Ethernet address 0:3:ba:42:45:43, Host ID: 83424543.

ok
SC Alert: DISK @ HDD0 has FAILED.

SC Alert: DISK @ HDD1 has FAILED.
boot
Boot device: disk1  File and args:
SunOS Release 5.8 Version Generic_117350-46 64-bit
Copyright 1983-2003 Sun Microsystems, Inc.  All rights reserved.
/
SC Alert: SC initiating hard host system shutdown due to fault at MB.T_ENC.

SC Alert: System poweron is disabled.

SC Alert: TEMP_SENSOR @ MB.T_ENC has exceeded low hard shutdown threshold.

SC Alert: Host system has shut down.

SC Alert: TEMP_SENSOR @ MB.T_ENC has exceeded low warning threshold.

SC Alert: System poweron is disabled.

SC Alert: TEMP_SENSOR @ MB.T_ENC has exceeded low hard shutdown threshold.
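To see at a glance which sensor and which threshold keep tripping, the temperature alerts in a captured console log can be tallied. This is only a sketch: the pattern is based on the alert wording shown above, and `summarize_temp_alerts` is a name I made up for illustration.

```python
import re
from collections import Counter

# Wording taken from the SC alerts above; the threshold name is captured as-is.
ALERT_RE = re.compile(
    r"^SC Alert: TEMP_SENSOR @ (?P<sensor>\S+) has exceeded (?P<what>.+) threshold\.$"
)


def summarize_temp_alerts(console_text):
    """Count temperature alerts per (sensor, threshold name)."""
    counts = Counter()
    for line in console_text.splitlines():
        m = ALERT_RE.match(line.strip())
        if m:
            counts[(m.group("sensor"), m.group("what"))] += 1
    return counts


console = """\
SC Alert: SC initiating hard host system shutdown due to fault at MB.T_ENC.
SC Alert: System poweron is disabled.
SC Alert: TEMP_SENSOR @ MB.T_ENC has exceeded low hard shutdown threshold.
SC Alert: Host system has shut down.
SC Alert: TEMP_SENSOR @ MB.T_ENC has exceeded low warning threshold.
SC Alert: System poweron is disabled.
SC Alert: TEMP_SENSOR @ MB.T_ENC has exceeded low hard shutdown threshold.
"""
for (sensor, what), n in sorted(summarize_temp_alerts(console).items()):
    print(sensor, what, n)
```

In this log every temperature alert comes from MB.T_ENC and names a low threshold, never a high one.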
It would not boot any more no matter how long I let it rest.

I then contacted technical support again and they determined that it might be a sensor problem.
Where is the sensor?
It is on the front bezel, according to technical support.

The field engineer came in again with a new front bezel (including the sensor, of course), but its plastic moldings were broken. We decided to boot the server with it anyway to see if it solved the issue. If it does, we will order a new front bezel with the sensor later.

The server has now been up for just over 4 hours and is still running (knock on wood).
Hopefully, this has solved the problem for good.
CAH, The Great