The server shut itself down, and the ALOM was broken.
It went into a loop for half an hour and then magically came back to life (at the OS level).
It was then accessible.
I sent the error messages to technical support, and they determined it was a system board problem.
In the meantime, I was preparing a new boot disk for the server.

/var/adm/messages:
Sep 25 14:26:03 ols-webtest01 rmclomv: [ID 292366 kern.crit] SC initiating hard host system shutdown due to fault at MB.T_ENC.
For some reason, the people before me set up the 146 GB hard disk using just 36 GB of its space for all the partitions. That means more than 100 GB of disk space is wasted!
I will write up how I made the new boot disk in the next post.
After the field engineer came in yesterday and replaced the system board, the system came up semi-working. The boot disk's mount points were crossed between the 2 internal disks, which left a "read-only" file system. The system logs could not be created, so the system did not come up all the way.
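A crossed layout like the one described above can be spotted by grouping the entries in /etc/vfstab by physical disk. The following is only a minimal sketch: the sample vfstab contents, the device names (c1t0d0, c1t1d0) and the function name are made up for illustration and are not taken from this server.

```python
import re
from collections import defaultdict

# Matches Solaris disk slice device names such as /dev/dsk/c1t0d0s0.
SLICE_RE = re.compile(r"^/dev/dsk/(c\d+t\d+d\d+)s\d+$")


def mounts_by_disk(vfstab_text):
    """Group mount points (or the FS type for swap) by physical disk."""
    disks = defaultdict(list)
    for line in vfstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 7:
            continue  # a vfstab entry has seven fields
        match = SLICE_RE.match(fields[0])
        if match:
            # fields[2] is the mount point ("-" for swap), fields[3] the FS type.
            label = fields[2] if fields[2] != "-" else fields[3]
            disks[match.group(1)].append(label)
    return dict(disks)


# Hypothetical vfstab whose entries are crossed between two internal disks.
SAMPLE_VFSTAB = """\
#device             device              mount  FS    fsck  mount    mount
#to mount           to fsck             point  type  pass  at boot  options
/dev/dsk/c1t0d0s1   -                   -      swap  -     no       -
/dev/dsk/c1t0d0s0   /dev/rdsk/c1t0d0s0  /      ufs   1     no       -
/dev/dsk/c1t1d0s5   /dev/rdsk/c1t1d0s5  /var   ufs   1     no       -
"""

result = mounts_by_disk(SAMPLE_VFSTAB)
for disk, mounts in sorted(result.items()):
    print(disk, mounts)
if len(result) > 1:
    print("warning: file systems span more than one disk")
```

On a real system the text would come from reading /etc/vfstab on the boot disk being checked. More than one disk in the result is not necessarily wrong, but it is worth a second look on a disk that is supposed to boot on its own.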
After fixing the issue, the server came up with either boot disk.
Everything looked good.
However, after 30 to 40 minutes, the system crashed again.
It appeared to be the same error, and this time it would not boot any more, no matter how long I let it rest.

Console error:
Sun Fire V210/V240,Netra 240, No Keyboard
Copyright 1998-2003 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.11.4, 2048 MB memory instal
Ethernet address 0:3:ba:42:45:43, Host ID: 83424543.
ok SC Alert: DISK @ HDD0 has FAILED.
SC Alert: DISK @ HDD1 has FAILED.
boot
Boot device: disk1 File and args:
SunOS Release 5.8 Version Generic_117350-46 64-bit
Copyright 1983-2003 Sun Microsystems, Inc. All rights reserved.
/
SC Alert: SC initiating hard host system shutdown due to fault at MB.T_ENC.
SC Alert: System poweron is disabled.
SC Alert: TEMP_SENSOR @ MB.T_ENC has exceeded low hard shutdown threshold.
SC Alert: Host system has shut down.
SC Alert: TEMP_SENSOR @ MB.T_ENC has exceeded low warning threshold.
SC Alert: System poweron is disabled.
SC Alert: TEMP_SENSOR @ MB.T_ENC has exceeded low hard shutdown threshold.
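When the console scrolls this quickly, it can help to tally the SC alerts by component before sending them to support. Here is a rough sketch that counts the "X @ Y has Z" style alerts from the output above. The regular expression and the function name are my own and only cover the alert wording seen in this capture, not every message the ALOM can produce.

```python
import re
from collections import Counter

# Matches alerts of the form "SC Alert: <KIND> @ <LOCATION> has <EVENT>."
# This pattern is an assumption based only on the lines captured above.
ALERT_RE = re.compile(r"SC Alert: (\S+) @ (\S+) has (.+?)\.?\s*$")


def tally_alerts(console_text):
    """Count component alerts keyed by (kind, location, event)."""
    counts = Counter()
    for line in console_text.splitlines():
        match = ALERT_RE.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


# The SC alert lines from the console capture above.
CONSOLE = """\
ok SC Alert: DISK @ HDD0 has FAILED.
SC Alert: DISK @ HDD1 has FAILED.
SC Alert: SC initiating hard host system shutdown due to fault at MB.T_ENC.
SC Alert: System poweron is disabled.
SC Alert: TEMP_SENSOR @ MB.T_ENC has exceeded low hard shutdown threshold.
SC Alert: Host system has shut down.
SC Alert: TEMP_SENSOR @ MB.T_ENC has exceeded low warning threshold.
SC Alert: System poweron is disabled.
SC Alert: TEMP_SENSOR @ MB.T_ENC has exceeded low hard shutdown threshold.
"""

alerts = tally_alerts(CONSOLE)
for (kind, location, event), count in sorted(alerts.items()):
    print(f"{count} x {kind} @ {location}: {event}")
```

Status lines without a component, such as "System poweron is disabled", are deliberately skipped. In this capture every temperature alert names MB.T_ENC and one of the low thresholds.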
I then contacted technical support, and they determined that it might be a sensor problem.
Where is the sensor?
It is on the front bezel, according to technical support.
The field engineer came in again with a new front bezel (including the sensor, of course), but its plastic moldings were broken. We decided to boot the server with it anyway to see if it solved the issue. If it does, we will order another front bezel with the sensor later.
Just over 4 hours after booting up, the server is still running (knock on wood).
Hopefully, this has solved the problem for good.