Tuesday, July 3, 2012

Solaris OS went down

Hi, some time ago one of my Solaris machines, which was running the AAA application, went down. We were able to log in to ALOM and boot it from there. Which files should I check for the errors that brought the OS down? Thanks for your help!




Do you have Sun support? If so, open a case and they will tell you what they need to analyze why it went down. 

I would start by looking at /var/adm/messages to see if there are any errors logged just before the system boots back up. Also check whether there are any files in /var/crash/<system name>.
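A minimal sketch of the /var/crash check, assuming bash (Solaris 10's /bin/sh is the old Bourne shell and lacks `$(...)`). The directory defaults to /var/crash/ plus the node name, which is the usual savecore location; running `dumpadm` with no arguments shows the directory actually configured. The helper name `find_crash_dumps` is hypothetical.

```shell
# Hypothetical helper: list crash dumps that savecore left behind.
# Assumes bash; the default directory is /var/crash/<nodename>, but
# the real savecore directory is whatever `dumpadm` reports.
find_crash_dumps() {
    dir=${1:-/var/crash/$(uname -n)}
    found=1                      # 1 = nothing found yet (shell "false")
    for f in "$dir"/vmcore.* "$dir"/vmdump.*; do
        [ -e "$f" ] || continue  # an unmatched glob stays literal; skip it
        echo "$f"
        found=0
    done
    return "$found"              # 0 only if at least one dump was listed
}
```

For example, `find_crash_dumps || echo "no crash dump saved"`. Older releases save uncompressed unix.N/vmcore.N pairs; newer ones may save a single compressed vmdump.N, which may need to be expanded with savecore before adb or mdb can read it (see savecore(1M)).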

If there is a crash dump, you can run adm against the core and it will tell you what caused the system to crash.
If you have ALOM set up, you can also run consolehistory from there to see what the console showed.

/var/adm/messages is also a good place to look. 

If you do have a crash dump, the tools to look at it are adb, mdb or scat.
scat (the Solaris Crash Analysis Tool) is the easiest to use, but it's not
included with the OS. It's available here:
Brina, thanks for the correction. I wrote adm and meant adb; I guess I shouldn't try to type here and in two other places at once.
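mdb reads dcmds from standard input when it isn't attached to a terminal, so a first look at a dump can be scripted. A sketch, assuming bash and an uncompressed unix.N/vmcore.N pair; the helper name `summarize_dump` is hypothetical. `::status` prints the dump summary including the panic message, `::msgbuf` the console messages leading up to the panic, `::panicinfo` the CPU, thread and registers at panic time, and `$C` the stack of the panicking thread.

```shell
# Hypothetical helper: run a few standard mdb dcmds against dump N.
# Assumes bash and an uncompressed unix.N/vmcore.N pair in DIR.
summarize_dump() {
    n=${1:-0}
    dir=${2:-/var/crash/$(uname -n)}
    if [ ! -f "$dir/unix.$n" ] || [ ! -f "$dir/vmcore.$n" ]; then
        echo "no unix.$n/vmcore.$n pair in $dir" >&2
        return 1
    fi
    # ::status    - dump summary, including the panic message
    # ::msgbuf    - console messages leading up to the panic
    # ::panicinfo - CPU, thread and registers at panic time
    # $C          - stack backtrace of the panicking thread
    printf '%s\n' '::status' '::msgbuf' '::panicinfo' '$C' |
        mdb "$dir/unix.$n" "$dir/vmcore.$n"
}
```

For example, `summarize_dump 0 > /var/tmp/dump0.txt` keeps the output so it can be attached to a support case.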
Your best bet would be the ALOM logs; also try dmesg if the server is still up.
See if you got a crash dump; by default it is saved in /var/crash/<hostname>.
When a Solaris machine goes down:

1. Look at the physical box: bad drives, faceplate error codes, power supply error lights.
2. Check the BACK of the box for electrical smells or missing/snagged power cables.
3. Boot the box from a Solaris CD or emergency disk, mount the boot drive, and save the system logs.
4. Note the memory and the visible installed cards, save the prtdiag and prtconf output, and run a memory test.
5. Specifically, check your boot drive for available space, and check its MIRROR, if it has one.
6. Check access to each drive, and check dmesg for any current errors in the kernel message buffer.
7. Check the SAN or NAS drive config files for the correct parameters and availability.
8. Boot from the original boot drive and check the application and system logs, including /var/adm/messages.
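Steps 4 to 6 can be captured in one pass once the box is up, so the output survives a reinstall and can be sent to support. A sketch, assuming bash and root's usual PATH (several of these commands live in /usr/sbin on Solaris); the helper name `collect_diag` and the default output location are hypothetical. `metastat` covers a Solaris Volume Manager mirror and `zpool status` a ZFS root; whichever is not installed is noted and skipped.

```shell
# Hypothetical helper: save each diagnostic command's output to its
# own file under OUTDIR. Assumes bash; commands not installed on this
# host (metastat vs. zpool, for example) are noted and skipped.
collect_diag() {
    outdir=${1:-/var/tmp/diag.$(date +%Y%m%d%H%M%S)}
    mkdir -p "$outdir" || return 1
    # Each line: <output file name> <command and arguments>
    while read -r name cmd; do
        bin=${cmd%% *}
        if command -v "$bin" >/dev/null 2>&1; then
            # $cmd is left unquoted on purpose so it splits into words;
            # a non-zero exit status is recorded at the end of the file
            $cmd </dev/null >"$outdir/$name.txt" 2>&1 ||
                echo "exit status $?" >>"$outdir/$name.txt"
        else
            echo "skipped: $bin not found" >"$outdir/$name.txt"
        fi
    done <<'EOF'
prtdiag prtdiag -v
prtconf prtconf -vp
df df -k
metastat metastat
zpool zpool status
iostat iostat -En
dmesg dmesg
EOF
    echo "$outdir"
}
```

For example, `collect_diag /var/tmp/diag.case1` writes one file per command and prints the directory it used.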
LOM:
Run "showenvironment" and check whether anything has failed.
Run "showlogs" and look for any errors.

OS:
Check the /var/adm/messages file and search for "SunOS" (the boot banner); the entries just before that line show what was happening before the reboot.
prtconf -vp | grep reboot 
prtdiag -v (check whether anything has failed)
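A sketch of that search, assuming bash and that the boot banner contains the string "SunOS Release" (as in "SunOS Release 5.10 Version ..."); the helper name `pre_boot_context` is hypothetical. It prints the lines logged just before the most recent banner.

```shell
# Hypothetical helper: print the last N lines logged before the most
# recent "SunOS Release" boot banner in a syslog file. Assumes bash.
pre_boot_context() {
    file=${1:-/var/adm/messages}
    lines=${2:-40}
    # Line number of the last boot banner in the file
    last=$(grep -n 'SunOS Release' "$file" | tail -1 | cut -d: -f1)
    if [ -z "$last" ]; then
        echo "no boot banner in $file" >&2
        return 1
    fi
    # Banner is the first line: nothing was logged before it
    if [ "$last" -le 1 ]; then
        return 0
    fi
    start=$((last - lines))
    if [ "$start" -lt 1 ]; then
        start=1
    fi
    # Print from START up to, but not including, the banner line
    sed -n "${start},$((last - 1))p" "$file"
}
```

If the reboot was a while ago, the banner may have rotated into /var/adm/messages.0 or an older file; pass that file as the first argument, e.g. `pre_boot_context /var/adm/messages.0 60`.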

If there is panic information in /var/adm/messages, a crash dump may have been generated in /var/crash/; contact Sun Microsystems for a root-cause analysis.
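To look for that panic information, search the current and rotated log files. A sketch, assuming bash and the usual rotated names messages.0, messages.1 and so on; the helper name `find_panics` is hypothetical. After a panic, savecore normally logs a line containing "reboot after panic" followed by the panic string, which is worth quoting in the support case.

```shell
# Hypothetical helper: print every line mentioning "panic" (any case)
# in the current and rotated syslog files, prefixed with the file name.
find_panics() {
    logdir=${1:-/var/adm}
    for f in "$logdir"/messages "$logdir"/messages.[0-9]*; do
        [ -f "$f" ] || continue   # skip unmatched globs and missing files
        grep -i 'panic' "$f" | while IFS= read -r line; do
            printf '%s: %s\n' "$f" "$line"
        done
    done
}
```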

Design by BABU | Dedicated to grandfather | welcome to BABU-UNIX-FORUM