Posted by: kezhong | January 14, 2012

An easy way to find out which servers rebooted after a blackout

When I got to work on Thursday morning, my colleague on night duty said there had been a one-second power outage that caused some of the servers to reboot. My manager wanted to know which servers had rebooted so that he could arrange to replace the UPS. To find out, we could log on to each server and check its uptime, but with hundreds or thousands of servers to check, that is tedious and inefficient.

Fortunately, SNMP was already installed on our servers so that we could monitor their status with Nagios. Knowing how SNMP works, I wrote a small script to check the uptime of each server.

#!/bin/bash

NET=192.168.1
COM=public

for i in `seq 2 254`
do
  echo -n "$NET.$i == "
  snmpget -v 1 -c $COM $NET.$i system.sysUpTime.0
done

Running this script produces the following result:

192.168.1.2 == DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (1601584) 4:26:55.84
192.168.1.3 == DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (1720208093) 199 days, 2:21:20.93
192.168.1.4 == DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (1600525) 4:26:45.25
192.168.1.5 == DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (1600574) 4:26:45.74
192.168.1.6 == DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (1096018961) 126 days, 20:29:49.61
192.168.1.7 == DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (689354819) 79 days, 18:52:28.19
192.168.1.8 == DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (1648344556) 190 days, 18:44:05.56
192.168.1.9 == DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (2357055895) 272 days, 19:22:38.95
192.168.1.10 == DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (2357056246) 272 days, 19:22:42.46
192.168.1.11 == DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (2356532375) 272 days, 17:55:23.75
192.168.1.12 == Timeout: No Response from 192.168.1.12.
192.168.1.13 == DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (2357103041) 272 days, 19:30:30.41
192.168.1.14 == DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (1752188152) 202 days, 19:11:21.52
192.168.1.15 == DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (2357105237) 272 days, 19:30:52.37
… …
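
The number in parentheses is the raw Timeticks value, counted in hundredths of a second, and the text after it is the same value in readable form. As a rough sketch of that relationship, the function below converts a tick count back into the printed format. The name ticks_to_uptime is a hypothetical helper for illustration, not part of Net-SNMP, and the singular "day" form is an assumption since the output above only shows "days".

```shell
#!/bin/bash

# Convert an SNMP Timeticks value (hundredths of a second) into the
# "D days, H:MM:SS.cc" form seen in the snmpget output above.
# ticks_to_uptime is a hypothetical helper, not a Net-SNMP command.
ticks_to_uptime() {
  local ticks=$1
  local cs=$(( ticks % 100 ))               # leftover hundredths of a second
  local total=$(( ticks / 100 ))            # whole seconds
  local days=$(( total / 86400 ))
  local hours=$(( (total % 86400) / 3600 ))
  local mins=$(( (total % 3600) / 60 ))
  local secs=$(( total % 60 ))

  # Only print the day part when the host has been up at least one day.
  # The singular form is an assumption; the sample output shows only "days".
  if [ "$days" -eq 1 ]; then
    printf '1 day, '
  elif [ "$days" -gt 1 ]; then
    printf '%d days, ' "$days"
  fi
  printf '%d:%02d:%02d.%02d\n' "$hours" "$mins" "$secs" "$cs"
}
```

For example, ticks_to_uptime 1601584 prints 4:26:55.84, which matches the line for 192.168.1.2 above.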

Improve the script to list only the servers that rebooted less than one day ago:

#!/bin/bash

NET=192.168.1
COM=public

for i in `seq 2 254`

do
  if snmpget -v 1 -c $COM $NET.$i system.sysUpTime.0 2>/dev/null|grep -v day > /dev/null
  then
    echo -n "$NET.$i == "
    snmpget -v 1 -c $COM $NET.$i system.sysUpTime.0 | grep -v day | awk '{print $5}'
  fi
done

Running it again gives this result:
192.168.1.2 == 4:50:23.73
192.168.1.4 == 4:50:13.28
192.168.1.5 == 4:50:13.69
192.168.1.18 == 4:49:47.37
192.168.1.19 == 4:50:09.00
192.168.1.22 == 4:49:38.64
192.168.1.28 == 4:49:47.55
192.168.1.31 == 4:50:53.22
192.168.1.42 == 4:50:00.35
192.168.1.62 == 4:51:41.12
192.168.1.80 == 4:50:31.65
192.168.1.110 == 4:50:39.55
192.168.1.132 == 4:51:21.66
192.168.1.143 == 4:51:27.97
192.168.1.150 == 4:50:41.75
192.168.1.166 == 4:51:32.90
192.168.1.171 == 4:50:49.12
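
Filtering on the word "day" only separates hosts that have been up for less or more than 24 hours. If a different cutoff is needed, one possible approach is to compare the raw Timeticks number against a threshold instead. The sketch below assumes the output format shown earlier; extract_ticks and rebooted_within are hypothetical helper names, not part of Net-SNMP.

```shell
#!/bin/bash

# Pull the raw Timeticks number out of one line of snmpget output,
# e.g. "... = Timeticks: (1601584) 4:26:55.84" gives 1601584.
# Prints nothing if the line contains no Timeticks value.
extract_ticks() {
  printf '%s\n' "$1" | sed -n 's/.*Timeticks: (\([0-9][0-9]*\)).*/\1/p'
}

# Succeed if the line reports an uptime shorter than max_seconds.
# Timeticks count hundredths of a second, hence the factor of 100.
rebooted_within() {
  local line=$1
  local max_seconds=$2
  local ticks
  ticks=$(extract_ticks "$line")
  [ -n "$ticks" ] && [ "$ticks" -lt $(( max_seconds * 100 )) ]
}

# Intended use inside the scan loop (needs snmpget and reachable hosts):
#   out=$(snmpget -v 1 -c "$COM" "$NET.$i" system.sysUpTime.0 2>/dev/null)
#   rebooted_within "$out" 86400 && echo "$NET.$i == $out"
```

With a threshold of 86400 seconds this should select the same hosts as the grep -v day version, and it calls snmpget only once per host.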

Responses

  1. Nice post!

    The loop..

    for i in `seq 2 254`

    …causes a subshell to be created. This does the same thing without the subshell:

    for i in {2..254}

    But you can also use the C-style for loop:

    for ((i=2; i<=254; i++))

