Simplify large-scale server shutdowns
This month's script, GroupedShutdowns, is dedicated to those hard-working systems administrators who've had to manually reboot all their servers in the middle of the night. Manually accessing the console on numerous servers can take time and cause mistakes. Even writing a script that performs the shutdown and restart via your favorite command-line tool can fail if nodes don't respond to the Shutdown command. A properly designed script can come to the rescue by making this repetitive task easier and the results accurate and verifiable.
The GroupedShutdowns script can help you manage a sequenced shutdown and restart of many nodes. You can use any of several available command-line tools to perform remote shutdowns and restarts. However, issuing shutdown/restart commands to your entire server environment simultaneously can cause some obvious problems. I've found that a two-tiered approach works best. Such an approach lets you shut down and restart clustered or load-balanced servers sequentially, so that resources remain online on at least one node throughout the reboot process. This method also lets you reboot file or database servers first, then reboot the Web servers that depend on file-server virtual directories or databases.
How GroupedShutdowns Works
GroupedShutdowns' main tasks are to shut down and restart all the servers in an input list. The shutdowns occur in two groups; the first group of servers must all be back online before the second group reboots. After all the servers are online, the script verifies that the Shutdown command has worked.
The excerpt from GroupedShutdowns in Listing 1 shows the essential parts of the script. The script performs these basic steps:
- Divide the input list into two groups of reboot targets.
- Verify the input list reboot targets by checking whether each server in the list can be pinged, as the code at callout A in Listing 1 does.
- Shut down the first group of servers by issuing them the Shutdown command. Pause the script for 120 seconds to let users save their work. The code at callout D invokes the Shutdown command to perform the shutdown. (I discuss Shutdown and the other commands that GroupedShutdowns uses in the next section.)
- After a 4-minute timeout (this value is configurable), begin testing whether the first group's nodes are online again. If the nodes are online, proceed to the next group of servers. If any node remains offline after the initial 4-minute timeout, the script performs four additional 1-minute tests. After these tests, if any nodes still remain offline, the script notifies the operator and aborts the run.
- Shut down the second group of servers by issuing them the Shutdown command. After a 4-minute timeout (this value is configurable), begin testing whether the second group's nodes are online again. If the nodes are online, proceed to the final verification. If any node remains offline after the initial 4-minute timeout, the script performs four additional 1-minute tests, as it does for the first group. If any nodes remain offline after these tests complete, the script notifies the operator and aborts the run.
- Perform a final verification that all nodes in the two groups responded to the Shutdown command by logging the uptime on each node, as the code at callout E does.
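The sequence above can be sketched in a few lines of Python. This is only an illustration of the logic, not the GroupedShutdowns batch code from Listing 1: the function names, and the idea of passing in the shutdown, ping, and sleep routines as parameters, are assumptions made so that the flow can be tried without touching real servers.

```python
import time

def wait_until_online(nodes, is_online, sleep=time.sleep,
                      initial_wait=240, retries=4, retry_wait=60):
    """Return the nodes still offline after the initial wait plus the retries."""
    sleep(initial_wait)                      # the configurable 4-minute timeout
    offline = [n for n in nodes if not is_online(n)]
    for _ in range(retries):                 # up to four additional 1-minute tests
        if not offline:
            break
        sleep(retry_wait)
        offline = [n for n in offline if not is_online(n)]
    return offline

def reboot_in_groups(groups, shutdown, is_online, sleep=time.sleep):
    """Reboot each group in turn; abort if a group does not come back online."""
    for number, nodes in enumerate(groups, start=1):
        for node in nodes:
            shutdown(node)                   # hypothetical wrapper around the Shutdown command
        offline = wait_until_online(nodes, is_online, sleep)
        if offline:
            # Notify the operator and stop before touching the next group.
            raise RuntimeError(
                f"Group {number}: still offline: {', '.join(offline)}")
```

Because the second group is processed only after wait_until_online returns an empty list for the first group, a node that never returns stops the run before any dependent server is rebooted.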
Notice that GroupedShutdowns performs a couple of repetitive tasks: It pings some or all of the nodes at three different times and issues the Shutdown command once for each of the two server groups. The script also needs to be able to abort so that the operator can solve a problem, then restart where it left off. Say a pinged node hasn't returned online and the systems administrator has to stop the script and manually reboot the node. After the node is online again, the administrator needs a mechanism to restart the script from that point. The challenges of repeating tasks within a script and restarting a script at a particular point provide an opportunity to introduce two handy scripting techniques, multiple-usage routines and entry-point labels, which I explain in the sidebar "New Tricks for Your Scripting Tool Belt."
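To make the restart idea concrete, here is a small Python sketch of resuming a run at a named stage. It isn't the sidebar's batch technique; the stage names and the run_from function are hypothetical, chosen only to show the general shape of an entry point combined with routines that are reused from stage to stage.

```python
# Hypothetical stage names, listed in the order the run performs them.
STAGES = ["verify", "group1", "group2", "uptime"]

def run_from(start_at, handlers, stages=STAGES):
    """Run each stage's handler in order, skipping the stages before start_at."""
    if start_at not in stages:
        raise ValueError(f"Unknown stage: {start_at}")
    for stage in stages[stages.index(start_at):]:
        handlers[stage]()   # each handler can call the same shared ping or shutdown routine
```

An operator who has manually fixed a stuck node in the first group could then restart the run with run_from("group2", handlers) rather than rebooting the first group again.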
Tools That GroupedShutdowns Uses
GroupedShutdowns calls three utilities: Sleep (sleep.exe), Uptime (uptime.exe), and Shutdown (shutdown.exe). Sleep, which has been around since the Microsoft Windows NT Server 4.0 Resource Kit, makes a script wait for a specified number of seconds. GroupedShutdowns calls sleep.exe to pause the script, for example during the 4-minute timeout. Sleep is available in the Microsoft Windows Server 2003 Resource Kit, which you can download at http://www.microsoft.com/downloads/details.aspx?familyid=9d467a69-57ff-4ae7-96ee-b18c4790cffd&displaylang=en.
Uptime also dates back to the NT 4.0 resource kit. Uptime isn't included in recent resource kits, but you can download it at http://www.microsoft.com/ntserver/nts/downloads/management/uptime. Uptime lets you determine when a server was last rebooted. GroupedShutdowns calls uptime.exe to determine whether the targeted servers actually rebooted from the remote shutdown/restart command. By examining the server uptime information in the uptime.txt log file, you can verify that the servers all rebooted correctly.
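The reasoning behind the uptime check is simple: a server that really rebooted must report an uptime shorter than the time that has passed since the Shutdown command was issued. The Python sketch below shows only that comparison. It assumes the uptimes have already been converted to seconds, and it doesn't parse uptime.exe output; the function name is hypothetical.

```python
def find_unrebooted(uptimes, seconds_since_shutdown):
    """Return the servers whose uptime is too long for them to have rebooted.

    uptimes maps a server name to its reported uptime in seconds.
    """
    return sorted(
        server for server, uptime in uptimes.items()
        if uptime >= seconds_since_shutdown
    )
```

For example, if the Shutdown command went out 15 minutes (900 seconds) ago, a server reporting 10 days of uptime clearly ignored the command, whereas one reporting 5 minutes did not.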
Shutdown, which lets you shut down and reboot a PC, is a built-in command in Windows 2003 and Windows XP. To view the command's switch options, at a command line enter
shutdown /?
Be careful not to omit the /? switch; if you do, the command will initiate a shutdown of your PC or server. If you initiate an unintended shutdown and want to stop it, you can abort the shutdown by running the command
shutdown -a
I tested the GroupedShutdowns script on a Windows XP Service Pack 2 (SP2) PC. Consequently, GroupedShutdowns uses the XP version of shutdown.exe, which differs slightly from the Windows 2003 version. Be sure you examine the nuances of the switches on the version of shutdown.exe you're using to determine the correct syntax for your situation. The major difference you'll find is that some versions precede switches with a dash (-), whereas others precede switches with a slash (/).
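One way to cope with the dash-versus-slash difference is to build the command line in one place and make the switch prefix a setting. The Python sketch below is an illustration only: the function name and defaults are assumptions, and although the r (restart), m (remote computer), t (delay in seconds), and c (comment) switches exist in the XP and Windows 2003 versions, you should confirm them against the /? output of the version you actually run.

```python
def build_shutdown_args(server, prefix="-", delay=120, message="",
                        exe="shutdown.exe"):
    """Build an argument list for a remote restart using the given switch prefix."""
    if prefix not in ("-", "/"):
        raise ValueError("prefix must be '-' or '/'")
    args = [exe,
            f"{prefix}r",                  # restart after shutting down
            f"{prefix}m", rf"\\{server}",  # target a remote computer
            f"{prefix}t", str(delay)]      # seconds before the shutdown starts
    if message:
        args += [f"{prefix}c", message]    # comment shown to logged-on users
    return args
```

The resulting list could be handed to subprocess.run. Passing a full path as exe also avoids picking up an unexpected copy of the tool.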
For those who prefer a third-party shutdown tool, Sysinternals offers PsShutdown (http://www.sysinternals.com/utilities/psshutdown.html), which provides several additional switch options. You can substitute PsShutdown for shutdown.exe in the GroupedShutdowns script.
Depending on what tools you have installed on a PC and how they were installed, you could have more than one version of a tool on your testing PC. If you specify only the filename at a command line or in your scripts, Windows runs the version it finds first in the path, which might not be the version you expected. If your PC has more than one version of shutdown.exe, as my PC did, the command could fail because you specified switch options that the version being run doesn't support. Specifying a full path to the tool's location (e.g., C:\resourcekit\shutdown.exe) avoids mistakenly running a version that has a different syntax. If you aren't sure whether your PC has duplicate versions of a tool, use the Windows 2000 Server Resource Kit Where utility (where.exe) to find every copy of shutdown.exe in the PATH variable.
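The duplicate-version check can also be done without the Where utility. The following Python sketch walks the directories in the PATH variable in order and lists every copy of a file it finds. The function name is hypothetical and the code is a stand-in for where.exe, not a description of how that utility works.

```python
import os

def find_on_path(filename, path=None):
    """Return every copy of filename found in the PATH directories, in search order."""
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue                          # skip empty PATH entries
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate) and candidate not in matches:
            matches.append(candidate)
    return matches
```

The first entry returned by find_on_path("shutdown.exe") is the copy that runs when you specify only the filename; any further entries are duplicates that might use a different switch syntax.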
This script assumes that sleep.exe, uptime.exe, and shutdown.exe are in the path on your testing PC. If they aren't in the path, you'll need to specify their location in the script code. To get the GroupedShutdowns script working in your environment, follow these steps:
- Download the script from the Windows Scripting Solutions Web site. Go to http://www.windowsitpro.com/windowsscripting, enter 48416 in the InstantDoc ID text box, then click the 48416.zip hotlink.
- Configure the reboot-target input-list location. You can include spaces in this pathname. The input file should list one server per line. On each line in the input file, include the group number (1 or 2), a comma, and the server name, as you can see in the sample input file that Figure 1 shows.
- Configure the location of the log file that will contain the results of the final uptime test. You can include spaces in this pathname.
- Configure the number of seconds for the timeout duration after shutdown and before the Ping test begins. A good starting setting is 240 seconds. The point of using a timeout is to give the server hardware time to actually shut down. Remember that the server doesn't actually shut down until 120 seconds after the Shutdown command is issued. If the timeout is too short, the server might not shut down in time and could still respond to the Ping test, giving you the false impression that the server is back online when it's actually still shutting down. The server's Ping response could cause the script to start rebooting the second group of servers before the first group is actually back online. You might need to increase the timeout value if you have applications or services that take a long time either to allow the server to shut down or to return online.
- Customize the shutdown code if you need a different message or arguments or if you need to configure a shutdown reason.
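Two of these configuration steps lend themselves to a quick sanity check before any server is touched. The Python sketch below reads an input list in the format described above (group number, comma, server name) and rejects a timeout that doesn't exceed the 120-second shutdown delay. The function names and the sample server names are hypothetical, and the code isn't taken from the GroupedShutdowns script.

```python
def parse_reboot_targets(lines):
    """Split 'group,server' lines into {1: [...], 2: [...]}, ignoring blank lines."""
    groups = {1: [], 2: []}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        number, _, server = line.partition(",")
        number, server = number.strip(), server.strip()
        if number not in ("1", "2") or not server:
            raise ValueError(f"Bad input line: {raw!r}")
        groups[int(number)].append(server)
    return groups

def check_timeout(timeout_seconds, shutdown_delay=120):
    """Raise if the Ping test could start before the servers have begun shutting down."""
    if timeout_seconds <= shutdown_delay:
        raise ValueError(
            f"Timeout of {timeout_seconds}s must exceed the "
            f"{shutdown_delay}s shutdown delay")

# Hypothetical server names, for illustration only.
sample = ["1,FILESRV1", "1,SQLSRV1", "2,WEBSRV1"]
targets = parse_reboot_targets(sample)
check_timeout(240)   # the suggested starting value passes
```

A timeout that only just exceeds the shutdown delay still risks the false "online" reading described above, so treat this check as a minimum rather than a recommendation.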
Making Shutdowns More Fun
As you've seen, the GroupedShutdowns script makes the process of shutting down and restarting multiple servers much easier to manage and, as an added benefit, gives you two new techniques for your scripting toolkit. Start using the script, and you'll soon find yourself spending less time on multiple-server shutdowns and getting the job done with greater accuracy.
Dick Lewis (firstname.lastname@example.org) is a senior systems engineer with Lewis Technology in Riverside, California. He is an MCSE and an MCT specializing in enterprise management of Windows Server 2003, Windows 2000, and Windows NT servers and workstations.