The past two years have been a personal journey for me in learning how to develop scripts that automate processes and produce reports. The journey has been one of discovering not only the typical stages of script development but also the environment in which a scriptwriter works. To develop a script, you typically go through three stages: data capture, data manipulation, and data presentation. Before I explain these stages, I'll take a lighthearted look at the environment that will result as you become adept at writing scripts.

The Life of a Scriptwriter
You face an inherent danger in being a scriptwriter. After you produce a few scripts that successfully capture and present data or automate processes, you are a marked person in your organization. Others will call on you to write not only more scripts but also increasingly difficult ones.

The process for defining a script's requirements is typically informal, making your task even more difficult. For example, at a meeting, your boss might formulate a script's requirements and scope and confidently tell a client or upper manager, "Sure, we can write a script that does that." Afterward, your boss will turn to you and ask, "We can write a script to do that, right?" Because this process is so informal, you can count on several events occurring:

  1. The script's requirements will change.
  2. The script's scope will expand dramatically.
  3. If your boss defines the requirements at an 8 a.m. meeting, No. 1 and No. 2 will occur before lunchtime.

After your boss has defined the requirements, you need to start writing code as quickly as possible before those requirements change. In anticipation of changes, you need to include comments in your scripts. Comments can help you recall what functions each statement is performing when you have to change the code later. Comments can also serve as screen debug output.

The Three Stages of Script Development
Writing a script involves three stages: data capture, data manipulation, and data presentation. The following looks at what each stage entails.

Data capture. The first stage of the process is to decide how you want to capture the data that the report is to include. You have many tools to choose from, including native Windows NT commands, Microsoft Windows NT Server 4.0 Resource Kit utilities, and third-party products. I suggest that you become familiar with the resource kit utilities, because they can provide many different types of tools for capturing data.

Data manipulation. The resource kit utilities' output is seldom usable without some manipulation. For example, the output might include data that you don't need or duplicate data. If you need to manipulate the data, you must decide what type of manipulation you need to perform. The types of data manipulation include recursion, repetition, and formatting.

Recursion is vertical manipulation (i.e., manipulation down the folder hierarchy). For example, a script that uses the DIR command to list files in a directory moves vertically down through the folder structure.

Repetition is horizontal manipulation (i.e., manipulation across the top-level folders). For example, a script that lists only the top-level folders needs to move horizontally across the folder structure. To achieve repetition in this case, you can repeatedly use the DIR command sideways across the folder structure.

The distinctions between recursion and repetition are sometimes subtle. But you need to control the flow through the folder structure, especially if you're automating complex tasks. Otherwise, your script might stop without executing all the operations or your script might execute operations out of the intended scope. A runaway script that uses the DIR command won't do any damage other than corrupt output data, but a runaway script executing the DEL command can lead to disaster.

Both recursion and repetition can be static or dynamic. In other words, the boss can give you a static list of top-level folders to build reports for or you can dynamically capture the list of each top-level folder every time the script runs.

Formatting is the manipulation of raw captured data by inclusion or filtering. Let's look at inclusion first. Suppose you write a script that generates a log file any time a file or application server goes offline. You want to use this information to prepare a report on server uptime for management. The information you send to the report file needs to include the server's name and the time, date, and nature of the failure.

As you begin to write scripts, you probably will have to return to the data capture stage and gather additional information. Until you become more experienced, consider adding these informational elements in the output:

  • Script's filename
  • Date and time of the script's execution
  • Username of the person who ran the script
  • Computer executing the script
  • Header and footer information

Often, the original requirements won't specify the inclusion of these elements, but people usually request them later. You can eliminate work later by capturing these elements at the outset.
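
As a rough illustration, the following sketch writes most of these elements to the top of a report. The report.txt filename and the Report variable are assumptions made for this example; USERNAME and COMPUTERNAME are standard NT environment variables, and DATE /T and TIME /T display the current date and time without prompting for new values.

```batch
@ECHO OFF
SETLOCAL
REM Hypothetical report file used only for this sketch.
SET Report=%TEMP%\report.txt

REM %0 holds the name the script was started with.
>"%Report%" ECHO Script:   %0
>>"%Report%" ECHO Run by:   %USERNAME%
>>"%Report%" ECHO Computer: %COMPUTERNAME%

REM Capture the one-line output of DATE /T and TIME /T.
FOR /F "tokens=*" %%D IN ('DATE /T') DO >>"%Report%" ECHO Date:     %%D
FOR /F "tokens=*" %%T IN ('TIME /T') DO >>"%Report%" ECHO Time:     %%T
ENDLOCAL
```

Placing the redirection in front of ECHO is a defensive habit: a value that ends in a digit can't be mistaken for a redirection handle number.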

You might need to filter your raw output if it includes information (e.g., characters, words, lines) that you don't need or that you need to replace. For example, if you use the PING command to see whether a server responds, the result includes several lines of information that aren't important. Only the first word on the fourth line is important. This word might be Reply, Request, or Destination. Reply means the response is positive, Request means the request timed out, and Destination means the host isn't reachable. If you're producing a report, you need to capture only this ping result by filtering out the noncritical data. Then, you need to replace the word Reply with Available, Request with Not Available, and Destination with Not Reachable. These manipulations will make the data presentation step easier.
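
One way to sketch this filter-and-replace step is shown below. It assumes that the PING output matches the description above (i.e., the result line begins with Reply, Request, or Destination); other versions of PING might word their results differently. The Word and Status variable names are made up for this example.

```batch
@ECHO OFF
SETLOCAL
REM Usage: pass the server name as the first parameter.
SET Word=
SET Status=Unknown

REM Send one echo request and keep only the line that begins with
REM Reply, Request, or Destination; capture that line's first word.
FOR /F "tokens=1" %%W IN ('PING -n 1 %1 ^| FINDSTR /B "Reply Request Destination"') DO SET Word=%%W

REM Replace the raw word with report-friendly text.
IF "%Word%"=="Reply" SET Status=Available
IF "%Word%"=="Request" SET Status=Not Available
IF "%Word%"=="Destination" SET Status=Not Reachable

ECHO %1: %Status%
ENDLOCAL
```

If none of the three words appears, the status stays Unknown, which flags output that the filter didn't anticipate.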

Data presentation. Almost all changes in a scripting project's scope occur at this level. Initially, users just want to have the information and don't care about how you present it. However, after you produce the result, their response might be, "Great job! Now can you make it look presentable for the client?" or "This report will provide valuable information for employees. Can you put it into HTML so we can post it on the intranet?"

Now that you know the stages, I'll guide you through a simple real-world example. I'll show you how to create DirSizeReport.bat, a script that generates a report on folder sizes. DirSizeReport.bat uses NT shell scripting commands. (If you're unfamiliar with these native NT commands, see Tim Hill, Windows NT Shell Scripting, Macmillan Technical Publishing, 1998.) Although the commands I use are specific to scripts that generate reports, you can apply the scripting principles to any type of script.

A Real-World Example
Suppose you have a file server that's nearing its storage capacity. Your boss wants to determine usage trends for this server. He or she instructs you to create a report that identifies each top-level folder, the folder's size, and how many files it contains. Your boss also wants the report to include data about the entire directory's size. The report needs to be in simple text so that everyone can read it.

Because the report you need to produce is simple, you need only two procedure modules: one that provides dynamic recursion and another that provides dynamic repetition. You can think of a procedure module as a logical grouping of operations. If you develop a section of code that captures the time and a section that captures the date, you can combine these two sections into a procedure module because they are logically connected. Any time you're writing a script that requires date and time input, you can paste in that section of code; you don't need to waste time reinventing the wheel. Another advantage of using procedure modules is that modules make it easier for you to comment your code and for other people to understand and update your work.

As you write scripts, you'll build a library of reusable procedure modules. Eventually, writing a new script might be as easy as combining previously proven modules.

Data Capture
You decide to use the diruse.exe resource kit utility to capture data because it supplies information about how much disk space each folder in a directory uses. This utility has 13 switches:

  • The /* switch tells diruse.exe to use the top-level directories.
  • The /o switch tells diruse.exe not to check subdirectories for size overflow.
  • The /v switch displays progress reports while scanning subdirectories.
  • The /s switch includes subfolders in the output.
  • The /b switch displays disk usage in bytes (default).
  • The /k switch displays disk usage in kilobytes.
  • The /m switch displays disk usage in megabytes.
  • The /, switch displays the thousands separator in file sizes.
  • The /c switch tells diruse.exe to use compressed file size instead of the apparent file size.
  • The /q:# switch uses an exclamation point character (!) to mark folders that exceed the specified size (#).
  • The /a switch generates an alert if the folder exceeds the size that the /q:# switch specifies.
  • The /d switch displays only those folders that exceed the size that the /q:# switch specifies.
  • The /l switch logs oversized directories in the diruse.log file.

If you're unfamiliar with diruse.exe, try experimenting with all the switches. When you're deciding which switches to use, keep in mind how you will use the output. For example, using the /, switch can make it easier for you to read file sizes but can create havoc if you plan to store the output in a Comma Separated Values (.csv) file, because each thousands separator looks like a field boundary. For DirSizeReport.bat, you decide to use the utility's /s switch to give you dynamic recursion, the /m switch to display disk usage in megabytes, and the /, switch to put the thousands separator in the folder sizes. (If you're using the /m switch to measure small folders, you might experience incremental rounding errors.)

Data Manipulation
After you select the data capture method, you need to test it to see whether you need to manipulate the data. To test your switches, run the following command at the command prompt:

DIRUSE /s /m /, "e:\Internet Explorer 4.0">>%TEMP%\output.txt

The "e:\Internet Explorer 4.0" string specifies the directory path to check. I recommend that you pick a small test target, such as the Internet Explorer (IE) 4.0 folder, to see whether the initial results are positive. The double greater-than operator (>>) appends the output to the output.txt file in the TEMP folder, creating the file if it doesn't exist.

Figure 1 shows the output from this initial test. You need to examine the output to see whether you get the data you need and whether you can eliminate any unnecessary data. As the highlighted lines in Figure 1 show, you can filter out the lines with a size of 0.00 and the line that contains duplicate information.

You can use the FINDSTR command to filter out these lines. This command, which has 15 switches, searches for strings in files. After trial and error with the FINDSTR command, you determine that the following command filters out the unwanted lines:

DIRUSE /s /m /, "e:\Internet Explorer 4.0" | FINDSTR /V "\<[0].[0][0]" | FINDSTR /V "SUB-TOTAL">>%TEMP%\output.txt

In the FINDSTR /V "\<[0].[0][0]" filter, you're using the /V switch to tell FINDSTR to display only the lines in which no word begins with 0.00. (The \< sequence anchors the match to the beginning of a word, so a size such as 10.00 isn't filtered out.) In the FINDSTR /V "SUB-TOTAL" filter, you're using the /V switch to tell FINDSTR to display the lines that don't have the word SUB-TOTAL in them. After you filter out the highlighted lines, the result is the dynamic recursion module.

Although the diruse.exe utility's /s switch provides dynamic recursion (i.e., it captures data down the directory hierarchy), it doesn't provide dynamic repetition (i.e., it doesn't capture data across the directory hierarchy). Thus, you need to develop a dynamic repetition module by running the recursion module against each top-level folder. One way to list the top-level folder names is to use the DIR command, which lists files in a directory, with FINDSTR filters. This approach uses the code

DIR /O:N | FINDSTR "DIR" | FINDSTR /V "[.]" >>%TEMP%\output.txt

The /O:N switch tells DIR to sort the entries alphabetically by name. The FINDSTR filters pass along only those lines of the DIR output that contain the string "DIR" (FINDSTR "DIR") and that don't have a period (.) in them (FINDSTR /V "[.]"). The >>%TEMP%\output.txt section appends the output to the output.txt file.

Figure 2 contains the output for the filtered DIR command. Once again, you need to filter out unwanted data from the output in Figure 2. You can use the FOR command to parse a text file and extract sections of a line. Because explaining how to use this versatile command could fill a book, I'll discuss only the aspects of its use specific to the task at hand.

In your script, you use the FOR command

FOR /f "tokens=1,2,3,* delims= " %%I in ('DIR /O:N ^| FINDSTR "DIR" ^| FINDSTR /V "[.]"') DO ECHO %%L>>%TEMP%\output.txt

The file-parsing (/f) switch lets you use the FOR command to parse a file or the output of another command. In your case, you're parsing the DIR command output in Figure 2. Each line in the output has four sections or tokens. In the first line of the output, you see the line

10/16/97      09:35p      <DIR>      addins

The date, time, and <DIR> tokens are not of interest. (<DIR> specifies a directory and not a file.) You want to capture the fourth token, which contains the directory name. To capture this token, you specify "tokens=1,2,3,* delims= " in quotation marks. The asterisk in tokens=1,2,3,* specifies that you want to capture the rest of the line after the third token. You can also use the number 4 instead of an asterisk. However, if you use 4 and the directory name has any spaces, the command picks up the first word only. You use delims= to specify the delimiter that the command is searching for. Here, the space between delims= and the closing quotation mark makes the space character the delimiter. (If nothing follows delims=, the command uses no delimiter at all and treats each line as a single token.) If you omit delims= entirely, the default delimiter is a space or a tab. You can also specify another character as the delimiter, in which case you need to locate that character and count out to the desired token from there. For example, if the same output is in a .csv file and looks like

10/16/97,09:35p,<DIR>,addins

you use delims=, to select the token containing addins.
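
As a small sketch of the comma-delimited case, the following lines assume a hypothetical file named folders.csv in which every line holds four comma-separated fields (date, time, type, and name). The filename and the %%A variable are illustrative only.

```batch
@ECHO OFF
REM folders.csv is a hypothetical input file for this sketch.
REM delims=, splits each line at the commas;
REM tokens=4 hands only the fourth field to %%A.
FOR /F "tokens=4 delims=," %%A IN (folders.csv) DO ECHO %%A
```

If a name could itself contain a comma, tokens=1,2,3,* with DO ECHO %%D would capture the rest of the line instead of only the fourth field.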

The next part of the command—%%I—is the iterator variable. This variable serves as a placeholder for the captured information that you're going to look for in the output. You name the iterator variable using one letter and either one or two percent signs (%). The letter is case-sensitive, so you must be consistent with its usage throughout the script. If you run the command from the command prompt, you use only one percent sign. In a script, you must use two percent signs, or else the script will fail during runtime.

The material in parentheses after the iterator variable, in ('DIR /O:N ^| FINDSTR "DIR" ^| FINDSTR /V "[.]"'), identifies the output that the FOR command will parse. In this case, the output is the results of the filtered DIR command in Figure 2. (If you were using a static list to identify the directories that you want FOR to search, the material in parentheses might be a filename.) The use of single quotes specifies that the output is a command's results. The caret character (^) tells the FOR command to escape out any reserved characters, such as the pipe character (|) and the equals sign character (=). In other words, the reserved characters are passed along as part of the command inside the single quotes rather than being acted on when the FOR line itself is read.

Because you started with the %%I variable, you have four parsed tokens available (i.e., %%I, %%J, %%K, and %%L). However, you want only the fourth token, so you specify DO ECHO %%L. The ECHO command normally displays its results on screen, but in this example the >> operator redirects them to the output.txt file. You now have the dynamic repetition module.

With the recursion and repetition modules in hand, you can assemble them into your directory report creator. However, iterator variables (such as the %%L variable in the repetition module) are local to the command in which you use them, so if you want to use the information they capture in another command, you must set them to another variable. In your case, you need to set the %%L variable to the permanent Target_Directory variable, so that you can use the repetition information in the recursion module. You use the SET and CALL commands for this purpose:

FOR /f "tokens=1,2,3,* delims= " %%I in ('DIR /O:N ^| FINDSTR "DIR" ^| FINDSTR /V "[.]"') DO SET Target_Directory="%%L" & CALL :CREATE

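The SET Target_Directory="%%L" command assigns the value of the %%L variable to the Target_Directory variable. The quotation marks become part of the value, which keeps folder names that contain spaces intact when another command uses the variable. The CALL :CREATE command calls the CREATE procedure to write the directory report. The ampersand character (&) joins the two commands, which execute in sequence.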
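
To show how the two modules might fit together, here is a minimal sketch of the overall shape. It is not Listing 1; the body of the CREATE procedure is an assumption based on the recursion command developed earlier.

```batch
@ECHO OFF
SETLOCAL
REM Repetition module: run CREATE once for each top-level folder.
FOR /f "tokens=1,2,3,* delims= " %%I in ('DIR /O:N ^| FINDSTR "DIR" ^| FINDSTR /V "[.]"') DO SET Target_Directory="%%L" & CALL :CREATE
ENDLOCAL
GOTO :EOF

:CREATE
REM Recursion module: %Target_Directory% already carries its own
REM quotation marks, so it isn't quoted again here.
DIRUSE /s /m /, %Target_Directory% | FINDSTR /V "\<[0].[0][0]" | FINDSTR /V "SUB-TOTAL">>"%TEMP%\output.txt"
GOTO :EOF
```

GOTO :EOF returns from the CREATE procedure to the FOR command, which then moves on to the next folder.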

As part of the formatting, you add the script's filename, copyright information, development date, purpose, syntax, and debugging syntax as comments in the script. You also add the scriptwriter's name and email address.

Data Presentation
Because the script is for internal purposes only, you present the output in a text file. You also add some good housekeeping procedures and a few comments to make the script look impressive. The good housekeeping procedures you add include the following:

  • Giving the script a title that appears in the title bar.
  • Listing the author, date, and dependencies on other files or resource kit utilities.
  • Using code to make the script exit if it runs on a non-NT machine.
  • Using the SETLOCAL and ENDLOCAL commands to make any environment variables local to the script.
  • Deleting any temporary files the script uses to store information at the beginning and end of the script.
  • Providing Help interaction if you type help, /?, or ? as parameters.
  • Displaying an onscreen message that tells the user when the script has finished running.
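
A bare-bones skeleton that covers several of these housekeeping items might look like the following sketch. The title text, the Work_File variable, the temporary filename, and the Help wording are placeholders chosen for this example.

```batch
@ECHO OFF
REM Exit on non-NT systems, where the OS variable isn't Windows_NT.
IF NOT "%OS%"=="Windows_NT" GOTO END
SETLOCAL
TITLE Directory Size Report

REM Show Help if the first parameter is help, /?, or ?.
IF /I "%1"=="help" GOTO HELP
IF "%1"=="/?" GOTO HELP
IF "%1"=="?" GOTO HELP

REM Start with a clean temporary file (hypothetical name).
SET Work_File=%TEMP%\DirSizeReport.tmp
IF EXIST "%Work_File%" DEL "%Work_File%"

REM ... capture, manipulation, and presentation steps go here ...

IF EXIST "%Work_File%" DEL "%Work_File%"
ECHO The report is complete.
GOTO CLEANUP

:HELP
ECHO Usage: DirSizeReport.bat
ECHO Generates a report on folder sizes.

:CLEANUP
ENDLOCAL

:END
```

The operating system check comes first and uses only a plain IF and GOTO, so the later NT-specific commands are never reached on other systems.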

The result is the DirSizeReport.bat script in Listing 1. Figure 3 on page 4 shows the output you receive from running this script.

Listing 1 contains numerous comments to help you understand what is occurring in the script. I recommend that you remove these comments before running the script to optimize its performance.

A Job Well Done, But...
You've skillfully captured, manipulated, and formatted your script for presentation. You run the script and hand the resulting report to your boss. Your boss is pleased, because the report will help him or her manage the company's file structure and disk-space usage. Smiling, you go to lunch.

After lunch, however, you get a call to come to the IS director's office. "I like the output, but the report is just too plain," the director says. "Can you spruce the report up by putting the information in a table, affixing a time and date stamp, and adding the IS logo? Can you also post the report on the corporate intranet?"

You walk down the hall a few minutes later, wondering if you really heard yourself say "Yes" to all those requests. Next month, I'll show you how to meet these new requirements.