Customize this port scanner's XML output feature
In "Nmap," February 2007, InstantDoc ID 94848, we used the nmap port scanner to sweep the network to quickly find whether antivirus clients were installed and listening on their management port. Nmap performs the port-scanning job admirably. To get the most from nmap, let's now look at nmap's XML output feature and how custom Extensible Stylesheet Language Transformations (XSLT) program code lets you tailor the output exactly as you want it.
A Flexible Output Format
Command-line tools tend to offer more flexibility than their GUI counterparts simply because command-line tools were likely developed to be used with programs and scripts, which means you can customize these tools based on your own particular environment. Nmap is no exception--its robust command-line parameters let you fine-tune how you wish to run the tool. For example, you can choose your preferred output format from among the following formats: -oN (normal), -oX (XML), -oG (grepable), or -oS (scriptkiddie). (Yes, the last one is a joke--but it exists.)
The XML output is the most flexible format because it lets you define how the data looks. Alternatively, you can input the XML data into a variety of other programs (e.g., Microsoft Office Excel can import XML files). The preferred method of formatting XML data files is to use an XSLT file. XSLT files describe how an XML file should be rendered. The benefit of XSLT is that you can define the output once, then pass multiple XML files through the XSLT code to get repeatable and consistent output.
Diving into XML
To use nmap to detect whether Symantec Antivirus is running on a host, then output the results to an XML file, type the command
nmap -p 2967 -oX avStatus.xml 192.168.0.8
This creates a file named avStatus.xml similar to what Listing 1 shows. XML is self-describing, meaning that tags similar to HTML tags enclose the actual content, and are named for the content data. Usually you can match a data value to its name by simply looking at the XML code. For example, data about the machine the command ran on (i.e., the host) is enclosed in XML tags in this manner:
The text myComputerName is the element's text data, and the tags that enclose it name the element. XML also uses attributes that are defined within an element's opening tag:
<address addr="192.168.0.8" addrtype="ipv4" />
In this example, the element address has two attributes, addr and addrtype, but doesn't actually contain element text data. The /> is shorthand for closing the element and is equivalent, in this case, to typing </address>. The result is a data structure that, unlike a traditional database schema, you can read and interpret without knowing much about how the data is stored.
In Listing 1, we can see right away near the top of the file the input parameters used to run the program: The scanner is nmap and the args attribute shows the command-line parameters. We can also see other useful information such as when the scan was run.
The most important pieces of the XML data are the scan results themselves. In Listing 1, you can see code defining a scan. Notice that there are nested tags between the <host> and </host> tags. This code defines a node. Nodes contain subsets of XML data including elements, attributes, and even other child nodes. For example, the IP address of the host is stored in the address element as an attribute named addr. We can see that the host actively responded, as the status element's state attribute value is up. XML is also hierarchical, with parent nodes containing child nodes: The host element, for example, is a node that contains child nodes named address and ports.
Continuing with our visual inspection of Listing 1, we see that the port node is TCP 2967 and that the state is open. This is what we were looking for! The open state means that the AV client is listening on the network. (Well, we assume it's the AV program, but it could potentially be another program sharing that same port, however rare that situation might be.)
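If you don't have a scan handy, a hand-written stand-in lets you follow along. The sketch below is not Listing 1 itself: It's an abbreviated sample in the shape described above, the filename avStatus-sample.xml is made up for illustration, and real nmap output carries more elements and attributes than shown here.

```shell
#!/bin/sh
# Hand-written, abbreviated sample in the shape the article describes.
# Assumption: values are illustrative; real nmap output contains more detail.
cat > avStatus-sample.xml <<'EOF'
<?xml version="1.0"?>
<nmaprun scanner="nmap" args="nmap -p 2967 -oX avStatus.xml 192.168.0.8">
  <host>
    <status state="up" />
    <address addr="192.168.0.8" addrtype="ipv4" />
    <hostnames />
    <ports>
      <port protocol="tcp" portid="2967">
        <state state="open" />
      </port>
    </ports>
  </host>
</nmaprun>
EOF

# Quick visual check without any XML tooling: show the port line and its state line.
grep -E '<port |<state ' avStatus-sample.xml
```

The grep pattern includes the trailing space on purpose, so it skips the <ports> wrapper and the host-level <status> element.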
From XML to XSLT
Let's tie together the nmap subnet scanning method with the XML output format to create a script that shows us a list of all computers in a subnet that aren't listening on the antivirus software management port. In other words, we want a list of computers that need antivirus software installed.
To get this list, we'll create a script "wrapper" that calls the nmap program and then pipes (using the | character) the output to a program named xsltproc, which transforms the nmap XML into output that we've customized by using an XSLT file. Xsltproc is built into Mac OS X and can be downloaded and configured for other UNIX variants as well as for Windows. Xsltproc lets you perform XML transformations from the command line, which means you can use Xsltproc in your command-shell scripts.
Figure 1 shows the script output transformed by the XSLT file. Don't be fooled by its minimalism: There's quite a bit going on behind the scenes. And while the example in this article is fairly simple, it's not hard to expand it to show all sorts of other data in a custom format that you define.
Here's the script harness that runs the nmap command and pipes the data to xsltproc:
echo "Symantec Antivirus not found on the following computers: "; nmap -oX - -p 2967 "$1" | xsltproc noav.xsl -
Xsltproc requires that we specify an XSL file; I've specified noav.xsl, which Listing 2 shows. The command is written to run in the bash shell on a UNIX or Mac OS X computer, but you can easily modify it to run on a Windows computer.
The command is straightforward, but I want to point out two things. First, notice the use of the variable $1, which takes the first command-line argument and passes it to the nmap program.
Second, notice the use of the two hyphens: one in the nmap command, following the -oX parameter, and one in the xsltproc command, after noav.xsl. These hyphens are essential, as they tell the commands to use the standard I/O streams (standard input and standard output) in place of files. Nmap usually requires a filename after the -oX option, which tells nmap where to write the file. Inserting a hyphen tells nmap to send the XML to standard output, which is the console by default, or another command when the pipe (|) character is used. Similarly, the xsltproc command requires two inputs: the XSLT file and the XML file. By inserting the hyphen, we tell xsltproc to read its XML input from the previous command piped to it, so we don't have to run a command, save the output to a file, and have the second command read the file.
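To make the role of the hyphens concrete, here's a sketch of the two routes side by side. The function names scan_via_file and scan_via_pipe are made up for illustration, and both assume that nmap, xsltproc, and noav.xsl are available. Neither function is called here, so nothing is scanned until you invoke one yourself.

```shell
#!/bin/sh
# Sketch: two ways to get scan results into xsltproc.
# Assumptions: nmap and xsltproc are on PATH, noav.xsl is in the current
# directory, and the function names are hypothetical.

# Without the hyphens: write the XML to a temporary file, then transform it.
scan_via_file() {
    tmp=$(mktemp) || return 1
    # With a filename after -oX, nmap still prints its normal report to
    # stdout, so we discard that here.
    nmap -oX "$tmp" -p 2967 "$1" >/dev/null
    xsltproc noav.xsl "$tmp"
    rm -f "$tmp"
}

# With the hyphens: nmap writes XML to stdout and xsltproc reads it from stdin.
scan_via_pipe() {
    nmap -oX - -p 2967 "$1" | xsltproc noav.xsl -
}
```

The piped form is shorter and leaves no temporary file behind, which is why the harness uses it.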
The first step is to copy the script harness above into a file named findAntiVirus.sh. Next, copy the XSLT code from Listing 2 into a file named noav.xsl.
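If you'd like findAntiVirus.sh to complain when you forget the target, a slightly longer variant might look like the sketch below. The find_missing_av function and the usage message are my additions rather than part of the original harness, and the script still assumes that nmap and xsltproc are on the path and that noav.xsl sits in the current directory.

```shell
#!/bin/bash
# findAntiVirus.sh: sketch of the harness with an argument check added.
# Assumptions: nmap and xsltproc are on PATH; noav.xsl is in the current
# directory; find_missing_av is a hypothetical helper name.

find_missing_av() {
    # $1: IP address or subnet to scan
    if [ -z "$1" ]; then
        echo "usage: findAntiVirus.sh <ip-or-subnet>" >&2
        return 1
    fi
    echo "Symantec Antivirus not found on the following computers: "
    # -oX - sends nmap's XML to stdout; the trailing - makes xsltproc read stdin.
    nmap -oX - -p 2967 "$1" | xsltproc noav.xsl -
}

# Scan only when a target was supplied, so sourcing this file has no side effects.
if [ "$#" -gt 0 ]; then
    find_missing_av "$1"
fi
```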
XSLT follows a hierarchical node and tag format similar to that of XML and HTML but with key differences. Whereas the XML node names are somewhat arbitrarily designed by the developer to best describe the data at hand, XSLT is a full-featured programming language that has established node and tag names and supports conditionals, loops, setting variables, and other programming syntax. XML is the data and XSLT is the program that manipulates the data. Xsltproc is the tool that takes the XML data and transforms the data into custom output defined by the XSLT program.
Each command in an XSLT program must be opened with <some-command> and closed with </some-command>, or written in the self-closing form <some-command /> when it has no content. The first command in noav.xsl, <xsl:stylesheet>, tells the parser we're defining a style sheet, which is an element in the XSLT namespace.
Recall previously that I described XML as a hierarchical data set. In this style sheet, we're going to march up and down the XML data-set hierarchy, looking for specific data. We'll do this by using the <xsl:template> and <xsl:apply-templates> commands to match the nodes defined in Listing 2.
The first template statement matches nmaprun. It instructs the program to find all of the nodes named nmaprun, which in our example in Listing 1 is the entire data set. Next, it takes that resultant data set, looks for data matching the criteria
host[not(ports/port/state/@state='open')]
and applies the template for address. The entire command is
<xsl:apply-templates select="host[not(ports/port/state/@state='open')]/address" />
Because address is the last node specified, it's the template that Xsltproc will look for. But the right and left brackets ([ and ]) might throw you, so before we get to the address, let's deconstruct the select statement.
As I've said before, XML is hierarchical. If we simplified the nmap XML output in Listing 1, the data would look like this:
<nmaprun> <host> <status> <address> <hostnames /> <ports> <port> <state> </port> </ports> </host> </nmaprun>
What we are after are the addresses of the computers without antivirus software. To find those computers, we need to look at the data set found at
host/address
However, we want to find only the hosts that have the port state of not open. We can locate these hosts at
host/ports/port/state
XSLT lets you independently crawl the tree for a conditional separate from the tree of the data you wish to return. This is evident in the command
host[not(ports/port/state/@state='open')]/address
This command says to select the nodes host/address, but only for hosts that have no port whose state attribute is equal to open. Notice that the data is forked: Our conditional goes down host/ports/port/state, but the data we really want to return is host/address. Also, notice the use of @state to denote an attribute. The node <state> contains the attribute named state, and looks like this:
<state state="open" />
You can access this attribute by referencing host/ports/port/state/@state. (Don't confuse it with the <status> node directly under <host>, whose own state attribute reports whether the host is up.)
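Nmap's XML has two similarly named elements that are easy to mix up: <status>, directly under <host>, which says whether the host is up, and <state>, under each <port>, which says whether the port is open. The sketch below prints both for a hand-written one-host sample. The filenames one-host.xml and show-state.xsl are made up for illustration, and the transform runs only if xsltproc is installed.

```shell
#!/bin/sh
# Hand-written one-host sample; values are illustrative, not real scan output.
cat > one-host.xml <<'EOF'
<nmaprun>
  <host>
    <status state="up" />
    <address addr="192.168.0.8" addrtype="ipv4" />
    <ports><port protocol="tcp" portid="2967"><state state="open" /></port></ports>
  </host>
</nmaprun>
EOF

# Tiny stylesheet (hypothetical) that prints both attributes for each host.
cat > show-state.xsl <<'EOF'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />
  <xsl:template match="/">
    <xsl:for-each select="nmaprun/host">
      <xsl:text>host status: </xsl:text>
      <xsl:value-of select="status/@state" />
      <xsl:text>, port state: </xsl:text>
      <xsl:value-of select="ports/port/state/@state" />
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
EOF

# Run the transform only when xsltproc is available.
if command -v xsltproc >/dev/null 2>&1; then
    xsltproc show-state.xsl one-host.xml
fi
```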
Now that we've identified the address nodes of the hosts whose port state doesn't equal open, we call the command <xsl:apply-templates>. This tells Xsltproc to look for a template for <address>, which we see at the end of our XSL file. Like the earlier <nmaprun> match command, the <xsl:template match="address"> command defines the steps for processing the address nodes.
In our simple example, we call the <xsl:value-of> command to simply print out the attribute named addr, which contains the IP address of the host. The command <xsl:text> lets us define text, such as carriage returns, to format the output to our taste.
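Putting the pieces together, a style sheet along these lines should behave as described. Treat it as a sketch rather than a copy of Listing 2: The filenames noav-sketch.xsl and scan-sample.xml are made up, the two-host sample is hand-written, and the transform runs only if xsltproc is installed.

```shell
#!/bin/sh
# Sketch only: a style sheet in the shape the article describes,
# not the original Listing 2.
cat > noav-sketch.xsl <<'EOF'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />
  <!-- Start at nmaprun and keep only hosts with no open port. -->
  <xsl:template match="nmaprun">
    <xsl:apply-templates select="host[not(ports/port/state/@state='open')]/address" />
  </xsl:template>
  <!-- Print each selected address on its own line. -->
  <xsl:template match="address">
    <xsl:value-of select="@addr" />
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
EOF

# Hand-written two-host sample: .8 has port 2967 open, .9 has it closed.
cat > scan-sample.xml <<'EOF'
<nmaprun>
  <host>
    <status state="up" />
    <address addr="192.168.0.8" addrtype="ipv4" />
    <ports><port protocol="tcp" portid="2967"><state state="open" /></port></ports>
  </host>
  <host>
    <status state="up" />
    <address addr="192.168.0.9" addrtype="ipv4" />
    <ports><port protocol="tcp" portid="2967"><state state="closed" /></port></ports>
  </host>
</nmaprun>
EOF

# Feed the sample through stdin, the same way the harness pipes nmap into xsltproc.
if command -v xsltproc >/dev/null 2>&1; then
    xsltproc noav-sketch.xsl - < scan-sample.xml
else
    echo "xsltproc not installed; skipping transform" >&2
fi
```

With xsltproc installed, only 192.168.0.9 should be printed, because that's the one host in the sample without an open port 2967.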
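Run the nmap script harness by entering the filename for the script and specifying the IP address or subnet that you wish to scan. For example, to scan an illustrative subnet from the bash shell:
bash findAntiVirus.sh 192.168.0.0/24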
As I noted earlier, Figure 1 shows the result that we wanted: a simple listing of just the IP addresses of computers that don't have antivirus software correctly configured. What's great about XML/XSLT is that to change the output, you need only change the XSLT file.
This example shows how to extend a tool that supports XML output by creating your own XSLT file that shapes the output into whatever form you want. Search the Internet for XML, XSL, and XSLT, and you'll find many examples of just how much you can customize your output.