As a systems administrator, you often need to store and manage data. For example, you might want to store information about all the servers you manage: their names, the settings that apply to each of them, and the characteristics of the users who can connect to them. Whatever the nature of the information, your goal should be to store and read it as effectively as possible.
For a long time, plaintext files were the only significant alternative to human memory. In the past couple of years, though, we've observed the incredible success of XML. In this column, I start a tour of XML's main features. In particular, I guide you through the forest of XML nodes and head toward what really is of interest to you as a systems administrator: the ability to properly manage structured and less-structured information.
What XML Is All About
XML files are, first of all, plaintext files: the data in .xml files is stored as readable text (UTF-8 by default, of which ASCII is a subset) rather than in a binary format. XML is a metalanguage that defines rules for marking up the content of a text file, which by convention is saved with an .xml extension. XML is also a markup language: you define tags that identify chunks of information that have a specific meaning.
Anyone who frequently works with text files must figure out an effective way to mark the beginning and end of different pieces of information in the text file and to communicate what a certain piece of information actually means. In an ASCII text file, one word or logical grouping of characters flows into the next without interruption. You resort to characters such as carriage returns or commas to break the string of information into meaningful pieces.
A typical text file might have the form
Of course, you can use the FileSystemObject object and VBScript's Split() function to separate the pieces of information in this file into array slots; the file format is simple to parse. However, don't forget that you're using a personal, undocumented protocol when you store information in a file such as this. The file itself doesn't describe the information it contains, so if you don't work with it often or if you must make it accessible to others, you should create separate documentation to explain, for example, what each line contains and the role of the commas.
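The split-and-label approach can be sketched as follows. This is a Python analogue rather than the VBScript code the column refers to, and the sample lines, the field order, and the label strings are all hypothetical. Note that the labels live in the script, not in the data file, which is exactly the documentation problem described above.

```python
# Hypothetical sample data: one server per line, fields separated by commas.
# Nothing in the file itself says what each field means.
SAMPLE = """\
alpha,192.168.0.10,yes
beta,192.168.0.11,no
"""

# The meaning of each position is knowledge kept outside the data file.
FIELD_LABELS = ("Server name", "IP address", "Secure connection")


def parse_line(line):
    """Split one comma-separated line into a list of trimmed fields."""
    return [field.strip() for field in line.split(",")]


def describe(text):
    """Return one 'label: value' string per field for every non-empty line."""
    described = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        for label, value in zip(FIELD_LABELS, parse_line(line)):
            described.append(f"{label}: {value}")
    return described


print("\n".join(describe(SAMPLE)))
```

If a colleague reorders the columns in the file, this script keeps running and silently mislabels every value.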
Compared with .txt files, .xml files are verbose—each field has a descriptive label. XML also offers some choices about how you format file content. Verbosity is the price you pay for the ability to identify and structure your information more effectively.
From the systems administrator's point of view, the key advantages of XML are the self-describing nature of its files and more flexibility in structuring information. The disadvantages are a slightly more complex syntax for content and the need for special libraries (i.e., XML parsers) to parse content.
In general, XML has many advantages over plaintext files, but unless you want to use specialized editors to cope with .xml files, you should keep the files as simple as possible. For systems administration, XML has definite, but limited, uses. With XML, in fact, the line between what's effective and what's just a burden is quite thin.
The simple text file above doesn't include any description of its content. To extract and label the file content, you would have to use code such as that in Listing 1. From the code output lines, you can see that the first token is a server name, the second is an IP address, and the third is a flag that indicates whether the connection is secure.
Listing 2 shows one way to rewrite the text file in XML. Servers1.xml has three levels of nodes. The root node, <servers>, contains all the other nodes; each XML document must have exactly one root node. The <server> node contains all the information available for the specified server. This server information is expressed in terms of the child nodes <name>, <address>, and <secure>.
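To make the three-level layout concrete, here is a minimal sketch. The XML text is a hypothetical stand-in for servers1.xml (the server names and addresses are invented), and the code uses Python's standard xml.dom.minidom module instead of the MSXML parser, so treat it as an illustration of the structure only.

```python
from xml.dom import minidom

# Hypothetical stand-in for servers1.xml; all values are invented.
SERVERS1 = """<?xml version="1.0"?>
<servers>
  <server>
    <name>alpha</name>
    <address>192.168.0.10</address>
    <secure>yes</secure>
  </server>
  <server>
    <name>beta</name>
    <address>192.168.0.11</address>
    <secure>no</secure>
  </server>
</servers>
"""


def element_children(node):
    """Return only element children (minidom also reports whitespace text nodes)."""
    return [child for child in node.childNodes
            if child.nodeType == child.ELEMENT_NODE]


doc = minidom.parseString(SERVERS1)
root = doc.documentElement             # level 1: the single root node
servers = element_children(root)       # level 2: one <server> per machine
fields = element_children(servers[0])  # level 3: the nodes that carry the data

print(root.nodeName)
print(len(servers))
print([field.nodeName for field in fields])
```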
As I've mentioned, the .xml file is significantly more verbose and harder to read at first glance. However, the file describes every piece of information it contains so that anyone can easily understand the file content.
Listing 3 shows an alternative XML format for the same text file: servers2.xml. This file is more compact than servers1.xml and looks more like the original text file. Instead of the three levels of nodes that servers1.xml uses, servers2.xml uses two levels: the <servers> root and its <server> children. Servers2.xml renders the information that servers1.xml defines in the <name>, <address>, and <secure> nodes as attributes of the <server> node. The two .xml files have no logical or functional differences—just different designs.
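The claim that the two designs carry the same information can be checked with a short sketch. Both XML strings below are hypothetical stand-ins for servers1.xml and servers2.xml, and the code is a Python analogue built on the standard xml.dom.minidom module, not the column's VBScript.

```python
from xml.dom import minidom

# Hypothetical element-based layout (in the style of servers1.xml).
ELEMENT_STYLE = """<servers>
  <server>
    <name>alpha</name>
    <address>192.168.0.10</address>
    <secure>yes</secure>
  </server>
</servers>"""

# Hypothetical attribute-based layout (in the style of servers2.xml).
ATTRIBUTE_STYLE = """<servers>
  <server name="alpha" address="192.168.0.10" secure="yes"/>
</servers>"""


def direct_text(node):
    """Join the text nodes that sit directly inside an element."""
    return "".join(child.data for child in node.childNodes
                   if child.nodeType == child.TEXT_NODE).strip()


def records_from_elements(xml_text):
    """Build one dict per <server>, reading values from child elements."""
    root = minidom.parseString(xml_text).documentElement
    records = []
    for server in root.getElementsByTagName("server"):
        record = {}
        for child in server.childNodes:
            if child.nodeType == child.ELEMENT_NODE:
                record[child.nodeName] = direct_text(child)
        records.append(record)
    return records


def records_from_attributes(xml_text):
    """Build one dict per <server>, reading values from attributes."""
    root = minidom.parseString(xml_text).documentElement
    return [dict(server.attributes.items())
            for server in root.getElementsByTagName("server")]


print(records_from_elements(ELEMENT_STYLE) == records_from_attributes(ATTRIBUTE_STYLE))
```

Once the data is in memory, the rest of a script does not need to know which layout the file used.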
XML's flexibility means that the language imposes few fixed rules on how you lay out your data. As long as your XML documents are well-formed (e.g., every opened tag is closed and tags don't overlap), you can use whichever format you like. Of course, the different XML formats require some differences in the code that you use to parse the files.
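A parser is also the simplest way to test whether a document is well-formed. The following sketch uses Python's standard xml.dom.minidom module as a stand-in for MSXML; the helper name is_well_formed is hypothetical.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        minidom.parseString(xml_text)
    except ExpatError:
        return False
    return True


closed = "<servers><server/></servers>"               # every tag is closed
unclosed = "<servers><server></servers>"              # <server> is never closed
overlapping = "<servers><a><b></a></b></servers>"     # <a> and <b> overlap

for sample in (closed, unclosed, overlapping):
    print(is_well_formed(sample), sample)
```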
For file-readability purposes, you might want to employ child nodes for longer, more descriptive text and use attributes for short pieces of information. Compact and readable text files are a high priority for systems administrators. Don't forget that you're not using XML to be cool or trendy or to publish your files over the Internet. You're using XML to make yourself and your colleagues more comfortable working with shared files.
To extract the content of an .xml file, you need a special tool called a parser. In principle, nothing prevents you from writing a script that parses the XML file. But, as you might realize, parsing without a parser would become an awful mess in a matter of seconds.
Microsoft provides great tools for working with XML and for parsing .xml file content into a standard object model called the Document Object Model (DOM). As I write this, Microsoft XML Parser (MSXML) 4.0 is available as a technology preview at http://msdn.microsoft.com/downloads/default.asp?url=/downloads/topic.asp?url=/msdn-files/028/000/072/topic.xml.
Listing 4 shows the code for extracting information from servers1.xml. The line at callout A in Listing 4 creates an instance of the XML parser object. The line at callout C loads the valid .xml file into the DOM object. The Load method returns a true value when the .xml file is successfully imported and transformed into a document object and a false value if something goes wrong. You might want to set the Async property to false, as the line at callout B does, before you load the XML file. The Load method works asynchronously by default. Setting Async to false stops the script from processing until the Load method returns a value so that you're sure the document is properly loaded when you start processing it.
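The load-then-check pattern can be sketched outside MSXML as well. The version below is a Python analogue that uses the standard xml.dom.minidom module; load_document is a hypothetical helper, and the file it reads is a throwaway temporary file with invented content. Python's parse call always blocks until it finishes, so there is no counterpart to the Async property here, and failure is reported by returning None rather than a false value.

```python
import os
import tempfile
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def load_document(path):
    """Return the parsed DOM document, or None if the file can't be loaded."""
    try:
        return minidom.parse(path)       # blocks until parsing is complete
    except (OSError, ExpatError):        # missing file or malformed XML
        return None


with tempfile.TemporaryDirectory() as folder:
    good_path = os.path.join(folder, "servers1.xml")
    with open(good_path, "w", encoding="utf-8") as handle:
        handle.write("<servers><server><name>alpha</name></server></servers>")

    doc = load_document(good_path)
    root_name = doc.documentElement.nodeName if doc else None
    missing = load_document(os.path.join(folder, "missing.xml"))

print(root_name, missing)
```

Checking the result before touching the document keeps a bad file from surfacing as a confusing error later in the script.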
The line at callout D accesses the loaded document's root node. Remember that below the <servers> root node, servers1.xml has a level of <server> nodes. To loop through these child nodes, you need code similar to the following:
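For Each server In root.childNodes   ' root: the <servers> node from callout D (variable names assumed)
    ' do something ...
Next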
The childNodes collection returns all the first-level nodes available under the root. Thus, the above loop can catch the <server> nodes under <servers> but not the various child nodes of <server>, such as <name> and <address>. To return these second-level nodes, you need a second, nested loop.
The code at callout E shows the two loops. The inner loop uses the childNodes collection of the current <server> node to create a multiline string with the name and value of each node. To find the name, or type, of a given node, you use the NodeName property. To extract the value (i.e., the text between the opening and closing tags), you use the Text property. Within the inner loop, you access the information in a sort of "content-safe" way (i.e., you know what kind of information you're manipulating because each piece of information is labeled).
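The two nested loops can be sketched as follows. This is a Python analogue that uses the standard xml.dom.minidom module rather than the MSXML code in the listing, and the XML text is a hypothetical stand-in for servers1.xml. Two differences are worth noting: minidom reports the whitespace between tags as text nodes, so the loops skip anything that isn't an element, and minidom has no Text property, so the hypothetical helper node_text approximates it for simple leaf nodes.

```python
from xml.dom import minidom

# Hypothetical stand-in for servers1.xml; all values are invented.
SERVERS1 = """<servers>
  <server>
    <name>alpha</name>
    <address>192.168.0.10</address>
    <secure>yes</secure>
  </server>
  <server>
    <name>beta</name>
    <address>192.168.0.11</address>
    <secure>no</secure>
  </server>
</servers>"""


def node_text(node):
    """Approximate a Text property for a leaf node: join its direct text."""
    return "".join(child.data for child in node.childNodes
                   if child.nodeType == child.TEXT_NODE).strip()


def summarize(xml_text):
    """Return one multiline 'name: value' string per <server> node."""
    root = minidom.parseString(xml_text).documentElement
    summaries = []
    for server in root.childNodes:            # outer loop: first-level nodes
        if server.nodeType != server.ELEMENT_NODE:
            continue                          # skip whitespace text nodes
        lines = []
        for field in server.childNodes:       # inner loop: second-level nodes
            if field.nodeType == field.ELEMENT_NODE:
                lines.append(f"{field.nodeName}: {node_text(field)}")
        summaries.append("\n".join(lines))
    return summaries


for summary in summarize(SERVERS1):
    print(summary)
    print("---")
```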
The XML format in servers2.xml uses node attributes instead of some nodes, so you must use slightly different code to extract servers2.xml's information. As Listing 5 shows, the only difference lies in the inner loop, which walks through the content of the Attributes collection rather than the childNodes collection. Callout A in Listing 5 highlights the only line in Listing 5 that differs from Listing 4. You treat attributes like nodes—they have a NodeName property for their name and a Text property for their content.
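The attribute-based variant of the inner loop might look like this. Again, this is a Python analogue based on the standard xml.dom.minidom module, and the XML text is a hypothetical stand-in for servers2.xml. Only the inner loop changes: it walks the attributes collection of each <server> node instead of its child nodes. XML does not treat attribute order as significant, so code like this should not rely on the order in which the attributes come back.

```python
from xml.dom import minidom

# Hypothetical stand-in for servers2.xml; all values are invented.
SERVERS2 = """<servers>
  <server name="alpha" address="192.168.0.10" secure="yes"/>
  <server name="beta" address="192.168.0.11" secure="no"/>
</servers>"""


def summarize_attributes(xml_text):
    """Return one multiline 'name: value' string per <server> node."""
    root = minidom.parseString(xml_text).documentElement
    summaries = []
    for server in root.childNodes:            # outer loop: unchanged
        if server.nodeType != server.ELEMENT_NODE:
            continue                          # skip whitespace text nodes
        lines = []
        attributes = server.attributes
        for index in range(attributes.length):    # inner loop: attributes
            attribute = attributes.item(index)
            lines.append(f"{attribute.nodeName}: {attribute.value}")
        summaries.append("\n".join(lines))
    return summaries


for summary in summarize_attributes(SERVERS2):
    print(summary)
    print("---")
```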
The XML Trend
XML is a trendy choice today, but as a systems administrator, you can't afford to give in to fashion. You need to take a measured, controlled approach to using XML. Huge .xml files are hard to parse and are no more readable and helpful than the cryptic text files they purport to replace. However, when you keep .xml files compact and short, XML's self-describing nature can only help you. And remember that XML DOM is the best Windows tool available to programmatically extract information from XML documents.