Over the last several weeks, I've been working extensively with comma-separated
value (CSV) files. One simple example is the component I described in "Gathering
File System Data" (October 2006, Instant-Doc 93451), FileDB.wsc, which saves
information about files to a CSV file. CSV files are plain-text database files
in which each line represents a record (row) that's divided into fields (columns),
with commas separating data items from each other, as Figure
1 shows. The first line (sometimes called a header line) of the CSV file
contains the field or column names, and subsequent lines contain the data. Double-quote
characters define individual items in case a data item contains a comma. CSV
files can contain an arbitrary number of lines, and they might not contain a
header line that contains the field names. As I worked with the output from
FileDB.wsc, I realized that an HTML application (HTA) would be an ideal way
to quickly and easily view CSV files. Let's take a look at the HTA I wrote and
see how the scripting behind it makes it work.
An HTA is a dynamic HTML (DHTML) document with an .hta extension that runs
at a higher privilege level than Internet Explorer (IE). As such, HTAs can use
any available ActiveX objects on the computer, even those that aren't marked
as "safe for scripting" for IE. For example, an HTML document that attempts
to use the FileSystemObject will generate a warning message if you open the
document in IE because the FileSystemObject is not marked "safe for scripting."
(And this is a good thing, because otherwise any Web site that you browse in
IE could potentially access the local file system.) In contrast, HTAs run in
mshta.exe, which allows access to all ActiveX objects on the computer. As such,
HTAs need to be treated with the same care as scripts or executables, but they
provide a convenient way to create GUI-based applications.
The Microsoft Tabular Data Control (TDC) simplifies the creation of an HTA
for viewing CSV files. The TDC is an ActiveX object that lets you display CSV-based
data in an HTML table. It's a core IE component that first appeared in IE 4.0
and is available on every Windows OS that has IE 4.0 or later. The HTA I wrote,
CSVviewer.hta, is a convenient tool for anyone who needs to view a CSV file
but doesn't need a data-analysis tool such as a spread-sheet or a database.
The CSVviewer .hta script is too long to print in its entirety here, but you
can download it from the Windows IT Pro Web site at http://www.windowsitpro.com/windowsscripting,
InstantDoc ID 94260.
The CSVviewer.hta script generates a screen like the one that Figure
2 shows. The HTA is straightforward: Simply enter the name of a CSV file
or click Browse to select a file, then click OK. If the CSV file's first line
doesn't contain field names, deselect the First line of CSV file contains
field names check box. The Display x records per page setting (where
x is a number between 1 and 999) defines the number of CSV file records
that the application will display at one time. To clear the form and start over
with a different CSV file, click Reset. Finally, you can click Exit to close
the application.
After you've selected a CSV file and clicked OK, the data from the file appears
in a table in the lower half of the window along with a Filter button and a
set of navigation buttons, as Figure 2 shows.
The Filter button provides access to the TDC's Filter property, which I explain
in a moment. The First Page, Last Page, Previous Page, and Next Page buttons
let you navigate through the data. You can also sort the data by clicking the
column headings. The first click of a column heading will perform an ascending
sort; subsequent clicks on the same column heading will reverse the sort order
with each click.
A Peek Under the Hood
CSVviewer.hta is easy to use, but there's plenty of scripting under the hood
that makes it work. I wanted CSVviewer.hta to be able to display data from an
arbitrary CSV file; that is, I didn't want to "hard-code" the column names and
the CSV file's name into the document. The CSVviewer.hta script accomplishes
these goals by reading the column headings and setting the appropriate TDC properties
at runtime.
Creating the object reference. Before you can use the TDC in
an HTML or HTA document, you need to insert an <object> element for it.
CSVviewer.hta uses the following HTML code to insert a TDC object:
<object id= "csvdata"
classid= "clsid:333C7BC4-460F-
11D0-BC04-0080C7055A83">
<param name="CaseSensitive"
value= false //>
</object>
The id attribute provides a named reference for the data in the table
to connect, or bind, to the TDC's data source (a CSV file). The classid attribute
must always be set to this particular value because it specifies the TDC's globally
unique identifier, which remains constant regardless of the computer it's used
on. The <object> element can include one or more <param> elements
that specify the TDC's initial property settings. CSVviewer.hta sets the TDC's
CaseSensitive property to false, which means that data comparisons will ignore
case. The other TDC properties (such as the DataURL property that specifies
the CSV file) are set at runtime. For more information about the TDC, see "About
Micro-soft Tabular Data Control" (http://msdn.microsoft.com/library/default.asp?url=/workshop/database/tdc/overview.asp).
Selecting a file. I wanted the HTA to provide an interactive
way of selecting the CSV file to view. HTML provides the <input type="file">
element, which includes a Browse button. This element's main purpose is to provide
a way to upload files to a Web server, but CSVviewer.hta uses it to select a
file. The id attribute makes the element's contents (i.e., the typed
or selected filename) easily accessible to the HTA's script code.
Verifying that the file exists. An <input type="submit">
element provides the OK button. This element's onclick event handler
calls the processFile function; that is, the HTA executes the processFile function
when you either press Enter or click OK. The processFile function, which you
can see in Listing 1, uses the
File-Exists method of the FileSystemObject to determine whether the selected
file exists. If the file doesn't exist, the function uses the alert method to
display an error message; otherwise, the function sets the global variable filename
to the name of the selected file. The function then executes the getFields
and putData methods, which I describe in a moment.
Processing the CSV file's fields. After the HTA verifies that
the file exists, it executes the getFields function. As Listing
2 shows, the getFields function populates the fields array (a global
variable) with the field names from the CSV file. The function uses the FileSystemObject's
Open-TextFile method to open the CSV file and read its first line, then closes
the CSV file. Next, it empties the fields array by setting its length
property to zero. The function uses the split method to return an array of substrings,
using the comma (,) character as a delimiter. Be aware that the get-Fields function
uses the split method to parse the first line of the CSV file, so the function
will fail if a field name contains a comma (even if it's enclosed in quotes).
The getFields function then uses the for statement to iterate the array
of substrings that the split method creates. If the useHeader function returns
true (which will be the case if you've selected the check box), the function
uses the replace method to remove any double-quote (") characters from
the field name. Next, the function determines whether the field name contains
the colon (:) character, indicating a field type. (I'll discuss field types
in a moment.) If the field name contains a colon, the getFields function removes
the colon and the remainder of the string so that only the field name is added
to the fields array. For example, if the field is named Size:Int, the
fields array will contain only Size (the :Int is not part of the
field's name).
If the useHeader function returns false, the getFields function will populate
the fields array with the field names Column1, Column2, Column3, and
so forth. The TDC requires these field names if it's processing a CSV file that
doesn't have a header line.
Defining the field types. When a CSV file contains a header line,
a field name can include a colon (:) and a field type. Acceptable field types
are String (the default), Date, Boolean, Int, or Float. For example, if the
field's name is Size:Int, the TDC will use Size as the field's name and Int
as its type. Without field types, the TDC will sort information as text strings,
so if you want to sort a numeric field, make sure to specify its type in the
CSV file before viewing it with CSVviewer.hta. See http://msdn.microsoft.com/workshop/database/tdc/reference/use
header.asp for more information about field types.
Creating the output table. After the getFields function determines
the list of field names, the processFile function executes the putData function
to create the output in the bottom section of the HTA window. The putData function
dynamically populates the data area of the document with the CSV data by setting
the innerHTML property of the <div> element at the bottom of the document.
The putData function starts by creating a series of HTML elements that contains
the Filter and navigation buttons. Next, the function creates the HTML table
element that connects to the TDC as a data source by setting the table's datasrc
attribute to #csvdata (the leading # character is required). The function
also sets the table's id attribute, which binds the navigation buttons
to the table; that is, the Previous Page, Next Page, First Page, and Last Page
buttons use this ID to update the table on the screen with the data from the
CSV file. The table's datapagesize attribute comes from the Display
x records per page text box near the top of the document.
After creating the opening <table> tag, the putData function adds a <thead>
tag and a <tr> tag and iterates the array of field names populated by
the getFields function. For each field name, the function creates a <th>
element, with each <th> element's id attribute set to the field's
name. It also sets the <th> element's onclick event handler to
a sorting function, which I describe shortly. After the function iterates the
field names, it closes the table heading by adding the </tr> and </thead>
tags.
Next, the putData function adds a <tbody> tag, an opening <tr>
tag, and iterates the fields array a second time. For each field name,
the function creates an opening <td> tag, a <span> element with
its datafld attribute set to the TDC object's id, and a closing </td>
tag. The datafld attribute is how the table rows bind to the TDC data
source. After iterating the array, the function adds the closing </tr>,
</tbody>, and </table> tags. Figure
3 shows the HTML table the putData function will create if you view the
CSV file shown in Figure 1.
After the putData funtion has generated the HTML that contains the navigation
buttons and the table, it sets the innerHTML property of the document's <div>
element to the generated data. To complete its work, it sets the TDC's DataURL
property to the CSV file's name, sets its UseHeader property, and calls the
TDC's Reset method to display the data.
Filtering the data. When you click Filter, CSVviewer.hta calls
the setFilter function, which uses JScript's prompt method to display a simple
dialog box that requests a filter string, as Figure
4 shows. The string you enter in this text box is set as the TDC's Filter
property. See http://msdn.microsoft.com/workshop/database/tdc/reference/filter.asp
for information on filtering the TDC's data.
Sorting the data. The TDC initially displays an unsorted view
of the CSV file, but CSVviewer.hta makes sorting easy. When the putData function
constructs the HTML table, it creates <th> elements with id attributes
set to the CSV file's field names. It also configures each <th> element's
onclick attribute to call the sortBy function, with the field name as
a parameter (see Figure 3 for an
example). The sortBy function sets the TDC's Sort property to the column name
that you clicked, then calls the TDC's Reset method to redisplay the data.
Know Your Limits
CSVviewer.hta relies on the TDC to connect to the CSV file and retrieve its
data, so it's subject to the TDC's limitations. If you open a very large CSV
file, be aware that the file might take a long time (and a lot of memory) to
open. Also, take care that the CSV file doesn't start with a blank line, which
will cause the TDC to not display any data at all. Even with these limitations,
CSVviewer.hta is a useful tool for viewing CSV files.
There is no code in the zip file other than a couple excerpts.
The HTA script is missing from the zip file. The zip only includes the two functions.
Nice but where is the rest of the code.