Import text files and get around Excel limitations by using the tricks in these creative routines
If you’ve been following this article series, you have already learned how much flexibility you can get in Microsoft Excel reports when you use VBScript scripts to expand Excel’s functionality. The ExcelerateYourVBScripts.hta file (which you can download in its entirety with the .zip file associated with this article) provides the practical code examples for the series. In parts 1 and 2 of the series (InstantDoc IDs 94768 and 95093), I covered fundamental routines that VBScript and Excel can perform, showed you how to reuse segments of code that perform common functions, and delved into two of the HTA’s subroutines: XLCharts and XLHyperlinks. In part 3 (InstantDoc ID 95238), I focused on routines that let you concatenate items, find duplicates, and filter results in a spreadsheet. This article highlights two subroutines, one that uses the Excel Query Tables class to import text files and one that gets a simple result by creating a couple of surprisingly monstrous formulas. Note that the listings associated with this article have been formatted for easier viewing. To use the subroutines in your own environment, you can download them with the full HTA in the .zip file associated with this article.
Import with Lookup, Match, and Index
The XLImportLookUpMatchIndex subroutine (which Listing 1 shows) from the ExcelerateYourVBScripts.hta file shows you how to import delimited and fixed-length text files by using the Excel QueryTables class. Excel is capable of importing data from various sources including SQL Server, Microsoft Access, Oracle, and text files. The example in the XLImportLookUpMatchIndex subroutine focuses solely on importing text. The HTA script for this example also demonstrates three methods of looking up values from one worksheet and returning related data from the imported worksheets.
Be aware that this routine temporarily creates two small text files on your C: drive: C:\~~XLimport1~~.txt and C:\~~XLimport2~~.txt. After the files are imported into the demo spreadsheet, they’re deleted.
The Excel class required to import text-file data into Excel is called the QueryTables class, and it uses a method called Add to import data directly into a worksheet. The Add method actually derives its properties and methods from another Excel class called QueryTable. These properties define the imported file’s characteristics, such as whether the text file contains headers, whether the file is delimited or fixed length, how the file is delimited, and which row to start at.
The Add method also defines the connection and destination for the file you’re importing. The connection refers to the data source that the import file comes from (in this case, the text file), and the destination refers to where the import file will go. Callout A in Listing 1 shows the source code you’ll use to import a text file into an active worksheet.
The first parameter of the Add method that you see in callout A is "TEXT;" & tf. This parameter defines the connection as a "TEXT" type file, and the full path and filename are stored in a variable named tf. The second parameter, xl.Range("A1"), defines the destination as cell A1 in the current worksheet. The Fieldnames = False line indicates that the first line of the imported file doesn’t contain fieldnames. The TextFileStartRow number indicates which line in the text file to start at. TextFileParseType will be either xlDelimited or xlFixedWidth. (This example uses a tab-delimited file, so the TextFileParseType is set to xlDelimited.) TextFileTabDelimiter is set to True because the file is tab delimited. (If the file were delimited with semicolons, commas, or spaces, you’d set that property to True instead.) And Refresh BackgroundQuery = False indicates that when this worksheet is opened, it shouldn’t refresh itself with the original Text file.
The other example in the HTA demonstrates how to import a fixed-length text file. The only changes to the import file characteristics in that example are these:
- .TextFileParseType = xlFixedWidth
- .TextFileTabDelimiter = False
- .TextFileFixedColumnWidths = Array(5, 12)
The first two characteristic changes are pretty obvious: The file is importing a fixed-width file, so it’s not tab delimited. And the new characteristic, TextFileFixedColumnWidths, defines the lengths of each field. Simply add the lengths of each field to the Array and separate them with commas. The second half of the Import Lookup routine uses the imported data and performs three types of lookups: Lookup, Match, and Index.
The LOOKUP function. The first, the LOOKUP function, returns a value from one row or one column (referred to as a vector). The LOOKUP function can also return a value from an array, but I won’t be covering that in this article. Here’s the syntax for the Lookup function:
The vector form of the LOOKUP function takes the value in the lookup_value row or column, searches the lookup_vector for that value, and returns a value from result_vector. For example, let’s say I have two worksheets: the LookupMatchIndex worksheet, which has names of people from one department in it, and the ImportReference worksheet, which contains the names and IDs of all employees in a company. I want to find the IDs for each name in the LookupMatchIndex worksheet and insert them next to each name. Here’s the code I’d use accomplish this task by using the LOOKUP function:
XL.ActiveCell.FormulaR1C1 = _
In this code snippet, you might notice that relative referencing, even when used in another worksheet, still refers to the selected Row and Column (in this case, B2). The first C\[-1\] refers to Column A (which contains the names we want to look up). ImportReference!C\[0\] refers to column B of the ImportReference worksheet. Column B is the column we search for the matching name. ImportReference!C\[-1\] refers to Column A of the ImportReference worksheet. Column A contains the identification numbers. So, for the first formula in cell B2, the function takes “Elizabeth” as the lookup value, looks for it in the ImportReference worksheet, and returns the ID number that’s in Column A of the ImportReference worksheet.
When you’re working with this routine, you need to consider two important factors:
- The values in lookup_vector must be sorted in ascending order. If they’re not, LOOKUP might not give the correct value.
- If LOOKUP can't find the lookup_value, it returns the closest value it can find.
This second important point is certainly something to consider if you don’t have matches for all the values you’re looking up. However, returning the closest value might be acceptable in certain situations.
The MATCH function. The MATCH function works much like LOOKUP does, but instead of returning a value from another cell, MATCH returns the row number if it finds a match. If it doesn’t find a match, it returns #NA. Use MATCH instead of LOOKUP when you need the position of the item instead of an associated value of the found item. Here’s the syntax for the MATCH function:
In this code, Lookup_value is the value you want to find. Lookup_array is a contiguous range containing possible lookup values. Match_type (-1, 0, or 1) specifies how Excel matches lookup_value with values in the lookup_array:
- Match_type 1—Finds the largest value that’s less than or equal to the lookup_value; the Lookup_array has to be sorted in ascending order.
- Match_type 0—Finds the first value that’s exactly equal to lookup_value; the Lookup_array doesn’t have to be sorted in any order.
- Match_type -1—Finds the smallest value that’s greater than or equal to the lookup_value; the Lookup_array must be sorted in descending order.
Match_type 1 is assumed if the match_type value is omitted.
In my example, the match result is placed in column C (which is the relative column). I use the following code, which takes the name in column A (C\[-2\]) and looks for an exact match in column B (C\[-1\]) of the ImportReference worksheet:
XL.ActiveCell.FormulaR1C1 = "=MATCH(C\[-2\],ImportReference!C\[-1\],0)"
The resulting number represents the row that the name was found in. If you run the example routine, you’ll notice that some names didn’t have a match and therefore contain #NA values.
The INDEX function. You could use MATCH and LOOKUP together if you wanted to make sure you had matches for all your lookups, but that wouldn’t be very efficient. To get that kind of result with just one function, use the INDEX function. There are two forms of the INDEX function: the array form and the reference form. I’ll cover just the array form in this article.
The array form of the INDEX function returns the value of an element in a table or an array selected by row and column number. Here is the syntax of the INDEX function:
Here’s the code snippet for performing an index lookup:
XL.Range("D2").Select XL.ActiveCell.FormulaR1C1 = _ "=INDEX(ImportReference!C\[-3\],MATCH(C\[-3\],ImportReference!C\[-2\],0),1)"
You’ll notice that the array (and not the name you want to look up) is the first argument of this function. Here, ImportReference!C\[-3\] indicates I’m using Column A of the ImportReference worksheet as my array. Column A contains the identification numbers. Next, you’ll see that I use the MATCH function just as I did in the previous example. This function looks up the name in Column A in the ImportReference worksheet and returns the row number of where the name is found. The return value is used as the row_number argument of the INDEX function. Finally, I indicate the column number 1 because I have only one column selected as my array. The resulting value of the INDEX function is the value of the row numbered element in the array. So for example, if a match was found in row 5, the value of the fifth element in the array would be returned. If no match is found, #NA is returned.
The Monster Formulas
The XLFormula1 routine, which Listing 2 shows, demonstrates how to create two complex formulas in Excel by using VBScript. The first formula takes a full path string such as C:\samples\vbscript\excel\demo1.vbs and extracts just the filename demo1.vbs. The second formula takes that same full path string and extracts just the path C:\samples\vbscript\excel. I call these complex formulas because they contain several functions (five to be exact) nested within a single formula.
I assumed when I started that this subroutine would be a piece of cake and that there’d be an in string Reverse (instrRev) function or something similar in Excel that I could use to find the last occurrence of a character (such as a backslash) within a string. Well, I was wrong.
After spending more time than I care to admit, I finally came up with what I call the Monster formulas: They’re actually frightening when you first see them. Here’s what the formula to extract the filename looks like:
XL.ActiveCell.FormulaR1C1 = _ "=MID(C\[2\],FIND(CHAR(127),SUBSTITUTE(C\[2\],""\"",CHAR(127),LEN(C\[2\])-LEN(SUBSTITUTE(C\[2\],""\"","))))-1)"
And here’s the formula to extract the path:XL.ActiveCell.FormulaR1C1 = _ "=LEFT(C\[1\],FIND(CHAR(127),SUBSTITUTE(C\[1\],""\"",CHAR(127),LEN(C\[1\])-LEN(SUBSTITUTE(C\[1\],""\"",
My initial thought was to use the FIND function to locate the position of the last backslash, then use the MID function to pull out the filename. However, the ability to find a specific occurrence of something isn’t built into the FIND function. That limitation brought up another question: How could I determine how many times the backslash occurred in the full path? After a while, it dawned on me that I could figure out the number of backslashes by getting the length of the string, removing the backslashes, then using the difference between the two.
I decided to use the SUBSTITUTE function to remove the backslashes. It also occurred to me that I could use the SUBSTITUTE function to replace the last backslash with something unique because the function lets you substitute a specific occurrence of a character. I just needed to find a unique character that I could use. From the available characters, I found several that would work because they couldn’t be used as part of a filename, but I decided to use the DEL character—represented by CHAR(127)—instead of a character such as the semi-colon, figuring that I could use this Monster on other strings that weren’t necessarily file-system related. And with that, the first Monster formula was born. The second Monster, which extracts the path, is basically the same except that it grabs everything to the left of the last backslash.
I’ll break the first formula into pieces and show you how it works. Let’s look at the snippet of code that puts the Monster formula into the selected cell:
XL.Range("A2").Select XL.ActiveCell.FormulaR1C1 = _ "=MID(C\[2\],FIND(CHAR(127),SUBSTITUTE(C\[2\],""\"",CHAR(127),LEN(C\[2\])-LEN(SUBSTITUTE(C\[2\],""\"",""""))))+1,254)"
Although this snippet might seem complicated enough already, there are a couple of things you need to be aware of when you’re coding a formula. If your formula contains anything enclosed in double quotes—such as the backslash in the SUBSTITUTE function—you must enclose the quoted strings within a second set of double quotes. And the other noteworthy item that you might recall from part 3 of this article series (at InstantDoc ID 95238) is that formulas are coded by using relative cell referencing. Let’s get the relative referencing out of the way before we break down the formula.
Because A2 is the selected cell, Column A is the reference column, and Row 2 is the reference row. If you look at the formula again, all of the occurrences of C\[2\] refer to Column C (A + 2). Column C is where the FullPath text strings reside.
At the heart of this formula is the MID function, which returns a specific number of characters from within a text string, such as the filename from within the FullPath string. Here’s the syntax for the MID function:
- Text is the text you want to search.
- Start_num is the position in the string you want to start at.
- Num_chars specifies the number of characters from the starting point that you want to return.
The text portion of the MID function in our example is C\[2\], which refers to Column C, the FullPath text string. The Start_number part of the MID function is where the Monster formula resides, and we’ll break this down shortly. To make things a little easier to understand, I’m going to start by using the actual formula instead of the relative reference code. Here’s the segment of code I use to find the start_number for the MID function:
The num_chars part of the MID function is 254. I could have tried to programmatically determine how many characters to return but I took the easy way out and used the maximum filename length.
Dissecting the Monster. Here’s the syntax for the FIND function:
- Find_text is the text you want to find.
- Within_text is the text you want to search.
- Start_num specifies the position at which to start the search; if omitted, its value is assumed to be 1.
The find_text part of the function is CHAR(127). This function represents the DEL character that we eventually want to find. The within_text part of the FIND function is where the Monster actually comes to life. That would be this segment of code:
Here, I figure out how many backslashes are in the string and replace the last backslash with the DEL character. The SUBSTITUTE function's syntax is:
- Text represents the text or reference to a cell containing text that you want to use when you substitute characters.
- Old_text represents the text you want to replace.
- New_text represents the text you want to replace the old_text with.
- Instance_num specifies which occurrence of the old_text you want to replace with the new_text. If you specify an instance_num, only that instance of old_text is replaced. Otherwise, every occurrence of the old_text in text is changed to the new_text.
In the Monster formula, the text part of the function is C:C, which is column C, where the FullPath strings reside. The old_text is simply the backslash character (\). The new_text is the DEL character—CHAR(127). And the instance_num to find the last occurrence of the backslash is derived by using a combination of LEN and SUBSTITUTE functions as seen in this segment of the formula:
Here, we’re using the length of the FullPath string and subtracting the length of the string with the backslashes removed.
Now that we have the instance number, everything will fall into place. We can replace the last backslash with the DEL character, find the DEL character to get the start position for our MID string (to which we add 1), and return up to 254 characters from that point on.
I’m sure this is a bit confusing; it certainly was for me. So now that I’ve gotten the complicated details defined, let me walk you through the major pieces of this formula to find the starting position we’d get if we wanted to get the filename out of the path string c:\Folder1\SubFolder1\SubFolder2\File1.txt, which resides in column C of the spreadsheet .
Working from right to left, we’ll walk through the formula:
The +1 at the far right of the formula is obvious; it will simply add 1 to whatever number we get as a result of the FIND function. The section SUBSTITUTE(C:C,"\","") replaces all of the backslashes with nothing. In effect, it removes all of the backslashes and internally evaluates to c:Folder1SubFolder1SubFolder2File1.txt. This value is produced solely within the function and is never actually written or stored anywhere in the spreadsheet.
With this value in hand, we can now surmise how many backslashes are in the path string by comparing the path string length without backslashes to the original path string length. That’s what the LEN(C:C)-LEN(SUBSTITUTE(C:C,"\","")) section does, resulting in a value of 4. The formula then replaces the fourth backslash in the path sting with the DEL character—CHAR(127).
The next step to the left in the function
actually produces the following:
Notice the little square character? That’s the DEL character—CHAR(127). Now we have the unique character CHAR(127) in the string where the last backslash was, which effectively enables us to use the FIND function to locate the beginning of our filename. So when we look back to the FIND function
FIND(CHAR(127),SUBSTITUTE(C:C,"\",CHAR(127),LEN(C:C)- _ LEN(SUBSTITUTE(C:C,"\",""))))+1
we see that the function looks for the DEL character in the path string and gives us the position of that character—in this case, position 33. Then, the formula adds 1 to that, giving us 34, which is the starting position that we need to pluck the filename from the path string. In the case of our example, string position 34 is the letter F, the first character of our filename. And that’s how it all comes together.
With that, we close this article in the “Create Excel Reports the VBScript Way” series. Next month, look for part 5 of this series, in which I’ll tackle programming Pivot Tables in Excel. Author’s Note: No Monsters were harmed in the making of this script.