Use these subroutines to concatenate, find duplicates, and filter
When you use VBScript scripts to create Microsoft Excel reports, you open up a tremendous amount of flexibility in Excel. This article series walks you through the practical code examples in the ExcelerateYourVBScripts.hta file, which you can download in its entirety with the .zip file associated with this article. In Parts 1 and 2 of the series, I covered fundamental routines that VBScript and Excel can perform, showed you how to reuse segments of code that perform common functions, and delved into two of the HTA's subroutines: XLCharts and XLHyperlinks. Now, I'd like to focus on the next routines in the ExcelerateYourVBScripts HTA, which let you concatenate items, find duplicates, and filter results in a spreadsheet.
This article covers not only concatenation, but also several commonly used Excel features, including conditional formatting, inserting columns, pasting formats, the CountIF function, and relative cell referencing. And although the focus of the example code is to find duplicates, I also discuss a number of fundamentals that you'll see repeatedly throughout this series—and that you're likely to use in many of your own Excel VBScript scripts.
Start at the Beginning
Let me start with a brief description of conditional formatting. As the name implies, conditional formatting provides a means of modifying a cell's format if that cell meets a specific condition. In this article's example, I use conditional formatting to change the background color of a cell to teal if it contains the value “True.”
You use the Concatenate function to join multiple text items into one text string. The text items can be text strings, numbers, or single-cell references. In the example, I create a new field (or column) to hold the concatenated values of three cells: First Name, Middle Initial, and Last Name. By stringing these cells together, I can easily determine whether a record is unique.
The CountIf function lets you count within a range of cells the number of cells that meet given criteria. I use this function to determine whether the concatenated name in a cell occurs more than once within the column. If it does, that record is considered a duplicate. A quick note about Excel functions: Even though they perform a function, Excel functions are also considered elements of an Excel formula. You'll see what I'm referring to in the code snippet that you see at the beginning of the next section. Notice the word formula that occurs in the second line of code (XL.ActiveCell.FormulaR1C1).
The XLFindDups Subroutine
As Listing 1 shows, the XLFindDups subroutine begins, as do most of the scripts in this series, by setting up the test data. Then, in the code at callout A, I insert a new column called “Concat”. This column will contain the concatenated First, Middle, and Last Names of each record in the spreadsheet. To concatenate, start by selecting the cell that will contain the concatenated values, then use the ActiveCell.FormulaR1C1 method with the Concatenate function to create a formula for the cell, as illustrated in the following two lines of code:
XL.ActiveCell.FormulaR1C1 = "=CONCATENATE(C\[+1\],C\[+2\],C\[+3\])"
I'm sure you're probably scratching your head and wondering what's going on with that Concatenate function. What you see here is the use of relative cell referencing. The reference is relative to the currently selected cell, which is cell A2 (Column A Row 2). The C just before the \[+1\], indicates it's a reference to the current Column, and the \[+1\] is a direction to advance one column from the currently selected column. If you see an R in a relative reference, it indicates Row, and if you see a minus sign
(-), it indicates the reference is in the reverse direction. So in this example, C\[+1\] means Column A + 1, which in relative terms refers to Column B. C\[+2\], as you might deduce, refers to Column C, and C\[+3\] refers to Column D.
When I put the formula above into cell A2, Excel converts it to =CONCATENATE(B:B,C:C,D:D). This change effectively fills the cell with a string composed of the FirstName, the Middle Initial, and the LastName. I need this formula not only in the selected cell but also in every subsequent cell in the Concat column. As it turns out, I can automatically fill these cells fairly easily by using the AutoFill method with a defined Range. Here's the line of code I use to AutoFill my concatenate formula all the way down the Concat column:
XL.Selection.AutoFill XL.Range("A2" & ":" & "A" & lastrow)
Notice that I set the lastrow variable equal to 9 earlier in the script. So actually, the range that I'm using for the AutoFill is A2 thru A9. I'll demonstrate other methods of programmatically determining last cells, last rows, and last columns throughout this series.
Next, I copy the entire worksheet and paste it into a new worksheet called FindDups. I find that having two separate worksheets makes it easier to visualize and analyze the results of the Concatenate process and the FindDups process.
The code at callout B in Listing 1 shows how to copy and paste a worksheet. Start by selecting the first cell of the spreadsheet with this line of code:
By using this code, you can then use the XL.Selection property to reference the selected cell. We'll use this cell as the starting point of the Range we want to copy. Because we want to copy the entire worksheet, the “last used cell” in the spreadsheet is going to be considered our ending range. You can automatically determine the last cell in the spreadsheet by using the.SpecialCells(11) property. The number eleven represents an Excel Constant named xlLastCell. You might want to declare that Const in your scripts and use xlLastCell instead of the hardcoded 11.
To select the entire worksheet, I use the following Range Select statement, which contains both the start and endpoint parameters:
Then, to copy the selection, simply use a Copy method such as XL.Selection.Copy, as the code at callout B shows. To paste the worksheet into a new worksheet, I start by creating the FindDups worksheet, then I select the FindDups worksheet and complete the process with the ActiveSheet.Paste method. Here's the segment of code I use to accomplish this task.
XL.Sheets.Add.name = "FindDups"
Now that I have my new worksheet in place, I want to create a new column called Dups, which I'll use to indicate whether the record is a duplicate. To insert a column into an Excel spreadsheet, you start by selecting a column, then inserting either to the left or right of that column. I want to insert the Dups column between columns A and B, so I select Column A and insert a column to the Right by using the Insert method with the xlToRight parameter, as demonstrated in these two lines of code:
Cells in this field will contain the value of either True or False. True means the spreadsheet contains one or more records that has the same First, Middle, and Last name as this record; False indicates that the record is unique. To accomplish this part of the task, I use the Excel Countif function to determine whether a concatenated name occurs more than once in the column. The syntax for the Countif function is:
As you might recall, this function counts the number of cells in a specified range that meet given criteria. The range in this example is Column A. The Criteria define what is to be counted. In this example I want to look at each individual cell value in Column A (the concatenated First, Middle, and Last Names) and search the entire Column to see whether that value exists more than once, so the criterion is the value of a Cell. If the value exists more than once, I want the Dups cell to hold the value True; otherwise I want the cell to have the value False. To accomplish this task, I can use the following two lines of code, which you can also see in the code section at callout C:
XL.ActiveCell.FormulaR1C1 = "=COUNTIF(C\[-1\],RC\[-1\])>1"
Here again, you'll notice the use of relative referencing. Because cell B2 is the selected cell, C\[-1\] translates to Column B-1, which refers to Column A. The letter R refers to Row, and you'll notice that the R doesn't have any plus or minus element next to it. This means it's relative to the currently selected row, in this case Row 2. Finally, C\[-1\] refers to Column A. When this statement is placed in cell B2, as the code at callout D shows, it translates to
Note that when this relative reference is applied to subsequent Rows, the Row number is relative to whatever Row the reference is in. In other words, when this formula is copied to cell B3, for example, RC\[-1\] translates to A3, so the formula would look like this:
Learning About Countif
Let me briefly elaborate on the powerful Countif function. The Countif function searches a range of cells (in this case, all of Column A) for the value contained in a specific cell (in this case, cell A2), and returns the number of times that value was found. In my example, this formula or function is in every cell in Column A. Therefore, the process of searching the entire column takes place on every cell.
As you can imagine, if you have tens of thousands of records, you could really give your CPU a workout and the entire process could take a long time to complete. Keep this in mind when you're considering using this function, and particularly if you use it more than once in the same routine, as I do with Conditional formatting, which I discuss next.
In our example, conditional formatting uses the Countif function with the FormatConditions property to color the interior of the cell teal if the condition is True. Then, after the first cell is formatted, the format is copied and pasted down the entire row by using the PasteSpecial method. This process is demonstrated in the code at callout D.
As you examine callout D's code, you'll see that any existing conditional formatting that might have been in the selected cell is deleted straight off. Then, the new conditional format is created by using the FormatConditions.Add method. This method takes three to four arguments depending on whether you're using an expression as your condition (such as the Countif formula I used in callout D) or the Cell value as the condition. If you use an expression, xlExpression (which is defined at the beginning of Listing 1 as Const xlExpression = 2) will be the first argument. The second argument represents an operator (e.g., greater than, less than, equal to) and isn't used when the condition is an expression. The third argument represents the formula you want to use for your condition.
If you were using the cell value to determine the condition, your first parameter would be xlCellValue, your second parameter would be an Excel constant representing an operator such as xlbetween, the third parameter would be the first condition or formula, and the fourth parameter would be the second condition or formula. You'd use the CellValue conditional formatting if you wanted to format the cell according to some condition—for example, if its value was between 1 and 5.
The next line of code
XL.Selection.FormatConditions(1).Interior.ColorIndex = 42
defines how you want to format the cell if it meets the condition you've set. Here, we simply set the interior color of the cell to teal. You can have more than one conditional format defined if you choose to do so. That's why there's an element number within the FormatConditions(1) segment of the Statement. You might have chosen to format the False values in orange, for example, as a second condition and would have added code similar to this:
XL.Selection.FormatConditions.Add xlExpression,, "=COUNTIF(A:A,A2)=1"
XL.Selection.FormatConditions(2).Interior.ColorIndex = 45
As you can see, the element number increased as we added another conditional format.
All that's left to do is copy the cell that includes the conditional formatting, select the entire column, and paste the conditional format by using the PasteSpecial method with the xlPastFormats argument, which indicates that I want to paste formats as opposed to values or formulas. And, as with all Excel constants, you must define the xlPastFormats constant within the script. Once the formats have been pasted all the way down the column, you'll begin to see True values change color. Using conditional formatting in this manner makes it easy to spot the duplicates.
The XLAutofilter Routine
Excel's AutoFilter feature lets you filter a spreadsheet in several ways so that only certain records are displayed. For example, you can set a filter to display only the top 10 sales, the top 5 percent of sales, or the lowest and the highest sales. Or you might want to display records of only specific individuals. The syntax of the AutoFilter function is structured like this:
Autofilter Field#, Criterion1, Operator, Criterion2
This function is a little out of the ordinary in that it can be structured based on user-defined criteria or based on what's called the Top 10 AutoFilter. Let me start with an explanation of how the AutoFilter is structured when you're using user-defined criteria. Field# indicates which field you want to filter on. (This description is true for the Top 10 AutoFilter as well.) Criterion1 is the condition you want the filter to meet (i.e., Field1 is greater than 19). Operator will be AND or OR and is used only if you have another condition you want the filter to meet. Criterion2 is the second condition you want the filter to meet (i.e., Field1 is less than 2). These examples from the code in Listing 2 using AutoFilter with user-defined filters are defined as follows:
- XL.Selection.AutoFilter 1, ">15"—Filters amounts in Field1 that are greater than 15.
- XL.Selection.AutoFilter 1, ">19",xlOr , "<2"—Filters amounts greater than 19 or less than 2.
- XL.Selection.AutoFilter 2, "Jack"—Filters names in Field2 where the name equals Jack.
The second type of AutoFilter is called the Top 10 AutoFilter. Figure 1 and Figure 2 show the feature in use. Don't let the name of this filter mislead you. Although it's called the Top 10 AutoFilter, you can filter from the top or bottom and you can choose any number (not just 10) for the percentage or number of items that you want to filter. Here's how to structure a filter when you're using the Top 10 AutoFilter.
Autofilter Field#, Criterion1, Operator
Field# indicates which field you want to filter on. Criterion1 is the number of items or percentage you want to use as your top or bottom number. Operator will be one of the following constants:
- xlTop10Items—Indicates you'll be filtering the top x number of items that meet the criteria.
- xlTop10Percent—Indicates you'll be filtering the top x percent of all records meeting the criteria.
- xlBottom10Percent—Indicates you'll be filtering the bottom x percent of all records meeting the criteria.
- xlBottom10Items—Indicates you'll be filtering the bottom x number of items that meet the criteria.
Note that you'll need to include these constants in your script:
- Const xlTop10Items = 3 : Const xlTop10Percent = 5
- Const xlBottom10Percent = 6 : Const xlBottom10Items = 4
Here are four examples from Listing 2 of how to script the four Top 10 AutoFilter types:
- XL.Selection.AutoFilter 1, "2", xlTop10Items—Sets an AutoFilter on values in Column 1 to filter the top 2 items.
- XL.Selection.AutoFilter 1, "1", xlBottom10Percent—Sets an AutoFilter on values in Column 1 to filter the bottom 1 percent.
- XL.Selection.AutoFilter 1, "5", xlTop10Percent—Filters the top 5 percent.
- XL.Selection.AutoFilter 1, "8", xlBottom10Items— Filters the bottom 8 items.
The example code in Listing 2 might look like it does a lot, but it's mostly data and worksheet setup code. But as you can see, you can set a filter with a single line of code.
That wraps up Part 3 of this series about how to eXLerate your scripts. Stay tuned for the next article, in which I'll resume coverage of the ExcelerateYourVBScripts HTA.