Regular Expressions Intro

Have you experienced the power of regular expressions? If you have, then you already know what a useful tool they can be. I myself was put off for a long time by the nutso syntax. I mean, deciphering "^\\\\\w+\\\w+(\\\w+)+" isn't exactly easy-peasy. But once you dig into them, regular expressions have a lot of utility. PowerShell happens to have great regex support, making it easy to use regular expressions that you've written or copied - ahem, "repurposed" - from the Intertubes.

Essentially, a regex is a way of describing a text pattern to a computer. I've used them to extract IP addresses from IIS server logs and firewall logs, to validate user input to make sure something looks like a UNC path, to extract hyperlink tags from an HTML document, and to check e-mail addresses for conformity to a corporate standard. The regex syntax is simply a very specialized mini-programming language, used to describe those patterns to the computer. PowerShell can then tell you whether a piece of data matches a given regex, or it can use a regex to locate and extract information from a larger body of text.
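Here's a minimal sketch of both jobs, testing and extracting, using the IP address case. The sample log lines and the variable names ($logLines, $ipPattern, $addresses) are made up for illustration, and the pattern is deliberately loose: it only checks for four dot-separated groups of digits, so it would happily accept 999.999.999.999.

```powershell
# Hypothetical sample lines; real IIS or firewall logs will look different.
$logLines = @(
    '2010-09-21 10:15:02 192.168.1.10 GET /default.aspx 200',
    '2010-09-21 10:15:07 10.0.0.25 GET /images/logo.png 304',
    'no address on this line'
)

# Loose IPv4 pattern: four groups of 1-3 digits separated by dots.
# It does not reject values above 255.
$ipPattern = '\b\d{1,3}(\.\d{1,3}){3}\b'

# Test: does one piece of text match the pattern?
$hasAddress = $logLines[0] -match $ipPattern    # True

# Extract: pull every match out of a larger body of text.
$addresses = $logLines |
    Select-String -Pattern $ipPattern -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object { $_.Value }

$addresses    # 192.168.1.10 and 10.0.0.25
```

The single quotes around the pattern keep PowerShell from trying to interpret anything inside it, which is usually what you want with a regex.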

My two favorite Web sites are RegExLib and RegExTester. The former is a vast, free library of user-contributed regular expressions for a variety of tasks, and the latter is a free, Web-based tester for regexes. Need to validate the pattern for an Italian address?


Wow. Need to detect and strip potentially-malicious HTML code from user input?


Thanks, RegExLib! Putting these to use in PowerShell takes one of two methods: the -match operator or the Select-String cmdlet (other commands, such as the Switch construct, also accept regular expressions). Need to see if a user has entered a valid YYYY-MM-DD formatted date? Get the user's input into a variable, such as $userdate, and do this:

$userdate -match "^[0-9]{4}-(((0[13578]|(10|12))-(0[1-9]|[1-2][0-9]|3[0-1]))|(02-(0[1-9]|[1-2][0-9]))|((0[469]|11)-(0[1-9]|[1-2][0-9]|30)))$"

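PowerShell will return True if there's a match and False if there isn't. The trick is to remember that the text data goes before the -match operator, and the regex comes after. Heck, there's a third way: The -replace operator can replace regex matches with whatever you want. So replace those potentially malicious characters in user input with an empty string (effectively deleting them). Note that the pattern isn't anchored with ^ and $, so it matches each offending character wherever it appears, and the single quotes keep PowerShell from treating the backtick as an escape character:

$user_input -replace '[`~!/@#}$%:;)(_^{&*=|''+]', ''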
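To see -replace in action on something you can eyeball, here's a small sketch. The input strings and variable names are hypothetical, and the tag-stripping pattern is deliberately naive; treat it as a demonstration of the operator, not as a real HTML sanitizer.

```powershell
# Hypothetical input for illustration only.
$userInput = 'Hello <script>alert(1)</script> World'

# Unanchored pattern: every piece of angle-bracket markup is removed
# wherever it appears in the string.
$stripped = $userInput -replace '<[^>]+>', ''
$stripped    # Hello alert(1) World

# Anchored pattern: ^ and $ mean the WHOLE string must be one tag,
# so nothing matches here and the input comes back unchanged.
$anchored = $userInput -replace '^<[^>]+>$', ''
$anchored    # Hello <script>alert(1)</script> World

# The replacement can reuse captured groups: $1, $2, $3 refer to the
# parenthesized parts of the pattern. Single quotes stop PowerShell
# from expanding them as variables.
$usDate = '2010-09-21' -replace '^(\d{4})-(\d{2})-(\d{2})$', '$2/$3/$1'
$usDate      # 09/21/2010
```

The anchored case is worth remembering: a pattern written to validate an entire string usually needs its ^ and $ removed before it's useful for find-and-replace.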

Both -match and -replace are case-insensitive; use -cmatch and -creplace if you need a case-sensitive version. 
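A quick sketch makes the difference concrete. The sample strings and variable names are made up, and the date pattern is a simplified one that checks only the shape of the input, not the calendar rules. It also uses the automatic $Matches variable, which PowerShell fills in after a successful -match on a single string.

```powershell
# Case-insensitive by default...
$insensitive = 'PowerShell' -match 'powershell'     # True
# ...and case-sensitive with the "c" versions.
$sensitive   = 'PowerShell' -cmatch 'powershell'    # False

$replacedCI = 'Hello WORLD' -replace  'world', 'there'   # Hello there
$replacedCS = 'Hello WORLD' -creplace 'world', 'there'   # Hello WORLD (no match)

# -match in an If statement; named groups land in $Matches.
$userdate = '2010-09-21'    # hypothetical input
if ($userdate -match '^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$') {
    $year  = $Matches['year']     # 2010
    $month = $Matches['month']    # 09
}
```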

Are regular expressions something you'd find useful in PowerShell? Say the word and I'll write up a more detailed syntax tutorial!

Discuss this Blog Entry 2

David Clarke (not verified)
on Sep 21, 2010
The best treatment of regular expressions I've read is Jeffrey Friedl's Mastering Regular Expressions. It's also one of the best tech books I've read, which is quite an achievement considering the subject material could otherwise be considered fairly dry. Highly recommended.
JT (not verified)
on Sep 21, 2010
Hi Don...

I have also used RegEx in PowerShell as you described. In my work with MOSS 2007, our developers sometimes need to verify that a web.config update has propagated. I can use Select-String -Path to recurse through all the virtual directories, peek into each web.config, and look for the module that they wanted to add. It has proven to be a very quick method of searching for a specific item in a bunch of files without having to "browse | open | Ctrl-F | type search string | close."

I have also found it useful to use nested "select-strings" when looking at ULS logs or IIS Logs. I can set up the first "select-string" to search for a broad target (a filename or site name, for instance) and then pipe those results to another "select-string" that will search for perhaps a success or failure code.



What's PowerShell with a Purpose Blog?

Don Jones demystifies Windows PowerShell.
