Last week, in "Online Fraud Continues to Escalate" (URL below), I presented an overview of data collected and analyzed by Cyveillance. You might recall that the data revealed several interesting statistics, including that in the fourth quarter of 2007, 51 percent of phishing pages were hosted on compromised servers.

http://www.windowsitpro.com/Windows/Article/ArticleID/98332/98332.html

Google released a report that reveals still more staggering figures. From January through October 2007, Niels Provos and Panayiotis Mavrommatis (both of Google), along with Moheeb Abu Rajab and Fabian Monrose of Johns Hopkins University, subjected approximately 66.5 million Web pages to in-depth analysis, which revealed that more than 3.4 million malicious Web pages are plaguing the Internet with drive-by downloads.

A drive-by download occurs when a computer becomes infected with malware simply because its user visits a Web page. Such infections typically take advantage of vulnerabilities in browsers, browser components (typically add-ons), and OSs.

One question that might arise is this: If Google indexes billions of Web pages, then why were only 66.5 million pages subjected to in-depth inspection? The answer is both simple and complicated. Put simply, the team sifted Google's multibillion-page haystack to find suspicious needles by using a somewhat complex methodology. The team used automation to analyze page content for specific factors such as "out-of-place" iFrames, iFrames that pulled content from known malware sites, and obfuscated JavaScript.
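To make that first-pass filtering more concrete, here is a minimal Python sketch of how a page might be checked for the three kinds of signals just mentioned. It is only an illustration, not the team's code: the blocklist `KNOWN_MALWARE_HOSTS`, the obfuscation pattern, the "tiny iFrame" test, and the function name `suspicious_signals` are all my own assumptions.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical blocklist; the report does not publish the list it used.
KNOWN_MALWARE_HOSTS = {"malware.example"}

# Illustrative obfuscation hints only, not the report's actual features.
OBFUSCATION_HINTS = re.compile(
    r"eval\s*\(\s*unescape|document\.write\s*\(\s*unescape|(?:%u[0-9a-fA-F]{4}){8,}"
)


class IframeCollector(HTMLParser):
    """Collects the attributes of every <iframe> tag on a page."""

    def __init__(self):
        super().__init__()
        self.iframes = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            self.iframes.append(dict(attrs))


def _is_tiny(value):
    # Assumption: a 0- or 1-pixel dimension marks an "out-of-place" iFrame.
    return value is not None and value.strip() in {"0", "1"}


def suspicious_signals(html):
    """Return a list of heuristic signals found in the page source."""
    collector = IframeCollector()
    collector.feed(html)
    signals = []
    for frame in collector.iframes:
        host = urlparse(frame.get("src") or "").hostname
        if host in KNOWN_MALWARE_HOSTS:
            signals.append("iframe-to-known-malware-host")
        if _is_tiny(frame.get("width")) or _is_tiny(frame.get("height")):
            signals.append("near-invisible-iframe")
    if OBFUSCATION_HINTS.search(html):
        signals.append("obfuscated-script")
    return signals
```

A page that returns any signals would be a candidate for deeper inspection; a page that returns an empty list would stay in the haystack. Real filters would need far richer features than these three checks.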

That first pass produced a huge pile of needles (66.5 million), which were then scrutinized further in a second phase of processing. The second phase involved building a big honeynet of virtual machines running Windows, unpatched copies of Microsoft Internet Explorer (IE), and various antivirus packages. That setup was combined with various heuristics to determine whether a page was most likely malicious as opposed to only being suspicious.

In the report, the team wrote, "To limit false positives, we choose a conservative decision criteria [sic] that uses an empirically derived threshold to mark a URL as malicious. This threshold is set such that it will be met if we detect changes in the system state, including the file system as well as creation of new processes. A visited URL is marked as malicious if it meets the threshold and one of the incoming HTTP responses is marked as malicious by at least one anti-virus scanner. ... Finally, a URL that meets the threshold requirement but has no incoming payload flagged by any of the anti-virus engines, is marked as suspicious."
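The decision rule in that quote boils down to two tests, which the following Python sketch spells out. The names are hypothetical, and so is the idea of reducing the observed system-state changes to a single numeric score; the report says only that the threshold is empirically derived. The `NOT_FLAGGED` label for URLs that miss the threshold is also my own addition.

```python
from enum import Enum


class Verdict(Enum):
    NOT_FLAGGED = "not flagged"  # assumption: the quote does not name this case
    SUSPICIOUS = "suspicious"
    MALICIOUS = "malicious"


def classify_url(state_change_score, threshold, av_flags):
    """Apply the quoted rule to one visited URL.

    state_change_score: hypothetical number summarizing file system changes
        and new processes seen in the virtual machine.
    threshold: the empirically derived cutoff (its value is not given here).
    av_flags: one boolean per incoming HTTP response, True if at least one
        antivirus scanner marked that response as malicious.
    """
    if state_change_score < threshold:
        return Verdict.NOT_FLAGGED
    if any(av_flags):
        return Verdict.MALICIOUS
    return Verdict.SUSPICIOUS
```

For example, `classify_url(5, 3, [False, True])` yields `Verdict.MALICIOUS`, whereas `classify_url(5, 3, [False, False])` yields `Verdict.SUSPICIOUS`. Requiring both the behavioral threshold and an antivirus hit is what makes the criterion conservative.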

The team was also able to map malware distribution networks. By analyzing page content relationships, the team determined how pages fell into a hierarchy. Landing sites led to malicious pages, and malicious pages led to malware distribution sites.

The mapping revealed that approximately 181,700 sites host the 3.4 million malicious drive-by download pages discovered. The mapping also revealed approximately 9,340 malware distribution sites behind those pages. Eighty percent of all sites that contain drive-by downloads are hosted in China and the United States, with China being by far the most prevalent location. The percentages are similar for distribution sites. China leads the pack, with the United States following in a distant second place.
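To put those counts in proportion, a few quick ratios help. The short Python snippet below uses only the approximate figures quoted in this article, so the results are rough estimates rather than numbers taken from the report.

```python
# Approximate figures quoted in the article; the ratios inherit that imprecision.
pages_inspected = 66_500_000
malicious_pages = 3_400_000
hosting_sites = 181_700
distribution_sites = 9_340

# Share of the pre-filtered (already suspicious) pages found to be malicious.
malicious_share = malicious_pages / pages_inspected

# Average number of malicious pages on each site that hosts them.
pages_per_site = malicious_pages / hosting_sites

# Hosting sites for every malware distribution site behind them.
hosts_per_distributor = hosting_sites / distribution_sites

print(f"Malicious share of inspected pages: {malicious_share:.1%}")
print(f"Average malicious pages per hosting site: {pages_per_site:.1f}")
print(f"Hosting sites per distribution site: {hosts_per_distributor:.1f}")
```

In other words, roughly 5 percent of the pages that received in-depth inspection turned out to be malicious, each hosting site carried about 19 malicious pages on average, and there were about 19 hosting sites for every distribution site. Keep in mind that the 5 percent figure applies to the 66.5 million pages that had already been singled out as suspicious, not to the Web as a whole.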

The report contains a ton of useful information and loads of statistics that you will undoubtedly find incredibly interesting. For example, the researchers revealed that "1.3 percent of the incoming search queries to Google's search engine return at least one link to a malicious site." The team also revealed that if someone visits a malicious URL with an unprotected Windows system, an average of 8 executable files are downloaded to the system, and in extreme cases, the number can be as high as 60! If that's not valuable data to help bolster security budgets, then I don't know what is.

To learn more, get a copy of the team's 22-page report in PDF format at the URL below.

http://research.google.com/archive/provos-2008a.pdf