In "Building and Using an Incident Response Toolkit, Part 1," April 2004, InstantDoc ID 41900, and "Building and Using an Incident Response Toolkit, Part 2," May 2004, InstantDoc ID 42173, I discuss how to quickly and appropriately respond to a computer security incident. In this two-part article, I delve into the field of computer forensics and show you how to conduct an in-depth analysis of the compromised machine. The preparation for this analysis involves creating a bootable CD-ROM and duplicating the compromised machine's hard disk. You can then use several utilities to analyze the contents of that duplicate disk. Before I begin, however, let's quickly cover three basic forensic-analysis principles:

  • You must avoid changing any data on the compromised machine. To preserve the data, you need to perform a forensic duplication of the compromised hard disk, verify that the duplicate disk is exactly the same as the original, then perform all the analyses on the duplicate disk. Most forensic tools are designed to work on duplicate disks to avoid changing any pertinent information (e.g., access times) on the compromised computer.
  • You shouldn't trust any programs or data on the compromised computer because the attack might have compromised them. By performing the forensic duplication, then using forensic tools on the duplicate disk, you avoid the use of potentially compromised programs and data.
  • You need to document the forensic duplication and analysis. One good documentation method is to create digital hashes of the compromised hard disk and the forensic image. If the hashes match, you have strong evidence that the data hasn't been altered, because two different inputs are extremely unlikely to produce the same hash by chance. You should also document the programs you run and their output. If you anticipate that the computer security case will go to court, you might consider having two parties document the forensic duplication and analysis.
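
As a minimal sketch of the hash comparison in the third principle, the following shell snippet hashes two files with md5sum and compares the digests. The files original.bin and image.dd are stand-ins created in a temporary directory, and same_md5 is a hypothetical helper name. In a real case, the first argument would be the compromised disk's device and the second would be the forensic image.

```shell
#!/bin/sh
# Sketch: compare the MD5 digests of an original and its duplicate.
# The sample files are stand-ins; same_md5 is a hypothetical helper name.

same_md5() {
    # md5sum prints "<digest>  <name>"; keep only the digest.
    a=$(md5sum < "$1" | cut -d' ' -f1)
    b=$(md5sum < "$2" | cut -d' ' -f1)
    [ "$a" = "$b" ]
}

workdir=$(mktemp -d)
printf 'evidence data\n' > "$workdir/original.bin"
cp "$workdir/original.bin" "$workdir/image.dd"

if same_md5 "$workdir/original.bin" "$workdir/image.dd"; then
    echo "hashes match"
else
    echo "hashes differ"
fi
```

Recording both digests in your case notes, rather than only the match result, lets a second party repeat the comparison later.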

Keeping these three principles in mind, let's look at how to create a bootable CD-ROM and how to make a copy of the compromised disk. Note that although these tasks involve running a few Linux commands, you don't need any prior knowledge of Linux to use the methods I describe.

Creating the CD-ROM
The first task in a forensic analysis is to obtain the tools you'll need to duplicate the compromised disk and analyze the duplicate disk's contents. The forensic software that I use is the Penguin Sleuth Kit, an open-source version of the Knoppix distribution that's been modified specifically for forensic use. Knoppix is a Linux distribution that's designed to run from a CD-ROM drive without any installation. The software boots from the CD-ROM and loads the entire OS into memory. Thus, any computer with a CD-ROM drive can instantly function as a Linux-based forensic workstation. Whereas the standard Knoppix distribution writes to the disk to provide a swap partition, Penguin Sleuth Kit creator Ernest Baca has tweaked the Knoppix distribution so that the Penguin Sleuth Kit doesn't write any information to the compromised hard disk.

Other third-party tools are also available for performing forensic duplication and analysis, such as Guidance Software's EnCase and New Technologies Incorporated's (NTI's) SafeBack. However, I won't cover those tools here because, for the price of a CD-ROM, you can duplicate the third-party tools' functionality with comparable or better performance.

The Penguin Sleuth Kit contains all the standard Linux utilities, plus dozens of forensic tools, including the extremely useful Sleuth Kit (formerly known as The @stake Sleuth Kit—TASK) and Autopsy tools. In addition, you can use the Penguin Sleuth Kit to make a copy of the compromised disk or even browse that disk without disturbing any evidence.

You can download the Penguin Sleuth Kit from Linux-Forensics.com. Go to http://www.linux-forensics.com/downloads.html and click one of the Download Penguin Sleuth Mirror links. The download is a bootable ISO image (i.e., a CD-ROM image file in the International Organization for Standardization's ISO 9660 format) that you need to burn to a CD-ROM. That way, you simply place the bootable CD-ROM in any computer's CD-ROM drive and the Penguin Sleuth Kit is ready to use.

Besides creating the bootable CD-ROM, you must prepare the medium on which you want to copy the compromised disk. The medium needs as much space as the hard disk has because the Penguin Sleuth Kit copies even empty sections of the disk. Therefore, if you have a 10GB disk with 2GB free, you need at least 10GB, not 8GB, on which to store the image. You can use an external hard disk that's connected through a USB 2.0 or FireWire (IEEE 1394) port or an internal hard disk. If you choose to use an internal hard disk, you need to install it in the compromised computer without displacing the CD-ROM drive or the compromised hard disk. Don't forget to note the internal hard disk's logical location (e.g., the slave on the second IDE bus).
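
The space requirement can be checked before you start. The following is a rough sketch, assuming a POSIX shell with df and awk available. The has_room function is a hypothetical helper, and the 10GB figure simply mirrors the example above. On a real system, you might obtain a disk's size in bytes with a command such as blockdev --getsize64 instead of hard-coding it.

```shell
#!/bin/sh
# Sketch: check that a destination has room for a full-size image.
# has_room is a hypothetical helper; the 10GB figure mirrors the article's example.

has_room() {
    needed_bytes=$1
    dest_dir=$2
    # df -P -k reports available space in 1024-byte blocks (column 4).
    avail_kb=$(df -P -k "$dest_dir" | awk 'NR == 2 { print $4 }')
    # Round the requirement up to whole 1024-byte blocks.
    needed_kb=$(( ($needed_bytes + 1023) / 1024 ))
    [ "$avail_kb" -ge "$needed_kb" ]
}

# Size of the whole disk, not just the 8GB that is in use.
disk_bytes=$((10 * 1024 * 1024 * 1024))

if has_room "$disk_bytes" .; then
    echo "destination can hold the image"
else
    echo "destination is too small"
fi
```

The check is deliberately based on the full disk size because the image includes unallocated space.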

Duplicating the Disk
When you suspect that you have a compromised computer, you first need to retrieve as much volatile evidence as possible, a topic that I discussed in Part 1 and Part 2 of "Building and Using an Incident Response Toolkit." Afterward, you must shut down the system. Before you do so, you should check whether your organization has an official policy on how to handle shutdowns. For example, the US Department of Defense (DoD) guidelines state that you should immediately pull the plug on the machine instead of performing a clean shutdown.

After you shut down the compromised computer, you must boot it from the CD-ROM you created. Insert your CD-ROM, boot the compromised computer, then press the key that will let you change the BIOS settings. (Check your computer's or motherboard's manual to find this key. Common keys include F2 and Delete.) Change the default boot device so that the CD-ROM boots before the hard disk. You might want to open the computer and disconnect the hard disk in question before attempting to change the BIOS settings, in case your machine boots so quickly that you miss your chance to press the key. This precaution ensures that the computer can't accidentally boot from the compromised hard disk. After you've changed the boot order, power off the computer and reconnect the hard disk.

Your CD-ROM will boot to a command prompt. Type the command

knoppix lang=us 2

Be sure to use all lowercase letters because, like the Knoppix distribution commands, the Penguin Sleuth Kit commands are case sensitive. In this code, knoppix is the command that launches the default kernel image. The lang=us option tells the Penguin Sleuth Kit to use US standard keyboard mapping. If you use a different keyboard mapping, you should replace us with your country's two-character code. If you don't include the lang= option, the Penguin Sleuth Kit defaults to the German keyboard mapping. The 2 option tells the system to use text mode. If you omit this option, the Penguin Sleuth Kit will use GUI mode, which can take a lot of memory and isn't necessary for the forensic duplication. If you need to change or add different options, you can press F2 to access the Help screen. However, I recommend that you don't change the options unless you're familiar with Linux.

After you type the knoppix command and press Enter, the computer boots and the basic Linux command prompt appears. That prompt looks like

root@ttyp1[/]#

You run Linux commands the same way that you run DOS commands: You type the command name, including any arguments, and press Enter. However, some differences exist in the command syntax. Linux uses a directory structure in which slashes (/) separate directories rather than the backslashes (\) you find in DOS directories, so you must remember to use slashes in any arguments that contain paths. In addition, the standard practice for Linux command options is to precede them with a hyphen (-) instead of a slash. Finally, unlike DOS commands, Linux commands are case sensitive.

One command that will quickly become useful to novice Linux users is the man command, which is short for manual. To display information (e.g., uses, options) about a command, you can type man followed by the name of the command you want to learn about. For example, to display information about the dcfldd command (which I explain shortly), you type

man dcfldd

and press Enter. The information displayed on screen is called a man page. In a man page, you use the Up Arrow and Down Arrow keys to scroll through the text one line at a time. You also can press the Spacebar to page down one screen at a time. Press q to leave the man page and return to the command prompt.

The most common method of forensic duplication on Linux systems is the dd command, which is a standard command that copies data at the byte level. In the Linux philosophy, everything, including a disk, is represented as a file, so you can use dd to copy an entire disk to one file, or image. The DoD Computer Forensics Laboratory (DCFL) rewrote the dd command specifically for forensic work. The result is the dcfldd command, which has the ability to hash the data it's copying at specified intervals to authenticate the data. In addition, the dcfldd command is considerably faster than the dd command. (However, despite dcfldd's increased efficiency, imaging an entire disk typically takes a long time.) Although I outline the necessary usage and options for dcfldd here, I suggest you read its man page to completely familiarize yourself with this command. It's a powerful tool that performs the same tasks as expensive commercial software.

To use dcfldd, you must first tell it which disk to copy. Under Linux, IDE disks are organized in this manner: You access each device by following the nomenclature /dev/hdx, where x is the drive letter that maps to each channel. The drive letters ascend according to channel priority and master/slave status. Therefore, the primary master disk is /dev/hda, the primary slave is /dev/hdb, the secondary master is /dev/hdc, and the secondary slave is /dev/hdd. The nomenclature for SCSI disks is almost the same as that of the IDE disks: It's /dev/sdx, where x can again be a, b, c, and so forth.

To access an individual partition on a disk, you append the number representing the partition you want to access. For example, the third partition on the primary master disk is /dev/hda3, and the first partition on the secondary slave is /dev/hdd1. You access SCSI partitions the same way.
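
The IDE naming rules above can be summarized in a few lines of shell. This is only an illustrative sketch: ide_device is a hypothetical helper that builds a device name from a channel, a master/slave role, and an optional partition number. It doesn't check whether the device actually exists.

```shell
#!/bin/sh
# Sketch: map an IDE position (and optional partition) to its device name.
# ide_device is a hypothetical helper; it only builds the name as a string.

ide_device() {
    case "$1-$2" in
        primary-master)   dev=/dev/hda ;;
        primary-slave)    dev=/dev/hdb ;;
        secondary-master) dev=/dev/hdc ;;
        secondary-slave)  dev=/dev/hdd ;;
        *) echo "unknown position: $1 $2" >&2; return 1 ;;
    esac
    # An optional third argument is the partition number.
    echo "$dev${3:-}"
}

ide_device primary master 3     # prints /dev/hda3
ide_device secondary slave 1    # prints /dev/hdd1
ide_device primary slave        # prints /dev/hdb
```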

Next, you need to tell dcfldd where to store the image. (Although your CD-ROM creates a file system in the RAM, this file system typically isn't large enough to store an image of a hard disk.) Whether you've opted to store the image on an external or internal hard disk, you'll need to mount the disk to write coherent data to it. Mounting is the Linux term for placing a disk in a state in which the OS knows about it at the file-system level. Although the /dev/hdx nomenclature provides raw, byte-level access to a disk, the mount command lets you treat a disk partition as a directory to which you can freely write. The command

mount /dev/hdb1 /mnt

mounts the primary slave's first partition at the /mnt directory, which means that you can read from or write to any files on that partition through /mnt. Linux natively knows about FAT32 and NTFS and can easily mount hard disks in those formats. However, I suggest that your hard disk be FAT32 because Microsoft hasn't published complete NTFS specifications, which makes writing to NTFS partitions from Linux a difficult task.

If you're using a USB- or FireWire-based external hard disk, Linux emulates that disk's connectivity with the SCSI system. If you have no SCSI drives, the first USB- or FireWire-based disk will be /dev/sda. If you have SCSI drives, the first USB or FireWire disk will be the next disk in the SCSI chain. Linux supports almost all USB and FireWire devices, but you can go to http://www.qbik.ch/usb/devices and click Devices (for USB devices) or http://www.linux1394.org/hcl.php (for FireWire devices) to ensure compatibility between your device and Linux.

You need to unmount the disk after you're finished writing to it; otherwise, data might be lost. You use the umount (not unmount) command

umount /dev/sda1

When you unmount the disk, no process, including the command interpreter (i.e., the shell), can be using any file or directory on it. If your shell's working directory is on the mounted disk (e.g., /mnt), you must use the command

cd /

to escape from that directory before you use the umount command.

Now you're ready to image the disk. Let's assume the compromised disk that you want to image is /dev/hda1 and the disk you want to write to is /dev/hdb1, which is mounted at /mnt. Because of a current limitation in Autopsy, you have to image a partition instead of the entire disk. If the compromised disk is running only Windows, /dev/hda1 is likely the C drive. If you know other partitions exist on that disk, image them as well. (If the partition limitation is fixed by the time you read this article, you can image the entire disk.) Run the command

dcfldd if=/dev/hda1
  of=/mnt/image.dd bs=4096
  hashwindow=732954624
  hashlog=/mnt/hashlog
  conv=noerror,sync

(Although the command appears on several lines here, you enter it on one line. The same holds true for the other multiline commands in this article.) The options in this command need some explanation. The if= and of= options specify the input file (in this case, /dev/hda1) and output file (/mnt/image.dd), respectively. The bs=4096 option tells dcfldd to read and write in block sizes of 4096 bytes, which will increase the efficiency of the data transfer. The dcfldd command doesn't add unnecessary data to the copied file if the block size is larger than the input file's final block of data. For example, if the last block that dcfldd copies is only 2096 bytes, dcfldd won't add 2000 bytes of unnecessary data. It stops when the data stops. Changing this behavior is possible but not recommended for a forensic duplication.
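
You can observe how a short final block is handled by experimenting on an ordinary file. The following sketch uses plain dd without the conv= option as a stand-in for dcfldd, and the file names are made up for the example. It builds a 6192-byte input (one full 4096-byte block plus a 2096-byte tail), copies it with bs=4096, and reports the size of the copy.

```shell
#!/bin/sh
# Sketch: a short final block is copied as-is, with no padding.
# Plain dd (without conv=) stands in for dcfldd here.

workdir=$(mktemp -d)

# Build a 6192-byte input: one full 4096-byte block plus a 2096-byte tail.
dd if=/dev/zero of="$workdir/input.bin" bs=1 count=6192 2>/dev/null

# Copy it in 4096-byte blocks, as in the sample command.
dd if="$workdir/input.bin" of="$workdir/copy.bin" bs=4096 2>/dev/null

copy_size=$(($(wc -c < "$workdir/copy.bin")))
echo "copied $copy_size bytes"   # prints: copied 6192 bytes
```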

DCFL created the hashwindow= option specifically for forensic use. This option prints an MD5 hash (i.e., a hash that uses the MD5 algorithm) of the data copied for every chunk of specified bytes. This sample command specifies 732,954,624 bytes, which means dcfldd will create a hash for every 699MB of data written. If you want dcfldd to create only one hash, you specify hashwindow=0. The hashes are displayed on screen, unless you use the hashlog= option to specify a file to which to write them. In the sample command, dcfldd writes to the /mnt/hashlog file. Writing this information to a file makes comparing and verifying the hashes much easier. In addition, you should print a copy of the hash file after the imaging is completed and keep that hard copy in a safe place. (I explain how to view the hash file shortly.)
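
To make the idea of a hash window concrete, here is a rough emulation that uses only dd and md5sum on a small sample file. It isn't how dcfldd is implemented. The window_hashes function is a hypothetical helper, and its output format is invented for this example rather than copied from a dcfldd hash log. The first line of the script also confirms the arithmetic behind the 699MB window.

```shell
#!/bin/sh
# Sketch: emulate per-window MD5 hashing with dd and md5sum.
# window_hashes is a hypothetical helper; its output format is made up
# for this example and is not dcfldd's hashlog format.

# The window size in the sample command is 699 x 1024 x 1024 bytes.
echo $((699 * 1024 * 1024))   # prints 732954624

window_hashes() {
    file=$1
    window=$2
    size=$(($(wc -c < "$file")))
    index=0
    while [ $((index * window)) -lt "$size" ]; do
        # Read the window numbered $index and hash only those bytes.
        digest=$(dd if="$file" bs="$window" skip="$index" count=1 2>/dev/null |
                 md5sum | cut -d' ' -f1)
        echo "offset $((index * window)): $digest"
        index=$((index + 1))
    done
}

workdir=$(mktemp -d)
# A 10,000-byte sample yields three windows of up to 4096 bytes each.
dd if=/dev/urandom of="$workdir/sample.bin" bs=1000 count=10 2>/dev/null
window_hashes "$workdir/sample.bin" 4096
```

A window that's at least as large as the file produces one hash that equals the MD5 hash of the whole file, which is the same result you'd expect from hashing everything in one pass.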

The last option tells dcfldd how to handle read errors. By default, if an error in the read process occurs, dcfldd stops execution. Setting the conv= option to noerror tells dcfldd to skip the bytes causing the read error and continue reading the disk. Adding the sync flag tells dcfldd to write a 0 value for each byte that it couldn't read, thus keeping the final file size the same, regardless of how many errors were encountered.

Alternatively, you can omit the of= option and instead use the Netcat (or Cryptcat) utility to pipe the command output to a server. For example, if you want to pipe the command output to port 5000 on the listening server (which has the IP address 192.168.1.100), the command to run is

dcfldd if=/dev/hda1 bs=4096
  hashwindow=732954624
  hashlog=/mnt/hashlog
  conv=noerror,sync | nc
  192.168.1.100 5000

After the dcfldd command is completed, you must use the Md5sum utility on the output file to ensure its integrity. (See "Building and Using an Incident Response Toolkit, Part 1" for details about how to use Md5sum, Netcat, and Cryptcat.)

You can use the less command to view the hashes that the dcfldd command creates. Run the command

less /mnt/hashlog

In the less command's output, you can press the Spacebar to scroll down one screen at a time, just as in a man page. You can also use the Up Arrow and Down Arrow keys to slowly scroll through the output. Press q to return to the command prompt.

The dcfldd command creates only one image file. Although not necessary, you can use the split command to break the file into smaller chunks. For this example, you'd use a command such as

split --bytes=699m
  /mnt/image.dd suscomp

In this command, the --bytes= option (which begins with two hyphens) specifies how many bytes of the input file should go into each output file. In this case, 699MB is specified, which coincides with how many bytes were used to calculate each MD5 hash. The penultimate argument is the name of the file to split (in this case, /mnt/image.dd). The final argument is the prefix given to the output files. These files follow the format prefixaa, prefixab, prefixac, and so on. Thus, in this example, the output files are suscompaa, suscompab, suscompac, and so on. If necessary, you can later use the cat command to concatenate groups of files. For example, the command

cat suscompaa suscompab
  > combo1

combines the suscompaa and suscompab files into the combo1 file.
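
If you split an image, it's worth confirming that the pieces reassemble into an identical file. The following sketch does so on a small sample in a temporary directory. The 10KB sample file and the 4096-byte piece size are assumptions chosen to keep the example quick, and the -b option is the short form of --bytes=.

```shell
#!/bin/sh
# Sketch: split a sample image, rejoin the pieces, and confirm nothing changed.
# The sample size and piece size are small stand-ins for a real image.

workdir=$(mktemp -d)

# Create a 10,240-byte sample "image".
dd if=/dev/urandom of="$workdir/image.dd" bs=1024 count=10 2>/dev/null

# Split into 4096-byte pieces: suscompaa, suscompab, suscompac.
split -b 4096 "$workdir/image.dd" "$workdir/suscomp"

# Rejoin the pieces in order and compare with the original.
cat "$workdir/suscompaa" "$workdir/suscompab" "$workdir/suscompac" \
    > "$workdir/rejoined.dd"

if cmp -s "$workdir/image.dd" "$workdir/rejoined.dd"; then
    echo "rejoined copy is identical"
else
    echo "rejoined copy differs"
fi
```

The order of the file names given to cat matters. Because split names its pieces alphabetically, listing them in that order restores the original byte sequence.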

A Perfect Copy
You now have a digital copy of the compromised hard disk. You can perform a forensic analysis on that duplicate disk without fear of changing any data on the compromised machine. In "Performing Forensic Analyses, Part 2," I'll show you how to perform a thorough analysis.