Welcome to the first in a series of articles about writing secure code for the Win32 API, specifically for Windows 2000 (Win2K) and Windows NT. Although some of the topics also pertain to Windows 9x, those systems lack the rich set of security features that the Windows NT-based line of OSs provides, so some columns in this series won't apply to them.
In this column, I'll focus on writing secure code in C and C++ because those are the programming languages I know best. In addition, Microsoft used C and C++ to write Windows 2000 and Windows NT, and these languages give you the most direct access to some of the OSs' security features. Finally, C and C++ are the languages developers most commonly use to write commercial software.
So why not discuss Java or Visual Basic (VB)? Both languages make writing secure code easier because they bounds-check arrays, and Java has the added benefit of automatic garbage collection. To some extent, my choice to focus on C and C++ is personal preference: only these languages give me the speed and the full access to system calls that I rely on every day. Unfortunately, one reason C and C++ executables are so fast is that these languages give you plenty of room to shoot yourself in the foot. A friend of mine once said that if C lets programmers shoot themselves in the foot, C++ hands them a machine gun! In my experience, poorly written C++ code is often worse than poorly written C code. However, C++ provides some convenient features and lets you write better-organized code. I encourage anyone entering a career as a developer to learn C++ thoroughly. That knowledge is essential to learning Distributed COM (DCOM), which developers increasingly use to interface with Win2K and NT.
The discussions in this column assume that you're familiar with basic programming techniques. If you're new to programming in C or C++, I recommend that you start with a good reference. One of the best titles available is A Book on C: Programming in C by Al Kelley and Ira Pohl (ISBN: 0201183994). Another good reference is C: A Reference Manual by Samuel Harbison and Guy Steele (ISBN: 0133262243). Ira Pohl has also written some excellent books on C++, and if you're learning C++, make sure you spend some time getting to know the Standard Template Library (STL). Once you get past the learning curve, the STL simplifies many difficult tasks. If you're already familiar with coding basics, a great book for improving your skills is Writing Solid Code by Steve Maguire (ISBN: 1556155514). Even if you've been programming in C for 15 years, this book is worth reading. Some of the best programmers I know have told me that they learned from it, and it's essential reading for anyone starting out.
Laziness, Impatience, and Hubris
The first point I want to make in this series is that secure code is really just solid code. The steps you take to make your code more secure will also make it more robust and reliable. So what kind of programmer writes solid code? In Programming Perl by Larry Wall, Tom Christiansen, et al. (ISBN: 1565921496), the authors assert that the three characteristics of a great programmer are laziness, impatience, and hubris. On the face of it, this assertion seems odd; you would think we should be hardworking, patient, and humble. Why would the authors say such a thing, and why would I agree with them? Let's look at all three.
The first characteristic, laziness, makes you go to great effort to reduce overall energy expenditure. Think of laziness in the long term—writing code correctly the first time is easier than going back and patching that code, especially if you’ve shipped a commercial product. One off-by-one error I made cost my company dozens of hours because we had to ship a dot release to put out a fix. What's more, our customers didn’t appreciate paying for a product that wasn’t working properly. Be lazy—do it right the first time.
The second characteristic, impatience, is what makes you write applications that are easy to use. An impatient programmer won't put up with programs that are poorly constructed, slow, or hard to use.
The third characteristic, hubris, means excessive pride. It's what motivates you to write programs that people will like, and code that the programmers who inherit it after you move on won't say nasty things about. That said, some of the worst programmers are the proudest of their work, and some of the best are quietly humble. Don't let your desire to write the best code you can blind you to your own fallibility. Assume that you will make mistakes, and constantly ask yourself how things can go wrong. If you're dealing with user input, or anything else outside your application's direct control, assume that the outside world is bent on your destruction. You're not being paranoid; they really are out to get you. Get someone else to review your code in depth, pick it apart, and point out seemingly inconsequential details.
Rules for Writing Solid Code
Some aspects of writing solid code from a security standpoint are arcane, such as properly changing user context and setting permissions. However, most real-life security problems stem from code that isn't robust and contains programming mistakes. The most common of these mistakes is the buffer overrun. Guarding against buffer overruns isn't rocket science: exploiting one might take a fairly skilled hacker, but preventing one only takes a careful programmer. Here are some rules I try to follow:
Write code that is easy to read. Use lots of white space and comments. Code that is easy to read is easy to review, and if you have to come back to it later, you’ll appreciate the notes to yourself. If you have to work on someone else’s code, or someone else has to work on your code, then comments can make the difference between making the correct changes and breaking something else. Make sure that the logic flow is easy to follow and well documented. If you can't easily follow the logic in a piece of code, the code will be more likely to do something unexpected. And unexpected behavior is often a component of a security problem.
Fail gracefully. I once showed one of my programmers a bug that passed a null pointer into one of his functions and, as a result, caused the application to fail. His response was "That's never supposed to happen!" Unfortunately, it did happen. A function should never cause the entire application to fail, no matter what the input is and no matter how many layers of code above it are supposed to have validated that input already. A related practice is to use assert() liberally. The assert() macro validates your assumptions: it evaluates a condition and, if the condition is false, reports the failure and halts the program. Because assert() is compiled out of release builds (when NDEBUG is defined), pair it with an explicit check. A nice way to combine the two is
if(arg == NULL)
{
    assert(arg != NULL);  // Debug build: stop right here
    throw std::invalid_argument("arg is NULL");  // Release build: report the error instead of crashing
}
The assert() fires only in a debug build, but the surrounding check still handles the bad argument gracefully in a release build by throwing an error. Further, if you're running a debug build, you'll land in the debugger right where the problem first showed up. Just be aware that you should never use structured exception handling as a substitute for writing solid code in the first place.
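To see the whole pattern in one place, here is a minimal sketch of a complete function that validates its pointer argument. CountChar() is a hypothetical example, and throwing std::invalid_argument is only one way to report the error in a release build; returning an error code would serve just as well.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical example: counts how many times 'ch' appears in 'str'.
// In a debug build, assert() stops the program at the first bad call.
// In a release build (NDEBUG defined), assert() compiles to nothing,
// so the explicit check is what keeps a NULL pointer from crashing us.
std::size_t CountChar(const char* str, char ch)
{
    if (str == nullptr)
    {
        assert(str != nullptr);  // debug build: fails here, with file and line
        throw std::invalid_argument("CountChar: str is NULL");  // release build
    }

    std::size_t count = 0;
    for (const char* p = str; *p != '\0'; ++p)
    {
        if (*p == ch)
            ++count;
    }
    return count;
}
```

For example, CountChar("hello", 'l') returns 2. Note that in a debug build a NULL argument trips the assert() before the throw is reached, which is exactly the point: you find the bad caller during testing rather than in the field.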
Write simple functions. New programmers often make the mistake of creating functions that do too many things, are too long, and are too flexible. I’ve seen single functions that were longer than 1000 lines and repeated the same code multiple times within the function. You will also sometimes see this problem when a programmer extends an existing piece of code over time (a phenomenon known as code rot). A complex or long function is difficult to test, difficult to review, and more likely to contain security problems. Remember, keep your functions simple.
Encapsulate properly. Encapsulation is a core principle of object-oriented programming (OOP), but you can also apply it in other languages. Encapsulation hides internal implementation details behind a public interface to an object. As a result, you can make a change in one place, and as long as you don't alter the external behavior, you don't have to change any of the code that uses the public interface. For example, I may need to validate a certain piece of data in many different places. The best way to accomplish this task is to write a single function that performs the validation. That way, if I ever need to change the maximum size of an input string, I only need to change it in one place.
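Here is a minimal sketch of that idea in C++. The function IsValidUserName(), the 20-character limit, the allowed character set, and the two callers are all hypothetical choices made for illustration; the point is that the validation rules live in one function and one constant.

```cpp
#include <cctype>
#include <cstddef>

// Hypothetical limit. Every caller validates through IsValidUserName(),
// so changing the maximum length means editing only this line.
const std::size_t kMaxUserNameLength = 20;

// Hypothetical validation routine: accepts 1 to kMaxUserNameLength
// characters drawn from letters, digits, and underscores.
bool IsValidUserName(const char* name)
{
    if (name == nullptr)
        return false;  // fail gracefully on a NULL pointer

    std::size_t len = 0;
    for (; name[len] != '\0'; ++len)
    {
        if (len == kMaxUserNameLength)
            return false;  // too long: stop scanning at the limit

        unsigned char c = static_cast<unsigned char>(name[len]);
        if (!std::isalnum(c) && c != '_')
            return false;  // character outside the allowed set
    }
    return len > 0;  // reject the empty string
}

// Two hypothetical callers in different parts of a program. Neither one
// knows the length limit or the character rules; both defer to the
// single validation function above.
bool HandleLogin(const char* name)
{
    if (!IsValidUserName(name))
        return false;
    // ... look up the account here ...
    return true;
}

bool HandleRename(const char* newName)
{
    if (!IsValidUserName(newName))
        return false;
    // ... apply the new name here ...
    return true;
}
```

Raising the limit to 32 characters, for example, would mean changing kMaxUserNameLength and nothing else; HandleLogin() and HandleRename() pick up the new rule automatically.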
Used properly, encapsulation can make writing robust code easier in C++ than in C. For example, a common task is parsing an input string whose results get passed down through several layers of functions. If you use C to perform the parsing, you either need to pass both the buffer and its length to every function, or check the length of the string on input and let the rest of the code assume that the string is a certain length. The second approach is sloppy and perilous, and several documented security bugs have come from code written that way. If you use C++ to parse the string, you can easily write a class that provides all the operations you need to perform on the input string. You can then pass a reference (or a pointer) to an instance of that class down through the functions, and all the assumptions about sizing live in one place.
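The following sketch shows what such a class might look like. InputString, its 64-character limit, and the ParseKey() function that consumes it are hypothetical; a real parser would expose whatever operations your input format actually needs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical wrapper for an untrusted input string. The size limit is
// enforced once, in the constructor, so code further down the call chain
// never carries a separate length argument or guesses at the size.
class InputString
{
public:
    static constexpr std::size_t kMaxLength = 64;  // hypothetical limit

    // NULL or over-long input is rejected and the object marked invalid.
    explicit InputString(const char* raw) : m_valid(false)
    {
        if (raw == nullptr)
            return;

        std::size_t len = 0;
        while (raw[len] != '\0')
        {
            if (++len > kMaxLength)
                return;  // too long: stop scanning and stay invalid
        }
        m_text.assign(raw, len);
        m_valid = true;
        assert(m_text.size() <= kMaxLength);  // class invariant
    }

    bool IsValid() const { return m_valid; }
    std::size_t Length() const { return m_text.size(); }

    // Bounds-checked access: out-of-range positions return '\0'
    // instead of reading past the end of the string.
    char At(std::size_t pos) const
    {
        return pos < m_text.size() ? m_text[pos] : '\0';
    }

    // Position of the first 'ch' at or after 'start', or std::string::npos.
    std::size_t Find(char ch, std::size_t start = 0) const
    {
        return m_text.find(ch, start);
    }

    // Up to 'count' characters starting at 'pos'; empty if 'pos' is out of range.
    std::string Substring(std::size_t pos, std::size_t count) const
    {
        if (pos >= m_text.size())
            return std::string();
        return m_text.substr(pos, count);
    }

private:
    std::string m_text;
    bool m_valid;
};

// Hypothetical lower-layer function: it receives the object by const
// reference, so it needs no length argument and makes no size assumptions.
std::string ParseKey(const InputString& input)
{
    if (!input.IsValid())
        return std::string();

    std::size_t eq = input.Find('=');
    if (eq == std::string::npos)
        return std::string();
    return input.Substring(0, eq);
}
```

For example, ParseKey(InputString("user=alice")) returns "user". Rejecting oversized input in the constructor, rather than silently truncating it, is a design choice in this sketch: truncation can quietly change the meaning of the data.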
Stay tuned. Next time I'll get into the details of how a buffer overrun works, and how we can handle strings safely so that our code isn’t the next topic of conversation on the security mailing lists!