Chapter 1: The Internet Is a Hostile Environment

Overview

Consider a system or a script. As with any other object in the world, its behavior depends on external and internal conditions. Among internal conditions are the server settings, the type of server, the type of database used in the system, the content of the environment variables, the information on the server's hard disk, and the content of the database.

External conditions are the data sent to the server using the HyperText Transfer Protocol (HTTP). Examples of such data are the GET, POST, and COOKIE parameters, as well as some of the headers that a client sends to the server according to HTTP. These values are specified and changed by the client, and the script will receive them as environment variables.
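In PHP, for instance, these client-controlled values reach a script through the superglobal arrays $_GET, $_POST, $_COOKIE, and the HTTP_* entries of $_SERVER. The following is only a minimal sketch: the function name client_controlled_input is hypothetical, and the arrays are passed in as arguments so that the code can run outside a Web server.

```php
<?php
// Hypothetical helper: gathers every value the client can set freely.
// In a real script the four arguments would be $_GET, $_POST, $_COOKIE,
// and $_SERVER.
function client_controlled_input(array $get, array $post, array $cookie, array $server): array
{
    $headers = [];
    foreach ($server as $name => $value) {
        // PHP exposes request headers as HTTP_* entries (e.g., HTTP_USER_AGENT).
        if (strpos((string) $name, 'HTTP_') === 0) {
            $headers[$name] = $value;
        }
    }
    return ['get' => $get, 'post' => $post, 'cookie' => $cookie, 'headers' => $headers];
}
```

Everything this function returns should be treated as an external condition, because the visitor chooses each value.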

Fortunately, an external user, that is, a visitor to the Web site, cannot affect internal factors. However, he or she can change external factors.

Dynamics Cause "Holes"

Consider a complex system consisting of many interrelated components. For example, a Web system consists of a news system, a chat, a forum, and so on. It would be wise to assume that a system or a site has dynamic content.



Definition

In this context, content means the content of a HyperText Markup Language (HTML) page.

Dynamic content can be defined as a response of the system to changes in its external conditions. This response can be documented (i.e., explicitly described or logically implied) or not. In the latter case, it is a result of side effects in the system. These side effects are usually unpredictable, and they are called vulnerabilities or simply holes.

Dynamic content is fraught with threat. A site based completely on static content, that is, including only static HTML pages, will not be vulnerable to attacks on scripts because it has no scripts. By definition, a static system doesn't respond to changes in external conditions; therefore, it has a documented response.

However, you shouldn't think that a static site is invulnerable to all types of attacks. For example, it is possible for a malicious person to attack the site through other services, such as vulnerabilities in other Web sites that are physically located on the same server but are components of another system. In addition, attacks on the Web server are possible. In this book, I describe only Web attacks, that is, attacks on scripts and applications accessible using HTTP.

So, dynamic content is the origin of all holes in Web applications. One obvious solution to the problem could involve abandoning dynamics in the Web. However, the contemporary Web would be impossible without dynamics. Forums, guest books, newsgroups, and so on, would be missing from the Web. Therefore, you need to write secure Web applications and scripts and stable systems.

Stable Systems



Definition

A stable system is a system with a documented response to any change in external conditions.

It appears that this definition, which I learned as a student, is the key to writing secure Web applications. A system can work well in normal conditions.

Messages will be added to a forum, a search in a database will return results, and so on. What's more, the system will pass all tests for functioning in normal conditions, that is, conditions in which a user doesn't interfere between the browser and the server but just clicks links and submits forms with valid data. In such conditions, the system will work well.

As you can see, interaction between a user and a system, or, in other words, changing the external conditions of the system, can be of two types.

The first type is valid HTTP requests that agree with common sense. For example, digits are used to specify an ID in a database, and letters are used to specify a person's name when searching in a database.



Definition

An HTTP request is a data set sent by a client to a Web server in accordance with HTTP. The data contain the address of the requested script, the server name, and, possibly, parameters such as GET, POST, and COOKIE. In addition, the client can send some secondary data as header fields.
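To make the definition concrete, here is a minimal sketch that assembles the text of a GET request by hand. The function name build_get_request is hypothetical, and a real browser would send more header fields than the three shown. The point is that every part of the request is ordinary text that the client composes.

```php
<?php
// Hypothetical helper: assembles the text of a GET request by hand.
function build_get_request(string $host, string $path, array $params, array $cookies = []): string
{
    $query = http_build_query($params);          // e.g., id=a
    $lines = [
        'GET ' . $path . ($query !== '' ? '?' . $query : '') . ' HTTP/1.1',
        'Host: ' . $host,
    ];
    if ($cookies) {
        $pairs = [];
        foreach ($cookies as $name => $value) {
            $pairs[] = $name . '=' . rawurlencode((string) $value);
        }
        $lines[] = 'Cookie: ' . implode('; ', $pairs);
    }
    $lines[] = 'Connection: close';
    // Header lines end with CRLF; an empty line closes the header section.
    return implode("\r\n", $lines) . "\r\n\r\n";
}

echo build_get_request('localhost', '/1/1.php', ['id' => 'a']);
```

Nothing in this text forces the id parameter to be a number, so the server is the only place where such a rule can be enforced.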

The second type is abnormal requests, which a user can craft after examining the HTML code received from the Web server, for example, by entering invalid data into form fields. You should test how the system behaves in this situation too.

Consider a few examples.

The script http://localhost/1/1.php returns a person's name stored in a database with an ID. The ID is sent as a GET parameter with the name id.

The system will normally respond to valid ID values:

  • http://localhost/1/1.php?id=1

  • http://localhost/1/1.php?id=2

  • http://localhost/1/1.php?id=100

Therefore, you could say the system works correctly. To be more precise, it correctly responds to correct requests.

In this example, you can see that a request without parameters causes a field to appear, into which you should enter a person's ID. If you enter an integer, you'll see either the person's name or the message telling you that no record was found.

This is an implementation of a simple procedure of retrieving information from the simplest database, a table with two columns: integer id (a person's ID) and string name (his or her name).

How will this script behave in other conditions? What will happen if somebody enters data other than an integer into the ID field? The documentation to the system doesn't describe the system's response. You could expect the system to detect the invalid ID and return an error message. However, you should test it.

Try http://localhost/1/1.php?id=a, and you'll see the following message:

   Warning: mysql_fetch_object(): supplied argument is not a valid MySQL result resource in x:\localhost\1\1.php on line 15

   No records were found.



You might be wondering what this means, how an attacker can use this information, and how you should defend your system. I'll comprehensively explain these issues in subsequent chapters.

This warning message shows that the system improperly responds to an ID that isn't an integer.
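A documented response to such a request could look like the following minimal sketch. The function names parse_person_id and describe_request are hypothetical, the nine-digit limit is an assumption made to keep the value within PHP's integer range, and the database query itself is left out.

```php
<?php
// Hypothetical helper: returns the ID as an integer, or null when the raw
// value is not a plain sequence of digits.
function parse_person_id($raw): ?int
{
    // Assumed limit of nine digits, so the value always fits into an int.
    if (!is_string($raw) || !ctype_digit($raw) || strlen($raw) > 9) {
        return null;
    }
    return (int) $raw;
}

function describe_request($raw): string
{
    $id = parse_person_id($raw);
    if ($id === null) {
        // The script itself, not the interpreter, reports the problem.
        return 'Invalid ID: an integer is expected.';
    }
    return "Looking up person $id";   // the database query would go here
}

echo describe_request('a'), "\n";     // prints the script's own error message
```

With a check like this, a request such as id=a never reaches the database functions, so the interpreter has no reason to emit a warning.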

Consider another example.

The script http://localhost/1/2.php produces almost the same result as the first one, but it looks for a name in a file rather than in a database. The file name is an ID with the TXT extension.

Test this script by sending the following requests:

  • http://localhost/1/2.php?id=1

  • http://localhost/1/2.php?id=2

  • http://localhost/1/2.php?id=3

You'll see that the script normally responds to normal requests that contain IDs of people whose files are available on the disk.

Test the script's behavior in abnormal situations:

  • http://localhost/1/2.php?id=9999

  • http://localhost/1/2.php?id=a

You'll get messages like the following:

   Warning: fopen(data/5.txt): failed to open stream: No such file or directory in x:\localhost\1\2.php on line 12

   Warning: fread(): supplied argument is not a valid stream resource in x:\localhost\1\2.php on line 13

   Warning: fclose(): supplied argument is not a valid stream resource in x:\localhost\1\2.php on line 15

As you can see, the system responds improperly to a request containing an ID that isn't an integer or an ID that doesn't correspond to any record.

How can an attacker use the information contained in these messages? Again, I'll provide answers in subsequent chapters.

Both examples demonstrate unstable systems. You can explain and predict these scripts' behavior if you examine their code. However, this cannot be called a documented response.

If the scripts returned messages saying that the requests were invalid, this would be a documented response. Instead, you receive the interpreter's messages saying that the scripts contain errors.

You can find many such examples in everyday practice. People focus on how a system works in normal external conditions and almost always overlook that the external conditions can be illogical.

Filtration is most important when writing stable systems.

Filtration

The notion of filtration is often used when discussing vulnerabilities.



Definition

Filtration involves changing the contents of a parameter to avoid an undocumented response from the script.

Sometimes, the script performs filtration before it uses the parameter; in other cases, filtration is performed by auxiliary modules.

A character or a sequence of characters received from a user can be filtered in various ways. For example, quotation marks in a string can affect the processing of a Structured Query Language (SQL) request and cause a syntax error. To avoid this, you could simply remove them from the string before processing the request. In my opinion, however, it would be best to add a backslash (\) before each quotation mark. In this case, the database server wouldn't treat a quotation mark as a string-terminating character and wouldn't treat the backslash as a character of the string.

To demonstrate how SQL responds to the backslash character, I suggest that you make a few SQL requests:

   mysql> select 'test - \'tested\' ';
   +-----------------+
   | test - 'tested' |
   +-----------------+
   | test - 'tested' |
   +-----------------+
   1 row in set (0.00 sec)

   mysql>

As you can see, the quotation marks preceded by backslashes were displayed normally. In contrast, the following request will cause an error message:

   mysql> select 'test - 'tested' ';

   ERROR 1064: You have an error in your SQL syntax. Check the manual that corresponds to your MySQL server version for the right syntax to use near '' '' at line 1

   mysql>
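In a PHP script, the backslash approach demonstrated by the two requests above can be sketched with the built-in addslashes() function. The helper name build_select is hypothetical. Whether backslash escaping alone is enough depends on the database and its character set, so treat this as an illustration of the idea rather than a complete defense.

```php
<?php
// Hypothetical helper: builds the SELECT statement from the example above
// around a user-supplied string.
function build_select(string $userText): string
{
    // addslashes() puts a backslash before ', ", \ and NUL bytes.
    return "select '" . addslashes($userText) . "'";
}

echo build_select("test - 'tested' "), "\n";
// prints: select 'test - \'tested\' '
```

The resulting statement is the first of the two requests shown above, so the quotation marks inside the value no longer terminate the string.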



Obviously, different parameters should be filtered differently. For example, an unmatched back quotation mark in a string can be crucial in some cases. In other cases, an improper parameter type can cause a system error. This happened in the first example. In yet another case, a parameter with a value outside a valid range can cause an error.

In essence, filtration can be of two types. These are filtration by barring suspicious parameter values and filtration by setting parameters to safe values.

Filtration by barring is a matter of halting the script execution when suspicious elements (such as a quotation mark or the < or > character delimiting HTML tags) are encountered in a parameter. In such cases, a user sees an error message. This type of filtration has disadvantages. For example, valid values can be barred. If a message in a forum contains a quotation mark, such a filtration will prohibit the publication of this message. Therefore, it will be impossible to send a message with a single quotation mark, even though such messages are likely.

This behavior of the protection would seem normal if you remember that a quotation mark makes an SQL request invalid. However, it cannot be justified by common sense.

In my opinion, filtration by setting to safe values is the best. However, it sets all suspicious parameters to a safe form, thus changing their values.
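The two types of filtration can be contrasted in a minimal sketch. The function names is_barred and to_safe_html are hypothetical, and the "safe form" chosen here (HTML entities) is an assumption that suits a value displayed on an HTML page. A value placed into an SQL request would need a different safe form, such as the backslash escaping described earlier.

```php
<?php
// Filtration by barring: report whether the value must be rejected.
function is_barred(string $value): bool
{
    // strpbrk() returns false when none of the listed characters occur.
    return strpbrk($value, "'\"<>") !== false;
}

// Filtration by setting to a safe value: keep the text, but neutralize the
// characters that are special in HTML.
function to_safe_html(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

$message = "It's <b>late</b>";
var_dump(is_barred($message));        // bool(true): the message is refused
echo to_safe_html($message), "\n";    // It&#039;s &lt;b&gt;late&lt;/b&gt;
```

The first function refuses a perfectly ordinary forum message because of its apostrophe, whereas the second accepts it at the cost of changing the stored or displayed value.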

When Filtration Is Insufficient

You might think that filtration is the key to the problem of Web application safety. However, this is not the case.

Consider an example: http://localhost/1/3.php. A design specification for this script could be as follows:

Write a script that displays the name of a person whose ID is entered. The data are stored in files that have names identical to IDs and the TXT extensions. For example, the data of a person whose ID is 3 are stored in the file 3.TXT.

If no person with the specified ID is found, an appropriate message should be returned.

The ID is sent using the HTTP GET method. If an ID is missing, the script should display a form suggesting that the user enter his or her ID.

Here is the code of this script:

   <?
   if(empty($id))
   {
     echo "
     <form>
     enter id (integer)<input type=text name=id>
     <input type=submit>
     </form>
     ";
     exit;
   };
   if(file_exists("data/$id.txt"))
   {
     $f=fopen("data/$id.txt", "r");
     $s=fread($f, 1024);
     echo $s;
     fclose($f);
   }
   else
     echo "records not found";
   ?>

Does this script conform to the design specification? It certainly does. The script is comprehensively described in the specification. In particular, its response to an abnormal situation is described: If no file is found, a message should be displayed.

However, the design specification doesn't tell whether the ID should be an integer.

The script completely implements the design specification. For example, if the ID is omitted, an appropriate form is displayed. When the script receives the ID, it looks for a file with the corresponding name.

If the file isn't found, the "records not found" message is displayed, and the script doesn't try to read any data.

Finally, if the file is found, its contents are sent to the browser.

This behavior seems invulnerable. It seems impossible to imagine a situation that would cause an error. If the file isn't found or the name is invalid, the script sends a message to the browser. Note that this message is generated by the script rather than by the interpreter.

You should test this. Make the following requests:

  • http://localhost/1/3.php?id=1

  • http://localhost/1/3.php?id=2

  • http://localhost/1/3.php?id=3

As a result, you'll receive the corresponding records. Even if you send an ID that isn't an integer but a corresponding file exists (e.g., http://localhost/1/3.php?id=abc), you'll receive the record you would expect.

Now specify IDs that are missing from the database or contain characters invalid in a file name (in the file allocation table, or FAT).

Try the following requests:

  • http://localhost/1/3.php?id=999

  • http://localhost/1/3.php?id=abcde

  • http://localhost/1/3.php?id=%3F

  • http://localhost/1/3.php?id=%3C

  • http://localhost/1/3.php?id=%7C

Note that the sequences %3F, %3C, and %7C encode the characters ?, <, and |, respectively. So, these characters are sent as IDs.

As you can see, the system's responses are adequate. It returns an error message telling you that no record was found.

However, despite such a stable behavior, the script has a vulnerability related to how the file systems are designed.

Remember that some special character sequences are used to change the directory and that nothing prevents you from using them in file names. In a file name, such a sequence changes (or bypasses) a directory. The ../ sequence means "one level up." Look at how the script responds to it.

Suppose you know that the file TEST.TXT is located in the parent directory of the current subdirectory. You cannot access it using HTTP, but you're eager to get the contents of this file. Send the ../test sequence as a person's ID. Examine the code to find out what file will be checked for existence and the contents of which file will be sent to the browser. Obviously, this is DATA/../TEST.TXT. In other words, this is the desired file in the parent directory.

To test how this trick works, make the following request: http://localhost/1/3.php?id=../test. You'll see the contents of the file in the browser window. So, why did the protection let you read the file rather than return a message telling you that the file hadn't been found? The reason is that the file is present in the system. What's more, this file name is valid for file functions such as file_exists() or fopen().
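The way the file name is resolved can be traced in a minimal sketch. The helper collapse_path is hypothetical: it collapses the ".." segments purely as text, which approximates what the file system does when symbolic links are not involved.

```php
<?php
// Hypothetical helper: collapses "." and ".." segments in a relative path
// purely as text, ignoring symbolic links.
function collapse_path(string $path): string
{
    $out = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '' || $segment === '.') {
            continue;
        }
        if ($segment === '..' && $out && end($out) !== '..') {
            array_pop($out);      // step back out of the previous directory
        } else {
            $out[] = $segment;
        }
    }
    return implode('/', $out);
}

$id = '../test';                       // value taken from ?id=../test
$path = "data/$id.txt";                // the name 3.php passes to fopen()
echo $path, ' -> ', collapse_path($path), "\n";
// prints: data/../test.txt -> test.txt
```

The collapsed name no longer contains the data directory at all, which is why the script happily serves a file that was never meant to be part of the database.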

This is a crucial vulnerability. I'll try to explain the cause of this vulnerability. The system seems safe, all erroneous situations being excluded. Nevertheless, there is an obvious hole in the system.

The incorrect design specification is responsible for this hole. A perfect one would be as follows:

Write a script that displays the name of a person whose ID is entered. The data are stored in files that have names identical to IDs and the TXT extensions. For example, the data of a person whose ID is 3 are stored in the file 3.TXT. The ID is a sequence of digits, uppercase or lowercase letters, underscores, minuses, or periods. If an invalid ID is received, the script should return an error message.

You could specify more valid characters.

A script complying with the second design specification will be invulnerable to this type of attack.
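A script that follows the second specification might check the ID as in the following minimal sketch. The function names is_valid_id and lookup_person, the $dataDir parameter, and the wording of the messages are assumptions made for the illustration.

```php
<?php
// ID rule from the revised specification: digits, letters, underscores,
// minuses, or periods, and nothing else.
function is_valid_id(string $id): bool
{
    // \A and \z anchor the whole string, so a trailing newline is rejected too.
    return preg_match('/\A[A-Za-z0-9_.-]+\z/', $id) === 1;
}

// Hypothetical rewrite of the lookup in 3.php; $dataDir is an assumed parameter.
function lookup_person(string $dataDir, string $id): string
{
    if (!is_valid_id($id)) {
        return 'invalid id';
    }
    $file = "$dataDir/$id.txt";
    if (!is_file($file)) {
        return 'records not found';
    }
    // Read at most 1024 bytes, as the original script does.
    return (string) file_get_contents($file, false, null, 0, 1024);
}

echo lookup_person('data', '../test'), "\n";   // prints: invalid id
```

Because the slash is not among the allowed characters, a value such as ../test is rejected before any file function sees it, and every accepted name stays inside the data directory.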

The Main Principles of Secure Programming

I will now summarize the main principles of writing secure code and the main causes of vulnerabilities.

In fact, there is only one cause. A user can interfere between the browser and the server, and he or she can send illogical values of parameters to the server.

The principle that follows from this is simple: Don't trust the data received from outside the server.

A design specification for a script should be brief, but it should take into account all dangerous situations. A script that complies with a correct design specification will be invulnerable to Web attacks.

If a programmer decides to write a script on his or her own, or if a design specification is written by a person incompetent in security issues who uses the wrong terms, the programmer should write or at least keep in mind a detailed design specification that takes into account all security aspects.

All this entails the following principle: The security of a Web application should be thought out at the stage of writing design specification, before the first line of code is written.

A person who writes the design specification should be competent in Web security. He or she should clearly understand what data should be filtered and how. In addition, he or she should understand why a particular filtration is required.

From the next chapters, you'll learn what data can be considered safe, in what cases you should set data to correct values or halt script execution, how the data should be changed to use hidden features of a script, and how you can benefit from other people's programming flaws.

There are a few types of vulnerabilities that are entirely programmers' fault.

These vulnerabilities cannot be foreseen in a design specification mainly because they are specific to the programming language. As a rule, every programming language has features or functions that should be used carefully. Provision for these nuances is the responsibility of a programmer, not of a manager writing a design specification.

For example, in C and C++, such a slippery issue is the use of strcpy(), sprintf(), and similar functions. They copy data into a buffer without checking whether the data fit within the allocated memory. However, this topic is beyond the scope of this book.

In PHP, a popular programming language for Web applications, a similar problem relates to automatic definition (registration) of global variables based on data received as GET, POST, and COOKIE parameters.
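The effect can be emulated in a minimal sketch. The register_globals setting itself was removed in PHP 5.4, so extract() stands in for it here. The function name run_login_script, the admin and password parameters, and the placeholder password are all hypothetical.

```php
<?php
// Emulates a script running with register_globals enabled.
// $initialize selects whether the script sets $admin itself before using it.
function run_login_script(array $request, bool $initialize): bool
{
    // Stand-in for register_globals: request parameters become variables.
    // EXTR_SKIP keeps the function's own $request and $initialize intact.
    extract($request, EXTR_SKIP);

    if ($initialize) {
        $admin = false;               // the script's own starting value
    }
    // 'secret' is a placeholder password for this illustration only.
    if (isset($password) && $password === 'secret') {
        $admin = true;
    }
    return !empty($admin);
}

var_dump(run_login_script(['admin' => '1'], false)); // bool(true): flag injected
var_dump(run_login_script(['admin' => '1'], true));  // bool(false): flag reset
```

In the first call, a visitor who merely adds admin=1 to the request is treated as an administrator, because the script never assigned the variable itself. Initializing every variable before use closes this particular gap.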

The next chapters describe how you can use vulnerabilities of this type, how you should eliminate them, and how you can write secure code.




