By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
435,519 Members | 2,258 Online
Bytes IT Community
Submit an Article
Got Smarts?
Share your bits of IT knowledge by writing an article on Bytes.

How to Make a File Download Script with Perl

KevinADC
Expert 2.5K+
P: 4,059
Note: You may skip to the end of the article if all you want is the perl code.

Introduction

Many websites have a form or a link you can use to download a file. You click a form button or click on a link and after a moment or two a file download dialog box pops-up in your web browser and prompts you for some instructions, such as “open” or “save“. I’m going to show you how to do that using a perl script.

What You Need

Any recent version of perl (5.06 or newer should be good) and a server to run the script on. A server that allows you to store files above the web root is preferable but not necessary. A little bit of prior HTML knowledge would be helpful but is not necessary. Typically you would upload the script into your cgi-bin folder and set the file permissions to 755. The folder where you upload the script to and the permissions may be different for your server.

The Perl Code

Just about all perl scripts that run as a CGI process need to start with what is called the shebang line. The most common shebang line is:

Expand|Select|Wrap|Line Numbers
  1. #!/usr/bin/perl
It simply tells the server where to find perl. The shebang line your server requires might be different. Most web hosts will have that information posted on their site somewhere. In the interest of good perl coding practices and CGI security we are going to add a switch to the shebang line: -T. Note: it must be an uppercase T.


Expand|Select|Wrap|Line Numbers
  1. #!/usr/bin/perl -T
The T stands for "taint" mode. This is really to prevent you, as the programmer of the script, from making a terrible mistake and allowing the users of your CGI form to send data to the server that can be used in an insecure way. All perl scripts that run as a CGI process should use the -T switch so I include it for that reason.

Modules

Modules are sort of like separate perl programs you can use in your perl program. Many people have written modules that have become standards that other perl programmers use all the time. We will be using these modules:

Expand|Select|Wrap|Line Numbers
  1. use strict;
  2. use warnings;
  3. use CGI;
  4. # Uncomment the next line only for debugging the script.
  5. #use CGI::Carp qw/fatalsToBrowser/;
  6.  
The first two are not actually modules they are pragmas. They affect the way perl itself functions. I’m not going to explain them for the purpose of this article. You’ll need to trust me that they are important to use in nearly all of your perl programs. The "CGI" module is the module that will do most of the work for us: Process form data, print http headers, and more. The "CGI::Carp" module is really for debugging and may help you to get your script running if you have problems. If there are any fatal errors that cause the script to fail, it will print an error message to the screen. These are the same errors that will be printed in the server error log too.

The next two lines in the program establish some important parameters:

Expand|Select|Wrap|Line Numbers
  1. $CGI::POST_MAX = 1024;
  2. $CGI::DISABLE_UPLOADS = 1;
  3.  
“POST_MAX” sets the maximum limit in bytes of how much data will be considered too much and cause the script to return an error. I have set this limit low (1 kb) because this script will need little data sent to it to work. The second line tells the script to not accept file uploads. Makes sense since we want to download files, not upload them. This prevents users from attempting to use an altered form to send files to your script. All forms can be saved and the HTML code changed and the user can send anything he wants to your script, it’s up to you to prevent this on the server end. What the user does on their end is entirely out of your control.

Setting Paths and Options

Expand|Select|Wrap|Line Numbers
  1. ####################################
  2. #### User Configuration Section ####
  3. ####################################
  4.  
  5. # The path to where the downloadable files are. 
  6. # Preferably this should be above the web root folder.
  7. my $path_to_files = '/home/user/downloads/';
  8.  
  9. # The path to the error log file
  10. my $error_log     = '/home/user/downloads/logs/errors.txt';
  11.  
  12. # Option to log errors: 1 = yes, 0 = no
  13. my $log           = 1;
  14.  
  15. # To prevent hot-linking to your script
  16. my $url = 'http://www.yoursite.com';
  17. ########################################
  18. #### End User Configuration Section ####
  19. ########################################
  20.  
$path_to_files is the directory where you store the files to be downloaded. I recommend you store them in a folder that is not web accessible. This is commonly done by putting them in a folder parallel to your root web folder (public_html or www) or above it.

$error_log is the path to the errors.txt file that logs errors generated by the script.

$log turns the error log on or off.

$url should be the name of your website including the “http://” part.

Create the CGI object

Expand|Select|Wrap|Line Numbers
  1. my $q = CGI->new;
$q is the object we will use to execute various methods of the CGI module. I like to think of it as a butler. You tell the butler what you want and he knows how to get it done, you don’t have to worry about the details. Our “butler”, $q, will know what to do with the “commands” we will give him.

In reality, the CGI module has many “commands” you can give to the “butler”. We will use but a few of them. Learning to use the CGI module is almost like learning a small programming language. But the beauty is you only need to know what the commands do, not how they do it. Just like a real butler you have to trust that he knows what he is doing and will get the job done efficiently and effectively without looking over his shoulder. I recommend you take the time to read the CGI modules documentation, even if you don’t understand much of it, you should at least be familiar with the basic form processing methods. I leave that up to you.

Security Checkpoint

Never underestimate the need for security when running scripts as a CGI. We are going to use three “checkpoints” to detect any suspicious activity. The first is going to check the amount of data sent to the script. We give the cgi_error() command to our trusty butler “$q” and he returns a response, $error. “413” indicates the limit we set for $CGI::POST_MAX has been exceeded, so we are going to check for that response. Note: I use command and method interchangeably throughout the article to mean the same thing.

Expand|Select|Wrap|Line Numbers
  1. if (my $error = $q->cgi_error()){
  2.    if ($error =~ /^413\b/o) {
  3.       error('Maximum data limit exceeded.');
  4.    }
  5.    else {
  6.       error('An unknown error has occured.'); 
  7.    }
  8. }
  9.  
Next we check to see if someone has tried to upload a file to the script. “multi-part/form-data” must be used in a CGI forms “encypt” attribute in order to send files.

Expand|Select|Wrap|Line Numbers
  1. if ($ENV{'CONTENT_TYPE'} =~ m|^multipart/form-data|io ) {
  2.    error('Invalid Content-Type : multipart/form-data.')
  3. }
  4.  
Next we check that the request to use the script comes from your website.

Expand|Select|Wrap|Line Numbers
  1. if ($ENV{'HTTP_REFERER'} && $ENV{'HTTP_REFERER'} !~ m|^\Q$url|io) {
  2.    error('Access forbidden.')
  3. }
Get the Filename

I am going to use the Vars method to get all the parameters sent to the script into a hash. Once again, we call on “$q” to do the actual work.

Expand|Select|Wrap|Line Numbers
  1. my %IN = $q->Vars;
Now we make sure there is a parameter named “file”.

Expand|Select|Wrap|Line Numbers
  1. my $file = $IN{'file'} or error('No file selected.');
Validate, Validate, Validate

You can’t say it enough, all data sent to a CGI script has to be validated. If we allowed just any thing to be sent to the script someone could send something like this: /foo/bar and depending on the path you append that to, the script will obediently go find the foo directory and download the bar file. There are of course much worse things a person could try, but this is not an article about how to hack into a website using the front door. To prevent the user from getting away with such a dangerous stunt we need to validate the data sent to the script.

Expand|Select|Wrap|Line Numbers
  1. if ($file =~ /^(\w+[\w.-]+\.\w+)$/) {
  2.    $file = $1;
  3. }
  4. else {
  5.    error('Invalid characters in filename.');
  6. }    
  7.  
The cryptic looking part of that code, ($file =~ /^(\w+[\w.-]+\.\w+)$/), is called a regular expression (regexp). Typically a regexp is what you would use to validate/filter form data. Regular expressions are way beyond the scope of this article. If you are interested to understand that regexp you will have to read some regexp tutorials. See the online resources at the end of the article. Basically it is checking that the data is something like this: frog.gif, or puppy-dog.jpg, or meatloaf.txt. It checks for a restricted set of characters “a-zA-Z0-9_-.”, in a basic filename format, filename.ext, and rejects anything else as invalid.

The above code is also “untainting” the data. Since the data will be used to open a file on the server we must untaint it to satisfy the –T switch that we are not doing anything insecure. The only way to untaint data is to use a regexp. The parentheses in the regexp store the pattern match in memory, we get that value using $1. We then assign the value back to our variable $file and now the data we will use to open the file is internal to our script and the –T switch will consider it safe to use. It’s up to you to know that your validation/filtering is sufficient for the task. If, for example, you used this pattern in the: regexp /(.*)/ the –T switch will not complain but the data will be passed into the script just like it was entered in the form or sent via a hyperlink. That would be a silly thing to do.

If the data does not pass the validation routine a message is sent to the error subroutine and the user is alerted.

Ready for Downloading

Expand|Select|Wrap|Line Numbers
  1. download($file) or error('An unknown error has occured.');
If the file download fails a message is sent to the error subroutine and the user is alerted.

The download() Subroutine

Expand|Select|Wrap|Line Numbers
  1. sub download {
  2.    my $file = $_[0] or return(0);
  3.  
  4.    # Uncomment the next line only for debugging the script 
  5.    #open(my $DLFILE, '<', "$path_to_files/$file") or die "Can't open file '$path_to_files/$file' : $!";
  6.  
  7.    # Comment the next line if you uncomment the above line 
  8.    open(my $DLFILE, '<', "$path_to_files/$file") or return(0);
  9.  
  10.    # this prints the download headers with the file size included
  11.    # so you get a progress bar in the dialog box that displays during file downloads. 
  12.    print $q->header(-type            => 'application/x-download',
  13.                     -attachment      => $file,
  14.                     -Content_length  => -s "$path_to_files/$file",
  15.    );
  16.  
  17.    binmode $DLFILE;
  18.    print while <$DLFILE>;
  19.    undef ($DLFILE);
  20.    return(1);
  21. }
  22.  
The first line of the subroutine gets the filename or returns 0 (zero) back to the caller to indicate failure. There are two lines that open the file, one is for debugging purposes and one is for running the script when all is working properly. The next section of the code prints the headers that cause the web browser to download the file instead of trying to display it.

Expand|Select|Wrap|Line Numbers
  1.    print $q->header(-type            => 'application/x-download',
  2.                     -attachment      => $file,
  3.                     -Content_length  => -s "$path_to_files/$file",
  4.    );
  5.  
The “type” option in the header() method is the specific header that causes the download. The
“attachment” option defines the name of the file being downloaded. You could give the file any name you wanted to, it does not have to be the actual filename. That can be useful if you have a reason to hide the real name of the file or needed to give the downloaded file a name other than the real name. The “Content-length” option uses the –s file test operator to get the size of the file. This allows the file download dialog box to display the file size and a progress bar and estimate the time remaining to complete the file download.


The last four lines of the subroutine complete the download process.

Expand|Select|Wrap|Line Numbers
  1.    binmode $DLFILE;
  2.    print while <$DLFILE>;
  3.    undef ($DLFILE);
  4.    return(1);
  5.  
The binmode() function tells perl to transfer the file in “binary” mode. There is a small chance that using binary mode will corrupt the file on the receiving end. But in general there is no problem using it and in some cases it is necessary. If you experience problems when using binmode, remove or comment out the line. See the binmode functions documentation for more details. The “print” line is what actually transfers the file from the server to the client. “undef” closes the file because I used an indirect filehandle. We return 1 (one) at the end of the subroutine to indicate success.

Subroutines

The “error” subroutine is very simple. It uses a few html generating methods to print a basic html document that displays the error messages we send to it stored in $_[0]. Each of these methods are discussed in the CGI modules documentation. If you have error logging turned on the “log_error” function is also called.

Expand|Select|Wrap|Line Numbers
  1. sub error {
  2.    print $q->header(-type=>'text/html'),
  3.          $q->start_html(-title=>'Error'),
  4.          $q->h3("Error: $_[0]"),
  5.          $q->end_html;
  6.    log_error($_[0]) if $log;
  7.    exit(0);
  8. }
  9.  
Next is the “log_error” subroutine. Each error the script detects can be logged so you can see how visitors to your site are misusing the script. This is good information to keep track of. It might be overkill, but I am a great believer in tracking errors since they can help you write more secure scripts and alert you to bots or people trying to abuse the script. It appends the errors and some other information to a file. I personally like to record the name/value pairs that are sent to the script to see if the form or query string has been altered by the user. Those values will be in $params, formatted like so: “name=”value:::name=value:::name=value”. “scalar localtime()” is a convenience to you so you can easily read the date/time of the error. “time” records the date/time in epoch seconds which is a standard way of recording the date/time so computer programs and scripts can make sense of it. Its ultimately up to you to decide what, if anything, to do with this information. I suggest you check the error log once in a while. You can delete it and the script will create a new one. Or turn off error logging entirely in the User Configuration Section of the script.

Expand|Select|Wrap|Line Numbers
  1. sub log_error {
  2.    my $error = $_[0];
  3.  
  4.    # Uncomment the next line only for debugging the script
  5.    #open (my $log, ">>", $error_log) or die "Can't open error log: $!";
  6.  
  7.    # Comment the next line if you uncomment the above line
  8.    open (my $log, ">>", $error_log) or return(0);
  9.  
  10.    flock $log,2;
  11.    my $params = join(':::', map{"$_=$IN{$_}"} keys %IN) || 'no params';
  12.    print $log '"', join('","',time, 
  13.                       scalar localtime(),
  14.                       $ENV{'REMOTE_ADDR'},
  15.                       $ENV{'SERVER_NAME'},
  16.                       $ENV{'HTTP_HOST'},
  17.                       $ENV{'HTTP_REFERER'},
  18.                       $ENV{'HTTP_USER_AGENT'},
  19.                       $ENV{'SCRIPT_NAME'},
  20.                       $ENV{'REQUEST_METHOD'},
  21.                       $params,
  22.                       $error),
  23.                       "\"\n";
  24. }
  25.  
The Interface

The only thing left to decide is how to allow visitors to your site to access the download script. You could use hyperlinks or an HTML form or some combination of the two. You could have another script (or even the same script) generate the interface. The basic concept is to pass the download script the name of a file to download. An example using a hyperlink:

Expand|Select|Wrap|Line Numbers
  1. <a href=”cgi-bin/download.pl?file=frog.jpg”>Download the Frog Image</a>
I leave it up to you to discover other ways of creating an interface to the download script.

Conclusion

This is a pretty basic script. You could add more functionality to the script, for example, add a counter file that tracks the number of times each file is downloaded. You could add authentication so your users would have to login to download files. You could tie the script to a database instead of having files stored on the server.

Kevin (aka KevinADC)

This article is protected under the Creative Commons License

Resources

Perldoc Website All the perl documentation online.
CGI.pm The CGI module documentation (on perldoc).
Search CPAN Comprehensive Perl Archive Network. A gigantic repository of perl modules
and more.
CGI Security A primer on CGI security.

The Complete Script

Expand|Select|Wrap|Line Numbers
  1. #!/usr/bin/perl -T
  2.  
  3. ## Load pragmas and modules
  4. use strict;
  5. use warnings;
  6. use CGI;
  7. # Uncomment the next line only for debugging the script.
  8. #use CGI::Carp qw/fatalsToBrowser/;
  9.  
  10. # The next two lines are very important. Do not modify them
  11. # if you do not understand what they do.
  12. $CGI::POST_MAX = 1024;
  13. $CGI::DISABLE_UPLOADS = 1; 
  14.  
  15.  
  16. ####################################
  17. #### User Configuration Section ####
  18. ####################################
  19.  
  20. # The path to where the downloadable files are. 
  21. # Prefereably this should be above the web root folder.
  22. my $path_to_files = '/home/user/downloads/';
  23.  
  24. # The path to the error log file
  25. my $error_log     = '/home/user/downloads/logs/errors.txt';
  26.  
  27. # Option to log errors: 1 = yes, 0 = no
  28. my $log           = 1;
  29.  
  30. # To prevent hot-linking to your script
  31. my $url = 'http://www.yoursite.com';
  32.  
  33. ####################################
  34. ## End User Configuration Section ##
  35. ####################################
  36.  
  37. # Edit below here at your own risk
  38.  
  39. my $q = CGI->new;
  40.  
  41. ######################################
  42. ## This section checks for a number ##
  43. ## of possible errors or suspicious ##
  44. ## activity.                        ##
  45. ######################################
  46.  
  47. # check to see if data limit is exceeded
  48. if (my $error = $q->cgi_error()){
  49.    if ($error =~ /^413\b/o) {
  50.       error('Maximum data limit exceeded.');
  51.    }
  52.    else {
  53.       error('An unknown error has occured.'); 
  54.    }
  55. }
  56.  
  57. # Check to see if the content-type is acceptable.
  58. # multipart/form-data indicates someone is trying
  59. # to upload data to the script with a hacked form.
  60. # $CGI_DISABLE_UPLOADS prevents uploads. This routine
  61. # is to catch the attempt and log it. 
  62. if ($ENV{'CONTENT_TYPE'} =~ m|^multipart/form-data|io ) {
  63.    error('Invalid Content-Type : multipart/form-data.')
  64. }       
  65.  
  66. # Check if the request came from your website, if not
  67. # it indicates remote access or hot linking.
  68. if ($ENV{'HTTP_REFERER'} && $ENV{'HTTP_REFERER'} !~ m|^\Q$url|io) {
  69.    error('Access forbidden.')
  70. }
  71.  
  72. ################################
  73. ## End error checking section ##
  74. ################################
  75.  
  76. # Get the data sent to the script.
  77. my %IN = $q->Vars;
  78.  
  79. # Parse the "file" paramater sent to the script.
  80. my $file = $IN{'file'} or error('No file selected.');
  81.  
  82. # Here we untaint the filename and make sure there are no characters like '/' 
  83. # in the name that could be used to download files from any folder on the website.
  84. if ($file =~ /^(\w+[\w.-]+\.\w+)$/o) {
  85.    $file = $1;
  86. }
  87. else {
  88.    error('Invalid characters in filename.');
  89. }    
  90.  
  91. # Check if the download succeeded
  92. download($file) or error('An unknown error has occured.');
  93.  
  94.  
  95. #################
  96. ## SUBROUTINES ##
  97. #################
  98.  
  99. # download the file
  100. sub download {
  101.    my $file = $_[0] or return(0);
  102.  
  103.    # Uncomment the next line only for debugging the script 
  104.    #open(my $DLFILE, '<', "$path_to_files/$file") or die "Can't open file '$path_to_files/$file' : $!";
  105.  
  106.    # Comment the next line if you uncomment the above line 
  107.    open(my $DLFILE, '<', "$path_to_files/$file") or return(0);
  108.  
  109.    # This prints the download headers with the file size included
  110.    # so you get a progress bar in the dialog box that displays during file downlaods. 
  111.    print $q->header(-type            => 'application/x-download',
  112.                     -attachment      => $file,
  113.                     'Content-length' => -s "$path_to_files/$file",
  114.    );
  115.  
  116.    binmode $DLFILE;
  117.    print while <$DLFILE>;
  118.    undef ($DLFILE);
  119.    return(1);
  120. }
  121.  
  122. # This is a very generic error page. You should make a better one.
  123. sub error {
  124.    print $q->header(-type=>'text/html'),
  125.          $q->start_html(-title=>'Error'),
  126.          $q->h3("Error: $_[0]"),
  127.          $q->end_html;
  128.    log_error($_[0]) if $log;
  129.    exit(0);
  130. }
  131.  
  132. # Log the error to a file
  133. sub log_error {
  134.    my $error = $_[0];
  135.  
  136.    # Uncomment the next line only for debugging the script
  137.    #open (my $log, ">>", $error_log) or die "Can't open error log: $!";
  138.  
  139.    # Comment the next line if you uncomment the above line
  140.    open (my $log, ">>", $error_log) or return(0);
  141.  
  142.    flock $log,2;
  143.    my $params = join(':::', map{"$_=$IN{$_}"} keys %IN) || 'no params';
  144.    print $log '"', join('","',time, 
  145.                       scalar localtime(),
  146.                       $ENV{'REMOTE_ADDR'},
  147.                       $ENV{'SERVER_NAME'},
  148.                       $ENV{'HTTP_HOST'},
  149.                       $ENV{'HTTP_REFERER'},
  150.                       $ENV{'HTTP_USER_AGENT'},
  151.                       $ENV{'SCRIPT_NAME'},
  152.                       $ENV{'REQUEST_METHOD'},
  153.                       $params,
  154.                       $error),
  155.                       "\"\n";
  156. }
Dec 4 '08 #1
Share this Article
Share on Google+
1 Comment


KevinADC
Expert 2.5K+
P: 4,059
Comments, questions, or discussions welcome.
Dec 4 '08 #2