Introduction
Many websites have a form or a link you can use to download a file. You click a form button or click on a link and after a moment or two a file download dialog box pops-up in your web browser and prompts you for some instructions, such as “open” or “save“. I’m going to show you how to do that using a perl script.
What You Need
Any recent version of perl (5.06 or newer should be good) and a server to run the script on. A server that allows you to store files above the web root is preferable but not necessary. A little bit of prior HTML knowledge would be helpful but is not necessary. Typically you would upload the script into your cgi-bin folder and set the file permissions to 755. The folder where you upload the script to and the permissions may be different for your server.
The Perl Code
Just about all perl scripts that run as a CGI process need to start with what is called the shebang line. The most common shebang line is:
Expand|Select|Wrap|Line Numbers
- #!/usr/bin/perl
Expand|Select|Wrap|Line Numbers
- #!/usr/bin/perl -T
Modules
Modules are sort of like separate perl programs you can use in your perl program. Many people have written modules that have become standards that other perl programmers use all the time. We will be using these modules:
Expand|Select|Wrap|Line Numbers
- use strict;
- use warnings;
- use CGI;
- # Uncomment the next line only for debugging the script.
- #use CGI::Carp qw/fatalsToBrowser/;
The next two lines in the program establish some important parameters:
Expand|Select|Wrap|Line Numbers
- $CGI::POST_MAX = 1024;
- $CGI::DISABLE_UPLOADS = 1;
Setting Paths and Options
Expand|Select|Wrap|Line Numbers
- ####################################
- #### User Configuration Section ####
- ####################################
- # The path to where the downloadable files are.
- # Preferably this should be above the web root folder.
- my $path_to_files = '/home/user/downloads/';
- # The path to the error log file
- my $error_log = '/home/user/downloads/logs/errors.txt';
- # Option to log errors: 1 = yes, 0 = no
- my $log = 1;
- # To prevent hot-linking to your script
- my $url = 'http://www.yoursite.com';
- ########################################
- #### End User Configuration Section ####
- ########################################
$error_log is the path to the errors.txt file that logs errors generated by the script.
$log turns the error log on or off.
$url should be the name of your website including the “http://” part.
Create the CGI object
Expand|Select|Wrap|Line Numbers
- my $q = CGI->new;
In reality, the CGI module has many “commands” you can give to the “butler”. We will use but a few of them. Learning to use the CGI module is almost like learning a small programming language. But the beauty is you only need to know what the commands do, not how they do it. Just like a real butler you have to trust that he knows what he is doing and will get the job done efficiently and effectively without looking over his shoulder. I recommend you take the time to read the CGI modules documentation, even if you don’t understand much of it, you should at least be familiar with the basic form processing methods. I leave that up to you.
Security Checkpoint
Never underestimate the need for security when running scripts as a CGI. We are going to use three “checkpoints” to detect any suspicious activity. The first is going to check the amount of data sent to the script. We give the cgi_error() command to our trusty butler “$q” and he returns a response, $error. “413” indicates the limit we set for $CGI::POST_MAX has been exceeded, so we are going to check for that response. Note: I use command and method interchangeably throughout the article to mean the same thing.
Expand|Select|Wrap|Line Numbers
- if (my $error = $q->cgi_error()){
- if ($error =~ /^413\b/o) {
- error('Maximum data limit exceeded.');
- }
- else {
- error('An unknown error has occured.');
- }
- }
Expand|Select|Wrap|Line Numbers
- if ($ENV{'CONTENT_TYPE'} =~ m|^multipart/form-data|io ) {
- error('Invalid Content-Type : multipart/form-data.')
- }
Expand|Select|Wrap|Line Numbers
- if ($ENV{'HTTP_REFERER'} && $ENV{'HTTP_REFERER'} !~ m|^\Q$url|io) {
- error('Access forbidden.')
- }
I am going to use the Vars method to get all the parameters sent to the script into a hash. Once again, we call on “$q” to do the actual work.
Expand|Select|Wrap|Line Numbers
- my %IN = $q->Vars;
Expand|Select|Wrap|Line Numbers
- my $file = $IN{'file'} or error('No file selected.');
You can’t say it enough, all data sent to a CGI script has to be validated. If we allowed just any thing to be sent to the script someone could send something like this: /foo/bar and depending on the path you append that to, the script will obediently go find the foo directory and download the bar file. There are of course much worse things a person could try, but this is not an article about how to hack into a website using the front door. To prevent the user from getting away with such a dangerous stunt we need to validate the data sent to the script.
Expand|Select|Wrap|Line Numbers
- if ($file =~ /^(\w+[\w.-]+\.\w+)$/) {
- $file = $1;
- }
- else {
- error('Invalid characters in filename.');
- }
The above code is also “untainting” the data. Since the data will be used to open a file on the server we must untaint it to satisfy the –T switch that we are not doing anything insecure. The only way to untaint data is to use a regexp. The parentheses in the regexp store the pattern match in memory, we get that value using $1. We then assign the value back to our variable $file and now the data we will use to open the file is internal to our script and the –T switch will consider it safe to use. It’s up to you to know that your validation/filtering is sufficient for the task. If, for example, you used this pattern in the: regexp /(.*)/ the –T switch will not complain but the data will be passed into the script just like it was entered in the form or sent via a hyperlink. That would be a silly thing to do.
If the data does not pass the validation routine a message is sent to the error subroutine and the user is alerted.
Ready for Downloading
Expand|Select|Wrap|Line Numbers
- download($file) or error('An unknown error has occured.');
The download() Subroutine
Expand|Select|Wrap|Line Numbers
- sub download {
- my $file = $_[0] or return(0);
- # Uncomment the next line only for debugging the script
- #open(my $DLFILE, '<', "$path_to_files/$file") or die "Can't open file '$path_to_files/$file' : $!";
- # Comment the next line if you uncomment the above line
- open(my $DLFILE, '<', "$path_to_files/$file") or return(0);
- # this prints the download headers with the file size included
- # so you get a progress bar in the dialog box that displays during file downloads.
- print $q->header(-type => 'application/x-download',
- -attachment => $file,
- -Content_length => -s "$path_to_files/$file",
- );
- binmode $DLFILE;
- print while <$DLFILE>;
- undef ($DLFILE);
- return(1);
- }
Expand|Select|Wrap|Line Numbers
- print $q->header(-type => 'application/x-download',
- -attachment => $file,
- -Content_length => -s "$path_to_files/$file",
- );
“attachment” option defines the name of the file being downloaded. You could give the file any name you wanted to, it does not have to be the actual filename. That can be useful if you have a reason to hide the real name of the file or needed to give the downloaded file a name other than the real name. The “Content-length” option uses the –s file test operator to get the size of the file. This allows the file download dialog box to display the file size and a progress bar and estimate the time remaining to complete the file download.
The last four lines of the subroutine complete the download process.
Expand|Select|Wrap|Line Numbers
- binmode $DLFILE;
- print while <$DLFILE>;
- undef ($DLFILE);
- return(1);
Subroutines
The “error” subroutine is very simple. It uses a few html generating methods to print a basic html document that displays the error messages we send to it stored in $_[0]. Each of these methods are discussed in the CGI modules documentation. If you have error logging turned on the “log_error” function is also called.
Expand|Select|Wrap|Line Numbers
- sub error {
- print $q->header(-type=>'text/html'),
- $q->start_html(-title=>'Error'),
- $q->h3("Error: $_[0]"),
- $q->end_html;
- log_error($_[0]) if $log;
- exit(0);
- }
Expand|Select|Wrap|Line Numbers
- sub log_error {
- my $error = $_[0];
- # Uncomment the next line only for debugging the script
- #open (my $log, ">>", $error_log) or die "Can't open error log: $!";
- # Comment the next line if you uncomment the above line
- open (my $log, ">>", $error_log) or return(0);
- flock $log,2;
- my $params = join(':::', map{"$_=$IN{$_}"} keys %IN) || 'no params';
- print $log '"', join('","',time,
- scalar localtime(),
- $ENV{'REMOTE_ADDR'},
- $ENV{'SERVER_NAME'},
- $ENV{'HTTP_HOST'},
- $ENV{'HTTP_REFERER'},
- $ENV{'HTTP_USER_AGENT'},
- $ENV{'SCRIPT_NAME'},
- $ENV{'REQUEST_METHOD'},
- $params,
- $error),
- "\"\n";
- }
The only thing left to decide is how to allow visitors to your site to access the download script. You could use hyperlinks or an HTML form or some combination of the two. You could have another script (or even the same script) generate the interface. The basic concept is to pass the download script the name of a file to download. An example using a hyperlink:
Expand|Select|Wrap|Line Numbers
- <a href=”cgi-bin/download.pl?file=frog.jpg”>Download the Frog Image</a>
Conclusion
This is a pretty basic script. You could add more functionality to the script, for example, add a counter file that tracks the number of times each file is downloaded. You could add authentication so your users would have to login to download files. You could tie the script to a database instead of having files stored on the server.
Kevin (aka KevinADC)
This article is protected under the Creative Commons License
Resources
Perldoc Website All the perl documentation online.
CGI.pm The CGI module documentation (on perldoc).
Search CPAN Comprehensive Perl Archive Network. A gigantic repository of perl modules
and more.
CGI Security A primer on CGI security.
The Complete Script
Expand|Select|Wrap|Line Numbers
- #!/usr/bin/perl -T
- ## Load pragmas and modules
- use strict;
- use warnings;
- use CGI;
- # Uncomment the next line only for debugging the script.
- #use CGI::Carp qw/fatalsToBrowser/;
- # The next two lines are very important. Do not modify them
- # if you do not understand what they do.
- $CGI::POST_MAX = 1024;
- $CGI::DISABLE_UPLOADS = 1;
- ####################################
- #### User Configuration Section ####
- ####################################
- # The path to where the downloadable files are.
- # Prefereably this should be above the web root folder.
- my $path_to_files = '/home/user/downloads/';
- # The path to the error log file
- my $error_log = '/home/user/downloads/logs/errors.txt';
- # Option to log errors: 1 = yes, 0 = no
- my $log = 1;
- # To prevent hot-linking to your script
- my $url = 'http://www.yoursite.com';
- ####################################
- ## End User Configuration Section ##
- ####################################
- # Edit below here at your own risk
- my $q = CGI->new;
- ######################################
- ## This section checks for a number ##
- ## of possible errors or suspicious ##
- ## activity. ##
- ######################################
- # check to see if data limit is exceeded
- if (my $error = $q->cgi_error()){
- if ($error =~ /^413\b/o) {
- error('Maximum data limit exceeded.');
- }
- else {
- error('An unknown error has occured.');
- }
- }
- # Check to see if the content-type is acceptable.
- # multipart/form-data indicates someone is trying
- # to upload data to the script with a hacked form.
- # $CGI_DISABLE_UPLOADS prevents uploads. This routine
- # is to catch the attempt and log it.
- if ($ENV{'CONTENT_TYPE'} =~ m|^multipart/form-data|io ) {
- error('Invalid Content-Type : multipart/form-data.')
- }
- # Check if the request came from your website, if not
- # it indicates remote access or hot linking.
- if ($ENV{'HTTP_REFERER'} && $ENV{'HTTP_REFERER'} !~ m|^\Q$url|io) {
- error('Access forbidden.')
- }
- ################################
- ## End error checking section ##
- ################################
- # Get the data sent to the script.
- my %IN = $q->Vars;
- # Parse the "file" paramater sent to the script.
- my $file = $IN{'file'} or error('No file selected.');
- # Here we untaint the filename and make sure there are no characters like '/'
- # in the name that could be used to download files from any folder on the website.
- if ($file =~ /^(\w+[\w.-]+\.\w+)$/o) {
- $file = $1;
- }
- else {
- error('Invalid characters in filename.');
- }
- # Check if the download succeeded
- download($file) or error('An unknown error has occured.');
- #################
- ## SUBROUTINES ##
- #################
- # download the file
- sub download {
- my $file = $_[0] or return(0);
- # Uncomment the next line only for debugging the script
- #open(my $DLFILE, '<', "$path_to_files/$file") or die "Can't open file '$path_to_files/$file' : $!";
- # Comment the next line if you uncomment the above line
- open(my $DLFILE, '<', "$path_to_files/$file") or return(0);
- # This prints the download headers with the file size included
- # so you get a progress bar in the dialog box that displays during file downlaods.
- print $q->header(-type => 'application/x-download',
- -attachment => $file,
- 'Content-length' => -s "$path_to_files/$file",
- );
- binmode $DLFILE;
- print while <$DLFILE>;
- undef ($DLFILE);
- return(1);
- }
- # This is a very generic error page. You should make a better one.
- sub error {
- print $q->header(-type=>'text/html'),
- $q->start_html(-title=>'Error'),
- $q->h3("Error: $_[0]"),
- $q->end_html;
- log_error($_[0]) if $log;
- exit(0);
- }
- # Log the error to a file
- sub log_error {
- my $error = $_[0];
- # Uncomment the next line only for debugging the script
- #open (my $log, ">>", $error_log) or die "Can't open error log: $!";
- # Comment the next line if you uncomment the above line
- open (my $log, ">>", $error_log) or return(0);
- flock $log,2;
- my $params = join(':::', map{"$_=$IN{$_}"} keys %IN) || 'no params';
- print $log '"', join('","',time,
- scalar localtime(),
- $ENV{'REMOTE_ADDR'},
- $ENV{'SERVER_NAME'},
- $ENV{'HTTP_HOST'},
- $ENV{'HTTP_REFERER'},
- $ENV{'HTTP_USER_AGENT'},
- $ENV{'SCRIPT_NAME'},
- $ENV{'REQUEST_METHOD'},
- $params,
- $error),
- "\"\n";
- }