Note: You may skip to the end of the article if all you want is the perl code.
Introduction
Uploading files from a local computer to a remote web server has many useful purposes, the most obvious of which is the sharing of files. For example, you upload images to a server to share them with other people over the Internet. Perl comes ready equipped for uploading files via the CGI.pm module, which has long been a core module and allows users to do many things without the need to understand very much of the underlying technology.
This article is not a perl tutorial per se. I am not going to go into much of the behind-the-scenes detail that makes it all work. Some sections will have to include short explanations and of course perl code, but the goal is not to teach the reader perl programming or general Internet concepts. I assume you have some experience with HTML, have perhaps uploaded and installed a perl script or two in the past, and know how to do that. Consider this article more of a "How To" lesson.
The perl script I am going to introduce you to uses only core modules that come with perl so there is no need for you to install any modules. All you will need is any fairly recent version of perl. Of course perl has to be installed on the server where you will run the script. There is also a list of online resources you can access for more general information concerning many of the details that will be discussed in this article.
Getting Started
First, you will need a form. You see them all the time on the Internet. You fill in some information and click a button to send data to a server. That is a form. The precise name is a CGI (Common Gateway Interface) form. CGI is a protocol, which simply means a set of rules, that lets a web server hand the data your browser sends to a program such as our perl script. As long as the server is set up to accept CGI form data, and most are, then all is well. You really don't need to know any of that, so don't worry if it sounds confusing.
The CGI Form
Following is a very simple CGI form: one file field, where the user chooses a file to upload, and one text field, where the user will enter their email address. I include the text field just as an example; it is not required if all you want is a file upload field. You can optionally add many other form fields, and even more "file" fields, that will send data to the server all at the same time.
<FORM ACTION="/cgi-bin/upload.pl" METHOD="post" ENCTYPE="multipart/form-data">
<INPUT TYPE="file" NAME="photo">
<INPUT TYPE="text" NAME="email">
<INPUT TYPE="submit" NAME="Submit" VALUE="Submit Form">
</FORM>
The above form would be embedded in an HTML document with all the appropriate tags that an HTML document should have. If you are unsure what those are you should read a basic HTML (or web page) tutorial. For a file upload script the ENCTYPE="multipart/form-data" and the TYPE="file" are the most important. The ACTION="/cgi-bin/upload.pl" tells the form where to send the data, in this case to your perl script named upload.pl in your cgi-bin folder, which is where most cgi/perl scripts should be stored on a web server. Save the form as a web page named upload.html, or any name you prefer, and upload it to your web host account. If you do not have a web host account there is not much need to continue from this point.
The CGI Script
Just about all perl scripts that run as a CGI process need to start with what is called the shebang line. The most common shebang line is:
#!/usr/bin/perl
It simply tells the server where to find perl. The shebang line your server requires might be different. Most web hosts will have that information posted on their site somewhere. In the interest of good perl coding practices and security we are going to add a switch to the shebang line: -T.
#!/usr/bin/perl -T
Note: it must be an uppercase T.
The T stands for "taint" mode. This is really to prevent you, as the programmer of the script, from making a terrible mistake and allowing the users of your CGI form to send data to the server that can be used in an insecure way. This file upload program is really very simple and will not allow users to do any such thing, but all perl scripts that run as a CGI process should use the -T switch so I include it for that reason.
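If you want to confirm that the switch is really in effect on your server, perl reports the taint state in the special variable ${^TAINT}. The following is only a small sketch; the subroutine name taint_mode_status is made up for this illustration and is not part of the upload script.

```perl
use strict;
use warnings;

# ${^TAINT} is 1 when perl runs with -T, -1 with -t (taint warnings only),
# and 0 when taint mode is off.
# taint_mode_status is a hypothetical helper name.
sub taint_mode_status {
    return 'enforced'      if ${^TAINT} > 0;
    return 'warnings only' if ${^TAINT} < 0;
    return 'off';
}

print "Taint mode: ", taint_mode_status(), "\n";
```

Started with the -T switch this should report "enforced"; without the switch it should report "off".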
Modules
Modules are sort of like separate perl programs you can use in your perl program. Many people have written modules that have become standards that other perl programmers use all the time. We will be using these modules:
use strict;
use warnings;
use CGI;
use CGI::Carp qw/fatalsToBrowser/;
use File::Basename;
The first two are not ordinary modules; they are pragmas. They affect the way perl itself functions. I am not going to explain them for the purpose of this article. You will have to trust me that they are important to use in nearly all of your perl programs. The "CGI" module is the module that will do all the heavy lifting for us. It will process the form data and store it on the server. The "CGI::Carp" module is really for debugging and may help you to get your script running if you have problems. If there are any fatal errors that cause the script to fail, it will print an error message to the screen. These are the same errors that will be printed in the server error log too. "File::Basename" is a module that can be used to split a filename/filepath into its separate parts for easy processing by your script. We will use it to get just the filename and the file extension into our script.
Setting Some Limits
In my opinion, all CGI scripts should set a limit on how much data can be transferred to the server. Otherwise, some dork with a T1 line will start transferring gigabytes of data to your server and either make it unavailable to other users, or use up all your allotted disk space. Here is how we can set a limit using the CGI module:
$CGI::POST_MAX = 1024 * 5000; # adjust as needed (1024 * 5000 = 5,120,000 bytes, about 5MB)
$CGI::DISABLE_UPLOADS = 0; # 1 disables uploads, 0 enables uploads
The first line establishes the maximum amount of data (in bytes) the script will accept in a single POST, which covers the uploaded file together with the other form fields. It's not perfect because the limit is only enforced once the request reaches the script, so a person can still try to send way more data than you want them to, but the script will reject it. For the most part it works well enough. The second line allows you an easy way to completely disable file uploads. You may want to do this occasionally if you are doing some type of maintenance to your website and don't want people to upload files during that time.
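One consequence is worth knowing. According to the CGI module documentation, when a POST is larger than $CGI::POST_MAX the parameter list comes back empty and the cgi_error() method returns a message beginning with "413". In a script like this one that would otherwise surface as "No file selected for upload.", which is misleading. The sketch below shows one way to tell the two cases apart. The helper name upload_too_large is hypothetical, and $query is assumed to be the CGI object the script creates in the next section (any object with a cgi_error method will do).

```perl
use strict;
use warnings;

# Hypothetical helper: returns the error text if the request was rejected
# for being too large, or undef otherwise.
# $query is assumed to be a CGI object (anything with a cgi_error method).
sub upload_too_large {
    my ($query) = @_;
    my $err = $query->cgi_error;
    return undef unless defined $err && length $err;
    return $err =~ /^413\b/ ? $err : undef;
}

# Intended use, before reading any form fields:
#   if (upload_too_large($query)) {
#       error('The upload is too large. The limit is about 5MB.');
#   }
```

Checking for the size error first keeps the "no file selected" message for the case where the user really did forget to choose a file.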
Creating a new CGI Object or Hiring a Butler
Here is what is so great about using the CGI module (and many other modules). To access all the functions of the CGI module all you need to do is add this line to your script:
my $query = CGI->new;
This creates the $query object that we will use for the rest of the script. The object is sort of our personal butler. We tell the butler what to do using simple commands and he goes and does it for us.
Note: $query can be any valid scalar name.
Note: CGI->new is not absolutely necessary. There is another way to use most of the functions of the CGI module, the function-oriented interface, which is usually imported with the ":standard" tag. Read the CGI module documentation if you are interested.
Houston, We Have A Problem
I hope you know what that means; if not, don't worry about it. This next section of the code will check if the version of the CGI module is new enough to use in the manner we want to use it. Any server still using a version older than 2.47 should probably be shut down, but I include it to be complete.
unless ($CGI::VERSION >= 2.47) {
    error('Your version of CGI.pm is too old. You must have version 2.47 or higher to use this script.')
}
If the version is too old the script sends an error message to the "error()" subroutine alerting the user of the error.
Where to Store Uploaded Files
This line might be your biggest cause for concern:
my $upload_dir = '/home/you/public_html/uploads';
You have to determine what '/home/you/public_html/uploads' should be. Since you probably want the files to be visible to people that visit your website, you would want to place the files in a folder below the root web folder. Most servers call the root web folder "public_html", some call it "www". The path to "public_html" is generally something like:
/home/you/public_html/
where "you" is your unique identifier. It might be part of your website's name or something different. Most web hosts will have that information posted on their website somewhere. Add "uploads" to the end:
/home/you/public_html/uploads
and that is the folder where file uploads will be stored. If you did not want people to see the files you would place them in a folder parallel to the root web folder:
/home/you/uploads
or possibly above it:
/home/uploads
Validating User Input or Keeping the Riff-Raff Out
The first rule of CGI programming is:
Never trust user input.
Because of this lack of trust, all CGI scripts should do what is called validating user input. In our case we don't have much user input, just an email address and the file to upload. But you still have to check both of them. Filenames on the internet should really only consist of a limited group of characters:
a thru z, A thru Z, 0 thru 9, _ (underscore), . (dot), - (dash)
written in perlish form: 'a-zA-Z0-9_.-'
We will use this list of characters to check (and change if necessary) filenames that are going to be uploaded. We define that list like so:
my $filename_characters = 'a-zA-Z0-9_.-';
Note: Like much of perl, this could be done a number of different ways. I am trying to keep this fairly simple so I elected to use this method for readability and simplicity's sake.
Reading in the Form Fields
Here we tell our butler, $query, to let our "visitors" into our "home".
my $file = $query->param("photo") or error('No file selected for upload.') ;
my $email_address = $query->param("email") || 'Anonymous';
Note how the names in parentheses match the names in our form fields above:
<INPUT TYPE="file" NAME="photo">
<INPUT TYPE="text" NAME="email">
The case is also important. "Photo" is not the same as "photo". We put the values of the param() calls into some perl variables so we can alter them and use them later in the script.
If no file was selected to upload the script sends an error message to the "error()" subroutine alerting the user of the error. If no email address is entered $email_address will be assigned a value of "Anonymous".
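Some browsers send the whole path to a file instead of just the filename. The next line will parse the filename and file extension (e.g. .jpg, .gif, .txt) from the filepath if necessary. "fileparse()" is a function of the File::Basename module:
my ($filename,undef,$ext) = fileparse($file,qr{\..*});
If the browser sent something like "c:\windows\my documents\frog pictures\big frog!!!.gif" and the server uses the same path rules as the browser's computer, we will end up with only "big frog!!!" and ".gif". File::Basename splits paths by the rules of the operating system the script runs on, so on a Unix server the backslashes of a Windows path are not treated as separators; the folder names then stay in the filename, and only the backslashes, the colon and the spaces are dealt with by the clean-up steps further down.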
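File::Basename splits a path by the rules of the operating system the script runs on, so a Windows-style path arriving at a Unix server is not split at its backslashes. One way around that, sketched here under the assumption that treating every backslash as a separator is acceptable for your uploads, is to convert backslashes to forward slashes before calling fileparse(). The subroutine name client_name_and_ext is hypothetical.

```perl
use strict;
use warnings;
use File::Basename;

# Hypothetical helper: split a client-supplied path into name and extension
# regardless of which path separator the client's browser used.
sub client_name_and_ext {
    my ($path) = @_;
    (my $normalized = $path) =~ tr{\\}{/};    # treat "\" like "/"
    my ($name, undef, $ext) = fileparse($normalized, qr{\..*});
    return ($name, $ext);
}

my ($name, $ext) = client_name_and_ext('c:\windows\my documents\frog pictures\big frog!!!.gif');
print "$name | $ext\n";    # prints "big frog!!! | .gif"
```

Note that the pattern qr{\..*} takes everything from the first dot onward as the extension, so "pic.tar.gz" yields "pic" and ".tar.gz".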
The next lines validate and change the filename if necessary to make sure it complies with our list of characters:
# append extension to filename
$filename .= $ext;

# convert spaces to underscores "_"
$filename =~ tr/ /_/;
Now "big frog!!!.gif" will be "big_frog!!!.gif".
# remove illegal characters
$filename =~ s/[^$filename_characters]//g;
Now $filename will equal "big_frog.gif", which is our validated filename.
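The two clean-up steps can be tried out on their own, away from the web server. The following sketch wraps them in a subroutine; the name clean_filename is hypothetical and only used for this illustration.

```perl
use strict;
use warnings;

my $filename_characters = 'a-zA-Z0-9_.-';

# Hypothetical helper wrapping the two clean-up steps in one subroutine.
sub clean_filename {
    my ($name) = @_;
    $name =~ tr/ /_/;                         # spaces become underscores
    $name =~ s/[^$filename_characters]//g;    # drop everything not on the list
    return $name;
}

print clean_filename('big frog!!!.gif'), "\n";    # prints "big_frog.gif"
```

A name made up entirely of characters that are not on the list, such as "!!!", comes back as an empty string, so the result still has to be checked before it is used.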
The next several lines will allow your script to use the filename to store the data on the server by "untainting" the filename. This is where the -T switch comes into the picture. Allowing user input to be used as a filename is considered insecure, and it is! I am not going to go into details. Suffice it to say that once the filename has been extracted through a regular expression capture, perl assumes that you, the programmer, know what you are doing, and will not throw an error at you and abort the script for using that input as a filename.
# satisfy taint checking
if ($filename =~ /^([$filename_characters]+)$/) {
    $filename = $1;
}
If for some reason the filename did not meet the conditions above (it always should, unless nothing is left of the name once the illegal characters are removed) the "else" conditional will send an error message to the "error()" subroutine alerting the user of the error.
else{
    error("The filename is not valid. Filenames can only contain these characters: $filename_characters")
}
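The same check can be wrapped in a small subroutine that returns the untainted name, or undef when the name is not acceptable. The sketch below also refuses names that begin with a dot (for example ".htaccess", which many web servers treat as a configuration file). That extra rule is an assumption added here and is not part of the script in this article. The name untaint_filename is hypothetical.

```perl
use strict;
use warnings;

my $filename_characters = 'a-zA-Z0-9_.-';

# Hypothetical helper: returns an untainted copy of the name, or undef if the
# name is empty, contains characters outside the list, or starts with a dot.
sub untaint_filename {
    my ($name) = @_;
    return undef unless defined $name;
    return undef if $name =~ /^\./;    # assumption: refuse dot-files such as ".htaccess"
    if ($name =~ /^([$filename_characters]+)$/) {
        return $1;                     # text captured by the regex is no longer tainted
    }
    return undef;
}

# Intended use:
#   my $safe = untaint_filename($filename);
#   error('The filename is not valid.') unless defined $safe;
```

Returning undef instead of calling error() directly keeps the subroutine easy to test from the command line.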
The next "unless" condition does a very crude validation of the email address. To reliably validate an email address I recommend using the Email::Valid module. I did not include it in this script because it is not a core module.
unless ($email_address eq 'Anonymous' or ($email_address =~ /^[\w@.-]+$/ && length $email_address <= 250)) {
    error("The email address appears invalid or contains too many characters. Limit is 250 characters.")
}
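If you cannot install Email::Valid, the crude check can be tightened a little with core perl only. The sketch below is a different and slightly stricter pattern than the one used in this article: it asks for exactly one "@" with permitted characters on both sides and at least one dot in the domain part. It is still not real address validation, and the name looks_like_email is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: a crude plausibility check, not real validation.
sub looks_like_email {
    my ($address) = @_;
    return 0 unless defined $address && length($address) <= 250;
    # one "@", permitted characters on both sides, a dot in the domain part
    return $address =~ /\A[\w.-]+\@[\w-]+(?:\.[\w-]+)+\z/ ? 1 : 0;
}

# Intended use (the "Anonymous" placeholder has to be allowed separately):
#   error('The email address appears invalid.')
#       unless $email_address eq 'Anonymous' or looks_like_email($email_address);
```

Because the placeholder value contains no "@", it does not pass this check and must be handled before the check is applied.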
Upload the File
If the script has got this far, it assumes all is good and will now upload the file to the destination directory and store it under the filename ($filename).
"$query->upload()" is the function that turns the file field name in the CGI form into a filehandle, which we store in $upload_filehandle.
my $upload_filehandle = $query->upload("photo");
Here we open a new file in the $upload_dir using the filename.
open (UPLOADFILE, ">$upload_dir/$filename") or error("Can't open/create \"$upload_dir/$filename\": $!");
This line tells perl to use binary mode on the filehandle:
binmode UPLOADFILE;
The "while" loop copies the file the user is uploading into the file we opened above.
while ( <$upload_filehandle> ) {
    print UPLOADFILE;
}
Now close the file:
close UPLOADFILE;
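A note on the copy loop: reading with <$upload_filehandle> splits the data wherever a newline byte happens to occur, which works but is arbitrary for binary files such as images. Reading fixed-size blocks with read() is a common alternative. The sketch below also uses a lexical filehandle, the three-argument form of open, and checks the result of every step. It assumes the handle returned by $query->upload() behaves like an ordinary filehandle; the name save_upload and the block size of 8192 bytes are choices made for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: copy everything from an open input handle to a new file.
# Returns the number of bytes written; dies with a message on failure.
sub save_upload {
    my ($in, $path) = @_;
    open(my $out, '>', $path) or die "Can't open/create \"$path\": $!";
    binmode $in;
    binmode $out;
    my ($total, $buffer) = (0, '');
    while (1) {
        my $got = read($in, $buffer, 8192);    # read up to 8192 bytes
        die "Can't read upload: $!" unless defined $got;
        last if $got == 0;                     # end of file
        print {$out} $buffer or die "Can't write to \"$path\": $!";
        $total += $got;
    }
    close($out) or die "Can't close \"$path\": $!";
    return $total;
}

# Intended use in the script:
#   save_upload($upload_filehandle, "$upload_dir/$filename");
```

In the script itself the die calls would be swapped for error() so the user sees the message in the browser.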
Print a "Thank you" message to alert the user all went well and display the image and the email address.
Our "butler", $query, handles all the HTML code printing chores for us using various commands.
print $query->header(),
      $query->start_html(-title=>'Upload Successful'),
      $query->p('Thanks for uploading your photo!'),
      $query->p("Your email address: $email_address"),
      $query->p("Your photo $filename:"),
      $query->img({src=>"../uploads/$filename",alt=>''}),
      $query->end_html;
The "error()" Subroutine
Last but not least is our only private subroutine. It simply prints the error messages sent to it from the script. Once again, $query handles the HTML code printing for us. The last line in the subroutine, exit(0), tells perl to end the script after printing the error message. If we did not include the exit() function perl would continue to try and process the rest of the script.
sub error {
    my $error = shift;
    print $query->header(),
          $query->start_html(-title=>'Error'),
          $error,
          $query->end_html;
    exit(0);
}
Review
As mentioned above, the first rule of CGI programming is: Never trust user input. CGI scripts should not allow users to send data that the script does not expect or handle data in insecure ways. CGI security is beyond the scope of this article but there is an online resource in the "Resources" section you can read for more details.
Uploading a file is fairly easy using the CGI module. Really, it's very easy once you are familiar with the many methods/functions the wide-ranging CGI module has to offer. With a small change to the code I wrote for this article you could allow users to upload multiple files at the same time and include lots of other CGI form data as well. The CGI module is one of the bigger perl modules there is and the documentation is extensive but can still be confusing to the novice perl programmer.
The script I posted can be used in a production environment, but keep in mind the email address field is not properly validated. As mentioned previously, use the Email::Valid module when you need to validate email addresses. You may need to install it, but that is a subject for another article.
Kevin (aka KevinADC)
Resources
Websites:
Perldoc Website All the perl documentation online.
Search CPAN Comprehensive Perl Archive Network. A gigantic repository of perl modules and more.
CGI Security A primer on CGI security.
Perl Pragmas:
Strict The strict pragma documentation (on perldoc).
Warnings The Warnings pragma documentation (on perldoc).
Core Modules:
CGI The CGI module documentation (on perldoc).
CGI::Carp The CGI::Carp module documentation (on perldoc).
File::Basename The File::Basename module documentation (on perldoc).
Other Modules:
Email::Valid The Email::Valid module documentation (on cpan).
This article is protected under the
Creative Commons License.
The Complete Script
#!/usr/bin/perl -T

use strict;
use warnings;
use CGI;
use CGI::Carp qw/fatalsToBrowser/;
use File::Basename;

$CGI::POST_MAX = 1024 * 5000; # adjust as needed (1024 * 5000 = 5,120,000 bytes, about 5MB)
$CGI::DISABLE_UPLOADS = 0; # 1 disables uploads, 0 enables uploads

my $query = CGI->new;

unless ($CGI::VERSION >= 2.47) {
    error('Your version of CGI.pm is too old. You must have version 2.47 or higher to use this script.')
}

# change this to the full path of your own uploads folder
my $upload_dir = '/home/you/public_html/uploads';

# a list of valid characters that can be in filenames
my $filename_characters = 'a-zA-Z0-9_.-';

my $file = $query->param("photo") or error('No file selected for upload.') ;
my $email_address = $query->param("email") || 'Anonymous';

# get the filename and the file extension
# this could be used to filter out unwanted filetypes
# see the File::Basename documentation for details
my ($filename,undef,$ext) = fileparse($file,qr{\..*});

# append extension to filename
$filename .= $ext;

# convert spaces to underscores "_"
$filename =~ tr/ /_/;

# remove illegal characters
$filename =~ s/[^$filename_characters]//g;

# satisfy taint checking
if ($filename =~ /^([$filename_characters]+)$/) {
    $filename = $1;
}
else{
    error("The filename is not valid. Filenames can only contain these characters: $filename_characters")
}

# this is very crude but validating an email address is not an easy task
# and is beyond the scope of this article. To validate an email
# address properly use the Email::Valid module. I do not include
# it here because it is not a core module.
unless ($email_address =~ /^[\w@.-]+$/ && length $email_address <= 250) {
    error("The email address appears invalid or contains too many characters. Limit is 250 characters.")
}

my $upload_filehandle = $query->upload("photo");

open (UPLOADFILE, ">$upload_dir/$filename") or error("Can't open/create \"$upload_dir/$filename\": $!");
binmode UPLOADFILE;
while ( <$upload_filehandle> ) {
    print UPLOADFILE;
}
close UPLOADFILE;

print $query->header(),
      $query->start_html(-title=>'Upload Successful'),
      $query->p('Thanks for uploading your photo!'),
      $query->p("Your email address: $email_address"),
      $query->p("Your photo $filename:"),
      $query->img({src=>"../uploads/$filename",alt=>''}),
      $query->end_html;

sub error {
    print $query->header(),
          $query->start_html(-title=>'Error'),
          shift,
          $query->end_html;
    exit(0);
}