Introduction
Uploading files from a local computer to a remote web server has many useful purposes, the most obvious of which is the sharing of files. For example, you upload images to a server to share them with other people over the Internet. Perl comes ready equipped for uploading files via the CGI.pm module, which has long been a core module and allows users to do many things without the need to understand very much of the underlying technology.
This article is not a perl tutorial per se. I am not going to go into much of the behind-the-scenes detail that makes it all work. Some sections will have to include short explanations and of course perl code, but the goal is not to teach the reader perl programming or general Internet concepts. I assume you have some experience with HTML and maybe have uploaded and installed a perl script or two in the past and know how to do that. Consider this article more of a "How To" lesson.
The perl script I am going to introduce you to uses only modules that have traditionally come with perl, so on most servers there is no need for you to install any modules. All you will need is any fairly recent version of perl, installed on the server where you will run the script. One caveat: CGI.pm was removed from the core distribution in perl 5.22, so on newer versions it may have to be installed from CPAN. There is also a list of online resources you can access for more general information concerning many of the details that will be discussed in this article.
Getting Started
First, you will need a form. You see them all the time on the Internet. You fill in some information and click a button to send data to a server. That is a form. The precise name is a CGI (Common Gateway Interface) form. CGI is a protocol, which simply means a set of rules that allows your computer to communicate with a remote server. As long as the server is set up to accept CGI form data, and most are, then all is well. You really don't need to know any of that, so don't worry if it sounds confusing.
The CGI Form
Following is a very simple CGI form: one file field, where the user chooses a file to upload, and one text field, where the user will enter their email address. I include the text field just as an example; it is not required if all you want is a file upload field. You can optionally add many other form fields, and even more "file" fields, that will send data to the server all at the same time. Note that METHOD="post" and ENCTYPE="multipart/form-data" are both required for the browser to send the file contents.
    <FORM ACTION="/cgi-bin/upload.pl" METHOD="post" ENCTYPE="multipart/form-data">
    <INPUT TYPE="file" NAME="photo">
    <INPUT TYPE="text" NAME="email">
    <INPUT TYPE="submit" NAME="Submit" VALUE="Submit Form">
    </FORM>
The CGI Script
Just about all perl scripts that run as a CGI process need to start with what is called the shebang line. The most common shebang line is:
    #!/usr/bin/perl
This script adds the -T switch, which turns on perl's taint checking:
    #!/usr/bin/perl -T
Modules
Modules are sort of like separate perl programs you can use in your perl program. Many people have written modules that have become standards that other perl programmers use all the time. We will be using these modules:
    use strict;
    use warnings;
    use CGI;
    use CGI::Carp qw/fatalsToBrowser/;
    use File::Basename;
Setting Some Limits
In my opinion, all CGI scripts should set a limit on how much data can be transferred to the server. Otherwise, some dork with a T1 line will start transferring gigabytes of data to your server and either make it unavailable to other users, or use up all your allotted disk space. Here is how we can set a limit using the CGI module:
    $CGI::POST_MAX = 1024 * 5000; # adjust as needed (1024 * 5000 = 5MB)
    $CGI::DISABLE_UPLOADS = 0; # 1 disables uploads, 0 enables uploads
Creating a new CGI Object or Hiring a Butler
Here is what is so great about using the CGI module (and many other modules). To access all the functions of the CGI module all you need to do is add this line to your script:
    my $query = CGI->new;
Note: $query can be any valid scalar name.
Note: CGI->new is not absolutely necessary. There is another way to use most of the functions of the CGI module, the function-oriented style, which is enabled by importing the "standard" set of functions (use CGI qw/:standard/). Read the CGI module documentation if you are interested.
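When a request is larger than $CGI::POST_MAX, CGI.pm discards it and records the problem, which can be read back with the cgi_error() method. The following is only a minimal sketch of how that could be reported to the user. The helper name oversize_error_page and the page wording are my own inventions for illustration; cgi_error(), header(), start_html(), p() and end_html() are the CGI.pm methods it relies on.

```perl
use strict;
use warnings;

# Hypothetical helper: $q is the object created with CGI->new.
# Returns an error page as a string if CGI.pm flagged the request
# (for example because it was larger than $CGI::POST_MAX),
# or undef if the request was accepted.
sub oversize_error_page {
    my ($q) = @_;
    my $status = $q->cgi_error;
    return unless $status;
    return $q->header(-status => $status)
         . $q->start_html(-title => 'Upload Rejected')
         . $q->p("The upload was rejected: $status")
         . $q->end_html;
}
```

In the script it could be called right after the object is created, for example: if (my $page = oversize_error_page($query)) { print $page; exit 0; }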
Houston, We Have A Problem
I hope you know what that means. If not, don't worry about it. This next section of the code will check if the version of the CGI module is new enough to use in the manner we want to use it. Any server still using a version older than 2.47 should probably be shut down, but I include it to be complete.
    unless ($CGI::VERSION >= 2.47) {
        error('Your version of CGI.pm is too old. You must have version 2.47 or higher to use this script.');
    }
Where to Store Uploaded Files
This line might be your biggest cause for concern:
    my $upload_dir = '/home/you/public_html/uploads';
On many web hosts the root web folder is a path such as:
    /home/you/public_html/
where "you" is your unique identifier. It might be part of your website's name or something different. Most web hosts will have that information posted on their website somewhere. Add "uploads" to the end:
    /home/you/public_html/uploads
and that is the folder where file uploads will be stored. If you did not want people to see the files you would place them in a folder parallel to the root web folder:
    /home/you/uploads
or possibly above it:
    /home/uploads
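Because a wrong path is the most likely thing to go wrong here, it can help to test the folder before trying to write to it. This is only a sketch: upload_dir_problem is a hypothetical helper name, and the path is the example path used in this article. Keep in mind that under a web server "writable" means writable by the user the server runs as, which is often not your own login.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an empty string if $dir looks usable,
# otherwise a short description of what is wrong with it.
sub upload_dir_problem {
    my ($dir) = @_;
    return "'$dir' does not exist"                  unless -e $dir;
    return "'$dir' is not a directory"              unless -d _;
    return "'$dir' is not writable by this process" unless -w _;
    return '';
}

my $upload_dir = '/home/you/public_html/uploads';   # example path from the article
my $problem    = upload_dir_problem($upload_dir);
print $problem ? "Problem: $problem\n" : "Upload directory looks fine\n";
```

In the upload script the message could be passed to error() instead of being printed.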
Validating User Input or Keeping the Riff-Raff Out
The first rule of CGI programming is: Never trust user input. Because of this lack of trust, all CGI scripts should do what is called validating user input. In our case we don't have much user input, just an email address and the file to upload. But you still have to check both of them. Filenames on the Internet should really only consist of a limited group of characters:
a thru z, A thru Z, 0 thru 9 _ (underscore) . (dot) - (dash)
written in perlish form: 'a-zA-Z0-9_.-'
We will use this list of characters to check (and change if necessary) filenames that are going to be uploaded. We define that list like so:
    my $filename_characters = 'a-zA-Z0-9_.-';
Reading in the Form Fields
Here we tell our butler, $query, to let our "visitors" into our "home".
    my $file = $query->param("photo") or error('No file selected for upload.');
    my $email_address = $query->param("email") || 'Anonymous';
The names "photo" and "email" are the NAME attributes of the fields in the form:
    <INPUT TYPE="file" NAME="photo">
    <INPUT TYPE="text" NAME="email">
If no file was selected to upload, the script sends an error message to the "error()" subroutine alerting the user of the error. If no email address is entered, $email_address will be assigned a value of "Anonymous".
Some browsers send the whole path to a file instead of just the filename. The next line will parse the filename and file extension (e.g. .jpg, .gif, .txt) from the file path if necessary. "fileparse()" is a function of the File::Basename module:
    my ($filename,undef,$ext) = fileparse($file,qr{\..*});
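To see what fileparse() hands back, here is a small sketch that runs it over a few made-up paths. The helper split_upload_name and the sample paths are assumptions for illustration only. fileparse_set_fstype() is the File::Basename function that selects which path syntax is expected; by default the syntax of the server's own operating system is used, so a Windows-style path sent to a Unix server is not split at the backslashes.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse fileparse_set_fstype);

# Hypothetical helper: split a client-supplied path into name and
# extension the same way the article's script does. $fstype tells
# File::Basename which path syntax to expect ('Unix', 'MSWin32', ...).
sub split_upload_name {
    my ($path, $fstype) = @_;
    my $previous = fileparse_set_fstype($fstype);   # returns the old setting
    my ($name, undef, $ext) = fileparse($path, qr{\..*});
    fileparse_set_fstype($previous);                # put the old setting back
    return ($name, $ext);
}

for my $case (
    [ '/home/me/pics/photo.jpg',    'Unix'    ],
    [ 'C:\Users\me\pics\photo.jpg', 'MSWin32' ],
    [ 'C:\Users\me\pics\photo.jpg', 'Unix'    ],
    [ 'archive.tar.gz',             'Unix'    ],
) {
    my ($name, $ext) = split_upload_name(@$case);
    print "$case->[0] ($case->[1] rules): name='$name' ext='$ext'\n";
}
```

With Unix rules the Windows path comes back with its folders still attached to the name. In this article's script that is harmless, because the colon and backslashes are not in the list of allowed characters and are removed in a later step.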
The next lines validate and change the filename if necessary to make sure it complies with our list of characters:
    # append extension to filename
    $filename .= $ext;
    # convert spaces to underscores "_"
    $filename =~ tr/ /_/;
    # remove illegal characters
    $filename =~ s/[^$filename_characters]//g;
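Put together, the clean-up steps behave like the following sketch. The wrapper name clean_filename and the sample filenames are hypothetical; the statements inside it are the ones shown above.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

my $filename_characters = 'a-zA-Z0-9_.-';

# Hypothetical wrapper around the article's clean-up steps.
sub clean_filename {
    my ($path) = @_;
    my ($filename, undef, $ext) = fileparse($path, qr{\..*});
    $filename .= $ext;                           # append extension to filename
    $filename =~ tr/ /_/;                        # convert spaces to underscores
    $filename =~ s/[^$filename_characters]//g;   # remove illegal characters
    return $filename;
}

print clean_filename('/tmp/My Holiday Photo (1).JPG'), "\n";   # My_Holiday_Photo_1.JPG
print clean_filename('price list #2 (final).txt'), "\n";       # price_list_2_final.txt
```

A name made up entirely of illegal characters comes out as an empty string, which is why the result still has to be checked before it is used.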
The next several lines will allow your script to use the filename to store the data on the server by "untainting" the filename. This is where the -T switch comes into the picture. Allowing user input to be used as a filename is considered insecure, and it is! I am not going to go into details. Suffice it to say that once the filename has passed through the pattern match below, perl assumes that you, the programmer, know what you are doing, and will not throw an error at you and abort the script for allowing insecure input to be used in an insecure way.
    # satisfy taint checking
    if ($filename =~ /^([$filename_characters]+)$/) {
        $filename = $1;
    }
    else {
        error("The filename is not valid. Filenames can only contain these characters: $filename_characters");
    }
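The extension returned by fileparse() can also be used to filter out unwanted file types. The sketch below is one possible way to do that and is not part of the script in this article: the list of allowed extensions and the helper name extension_allowed are assumptions you would adjust to your own needs. Because the pattern qr{\..*} captures everything from the first dot, a name such as shell.php.jpg has the extension ".php.jpg" and is rejected.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical whitelist; change it to the file types you want to accept.
my %allowed_ext = map { $_ => 1 } qw(.jpg .jpeg .gif .png);

# True if the whole extension (everything from the first dot) is allowed.
sub extension_allowed {
    my ($path) = @_;
    my (undef, undef, $ext) = fileparse($path, qr{\..*});
    return $allowed_ext{ lc $ext } ? 1 : 0;
}

for my $name (qw(photo.JPG notes.txt shell.php.jpg)) {
    printf "%-14s %s\n", $name, extension_allowed($name) ? 'accepted' : 'rejected';
}
```

In the upload script a rejected file would be reported through error(), for example: error('That file type is not allowed.') unless extension_allowed($file);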
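Next comes the email address. This check is very crude, but validating an email address is not an easy task and is beyond the scope of this article. To validate an email address properly, use the Email::Valid module. I do not include it here because it is not a core module.
    unless ($email_address eq 'Anonymous' or ($email_address =~ /^[\w@.-]+$/ && length $email_address < 250)) {
        error("The email address appears invalid or contains too many characters. Limit is 250 characters.");
    }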
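To show just how little the pattern above checks, here is a short sketch that runs it against a few made-up strings. The helper name looks_like_email and the sample values are hypothetical. The pattern only restricts which characters may appear and how long the value is; it does not check that the value has the shape of an email address.

```perl
use strict;
use warnings;

# The article's character and length check, wrapped in a hypothetical helper.
sub looks_like_email {
    my ($address) = @_;
    return ($address =~ /^[\w@.-]+$/ && length $address < 250) ? 1 : 0;
}

for my $candidate ('kevin@example.com', 'not-an-address', '@@..@@', 'a b@example.com') {
    printf "%-20s %s\n", $candidate, looks_like_email($candidate) ? 'passes' : 'fails';
}
```

Only the last value fails, because of the space. The check keeps unexpected characters out of the script, which is its real purpose here, but it should not be mistaken for address validation.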
If the script has got this far, it assumes all is good and will now upload the file to the destination directory and store it under the filename ($filename).
"$query->upload()" is the function that turns the file field name in the CGI form into a filehandle, which we store in $upload_filehandle and read from with <$upload_filehandle>.
    # get a filehandle for the uploaded data
    my $upload_filehandle = $query->upload("photo");
    # open (or create) the destination file for writing
    open (UPLOADFILE, ">$upload_dir/$filename") or error("Can't open/create \"$upload_dir/$filename\": $!");
    # write binary data as-is, without any newline translation
    binmode UPLOADFILE;
    # copy the uploaded data into the destination file
    while ( <$upload_filehandle> ) {
        print UPLOADFILE;
    }
    close UPLOADFILE;
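The loop above reads the upload one "line" at a time, which works but is a little odd for binary data such as photos. As a variation, and only as a sketch, the copy could be written with a lexical filehandle, the three-argument form of open, and read() in fixed-size blocks. The helper name save_upload, the 8192-byte block size, and the in-memory test data are my assumptions. The sketch reports problems with die; in the article's script you might call error() instead.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: copy everything from $in_fh to the file at $path
# in fixed-size blocks. Returns the number of bytes written.
sub save_upload {
    my ($in_fh, $path) = @_;
    open(my $out_fh, '>', $path) or die "Can't open/create \"$path\": $!";
    binmode $in_fh;
    binmode $out_fh;
    my $total = 0;
    while (my $bytes = read($in_fh, my $buffer, 8192)) {
        print {$out_fh} $buffer or die "Write to \"$path\" failed: $!";
        $total += $bytes;
    }
    close($out_fh) or die "Can't close \"$path\": $!";
    return $total;
}

# Stand-in for the upload: an in-memory filehandle over some binary data.
my $data = "\x00\x01binary\r\ndata\xff" x 1000;
open(my $in, '<', \$data) or die "Can't open in-memory file: $!";

my $dir     = tempdir(CLEANUP => 1);
my $written = save_upload($in, "$dir/copy.bin");
print "wrote $written bytes\n";
```

In the upload script the call would look like save_upload($upload_filehandle, "$upload_dir/$filename").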
Our "butler", $query, handles all the HTML code printing chores for us using various commands.
    print $query->header(),
          $query->start_html(-title=>'Upload Successful'),
          $query->p('Thanks for uploading your photo!'),
          $query->p("Your email address: $email_address"),
          $query->p("Your photo $filename:"),
          $query->img({src=>"../uploads/$filename",alt=>''}),
          $query->end_html;
The "error()" Subroutine
Last but not least is our only private subroutine. It simply prints the error messages sent to it from the script. Once again, $query handles the HTML code printing for us. The last line in the subroutine, exit(0), tells perl to end the script after printing the error message. If we did not include the exit() function, perl would continue trying to process the rest of the script.
    sub error {
        my $error = shift;
        print $query->header(),
              $query->start_html(-title=>'Error'),
              $error,
              $query->end_html;
        exit(0);
    }
As mentioned above, the first rule of CGI programming is: Never trust user input. CGI scripts should not allow users to send data that the script does not expect or handle data in insecure ways. CGI security is beyond the scope of this article but there is an online resource in the "Resources" section you can read for more details.
Uploading a file is fairly easy using the CGI module. Really, it's very easy once you are familiar with the many methods/functions the wide-ranging CGI module has to offer. With a small change to the code I wrote for this article you could allow users to upload multiple files at the same time and include lots of other CGI form data as well. The CGI module is one of the biggest perl modules there is, and the documentation is extensive but can still be confusing to the novice perl programmer.
The script I posted can be used in a production environment, but keep in mind the email address field is not properly validated. As mentioned previously, use the Email::Valid module when you need to validate email addresses. You may need to install it, but that is a subject for another article.
Kevin (aka KevinADC)
Resources
Websites:
Perldoc Website All the perl documentation online.
Search CPAN Comprehensive Perl Archive Network. A gigantic repository of perl modules and more.
CGI Security A primer on CGI security.
Perl Pragmas:
Strict The strict pragma documentation (on perldoc).
Warnings The Warnings pragma documentation (on perldoc).
Core Modules:
CGI The CGI module documentation (on perldoc).
CGI::Carp The CGI::Carp module documentation (on perldoc).
File::Basename The File::Basename module documentation (on perldoc).
Other Modules:
Email::Valid The Email::Valid module documentation (on cpan).
This article is protected under the Creative Commons License.
The Complete Script
    #!/usr/bin/perl -T
    use strict;
    use warnings;
    use CGI;
    use CGI::Carp qw/fatalsToBrowser/;
    use File::Basename;
    $CGI::POST_MAX = 1024 * 5000; #adjust as needed (1024 * 5000 = 5MB)
    $CGI::DISABLE_UPLOADS = 0; #1 disables uploads, 0 enables uploads
    my $query = CGI->new;
    unless ($CGI::VERSION >= 2.47) {
        error('Your version of CGI.pm is too old. You must have version 2.47 or higher to use this script.');
    }
    my $upload_dir = '/home/mywebsite/htdocs/uploads';
    # a list of valid characters that can be in filenames
    my $filename_characters = 'a-zA-Z0-9_.-';
    my $file = $query->param("photo") or error('No file selected for upload.');
    my $email_address = $query->param("email") || 'Anonymous';
    # get the filename and the file extension
    # this could be used to filter out unwanted filetypes
    # see the File::Basename documentation for details
    my ($filename,undef,$ext) = fileparse($file,qr{\..*});
    # append extension to filename
    $filename .= $ext;
    # convert spaces to underscores "_"
    $filename =~ tr/ /_/;
    # remove illegal characters
    $filename =~ s/[^$filename_characters]//g;
    # satisfy taint checking
    if ($filename =~ /^([$filename_characters]+)$/) {
        $filename = $1;
    }
    else {
        error("The filename is not valid. Filenames can only contain these characters: $filename_characters");
    }
    # this is very crude but validating an email address is not an easy task
    # and is beyond the scope of this article. To validate an email
    # address properly use the Email::Valid module. I do not include
    # it here because it is not a core module.
    unless ($email_address =~ /^[\w@.-]+$/ && length $email_address < 250) {
        error("The email address appears invalid or contains too many characters. Limit is 250 characters.");
    }
    my $upload_filehandle = $query->upload("photo");
    open (UPLOADFILE, ">$upload_dir/$filename") or error($!);
    binmode UPLOADFILE;
    while ( <$upload_filehandle> ) {
        print UPLOADFILE;
    }
    close UPLOADFILE;
    print $query->header(),
          $query->start_html(-title=>'Upload Successful'),
          $query->p('Thanks for uploading your photo!'),
          $query->p("Your email address: $email_address"),
          $query->p("Your photo $filename:"),
          $query->img({src=>"../uploads/$filename",alt=>''}),
          $query->end_html;
    sub error {
        print $query->header(),
              $query->start_html(-title=>'Error'),
              shift,
              $query->end_html;
        exit(0);
    }