# How to Upload Files using the CGI.pm Module and Perl

Note: You may skip to the end of the article if all you want is the perl code.

Introduction

Uploading files from a local computer to a remote web server has many useful purposes, the most obvious of which is the sharing of files. For example, you upload images to a server to share them with other people over the Internet. Perl comes ready equipped for uploading files via the CGI.pm module, which was a core module for many years (it was removed from the core distribution in perl 5.22, so on newer installations it may need to be installed from CPAN) and allows users to do many things without the need to understand very much of the underlying technology.

This article is not a perl tutorial per se. I am not going to go into much of the behind-the-scenes detail that makes it all work. Some sections will have to include short explanations and of course perl code, but the goal is not to teach the reader perl programming or general Internet concepts. I assume you have some experience with HTML, have maybe uploaded and installed a perl script or two in the past, and know how to do that. Consider this article more of a "How To" lesson.

The perl script I am going to introduce you to uses only modules that were core modules at the time of writing, so on most servers there is no need for you to install any modules. All you will need is any fairly recent version of perl. Of course perl has to be installed on the server where you will run the script. There is also a list of online resources you can access for more general information concerning many of the details that will be discussed in this article.

Getting Started

First, you will need a form. You see them all the time on the Internet. You fill in some information and click a button to send data to a server. That is a form. When the data is handled by a script on the server, it is often called a CGI (Common Gateway Interface) form. CGI is simply a set of rules that lets the web server hand the submitted data to a program, such as a perl script, and send the program's output back to your browser. As long as the server is set up to run CGI scripts, and most are, then all is well. You really don't need to know any of that, so don't worry if it sounds confusing.

The CGI Form

Following is a very simple CGI form: one file field, where the user chooses a file to upload, and one text field, where the user will enter their email address. I include the text field just as an example; it is not required if all you want is a file upload field. You can optionally add many other form fields, and even more "file" fields, that will all send data to the server at the same time.

    <FORM ACTION="/cgi-bin/upload.pl" METHOD="post" ENCTYPE="multipart/form-data">
    <INPUT TYPE="file" NAME="photo">
    <INPUT TYPE="text" NAME="email">
    <INPUT TYPE="submit" NAME="Submit" VALUE="Submit Form">
    </FORM>

The above form would be embedded in an HTML document with all the appropriate tags that an HTML document should have. If you are unsure what those are you should read a basic HTML (or web page) tutorial. For a file upload script the ENCTYPE="multipart/form-data" and the TYPE="file" attributes are the most important: without the multipart encoding the browser sends only the filename, not the file contents. The ACTION="/cgi-bin/upload.pl" tells the form where to send the data, in this case to your perl script named upload.pl in your cgi-bin folder, where most cgi/perl scripts should be stored on a web server. Save the form as a web page named upload.html, or any name you prefer, and upload it to your web host account. If you do not have a web host account there is not much need to continue from this point.

The CGI Script

Just about all perl scripts that run as a CGI process need to start with what is called the shebang line. The most common shebang line is:

    #!/usr/bin/perl

It simply tells the server where to find perl. The shebang line your server requires might be different. Most web hosts will have that information posted on their site somewhere. In the interest of good perl coding practices and security we are going to add a switch to the shebang line: -T. Note: it must be an uppercase T.

    #!/usr/bin/perl -T

The T stands for "taint" mode. This is really to prevent you, as the programmer of the script, from making a terrible mistake and allowing the users of your CGI form to send data to the server that can be used in an insecure way. This file upload program is really very simple and will not allow users to do any such thing, but all perl scripts that run as a CGI process should use the -T switch so I include it for that reason.

Modules

Modules are sort of like separate perl programs you can use in your perl program. Many people have written modules that have become standards that other perl programmers use all the time. We will be using these modules:

    use strict;
    use warnings;
    use CGI;
    use CGI::Carp qw/fatalsToBrowser/;
    use File::Basename;

The first two are not modules; they are pragmas, which affect the way perl itself functions. I am not going to explain them for the purpose of this article. You will have to trust me that they are important to use in nearly all of your perl programs. The "CGI" module is the module that will do all the heavy lifting for us. It will process the form data and store it on the server. The "CGI::Carp" module is really for debugging and may help you to get your script running if you have problems. If there are any fatal errors that cause the script to fail, it will print an error message to the screen. These are the same errors that will be printed in the server error log too. "File::Basename" is a module that can be used to split a filename/filepath into its separate parts for easy processing by your script. We will use it to get just the filename and the file extension into our script.

Setting Some Limits

In my opinion, all CGI scripts should set a limit on how much data can be transferred to the server. Otherwise, some dork with a T1 line will start transferring gigabytes of data to your server and either make it unavailable to other users, or use up all your allotted disk space. Here is how we can set a limit using the CGI module:

    $CGI::POST_MAX = 1024 * 5000; # adjust as needed (1024 * 5000 = 5MB)
    $CGI::DISABLE_UPLOADS = 0; # 1 disables uploads, 0 enables uploads

The first line establishes the maximum amount of data (in bytes) the script will accept in a single request, the uploaded file included. It's not perfect because it has to receive all the data before it knows there is too much. So a person can still try to send way more data than you want them to, but the script will reject it. For the most part it works well enough. The second line gives you an easy way to completely disable file uploads. You may want to do this occasionally if you are doing some type of maintenance on your website and don't want people to upload files during that time.
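
As a rough illustration of the limit described above, the following sketch compares a request's declared size with the same 1024 * 5000 ceiling. It assumes the web server reports the size in the CONTENT_LENGTH environment variable, as it does for POST requests to CGI programs. within_limit() is a hypothetical helper written for this example; it is not part of CGI.pm, which performs its own check internally.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of CGI.pm): does a request body of
# $content_length bytes fit under a ceiling of $max_bytes?
sub within_limit {
    my ($content_length, $max_bytes) = @_;
    # anything that is not a plain non-negative integer is refused
    return 0 unless defined $content_length && $content_length =~ /^\d+$/;
    return $content_length <= $max_bytes ? 1 : 0;
}

my $max_bytes = 1024 * 5000;    # 5,120,000 bytes, a little under 5 MiB

# In a CGI process the web server sets CONTENT_LENGTH for POST requests.
# The fallback of 0 only keeps this sketch runnable from the command line.
my $length = $ENV{CONTENT_LENGTH} // 0;

print within_limit($length, $max_bytes)
    ? "Request of $length bytes is within the limit of $max_bytes bytes\n"
    : "Request of $length bytes exceeds the limit of $max_bytes bytes\n";
```

Run from the command line, where CONTENT_LENGTH is normally unset, the sketch treats the request as empty and reports that it is within the limit.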

Creating a new CGI Object or Hiring a Butler

Here is what is so great about using the CGI module (and many other modules). To access all the functions of the CGI module all you need to do is add this line to your script:

    my $query = CGI->new;

This creates the $query object that we will use for the rest of the script. The object is sort of our personal butler. We tell the butler what to do using simple commands and he goes and does it for us.

Note: $query can be any valid scalar name.

Note: CGI->new is not absolutely necessary. There is another way to use most of the functions of the CGI module, the function-oriented style imported with ":standard". Read the CGI module documentation if you are interested.

Houston, We Have A Problem

I hope you know what that means; if not, don't worry about it. This next section of the code will check if the version of the CGI module is new enough to use in the manner we want to use it. Any server still using a version older than 2.47 should probably be shut down, but I include it to be complete.

    unless ($CGI::VERSION >= 2.47) {
        error('Your version of CGI.pm is too old. You must have version 2.47 or higher to use this script.')
    }

If the version is too old the script sends an error message to the "error()" subroutine alerting the user of the error.

Where to Store Uploaded Files

This line might be your biggest cause for concern:

    my $upload_dir = '/home/you/public_html/uploads';

You have to determine what '/home/you/public_html/uploads' should be. Since you probably want the files to be visible to people that visit your website, you would want to place the files in a folder below the root web folder. Most servers call the root web folder "public_html", some call it "www". The path to "public_html" is generally something like:

    /home/you/public_html/

where "you" is your unique identifier. It might be part of your website's name or something different. Most web hosts will have that information posted on their website somewhere. Add "uploads" to the end:

    /home/you/public_html/uploads

and that is the folder where file uploads will be stored. If you did not want people to see the files you would place them in a folder parallel to the root web folder:

    /home/you/uploads

or possibly above it:

    /home/uploads

Validating user Input or Keeping the Riff-Raff Out

The first rule of CGI programming is: Never trust user input. Because of this lack of trust, all CGI scripts should do what is called validating user input. In our case we don't have much user input, just an email address and the file to upload. But you still have to check both of them. Filenames on the internet should really only consist of a limited group of characters. They are:

    a thru z, A thru Z, 0 thru 9
    _ (underscore)
    . (dot)
    - (dash)

written in perlish form: 'a-zA-Z0-9_.-'

We will use this list of characters to check (and change if necessary) filenames that are going to be uploaded. We define that list like so:

    my $filename_characters = 'a-zA-Z0-9_.-';

Note: Like much of perl, this could be done a number of different ways. I am trying to keep this fairly simple so I elected to use this method for the sake of readability and simplicity.
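
A quick way to see how this list behaves is to drop it into a character class and try a few names. The sketch below is only an illustration; is_clean() is a hypothetical helper name, not something the upload script defines.

```perl
use strict;
use warnings;

my $filename_characters = 'a-zA-Z0-9_.-';

# Hypothetical helper: true if $name consists only of the allowed characters.
# Inside [...] the dot is a literal dot, and a hyphen placed last is a
# literal hyphen rather than a range operator.
sub is_clean {
    my ($name) = @_;
    return $name =~ /^[$filename_characters]+$/ ? 1 : 0;
}

for my $name ('photo.gif', 'my-photo_2.gif', 'big frog!!!.gif', '!!!') {
    print "$name: ", (is_clean($name) ? 'accepted' : 'rejected'), "\n";
}
```

The last name is rejected, which would not happen if the dot inside the brackets acted as a wild card.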

Here we tell our butler, $query, to let our "visitors" into our "home".

    my $file = $query->param("photo") or error('No file selected for upload.');
    my $email_address = $query->param("email") || 'Anonymous';

Note how the names in parentheses match the names in our form fields above:

    <INPUT TYPE="file" NAME="photo">
    <INPUT TYPE="text" NAME="email">

The case is also important. "Photo" is not the same as "photo". We put the values of the param() calls into some perl variables so we can alter them and use them later in the script. If no file was selected to upload the script sends an error message to the "error()" subroutine alerting the user of the error. If no email address is entered $email_address will be assigned a value of "Anonymous".

Some browsers send the whole path to a file instead of just the filename. The next line will parse the filename and file extension (e.g. .jpg, .gif, .txt) from the filepath if necessary. "fileparse()" is a function of the File::Basename module:

    my ($filename,undef,$ext) = fileparse($file,qr{\..*});

If the browser sent something like "c:\windows\my documents\frog pictures\big frog!!!.gif" we will end up with only "big frog!!!" and ".gif". (fileparse() follows the path rules of the server's operating system, so on a Unix-like server the backslashes are not treated as separators; see the tr/\\/\// suggestion in the comments below.) The next lines validate and change the filename if necessary to make sure it complies with our list of characters:

    # append extension to filename
    $filename .= $ext;

    # convert spaces to underscores "_"
    $filename =~ tr/ /_/;

Now "big frog!!!.gif" will be "big_frog!!!.gif".
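
Because File::Basename splits paths according to the operating system perl is running on, a Windows-style path sent to a Unix-like server needs its backslashes converted before fileparse() can find the filename. The sketch below applies the tr/// conversion suggested in the comments and then the same fileparse() call as the script; split_upload_name() is a hypothetical helper name used only for this example.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: split a browser-supplied path into name and extension.
sub split_upload_name {
    my ($path) = @_;
    $path =~ tr{\\}{/};    # treat Windows backslashes as path separators
    my ($name, undef, $ext) = fileparse($path, qr{\..*});
    return ($name, $ext);
}

my ($name, $ext) =
    split_upload_name('c:\windows\my documents\frog pictures\big frog!!!.gif');
print "name=[$name] ext=[$ext]\n";    # prints: name=[big frog!!!] ext=[.gif]
```

Note that the pattern qr{\..*} treats everything from the first dot onward as the extension, so "archive.tar.gz" is split into "archive" and ".tar.gz".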

    # remove illegal characters
    $filename =~ s/[^$filename_characters]//g;

Now $filename will equal "big_frog.gif", which is our validated filename. The next several lines will allow your script to use the filename to store the data on the server by "untainting" the filename. This is where the -T switch comes into the picture. Allowing user input to be used as a filename is considered insecure, and it is! I am not going to go into details. Suffice it to say that once the filename has been extracted through a regular expression capture ($1), perl assumes that you, the programmer, know what you are doing, and will not throw an error at you and abort the script for allowing insecure input to be used in an insecure way.

    # satisfy taint checking
    if ($filename =~ /^([$filename_characters]+)$/) {
        $filename = $1;
    }

If for some reason the filename did not meet the conditions above (and it always should) the "else" conditional will send an error message to the "error()" subroutine alerting the user of the error.

    else {
        error("The filename is not valid. Filenames can only contain these characters: $filename_characters")
    }

The next "unless" condition does a very crude validation of the email address. To reliably validate an email address I recommend using the Email::Valid module. I did not include it in this script because it is not a core module.

    unless ($email_address eq 'Anonymous' or ($email_address =~ /^[\w@.-]+$/ && length $email_address < 250)) {
        error("The email address appears invalid or contains too many characters. Limit is 250 characters.")
    }

Upload the File

If the script has got this far, it assumes all is good and will now upload the file to the destination directory and store it under the filename ($filename).

"$query->upload()" is the function that turns the file field name in the CGI form into a filehandle: <$upload_filehandle>.

    my $upload_filehandle = $query->upload("photo");

Here we open a new file in the $upload_dir using the filename.

    open (UPLOADFILE, ">$upload_dir/$filename") or error("Can't open/create \"$upload_dir/$filename\": $!");

This line tells perl to use binary mode on the filehandle:

    binmode UPLOADFILE;

The "while" loop reads in the file the user is uploading into the file we opened above.

    while ( <$upload_filehandle> ) {
        print UPLOADFILE;
    }

Now close the file:

    close UPLOADFILE;

Print a "Thank you" message to alert the user all went well and display the image and the email address. Our "butler", $query, handles all the HTML code printing chores for us using various commands.

    print $query->header(),
          $query->start_html(-title=>'Upload Successful'),
          $query->p('Thanks for uploading your photo!'),
          $query->p("Your email address: $email_address"),
          $query->p("Your photo $filename:"),
          $query->img({src=>"../uploads/$filename",alt=>''}),
          $query->end_html;
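
The line-by-line copy used in this section works, but uploaded images are binary data and a "line" in them can be arbitrarily long. A common alternative, sketched here under the assumption that the input is any readable filehandle (such as the one returned by upload()), reads fixed-size chunks with read() and uses a lexical filehandle with the three-argument form of open. save_stream() and the 8192-byte chunk size are choices made for this example only. To stay runnable without a web server, the example copies from an in-memory filehandle into a temporary directory.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: copy everything from an open input handle to $path
# in fixed-size chunks and return the number of bytes written.
sub save_stream {
    my ($in, $path) = @_;
    open(my $out, '>', $path) or die "Can't open/create \"$path\": $!";
    binmode $in;     # no newline translation on either side
    binmode $out;
    my ($buffer, $total) = ('', 0);
    while (my $bytes = read($in, $buffer, 8192)) {
        print {$out} $buffer or die "Can't write to \"$path\": $!";
        $total += $bytes;
    }
    close $out or die "Can't close \"$path\": $!";
    return $total;
}

# Stand-in for an upload: a few binary-looking bytes read from memory.
my $data = "GIF89a\x00\x01\r\n\xFF binary-ish bytes";
open(my $in, '<', \$data) or die "Can't open in-memory handle: $!";

my $dir     = tempdir(CLEANUP => 1);    # removed again when the script exits
my $written = save_stream($in, "$dir/demo.gif");
close $in;

print "wrote $written bytes, size on disk is ", (-s "$dir/demo.gif"), "\n";
```

The -s file test at the end gives the size of the stored file, which is one way to enforce a per-file size limit once the upload has been written.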

The "error()" Subroutine

Last but not least is our only private subroutine. It simply prints the error messages sent to it from the script. Once again, $query handles the HTML code printing for us. The last line in the subroutine, exit(0), tells perl to end the script after printing the error message. If we did not include the exit() function perl would continue to try to process the rest of the script.

    sub error {
        my $error = shift;
        print $query->header(),
              $query->start_html(-title=>'Error'),
              $error,
              $query->end_html;
        exit(0);
    }

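
error() prints its message straight into the page. That is fine for the fixed messages used here, but if a message ever included text supplied by the user it would be safer to escape HTML metacharacters first. A minimal sketch follows; escape_html() is a hypothetical helper written out by hand so the example needs no modules (CGI.pm also provides an escapeHTML() function).

```perl
use strict;
use warnings;

# Hypothetical helper: make a string safe to place inside HTML text.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must come first so later entities are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

print escape_html(q{Can't open "<uploads>/a&b.gif"}), "\n";
```
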
Review

As mentioned above, the first rule of CGI programming is: Never trust user input. CGI scripts should not allow users to send data that the script does not expect or handle data in insecure ways. CGI security is beyond the scope of this article but there is an online resource in the "Resources" section you can read for more details.
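
One way to act on that rule is to keep the filename checks in a single function that can be tested from the command line. The sketch below simply collects the steps from this article; clean_filename() is a hypothetical name, and returning undef on failure (instead of calling error()) is a choice made for this example.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

my $filename_characters = 'a-zA-Z0-9_.-';

# Hypothetical wrapper around the article's filename steps. Returns the
# cleaned filename, or undef if nothing usable is left.
sub clean_filename {
    my ($file) = @_;
    my ($filename, undef, $ext) = fileparse($file, qr{\..*});
    $filename .= $ext;                            # append extension to filename
    $filename =~ tr/ /_/;                         # convert spaces to underscores
    $filename =~ s/[^$filename_characters]//g;    # remove illegal characters
    # the value captured in $1 is what satisfies taint checking under -T
    return $filename =~ /^([$filename_characters]+)$/ ? $1 : undef;
}

for my $candidate ('big frog!!!.gif', '!!!', '.htaccess') {
    my $clean = clean_filename($candidate);
    print "$candidate => ", (defined $clean ? $clean : '(rejected)'), "\n";
}
```

Note that a name such as ".htaccess" passes unchanged because every character in it is on the allowed list, so a production script may want additional rules for names like that.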

Uploading a file is fairly easy using the CGI module. Really, it's very easy once you are familiar with the many methods/functions the wide-ranging CGI module has to offer. With a small change to the code I wrote for this article you could allow users to upload multiple files at the same time and include lots of other CGI form data as well. The CGI module is one of the bigger perl modules, and its documentation is extensive but can still be confusing to the novice perl programmer.

The script I posted can be used in a production environment, but keep in mind the email address field is not properly validated. As mentioned previously, use the Email::Valid module when you need to validate email addresses. You may need to install it, but that is a subject for another article.
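
To see why the check in the script is only a rough filter, the same test can be run against a few strings. looks_like_email() is a hypothetical name for this illustration, and the sample addresses are made up.

```perl
use strict;
use warnings;

# The same crude test the script uses: allowed characters plus a length cap.
sub looks_like_email {
    my ($address) = @_;
    return ($address =~ /^[\w@.-]+$/ && length($address) < 250) ? 1 : 0;
}

for my $candidate ('kevin@example.com', 'not an address', '@@@', 'no-at-sign') {
    printf "%-20s %s\n", $candidate,
        looks_like_email($candidate) ? 'passes' : 'fails';
}
```

Only the string containing spaces fails. "@@@" and "no-at-sign" both pass, which is exactly the kind of input a dedicated module such as Email::Valid is meant to catch.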

Resources

Websites:

Perldoc Website: All the perl documentation online.
Search CPAN: Comprehensive Perl Archive Network. A gigantic repository of perl modules and more.
CGI Security: A primer on CGI security.

Perl Pragmas:

Strict: The strict pragma documentation (on perldoc).
Warnings: The warnings pragma documentation (on perldoc).

Core Modules:

CGI: The CGI module documentation (on perldoc).
CGI::Carp: The CGI::Carp module documentation (on perldoc).
File::Basename: The File::Basename module documentation (on perldoc).

Other Modules:

Email::Valid: The Email::Valid module documentation (on cpan).

The Complete Script

    #!/usr/bin/perl -T

    use strict;
    use warnings;
    use CGI;
    use CGI::Carp qw/fatalsToBrowser/;
    use File::Basename;

    $CGI::POST_MAX = 1024 * 5000; #adjust as needed (1024 * 5000 = 5MB)
    $CGI::DISABLE_UPLOADS = 0; #1 disables uploads, 0 enables uploads

    my $query = CGI->new;

    unless ($CGI::VERSION >= 2.47) {
        error('Your version of CGI.pm is too old. You must have version 2.47 or higher to use this script.')
    }

    my $upload_dir = '/home/mywebsite/htdocs/upload';

    # a list of valid characters that can be in filenames
    my $filename_characters = 'a-zA-Z0-9_.-';

    my $file = $query->param("photo") or error('No file selected for upload.');
    my $email_address = $query->param("email") || 'Anonymous';

    # get the filename and the file extension
    # this could be used to filter out unwanted filetypes
    # see the File::Basename documentation for details
    my ($filename,undef,$ext) = fileparse($file,qr{\..*});

    # append extension to filename
    $filename .= $ext;

    # convert spaces to underscores "_"
    $filename =~ tr/ /_/;

    # remove illegal characters
    $filename =~ s/[^$filename_characters]//g;

    # satisfy taint checking
    if ($filename =~ /^([$filename_characters]+)$/) {
        $filename = $1;
    }
    else {
        error("The filename is not valid. Filenames can only contain these characters: $filename_characters")
    }

    # this is very crude but validating an email address is not an easy task
    # and is beyond the scope of this article. To validate an email
    # address properly use the Email::Valid module. I do not include
    # it here because it is not a core module.
    unless ($email_address =~ /^[\w@.-]+$/ && length $email_address < 250) {
        error("The email address appears invalid or contains too many characters. Limit is 250 characters.")
    }

    my $upload_filehandle = $query->upload("photo");

    open (UPLOADFILE, ">$upload_dir/$filename") or error($!);
    binmode UPLOADFILE;
    while ( <$upload_filehandle> ) {
        print UPLOADFILE;
    }
    close UPLOADFILE;

    print $query->header(),
          $query->start_html(-title=>'Upload Successful'),
          $query->p('Thanks for uploading your photo!'),
          $query->p("Your email address: $email_address"),
          $query->p("Your photo $filename:"),
          $query->img({src=>"../uploads/$filename",alt=>''}),
          $query->end_html;


    sub error {
        print $query->header(),
              $query->start_html(-title=>'Error'),
              shift,
              $query->end_html;
        exit(0);
    }

Jul 4 '07 #1

Comments and discussion are welcome. Proof readers are also needed.
Jul 4 '07 #2

> Comments and discussion are welcome. Proof readers are also needed.

Hi, I tried this script but was unable to upload file through firefox. I was able to upload files using IE. Is there any dependency on browser?
Sep 24 '07 #3

> Hi, I tried this script but was unable to upload file through firefox. I was able to upload files using IE. Is there any dependency on browser?

Works for me with FireFox (v 2.0.0.7) but I assume it will work with any version of any browser. The html code is a very simple form that should be supported even in very old browsers. What happens when you use FireFox?

Quote from article:

> The above form would be embedded in an HTML document with all the appropriate tags that an HTML document should have.

Did you embed the form inside an html document?

form code here
Sep 25 '07 #4

Kevin, I want to say that your article is nice and well written. Look forward to seeing more articles from you in the future. ---eWish
Oct 16 '07 #5

> Kevin, I want to say that your article is nice and well written. Look forward to seeing more articles from you in the future. ---eWish

I would only point out (and I could be wrong) that I believe you would have to escape the "." in the validating user input section. ie:

    my $filename_characters = 'a-zA-Z0-9_.-';

I think it should be:

    my $filename_characters = 'a-zA-Z0-9_\.-';

Since dot represents any character. ?? Otherwise, incredible article that will come in very useful for me personally soon. Thanks!!
Oct 24 '07 #6

When a metacharacter is used in a character class (hence the [] brackets) then it becomes literal and does not have to be escaped.

    if ($filename =~ /^([$filename_characters]+)$/) {

Oct 25 '07 #7

> I would only point out (and I could be wrong) that I believe you would have to escape the "." in the validating user input section. ie:
>
>     my $filename_characters = 'a-zA-Z0-9_.-';
>
> I think it should be:
>
>     my $filename_characters = 'a-zA-Z0-9_\.-';
>
> Since dot represents any character. ?? Otherwise, incredible article that will come in very useful for me personally soon. Thanks!!

The dot in a character class [] is just a dot. It's not a wild card. You can easily test that:

    $foo = 'test';
    print $foo =~ /[.]/ ? 'true' : 'false';

The above prints 'false' because there is no dot in $foo. Remove the character class brackets and try again:

    $foo = 'test';
    print $foo =~ /./ ? 'true' : 'false';

Now it prints 'true' because the dot is a wild card.
Oct 28 '07 #8

> When a metacharacter is used in a character class (hence the [] brackets) then it becomes literal and does not have to be escaped.
>
>     if ($filename =~ /^([$filename_characters]+)$/) {

Not all meta characters are treated as literal characters in a character class. This is one area where you have to learn what the rule is for each character and how it is interpreted inside of square brackets.
Oct 28 '07 #9

That is true. I should have worded that differently.
Oct 28 '07 #10

I am interested in how I can detect the size of the image, and give error accordingly. Would that be $ENV{CONTENT_LENGTH}, when using post method?
Feb 6 '08 #11

> I am interested in how I can detect the size of the image, and give error accordingly. Would that be $ENV{CONTENT_LENGTH}, when using post method?

After you write the image to a file you can check the size.
Feb 6 '08 #12

Also Kevin, as you pointed out in a separate forum, if you're uploading to a linux server then there will be issues with the back slash and forward slash interpretation, so for linux servers this needs to be included.

    my $file = $query->param("photo") or error('No file selected for upload.');
    $file =~ tr/\\/\//;

Jan 23 '09 #13

Hi, I went through the code and understood it. I am using tomcat 5.5.20 as my webserver. I don't understand the "Where to Store Uploaded Files" part. I gave a path that points to the root folder of tomcat. It's not working though.
Jan 24 '09 #14

@yeshwanth I don't know anything about the Tomcat server, sorry. All I can suggest is you make sure the path to the folder is correct (use the full path, not a relative path) and has read/write permissions.
Jan 24 '09 #15

Hi, I am getting the following error, "malformed multipart POST", while using this script. Any ideas regarding this? Regards, rahul
Feb 2 '09 #16

Well written and explained, great work.
Feb 3 '09 #17

@rahulm All I can do is quote the CGI module's documentation:

> There are occasionally problems involving parsing the uploaded file. This usually happens when the user presses "Stop" before the upload is finished. In this case, CGI.pm will return undef for the name of the uploaded file and set cgi_error() to the string "400 Bad request (malformed multipart POST)".

But I don't know why you are getting the error message and I have nothing to suggest.
Feb 3 '09 #18

Awesome article.. thank you for sharing, I got it working :) Just one question - in the form it requests their email, when they upload their file, WHERE do we find their email details? Thanks heaps..!
Jul 12 '10 #19

@Debs You should read a tutorial on Perl CGI. The email address would be available in the $_POST array in the key $_POST{'email'} or, in the case of @KevinADC's code, it would be $query->param("email"). Regards, Jeff
Jul 12 '10 #20

@Jeff, thanks for the reply :) I followed KevinADC's code and got it all up and working - so when I browse to a file, enter my email and hit submit - it all works great. Then I check on my server and the file is there. But I'm wondering where I find the email that I submitted... (for example if my client uploads a file, and enters their email - it would be helpful if I can then see that and identify who sent what... rather than just have all these uploads in there and not knowing what belongs to who) sorry am a newbie at this! lol (KevinADC has explained it very well - cos I got the upload working!) :) Cheers Deb
Jul 12 '10 #21

It looks like Kevin did not actually store it anywhere, but instead validated it and then used it in the output to show what you submitted for information and a file. You could easily extend this script to store the information in a database, if you wish, but that would be questions that should be posted in the answers area and not the insights. :) Regards, Jeff
Jul 12 '10 #22