Is there a way to validate a pdf file?

I want to give my customers a way to upload pdf's onto their sites. Is
there any script that can be used to distinguish between a true pdf
file and a malicious file with a ".pdf" extension?
Sep 10 '08 #1
firewoodtim wrote (Wed, 10 Sep 2008 16:53:51 -0400):
> I want to give my customers a way to upload pdf's onto their sites. Is
> there any script that can be used to distinguish between a true pdf
> file and a malicious file with a ".pdf" extension?

On every upload it's always a good idea to do some server-side MIME type
guessing. In PHP you have a deprecated builtin function:

http://es2.php.net/manual/en/functio...ntent-type.php

... and a PECL extension (see the link on the page above).

If it's a Unix host, you can probably use the excellent "file" program:

$ file -bi README
text/plain; charset=utf-8

It's easy to execute it from PHP.
Fighting malicious uploads is another field of knowledge.
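
A minimal sketch of the `file` call mentioned above, written in Python purely for illustration (from PHP the same command would go through something like exec()). The function name `guess_mime_type` is made up for this example, and it assumes a Unix-like host where the `file` program is installed and supports `--mime-type`:

```python
import shutil
import subprocess
from typing import Optional


def guess_mime_type(path: str) -> Optional[str]:
    """Return the MIME type reported by `file`, or None if it cannot be determined."""
    if shutil.which("file") is None:
        return None  # `file` is not installed on this host
    result = subprocess.run(
        # "--" stops option parsing, so a path starting with "-" is not read as a flag
        ["file", "-b", "--mime-type", "--", path],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None
```

An upload handler could then reject anything for which this does not return application/pdf. That only says what the content looks like, not whether it is harmless.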
--
-- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
-- My site about web programming: http://bits.demogracia.com
-- My bite-sized humor site: http://www.demogracia.com
--
Sep 10 '08 #2
On Sep 10, 4:53 pm, firewoodtim <firewood...@cavtel.net> wrote:
> I want to give my customers a way to upload pdf's onto their sites. Is
> there any script that can be used to distinguish between a true pdf
> file and a malicious file with a ".pdf" extension?

You can check the first 5 characters of the file; they contain %PDF-
if it is a real PDF. The next 4 contain the version (e.g. 1.2).
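
A minimal sketch of that header check, in Python for illustration. The name `pdf_header_version` is hypothetical, and the pattern assumes the version is written as one digit, a dot, and one digit (as in 1.2):

```python
import re
from typing import Optional

# "%PDF-" at the very start, followed by a major.minor version such as 1.2
_HEADER = re.compile(rb"%PDF-(\d\.\d)")


def pdf_header_version(data: bytes) -> Optional[str]:
    """Return the version string if data starts with a PDF header, else None."""
    match = _HEADER.match(data)
    return match.group(1).decode("ascii") if match else None
```

Reading the first 8 bytes of the upload is enough to feed this function. It only tells you that the file claims to be a PDF; anyone can write those bytes at the start of an arbitrary file.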

Bill H
Sep 10 '08 #3
On Wed, 10 Sep 2008 14:17:15 -0700 (PDT), Bill H <bi**@ts1000.us> wrote:
> On Sep 10, 4:53 pm, firewoodtim <firewood...@cavtel.net> wrote:
>> I want to give my customers a way to upload pdf's onto their sites. Is
>> there any script that can be used to distinguish between a true pdf
>> file and a malicious file with a ".pdf" extension?
>
> You can check the first 5 characters of the file, they contain %PDF-
> if it is a real pdf. The next 4 contain the version (ie 1.2 )
>
> Bill H

This probably brands me as a newbie, but is it possible to create an
executable file with the first five "characters" = %PDF?
Sep 10 '08 #4
On Wed, 10 Sep 2008 17:22:24 -0400, firewoodtim <fi*********@cavtel.net> wrote:
> On Wed, 10 Sep 2008 14:17:15 -0700 (PDT), Bill H <bi**@ts1000.us> wrote:
>> On Sep 10, 4:53 pm, firewoodtim <firewood...@cavtel.net> wrote:
>>> I want to give my customers a way to upload pdf's onto their sites. Is
>>> there any script that can be used to distinguish between a true pdf
>>> file and a malicious file with a ".pdf" extension?
>>
>> You can check the first 5 characters of the file, they contain %PDF-
>> if it is a real pdf. The next 4 contain the version (ie 1.2 )
>>
>> Bill H
>
> This probably brands me as a newbie, but is it possible to create an
> executable file with the first five "characters" = %PDF

OOPS, I meant %PDF-
Sep 10 '08 #5
Bill H wrote:
> On Sep 10, 4:53 pm, firewoodtim <firewood...@cavtel.net> wrote:
>> I want to give my customers a way to upload pdf's onto their sites. Is
>> there any script that can be used to distinguish between a true pdf
>> file and a malicious file with a ".pdf" extension?
>
> You can check the first 5 characters of the file, they contain %PDF-
> if it is a real pdf. The next 4 contain the version (ie 1.2 )
>
> Bill H

Unless someone faked the first 5 characters...

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Sep 10 '08 #6
firewoodtim wrote:
> On Wed, 10 Sep 2008 17:22:24 -0400, firewoodtim <fi*********@cavtel.net> wrote:
>> On Wed, 10 Sep 2008 14:17:15 -0700 (PDT), Bill H <bi**@ts1000.us> wrote:
>>> On Sep 10, 4:53 pm, firewoodtim <firewood...@cavtel.net> wrote:
>>>> I want to give my customers a way to upload pdf's onto their sites. Is
>>>> there any script that can be used to distinguish between a true pdf
>>>> file and a malicious file with a ".pdf" extension?
>>> You can check the first 5 characters of the file, they contain %PDF-
>>> if it is a real pdf. The next 4 contain the version (ie 1.2 )
>>>
>>> Bill H
>> This probably brands me as a newbie, but is it possible to create an
>> executable file with the first five "characters" = %PDF
>
> OOPS, I meant %PDF-

Not in PHP or Perl, anyway. But not knowing which languages are
available on your server makes it difficult to tell.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Sep 10 '08 #7
On Sep 10, 5:22 pm, firewoodtim <firewood...@cavtel.net> wrote:
> On Wed, 10 Sep 2008 14:17:15 -0700 (PDT), Bill H <b...@ts1000.us> wrote:
>> On Sep 10, 4:53 pm, firewoodtim <firewood...@cavtel.net> wrote:
>>> I want to give my customers a way to upload pdf's onto their sites. Is
>>> there any script that can be used to distinguish between a true pdf
>>> file and a malicious file with a ".pdf" extension?
>> You can check the first 5 characters of the file, they contain %PDF-
>> if it is a real pdf. The next 4 contain the version (ie 1.2 )
>> Bill H
>
> This probably brands me as a newbie, but is it possible to create an
> executable file with the first five "characters" = %PDF

Firewood - It would be hard, since the first few bytes in an executable
are important too.

I pulled that from the PDF spec on http://www.wotsit.org/list.asp?search=pdf.
There are more "magic" things in a PDF that you can check as well.
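
As a rough illustration of such extra checks, here is a Python sketch that also looks for the `startxref` keyword and the `%%EOF` marker near the end of the file. Those two markers are not named in this thread; they come from my understanding of the PDF format, and the 1024-byte window and the name `looks_like_pdf` are assumptions for the example:

```python
def looks_like_pdf(data: bytes) -> bool:
    """Cheap structural check: PDF header at the start, startxref and %%EOF near the end."""
    if not data.startswith(b"%PDF-"):
        return False
    tail = data[-1024:]  # only inspect the end of the file
    eof = tail.rfind(b"%%EOF")
    if eof == -1:
        return False
    # startxref must appear before the final %%EOF marker
    return b"startxref" in tail[:eof]
```

This is still a plausibility test, not validation: it rejects files that merely start with %PDF-, but a deliberately crafted file can satisfy it.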

Jerry - technically you could check it in Perl if you use PDF::API2 to
import a page from the PDF; you will get an error on non-PDF files
(may not be the best way of checking).

Bill H
Sep 10 '08 #8
firewoodtim wrote:
> I want to give my customers a way to upload pdf's onto their sites. Is
> there any script that can be used to distinguish between a true pdf
> file and a malicious file with a ".pdf" extension?

I had some other issues with PDF: the fact that my site (a reporting
tool where you can upload attachments to a generated PDF file) cannot
deal with encrypted PDFs. As I use the FPDF and FPDI libraries to
generate the files and include the attachments, my upload checking
routine was fairly simple:
- Create an empty PDF document,
- attach the uploaded file.
If that does not generate any errors, the uploaded file is a valid PDF.

Best regards.
Sep 10 '08 #9
firewoodtim <fi*********@cavtel.net> wrote:
>
> This probably brands me as a newbie, but is it possible to create an
> executable file with the first five "characters" = %PDF

Not on Linux. On Windows, I suppose it is theoretically possible to create
a .COM file that starts with those bytes, but if it has an extension of
.PDF, Windows will hand it to Acrobat for processing. It won't try to
execute it as an application.
--
Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
Sep 11 '08 #10
Isn't the whole point:

Is it possible to upload a malformed PDF file so that a weakness in the PDF
handler can be exploited? The file may not be a true PDF: it starts
off with %PDF, but then contains arbitrary 'code' that will/may end up being
executed by the PDF handler.

So, I assumed what the poster wanted to know was: is there a way of
pre-parsing an entire PDF file to determine that it is wholly to spec and
truly a valid and clean PDF file?

"Tim Roberts" <ti**@probo.com> wrote in message
news:6s********************************@4ax.com...
> firewoodtim <fi*********@cavtel.net> wrote:
>>
>> This probably brands me as a newbie, but is it possible to create an
>> executable file with the first five "characters" = %PDF
>
> Not on Linux. On Windows, I suppose it is theoretically possible to create
> a .COM file that starts with those bytes, but if it has an extension of
> .PDF, Windows will hand it to Acrobat for processing. It won't try to
> execut it as an application.
> --
> Tim Roberts, ti**@probo.com
> Providenza & Boekelheide, Inc.

Sep 11 '08 #11
Hi,

Joe Butler wrote:
> So, I assumed what the poster wanted to know was: is there a way of
> pre-parsing an entire pdf file to determine that it is wholely to
> spec and truly a valid and clean pdf file.

there are of course commercial tools, used e.g. for prepress document
preparation, that would do this (one of those is licensed by Adobe and built
into Acrobat, named »Preflight«). I don't know of a single free one.

There are some PDF processing tools which you can use to run a number
of tests. But that's not remotely the same test coverage. Other tools
might be metadata extractors like JHove.

However, for what the OP wants there is no tool. You can't keep the
»evil« PDFs out, since they are (for the most part) valid, well-formed PDFs.
They just happen to exploit some security zone model bugs or trigger
other implementation glitches in common PDF readers. You cannot filter
these out easily. An option, however, might be to render the PDFs in a
server-side sandbox. Converting to Flash (à la scribd.com) or to plain
images would be possible options. Or redistill (e.g. using Ghostscript)
to strip out JavaScript and regenerate compressed streams and the like.
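
The redistilling idea could be sketched like this in Python, assuming Ghostscript is installed as `gs` and using its pdfwrite device. The function names and the timeout are made up for the example, and I would not rely on this to remove every active element; what survives may depend on the Ghostscript version:

```python
import shutil
import subprocess
from typing import List


def redistill_command(src: str, dst: str) -> List[str]:
    """Build a Ghostscript command line that rewrites src into a new PDF at dst."""
    return [
        "gs",
        "-q",                  # suppress the startup banner
        "-dNOPAUSE",           # do not wait for a key press between pages
        "-dBATCH",             # exit when the input has been processed
        "-dSAFER",             # restrict file access from within the document
        "-sDEVICE=pdfwrite",   # write the result out as a PDF again
        f"-sOutputFile={dst}",
        src,
    ]


def redistill(src: str, dst: str, timeout: float = 60.0) -> bool:
    """Run the rewrite; return True only if Ghostscript exits successfully."""
    if shutil.which("gs") is None:
        return False  # Ghostscript is not available on this host
    try:
        result = subprocess.run(
            redistill_command(src, dst),
            capture_output=True,
            timeout=timeout,
            check=False,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

Serving only the rewritten copy, and running the conversion under an unprivileged account, keeps the original upload away from visitors. A failed conversion is also a reasonable signal to reject the file.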

-hwh
Sep 11 '08 #12
Oops. Making it executable... I do not know enough about Linux to really
provide a detailed answer on this. But I know one can accomplish such a
thing, especially on an unsecured computer. I ran into a hosting company
once that proclaimed loudly that their computers were secure. They also
advertised it in all of their advertisements. However, when I downloaded
a particular file one day, my own virus scanners picked up a virus. I
did not place the file onto their servers. I noticed something funny
going on when I deleted it (connected via a FrontPage connection). The
file momentarily disappeared, and when I came back to that folder where
it existed, it reappeared again. So I sent a message to the people who
ran the servers, the normal support @ example dot com type email and
the guys who answered called me a liar and they stated that they ran
extensive antivirus... blah blah blah deny deny deny. So I asked for
management's email address, that guy replied in like fashion. I even
told them the file name, which folder it existed in, where it existed,
and fed it to them like you feed a child with a spoon. And I received
the same answer. Such goobs. I'm not going to mention the name of the
hosting company, as I do not desire to damage their reputation, even
though they had two such incidents that I witnessed. Lesson learned for
me. There are ways to make a .pdf executable, but it all depends
upon knowledge and skill, some of which I personally lack. It involves
more than I care to get into, and I'm prepared to suggest, but not
defend, that even valid, real .pdf files can act as a malicious script.

My apologies for not providing full details, as I personally do not
know the full details.

--
Jim Carlock
You Have More Than Five Senses
http://www.associatedcontent.com/art...ve_senses.html

Sep 11 '08 #13
Hans-Werner Hilse wrote:
> However, for what the OP wants there is no tool. You can't keep the
> »evil« PDFs out, since they are (in most part) valid, well-formed PDFs.
> They just happen to exploit some security zone model bugs or trigger
> other implementation glitches in common PDF readers. You cannot filter
> these out easily. An option, however, might to render the PDFs in a
> server-side sandbox. Convert to Flash (á la scribd.com) or to plain
> images would be possible options. Or redistill (e.g. using ghostscript)
> to get out javascript and reproduce compressed streams and stuff.

This is why you should run the file through your favorite anti-virus software.
ClamAV has a socket interface you can run the file through. I sell a
calendar application that supports attachments. For the hosted version,
every uploaded file gets scanned for problems.
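
A minimal sketch of talking to that socket interface from Python, based on my understanding of clamd's INSTREAM command (length-prefixed chunks, ended by a zero-length chunk). It assumes a clamd daemon listening on TCP at 127.0.0.1:3310, which many installations do not enable by default; the function names are made up for the example:

```python
import socket
import struct
from typing import Iterator


def instream_frames(data: bytes, chunk_size: int = 8192) -> Iterator[bytes]:
    """Yield the byte frames of a clamd INSTREAM request for data."""
    yield b"zINSTREAM\0"
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # each chunk is prefixed with its length as a 4-byte big-endian integer
        yield struct.pack("!I", len(chunk)) + chunk
    yield struct.pack("!I", 0)  # a zero-length chunk ends the stream


def scan_bytes(data: bytes, host: str = "127.0.0.1", port: int = 3310) -> str:
    """Send data to clamd and return its reply, e.g. 'stream: OK'."""
    with socket.create_connection((host, port), timeout=30) as conn:
        for frame in instream_frames(data):
            conn.sendall(frame)
        reply = b""
        while not reply.endswith(b"\0"):
            part = conn.recv(4096)
            if not part:
                break
            reply += part
    return reply.rstrip(b"\0").decode("utf-8", errors="replace")


def is_clean(reply: str) -> bool:
    """True only for an explicit OK; FOUND and ERROR replies count as not clean."""
    return reply.strip().endswith("OK")
```

Treating anything other than an explicit OK as a rejection means a scanner error does not silently let a file through.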

--
George Sexton
http://www.mhsoftware.com/connectdaily.htm
Sep 11 '08 #14
.oO(Bill H)
> If by executable the OP meant being able to upload a "pdf" that was in
> actuality a php (or other) script that could be run on their server,
> he could obfuscate the file name so that the uploader 1, can't find it
> and 2, it won't execute. I do this with file uploads on sites I host,
> I prepend %%%. and append .%%% to the filename when storing it so that
> the server won't see it as a script. Also I store it outside of the
> webspace.

If stored outside the document root, you don't have to obfuscate the
filename. The server can't reach the file directly, so it can't execute
it. The only risk is if uploaded files are stored within the document
root and directly reachable via a URL - this may become a huge security
hole.
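
A small Python sketch of storing an upload that way. The name `store_upload` is hypothetical, and the caller is assumed to pass a `storage_dir` that really lies outside the document root. The file gets a random server-generated name instead of the client-supplied one, which is a design choice of this example, not something the thread prescribes:

```python
import os
import secrets


def store_upload(data: bytes, storage_dir: str) -> str:
    """Write data under a random name in storage_dir and return that name."""
    os.makedirs(storage_dir, exist_ok=True)
    name = secrets.token_hex(16)  # 32 hex characters, no extension
    path = os.path.join(storage_dir, name)
    # mode "xb" fails if the name already exists instead of overwriting it
    with open(path, "xb") as handle:
        handle.write(data)
    return name
```

The returned name can be kept in a database next to the original filename, and a download script can then stream the file back with an explicit Content-Type.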

Micha
Sep 12 '08 #15
Hi,

Michael Fesser wrote:
> If stored outside the document root, you don't have to obfuscate the
> filename. The server can't reach the file directly, so he can't
> execute it.

The server can, but the client cannot. Unless he can guess an
exploitable parameter, say, something that ends up without a sanity check in
an include()/require().
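
The kind of sanity check meant here can be sketched in Python, as an analogue of what would guard a PHP include()/require(). The name `resolve_inside` is made up for the example; it refuses any requested name that resolves to a location outside an allowed base directory:

```python
import os


def resolve_inside(base_dir: str, requested: str) -> str:
    """Return the real path of requested under base_dir, or raise ValueError."""
    base = os.path.realpath(base_dir)
    # realpath collapses ".." components and follows symlinks before the comparison
    target = os.path.realpath(os.path.join(base, requested))
    if os.path.commonpath([base, target]) != base:
        raise ValueError("path escapes the allowed directory")
    return target
```

With a check like this in front of every place where a request parameter becomes a file path, guessing the upload location does not let a client pull an uploaded file into executed code.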

> The only risk is if uploaded files are stored within the
> document root and directly reachable via a URL - this may become a
> huge security hole.

True.

-hwh
Sep 12 '08 #16
Hans-Werner Hilse wrote:
> The server can, but the client cannot. Except he can guess a
> exploitable parameter, say, something that ends without sanity check in
> a include()/require().

There is no reason to do it! ;-)

I know there are a lot of websites with this and other bad security
holes, but these sites mostly come with a lot of problems.

Often it's not necessary to think in such a complicated way to break these sites.

Kind regards, Ulf
Sep 12 '08 #17
