Is there a way to validate a pdf file?

P: n/a
I want to give my customers a way to upload pdf's onto their sites. Is
there any script that can be used to distinguish between a true pdf
file and a malicious file with a ".pdf" extension?
Sep 10 '08 #1
16 Replies


P: n/a
firewoodtim wrote (Wed, 10 Sep 2008 16:53:51 -0400):
> I want to give my customers a way to upload pdf's onto their sites. Is
> there any script that can be used to distinguish between a true pdf
> file and a malicious file with a ".pdf" extension?

On every upload it's a good idea to do some server-side MIME type
guessing. PHP has a deprecated built-in function for this:

http://es2.php.net/manual/en/functio...ntent-type.php

... and a PECL extension (see the link on the page above).

If it's a Unix host, you can probably use the excellent "file" program:

$ file -bi README
text/plain; charset=utf-8

It's easy to execute it from PHP.
Fighting malicious uploads is another field of knowledge.
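
Calling the "file" program from a script can be sketched roughly as follows. This is a minimal Python sketch rather than PHP, and the function names `parse_mime_output` and `guess_mime` are made up for illustration. It assumes a Unix host where `file` is on the PATH and accepts the `--mime` option.

```python
import subprocess
from typing import Optional, Tuple


def parse_mime_output(output: str) -> Tuple[str, Optional[str]]:
    """Split a line such as 'text/plain; charset=utf-8' into its parts."""
    mime, _, params = output.strip().partition(";")
    charset = None
    for param in params.split(";"):
        key, _, value = param.strip().partition("=")
        if key == "charset" and value:
            charset = value
    return mime.strip(), charset


def guess_mime(path: str) -> Tuple[str, Optional[str]]:
    """Ask the external `file` program for the MIME type of `path`.

    Hypothetical helper: raises CalledProcessError if `file` fails.
    """
    result = subprocess.run(
        ["file", "-b", "--mime", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_mime_output(result.stdout)
```

An upload handler could then reject anything for which `guess_mime(path)[0]` is not `application/pdf`. That only says what the file looks like; it says nothing about whether the content is harmless.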
--
-- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
-- My site about web programming: http://bits.demogracia.com
-- My site of humor in little cubes: http://www.demogracia.com
--
Sep 10 '08 #2

P: n/a
On Sep 10, 4:53 pm, firewoodtim <firewood...@cavtel.net> wrote:
> I want to give my customers a way to upload pdf's onto their sites. Is
> there any script that can be used to distinguish between a true pdf
> file and a malicious file with a ".pdf" extension?

You can check the first 5 characters of the file; they contain %PDF-
if it is a real PDF. The next 4 contain the version (e.g. 1.2).
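
A minimal sketch of that header check, written in Python for illustration. The function name `pdf_header_version` is hypothetical, and the pattern assumes the version is written as digit, dot, digit (as in 1.2) directly after `%PDF-`.

```python
import re
from typing import Optional

# Assumption: the header is "%PDF-" followed by a version such as "1.2".
_PDF_HEADER = re.compile(rb"%PDF-(\d\.\d)")


def pdf_header_version(first_bytes: bytes) -> Optional[str]:
    """Return the version from a PDF header, or None if there is none."""
    match = _PDF_HEADER.match(first_bytes)
    return match.group(1).decode("ascii") if match else None
```

For example, `pdf_header_version(open(path, "rb").read(8))` gives the version string for a file that starts with the header and `None` otherwise. Any file that merely begins with these bytes passes too, so this is a first filter and not proof of a well-formed PDF.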

Bill H
Sep 10 '08 #3

P: n/a
On Wed, 10 Sep 2008 14:17:15 -0700 (PDT), Bill H <bi**@ts1000.us>
wrote:
> On Sep 10, 4:53 pm, firewoodtim <firewood...@cavtel.net> wrote:
>> I want to give my customers a way to upload pdf's onto their sites. Is
>> there any script that can be used to distinguish between a true pdf
>> file and a malicious file with a ".pdf" extension?
>
> You can check the first 5 characters of the file; they contain %PDF-
> if it is a real PDF. The next 4 contain the version (e.g. 1.2).
>
> Bill H

This probably brands me as a newbie, but is it possible to create an
executable file with the first five "characters" = %PDF?
Sep 10 '08 #4

P: n/a
On Wed, 10 Sep 2008 17:22:24 -0400, firewoodtim
<fi*********@cavtel.net> wrote:
> On Wed, 10 Sep 2008 14:17:15 -0700 (PDT), Bill H <bi**@ts1000.us>
> wrote:
>> On Sep 10, 4:53 pm, firewoodtim <firewood...@cavtel.net> wrote:
>>> I want to give my customers a way to upload pdf's onto their sites. Is
>>> there any script that can be used to distinguish between a true pdf
>>> file and a malicious file with a ".pdf" extension?
>>
>> You can check the first 5 characters of the file; they contain %PDF-
>> if it is a real PDF. The next 4 contain the version (e.g. 1.2).
>>
>> Bill H
>
> This probably brands me as a newbie, but is it possible to create an
> executable file with the first five "characters" = %PDF?

OOPS, I meant %PDF-
Sep 10 '08 #5

P: n/a
Bill H wrote:
> On Sep 10, 4:53 pm, firewoodtim <firewood...@cavtel.net> wrote:
>> I want to give my customers a way to upload pdf's onto their sites. Is
>> there any script that can be used to distinguish between a true pdf
>> file and a malicious file with a ".pdf" extension?
>
> You can check the first 5 characters of the file; they contain %PDF-
> if it is a real PDF. The next 4 contain the version (e.g. 1.2).
>
> Bill H

Unless someone faked the first 5 characters...

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Sep 10 '08 #6

P: n/a
firewoodtim wrote:
> On Wed, 10 Sep 2008 17:22:24 -0400, firewoodtim
> <fi*********@cavtel.net> wrote:
>> On Wed, 10 Sep 2008 14:17:15 -0700 (PDT), Bill H <bi**@ts1000.us>
>> wrote:
>>> On Sep 10, 4:53 pm, firewoodtim <firewood...@cavtel.net> wrote:
>>>> I want to give my customers a way to upload pdf's onto their sites. Is
>>>> there any script that can be used to distinguish between a true pdf
>>>> file and a malicious file with a ".pdf" extension?
>>>
>>> You can check the first 5 characters of the file; they contain %PDF-
>>> if it is a real PDF. The next 4 contain the version (e.g. 1.2).
>>>
>>> Bill H
>>
>> This probably brands me as a newbie, but is it possible to create an
>> executable file with the first five "characters" = %PDF?
>
> OOPS, I meant %PDF-

Not in PHP or Perl, anyway. But not knowing which languages are
available on your server makes it difficult to tell.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Sep 10 '08 #7

P: n/a
On Sep 10, 5:22 pm, firewoodtim <firewood...@cavtel.net> wrote:
> On Wed, 10 Sep 2008 14:17:15 -0700 (PDT), Bill H <b...@ts1000.us>
> wrote:
>> On Sep 10, 4:53 pm, firewoodtim <firewood...@cavtel.net> wrote:
>>> I want to give my customers a way to upload pdf's onto their sites. Is
>>> there any script that can be used to distinguish between a true pdf
>>> file and a malicious file with a ".pdf" extension?
>>
>> You can check the first 5 characters of the file; they contain %PDF-
>> if it is a real PDF. The next 4 contain the version (e.g. 1.2).
>>
>> Bill H
>
> This probably brands me as a newbie, but is it possible to create an
> executable file with the first five "characters" = %PDF?

Firewood - It would be hard, since the first few bytes of an executable
are important too.

I pulled that from the PDF spec at http://www.wotsit.org/list.asp?search=pdf.
There are more "magic" things in a PDF that you can check as well.
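
As one illustration of such extra checks, here is a hedged Python sketch. It assumes, from my reading of the PDF format, that a normal file ends with a `startxref` keyword followed by an `%%EOF` marker. The function name `looks_like_pdf` and the 1024-byte search window are choices made for this example, not something taken from the spec link above.

```python
def looks_like_pdf(data: bytes, tail_window: int = 1024) -> bool:
    """Cheap structural check: header at the start, trailer markers at the end."""
    if not data.startswith(b"%PDF-"):
        return False
    tail = data[-tail_window:]
    eof_pos = tail.rfind(b"%%EOF")
    if eof_pos == -1:
        return False
    # The cross-reference pointer should appear before the end-of-file marker.
    return b"startxref" in tail[:eof_pos]
```

This still only looks at markers. A file can pass and be malformed or hostile in between, so it narrows things down without validating anything.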

Jerry - technically you could check it in Perl: if you use PDF::API2 to
import a page from the PDF, you will get an error for non-PDF files
(this may not be the best way of checking).

Bill H
Sep 10 '08 #8

P: n/a
firewoodtim wrote:
> I want to give my customers a way to upload pdf's onto their sites. Is
> there any script that can be used to distinguish between a true pdf
> file and a malicious file with a ".pdf" extension?

I had a different issue with PDFs: my site (a reporting tool where you
can upload attachments to a generated PDF file) cannot deal with
encrypted PDFs. As I use the FPDF and FPDI libraries to generate the
files and include the attachments, my upload checking routine was
fairly simple:
- Create an empty PDF document,
- attach the uploaded file.
If that does not generate any errors, the uploaded file is accepted as a
valid PDF.

Best regards.
Sep 10 '08 #9

P: n/a
firewoodtim <fi*********@cavtel.net> wrote:
>
> This probably brands me as a newbie, but is it possible to create an
> executable file with the first five "characters" = %PDF?

Not on Linux. On Windows, I suppose it is theoretically possible to create
a .COM file that starts with those bytes, but if it has an extension of
.PDF, Windows will hand it to Acrobat for processing. It won't try to
execute it as an application.
--
Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
Sep 11 '08 #10

P: n/a
Isn't the whole point:

Is it possible to upload a malformed PDF file so that a weakness in the PDF
handler can be exploited? The file may not be a true PDF: it starts
off with %PDF, but then contains arbitrary 'code' that will or may end up
being executed by the PDF handler.

So, I assumed what the poster wanted to know was: is there a way of
pre-parsing an entire PDF file to determine that it is wholly to spec and
truly a valid and clean PDF file?

"Tim Roberts" <ti**@probo.com> wrote in message
news:6s********************************@4ax.com...
> firewoodtim <fi*********@cavtel.net> wrote:
>>
>> This probably brands me as a newbie, but is it possible to create an
>> executable file with the first five "characters" = %PDF?
>
> Not on Linux. On Windows, I suppose it is theoretically possible to
> create a .COM file that starts with those bytes, but if it has an
> extension of .PDF, Windows will hand it to Acrobat for processing.
> It won't try to execute it as an application.
> --
> Tim Roberts, ti**@probo.com
> Providenza & Boekelheide, Inc.

Sep 11 '08 #11

P: n/a
Hi,

Joe Butler wrote:
> So, I assumed what the poster wanted to know was: is there a way of
> pre-parsing an entire pdf file to determine that it is wholly to
> spec and truly a valid and clean pdf file.

There are of course commercial tools, used e.g. for prepress document
preparation, that would do this (one of them is licensed by Adobe and
built into Acrobat under the name »Preflight«). I don't know of a single
free one.

There are some PDF processing tools which you can use to run a number
of tests, but that's not remotely the same test coverage. Other tools
might be metadata extractors like JHOVE.

However, for what the OP wants there is no tool. You can't keep the
»evil« PDFs out, since they are (for the most part) valid, well-formed
PDFs. They just happen to exploit some security zone model bugs or trigger
other implementation glitches in common PDF readers. You cannot filter
these out easily. One option, however, might be to render the PDFs in a
server-side sandbox. Converting to Flash (à la scribd.com) or to plain
images would be possible options. Or redistill (e.g. using Ghostscript)
to strip out JavaScript and regenerate compressed streams and so on.
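
A rough sketch of the redistill idea, in Python. It assumes Ghostscript is installed as `gs`. The helper names `ghostscript_command` and `redistill` are hypothetical, and the file paths are assumed to be generated by the server and never taken from the client.

```python
import subprocess
from typing import List


def ghostscript_command(src: str, dst: str) -> List[str]:
    """Build a Ghostscript call that rewrites `src` as a fresh PDF at `dst`."""
    return [
        "gs",
        "-q",                  # suppress startup messages
        "-dNOPAUSE",           # do not wait for a key press between pages
        "-dBATCH",             # exit when the input has been processed
        "-dSAFER",             # restrict file access from within the document
        "-sDEVICE=pdfwrite",   # output device that produces PDF
        f"-sOutputFile={dst}",
        src,
    ]


def redistill(src: str, dst: str, timeout: float = 60.0) -> None:
    """Run the conversion; raises if Ghostscript fails or takes too long."""
    subprocess.run(ghostscript_command(src, dst), check=True, timeout=timeout)
```

A failed conversion is a reasonable cause to reject the upload. Whether the rewritten file is actually free of active content depends on the Ghostscript version and should be verified, and the converter itself parses untrusted input, so it belongs in the sandbox as well.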

-hwh
Sep 11 '08 #12

P: n/a
Oops. Making it executable... I do not know enough about Linux to really
provide a detailed answer on this. But I know one can accomplish such a
thing, especially on an unsecured computer. I ran into a hosting company
once that proclaimed loudly that their computers were secure. They also
advertised it in all of their advertisements. However, when I downloaded
a particular file one day, my own virus scanners picked up a virus. I
did not place the file onto their servers. I noticed something funny
going on when I deleted it (connected via a FrontPage connection). The
file momentarily disappeared, and when I came back to that folder where
it existed, it reappeared again. So I sent a message to the people who
ran the servers, the normal support @ example dot com type email and
the guys who answered called me a liar and they stated that they ran
extensive antivirus... blah blah blah deny deny deny. So I asked for
management's email address, that guy replied in like fashion. I even
told them the file name, which folder it existed in, where it existed,
and fed it to them like you feed a child with a spoon. And I received
the same answer. Such goobs. I'm not going to mention the name of the
hosting company, as I do not want to damage their reputation, even
though I witnessed two such incidents there. Lesson learned for
me. There are ways to make a .pdf executable, but it all depends
on knowledge and skill, some of which I personally lack. It involves
more than I care to get into, and I'm prepared to suggest, but not
defend, that even valid, real .pdf files can act as a script horror.

My apologies for not providing full details, as I personally do not
know the full details.

--
Jim Carlock
You Have More Than Five Senses
http://www.associatedcontent.com/art...ve_senses.html

Sep 11 '08 #13

P: n/a
Hans-Werner Hilse wrote:
> However, for what the OP wants there is no tool. You can't keep the
> »evil« PDFs out, since they are (for the most part) valid, well-formed
> PDFs. They just happen to exploit some security zone model bugs or trigger
> other implementation glitches in common PDF readers. You cannot filter
> these out easily. One option, however, might be to render the PDFs in a
> server-side sandbox. Converting to Flash (à la scribd.com) or to plain
> images would be possible options. Or redistill (e.g. using Ghostscript)
> to strip out JavaScript and regenerate compressed streams and so on.

This is why you should run the file through your favorite anti-virus
software. ClamAV has a socket interface you can run the file through. I
sell a calendar application that supports attachments. For the hosted
version, every uploaded file gets scanned for problems.
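
For illustration, a hedged Python sketch of talking to the ClamAV daemon over a socket. It follows my understanding of clamd's INSTREAM command: a command string, then chunks prefixed with a 4-byte big-endian length, then a zero-length chunk. Check it against the clamd documentation before relying on it. The host, port, chunk size and function names are assumptions made for this example.

```python
import socket
import struct
from typing import Iterator


def instream_frames(data: bytes, chunk_size: int = 8192) -> Iterator[bytes]:
    """Yield the byte frames for one INSTREAM request."""
    yield b"zINSTREAM\0"
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        yield struct.pack("!I", len(chunk)) + chunk
    yield struct.pack("!I", 0)  # a zero-length chunk ends the stream


def is_clean(reply: bytes) -> bool:
    """Treat only an explicit OK as clean; FOUND or ERROR count as not clean."""
    return reply.rstrip(b"\0\n").endswith(b"OK")


def scan_bytes(data: bytes, host: str = "127.0.0.1", port: int = 3310) -> bool:
    """Send `data` to a clamd instance assumed to listen on host:port."""
    with socket.create_connection((host, port), timeout=30) as sock:
        for frame in instream_frames(data):
            sock.sendall(frame)
        reply = sock.recv(4096)
    return is_clean(reply)
```

Rejecting on anything other than an explicit OK means a scanner error also blocks the upload, which seems the safer default for customer uploads.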

--
George Sexton
http://www.mhsoftware.com/connectdaily.htm
Sep 11 '08 #14

P: n/a
.oO(Bill H)
> If by executable the OP meant being able to upload a "pdf" that was in
> actuality a PHP (or other) script that could be run on their server,
> he could obfuscate the file name so that the uploader (1) can't find it
> and (2) can't execute it. I do this with file uploads on sites I host:
> I prepend %%%. and append .%%% to the filename when storing it so that
> the server won't see it as a script. Also I store it outside of the
> webspace.

If stored outside the document root, you don't have to obfuscate the
filename. The server can't reach the file directly, so it can't execute
it. The only risk is if uploaded files are stored within the document
root and directly reachable via a URL - this may become a huge security
hole.
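
Storing uploads outside the document root could look roughly like this. It is a Python sketch; `store_upload`, the `.upload` suffix and the idea of a private upload directory are assumptions for illustration. The stored name is generated on the server, so nothing chosen by the client ends up in the path.

```python
import secrets
from pathlib import Path


def store_upload(data: bytes, upload_dir: Path) -> Path:
    """Write `data` under a random name in `upload_dir` and return the path.

    `upload_dir` is assumed to lie outside the web server's document root.
    """
    upload_dir.mkdir(parents=True, exist_ok=True)
    target = upload_dir / (secrets.token_hex(16) + ".upload")
    # Mode "xb" refuses to overwrite a file that already exists.
    with open(target, "xb") as handle:
        handle.write(data)
    return target
```

The original filename can be kept in a database next to the generated one and sent back only as a download header, so it never has to touch the filesystem.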

Micha
Sep 12 '08 #15

P: n/a
Hi,

Michael Fesser wrote:
> If stored outside the document root, you don't have to obfuscate the
> filename. The server can't reach the file directly, so it can't
> execute it.

The server can, but the client cannot, unless the client can guess an
exploitable parameter, say, something that ends up without a sanity
check in an include()/require().

> The only risk is if uploaded files are stored within the
> document root and directly reachable via a URL - this may become a
> huge security hole.

True.
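
The kind of sanity check meant here can be sketched as follows. This is Python for illustration, not PHP's include()/require(), and `resolve_inside` is a hypothetical helper. The idea is to resolve the requested name against one allowed directory and refuse anything that ends up outside it.

```python
from pathlib import Path


def resolve_inside(base_dir: Path, requested: str) -> Path:
    """Return the resolved path for `requested`, which must stay under `base_dir`.

    Raises ValueError for traversal attempts such as '../x' or absolute paths.
    """
    base = base_dir.resolve()
    candidate = (base / requested).resolve()
    if base not in candidate.parents:
        raise ValueError(f"refusing path outside {base}: {requested!r}")
    return candidate
```

With a check like this in front of every file-loading parameter, a client-supplied value such as `../uploads/evil.pdf` is rejected before it reaches the loader. A fixed whitelist of allowed names would be stricter still.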

-hwh
Sep 12 '08 #16

P: n/a
Hans-Werner Hilse wrote:
> The server can, but the client cannot, unless the client can guess an
> exploitable parameter, say, something that ends up without a sanity
> check in an include()/require().

There is no reason to do that! ;-)

I know there are a lot of websites with this and other bad security
holes, but those sites mostly come with a lot of other problems too.

Often you don't need to think of anything that complicated to break
such sites.

Regards, Ulf
Sep 12 '08 #17
