Is there a way to validate a pdf file?

I want to give my customers a way to upload PDFs to their sites. Is
there any script that can be used to distinguish between a true PDF
file and a malicious file with a ".pdf" extension?
Sep 10 '08 #1
*** firewoodtim wrote (Wed, 10 Sep 2008 16:53:51 -0400):
> I want to give my customers a way to upload PDFs to their sites. Is
> there any script that can be used to distinguish between a true PDF
> file and a malicious file with a ".pdf" extension?

On every upload, it's always a good idea to do some server-side MIME type
guessing. PHP has a deprecated built-in function for this:

http://es2.php.net/manual/en/functio...ntent-type.php

... and a PECL extension (see the link on the page above).

If it's a Unix host, you can probably use the excellent "file" program:

$ file -bi README
text/plain; charset=utf-8

It's easy to execute it from PHP.
Fighting malicious uploads is another field of knowledge.
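
As a rough illustration of calling "file" from a script, here is a minimal Python sketch (the same idea carries over to PHP's process functions). The function name guess_mime_type is made up for this example, and the long option --mime-type is used instead of -i so that only the type is printed. The result is a guess based on file contents, not proof that the upload is safe.

```python
import os
import shutil
import subprocess
from typing import Optional


def guess_mime_type(path: str) -> Optional[str]:
    """Ask the Unix `file` program for the MIME type of `path`.

    Returns None if the path is not a regular file, if `file` is not
    installed, or if the call fails.
    """
    if not os.path.isfile(path):
        return None
    file_bin = shutil.which("file")
    if file_bin is None:
        return None
    try:
        # -b: leave the filename out of the output
        # --mime-type: print only the MIME type, without the charset
        # Arguments are passed as a list, so no shell ever sees the path.
        result = subprocess.run(
            [file_bin, "-b", "--mime-type", "--", path],
            capture_output=True,
            text=True,
            check=True,
            timeout=10,
        )
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError):
        return None
    return result.stdout.strip() or None
```

An upload handler could then accept the file only when guess_mime_type(path) returns "application/pdf", and treat None as "could not check" rather than as a pass.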
--
-- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
-- My site about web programming: http://bits.demogracia.com
-- My site of humor in small cubes: http://www.demogracia.com
--
Sep 10 '08 #2
On Sep 10, 4:53 pm, firewoodtim <firewood...@cavtel.net> wrote:
> I want to give my customers a way to upload PDFs to their sites. Is
> there any script that can be used to distinguish between a true PDF
> file and a malicious file with a ".pdf" extension?

You can check the first 5 characters of the file; they contain %PDF-
if it is a real PDF. The next 3 contain the version (e.g. 1.2).
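
A minimal Python sketch of that header check, for illustration only. The function names are invented for this example, and the pattern assumes a single-digit major.minor version such as 1.2. Passing this check only shows that the file starts like a PDF; it says nothing about what follows the header.

```python
import re
from typing import Optional

# "%PDF-" followed by a version such as 1.2 (assumed form: digit, dot, digit)
_PDF_HEADER = re.compile(rb"%PDF-(\d\.\d)")


def pdf_header_version(data: bytes) -> Optional[str]:
    """Return the version string if `data` starts with a PDF header, else None."""
    match = _PDF_HEADER.match(data)
    return match.group(1).decode("ascii") if match else None


def read_pdf_version(path: str) -> Optional[str]:
    """Read only the first 8 bytes of the file and check them."""
    with open(path, "rb") as fh:
        return pdf_header_version(fh.read(8))
```

For example, pdf_header_version(b"%PDF-1.2\n") returns "1.2", while the same call on the first bytes of a Windows executable (b"MZ...") returns None.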

Bill H
Sep 10 '08 #3
On Wed, 10 Sep 2008 14:17:15 -0700 (PDT), Bill H <bi**@ts1000.us> wrote:
> On Sep 10, 4:53 pm, firewoodtim <firewood...@cavtel.net> wrote:
>> I want to give my customers a way to upload PDFs to their sites. Is
>> there any script that can be used to distinguish between a true PDF
>> file and a malicious file with a ".pdf" extension?
>
> You can check the first 5 characters of the file; they contain %PDF-
> if it is a real PDF. The next 3 contain the version (e.g. 1.2).
>
> Bill H

This probably brands me as a newbie, but is it possible to create an
executable file with the first five "characters" = %PDF?
Sep 10 '08 #4
On Wed, 10 Sep 2008 17:22:24 -0400, firewoodtim <fi*********@cavtel.net> wrote:
> On Wed, 10 Sep 2008 14:17:15 -0700 (PDT), Bill H <bi**@ts1000.us> wrote:
>> On Sep 10, 4:53 pm, firewoodtim <firewood...@cavtel.net> wrote:
>>> I want to give my customers a way to upload PDFs to their sites. Is
>>> there any script that can be used to distinguish between a true PDF
>>> file and a malicious file with a ".pdf" extension?
>>
>> You can check the first 5 characters of the file; they contain %PDF-
>> if it is a real PDF. The next 3 contain the version (e.g. 1.2).
>>
>> Bill H
>
> This probably brands me as a newbie, but is it possible to create an
> executable file with the first five "characters" = %PDF?

OOPS, I meant %PDF-
Sep 10 '08 #5
Bill H wrote:
> On Sep 10, 4:53 pm, firewoodtim <firewood...@cavtel.net> wrote:
>> I want to give my customers a way to upload PDFs to their sites. Is
>> there any script that can be used to distinguish between a true PDF
>> file and a malicious file with a ".pdf" extension?
>
> You can check the first 5 characters of the file; they contain %PDF-
> if it is a real PDF. The next 3 contain the version (e.g. 1.2).
>
> Bill H

Unless someone faked the first 5 characters...

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Sep 10 '08 #6
firewoodtim wrote:
> On Wed, 10 Sep 2008 17:22:24 -0400, firewoodtim <fi*********@cavtel.net> wrote:
>> On Wed, 10 Sep 2008 14:17:15 -0700 (PDT), Bill H <bi**@ts1000.us> wrote:
>>> On Sep 10, 4:53 pm, firewoodtim <firewood...@cavtel.net> wrote:
>>>> I want to give my customers a way to upload PDFs to their sites. Is
>>>> there any script that can be used to distinguish between a true PDF
>>>> file and a malicious file with a ".pdf" extension?
>>>
>>> You can check the first 5 characters of the file; they contain %PDF-
>>> if it is a real PDF. The next 3 contain the version (e.g. 1.2).
>>>
>>> Bill H
>>
>> This probably brands me as a newbie, but is it possible to create an
>> executable file with the first five "characters" = %PDF?
>
> OOPS, I meant %PDF-

Not in PHP or Perl, anyway. But not knowing which languages are
available on your server makes it difficult to tell.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Sep 10 '08 #7
On Sep 10, 5:22 pm, firewoodtim <firewood...@cavtel.net> wrote:
> On Wed, 10 Sep 2008 14:17:15 -0700 (PDT), Bill H <b...@ts1000.us> wrote:
>> On Sep 10, 4:53 pm, firewoodtim <firewood...@cavtel.net> wrote:
>>> I want to give my customers a way to upload PDFs to their sites. Is
>>> there any script that can be used to distinguish between a true PDF
>>> file and a malicious file with a ".pdf" extension?
>>
>> You can check the first 5 characters of the file; they contain %PDF-
>> if it is a real PDF. The next 3 contain the version (e.g. 1.2).
>>
>> Bill H
>
> This probably brands me as a newbie, but is it possible to create an
> executable file with the first five "characters" = %PDF?

Firewood - it would be hard, since the first few bytes of an executable
are significant too.

I pulled that from the PDF spec on http://www.wotsit.org/list.asp?search=pdf.
There are more "magic" things in a PDF that you can check as well.
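
The post does not say which other "magic" things it means. As one possible reading, this Python sketch also looks for the "startxref" keyword, a byte offset, and the "%%EOF" marker that close a PDF file. The function name looks_like_pdf and the 1024-byte search window are assumptions made for this example, and a file that passes is still only structurally plausible, not verified as harmless.

```python
import re

# Trailer tail: the keyword "startxref", a byte offset, then the "%%EOF" marker
_TRAILER = re.compile(rb"startxref\s+(\d+)\s+%%EOF")


def looks_like_pdf(data: bytes, tail_size: int = 1024) -> bool:
    """Cheap structural check: PDF header at the start, trailer near the end."""
    if not data.startswith(b"%PDF-"):
        return False
    # Search only the end of the file; keep the last match if there are several
    match = None
    for match in _TRAILER.finditer(data[-tail_size:]):
        pass
    if match is None:
        return False
    # The offset should point somewhere inside the file
    offset = int(match.group(1))
    return 0 < offset < len(data)
```

A file that merely starts with %PDF- but has no trailer fails this check, which raises the bar a little over the header test alone. It is still easy for a determined attacker to satisfy.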

Jerry - technically you could check it in Perl: if you use PDF::API2 to
import a page from the PDF, you will get an error on non-PDF files
(this may not be the best way of checking).

Bill H
Sep 10 '08 #8
firewoodtim wrote:
> I want to give my customers a way to upload PDFs to their sites. Is
> there any script that can be used to distinguish between a true PDF
> file and a malicious file with a ".pdf" extension?

I had a related issue with PDFs: my site (a reporting tool where you
can upload attachments to a generated PDF file) cannot deal with
encrypted PDFs. As I use the FPDF and FPDI libraries to generate the
files and include the attachments, my upload checking routine was
fairly simple:
- create an empty PDF document,
- attach the uploaded file.
If that does not generate any errors, the uploaded file is a valid PDF.
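
The two steps above boil down to "try the real operation and reject the upload if it fails". A minimal Python sketch of that pattern follows. It does not call FPDF or FPDI (those are PHP libraries); the importer argument is a hypothetical stand-in for whatever library call attaches the file, and demo_importer is a toy used only to make the example runnable.

```python
from typing import Callable


def accepts_upload(path: str, importer: Callable[[str], object]) -> bool:
    """Return True if `importer(path)` finishes without raising.

    `importer` is a hypothetical stand-in for the library call that
    attaches or imports the uploaded file.
    """
    try:
        importer(path)
    except Exception:
        # Any failure (parse error, encrypted file, I/O error) rejects the upload
        return False
    return True


def demo_importer(path: str) -> None:
    """Toy importer for illustration only: rejects files without a PDF header."""
    with open(path, "rb") as fh:
        if fh.read(5) != b"%PDF-":
            raise ValueError("not a PDF")
```

The appeal of this approach is that the check uses the same parser that will later process the file, so anything the site cannot handle (such as the encrypted PDFs mentioned above) is rejected at upload time.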

Best regards.
Sep 10 '08 #9
firewoodtim <fi*********@cavtel.net> wrote:
>
> This probably brands me as a newbie, but is it possible to create an
> executable file with the first five "characters" = %PDF?

Not on Linux. On Windows, I suppose it is theoretically possible to create
a .COM file that starts with those bytes, but if it has an extension of
.PDF, Windows will hand it to Acrobat for processing. It won't try to
execute it as an application.
--
Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
Sep 11 '08 #10
