I am looking for some python script to compare two files

hi:

The file can be PDF or Word format. Any help?

thx

Nov 9 '05 #1
Hello david,
> The file can be PDF or Word format. Any help?

If you'd just like to know whether they differ, you can compare their MD5
hashes (or any other cryptographic digest). This only tells you whether the
files are byte-for-byte identical.
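
A minimal sketch of the digest comparison in Python, using only the standard library. The function names `file_digest` and `same_bytes` are made up for illustration, and SHA-256 is assumed as the default; pass "md5" to get the digest mentioned above.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files are never fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_bytes(path_a, path_b, algorithm="sha256"):
    """True if both files hash to the same digest."""
    return file_digest(path_a, algorithm) == file_digest(path_b, algorithm)
```

Usage would be something like `same_bytes("old.pdf", "new.pdf")`. A False result says nothing about where the files differ, only that they do.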

If you want a real diff, convert them to text first.
* For PDF you can use pdftotext (comes with xpdf) or the Acrobat COM object (if
you're on Windows). There are also some commercial pdf2txt programs.
* For Word you can use antiword and friends, or again the Word COM object
if you're on Windows.
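
The PDF conversion can be driven from Python with subprocess. This is a rough sketch that assumes the pdftotext binary is installed and on PATH; the helper names `pdftotext_command` and `pdf_to_text` are made up for illustration. The "-" output argument sends the text to stdout, "-layout" tries to keep the physical layout, and "-enc UTF-8" selects the output encoding.

```python
import shutil
import subprocess


def pdftotext_command(pdf_path):
    """Build the argument list for pdftotext, writing text to stdout."""
    return ["pdftotext", "-layout", "-enc", "UTF-8", str(pdf_path), "-"]


def pdf_to_text(pdf_path):
    """Run pdftotext on one file and return the extracted text."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on PATH")
    result = subprocess.run(
        pdftotext_command(pdf_path),
        capture_output=True,
        check=True,  # raise CalledProcessError if the conversion fails
    )
    return result.stdout.decode("utf-8", errors="replace")
```

The same pattern should carry over to antiword or any other converter that writes plain text to stdout; only the argument list changes.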

See the diffutils package for diffing text files.
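
If you would rather stay inside Python than shell out to diffutils, the standard library's difflib module produces a similar unified diff. This is a swapped-in alternative, not what diffutils itself does, and `text_diff` with its default file names is a hypothetical helper.

```python
import difflib


def text_diff(text_a, text_b, name_a="a.txt", name_b="b.txt"):
    """Return a unified diff of two strings, or "" if they are identical."""
    diff_lines = difflib.unified_diff(
        text_a.splitlines(),
        text_b.splitlines(),
        fromfile=name_a,
        tofile=name_b,
        lineterm="",  # input lines carry no newline, so add none to headers
    )
    return "\n".join(diff_lines)
```

Feed it the text extracted from each document; an empty string means the extracted text is the same line for line.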

HTH.
--
------------------------------------------------------------------------
Miki Tebeka <mt*****@qualcomm.com>
http://tebeka.bizhat.com
The only difference between children and adults is the price of the toys

Nov 9 '05 #2
On 8 Nov 2005 19:35:07 -0800, "david" <yo************@gmail.com>
declaimed the following in comp.lang.python:
> hi:
>
> The file can be PDF or Word format. Any help?

1. Install an ASCII-only print driver.
2. Print "to file" using this driver.
3. Compare the resulting text files.

PDF is derived from PostScript -- that is, the contents are a compact
page-description language for rendering text and graphics. The text
could be identical while all the code surrounding it is different.
(I've seen PostScript drivers for word processors vary: one passed a
single line of text to a function that then character-spaced the text
for justification, while another computed a starting location for each
word and passed the words to the rendering function.)

Word documents, too, could "look" identical when printed but be
completely different internally. Word files may carry things like
sections linked together as the document was edited, and remnants
such as unused style tags.
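
Because two files can render the same and still differ internally, it can help to normalize the extracted text before comparing it. The sketch below is one possible normalization, not a complete one: it folds Unicode compatibility forms (ligatures, non-breaking spaces) with NFKC and collapses runs of whitespace. The names `normalize_text` and `same_visible_text` are hypothetical.

```python
import re
import unicodedata


def normalize_text(text):
    """Fold compatibility characters and collapse whitespace runs."""
    # NFKC turns ligatures like U+FB01 into "fi" and U+00A0 into a space.
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def same_visible_text(text_a, text_b):
    """True if two extracted texts match after normalization."""
    return normalize_text(text_a) == normalize_text(text_b)
```

This ignores differences in line breaks and spacing, which is usually what you want when two producers lay out the same words differently. It will not undo hyphenation at line ends or reordered text blocks.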

--
wl*****@ix.netcom.com | Wulfraed Dennis Lee Bieber KD6MOG
wu******@dm.net | Bestiaria Support Staff
Home Page: <http://www.dm.net/~wulfraed/>
Overflow Page: <http://wlfraed.home.netcom.com/>

Nov 9 '05 #3
