Bytes | Software Development & Data Engineering Community

Slight discrepancy with filecmp.cmp

Hi All--
I noticed recently that a few of the jpgs from my digital cameras have
developed bitrot. Not a real problem, because the cameras are CD
Mavicas, and I can simply copy the original from the CD, except for the
fact that I've got nearly 25,000 images to check. So I wrote a set of
programs both to index the disk versions against the CD versions and to
compare the CD and disk versions using filecmp.cmp(). Works fine.
It turned up several dozen files that had been inadvertently rotated or
saved with the wrong quality, various fat-fingered mistakes like that.

However, it didn't flag the files that I know have bitrot. I seem to
remember that diff uses a checksum algorithm on binary files, not a
byte-by-byte comparison. Am I wrong? If I am, what then is the source
of the problem in my jpg images, where it looks like a bit or two has
been shifted or added? Suddenly there's a line going through the
picture: above it the image is normal, and below it either the color has
changed (usually to pinkish) or the remaining raster lines are all
shifted right or left.

Any ideas?

Metta,
Ivan
----------------------------------------------
Ivan Van Laningham
God N Locomotive Works
http://www.andi-holmes.com/
http://www.foretec.com/python/worksh...oceedings.html
Army Signal Corps: Cu Chi, Class of '70
Author: Teach Yourself Python in 24 Hours
Jul 19 '05 #1
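Ivan's question turns on whether a comparison is checksum-based or byte-by-byte. The sketch below shows the checksum flavor, using SHA-256 from Python's standard `hashlib` module; it is an illustration only, not a claim about how `diff` or `filecmp` work internally, and the helper names `file_digest_hex` and `same_by_digest` are hypothetical.

```python
import hashlib


def file_digest_hex(path, chunk_size=64 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_by_digest(path_a, path_b):
    """Hypothetical helper: treat two files as equal if their digests match."""
    return file_digest_hex(path_a) == file_digest_hex(path_b)
```

A matching SHA-256 digest is not a mathematical proof of identical contents, but an accidental collision is vanishingly unlikely. One practical difference from a direct comparison: digests recorded while the files are known to be good could be rechecked later without the CD in the drive.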
On Sun, 17 Apr 2005 22:06:04 -0600, Ivan Van Laningham
<iv*****@pauahtun.org> wrote:
[snip]
> So I wrote a set of
> programs to both index the disk versions with the cd versions, and to
> compare, using filecmp.cmp(), the cd and disk version. Works fine.
> Turned up several dozen files that had been inadvertently rotated or
> saved with the wrong quality, various fat-fingered mistakes like that.
>
> However, it didn't flag the files that I know have bitrot. I seem to
> remember that diff uses a checksum algorithm on binary files, not a
> byte-by-byte comparison. Am I wrong?


According to the docs:

"""
cmp( f1, f2[, shallow[, use_statcache]])

Compare the files named f1 and f2, returning True if they seem equal,
False otherwise.
Unless shallow is given and is false, files with identical os.stat()
signatures are taken to be equal
"""

and what is an os.stat() signature, you ask? So did I.

According to the code itself:

def _sig(st):
    return (stat.S_IFMT(st.st_mode),
            st.st_size,
            st.st_mtime)

Looks like it assumes two files are the same if they are of the same
type, same size, and same time-last-modified. Normally I guess that's
good enough, but maybe the phantom bit-toggler is bypassing the file
system somehow. What OS are you running?
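The trap John describes can be reproduced with a small sketch (Python 3.3+ for the `ns=` argument to `os.utime`). It assumes that bitrot changes bytes without changing a file's size or modification time; that is simulated here by flipping one byte and then restoring the old timestamps. `shutil.copy2` is used because it copies timestamps along with contents, so the copy starts out with the same `os.stat()` signature as the original. The function name `demo_shallow_trap` is hypothetical.

```python
import filecmp
import os
import shutil
import tempfile


def demo_shallow_trap():
    """Return (shallow_result, deep_result) for a simulated bit-rotted copy."""
    workdir = tempfile.mkdtemp()
    original = os.path.join(workdir, "original.jpg")
    copy = os.path.join(workdir, "copy.jpg")

    # Fake "image" data; any bytes will do for this demonstration.
    with open(original, "wb") as f:
        f.write(bytes(range(256)) * 16)

    # copy2 copies the contents and the timestamps.
    shutil.copy2(original, copy)
    st = os.stat(copy)

    # Flip one bit in place: the file size stays the same...
    with open(copy, "r+b") as f:
        f.seek(100)
        old = f.read(1)
        f.seek(100)
        f.write(bytes([old[0] ^ 0x01]))

    # ...and put the old timestamps back, as silent corruption would leave them.
    os.utime(copy, ns=(st.st_atime_ns, st.st_mtime_ns))

    shallow = filecmp.cmp(original, copy)              # shallow=True is the default
    deep = filecmp.cmp(original, copy, shallow=False)  # actually compares the bytes
    shutil.rmtree(workdir)
    return shallow, deep


if __name__ == "__main__":
    print(demo_shallow_trap())  # expected: (True, False)
```

With matching type, size and mtime, the default shallow comparison reports the files as equal without reading them; only `shallow=False` notices the flipped bit.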

You might like to do two things: (1) run your comparison again with
shallow=False (2) submit a patch to the docs.
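John's first suggestion, rerunning the comparison with shallow=False, might look like the following sketch. Ivan's actual indexing programs are not shown in the thread, so this assumes (hypothetically) that the CD and the disk copy share the same relative layout under two root directories; `find_mismatches` and its arguments are illustrative names only.

```python
import filecmp
import os


def find_mismatches(cd_root, disk_root, suffixes=(".jpg", ".jpeg")):
    """Return sorted relative paths whose disk copy differs from, or is missing
    compared with, the file of the same relative path on the CD."""
    mismatches = []
    for dirpath, _dirnames, filenames in os.walk(cd_root):
        for name in filenames:
            if not name.lower().endswith(suffixes):
                continue  # only check image files
            cd_path = os.path.join(dirpath, name)
            rel = os.path.relpath(cd_path, cd_root)
            disk_path = os.path.join(disk_root, rel)
            # shallow=False forces a byte-by-byte comparison instead of
            # trusting matching os.stat() signatures.
            if not os.path.isfile(disk_path) or not filecmp.cmp(
                cd_path, disk_path, shallow=False
            ):
                mismatches.append(rel)
    return sorted(mismatches)
```

Walking the CD side means files that exist only on disk are ignored; walking the disk side instead would catch those but miss files that never made it off the CD.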

(-:
You have of course attempted to eliminate other variables by checking
that the bit-rot effect is apparent using different display software,
a different computer, an observer who's not on the same medication as
you, ... haven't you?
:-)
HTH,
John

Jul 19 '05 #2
Hi All--

John Machin wrote:

> On Sun, 17 Apr 2005 22:06:04 -0600, Ivan Van Laningham
> <iv*****@pauahtun.org> wrote:
> [snip]
>> So I wrote a set of
>> programs to both index the disk versions with the cd versions, and to
>> compare, using filecmp.cmp(), the cd and disk version. Works fine.
>> Turned up several dozen files that had been inadvertently rotated or
>> saved with the wrong quality, various fat-fingered mistakes like that.
>>
>> However, it didn't flag the files that I know have bitrot. I seem to
>> remember that diff uses a checksum algorithm on binary files, not a
>> byte-by-byte comparison. Am I wrong?
>
> According to the docs:
>
> """
> cmp( f1, f2[, shallow[, use_statcache]])
>
> Compare the files named f1 and f2, returning True if they seem equal,
> False otherwise.
> Unless shallow is given and is false, files with identical os.stat()
> signatures are taken to be equal
> """
>
> and what is an os.stat() signature, you ask? So did I.
>
> According to the code itself:
>
> def _sig(st):
>     return (stat.S_IFMT(st.st_mode),
>             st.st_size,
>             st.st_mtime)
>
> Looks like it assumes two files are the same if they are of the same
> type, same size, and same time-last-modified. Normally I guess that's
> good enough, but maybe the phantom bit-toggler is bypassing the file
> system somehow. What OS are you running?


WinXP, SP2

> You might like to do two things: (1) run your comparison again with
> shallow=False (2) submit a patch to the docs.

You know, I read that doc, tried it, and it made absolutely no
difference. Then I read your message, read the docs again, and finally
realized I had flipped the sense of shallow in my head. Sheesh. So
then I tried it with shallow=False, not True, and it runs about ten
times slower, but it works. Beautifully.
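The slowdown Ivan reports is what one would expect: the shallow check needs only an `os.stat()` call per file, while a deep comparison has to read both files. The sketch below is a hypothetical re-implementation (`bytes_equal`) of a chunked byte-by-byte comparison, written to show the work involved; it is not the filecmp source, although filecmp's deep comparison also reads both files in fixed-size chunks. The 8 KiB buffer is an arbitrary choice here.

```python
import os


def bytes_equal(path_a, path_b, bufsize=8 * 1024):
    """Return True if the two files have identical contents."""
    # Cheap check first: files of different sizes cannot be equal.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(bufsize)
            chunk_b = fb.read(bufsize)
            if chunk_a != chunk_b:
                return False  # first differing chunk settles it
            if not chunk_a:
                return True   # both files exhausted at the same point
```

For identical files every byte of both the CD copy and the disk copy gets read, which plausibly accounts for most of the difference when 25,000 images are involved.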

Now I have to go back and redo the first five thousand, but it's worth
it. Thanks. Shows how much you need another set of eyeballs to debug
your brain;-)

> (-:
> You have of course attempted to eliminate other variables by checking
> that the bit-rot effect is apparent using different display software,
> a different computer, an observer who's not on the same medication as
> you, ... haven't you?
> :-)


;-) Absolutely. Several different viewers and several different OSs.
And my wife never sees anything the way I do;-)

Metta,
Ivan
Jul 19 '05 #3
On Mon, 18 Apr 2005 09:02:44 -0600,
Ivan Van Laningham <iv*****@pauahtun.org> wrote:
> ... Shows how much you need another set of eyeballs to debug your
> brain;-)

+1 QOTW

> ... And my wife never sees anything the way I do;-)


There's probably a rude joke in there somewhere about your wife's eyes
debugging your brain, but since I would like to remain married, I will
not make it. :-/

Regards,
Dan

--
Dan Sommers
<http://www.tombstonezero.net/dan/>
μ₀ × ε₀ × c² = 1
Jul 19 '05 #4


