
Slight discrepancy with filecmp.cmp

Hi All--
I noticed recently that a few of the jpgs from my digital cameras have
developed bitrot. Not a real problem, because the cameras are CD
Mavicas, and I can simply copy the original from the CD. Except for the
fact that I've got nearly 25,000 images to check. So I wrote a set of
programs to both index the disk versions against the CD versions, and to
compare, using filecmp.cmp(), the CD and disk versions. Works fine.
Turned up several dozen files that had been inadvertently rotated or
saved with the wrong quality, various fat-fingered mistakes like that.

However, it didn't flag the files that I know have bitrot. I seem to
remember that diff uses a checksum algorithm on binary files, not a
byte-by-byte comparison. Am I wrong? If I am, what then is the source
of the problem in my jpg images where it looks like a bit or two has
been shifted or added; suddenly, there's a line going through the
picture above which it's normal, and below it either the color has
changed (usually to pinkish) or the remaining raster lines are all
shifted either right or left?

Any ideas?

Metta,
Ivan
----------------------------------------------
Ivan Van Laningham
God N Locomotive Works
http://www.andi-holmes.com/
http://www.foretec.com/python/worksh...oceedings.html
Army Signal Corps: Cu Chi, Class of '70
Author: Teach Yourself Python in 24 Hours
Jul 19 '05 #1


On Sun, 17 Apr 2005 22:06:04 -0600, Ivan Van Laningham
<iv*****@pauahtun.org> wrote:
[snip]
> So I wrote a set of
> programs to both index the disk versions with the cd versions, and to
> compare, using filecmp.cmp(), the cd and disk version. Works fine.
> Turned up several dozen files that had been inadvertently rotated or
> saved with the wrong quality, various fat-fingered mistakes like that.
>
> However, it didn't flag the files that I know have bitrot. I seem to
> remember that diff uses a checksum algorithm on binary files, not a
> byte-by-byte comparison. Am I wrong?


According to the docs:

"""
cmp( f1, f2[, shallow[, use_statcache]])

Compare the files named f1 and f2, returning True if they seem equal,
False otherwise.
Unless shallow is given and is false, files with identical os.stat()
signatures are taken to be equal
"""

and what is an os.stat() signature, you ask? So did I.

According to the code itself:

def _sig(st):
    return (stat.S_IFMT(st.st_mode),
            st.st_size,
            st.st_mtime)

Looks like it assumes two files are the same if they are of the same
type, same size, and same time-last-modified. Normally I guess that's
good enough, but maybe the phantom bit-toggler is bypassing the file
system somehow. What OS are you running?

You might like to do two things: (1) run your comparison again with
shallow=False (2) submit a patch to the docs.
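
The difference between the default comparison and shallow=False can be reproduced with a small sketch. It assumes a current Python 3, where the signature is filecmp.cmp(f1, f2, shallow=True) rather than the older form quoted above. The file names, the helper function, and the fixed timestamp are made up for illustration; the two files are given the same size and modification time but differ in one byte, which is roughly what silent corruption would look like to os.stat().

```python
import filecmp
import os
import tempfile


def make_pair_with_same_signature(directory):
    """Create two same-sized files that differ in one byte but share an mtime."""
    good = os.path.join(directory, "good.jpg")      # hypothetical file names
    rotted = os.path.join(directory, "rotted.jpg")
    with open(good, "wb") as f:
        f.write(b"\x00" * 1024)
    with open(rotted, "wb") as f:
        f.write(b"\x00" * 512 + b"\x01" + b"\x00" * 511)  # one byte differs
    # Force identical access and modification times on both files,
    # so that type, size, and mtime all agree.
    fixed_time = 1_000_000_000  # arbitrary timestamp, an assumption
    os.utime(good, (fixed_time, fixed_time))
    os.utime(rotted, (fixed_time, fixed_time))
    return good, rotted


with tempfile.TemporaryDirectory() as tmp:
    a, b = make_pair_with_same_signature(tmp)
    shallow_result = filecmp.cmp(a, b)               # shallow defaults to True
    deep_result = filecmp.cmp(a, b, shallow=False)   # reads and compares bytes
    print("shallow:", shallow_result, "deep:", deep_result)
```

With matching signatures the shallow call should report the files as equal, while the shallow=False call should report a difference because it actually reads the contents.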

(-:
You have of course attempted to eliminate other variables by checking
that the bit-rot effect is apparent using different display software,
a different computer, an observer who's not on the same medication as
you, ... haven't you?
:-)
HTH,
John

Jul 19 '05 #2

Hi All--

John Machin wrote:

> On Sun, 17 Apr 2005 22:06:04 -0600, Ivan Van Laningham
> <iv*****@pauahtun.org> wrote:
> [snip]
>> So I wrote a set of
>> programs to both index the disk versions with the cd versions, and to
>> compare, using filecmp.cmp(), the cd and disk version. Works fine.
>> Turned up several dozen files that had been inadvertently rotated or
>> saved with the wrong quality, various fat-fingered mistakes like that.
>>
>> However, it didn't flag the files that I know have bitrot. I seem to
>> remember that diff uses a checksum algorithm on binary files, not a
>> byte-by-byte comparison. Am I wrong?
>
> According to the docs:
>
> """
> cmp( f1, f2[, shallow[, use_statcache]])
>
> Compare the files named f1 and f2, returning True if they seem equal,
> False otherwise.
> Unless shallow is given and is false, files with identical os.stat()
> signatures are taken to be equal
> """
>
> and what is an os.stat() signature, you ask? So did I.
>
> According to the code itself:
>
> def _sig(st):
>     return (stat.S_IFMT(st.st_mode),
>             st.st_size,
>             st.st_mtime)
>
> Looks like it assumes two files are the same if they are of the same
> type, same size, and same time-last-modified. Normally I guess that's
> good enough, but maybe the phantom bit-toggler is bypassing the file
> system somehow. What OS are you running?


WinXP, SP2

> You might like to do two things: (1) run your comparison again with
> shallow=False (2) submit a patch to the docs.

You know, I read that doc, tried it, and it made absolutely no
difference. Then I read your message, read the docs again, and finally
realized I had flipped the sense of shallow in my head. Sheesh. So
then I tried it with shallow=False, not True, and it runs about ten
times slower, but it works. Beautifully.
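
For anyone repeating this, a batch check along these lines might look like the sketch below. It is not the program described in this post; the function name and the flat directory layout (one folder of CD originals, one folder of disk copies, no subfolders) are assumptions. It relies on filecmp.cmpfiles(), which takes the same shallow flag as filecmp.cmp() and sorts the common names into matches, mismatches, and errors.

```python
import filecmp
import os


def find_damaged(cd_dir, disk_dir):
    """Return (mismatched names, unreadable names) for files in both folders.

    Hypothetical helper: assumes a flat layout with no subdirectories.
    """
    cd_names = set(os.listdir(cd_dir))
    disk_names = set(os.listdir(disk_dir))
    # Only regular files that exist under the same name in both places.
    common = sorted(
        name for name in cd_names & disk_names
        if os.path.isfile(os.path.join(cd_dir, name))
        and os.path.isfile(os.path.join(disk_dir, name))
    )
    # shallow=False forces a byte-by-byte comparison of every pair.
    match, mismatch, errors = filecmp.cmpfiles(
        cd_dir, disk_dir, common, shallow=False
    )
    return mismatch, errors
```

The slowdown mentioned above is expected with this approach, since every byte of every pair has to be read instead of just the stat() results.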

Now I have to go back and redo the first five thousand, but it's worth
it. Thanks. Shows how much you need another set of eyeballs to debug
your brain;-)

> (-:
> You have of course attempted to eliminate other variables by checking
> that the bit-rot effect is apparent using different display software,
> a different computer, an observer who's not on the same medication as
> you, ... haven't you?
> :-)


;-) Absolutely. Several different viewers and several different OSs.
And my wife never sees anything the way I do;-)

Metta,
Ivan
----------------------------------------------
Ivan Van Laningham
God N Locomotive Works
http://www.andi-holmes.com/
http://www.foretec.com/python/worksh...oceedings.html
Army Signal Corps: Cu Chi, Class of '70
Author: Teach Yourself Python in 24 Hours
Jul 19 '05 #3

On Mon, 18 Apr 2005 09:02:44 -0600,
Ivan Van Laningham <iv*****@pauahtun.org> wrote:
> ... Shows how much you need another set of eyeballs to debug your
> brain;-)

+1 QOTW

> ... And my wife never sees anything the way I do;-)


There's probably a rude joke in there somewhere about your wife's eyes
debugging your brain, but since I would like to remain married, I will
not make it. :-/

Regards,
Dan

--
Dan Sommers
<http://www.tombstonezero.net/dan/>
μ₀ × ε₀ × c² = 1
Jul 19 '05 #4
