filecmp.cmp() cache

Hello!

I have a question about filecmp.cmp(). The short code snippet below
does not behave as I would expect:

import filecmp

f0 = "foo.dat"
f1 = "bar.dat"

f = open(f0, "w")
f.write("1:2")
f.close()

f = open(f1, "w")
f.write("1:2")
f.close()

print "cmp 1: " + str(filecmp.cmp(f0, f1, False))

f = open(f1, "w")
f.write("2:3")
f.close()

print "cmp 2: " + str(filecmp.cmp(f0, f1, False))

I would expect the second comparison to return False instead of True.
Looking at the docs for filecmp.cmp() I found the following: "This
function uses a cache for past comparisons and the results, with a
cache invalidation mechanism relying on stale signatures.". I guess
that this is the reason for my test case failing.

Is there someone here that can tell me how I should invalidate this
cache? If that is not possible, what workaround could I use? I guess
that I can write my own file comparison function, but I would not like
to have to do that since we have filecmp.

Any ideas?

Regards,
Mattias

Feb 15 '07 #1
Mattias Brändström wrote:
> Is there someone here that can tell me how I should invalidate this
> cache? If that is not possible, what workaround could I use?

You can clear the cache with

filecmp._cache = {}

as a glance into the filecmp module would have shown.
If you don't want to use the cache at all (untested):

class NoCache:
    def __setitem__(self, key, value):
        pass
    def get(self, key):
        return None

filecmp._cache = NoCache()

Alternatively an update to Python 2.5 might work as the type of
os.stat(filename).st_mtime was changed from int to float and now offers
subsecond resolution.
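
To make the timestamp point concrete, here is a minimal sketch of why the stale result can survive. It assumes the cache validates entries with a signature built from file type, size and modification time; the helper names `signature` and `write_with_mtime` are made up for illustration, and `os.utime` is used only to simulate a filesystem that reports whole seconds.

```python
import os
import stat
import tempfile

def signature(path):
    """Return (file type, size, mtime), assumed here to be what the cache checks."""
    st = os.stat(path)
    return (stat.S_IFMT(st.st_mode), st.st_size, st.st_mtime)

def write_with_mtime(path, text, mtime):
    """Write text, then pin the timestamp to simulate whole-second resolution."""
    with open(path, "w") as f:
        f.write(text)
    os.utime(path, (mtime, mtime))

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "bar.dat")

write_with_mtime(path, "1:2", 1171555200)
before = signature(path)

# Same length, different content, written within the "same second".
write_with_mtime(path, "2:3", 1171555200)
after = signature(path)

print(before == after)
```

If the two signatures compare equal, nothing tells the cache that the file content changed, which would match the behaviour in the original test case.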

Peter
Feb 15 '07 #2
On Feb 15, 5:56 pm, Peter Otten <__pete...@web.de> wrote:
> You can clear the cache with
>
> filecmp._cache = {}
>
> as a glance into the filecmp module would have shown.

You are right, a quick glance would have enlightened me. Next time I
will RTFS first. :-)

> If you don't want to use the cache at all (untested):
>
> class NoCache:
>     def __setitem__(self, key, value):
>         pass
>     def get(self, key):
>         return None
> filecmp._cache = NoCache()

Just one small thought/question. How likely am I to run into trouble
because of this? I mean, by setting _cache to another value I'm
mucking about in filecmp's implementation details. Is this generally
considered OK when dealing with Python's standard library?

:.:: mattias

Feb 15 '07 #3
Mattias Brändström wrote:
> Just one small thought/question. How likely am I to run into trouble
> because of this? I mean, by setting _cache to another value I'm
> mucking about in filecmp's implementation details. Is this generally
> considered OK when dealing with Python's standard library?

I think it's a feature that Python lends itself to monkey-patching, but
still there are a few things to consider:

- Every hack increases the likelihood that your app will break in the next
version of Python.
- You take some responsibility for the "patched" code. It's no longer the
tried and tested module as provided by the core developers.
- The module may be used elsewhere in the standard library or third-party
packages, and failures (or in the above example: performance degradation)
may ensue.

For a script and a relatively obscure module like 'filecmp' monkey-patching
is probably OK, but for a larger app or a module like 'os' that is heavily
used throughout the standard lib I would play it safe and reimplement.
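
For reference, the "reimplement" route could look roughly like the sketch below. It is not taken from filecmp; `files_equal`, `write_text` and the chunk size are made-up names and values, and the demo repeats the test case from the original post using a temporary directory.

```python
import os
import tempfile

def files_equal(path_a, path_b, chunk_size=8192):
    """Compare two files byte by byte, with no caching between calls."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False  # different sizes can never match
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            chunk_a = file_a.read(chunk_size)
            chunk_b = file_b.read(chunk_size)
            if chunk_a != chunk_b:
                return False
            if not chunk_a:  # both files exhausted at the same point
                return True

def write_text(path, text):
    """Overwrite path with the given text."""
    with open(path, "w") as f:
        f.write(text)

tmp_dir = tempfile.mkdtemp()
f0 = os.path.join(tmp_dir, "foo.dat")
f1 = os.path.join(tmp_dir, "bar.dat")

write_text(f0, "1:2")
write_text(f1, "1:2")
first = files_equal(f0, f1)

write_text(f1, "2:3")
second = files_equal(f0, f1)

print("cmp 1: " + str(first))
print("cmp 2: " + str(second))
```

The price is that every call reads both files again, which is the performance trade-off mentioned above.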

Peter

Feb 15 '07 #4
On Feb 15, 11:43 pm, Peter Otten <__pete...@web.de> wrote:
> For a script and a relatively obscure module like 'filecmp' monkey-patching
> is probably OK, but for a larger app or a module like 'os' that is heavily
> used throughout the standard lib I would play it safe and reimplement.

Thanks for the insight! Right now I need this for a unit test, so in
this case I'm quite happy to use the NoCache solution you suggested.
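
For a unit test, one way to keep the patch from leaking into other tests is to swap the cache only for the duration of the test and restore it afterwards. This is a rough sketch, assuming the cache lives in the private module attribute `filecmp._cache`; the name `fresh_filecmp_cache` is made up for illustration.

```python
import contextlib
import filecmp

@contextlib.contextmanager
def fresh_filecmp_cache():
    """Swap in an empty cache for the duration of a block, then restore it.

    Assumes filecmp keeps its cache in the private attribute `_cache`,
    which is an implementation detail and may change between versions.
    """
    saved = filecmp._cache
    filecmp._cache = {}
    try:
        yield
    finally:
        # Restore the original cache even if the test body raises.
        filecmp._cache = saved
```

A test would then wrap each comparison in `with fresh_filecmp_cache():`, so the rest of the process keeps the cache it had before.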

:.:: brasse

Feb 15 '07 #5
Peter Otten wrote:
> For a script and a relatively obscure module like 'filecmp' monkey-patching
> is probably OK, but for a larger app or a module like 'os' that is heavily
> used throughout the standard lib I would play it safe and reimplement.

It would probably be a good idea to add a clear_cache() function to the
module API for 2.6 to avoid such issues.
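
Note for later readers: a public `filecmp.clear_cache()` function is available from Python 3.4 on. The sketch below prefers it when present and otherwise falls back to the private `_cache` attribute discussed in this thread; the fallback is an assumption about older versions, and `reset_filecmp_cache` is a made-up helper name.

```python
import filecmp

def reset_filecmp_cache():
    """Clear filecmp's comparison cache, using the public API when available."""
    if hasattr(filecmp, "clear_cache"):
        filecmp.clear_cache()       # public function in newer Pythons
    else:
        filecmp._cache.clear()      # private fallback, an implementation detail

reset_filecmp_cache()
```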

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
Blog of Note: http://holdenweb.blogspot.com
See you at PyCon? http://us.pycon.org/TX2007

Feb 16 '07 #6
