Bytes | Software Development & Data Engineering Community

filecmp.cmp() cache

Hello!

I have a question about filecmp.cmp(). The short code snippet below
does not behave as I would expect:

import filecmp

f0 = "foo.dat"
f1 = "bar.dat"

f = open(f0, "w")
f.write("1:2")
f.close()

f = open(f1, "w")
f.write("1:2")
f.close()

print "cmp 1: " + str(filecmp.cmp (f0, f1, False))

f = open(f1, "w")
f.write("2:3")
f.close()

print "cmp 2: " + str(filecmp.cmp (f0, f1, False))

I would expect the second comparison to return False instead of True.
Looking at the docs for filecmp.cmp() I found the following: "This
function uses a cache for past comparisons and the results, with a
cache invalidation mechanism relying on stale signatures." I guess
that this is the reason my test case fails.

Is there someone here that can tell me how I should invalidate this
cache? If that is not possible, what workaround could I use? I guess
that I can write my own file comparison function, but I would not like
to have to do that since we have filecmp.

Any ideas?

Regards,
Mattias

Feb 15 '07 #1
Mattias Brändström wrote:
> Is there someone here that can tell me how I should invalidate this
> cache? If that is not possible, what workaround could I use?

You can clear the cache with

filecmp._cache = {}

as a glance into the filecmp module would have shown.
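
A minimal Python 3 sketch of the original test case with the cache reset between the two comparisons, following the rebinding suggested above. `_cache` is a private, undocumented attribute, so this relies on an implementation detail; the temporary directory and the `write` helper are illustrative additions, not part of the original post.

```python
import filecmp
import os
import tempfile

def write(path, text):
    # Overwrite the file so it contains exactly `text`.
    with open(path, "w") as f:
        f.write(text)

def demo():
    with tempfile.TemporaryDirectory() as tmp:
        f0 = os.path.join(tmp, "foo.dat")
        f1 = os.path.join(tmp, "bar.dat")
        write(f0, "1:2")
        write(f1, "1:2")
        first = filecmp.cmp(f0, f1, shallow=False)

        write(f1, "2:3")      # same size as before, different contents
        filecmp._cache = {}   # private attribute: drop every cached result
        second = filecmp.cmp(f0, f1, shallow=False)
        return first, second

print(demo())  # (True, False)
```
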
If you don't want to use the cache at all (untested):

class NoCache:
    def __setitem__(self, key, value):
        pass
    def get(self, key):
        return None

filecmp._cache = NoCache()

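A hedged caveat for later readers: in current Python 3 releases `filecmp.cmp()` appears to also call `len()` on the cache (to cap its size) and to empty it via `clear()`, so a class providing only `get` and `__setitem__` would probably fail there with a `TypeError`. One variant, assuming the cache is only used through those operations, is to subclass `dict` and discard writes; it still depends on the private `_cache` name.

```python
import filecmp

class NoCache(dict):
    """Cache stand-in: behaves like an empty dict but discards every write."""
    def __setitem__(self, key, value):
        pass  # drop the result instead of storing it

# Private attribute: every filecmp.cmp() call now re-reads the files.
filecmp._cache = NoCache()
```
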
Alternatively, an update to Python 2.5 might work, as the type of
os.stat(filename).st_mtime was changed from int to float and now offers
subsecond resolution.
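
To see why finer timestamps help, consider the per-file signature that the cache check compares. The sketch assumes a signature of file type, size and modification time (which, as far as I can tell, is what the private `filecmp._sig` helper builds); `signature` and `rewrite_signatures` are hypothetical helpers, and `os.utime` pins two same-size writes inside one whole second so the result does not depend on timing.

```python
import os
import stat
import tempfile

def signature(path, whole_seconds=False):
    # (file type, size, mtime): the stat fields a signature-based cache compares.
    st = os.stat(path)
    mtime = int(st.st_mtime) if whole_seconds else st.st_mtime
    return (stat.S_IFMT(st.st_mode), st.st_size, mtime)

def rewrite_signatures(whole_seconds):
    """Signature of bar.dat before and after a same-size rewrite."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bar.dat")
        sigs = []
        for text, mtime in (("1:2", 1000.25), ("2:3", 1000.75)):
            with open(path, "w") as f:
                f.write(text)
            # Pin the mtime so both writes fall within the same whole second.
            os.utime(path, (mtime, mtime))
            sigs.append(signature(path, whole_seconds))
        return sigs[0], sigs[1]

old, new = rewrite_signatures(whole_seconds=True)
print(old == new)  # True: with whole-second mtimes the rewrite is invisible
```

With `whole_seconds=False` the two signatures should differ, provided the filesystem stores sub-second timestamps.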

Peter
Feb 15 '07 #2
On Feb 15, 5:56 pm, Peter Otten <__pete...@web.de> wrote:
> You can clear the cache with
>
> filecmp._cache = {}
>
> as a glance into the filecmp module would have shown.

You are right, a quick glance would have enlightened me. Next time I
will RTFS first. :-)

> If you don't want to use the cache at all (untested):
> [...]

Just one small thought/question. How likely am I to run into trouble
because of this? I mean, by setting _cache to another value I'm
mucking about in filecmp's implementation details. Is this generally
considered OK when dealing with Python's standard library?

:.:: mattias

Feb 15 '07 #3
Mattias Brändström wrote:
> Just one small thought/question. How likely am I to run into trouble
> because of this? I mean, by setting _cache to another value I'm
> mucking about in filecmp's implementation details. Is this generally
> considered OK when dealing with Python's standard library?

I think it's a feature that Python lends itself to monkey-patching, but
still there are a few things to consider:

- Every hack increases the likelihood that your app will break in the next
version of Python.
- You take some responsibility for the "patched" code. It's no longer the
tried and tested module as provided by the core developers.
- The module may be used elsewhere in the standard library or third-party
packages, and failures (or in the above example: performance degradation)
may ensue.

For a script and a relatively obscure module like 'filecmp' monkey-patching
is probably OK, but for a larger app or a module like 'os' that is heavily
used throughout the standard lib I would play it safe and reimplement.
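
For the "reimplement" route, a cache-free comparison can stay small. This is a sketch rather than `filecmp`'s own algorithm: `files_equal` is a hypothetical name, it checks sizes first and then reads both files in fixed-size chunks (8 KiB here, an arbitrary choice), and `demo` replays the scenario from the original post.

```python
import os
import tempfile

def files_equal(path_a, path_b, chunk_size=8192):
    """Compare two files byte for byte; nothing is cached between calls."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False  # files of different sizes cannot be equal
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            block_a = fa.read(chunk_size)
            block_b = fb.read(chunk_size)
            if block_a != block_b:
                return False
            if not block_a:  # both files ended together with no mismatch
                return True

def demo():
    # Equal files first, then a same-size rewrite of one of them.
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "foo.dat")
        b = os.path.join(tmp, "bar.dat")
        for path in (a, b):
            with open(path, "w") as f:
                f.write("1:2")
        first = files_equal(a, b)
        with open(b, "w") as f:
            f.write("2:3")
        return first, files_equal(a, b)
```

Because it keeps no state, it always reads both files, which is usually what a test wants.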

Peter

Feb 15 '07 #4
On Feb 15, 11:43 pm, Peter Otten <__pete...@web.de> wrote:
> For a script and a relatively obscure module like 'filecmp' monkey-patching
> is probably OK, but for a larger app or a module like 'os' that is heavily
> used throughout the standard lib I would play it safe and reimplement.

Thanks for the insight! Right now I need this for a unit test, so in
this case I'm quite happy to use the NoCache solution you suggested.
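
For a unit test specifically, a Python 3 option is `unittest.mock.patch.object`, which replaces the private `filecmp._cache` only while the test runs and restores it afterwards, even on failure. The test class, file names and the `DiscardingCache` class (a `dict`-based take on the `NoCache` idea) are illustrative.

```python
import filecmp
import os
import tempfile
import unittest
from unittest import mock

class DiscardingCache(dict):
    """Empty-dict stand-in for filecmp's cache that ignores every write."""
    def __setitem__(self, key, value):
        pass

class FileRewriteTest(unittest.TestCase):
    def setUp(self):
        tmp = tempfile.TemporaryDirectory()
        self.addCleanup(tmp.cleanup)  # remove the directory after each test
        self.f0 = os.path.join(tmp.name, "foo.dat")
        self.f1 = os.path.join(tmp.name, "bar.dat")

    def write(self, path, text):
        with open(path, "w") as f:
            f.write(text)

    # patch.object swaps the private attribute only while this test runs.
    @mock.patch.object(filecmp, "_cache", DiscardingCache())
    def test_same_size_rewrite_is_detected(self):
        self.write(self.f0, "1:2")
        self.write(self.f1, "1:2")
        self.assertTrue(filecmp.cmp(self.f0, self.f1, shallow=False))
        self.write(self.f1, "2:3")
        self.assertFalse(filecmp.cmp(self.f0, self.f1, shallow=False))
```

Run it with `python -m unittest` like any other test module.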

:.:: brasse

Feb 15 '07 #5
Peter Otten wrote:
> For a script and a relatively obscure module like 'filecmp' monkey-patching
> is probably OK, but for a larger app or a module like 'os' that is heavily
> used throughout the standard lib I would play it safe and reimplement.

It would probably be a good idea to add a clear_cache() function to the
module API for 2.6 to avoid such issues.
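
For the record, a public `filecmp.clear_cache()` did arrive, but in Python 3.4 rather than 2.6. A version-tolerant helper (the name `reset_filecmp_cache` is hypothetical) could prefer it and fall back to the private attribute on older interpreters.

```python
import filecmp

def reset_filecmp_cache():
    """Forget all cached filecmp.cmp() results."""
    clear = getattr(filecmp, "clear_cache", None)
    if clear is not None:
        clear()  # public API, Python 3.4 and later
    else:
        filecmp._cache.clear()  # private fallback for older versions
```
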

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
Blog of Note: http://holdenweb.blogspot.com
See you at PyCon? http://us.pycon.org/TX2007

Feb 16 '07 #6
