how to know if folder contents have changed

devnew

Hi,
I am trying to create a cache of digitized values for around 100
image files in a folder. In my program I would like to know from time
to time whether an image has been added to or removed from the folder.

One scheme suggested was to create a string from the sorted names of
the image files and use it as the cache name.
I.e., if I have one.jpg, three.jpg and new.jpg,
I will name the cache 'newonethree.cache', and every time I want to
check whether an image was added or removed I would create a string
from the contents of the folder and compare it with the cache name.

This scheme is OK for a small number of files.

Can someone suggest a better way? I know it is a general programming
problem, but I wish to know if a Python solution exists.

Nov 12 '07 #1

Marc 'BlackJack' Rintsch

On Sun, 11 Nov 2007 21:03:33 -0800, de****@gmail.com wrote:

One scheme suggested was to create a string from the sorted names of
the image files and use it as the cache name.
I.e., if I have one.jpg, three.jpg and new.jpg,
I will name the cache 'newonethree.cache', and every time I want to
check whether an image was added or removed I would create a string
from the contents of the folder and compare it with the cache name.

This scheme is OK for a small number of files.

Not really.

`xxx.jpg` -> `xxx.cache`

Now `xxx.jpg` is deleted and `x.jpg` and `xx.jpg` are created.

`x.jpg`, `xx.jpg` -> `xxx.cache`

Both states produce the same cache name, so the change goes unnoticed.

Can someone suggest a better way? I know it is a general programming
problem, but I wish to know if a Python solution exists.

Don't store the names in the cache file name but in the cache file. Take
a look at the `set()` type for operations to easily find out the
differences between two sets of names and the `pickle` module to store
Python objects in files.
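
A minimal sketch of that suggestion, assuming Python 3 and a cache file kept outside the image folder (or at least without the `.jpg` suffix). The helper names `load_names`, `save_names` and `folder_changes` are made up for illustration:

```python
import os
import pickle


def load_names(cache_path):
    """Return the set of names stored in the cache file, or an empty set."""
    try:
        with open(cache_path, "rb") as fh:
            return pickle.load(fh)
    except FileNotFoundError:
        return set()


def save_names(cache_path, names):
    """Store the set of names inside the cache file."""
    with open(cache_path, "wb") as fh:
        pickle.dump(set(names), fh)


def folder_changes(folder, cache_path, suffix=".jpg"):
    """Return (added, removed) names relative to the cached listing."""
    current = {n for n in os.listdir(folder) if n.lower().endswith(suffix)}
    cached = load_names(cache_path)
    # Set difference gives the new and the vanished names directly.
    return current - cached, cached - current
```

Because the names are stored as a set rather than glued together, the `xxx.jpg` versus `x.jpg` plus `xx.jpg` case above is reported as one removal and two additions. After rebuilding the digitized values, call `save_names` with the current listing.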

Ciao,
Marc 'BlackJack' Rintsch

Nov 12 '07 #2

Jorge Godoy

de****@gmail.com wrote:

Can someone suggest a better way? I know it is a general programming
problem, but I wish to know if a Python solution exists.

Use pyfam. I believe all the docs are in FAM itself, but pyfam integrates with it.

Nov 12 '07 #3

davisn90210

On Nov 11, 11:03 pm, "dev...@gmail.com" <dev...@gmail.com> wrote:

Hi,
I am trying to create a cache of digitized values for around 100
image files in a folder. In my program I would like to know from time
to time whether an image has been added to or removed from the folder.

Why not use the file creation/modification timestamps?

Nov 12 '07 #4

Martin Marcher

2007/11/12, da*********@gmail.com <da*********@gmail.com>:

Why not use the file creation/modification timestamps?

because you'd have to

a) create a thread that polls all the time for changes or
b) test everytime for changes

FAM informs you in a notification-like way.

Personally, if it's not a long-running process, I'd create a "hidden"
cache file parsable by configparser and keep
filename = $favorite_checksum_algo key-value pairs in it.
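
A rough sketch of such a cache file, assuming Python 3's `configparser` and SHA-256 as the "favorite" checksum. The file name `.imagecache`, the section name `files` and the function names are illustrative assumptions, and file names containing `=` or `:` would not round-trip through this format:

```python
import configparser
import hashlib
import os


def checksum(path, algo="sha256"):
    """Hash a file's contents in chunks so large files are not loaded whole."""
    digest = hashlib.new(algo)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_cache(folder, cache_name=".imagecache"):
    """Write 'filename = checksum' pairs to a hidden file in the folder."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep file names case-sensitive
    parser["files"] = {
        name: checksum(os.path.join(folder, name))
        for name in sorted(os.listdir(folder))
        if name != cache_name and os.path.isfile(os.path.join(folder, name))
    }
    with open(os.path.join(folder, cache_name), "w") as fh:
        parser.write(fh)


def read_cache(folder, cache_name=".imagecache"):
    """Return the cached {filename: checksum} mapping (empty if no cache)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str
    parser.read(os.path.join(folder, cache_name))  # missing file is ignored
    return dict(parser["files"]) if parser.has_section("files") else {}
```

Comparing the keys of `read_cache(folder)` with the current listing shows added and removed files, and comparing the values shows files whose content changed.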

Otherwise I'd probably go with FAM (or HAL, I think that's the other
thing that does that).

hth
martin

--
http://noneisyours.marcher.name
http://feeds.feedburner.com/NoneIsYours

Nov 12 '07 #5

davisn90210

On Nov 12, 11:27 am, "Martin Marcher" <mar...@marcher.name> wrote:

2007/11/12, davisn90...@gmail.com <davisn90...@gmail.com>:

Why not use the file creation/modification timestamps?

because you'd have to

a) create a thread that polls all the time for changes or

Given that it would only involve a check of one timestamp (the
directory the files are located in), I don't think polling "from time
to time" would be unreasonable. The modification timestamp of the
directory should be sufficient given the use case. Even if it's not,
tracking modification times for the files in the directory would not
be unreasonable.
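
A small sketch of the single-timestamp check, assuming Python 3. The class name `DirectoryPoller` is hypothetical. On most filesystems a directory's modification time moves when entries are added, removed or renamed, but not when an existing file is rewritten in place, and the timestamp resolution depends on the filesystem:

```python
import os


class DirectoryPoller:
    """Remember a directory's modification time and report when it moves."""

    def __init__(self, folder):
        self.folder = folder
        self.last_mtime = os.stat(folder).st_mtime_ns

    def changed(self):
        """Return True if the directory's mtime differs from the last call."""
        mtime = os.stat(self.folder).st_mtime_ns
        if mtime != self.last_mtime:
            self.last_mtime = mtime
            return True
        return False
```

Calling `changed()` "from time to time" costs one `stat` call, however many files the folder holds.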

b) test everytime for changes

Checking a timestamp should be a very quick operation. Unless
"everytime" occurs *very* frequently, it's certainly not unreasonable.

fam informs in a notification like way.

FAM would work too. However,
1) According to http://oss.sgi.com/projects/fam/faq.html#what_os_fam,
FAM "should be fairly easy to port to ... Unix-like operating
systems ....". If the original poster is a user of a "Unix-like
operating system" he/she may actually be able to use it. Regardless,
it seems to me that you would lose a great deal of portability (i.e.,
is there a Windows port?), which may or may not be important to the
poster.
2) FAM undoubtedly uses some system resources. Probably very little,
but it's still an overhead that must be taken into account.
3) You still need to use another method for maintaining state across
program invocations, do you not?

Timestamps are:
1) Portable. Can you name one OS that does not provide timestamps?
Last I checked, even Windows does :-)
2) Storage efficient. I don't have to actually *store* the
timestamps. I can just check to see if a file/directory was modified
after the last time I checked.
3) Easy to maintain persistent state -- just store the timestamp!
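
One way this could look, as a sketch only: the time of the previous check is kept in a small text file (assumed to live outside the watched folder), and entries modified after it are listed. The function names are invented for illustration. Note that this alone does not reveal deletions, and it trusts the system clock and the filesystem timestamps to agree:

```python
import os
import time


def load_last_check(stamp_path):
    """Return the stored time of the previous check, or 0.0 if none exists."""
    try:
        with open(stamp_path) as fh:
            return float(fh.read().strip())
    except (FileNotFoundError, ValueError):
        return 0.0


def modified_since(folder, since):
    """List entries in folder whose modification time is later than 'since'."""
    return sorted(
        entry.name
        for entry in os.scandir(folder)
        if entry.stat().st_mtime > since
    )


def check_and_update(folder, stamp_path):
    """Report entries modified since the last check, then store a new stamp."""
    started = time.time()  # taken before scanning so later edits are not lost
    changed = modified_since(folder, load_last_check(stamp_path))
    with open(stamp_path, "w") as fh:
        fh.write(repr(started))
    return changed
```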

Personally I'd create a "hidden" cache file parsable by configparser
and have filename = $favorite_checksum_algo - key value pairs in it if
it's not a long running process.

What is your reasoning for this? It seems to me that it is
inefficient and unreliable. First of all you have to compute the
checksum (which undoubtedly would involve reading every byte of the file)
-- not just once, but "everytime" (or however often you perform the
check). Secondly, it is possible for the checksum to be the same even
if the file has changed. Unlikely? Perhaps (depends on checksum
algorithm used). Impossible? No. So, in effect, you are using a
"slow" algorithm that is known to give incorrect results in certain
cases -- all to replace something as basic as timestamps?

Otherwise I'd probably go with fam (or hal i think that's the other
thing that does that)

hth
martin

--
http://noneisyours.marcher.name
http://feeds.feedburner.com/NoneIsYours

Thanks for the critique -- feel free to punch holes.

--Nathan Davis

Nov 14 '07 #6

Martin Marcher

I think that without further information from the OP about the
requirements, all we can do is guess. So both of our solutions are
just theory after all (just my personal opinion).

2007/11/14, da*********@gmail.com <da*********@gmail.com>:

On Nov 12, 11:27 am, "Martin Marcher" <mar...@marcher.name> wrote:
2007/11/12, davisn90...@gmail.com <davisn90...@gmail.com>:

a) create a thread that polls all the time for changes or

Given that it would only involve a check of one timestamp (the
directory the files are located in), I don't think polling "from time
to time" would be unreasonable. The modification timestamp of the
directory should be sufficient given the use case. Even if it's not,
tracking modification times for the files in the directory would not
be unreasonable.

Not for 400 files, but the OP asks about more files too. How about
40,000 or 400,000 files? That could be a problem...

b) test everytime for changes

Checking a timestamp should be a very quick operation. Unless
"everytime" occurs *very* frequently, it's certainly not unreasonable.

See above; I think it also depends on the number of files.

fam informs in a notification like way.

FAM would work too. However,
1) According to http://oss.sgi.com/projects/fam/faq.html#what_os_fam,
FAM "should be fairly easy to port to ... Unix-like operating
systems ....". If the original poster is a user of a "Unix-like
operating system" he/she may actually be able to use it. Regardless,
it seems to me that you would lose a great deal of portability (i.e.,
is there a Windows port?), which may or may not be important to the
poster.

I don't use Windows, so as far as portability goes you are right. It
may be a personal thing, but I stopped providing solutions (or trying
to think about them) for Windows (another discussion, probably best
placed in a forum about social interests or something...).

2) FAM undoubtedly uses some system resources. Probably very little,
but it's still an overhead that must be taken into account.

Both are true, but most Linux distributions use FAM at some point
anyway, so the overhead is actually very small. Also, I think most
OSs have something similar to FAM that could be used...

3) You still need to use another method for maintaining state across
program invocations, do you not?

You need some method no matter whether your program is a long-running
process or just invoked at irregular intervals.

After all, I'm pretty sure that something FAM-like is available on
most OSs. FAM probably isn't available on OS X either, but I guess
they provide some mechanism. If you want it really portable, I'd use
an abstraction layer that tries to communicate with whatever
notification daemon is available on the host OS and, if all that
fails, provides a fallback implementation that does naive tests, all
accessible through the same abstraction interface.
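
A sketch of what such a layer might look like, assuming Python 3. `Watcher`, `PollingWatcher` and `make_watcher` are hypothetical names; a real FAM or win32 backend would be another `Watcher` subclass and is not shown here:

```python
import os
from abc import ABC, abstractmethod


class Watcher(ABC):
    """Common interface; real notification backends would subclass this."""

    @classmethod
    def available(cls):
        """Return True if this backend can run on the current host."""
        return False

    @abstractmethod
    def changes(self):
        """Return (added, removed) sets of names since the previous call."""


class PollingWatcher(Watcher):
    """Naive fallback: compare directory listings between calls."""

    def __init__(self, folder):
        self.folder = folder
        self.known = set(os.listdir(folder))

    @classmethod
    def available(cls):
        return True  # needs nothing beyond os.listdir

    def changes(self):
        current = set(os.listdir(self.folder))
        added, removed = current - self.known, self.known - current
        self.known = current
        return added, removed


def make_watcher(folder, backends=()):
    """Pick the first usable backend, else fall back to naive polling."""
    for backend in backends:
        if backend.available():
            return backend(folder)
    return PollingWatcher(folder)
```

The calling code only ever uses `changes()`, so swapping in a notification-based backend later would not touch it.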

Using timestamps are:
1) Portable. Can you name one OS that does not provide timestamps?
Last I checked, even Windows does :-)
2) Storage efficient. I don't have to actually *store* the
timestamps. I can just check to see if a file/directory was modified
after the last time I checked.

read below, a changed timestamp isn't necessarily a sign that a file
has indeed changed (backups, ....)

3) Easy to maintain persistent state -- just store the timestamp!

Well, "I don't have to actually *store* the timestamps." and "just
store the timestamp!" are a bit confusing. I think you absolutely need
to store the timestamp, since between runs you won't otherwise know
what to check for (new files, deleted files, changed files, if these
cases are important to you).

Personally I'd create a "hidden" cache file parsable by configparser
and have filename = $favorite_checksum_algo - key value pairs in it if
it's not a long running process.

What is your reasoning for this?

because all I need to do to check for changes is getCache(configFile)
and compare the results to getActual(os.listdir) and those 2 methods
would give me the needed info (of course I'm just blindly guessing as
I don't know anything about the further requirements)

Of course, with a lot of files this could be a problem. I wouldn't want
a configparser object with 40,000 (or even just a few thousand)
entries to be alive all the time. You'd probably have to create some
iterator for the file so that you can check through the entries in a
memory-efficient way...

It seems to me that it is
inefficient and unreliable. First of all you have to compute the
checksum (which undoubtedly would involve reading every byte of the file)
-- not just once, but "everytime" (or however often you perform the
check). Secondly, it is possible for the checksum to be the same even
if the file has changed. Unlikely? Perhaps (depends on checksum
algorithm used). Impossible? No. So, in effect, you are using a
"slow" algorithm that is known to give incorrect results in certain
cases -- all to replace something as basic as timestamps?

It seems you are absolutely linking checksum with something like md5 or sha...

Maybe that was badly stated. Depending on the use case, of course, a
timestamp could also be considered a valid checksum. However, to be
safe, a timestamp alone isn't really giving me the information. A lot
of backup tools do update the timestamp (atime on Unix, dunno about
Windows), and that could lead to even more wasted resources.

Consider that you are checking some CSV files by timestamp, and a
change initiates some really intensive number crunching. Now you do
that because you figured "Hey, the timestamp has changed, I need to
redo my calculations...", while in fact just the backup program was
running. But as I said, it depends on the use case what you consider a
valid way to know that a file changed...
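
One possible middle ground, sketched here as an assumption rather than anything proposed in the thread: use the timestamp as a cheap first test and hash the content only when the timestamp has moved. The names `file_digest` and `really_changed` and the layout of the `record` dict are illustrative:

```python
import hashlib
import os


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def really_changed(path, record):
    """Update 'record' in place and return True only if the content changed.

    'record' is a dict holding the 'mtime' and 'digest' seen last time.
    """
    mtime = os.stat(path).st_mtime_ns
    if mtime == record.get("mtime"):
        return False  # cheap path: timestamp untouched, skip hashing
    digest = file_digest(path)
    changed = digest != record.get("digest")
    record.update(mtime=mtime, digest=digest)
    return changed
```

With this, a tool that only touches the timestamp costs one extra hash of that file, not a rerun of the expensive calculation.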

So the checksum algo is something that should be chosen depending on

a) interval of checks (like you say)
b) need to be sure that 2 files don't actually have the same checksum

I guess a simple approach could be something like the Message-ID
header in emails, a bit adapted to local use cases.

Otherwise I'd probably go with fam (or hal i think that's the other
thing that does that)

--
http://noneisyours.marcher.name
http://feeds.feedburner.com/NoneIsYours

Nov 17 '07 #7

Martin Marcher

I just found this for win32, which seems to provide the same thing as FAM:
http://tgolden.sc.sabren.com/python/...r_changes.html
So it's not about FAM as the definitive product to use, but more about
something closer to the OS that is there anyway and will tell you
about changes...

--
http://noneisyours.marcher.name
http://feeds.feedburner.com/NoneIsYours

Nov 17 '07 #8
