Determining when a file has finished copying

Hi all,

I'm writing some code that monitors a directory for the appearance of
files from a workflow. When those files appear I write a command file
to a device that tells the device how to process the file. The
appearance of the command file triggers the device to grab the
original file. My problem is I don't want to write the command file to
the device until the original file from the workflow has been copied
completely. Since these files are large, my program has a good chance
of scanning the directory while they are mid-copy, so I need to
determine which files are finished being copied and which are still
mid-copy.

I haven't seen anything on Google talking about this, and I don't see
an obvious way of doing this using the os.stat() method on the
filepath. Anyone have any ideas about how I might accomplish this?

Thanks in advance!
Doug
Jul 9 '08
On Jul 9, 5:34 pm, keith <ke...@keithperkins.net> wrote:
> Ethan Furman wrote:
>> writeson wrote:
>>> Guys,
>>> Thanks for your replies, they are helpful. I should have included in
>>> my initial question that I don't have as much control over the program
>>> that writes (pgm-W) as I'd like. Otherwise, the write to a different
>>> filename and then rename solution would work great. Is there no way to
>>> tell from the os.stat() results when the file is finished
>>> being copied? I ran some test programs, one of which continuously
>>> copies big files from one directory to another, and another that
>>> continuously does a glob.glob("*.pdf") on those files and looks at the
>>> st_atime and st_mtime parts of the return value of os.stat(filename).
>>> From that experiment it looks like st_atime and st_mtime equal each
>>> other until the file has finished being copied. Nothing in the
>>> documentation about st_atime or st_mtime leads me to think this is
>>> true, it's just my observations about the two test programs I've
>>> described.
>>> Any thoughts? Thanks!
>>> Doug
>> The solution my team has used is to monitor the file size. If the file
>> has stopped growing for x amount of time (we use 45 seconds) the file is
>> done copying. Not elegant, but it works.
>> --
>> Ethan
>
> Also I think that matching the md5sums may work. Just set up so that it
> checks the copy's md5sum every couple of seconds (or whatever time
> interval you want) and matches against the original's. When they match
> copying's done. I haven't actually tried this but think it may work.
> Any more experienced programmers out there let me know if this is
> unworkable please.
> K

I use a combination of both os.stat() on the file size and md5.
Checking md5s works, but it can take a long time on big files. To fix
that, I wrote a simple sparse md5 sum generator. It takes a small
number of bytes from various areas of the file, and creates an md5 by
combining all the sections. This is, in fact, the only solution I have
come up with for watching a folder for Windows copies.

The filesize solution doesn't work when a user copies into the watch
folder using drag and drop on Windows because it allocates all the
attributes of the file before any data is written. The filesize will
always show the full size of the file.
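
For illustration, the size-plus-sparse-md5 idea could be sketched roughly as follows. This is not the actual code used by anyone in this thread: the names `sparse_md5` and `wait_until_stable`, the number of samples, the chunk size, and the polling interval are all assumptions picked for the example (the 45 second quiet period is the figure Ethan mentioned). It also assumes the watcher is allowed to open the file for reading while it is being copied, which may not hold on every platform.

```python
import hashlib
import os
import time


def sparse_md5(path, samples=8, chunk_size=4096):
    """Hash the file size plus a few chunks spread evenly over the file.

    Hypothetical helper: the sample count and chunk size are arbitrary.
    """
    size = os.path.getsize(path)
    digest = hashlib.md5(str(size).encode("ascii"))
    with open(path, "rb") as f:
        for i in range(samples):
            # Evenly spaced offsets from the start to the last full chunk.
            offset = (max(size - chunk_size, 0) * i) // max(samples - 1, 1)
            f.seek(offset)
            digest.update(f.read(chunk_size))
    return digest.hexdigest()


def wait_until_stable(path, quiet_seconds=45.0, poll_seconds=5.0):
    """Block until size and sparse digest are unchanged for quiet_seconds."""
    last = None
    stable_since = time.monotonic()
    while True:
        current = (os.path.getsize(path), sparse_md5(path))
        now = time.monotonic()
        if current != last:
            # Something changed (or this is the first look): restart the clock.
            last = current
            stable_since = now
        elif now - stable_since >= quiet_seconds:
            return current
        time.sleep(poll_seconds)
```

A sparse digest only notices changes in the regions it samples, so it is a heuristic and not proof that the copy is complete; a copy that stalls for longer than the quiet period would also be reported as finished.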

~Sean
Jul 11 '08 #11
While a lot depends on HOW the copying program does its copy, I've recently been
able to get pyinotify to watch folders. By watching for IN_CLOSE_WRITE events I
can see when files are closed by the writer and then process them instantly
after they have been written. Now if the writer does something like:

open
write
close
open append
write
close
..
..
..

This won't work as well.

FYI,
Larry
Jul 13 '08 #12
Good info, Sean, thanks. One more option may be to attempt to rename
the file -- if it's still open for copying, that will fail; success
indicates the copy is done. Of course, as Larry Bates pointed out, this
could fail if the copy is followed by a re-open and appending.
Hopefully that's not an issue for the OP.
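
A minimal sketch of the rename probe, under a big assumption: the platform refuses to rename a file that another process still has open. That is typical on Windows but generally not true on POSIX systems, where the rename usually succeeds even mid-copy. The function name `claim_if_closed` and the `.ready` suffix are made up for the example.

```python
import os


def claim_if_closed(path, suffix=".ready"):
    """Try to rename path; return the new name, or None if the rename failed.

    Assumption: the OS rejects the rename while the copier still holds the
    file open (typical on Windows). On POSIX this check proves nothing.
    """
    claimed = path + suffix
    try:
        os.rename(path, claimed)
    except OSError:
        # Still open elsewhere, already gone, or otherwise not renameable.
        return None
    return claimed
```

If the device expects the original filename, the monitor would either have to rename the file back before writing the command file or reference the new name in it.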
--
Ethan
Jul 14 '08 #13
You could also copy to a different name on the same disk, and when the copying
has been finished just 'move' (mv) the file to the filename the other
application expects. E.g. QMail works this way, writing incoming mails in
folders.
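
On the writer side, the copy-then-move pattern could look roughly like this. It only helps when the copying program can be changed, which Doug says is not fully the case here. The function name `copy_then_publish` and the `.part` suffix are assumptions for the example; the watcher would need to ignore files with that suffix.

```python
import os
import shutil


def copy_then_publish(src, dest_dir, tmp_suffix=".part"):
    """Copy src into dest_dir under a temporary name, then rename it."""
    final_path = os.path.join(dest_dir, os.path.basename(src))
    tmp_path = final_path + tmp_suffix
    # The slow part happens under a name the watcher does not look for.
    shutil.copyfile(src, tmp_path)
    # Both names are in the same directory, so on POSIX the rename is atomic
    # and the final name only ever refers to a complete file.
    os.replace(tmp_path, final_path)
    return final_path
```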

Kind regards,
Wilbert Berendsen

--
http://www.wilbertberendsen.nl/
"You must be the change you wish to see in the world."
-- Mahatma Gandhi
Jul 19 '08 #14
