Bytes | Developer Community

how can i check if a file is completely uploaded...???

dear all,

i've created an application for a customer where the customer can upload
.csv-files into a specified ftp-directory. on the server, a php-script,
triggered by a cronjob, reads all the data, imports it into a mySQL database
and deletes the .csv-file after the import.

so far, so good.

but in some cases the cronjob starts running when the file is not completely
uploaded, so part of the data is truncated. :-(

is there any method in php to check if a file is completely uploaded via ftp
or not? i didn't find anything about this on php.net and it would be great if
someone could help me with this problem.

thanks a lot in advance,
dino

Jul 17 '05 #1
18 Replies
hmm, maybe you can check the file size using a php function?

I have had the same experience as you... what I do is run the
cronjob manually twice, so I can be sure the ftp upload has
finished first.
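A minimal sketch of the file-size idea, assuming the cron job can afford to wait a few seconds between two measurements. The function name `sizeIsStable()` and the 5-second default are hypothetical. A size that does not change is only a hint: a stalled upload looks exactly the same.

```php
<?php
// Hypothetical helper: samples the file size twice and reports whether it
// stayed the same. A stable size suggests, but does not prove, that the
// FTP upload has finished (a stalled transfer also looks "stable").
function sizeIsStable(string $path, int $waitSeconds = 5): bool
{
    clearstatcache(true, $path);          // PHP caches stat() results
    $before = @filesize($path);
    if ($before === false) {
        return false;                     // file vanished or is unreadable
    }

    sleep($waitSeconds);

    clearstatcache(true, $path);          // force a fresh size lookup
    $after = @filesize($path);

    return $after !== false && $after === $before;
}
```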

maybe we need a more experienced php user to help on this matter.
thanks in advance

Jul 17 '05 #2
On 19 Mar 2005 04:30:36 -0800, "badz" <ba****@gmail.com> wrote:
> hmm, maybe you can check the file size using a php function?
>
> I have had the same experience as you... what I do is run the
> cronjob manually twice, so I can be sure the ftp upload has
> finished first.
>
> maybe we need a more experienced php user to help on this matter.
> thanks in advance


hello,

thanks for your input. the problem is that the file size changes
depending on the number of lines in the .csv file.

i thought about 2 possible solutions, but they are kind of "cheap
workarounds" instead of a clean solution:

1st idea: i ask the customer to add an additional line at the bottom
of the .csv, e.g. "###endoffile###". before the php cronjob starts
working, it checks the .csv for the existence of this
"end-of-document flag".

2nd idea: i ask the customer to upload another small file with a
special filename, e.g. "startjob.txt", _AFTER_ uploading the main
file. if the php cronjob does not find this file, it exits immediately.
(after the import, of course, both will be deleted automatically)
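Both workarounds can be checked in a few lines of PHP. This is a sketch under the assumptions stated above: the sentinel `###endoffile###` and the marker name `startjob.txt` are just the examples from this post, and `hasEndMarker()` / `markerPresent()` are hypothetical helper names. Reading the whole file with `file()` is fine for a few MB of CSV but not for very large uploads.

```php
<?php
// Sketch of both "cheap workarounds". The sentinel text and the marker
// file name are only the examples suggested above, not a standard.
const END_MARKER = '###endoffile###';

// Idea 1: the upload counts as complete only if the last non-empty line
// of the CSV is the agreed end-of-file flag.
function hasEndMarker(string $csvPath): bool
{
    $lines = @file($csvPath, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false || count($lines) === 0) {
        return false;
    }
    return trim(end($lines)) === END_MARKER;   // trim() also drops a stray \r
}

// Idea 2: the customer uploads "startjob.txt" after the data file, so the
// cron job simply exits while the marker is missing.
function markerPresent(string $dir, string $marker = 'startjob.txt'): bool
{
    return is_file(rtrim($dir, '/') . '/' . $marker);
}
```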

although this may work, i don't think it's the perfect solution at
all. maybe someone here has a "native" method to check if a file is
completely uploaded.

best regards,
dino

Jul 17 '05 #3
Dino <no****@yahoo.com> wrote:
> although this may work, i don't think it's the perfect solution at
> all. maybe someone here has a "native" method to check if a file is
> completely uploaded.


You are using flock to get an exclusive lock, aren't you?

Jul 17 '05 #4
> but in some cases the cronjob starts running when the file is not completely uploaded. so a part of the data is truncated. :-(


Please excuse me if I'm absolutely wrong,

as far as I have heard, you must upload a file, then, as the first thing,
copy it to the location where it belongs, and only then use it.
Is it then certain that the file exists?
I never did it, so if I'm wrong, sorry.

Lothar
Jul 17 '05 #5

"Daniel Tryba" <pa**********@invalid.tryba.nl> wrote in message
news:42*********************@news6.xs4all.nl...
> Dino <no****@yahoo.com> wrote:
>> although this may work, i don't think it's the perfect solution at
>> all. maybe someone here has a "native" method to check if a file is
>> completely uploaded.
>
>
> You are using flock to get an exclusive lock, aren't you?


I wonder if flock() works in this kind of situation. The description in the
manual reads: "PHP supports a portable way of locking complete files in an
advisory way (which means all accessing programs have to use the same way of
locking or it will not work)." In this case, the FTP daemon might not use
the same locking mechanism.

If flock() doesn't work, I would suggest opening the file in read/write mode
(fopen flag "a+"). If the FTP program is still writing to the file, the OS
wouldn't open it for writing a second time.
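A quick way to see why `flock()` on its own cannot detect an active FTP upload (a sketch, assuming a Unix-like system; the function name is hypothetical): one handle keeps writing without taking any lock, as an FTP daemon may, and a second handle still gets an exclusive, non-blocking lock, because `flock()` only conflicts with other `flock()` callers.

```php
<?php
// Demonstrates that flock() is advisory: a writer that never calls flock()
// does not stop another handle from acquiring LOCK_EX.
function exclusiveLockIgnoresPlainWriter(string $path): bool
{
    $writer = fopen($path, 'a');          // stands in for the FTP daemon
    fwrite($writer, "partial line...");   // still "uploading", no lock taken

    $reader = fopen($path, 'r+');
    // LOCK_NB: fail immediately instead of waiting if the lock is held.
    $gotLock = flock($reader, LOCK_EX | LOCK_NB);

    if ($gotLock) {
        flock($reader, LOCK_UN);
    }
    fclose($reader);
    fclose($writer);
    return $gotLock;                      // true: the lock proves nothing
}
```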
Jul 17 '05 #6
Chung Leong <ch***********@hotmail.com> wrote:
>> You are using flock to get an exclusive lock, aren't you?
>
> I wonder if flock() works in this kind of situation. The description in the
> manual reads: "PHP supports a portable way of locking complete files in an
> advisory way (which means all accessing programs have to use the same way of
> locking or it will not work)." In this case, the FTP daemon might not use
> the same locking mechanism.


You are correct, if either php or the ftpd is completely broken by
using its own locking mechanism instead of the one provided by the OS.
> If flock() doesn't work, I would suggest opening the file in read/write mode
> (fopen flag "a+"). If the FTP program is still writing to the file, the OS
> wouldn't open it for writing a second time.


Doesn't work this way on my OS:
$ echo -n > out ; php4 ./in.php & sleep 5 ; php4 ./in.php

results in out containing:
0 1 2 3 4 0 5 1 6 2 7 3 8 4 9 5 6 7 8 9

if in.php is:

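<?php
// Appends "0 " .. "9 " to 'out', one number per second, so two copies
// started 5 seconds apart show whether 'a+' blocks a second writer.
if ($fp = fopen('out', 'a+'))
{
    for ($i = 0; $i < 10; $i++)
    {
        fputs($fp, "$i ");
        sleep(1);
    }
    fclose($fp);
}
else
{
    echo "fopen failed\n";
}
?>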
Jul 17 '05 #8
"Daniel Tryba" <pa**********@invalid.tryba.nl> wrote in message
news:42*********************@news6.xs4all.nl...
> You are correct, if either php or the ftpd is completely broken by
> using its own locking mechanism instead of the one provided by the OS.


Or if it's locking the file at all.
>> If flock() doesn't work, I would suggest opening the file in read/write mode (fopen flag "a+"). If the FTP program is still writing to the file, the OS wouldn't open it for writing a second time.
>
>
> Doesn't work this way on my OS:
> $ echo -n > out ; php4 ./in.php & sleep 5 ; php4 ./in.php


Hmmm... Doesn't work on mine either :-) It appears that PHP on Windows sets
the share mode to share-read and share-write.

Maybe an attempt to rename the file is a more reliable test?
Jul 17 '05 #9
Chung Leong <ch***********@hotmail.com> wrote:

> Maybe an attempt to rename the file is a more reliable test?


That will also not work on a unix type filesystem, since only the name of
the inode will change.
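A sketch of what Daniel describes, assuming a Unix-like system (on Windows the `rename()` may simply fail while the file is open). The hypothetical `renameWhileOpen()` keeps a write handle open across the rename; the second write still lands in the file, now under its new name, so a successful rename says nothing about whether the writer has finished.

```php
<?php
// On Unix, rename() only edits the directory entry; a process that already
// has the file open keeps writing to the same inode under the new name.
function renameWhileOpen(string $oldPath, string $newPath): string
{
    $writer = fopen($oldPath, 'w');        // plays the role of the FTP daemon
    fwrite($writer, "first half,");

    $renamed = rename($oldPath, $newPath); // succeeds although $writer is open
    fwrite($writer, "second half");        // lands in the renamed file
    fclose($writer);

    return $renamed ? file_get_contents($newPath) : '';
}
```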

Jul 17 '05 #10
On Sun, 20 Mar 2005 12:34:54 +0100, Daniel Tryba wrote
(in message <42*********************@news6.xs4all.nl>):
> Chung Leong <ch***********@hotmail.com> wrote:
>
>> Maybe an attempt to rename the file is a more reliable test?
>
>
> That will also not work on a unix type filesystem, since only the name of
> the inode will change.


i think i found a solution for the problem (at least for the problem
described in my original posting).

i simply take the filemtime() of the file detected by the cronjob and
check the difference between the current time and that filemtime(). if
the difference is smaller than 30 seconds (just as an example), i assume
that the upload is still in progress; otherwise i start the processing job.
something like this:
<?php
// $import_directory and $file come from the surrounding cron script
$timedifference = time() - filemtime($import_directory . $file);

if ($timedifference > 30)
{
    // let's rock
    // ...
}
else
{
    // exit, file upload could be still in progress...
}
?>
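The same check wrapped in a reusable function, as a sketch only: `uploadLooksIdle()` and its parameters are hypothetical, the optional `$now` argument just makes it easy to test, and the syntax assumes PHP 7.1 or later. `clearstatcache()` matters when a long-running script checks the same file repeatedly, because PHP caches `stat()` results.

```php
<?php
// Hypothetical helper generalising the check above: a file counts as
// "probably complete" once it has not been modified for $quietSeconds.
// This is a heuristic; a stalled upload that resumes later will fool it.
function uploadLooksIdle(string $path, int $quietSeconds = 30, ?int $now = null): bool
{
    clearstatcache(true, $path);
    $mtime = @filemtime($path);
    if ($mtime === false) {
        return false;                  // file disappeared or cannot be stat'ed
    }
    $now = $now ?? time();
    return ($now - $mtime) > $quietSeconds;
}
```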
best regards,
dino

Jul 17 '05 #11

"Daniel Tryba" <pa**********@invalid.tryba.nl> wrote in message
news:42*********************@news6.xs4all.nl...
> Chung Leong <ch***********@hotmail.com> wrote:
>
>> Maybe an attempt to rename the file is a more reliable test?
>
>
> That will also not work on a unix type filesystem, since only the name of
> the inode will change.


My understanding of Unix is rather limited. So you are saying the rename
operation would succeed even when there's an open handle to the file? How
about delete? Surely you can't delete the file while it's still open, right?
If so, we can create a hard link to the file then try deleting it under the
original name and see what happens.
Jul 17 '05 #12
In article <Se********************@comcast.com>,
"Chung Leong" <ch***********@hotmail.com> wrote:
> "Daniel Tryba" <pa**********@invalid.tryba.nl> wrote in message
> news:42*********************@news6.xs4all.nl...
>> Chung Leong <ch***********@hotmail.com> wrote:
>>
>>> Maybe an attempt to rename the file is a more reliable test?
>>
>>
>> That will also not work on a unix type filesystem, since only the name of
>> the inode will change.
>
>
> My understanding of Unix is rather limited. So you are saying the rename
> operation would succeed even when there's an open handle to the file? How
> about delete? Surely you can't delete the file while it's still open, right?
> If so, we can create a hard link to the file then try deleting it under the
> original name and see what happens.


The actual data structure that describes the file is called an inode on
Unix filesystems. It has information on the file's size, blocksize,
access times, permissions, ownership and link count (more on that
later).

When you creat() a file, the number of an inode is entered into a
directory file along with the file's name and the various fields in the
inode are filled in. You can rename a file that's open if you have
write permission on the directory where it's located. This is because
you're changing the filename field in a directory file and not touching
the file's inode or data itself. The contents of the file remain
unchanged, as the process that's accessing the file does so by inode.

When a process opens a file, it ultimately uses the inode as a
reference. There's no mandatory file locking mechanism built into a
Unix filesystem. All that locking stuff came later, IIRC. AFAIK, the
file locking API is advisory and it's up to the program to check that a
file is locked using the locking calls available from the OS (flock, et
al) or come up with its own methodology.

It's completely possible to delete a file that's open for writing since
you're just removing an inode/filename entry in a directory file. The
space allocated by the file is not returned to the free space pool until
the file is closed by the program that's opened it. It's a frequent
newbie sysadmin mistake to delete a system log file that's filling a
filesystem without determining and killing the process that's got the
file open. Unless you already know what process is writing
/var/adm/log/system.log, your only option at that point is to reboot
to recover the space.
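The behaviour described above can be reproduced from PHP, assuming a Unix-like system (Windows normally refuses to delete an open file). The hypothetical `readAfterUnlink()` deletes the file's only name while a handle is still open and then reads the data back through that handle.

```php
<?php
// Unlinking a file that is still open removes only its name; the data stays
// reachable through the open handle until the last handle is closed.
function readAfterUnlink(string $path): string
{
    $fh = fopen($path, 'w+');
    fwrite($fh, "still here");
    unlink($path);                     // the directory entry is gone...
    rewind($fh);
    $data = stream_get_contents($fh);  // ...but the inode is still readable
    fclose($fh);                       // now the kernel may free the blocks
    return $data;
}
```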

An additional feature that Unix filesystems have is the ability to have
multiple directory entries all pointing to the same inode. These are
called 'hard links'. When you create a file, an inode is allocated and
the information specific to the file is inserted into the inode. A
corresponding entry is made to a directory file for that inode. You can
create a link in another directory or the same directory with a
different name to the same inode. The "link count" is incremented
saying there are 2 directory entries for the file. When a file is
deleted, the inode's link count is decremented. If it's 0, the inode is
returned to the free pool to be reused. Note that each Unix filesystem
has its own inode table. Hard links cannot span filesystems.
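A small sketch of the link count in action, assuming a Unix-like system and that both names live on the same filesystem. `stat()` exposes the count as `nlink`; the function name `linkCounts()` is hypothetical.

```php
<?php
// Shows the inode link count ("nlink") rising and falling as names are added
// and removed. Both paths must be on the same filesystem.
function linkCounts(string $original, string $extraName): array
{
    file_put_contents($original, "csv data");
    $counts = [stat($original)['nlink']];    // one name: 1

    link($original, $extraName);             // second directory entry
    clearstatcache();
    $counts[] = stat($original)['nlink'];    // two names: 2

    unlink($original);                       // drop the first name
    clearstatcache();
    $counts[] = stat($extraName)['nlink'];   // back to 1, data intact
    return $counts;
}
```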

A hard link is not the same thing as a "shortcut" or "file alias" in
other OSes. "Soft links" were created to allow spanning filesystems.
They are simply entries in directories that point to a real file
anywhere in the Unix directory tree. ln -s does not even require the
target to exist when the link is created. If you delete the
underlying real file, the soft link is still there and now "broken".
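The broken-link case can be shown the same way, assuming a Unix-like system where the PHP process may create symlinks. `is_link()` inspects the link itself, while `file_exists()` follows it to the missing target; `symlinkBreaks()` is a hypothetical name.

```php
<?php
// A symbolic link stores a path, not an inode; deleting the target leaves
// the link behind, pointing at nothing ("broken").
function symlinkBreaks(string $target, string $linkPath): bool
{
    file_put_contents($target, "data");
    symlink($target, $linkPath);
    unlink($target);
    clearstatcache();
    // is_link() checks the link itself; file_exists() follows it.
    return is_link($linkPath) && !file_exists($linkPath);
}
```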

Now, to your problem. Unless you have some way of knowing the length of
the file being transferred or that the transfer is completed, I don't
see a way to do this sort of thing without writing your own protocol and
doing it yourself. There's curl, which can be built into php. I like
the idea of sending a file titled "999lastfile.txt" and checking for
that before the cron job proceeds.

--
DeeDee, don't press that button! DeeDee! NO! Dee...

Jul 17 '05 #13
Chung Leong wrote:
>> That will also not work on a unix type filesystem, since only the name of
>> the inode will change.
>
>
> My understanding of Unix is rather limited. So you are saying the rename
> operation would succeed even when there's an open handle to the file? How
> about delete? Surely you can't delete the file while it's still open,
> right? If so, we can create a hard link to the file then try deleting it
> under the original name and see what happens.


You cannot delete a file in Unix at all. There is no Unix function delete().

You can decrease the number of names a file has, using the unlink() system
call. The kernel will "delete" the file (free the inode and all data
blocks), when a file has no names and is no longer being used (all
processes have closed the file).

You can very well unlink() an open file.

In Unix, all access checking is done when you open() a file. If the open
succeeds, the name is being translated into an open inode, and a file
handle pointing to that inode and some additional stuff is being given to
the process calling open(). After that, the name of a file is irrelevant,
and can be unlink()ed, rename()d or otherwise manipulated. Also, the access
permission of the file can be changed - this again will not affect any
process holding an open handle on that file.

Similarly, you cannot check if you have permission to open a file. There is
a function called access(), but it is dangerous and should not be used. If
you are using access(), you knew if you had access rights to a file at the
time you called access(), but that may have changed the moment your process
returns from the access() system call. So the result of any access() system
call is always invalid and inaccurate, which is why the call should not be
used.

Instead, you should try to open() the file and see if you succeed. If you
do, you have permission to open the file, and using the handle created by
open, neither permission nor file can be taken away from you until you
decide to close() the handle.
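In PHP terms the advice amounts to skipping a separate `is_readable()` check (which has the same check-then-use race) and just attempting the `fopen()`. A minimal sketch; `openForImport()` is a hypothetical name, and the caller is expected to `fclose()` the handle.

```php
<?php
// Instead of asking "may I read this?" first, just try to open the file
// and handle failure; there is no window between check and use.
function openForImport(string $path)
{
    $fh = @fopen($path, 'r');   // @ silences the warning; we check the result
    if ($fh === false) {
        return null;            // missing, unreadable, or removed meanwhile
    }
    return $fh;                 // this handle stays valid until fclose()
}
```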

Kristian

Jul 17 '05 #14
Dino wrote:
> is there any method in php to check if a file is completely uploaded via
> ftp or not? i didn't find anything about this on php.net and it would be
> great if someone could help me with this problem.


There is no such method in PHP, nor in Unix.
There are several solutions to your problem, though.

One is to upload two files, the actual data file and a marker file. Upload
the data file "D.timestamp" first, and then upload the marker file
"M.timestamp" afterwards. The marker file may be empty. If the marker file
is present, you can process the data file.
A variation on this is uploading to a staging directory, then moving the file
into a processing directory using ftp commands. Your cron job only reads
files from the processing directory.

A rename() system call in Unix is atomic, if performed on the same file
system, so this is uninterruptible even in a concurrent environment. That
means, you can use this even to have multiple processing cronjobs. Here,
each cronjob would fetch files from the processing directory and move them
into a private working directory for this particular cronjob (named
"work.<pid>", where <pid> is the process id of the cronjob handling the
file). If the move into the per-cronjob working directory succeeds, this
particular cronjob owns the file and can handle it independently from any
other jobs running at the same time.
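A sketch of the claim-by-rename step, assuming the processing directory and the `work.<pid>` directories sit on the same filesystem (otherwise PHP has to copy the file, which is not atomic) and that only `*.csv` files are of interest. `claimFiles()` and its parameters are hypothetical. Whichever job's `rename()` succeeds owns the file; a losing job's `rename()` fails because the source name is already gone.

```php
<?php
// Sketch of the "claim by rename" pattern: each cron job moves files from
// the processing directory into its own work.<pid> directory. rename() within
// one filesystem is atomic, so only one job can win a given file.
function claimFiles(string $processingDir, string $baseDir): array
{
    $workDir = rtrim($baseDir, '/') . '/work.' . getmypid();
    if (!is_dir($workDir) && !mkdir($workDir, 0700, true)) {
        return [];
    }

    $claimed = [];
    foreach (glob(rtrim($processingDir, '/') . '/*.csv') ?: [] as $path) {
        $target = $workDir . '/' . basename($path);
        // @: if another job renamed it first, rename() fails with a warning.
        if (@rename($path, $target)) {
            $claimed[] = $target;      // this job now owns the file
        }
    }
    return $claimed;
}
```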
There are even ftp servers such as pureftpd, which do the rename thing
automatically for you. Upon upload, they create an invisible uniquely named
dotfile, and only rename the file to the target name when the upload is
complete and terminates successfully. pureftp also has other interesting
features, so I suggest you look into it - it will make your code work
without any changes to your application.

Kristian

Jul 17 '05 #15
On Mon, 21 Mar 2005 11:45:46 +0100, Kristian Köhntopp wrote
(in message <d1**********@xn--abcdefghijklmnopqrstuvwxyzss-vnc45c5f.de>):
> Dino wrote:
>> is there any method in php to check if a file is completely uploaded via
>> ftp or not? i didn't find anything about this on php.net and it would be
>> great if someone could help me with this problem.
>
>
> There is no such method in PHP, nor in Unix.
> There are several solutions to your problem, though.


hello kristian,

thanks a lot for your input and your suggestions. meanwhile i've implemented the
solution i described yesterday here in this thread using the
filemtime() function, and it works perfectly with only the original file, without
any marker file:

<?php
$timedifference = time() - filemtime($import_directory . $file);

if ($timedifference > 30)
{
    // let's rock
    // ...
}
else
{
    // exit, file upload could be still in progress...
}
?>

thanks again and best regards,
dino

Jul 17 '05 #16
Dino wrote:
> if($timedifference > 30)
> {


Will fail, if your ftp timeout is larger than 30 seconds and you have a slow
uploader somewhere.

Kristian

Jul 17 '05 #17
On Mon, 21 Mar 2005 15:24:53 +0100, Kristian Köhntopp wrote
(in message <d1**********@xn--abcdefghijklmnopqrstuvwxyzss-vnc45c5f.de>):
> Dino wrote:
>> if($timedifference > 30)
>> {
>
>
> Will fail, if your ftp timeout is larger than 30 seconds and you have a slow
> uploader somewhere.


hi, this is correct, but the timeout can also be set to 60 secs or 3 minutes
or whatever. it is just to make sure that a 5MB .csv is uploaded completely
before the import job starts.

i don't think that it depends on the upload speed. the timestamp returned
by filemtime() changes with every packet which is uploaded to the
server, so this value seems to change continuously during the upload process
(of course i tested this only on redhat linux and mac os x, i don't know if
the behaviour is the same on other operating systems).

so i think it doesn't matter if the uploader uses a T1 connection or an old
analogue modem. it doesn't seem to be important whether the whole upload is done
within 3 minutes or 30 minutes.

but i'm happy for every hint about the weaknesses of this method :-)

greetinX and thanks,
dino
Jul 17 '05 #18
"Michael Vilain" <vi****@spamcop.net> wrote in message
news:vi**************************@comcast.dca.giga news.com...
> Now, to your problem. Unless you have some way of knowing the length of
> the file being transferred or that the transfer is completed, I don't
> see a way to do this sort of thing without writing your own protocol and
> doing it yourself. There's curl, which can be built into php. I like
> the idea of sending a file titled "999lastfile.txt" and checking for
> that before the cron job proceeds.


Here's another idea: use a separate partition to store the upload files and
try to unmount it before running the script. IIRC you cannot unmount a
volume when there are files still open.
Jul 17 '05 #19
