FileSystemWatcher question

=?Utf-8?B?Tmljaw==?=

Hello,

I've got a FileSystemWatcher object setup to check for new files in a
folder. I've never done much with threading, but I think I mght need to here.
When I find a new file there are times when I need to do some time consuming
tasks. When this happens, as it is now, the FileSystemWatcher doesn't seem to
see all of the files if I move a large batch of files in the watch folder at
once. Could someone give me a push in the right direction? Thanks for any
help, I really appreciate it.

Thanks,
Nick

Nov 19 '08 #1

Subscribe Post Reply

2546

Peter Morris

public class FileProcessor
{
private object SyncRoot = new object();
private Queue<stringQueue = new Queue<string>;
private bool Terminated = false;

public FileProcessor()
{
var threadStart = new ThreadStart(Execute);
new Thread(threadStart).Start();
}

public void QueueFile(string fileName)
{
lock (SyncRoot)
{
Queue.Enqueue(fileName);
Monitor.Pulse(SyncRoot);
}
}

private void Execute()
{
while (!Terminated)
{
string fileName = null;
lock(SyncRoot)
{
if (Queue.Count 0 && !Terminated)
fileName = Queue.Dequeue();
}

if (fileName != null)
ProcessFile(fileName);
else
Monitor.WaitFor(SyncRoot);
}
}

public void Terminate()
{
Terminated = true;
lock (SyncRoot)
{
Monitor.Pulse(SyncRoot);
}
}

}
Something like that anyway. Probably wont even compile, but you never know
your luck :-)

--
Pete
====
http://mrpmorris.blogspot.com
http://www.capableobjects.com

Nov 19 '08 #2

Peter Duniho

On Wed, 19 Nov 2008 15:04:35 -0800, Peter Morris
<mr*********@spamgmail.comwrote:

Something like that anyway. Probably wont even compile, but you never
know your luck :-)

Nope, it won't. And even if it did, it would throw an exception. But you
were _really_ close. :)

Here's how I might have done it, were I to implement a class like that
(note that other than fixing the compile and run-time errors, I also
changed the termination condition so as to simplify the class a bit, and
added a possible implementation for the "ProcessFile()" method) (also,
more comments at the end):

public class FileProcessor
{
private object SyncRoot = new object();
private Queue<stringqueue = new Queue<string>;

public FileProcessor()
{
new Thread(Execute).Start();
}

public void QueueFile(string fileName)
{
if (fileName == null)
{
throw new ArgumentNullException("fileName");
}

lock (SyncRoot)
{
queue.Enqueue(fileName);
Monitor.Pulse(SyncRoot);
}
}

private void Execute()
{
lock (SyncRoot)
{
while (true)
{
if (queue.Count 0)
{
string fileName = queue.Dequeue();

if (fileName == null)
{
break;
}

ProcessFile(fileName);
}
else
{
Monitor.Wait(SyncRoot);
}
}

}
}

public void Terminate()
{
lock (SyncRoot)
{
queue.Enqueue(null);
Monitor.Pulse(SyncRoot);
}
}

// Subscribe to the FileFound event to be able to process filenames
// registered with this class
public event Action<stringFileFound;

private void ProcessFile(string fileName)
{
Action<stringhandler = FileFound;

if (handler != null)
{
handler(fileName);
}
}
}

Of course, I make no guarantees that this will compile either. :) Just
because I fixed some of the previous mistakes, that doesn't mean I didn't
introduce my own.

To the OP: I think the point of the above is that Pete M. is suggesting
that you don't do lengthy processing in your event handler(s) for the
FileSystemWatcher event(s), but rather just queue filenames to a separate
consumer class and let it handle things as it can. The above would allow
that.

Keep in mind, however, that while doing this should improve things, by
keeping the FileSystemWatcher class ready and available to monitor other
changes, it simply cannot be relied upon to give 100% notification for
every change that you are trying to look for. Part of your processing
when the FileSystemWatcher class notifies you of a change should be to
look for other changes in the same directory, in case changes came too
quickly for the FileSystemWatcher class to notice them all (it relies on a
fixed-sized buffer than if filled before additions can be processed will
cause some notifications to be lost).

So even with a producer/consumer setup as Pete M. suggests, you are not
guaranteed 100% reliable behavior. But you should be able to get a lot
closer than if you are doing lengthy processing in your FileSystemWatcher
event handlers.

Pete

Nov 19 '08 #3

=?Utf-8?B?Tmljaw==?=

Many thanks to both of you for your help. I'll give this a try.

"Peter Duniho" wrote:

On Wed, 19 Nov 2008 15:04:35 -0800, Peter Morris
<mr*********@spamgmail.comwrote:

Something like that anyway. Probably wont even compile, but you never
know your luck :-)

Nope, it won't. And even if it did, it would throw an exception. But you
were _really_ close. :)

Here's how I might have done it, were I to implement a class like that
(note that other than fixing the compile and run-time errors, I also
changed the termination condition so as to simplify the class a bit, and
added a possible implementation for the "ProcessFile()" method) (also,
more comments at the end):

public class FileProcessor
{
private object SyncRoot = new object();
private Queue<stringqueue = new Queue<string>;

public FileProcessor()
{
new Thread(Execute).Start();
}

public void QueueFile(string fileName)
{
if (fileName == null)
{
throw new ArgumentNullException("fileName");
}

lock (SyncRoot)
{
queue.Enqueue(fileName);
Monitor.Pulse(SyncRoot);
}
}

private void Execute()
{
lock (SyncRoot)
{
while (true)
{
if (queue.Count 0)
{
string fileName = queue.Dequeue();

if (fileName == null)
{
break;
}

ProcessFile(fileName);
}
else
{
Monitor.Wait(SyncRoot);
}
}

}
}

public void Terminate()
{
lock (SyncRoot)
{
queue.Enqueue(null);
Monitor.Pulse(SyncRoot);
}
}

// Subscribe to the FileFound event to be able to process filenames
// registered with this class
public event Action<stringFileFound;

private void ProcessFile(string fileName)
{
Action<stringhandler = FileFound;

if (handler != null)
{
handler(fileName);
}
}
}

Of course, I make no guarantees that this will compile either. :) Just
because I fixed some of the previous mistakes, that doesn't mean I didn't
introduce my own.

To the OP: I think the point of the above is that Pete M. is suggesting
that you don't do lengthy processing in your event handler(s) for the
FileSystemWatcher event(s), but rather just queue filenames to a separate
consumer class and let it handle things as it can. The above would allow
that.

Keep in mind, however, that while doing this should improve things, by
keeping the FileSystemWatcher class ready and available to monitor other
changes, it simply cannot be relied upon to give 100% notification for
every change that you are trying to look for. Part of your processing
when the FileSystemWatcher class notifies you of a change should be to
look for other changes in the same directory, in case changes came too
quickly for the FileSystemWatcher class to notice them all (it relies on a
fixed-sized buffer than if filled before additions can be processed will
cause some notifications to be lost).

So even with a producer/consumer setup as Pete M. suggests, you are not
guaranteed 100% reliable behavior. But you should be able to get a lot
closer than if you are doing lengthy processing in your FileSystemWatcher
event handlers.

Pete

Nov 20 '08 #4

Peter Morris

Hi Pete

Any chance you could explain a few things?

private object SyncRoot = new object();
private Queue<stringqueue = new Queue<string>;

public FileProcessor()
{
new Thread(Execute).Start();
}

Yeah, I was tempted to do the same thread start but couldn't remember if I
needed ThreadStart or not, and I wasn't willing to start up VS :_)

public void QueueFile(string fileName)
{
if (fileName == null)
{
throw new ArgumentNullException("fileName");
}

hehe. You work too hard :-) For newsgroups I just can't be bothered with
stuff like this, it's bad enough that I have to write it in my real code let
alone newsgroups. I suppose I should try to be a perfectionist in NG source
as well as my own, but at the end of the day I think the person reading has
the responsibility to learn some good techniques of their own rather than
responders being responsible for teaching them.

private void Execute()
{
lock (SyncRoot)
{
while (true)

This is where I am starting to disagree with you :-)

{
if (queue.Count 0)
{
string fileName = queue.Dequeue();

if (fileName == null)
{
break;

That's it, we are now officially in disagreement :-) Why would you go for
such a vague way of terminating the thread rather than my crystal clear
"Terminated" member?

Other than the change to how I terminate the thread + the check for null I
don't see how yours is functionally different from mine, what were your
thoughts at the time of reading my source?

// Subscribe to the FileFound event to be able to process filenames
// registered with this class
public event Action<stringFileFound;

private void ProcessFile(string fileName)
{
Action<stringhandler = FileFound;

if (handler != null)
{
handler(fileName);
}
}
}

I think I see now! You are showing him how to handle ProcessFile! There
was just no way I was going to bother :-)

To the OP: I think the point of the above is that Pete M. is suggesting
that you don't do lengthy processing in your event handler(s) for the
FileSystemWatcher event(s), but rather just queue filenames to a separate
consumer class and let it handle things as it can. The above would allow
that.

Exactly. In my service that does something similar I have a class that
decides what to do with the file based on its location + extension. It goes
into one of three of these queues based on how long the operation takes

Fast - Typically moves the file elsewhere
Intermediate - Typically unzips a small file
Long - Generates a MySQL db and zips it

So that the long process doesn't hold up the stuff that should occur
immediately despite the fact that there is a long term operation in
progress.

FileSystemWatcher class to notice them all (it relies on a fixed-sized
buffer than if filled before additions can be processed will cause some
notifications to be lost).

Any idea what size this buffer is? For me there will be a maximum of 10
people connecting at once so I doubt I will need to be concerned, but it's
always interesting to know.

--
Pete
====
http://mrpmorris.blogspot.com
http://www.capableobjects.com

Nov 20 '08 #5

Peter Duniho

On Thu, 20 Nov 2008 09:37:53 -0800, Peter Morris
<mr*********@spamgmail.comwrote:

Hi Pete

Any chance you could explain a few things?

> private object SyncRoot = new object();
private Queue<stringqueue = new Queue<string>;

public FileProcessor()
{
new Thread(Execute).Start();
}

Yeah, I was tempted to do the same thread start but couldn't remember if
I needed ThreadStart or not, and I wasn't willing to start up VS :_)

Since C# 2.0, the compiler will correctly infer the necessary delegate
type and create it for you.

Your approach isn't wrong, I just prefer the more concise syntax.

>
> public void QueueFile(string fileName)
{
if (fileName == null)
{
throw new ArgumentNullException("fileName");
}

hehe. You work too hard :-) For newsgroups I just can't be bothered
with stuff like this,

Understandable. To each his own. Note, however, to some extent I put
that in to make it very clear that, with my change to the termination
condition, the _only_ way a null should be able to get into the queue is
via the Terminate() method. The same thing would apply to your own code
as well, though the consequences of a mistake would be less drastic in
your version (the processing thread could pause prematurely, whereas in my
mine it would just exit altogether).

[...]
> {
if (queue.Count 0)
{
string fileName = queue.Dequeue();

if (fileName == null)
{
break;

That's it, we are now officially in disagreement :-) Why would you go
for such a vague way of terminating the thread rather than my crystal
clear "Terminated" member?

Why do you say it is vague? There's no ambiguity at all here.

Other than the change to how I terminate the thread + the check for null
I don't see how yours is functionally different from mine, what were
your thoughts at the time of reading my source?

I've removed a variable for one. With the removal of that variable, not
only have I saved as many as 4 bytes from the size of my class, I have
avoided the risk of making the mistake you did: failing to either put the
use of that variable inside a synchronized block of code, or making the
variable "volatile". :)

Another maintainance advantage is that by putting the termination
condition in the queue itself, I can then keep the "wait or dequeue" code
completely separate from the "dequeue then process" code. In your
version, those two functions of the block of code are intertwined. In
mine, they are clearly delineated, separate from each other.

Assuming one fixes your bug, there is no _functional_ difference between
the two. Both techniques would successfully allow the thread to exit.
But, I find my way simpler and thus more maintainable. The fewer
variables you have, the fewer chances you have to mess up how you use
those variables, and I like to be able to understand the code in smaller
pieces where possible.

I admit that the difference is relatively minor (assuming the other
approach is implemented correctly ;) ), and I wouldn't spend a lot of time
trying to argue in favor of one or the other. But hopefully the above
adequately describes why I do in fact have a preference of one over the
other.

> // Subscribe to the FileFound event to be able to process filenames
// registered with this class
public event Action<stringFileFound;

private void ProcessFile(string fileName)
{
Action<stringhandler = FileFound;

if (handler != null)
{
handler(fileName);
}
}
}

I think I see now! You are showing him how to handle ProcessFile!
There was just no way I was going to bother :-)

lol...well, in that case you just guarantee that your code example
wouldn't compile. :)

[...]
>FileSystemWatcher class to notice them all (it relies on a fixed-sized
buffer than if filled before additions can be processed will cause
some notifications to be lost).

Any idea what size this buffer is? For me there will be a maximum of 10
people connecting at once so I doubt I will need to be concerned, but
it's always interesting to know.

No, I don't know. Last time I looked at this, I did take a look to see if
it was documented, but couldn't find any specifics. Just the admonition
to try to process the notifications as quickly as possible. I'm not sure
the exact number would be very useful anyway, since whether the buffer
gets filled depends not only on how fast one processes the notifications,
but how frequently changes are happening to the file system. In addition,
the two relate according to things that are independent of the buffer
size: CPU speed, disk i/o bandwidth, stuff like that.

Pete

Nov 20 '08 #6

Peter Duniho

On Fri, 21 Nov 2008 02:58:32 -0800, Peter Morris
<mr*********@spamgmail.comwrote:

Hi Pete

>Why do you say it is vague? There's no ambiguity at all here.

In your case I see a null entry in the list as a kind of "magic number"
approach. If the value is null you end, or maybe you would put in the
string "STOP" or "FINISH" or "QUIT". My point is that stopping based on
the value of an entry is vague because what that value should be is
entirely up to the person writing the code, different people would have
different opinions on what that value should be. However, having a
Terminated boolean member to indicated that the Terminate has been
called is obvious, the value will be "true" if terminated, it is
indisputible.

>I've removed a variable for one. With the removal of that variable,
not only have I saved as many as 4 bytes from the size of my class, I
have avoided the risk of making the mistake you did: failing to either
put the use of that variable inside a synchronized block of code, or
making the variable "volatile". :)

Yes, I did miss that, didn't I :-) I'd still rather go with the boolean
approach though :-)

>Assuming one fixes your bug, there is no _functional_ difference
between the two.

Except when you terminate :-)

Ah, I see what you mean. Yes, you're right...there's a slight
difference. Though, I'll point out that one reason it's hard to detect
(you had trouble yourself :) ) is that you've got your "dequeue" mixed up
with your "terminate". :)

That behavior isn't precluded by using a sentinel value (and for the
record, sentinel values are quite common in software...there's nothing
fundamentally wrong with them), but you'd have to use an implementation of
Queue<Tthat allows for inserting at the head (i.e. not .NET's
implementation). You could write your own, but I agree that doing so
_just_ to avoid the flag wouldn't make sense.

Also note that the code you posted doesn't actually behave exactly as you
say. In particular, this statement isn't true:

My approach will stop processing items in the loop immediately,

The code you posted doesn't terminate when you call Terminate(). It
simply goes back and waits again. It won't terminate until the _next_
time the monitor is pulsed (whether that happens due to another call to
Terminate() or someone trying to enqueue something).

Again, I think this illustrates how a simpler design is easier to code
correctly. By my count, we're up to three bugs in your implementation
just related to termination alone. I got my implementation right the
first try, and I don't think you can say it's because I'm a better
programmer. I'm not...I just know better than to complicate things when I
don't have to. :)

Now, if you have to -- that is, you have that specific requirement to
terminate immediately -- I suppose you have to implement it that way. But
neither of us have the actual specification to work with, and it's
entirely possible my version would work fine in this context.

[...]
I'm more of a "here's the general idea, now write it youself" kind of
helper. The best lesson you can learn is how to teach yourself :-)

Well, I agree with that generally. In fact, you'll note that I rarely
actually post actual code. I prefer to just describe a solution, or point
someone to the documentation. But, I'm of the mind that when I do post
code, I should try to get it as close to correct as possible. Just
because I'm not being paid, that doesn't mean I shouldn't take pride in my
work. :)

>>Any idea what size this buffer is? For me there will be a maximum of
10 people connecting at once so I doubt I will need to be concerned,
but it's always interesting to know.

No, I don't know. Last time I looked at this, I did take a look to see
if it was documented, but couldn't find any specifics.

I should be okay with around 10 at the time time, maximum, I hope! This
should definitely be documented, and obvious!

As long as in your case, 10 connections translates directly to no more
than 10 file system changes at once, then I think you're definitely safe.
That's an odd guarantee to be able to have, but if you have it, no
problem. :)

Pete

Nov 21 '08 #7

=?Utf-8?B?Tmljaw==?=

Thanks again to both of you for the excellent discussion.

"Peter Duniho" wrote:

On Fri, 21 Nov 2008 02:58:32 -0800, Peter Morris
<mr*********@spamgmail.comwrote:

Hi Pete

Why do you say it is vague? There's no ambiguity at all here.
In your case I see a null entry in the list as a kind of "magic number"
approach. If the value is null you end, or maybe you would put in the
string "STOP" or "FINISH" or "QUIT". My point is that stopping based on
the value of an entry is vague because what that value should be is
entirely up to the person writing the code, different people would have
different opinions on what that value should be. However, having a
Terminated boolean member to indicated that the Terminate has been
called is obvious, the value will be "true" if terminated, it is
indisputible.

I've removed a variable for one. With the removal of that variable,
not only have I saved as many as 4 bytes from the size of my class, I
have avoided the risk of making the mistake you did: failing to either
put the use of that variable inside a synchronized block of code, or
making the variable "volatile". :)
Yes, I did miss that, didn't I :-) I'd still rather go with the boolean
approach though :-)

Assuming one fixes your bug, there is no _functional_ difference
between the two.
Except when you terminate :-)

Ah, I see what you mean. Yes, you're right...there's a slight
difference. Though, I'll point out that one reason it's hard to detect
(you had trouble yourself :) ) is that you've got your "dequeue" mixed up
with your "terminate". :)

That behavior isn't precluded by using a sentinel value (and for the
record, sentinel values are quite common in software...there's nothing
fundamentally wrong with them), but you'd have to use an implementation of
Queue<Tthat allows for inserting at the head (i.e. not .NET's
implementation). You could write your own, but I agree that doing so
_just_ to avoid the flag wouldn't make sense.

Also note that the code you posted doesn't actually behave exactly as you
say. In particular, this statement isn't true:

My approach will stop processing items in the loop immediately,

The code you posted doesn't terminate when you call Terminate(). It
simply goes back and waits again. It won't terminate until the _next_
time the monitor is pulsed (whether that happens due to another call to
Terminate() or someone trying to enqueue something).

Again, I think this illustrates how a simpler design is easier to code
correctly. By my count, we're up to three bugs in your implementation
just related to termination alone. I got my implementation right the
first try, and I don't think you can say it's because I'm a better
programmer. I'm not...I just know better than to complicate things when I
don't have to. :)

Now, if you have to -- that is, you have that specific requirement to
terminate immediately -- I suppose you have to implement it that way. But
neither of us have the actual specification to work with, and it's
entirely possible my version would work fine in this context.

[...]
I'm more of a "here's the general idea, now write it youself" kind of
helper. The best lesson you can learn is how to teach yourself :-)

Well, I agree with that generally. In fact, you'll note that I rarely
actually post actual code. I prefer to just describe a solution, or point
someone to the documentation. But, I'm of the mind that when I do post
code, I should try to get it as close to correct as possible. Just
because I'm not being paid, that doesn't mean I shouldn't take pride in my
work. :)

>Any idea what size this buffer is? For me there will be a maximum of
10 people connecting at once so I doubt I will need to be concerned,
but it's always interesting to know.

No, I don't know. Last time I looked at this, I did take a look to see
if it was documented, but couldn't find any specifics.
I should be okay with around 10 at the time time, maximum, I hope! This
should definitely be documented, and obvious!

As long as in your case, 10 connections translates directly to no more
than 10 file system changes at once, then I think you're definitely safe.
That's an odd guarantee to be able to have, but if you have it, no
problem. :)

Pete

Nov 21 '08 #8

FileSystemWatcher question

Similar topics