foreach, IEnumerable and modifying contents

jehugaleahsa

I have a rather complex need.

I have a class that parses web pages and extracts all relevant file
addresses. It allows me to download every pdf on a web page, for
instance. I would like to incorporate threads so that I can download N
files separately.

The obvious solution is a thread pool. However, I need to make sure
that I download the files asynchronously, so I can get percentage and
status information to my interface.

I have decided that the best way to do this is to have my Download (a
class representing the file to download) raise events when it is
finished. I was hoping to have my threads rejoin the thread pool when
the downloads are finished.

However, I have my Download instances coming out of an
IEnumerable<Download> that is received from the WebExtractor class
(which parses the HTML) on-the-fly using "yield return".

I think I am lacking some basics about thread pools. How can I use a
thread pool and have the events fired by the Downloads still reach the
interface? Is there a way to add an event handler to an instance
while in a foreach or IEnumerator code block?

Any help would put me one step closer to being done with my second
release of the software. Thanks in advance!

~Travis

Nov 28 '07 #1


Peter Duniho

On 2007-11-27 20:24:28 -0800, "je**********@gmail.com"
<je**********@gmail.com> said:

> I have a rather complex need.

Perhaps. Though, I suspect it's more that you've created a complex
need, where it wasn't really necessary to do so.

> I have a class that parses web pages and extracts all relevant file
> addresses. It allows me to download every pdf on a web page, for
> instance. I would like to incorporate threads so that I can download N
> files separately.

A reasonably common operation.

> The obvious solution is a thread pool. However, I need to make sure
> that I download the files asynchronously, so I can get percentage and
> status information to my interface.

It seems to me that a different "obvious" solution would be to just use
the async methods on the HttpWebRequest class, or even just a plain
TcpClient or Socket instance, along with a queue. The producer of the
queue would add URLs to be downloaded, while the consumer would keep
track of how many active downloads are going on (via HttpWebRequest,
TcpClient, or Socket).

Every time the producer adds something to the queue, it would signal
the consumer. The consumer in response would remove items from the
queue, stopping when either the queue is empty or your maximum number
of concurrent operations has been reached, whichever comes first.

Upon completion of an item, the consumer would also be signaled,
allowing it to pull a new item from the queue.

In the above, I'm thinking of the consumer and producer as individual
threads. But you could easily implement it without a thread dedicated
to either, with the consumer and producer classes simply being called
by whatever thread happens to be managing them at the time. In that
case, "signaling" the consumer would be more a matter of just executing
the method that attempts to dequeue more download operations.
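A minimal C# sketch of that second, thread-less variant, under stated assumptions: DownloadQueue, TryStartMore and OnCompleted are hypothetical names, and each "download" is simulated with Thread.Sleep on a thread-pool thread, where a real version would start an asynchronous HttpWebRequest and call OnCompleted from its callback.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical throttled queue. Enqueue() is the producer side, TryStartMore()
// is the consumer, and "signaling" the consumer is just calling TryStartMore().
public class DownloadQueue
{
    private readonly Queue<string> pending = new Queue<string>();
    private readonly object sync = new object();
    private readonly int maxActive;
    private int active, completed, peakActive;

    public DownloadQueue(int maxActive) { this.maxActive = maxActive; }

    public int Completed { get { lock (sync) return completed; } }
    public int PeakActive { get { lock (sync) return peakActive; } }

    public void Enqueue(string url)
    {
        lock (sync) { pending.Enqueue(url); }
        TryStartMore(); // the producer signals the consumer
    }

    private void TryStartMore()
    {
        while (true)
        {
            string url;
            lock (sync)
            {
                // Stop when the queue is empty or the concurrency limit is reached.
                if (pending.Count == 0 || active >= maxActive) return;
                url = pending.Dequeue();
                active++;
                peakActive = Math.Max(peakActive, active);
            }
            StartDownload(url); // started outside the lock
        }
    }

    // Simulated asynchronous download (assumption: no real network I/O here).
    private void StartDownload(string url)
    {
        ThreadPool.QueueUserWorkItem(_ =>
        {
            Thread.Sleep(10); // pretend to transfer data
            OnCompleted(url);
        });
    }

    private void OnCompleted(string url)
    {
        lock (sync) { active--; completed++; }
        TryStartMore(); // a slot opened up, so pull the next item
    }
}
```

While the queue is empty nothing runs at all; the next Enqueue or completion simply calls TryStartMore again, and the lock is never held while a download is being started.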

> I have decided that the best way to do this is to have my Download (a
> class representing the file to download) raise events when it is
> finished. I was hoping to have my threads rejoin the thread pool when
> the downloads are finished.

If you use the async methods on the above-mentioned classes, you get
the thread pooling behavior for free.

> However, I have my Download instances coming out of an
> IEnumerable<Download> that is received from the WebExtractor class
> (which parses the HTML) on-the-fly using "yield return".

This is another reason I think a queue would be better. There's no
technical reason you can't implement an asynchronous enumerator, but
having done so in this case seems to have overcomplicated the issue. A
queue seems like a much more natural fit to me, and wouldn't have the
same complicating factors you seem to be running into.

> I think I am lacking some basics about thread pools. How can I use a
> thread pool and have the events fired by the Downloads still reach the
> interface?

I think you can avoid the question altogether, but the basic answer is
that the idea of a thread pool and having "the events...reach the
interface" are orthogonal ideas. Because of the thread pool, you may
have thread synchronization issues to deal with. But the basic
question of raising an event in a way that some implementer of some
interface receives it isn't affected by whether there are multiple
threads involved.

> Is there a way to add an event handler to an instance
> while in a foreach or IEnumerator code block?

You can subscribe to an event at any time you find convenient.
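For instance, attaching a handler to each item as it comes out of a "yield return" sequence is just ordinary code in the loop body. Download, Finished, Finish() and GetDownloads() below are hypothetical stand-ins for the poster's classes, and Finish() raises the event synchronously instead of after a real transfer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the poster's Download class.
public class Download
{
    public string Url { get; private set; }
    public event EventHandler Finished;

    public Download(string url) { Url = url; }

    // Raises Finished immediately; a real download would raise it on completion.
    public void Finish()
    {
        EventHandler handler = Finished; // copy guards against a concurrent unsubscribe
        if (handler != null) handler(this, EventArgs.Empty);
    }
}

public static class SubscribeInLoop
{
    // Stands in for WebExtractor: items are produced lazily with yield return.
    static IEnumerable<Download> GetDownloads()
    {
        yield return new Download("a.pdf");
        yield return new Download("b.pdf");
        yield return new Download("c.pdf");
    }

    public static List<string> Run()
    {
        var finished = new List<string>();
        foreach (Download d in GetDownloads())
        {
            // Subscribing inside the loop is fine: the handler is attached before Finish().
            d.Finished += (sender, e) => finished.Add(((Download)sender).Url);
            d.Finish();
        }
        return finished;
    }
}
```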

> Any help would put me one step closer to being done with my second
> release of the software. Thanks in advance!

See above. I recommend abandoning this asynchronous enumerator idea
and going with a nice, simple queue.

Pete

Nov 28 '07 #2

jehugaleahsa

On Nov 27, 10:04 pm, Peter Duniho <NpOeStPe...@NnOwSlPiAnMk.com>
wrote:

[...]

My first implementation actually had a Queue<Download> that was
consumed when I was notified that a download had finished. However,
it was difficult for my code to say, "Hey, stop trying to consume!" I
ended up having a very rigid code set and I was hoping to get away
from it. I was having BIG issues with the events from one download
finishing interrupting while another thread was in the middle of a
locked block. I kept getting the occasional deadlock.

My hope in my new design was to get away from the need for so much
concurrency management. I did that by using the yield return statement
and making that my Queue, in a sense. It also makes the termination
point a lot easier to see. However, without a way of saying, "Hey,
we're not ready to start downloading you yet - wait for a moment", I
was downloading as many files at once as my computer could handle. So
my hope was to find a way to say, "Hey wait" while not needing to
necessarily manage the number of threads/concurrent downloads
manually.

I could try to manage the downloads manually again. I did move a lot
of code around to separate the interface from the downloading, so it
might be easier now than before. ThreadPools seemed more intuitive to
me the second time around. Perhaps my first approach is the better
one.

Thanks for your thoughts,
Travis

Nov 28 '07 #3

Peter Duniho

On 2007-11-27 21:23:40 -0800, "je**********@gmail.com"
<je**********@gmail.com> said:

> My first implementation actually had a Queue<Download> that was
> consumed when I was notified that a download had finished. However,
> it was difficult for my code to say, "Hey, stop trying to consume!"

Typically with a queue, that point is when the queue is empty. It's
not usually difficult.

> I ended up having a very rigid code set and I was hoping to get away
> from it. I was having BIG issues with the events from one download
> finishing interrupting while another thread was in the middle of a
> locked block. I kept getting the occasional deadlock.

Well, for what it's worth, you seem to be dealing with threading issues
anyway. Deadlock is a consequence of a buggy implementation. If you
had trouble dealing with thread synchronization in the previous design,
you're likely to have trouble with any other design that also involves
threads.

> My hope in my new design was to get away from the need for so much
> concurrency management.

How you intended to do that by introducing your own thread pool, I'm
not really clear on. :)

> I did that by using the yield return statement
> and making that my Queue, in a sense. It also makes the termination
> point a lot easier to see. However, without a way of saying, "Hey,
> we're not ready to start downloading you yet - wait for a moment", I
> was downloading as many files at once as my computer could handle. So
> my hope was to find a way to say, "Hey wait" while not needing to
> necessarily manage the number of threads/concurrent downloads
> manually.

Managing that with the queue/async paradigm I mentioned would be
simple. Especially given the efficiency advantages of using the async
I/O methods on the network classes, it seems to me that managing the
concurrent consumer count by creating your own thread pool is much more
complicated and error-prone.

I'd say the puzzlement you appear to have put yourself into here is a
good indication of that. :)

> I could try to manage the downloads manually again. I did move a lot
> of code around to separate the interface from the downloading, so it
> might be easier now than before. ThreadPools seemed more intuitive to
> me the second time around. Perhaps my first approach is the better
> one.

If it's like what I suggested, obviously I'd agree. :)

Pete

Nov 28 '07 #4

jehugaleahsa

On Nov 27, 10:53 pm, Peter Duniho <NpOeStPe...@NnOwSlPiAnMk.com>
wrote:

> On 2007-11-27 21:23:40 -0800, "jehugalea...@gmail.com"
> <jehugalea...@gmail.com> said:
>
>> My first implementation actually had a Queue<Download> that was
>> consumed when I was notified that a download had finished. However,
>> it was difficult for my code to say, "Hey, stop trying to consume!"
>
> Typically with a queue, that point is when the queue is empty. It's
> not usually difficult.

Well, the first go around, the queue being empty didn't mean I was
done. It occurred quite often that I would finish downloading all my
files before more files were added to the list. I should have
mentioned that the application pulls all web pages off of a page and
descends into those as well. It happened often that a web page was
slow to download or that one would have many links, but not much
media. I ended up having an empty queue regularly toward the beginning
of a run.

Since I had code for extracting HTML pages and another for specific
file types, I had to keep them in sync so that the application would
finish when and only when both were done. Again, this was a bit of a
concurrency issue. Before I used the yield return method, my biggest
indication that the program was being cancelled was a class-wide
variable that needed to be checked regularly (requiring lots of locks).
However, I can just stop the web extractor now and the downloads will
stop being yielded, which stops the downloader. The downloader can
then cancel all running downloads and break out of the consuming loop.
It did make concurrency simpler in this case.
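The hand-off described above, where stopping one side of a "yield return" pipeline stops the other, has a concrete C# guarantee behind it in the consumer-to-producer direction: when a foreach loop ends early (break, return or an exception), it disposes the iterator, which runs any pending finally blocks in the iterator method, and the iterator produces nothing more. A small demonstration, with Extract() as a hypothetical stand-in for the extractor:

```csharp
using System;
using System.Collections.Generic;

public static class IteratorCancellation
{
    public static int Produced;     // how many items the iterator yielded
    public static bool CleanupRan;  // set by the iterator's finally block

    // Hypothetical stand-in for WebExtractor.Start(): an endless producer.
    static IEnumerable<int> Extract()
    {
        try
        {
            for (int i = 0; ; i++)
            {
                Produced++;
                yield return i;
            }
        }
        finally
        {
            CleanupRan = true; // e.g. close network streams here
        }
    }

    public static void Run(int stopAfter)
    {
        Produced = 0;
        CleanupRan = false;
        foreach (int item in Extract())
        {
            // Breaking out disposes the iterator, which runs its finally block.
            if (item + 1 == stopAfter) break;
        }
    }
}
```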

However, now I just have Downloads coming in as fast as they are
found. I will try your approach of starting the next download when I
have time. What I will have to do is write the Download consumer
without a loop, and just call MoveNext on the enumerator whenever I am
notified that a download has finished.

Here is a scenario: One download finishes and my code begins pulling
the next Download. However, the web extractor is not ready. While
waiting, another download finishes and now a second piece of code
begins pulling the next Download. Now I have two pieces of code trying
to access the same enumerator. Can I be sure that this won't corrupt
my enumerator? If I were to lock the IEnumerator<Download>, would this
cause a deadlock since they are different event handlers?

Concurrency isn't that simple for someone who hasn't had to deal with
it. I had plenty of theory in school, including producer/consumer
algorithms. Dealing with events seems similar to threads, but they
take complete control. Threads at least switch context when they hit a
lock.

Thanks again,
Travis

Nov 28 '07 #5

Peter Duniho

On 2007-11-28 07:01:58 -0800, "je**********@gmail.com"
<je**********@gmail.com> said:

> Well, the first go around, the queue being empty didn't mean I was
> done. It occurred quite often that I would finish downloading all my
> files before more files were added to the list.

The queue being empty did in fact mean you were done, at least for the moment.

In a typical queue design, you would gracefully deal with an empty
queue. A queue that's empty just means there's no work to do. The
consumer sits idle (either as an actual thread blocked on a wait
event, or just a class that doesn't do anything until some code calls
something that adds something new to the queue) until there's more work
to do. The logic is the same for the case of starting up some
processing as it is for the case of temporarily running out of work to
do and then being presented with some more.

If your design didn't support that, then you probably did not separate
the logic of the producer, consumer, and client of the queue well
enough.
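One way to make a consumer thread "sit idle" until there is work is Monitor.Wait/Monitor.Pulse, sketched below as a hypothetical WorkQueue<T>. TryTake() blocks while the queue is empty, Add() wakes it, and Complete() lets the producer say "no more work, ever", so an empty queue on its own never ends the consumer. On .NET 4 and later, BlockingCollection<T> with CompleteAdding() offers the same behavior ready-made.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical blocking queue: TryTake() waits while empty and returns
// false only once the queue is both completed and drained.
public class WorkQueue<T>
{
    private readonly Queue<T> items = new Queue<T>();
    private readonly object sync = new object();
    private bool completed;

    public void Add(T item)
    {
        lock (sync)
        {
            if (completed) throw new InvalidOperationException("Queue already completed.");
            items.Enqueue(item);
            Monitor.Pulse(sync); // wake one idle consumer
        }
    }

    // Called by the producer when it will never add anything again.
    public void Complete()
    {
        lock (sync)
        {
            completed = true;
            Monitor.PulseAll(sync); // wake every consumer so it can exit
        }
    }

    public bool TryTake(out T item)
    {
        lock (sync)
        {
            // Empty just means "no work yet": wait instead of quitting.
            while (items.Count == 0 && !completed)
                Monitor.Wait(sync); // releases the lock while idle
            if (items.Count > 0) { item = items.Dequeue(); return true; }
            item = default(T);
            return false; // empty *and* completed: the consumer is done
        }
    }
}
```

A consumer thread then simply loops on `while (queue.TryTake(out item)) { ... }` and exits on its own after the producer calls Complete().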

[...]
> Here is a scenario: One download finishes and my code begins pulling
> the next Download. However, the web extractor is not ready. While
> waiting, another download finishes and now a second piece of code
> begins pulling the next Download. Now I have two pieces of code trying
> to access the same enumerator. Can I be sure that this won't corrupt
> my enumerator? If I were to lock the IEnumerator<Download>, would this
> cause a deadlock since they are different event handlers?

I can't really comment on an enumerator that you haven't posted code
for. Also, I haven't used any custom enumerators in real-world code,
so I don't have much experience with them. However, I would say that
if you have two pieces of code trying to access the same enumerator,
you've got a bug. I would think that each call to GetEnumerator()
should return a brand new one, so that different parts of the code
don't interfere with each other.
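For iterator methods written with "yield return", the compiler already works this way: every GetEnumerator() call on the returned sequence gives an enumerator with its own independent position. A small check, using an illustrative Numbers() method:

```csharp
using System;
using System.Collections.Generic;

public static class IndependentEnumerators
{
    static IEnumerable<int> Numbers()
    {
        for (int i = 1; i <= 3; i++)
            yield return i;
    }

    // Advances two enumerators over the same sequence by different amounts.
    public static int[] Run()
    {
        IEnumerable<int> seq = Numbers();
        using (IEnumerator<int> a = seq.GetEnumerator())
        using (IEnumerator<int> b = seq.GetEnumerator())
        {
            a.MoveNext(); a.MoveNext(); // a is now at 2
            b.MoveNext();               // b is still at 1
            return new[] { a.Current, b.Current };
        }
    }
}
```

The flip side is that each enumerator re-runs the iterator body from the start, so two consumers enumerating the same iterator method would each do the full work of producing its items.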

If you do decide to return the same enumerator to different parts of
the code, or different instances of the same code, I'd say that at a
bare minimum you will need to be VERY careful about how you use the
enumerator (and for sure it will need to be written in a thread-safe
way to account for this multiple access usage), and it's very likely
there's a better way to design the code (like using a queue :) ).
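If one enumerator really must be shared, the minimum requirement is that MoveNext and Current happen as one atomic step. A sketch of that, as a hypothetical SynchronizedSource<T> wrapper:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper that lets several threads pull from one enumerator.
public sealed class SynchronizedSource<T> : IDisposable
{
    private readonly IEnumerator<T> inner;
    private readonly object sync = new object();

    public SynchronizedSource(IEnumerable<T> source) { inner = source.GetEnumerator(); }

    // MoveNext and Current are used together under one lock, so two callers
    // can never receive the same item or see a half-updated enumerator.
    public bool TryGetNext(out T item)
    {
        lock (sync)
        {
            if (inner.MoveNext()) { item = inner.Current; return true; }
            item = default(T);
            return false;
        }
    }

    public void Dispose() { lock (sync) inner.Dispose(); }
}
```

Locking on a private object like this does not by itself deadlock when several event handlers call TryGetNext; they simply wait their turn. It can still deadlock if the producer's MoveNext waits for something that one of the blocked callers must do first, and a slow producer stalls every caller while the lock is held.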

> Concurrency isn't that simple for someone who hasn't had to deal with
> it. I had plenty of theory in school, including producer/consumer
> algorithms. Dealing with events seems similar to threads, but they
> take complete control. Threads at least switch context when they hit a
> lock.

Events and threads are, as I mentioned, orthogonal to each other. An
event is really just a nice syntax for a multi-subscriber callback
mechanism. When an event is raised, the handler always executes in the
same thread in which it was raised. Multiple threads impose
synchronization requirements on your code, and these requirements are
the same whether you are using events or not.
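A quick way to check this is to raise an event from a worker thread and record where the handler runs. Worker and its Done event are hypothetical. This is also why a WinForms handler that runs on a download thread must marshal back to the UI thread (for example with Control.Invoke) before touching controls.

```csharp
using System;
using System.Threading;

public class Worker
{
    public event EventHandler Done;

    // Invokes the handlers directly; no thread switch happens here.
    public void RaiseDone()
    {
        EventHandler handler = Done;
        if (handler != null) handler(this, EventArgs.Empty);
    }
}

public static class EventThreadDemo
{
    // Returns { raising thread id, handler thread id }.
    public static int[] Run()
    {
        var worker = new Worker();
        int handlerThread = -1;
        worker.Done += (s, e) => handlerThread = Thread.CurrentThread.ManagedThreadId;

        int raisingThread = -1;
        var t = new Thread(() =>
        {
            raisingThread = Thread.CurrentThread.ManagedThreadId;
            worker.RaiseDone(); // the handler runs synchronously on this worker thread
        });
        t.Start();
        t.Join();
        return new[] { raisingThread, handlerThread };
    }
}
```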

That said, I never meant to imply that concurrency was simple. It's
not. If anything, my intent is to point out that concurrency is _not_
simple, and that your second design appears to have just made it more
complicated than it otherwise needed to be.

If you want to have multiple threads processing things, you _are_ going
to have to deal with concurrency. So the question is not whether you
can get away from concurrency issues or not; you can't. The question
is how complicated are you going to make those issues.

So far, it seems that you've made them very complicated. :)

For fun, I'm thinking about working on a simple download simulation
that uses a queue to manage the downloads. If and when it's finished,
I'll post the code here in case you or anyone else is interested.
Might not be done today, as I've got a busy day, but maybe tomorrow.

Pete

Nov 28 '07 #6

Ben Voigt [C++ MVP]

<je**********@gmail.com> wrote in message
news:71**********************************@e23g2000prf.googlegroups.com...

[...]
> Here is a scenario: One download finishes and my code begins pulling
> the next Download. However, the web extractor is not ready. While
> waiting, another download finishes and now a second piece of code
> begins pulling the next Download. Now I have two pieces of code trying
> to access the same enumerator. Can I be sure that this won't corrupt
> my enumerator? If I were to lock the IEnumerator<Download>, would this
> cause a deadlock since they are different event handlers?

something like:

delegate ... DownloadProcessor(...);
Semaphore limit = new Semaphore(N, N); // N = maximum concurrent downloads

foreach (Download down in GetDownloads()) {
    limit.WaitOne(); // blocks while N downloads are already running
    DownloadProcessor dp = down.Process;
    // using the AsyncCallback to release one more semaphore after each download completes
    dp.BeginInvoke(..., delegate { limit.Release(); }, null);
}
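A self-contained, runnable variant of the semaphore idea, under stated assumptions: GetDownloads and Process are hypothetical stand-ins that simulate work with Thread.Sleep, ThreadPool.QueueUserWorkItem replaces delegate BeginInvoke (which .NET Core and later no longer support), and a CountdownEvent waits for the last downloads to finish. Requires .NET 4.5 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class SemaphoreThrottle
{
    private static int running; // downloads currently in progress
    private static int peak;    // highest value of 'running' observed

    public static int Peak { get { return Volatile.Read(ref peak); } }

    // Hypothetical stand-in for the extractor: yields work items lazily.
    static IEnumerable<int> GetDownloads(int count)
    {
        for (int i = 0; i < count; i++)
            yield return i;
    }

    // Simulated download: records concurrency, then sleeps instead of using the network.
    static void Process(int id)
    {
        int now = Interlocked.Increment(ref running);
        int seen; // lock-free "peak = max(peak, now)"
        while (now > (seen = Volatile.Read(ref peak)) &&
               Interlocked.CompareExchange(ref peak, now, seen) != seen) { }
        Thread.Sleep(15);
        Interlocked.Decrement(ref running);
    }

    // Returns how many downloads completed.
    public static int Run(int count, int limit)
    {
        running = 0; peak = 0;
        int completed = 0;
        using (var slots = new Semaphore(limit, limit))
        using (var allDone = new CountdownEvent(1))
        {
            foreach (int id in GetDownloads(count))
            {
                slots.WaitOne(); // the loop blocks while 'limit' downloads are running
                allDone.AddCount();
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    try { Process(id); Interlocked.Increment(ref completed); }
                    finally
                    {
                        slots.Release(); // free a slot for the next item
                        allDone.Signal();
                    }
                });
            }
            allDone.Signal(); // remove the initial count
            allDone.Wait();   // wait for the downloads still in flight
        }
        return completed;
    }
}
```

The foreach thread is the one that blocks in WaitOne, so it should not be the UI thread or any other thread that the completion notifications need in order to run.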


Nov 28 '07 #7

jehugaleahsa

On Nov 28, 12:26 pm, "Ben Voigt [C++ MVP]" <r...@nospam.nospam> wrote:

[...]

The semaphore tells me to wait. I will try that as well when I get a
chance; I will have to learn about Semaphores first.

Nov 28 '07 #8

jehugaleahsa

On Nov 28, 12:00 pm, Peter Duniho <NpOeStPe...@NnOwSlPiAnMk.com>
wrote:

[...]

Your extended effort to help me is commendable. Thank you very much.

> In a typical queue design, you would gracefully deal with an empty
> queue. A queue that's empty just means there's no work to do. The
> consumer sits idle (either as an actual thread blocked on a wait
> event, or just a class that doesn't do anything until some code calls
> something that adds something new to the queue) until there's more work
> to do. The logic is the same for the case of starting up some
> processing as it is for the case of temporarily running out of work to
> do and then being presented with some more.

I grasp what you are saying, but I'm not sure what the thread does
while it is idle, or how to wake it up.

When you use "yield return", it actually is very much like a thread.
It returns one thing and goes away until the next is needed. The class
processing the downloads does idle before the next Download is
yielded. This is just how "yield return" works, and it did make my code
*seem* cleaner. Methods with "yield return" return an IEnumerable (or
an IEnumerator), and you access the yielded data using an IEnumerator.
So, I'm just using a foreach loop. It looks like this:

public class DownloadManager
{
    WebExtractor extractor = new WebExtractor(/* Arguments */);
    bool cancelled = false;
    object cancelSync = new object();

    public void DownloadFiles()
    {
        // BEGIN THREAD
        // WebExtractor.Start yield returns Downloads as they are found.
        foreach (Download download in extractor.Start())
        {
            // add event handlers
            download.Start();
            lock (cancelSync)
            {
                if (cancelled)
                {
                    break;
                }
            }
        }
        // END THREAD
    }

    public void Cancel()
    {
        // BEGIN THREAD
        lock (cancelSync)
        {
            cancelled = true;
        }
        // END THREAD
    }
}
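The "returns one thing and goes away until the next is needed" behavior can be seen by logging from both sides of a foreach. The names below are illustrative; only the standard iterator feature is used.

```csharp
using System;
using System.Collections.Generic;

public static class LazyIteratorDemo
{
    // Illustrative extractor: logs each time it finds a file, then yields it.
    static IEnumerable<string> Extract(List<string> log)
    {
        log.Add("found a.pdf");
        yield return "a.pdf";
        log.Add("found b.pdf");
        yield return "b.pdf";
        log.Add("extractor finished");
    }

    public static List<string> Run()
    {
        var log = new List<string>();
        foreach (string file in Extract(log))
            log.Add("start " + file); // runs before the iterator resumes
        return log;
    }
}
```

Everything in this example runs on the caller's thread: an iterator is suspended between items, but it is not a separate thread, so a plain "yield return" iterator does no work while the loop body is busy.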

Nov 28 '07 #9

jehugaleahsa

With semaphores, for example:

using System.Threading;

public class DownloadManager
{
    WebExtractor extractor = new WebExtractor(/* Arguments */);
    bool cancelled = false;
    object cancelSync = new object();
    Semaphore semaphore = new Semaphore(5, 5);

    public void DownloadFiles()
    {
        // BEGIN THREAD
        // WebExtractor.Start() yield returns Downloads as they are found.
        foreach (Download download in extractor.Start())
        {
            // Wait for one of the five download slots to free up.
            semaphore.WaitOne();
            // add event handlers
            download.StatusChanged += status_Changed;
            download.Start();
            lock (cancelSync)
            {
                if (cancelled)
                {
                    break;
                }
            }
        }
        // END THREAD
    }

    private void status_Changed(object sender, StatusChangedEventArgs e)
    {
        // Only a completed download hands its slot back.
        if (e.Status == DownloadStatus.Complete)
        {
            semaphore.Release();
        }
    }

    public void Cancel()
    {
        // BEGIN THREAD
        lock (cancelSync)
        {
            cancelled = true;
        }
        // END THREAD
    }
}
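For comparison, here is a self-contained sketch of the same semaphore throttle with the release moved into a finally block, so that every way out of a download frees its slot (in the code above, a download whose status never becomes Complete would keep its slot forever, assuming DownloadStatus has other terminal values). ThrottledRunner and RunAll are hypothetical names, and Thread.Sleep stands in for the real download. Like the original, the producer loop still blocks in WaitOne() while all slots are busy.

```csharp
using System;
using System.Threading;

// Hypothetical sketch: run simulated downloads with at most maxConcurrent at once.
class ThrottledRunner
{
    readonly Semaphore _slots;
    readonly int _max;
    readonly object _sync = new object();
    int _running;    // downloads currently holding a slot
    int _peak;       // highest _running value seen
    int _completed;  // downloads that have finished

    public ThrottledRunner(int maxConcurrent)
    {
        _max = maxConcurrent;
        _slots = new Semaphore(maxConcurrent, maxConcurrent);
    }

    public int Peak { get { lock (_sync) { return _peak; } } }
    public int Completed { get { lock (_sync) { return _completed; } } }

    public void RunAll(int count, int workMs)
    {
        for (int i = 0; i < count; i++)
        {
            _slots.WaitOne(); // blocks only while every slot is taken
            lock (_sync)
            {
                _running++;
                if (_running > _peak) _peak = _running;
            }
            ThreadPool.QueueUserWorkItem(delegate
            {
                try
                {
                    Thread.Sleep(workMs); // stand-in for the real download
                }
                finally
                {
                    // Release in a finally block so that every exit path
                    // from the work frees its slot.
                    lock (_sync)
                    {
                        _running--;
                        _completed++;
                    }
                    _slots.Release();
                }
            });
        }

        // Taking every slot back means every download has finished.
        for (int i = 0; i < _max; i++) _slots.WaitOne();
        _slots.Release(_max);
    }

    static void Main()
    {
        ThrottledRunner runner = new ThrottledRunner(3);
        runner.RunAll(10, 50);
        Console.WriteLine("completed " + runner.Completed + ", peak " + runner.Peak);
    }
}
```

Waiting for all the slots at the end is just a cheap way to detect "everything finished" without a separate counter event.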

Nov 28 '07 #10

jehugaleahsa

On Nov 28, 3:07 pm, "jehugalea...@gmail.com" <jehugalea...@gmail.com>
wrote:


Actually, it appears that using Semaphores with WebClient is a no-no.
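The post doesn't say what went wrong. One plausible cause, offered as an assumption and not a diagnosis: WebClient's *Async methods raise their progress and completion events through the SynchronizationContext that was current when the download was started. If the loop calling semaphore.WaitOne() runs on the WinForms UI thread, those events are queued to that same thread, so the handler that would call Release() can never run while the thread is blocked. The sketch below imitates that shape without WebClient, using a hypothetical single-threaded LoopThread (it needs .NET 4 or later for BlockingCollection). When the completion is posted back to the waiting thread, the second WaitOne() times out; when it runs on the worker thread, it succeeds.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical stand-in for a UI thread: one thread that runs posted
// callbacks one at a time, in order.
class LoopThread
{
    readonly BlockingCollection<Action> _queue = new BlockingCollection<Action>();
    readonly Thread _thread;

    public LoopThread()
    {
        _thread = new Thread(delegate ()
        {
            foreach (Action action in _queue.GetConsumingEnumerable()) action();
        });
        _thread.Start();
    }

    public void Post(Action action) { _queue.Add(action); }

    // Runs whatever is still queued, then stops the thread.
    public void Shutdown() { _queue.CompleteAdding(); _thread.Join(); }
}

static class DeadlockDemo
{
    // Returns true if a second slot could be taken within timeoutMs.
    public static bool SecondSlotAcquired(bool completeOnLoop, int timeoutMs)
    {
        LoopThread loop = new LoopThread();
        Semaphore slots = new Semaphore(1, 1);
        ManualResetEvent finished = new ManualResetEvent(false);
        ManualResetEvent completionIssued = new ManualResetEvent(false);
        bool acquired = false;

        loop.Post(delegate
        {
            slots.WaitOne(); // the first "download" takes the only slot
            ThreadPool.QueueUserWorkItem(delegate
            {
                // The simulated download finishes on a worker thread.
                if (completeOnLoop)
                    loop.Post(delegate { slots.Release(); }); // routed back to the loop thread
                else
                    slots.Release();
                completionIssued.Set();
            });
            // The loop thread now waits for a second slot. If Release()
            // was posted to this same thread, it cannot run until we return.
            acquired = slots.WaitOne(timeoutMs);
            finished.Set();
        });

        finished.WaitOne();
        completionIssued.WaitOne(); // make sure nothing is posted after Shutdown
        loop.Shutdown();
        return acquired;
    }

    static void Main()
    {
        Console.WriteLine("completion on worker thread: " + SecondSlotAcquired(false, 500));
        Console.WriteLine("completion posted to the waiting thread: " + SecondSlotAcquired(true, 500));
    }
}
```

If this is the problem, the usual remedy is to keep the blocking WaitOne() loop off the thread that receives the completion events.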

Nov 29 '07 #11

Ben Voigt [C++ MVP]

<je**********@gmail.com> wrote in message
news:15**********************************@o42g2000hsc.googlegroups.com...

Actually, it appears that using Semaphores with WebClient is a no-no.

That surprises me.

I thought you might have some issues with the spidering/page parsing not
running until there is a download slot available, and the code you posted
clearly won't cancel the spider until one of the downloads completes
(perhaps you can cancel each download somehow).

What exactly is going wrong? Does it help to use BeginInvoke to perform the
download from a thread other than the one holding the semaphore?
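Ben's point that Cancel() cannot take effect while the loop is parked in WaitOne() can be addressed by waiting on the semaphore and a cancellation event together. This is a sketch under stated assumptions, not code from the thread: CancellableProducer and Produce() are hypothetical, and its "downloads" never finish, so once the slots are taken only cancellation can end the loop. WaitHandle.WaitAny returns the lowest index among the signaled handles and only acquires that one, so putting the cancel event first lets it win whenever both are signaled.

```csharp
using System;
using System.Threading;

// Hypothetical producer loop: waits for a free download slot OR a cancel
// request, whichever comes first, so Cancel() is not stuck behind a download.
class CancellableProducer
{
    readonly Semaphore _slots;
    readonly ManualResetEvent _cancel = new ManualResetEvent(false);

    public CancellableProducer(int maxConcurrent)
    {
        _slots = new Semaphore(maxConcurrent, maxConcurrent);
    }

    public void Cancel() { _cancel.Set(); }

    // Returns how many items were started before cancellation or the end of input.
    public int Produce(int itemCount)
    {
        WaitHandle[] handles = new WaitHandle[] { _cancel, _slots };
        int started = 0;
        for (int i = 0; i < itemCount; i++)
        {
            // Index 0 (cancel) wins if both are signaled; only the handle
            // whose index is returned is acquired.
            if (WaitHandle.WaitAny(handles) == 0)
            {
                break;
            }
            started++; // a real loop would start the download here; this
                       // sketch never releases, so the slots stay busy
        }
        return started;
    }

    static void Main()
    {
        CancellableProducer producer = new CancellableProducer(2);
        ThreadPool.QueueUserWorkItem(delegate { Thread.Sleep(200); producer.Cancel(); });
        Console.WriteLine("started before cancel: " + producer.Produce(10));
    }
}
```

In a real version, each finished download would call _slots.Release() as before; the only change is that the wait also watches the cancel event.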

Nov 29 '07 #12

Peter Duniho

On 2007-11-28 11:00:17 -0800, Peter Duniho <Np*********@NnOwSlPiAnMk.com> said:

[...]
For fun, I'm thinking about working on a simple download simulation
that uses a queue to manage the downloads. If and when it's finished,
I'll post the code here in case you or anyone else is interested.
Might not be done today, as I've got a busy day, but maybe tomorrow.

Hi. I finished the simulation. It effectively demonstrates what I'm
talking about with respect to using a queue. You'll see that the queue
class itself is _very_ simple, and yet it contains everything you need
to implement the basic functionality you've described.

It doesn't provide dynamic resizing of the number of allowed concurrent
operations, nor cancellation of an existing operation (whether it's
started working or not). Those things would not be difficult to add,
though. The basic logic is easily extended to handle those scenarios.
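To make that concrete, here is a rough sketch (not part of Pete's program) of a throttled queue of the same general shape with two extra operations: SetMax() changes the concurrency limit, and CancelPending() removes an item that has not started yet. IWorkItem, ResizableQueue, ManualItem and QueueDemo are hypothetical names. Cancelling an item that is already running would additionally need support from the item itself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical item shape: start it, and it raises Done when finished.
interface IWorkItem
{
    void Process();
    event EventHandler Done;
}

class ResizableQueue
{
    readonly object _lock = new object();
    readonly LinkedList<IWorkItem> _pending = new LinkedList<IWorkItem>();
    int _max;
    int _running;

    public ResizableQueue(int max) { _max = max; }

    public void Add(IWorkItem item)
    {
        lock (_lock) { _pending.AddLast(item); }
        StartWhatFits();
    }

    // Raising the limit starts queued items now; lowering it lets running
    // items drain before anything new starts.
    public void SetMax(int max)
    {
        lock (_lock) { _max = max; }
        StartWhatFits();
    }

    // Removes an item that has not started; returns false if it already started.
    public bool CancelPending(IWorkItem item)
    {
        lock (_lock) { return _pending.Remove(item); }
    }

    void OnDone(object sender, EventArgs e)
    {
        ((IWorkItem)sender).Done -= OnDone;
        lock (_lock) { _running--; }
        StartWhatFits();
    }

    void StartWhatFits()
    {
        while (true)
        {
            IWorkItem next;
            lock (_lock)
            {
                if (_running >= _max || _pending.Count == 0) return;
                next = _pending.First.Value;
                _pending.RemoveFirst();
                _running++;
            }
            next.Done += OnDone; // started outside the lock
            next.Process();
        }
    }
}

// Test helper: "finishes" only when Finish() is called.
class ManualItem : IWorkItem
{
    public bool Started;
    public event EventHandler Done;
    public void Process() { Started = true; }
    public void Finish()
    {
        EventHandler handler = Done;
        if (handler != null) handler(this, EventArgs.Empty);
    }
}

static class QueueDemo
{
    static void Main()
    {
        ResizableQueue queue = new ResizableQueue(1);
        ManualItem a = new ManualItem(), b = new ManualItem(), c = new ManualItem();
        queue.Add(a); queue.Add(b); queue.Add(c);
        queue.CancelPending(c); // c never starts
        queue.SetMax(2);        // b starts right away
        a.Finish();             // nothing left to start
        Console.WriteLine(a.Started + " " + b.Started + " " + c.Started);
    }
}
```

A LinkedList is used instead of a Queue only because it allows removing an arbitrary pending item.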

The application itself is a GUI application. It wasn't strictly
required for the demonstration, but it makes for an easy-to-control
mechanism for adding new work items and providing user feedback for how
those work items are processed.

It's funny...even though I know the program isn't doing any actual
work, there is something oddly satisfying about watching all those
progress bars work their way toward completion. I keep wanting to add
more work items, so I can see more progress bars finish. :)

Anyway, I'm copying the code here (see below). You can create a new,
empty project, add a new source file to the project and copy all of
this verbatim into that single file. You'll need to add references to
System, System.Drawing, and System.Windows.Forms to the project. Then
it should just compile and run.

Enjoy! I tried to provide sufficient comments for the classes and code
to explain the details, but please feel free to post specific questions
if it's not all clear.

Pete

using System;
using System.Collections.Generic;
using System.Windows.Forms;
using System.Threading;

namespace TestMultiDownloadQueue
{

    // DummyAsync is probably the most complicated class here, and the
    // least interesting. It only exists to provide a class with an
    // async API similar to those available from classes like
    // HttpWebRequest, Socket, Stream, and TcpClient but without
    // requiring any actual work to be done.
    //
    // It has the basic "Begin" and "End" methods to start and complete
    // a simulated async operation. The constructor is passed a TimeSpan
    // that indicates the total time that should be consumed by the
    // simulated operation. The DummyAsync class decides how to divide the
    // total time into partial operations, and each call to the "Begin"
    // method will consume one interval of the total time. In that way,
    // it simulates an async operation that takes a certain amount of time
    // to complete and which requires multiple async calls to the same
    // instance to finish all of the work.
    class DummyAsync
    {
        // Total duration of this simulated operation
        private TimeSpan _tsDuration;

        // Duration of a single simulated processing unit
        // (i.e. one call to BeginDummy())
        private TimeSpan _tsInterval;

        // A list of all of the current async operations for
        // this instance
        private List<AsyncInfo> _rgai = new List<AsyncInfo>();

        // The callback delegate that the client of this class must
        // implement.
        public delegate void DummyCallback(IAsyncInfo iai);

        // Could've used IAsyncResult, but didn't want to bother with
        // supporting the full IAsyncResult model (with a wait handle, etc.)
        public interface IAsyncInfo
        {
            object AsyncState { get; }
        }

        // An actual async state class. Contains the context information
        // for the client, as well as this class's own internal state
        // information
        private class AsyncInfo : IAsyncInfo
        {
            // I don't normally make public fields, but for this very simple
            // class in a very specific test harness, I decided to not bother
            // wrapping them in properties.
            public DummyCallback callback;
            public object context;
            public TimeSpan ts;

            // Stores a one-shot timer that will call us back when the
            // simulated processing is done
            private System.Threading.Timer _timer;

            public AsyncInfo(DummyCallback callback, object context)
            {
                this.callback = callback;
                this.context = context;
            }

            public void Start(TimeSpan ts, TimerCallback callback)
            {
                this.ts = ts;
                _timer = new System.Threading.Timer(callback, this, this.ts, TimeSpan.Zero);
            }

            #region IAsyncInfo Members

            public object AsyncState
            {
                get { return context; }
            }

            #endregion
        }

        public DummyAsync(TimeSpan tsDuration)
        {
            _tsDuration = tsDuration;
            _tsInterval = new TimeSpan(_tsDuration.Ticks / 100);

            // No need to abuse our timers
            if (_tsInterval < new TimeSpan(0, 0, 0, 0, 100))
            {
                _tsInterval = new TimeSpan(0, 0, 0, 0, 100);
            }
        }

        // When the client calls this method, we create a new state object,
        // update our duration to indicate how much additional processing
        // might be required, save the new state object to our list of
        // current async operations, and then tell it to start a timer.
        // (The bookkeeping happens first so that a timer that fires
        // immediately can't observe stale state.)
        public void BeginDummy(DummyCallback callback, object context)
        {
            AsyncInfo ai = new AsyncInfo(callback, context);
            TimeSpan tsThis = new TimeSpan(Math.Min(_tsDuration.Ticks, _tsInterval.Ticks));

            _tsDuration = new TimeSpan(Math.Max(0, _tsDuration.Ticks - _tsInterval.Ticks));
            _rgai.Add(ai);
            ai.Start(tsThis, _TimerCallback);
        }

        // The client must call this method after being notified of the
        // completion of our simulated async operation via the callback.
        // This method removes the state object from our list of current
        // async operations and returns to the user the time spent on
        // the simulated processing, representing some amount of progress
        // toward completion of the simulated operation.
        public TimeSpan EndDummy(IAsyncInfo iai)
        {
            AsyncInfo ai = (AsyncInfo)iai;

            _rgai.Remove(ai);
            return ai.ts;
        }

        // Our _own_ callback method, used for the timer that is used to
        // simulate the work. Not to be confused with the client's callback,
        // though the client's callback is in fact called from here.
        private void _TimerCallback(object context)
        {
            AsyncInfo ai = (AsyncInfo)context;

            ai.callback(ai);
        }
    }

    // DownloadItem represents a single work item that the controller
    // class managing the downloads knows about. For the manager's
    // benefit, it implements a specific interface that the manager
    // requires. The rest of the class is specific to our simulated work,
    // but it would be trivial to replace that code with code that uses
    // some class that does actual work via async methods.
    public class DownloadItem : DownloadManager.IDownloadItem
    {
        // The ProgressChanged event is not required for the demonstration
        // of the queue, but is useful for providing user feedback regarding
        // the progress of the operation. A real-world download object
        // would likely have something similar, whether explicitly in its
        // own class or implemented already by a class it uses (like
        // BackgroundWorker).
        //
        // This ProgressChanged event, unlike the one in BackgroundWorker,
        // provides a real number as the percentage information, allowing
        // finer granularity than 1% intervals. The float ranges from 0.0
        // to 1.0, so strictly speaking it's not a "percent" value. But
        // the name is still reasonably suggestive as to the relationship
        // between the value and the progress.
        #region ProgressChanged event members

        public class ProgressChangedEventArgs : EventArgs
        {
            private float _percentDone;

            public ProgressChangedEventArgs(float percentDone)
            {
                _percentDone = percentDone;
            }

            public float PercentDone
            {
                get { return _percentDone; }
            }
        }

        public delegate void ProgressChangedEventHandler(object sender, ProgressChangedEventArgs e);

        public event ProgressChangedEventHandler ProgressChanged;

        #endregion

        public DownloadItem(TimeSpan tsDuration)
        {
            _tsDuration = tsDuration;
        }

        #region IDownloadItem Members

        // Even in a real-world async operation object, the Process() method
        // could be nearly as simple as this. This class takes a duration at
        // construction, whereas a real-world class might take some kind of web
        // address object or a string that represents one. This class
        // creates an instance of our DummyAsync class, whereas a real-world
        // class might create an HttpWebRequest instance. Finally, this class
        // calls the "Begin" method for the async API of the DummyAsync class,
        // whereas a real-world class would call the "Begin" method of whatever
        // async-capable class it's using.
        public void Process()
        {
            _da = new DummyAsync(_tsDuration);
            _da.BeginDummy(_AsyncCallback, null);
        }

        // This event must be implemented and raised when the actual work is done,
        // otherwise the queue has no way to know that it's okay to start another
        // work item.
        //
        // Note that while this event is _required_ by the queue manager, it
        // is also used by the user-interface. A nice example of taking
        // advantage of the multi-cast characteristics of events/delegates. :)
        public event EventHandler Done;

        #endregion

        // Total time consumed by this simulated work item
        private TimeSpan _tsDuration;

        // Time consumed so far by this simulated work item
        private TimeSpan _tsDone;

        // Instance of our async operation simulation class; equivalent to
        // an HttpWebRequest, Stream, Socket, TcpClient, etc.
        private DummyAsync _da;

        // Callback for use with async operation class. Note that in spite
        // of the fact that the async operation class is itself a simulation,
        // this method is very similar to those you might find in a real-world
        // async-using class. Using the parameter of the callback, the "End"
        // method is called, which returns whatever results the async operation
        // has accomplished. If there is still work left to be done, the "Begin"
        // method is called again.
        private void _AsyncCallback(DummyAsync.IAsyncInfo iai)
        {
            _tsDone += _da.EndDummy(iai);

            if (_tsDone < _tsDuration)
            {
                _da.BeginDummy(_AsyncCallback, null);
            }
            else
            {
                _RaiseDoneEvent();
            }

            _RaiseProgressChangedEvent();
        }

        // A couple of standard event-raising methods
        private void _RaiseProgressChangedEvent()
        {
            ProgressChangedEventHandler handler = ProgressChanged;

            if (handler != null)
            {
                handler(this, new ProgressChangedEventArgs((float)_tsDone.Ticks / _tsDuration.Ticks));
            }
        }

        private void _RaiseDoneEvent()
        {
            EventHandler handler = Done;

            if (handler != null)
            {
                handler(this, new EventArgs());
            }
        }
    }

    // The DownloadManager class is the most interesting and relevant class
    // to this thread. It also happens to be the simplest. :)
    //
    // This class defines a simple interface, IDownloadItem, that represents the
    // bare minimum required to deal with the async work. A class must have
    // a method that starts the processing, and it must implement an event
    // that will be raised when the work is done.
    //
    // There's no real requirement that the class implementing that
    // interface do its work in an async manner, but if it doesn't then
    // the thread that added a work item to the queue could wind up
    // blocked until the work is done. Additionally a thread completing
    // some work could wind up blocked until the next dequeued work item
    // has been completed (assuming any are in the queue). And of course,
    // this chain of blocking could continue until all the items have
    // been finally processed.
    //
    // Obviously then, this implementation really works best with tasks
    // that can be represented with an async API, so that the call to a
    // method that initializes the processing will complete immediately.
    // If you want to use a class that implements IDownloadItem by doing
    // all of its actual work in the Process() method, then this class
    // should be changed so that it creates (for example) a BackgroundWorker
    // instance for each IDownloadItem, and calls the IDownloadItem.Process()
    // method from that BackgroundWorker's DoWork event handler.
    //
    // Note that the IDownloadItem could implement Process() by using
    // BackgroundWorker or similar itself. In that case, the IDownloadItem
    // would basically be using an async API and would work just fine with
    // this manager code.
    class DownloadManager
    {
        // This interface is what any item that might be managed by
        // this manager must implement. Note that while I called this
        // class "DownloadManager" there's not really anything here
        // that is specific to downloading. You could use this manager
        // to control a queue of any sort of operation that can be
        // encapsulated in this interface (which could be all sorts of
        // things).
        public interface IDownloadItem
        {
            void Process();
            event EventHandler Done;
        }

        // Object instance to use for locking the queue
        private object _objLock = new object();

        // The queue itself
        private Queue<IDownloadItem> _queue = new Queue<IDownloadItem>();

        // The maximum number of things that should be working
        // at once.
        private int _citemProcessMax;

        // The number of things that are currently working.
        private int _citemProcessCur;

        public DownloadManager(int citemProcessMax)
        {
            _citemProcessMax = citemProcessMax;
        }

        // The client calls this with a reference to an IDownloadItem
        // it wants processed.
        public void NewItem(IDownloadItem item)
        {
            lock (_objLock)
            {
                // As long as we haven't hit our maximum number of
                // items to concurrently process yet, just count the
                // item. Otherwise, queue it up.
                if (_citemProcessCur < _citemProcessMax)
                {
                    _citemProcessCur++;
                }
                else
                {
                    _queue.Enqueue(item);
                    item = null;
                }
            }

            // If we get this far with "item" still being non-null,
            // then it's okay to let it start working. Add ourselves
            // to monitor when it's done, and start it going.
            //
            // Note that this is outside of the lock. This means that if
            // the queue manager _is_ used with non-async type operations,
            // it could still effectively be used by threaded code that is
            // managing multiple operations at a higher level. It's not
            // really how this code was designed to be used, but it would
            // at least work.
            if (item != null)
            {
                item.Done += _DoneEventHandler;
                item.Process();
            }
        }

        // Here's the method that actually does the work when the
        // Done event is raised.
        private void _DoneItem()
        {
            IDownloadItem itemNext = null;

            lock (_objLock)
            {
                // Checks our queue. If there's something in it, get the
                // next item. Otherwise, we've got one fewer item working
                // than we did before.
                if (_queue.Count > 0)
                {
                    itemNext = _queue.Dequeue();
                }
                else
                {
                    _citemProcessCur--;
                }
            }

            // If we dequeued an item above, go ahead and start it just like we
            // did in the NewItem() method. (And again, it's outside the lock)
            if (itemNext != null)
            {
                itemNext.Done += _DoneEventHandler;
                itemNext.Process();
            }
        }

        private void _DoneEventHandler(object sender, EventArgs e)
        {
            _DoneItem();
        }
    }

    // A simple form that allows the user to specify some parameters to
    // control the range of random durations used to create work items, as
    // well as the number of work items that should be created with each click
    // of the "Start" button.
    //
    // In this simulation, clicking the "Start" button while some operations
    // are still going would be equivalent to starting new downloads before the
    // others had finished. Clicking it while no operations are still going
    // would be equivalent to starting new downloads after all the previous
    // ones had completed. Note that there isn't any code to specifically deal
    // with the two different scenarios. Because of the way the queue works,
    // the same logic handles both scenarios without any difficulty.
    public class Form1 : Form
    {
        private const int kcitemsDefault = 1;
        private TimeSpan ktsMinDefault = new TimeSpan(0, 0, 5);
        private TimeSpan ktsMaxDefault = new TimeSpan(0, 0, 20);

        public Form1()
        {
            InitializeComponent();

            tbxOperations.Text = kcitemsDefault.ToString();
            tbxTimeMin.Text = ktsMinDefault.ToString();
            tbxTimeMax.Text = ktsMaxDefault.ToString();
        }

        private Random _rnd = new Random();
        private int _citemSequence;
        private DownloadManager _dm = new DownloadManager(5);

        private void button1_Click(object sender, EventArgs e)
        {
            int citems;
            TimeSpan tsMin, tsMax;

            if (!int.TryParse(tbxOperations.Text, out citems))
            {
                citems = kcitemsDefault;
                tbxOperations.Text = kcitemsDefault.ToString();
            }

            if (!TimeSpan.TryParse(tbxTimeMin.Text, out tsMin))
            {
                tsMin = ktsMinDefault;
                tbxTimeMin.Text = ktsMinDefault.ToString();
            }

            if (!TimeSpan.TryParse(tbxTimeMax.Text, out tsMax))
            {
                tsMax = ktsMaxDefault;
                tbxTimeMax.Text = ktsMaxDefault.ToString();
            }

            for (int iitem = 0; iitem < citems; iitem++)
            {
                // Cast to long, not int: a min/max range of more than about
                // 214 seconds would overflow an int tick count.
                TimeSpan tsItem = new TimeSpan((long)(_rnd.NextDouble() * (tsMax.Ticks - tsMin.Ticks)) + tsMin.Ticks);
                DownloadItem item = new DownloadItem(tsItem);
                ItemStatus status = new ItemStatus(item, "Download #" + (_citemSequence++).ToString() + ", " + tsItem.ToString());

                tableLayoutPanel1.Controls.Add(status);
                _dm.NewItem(item);
            }
        }

        /// <summary>
        /// Required designer variable.
        /// </summary>
        private System.ComponentModel.IContainer components = null;

        /// <summary>
        /// Clean up any resources being used.
        /// </summary>
        /// <param name="disposing">true if managed resources should be disposed; otherwise, false.</param>
        protected override void Dispose(bool disposing)
        {
            if (disposing && (components != null))
            {
                components.Dispose();
            }
            base.Dispose(disposing);
        }

        #region Windows Form Designer generated code

        /// <summary>
        /// Required method for Designer support - do not modify
        /// the contents of this method with the code editor.
        /// </summary>
        private void InitializeComponent()
        {
            this.button1 = new System.Windows.Forms.Button();
            this.label1 = new System.Windows.Forms.Label();
            this.label2 = new System.Windows.Forms.Label();
            this.label3 = new System.Windows.Forms.Label();
            this.label4 = new System.Windows.Forms.Label();
            this.tbxOperations = new System.Windows.Forms.TextBox();
            this.tbxTimeMin = new System.Windows.Forms.TextBox();
            this.tbxTimeMax = new System.Windows.Forms.TextBox();
            this.tableLayoutPanel1 = new System.Windows.Forms.TableLayoutPanel();
            this.SuspendLayout();
            //
            // button1
            //
            this.button1.Location = new System.Drawing.Point(13, 13);
            this.button1.Name = "button1";
            this.button1.Size = new System.Drawing.Size(75, 23);
            this.button1.TabIndex = 0;
            this.button1.Text = "Start";
            this.button1.UseVisualStyleBackColor = true;
            this.button1.Click += new System.EventHandler(this.button1_Click);
            //
            // label1
            //
            this.label1.AutoSize = true;
            this.label1.Location = new System.Drawing.Point(13, 43);
            this.label1.Name = "label1";
            this.label1.Size = new System.Drawing.Size(61, 13);
            this.label1.TabIndex = 1;
            this.label1.Text = "Operations:";
            //
            // label2
            //
            this.label2.AutoSize = true;
            this.label2.Location = new System.Drawing.Point(13, 63);
            this.label2.Name = "label2";
            this.label2.Size = new System.Drawing.Size(106, 13);
            this.label2.TabIndex = 2;
            this.label2.Text = "Duration Parameters:";
            //
            // label3
            //
            this.label3.AutoSize = true;
            this.label3.Location = new System.Drawing.Point(24, 80);
            this.label3.Name = "label3";
            this.label3.Size = new System.Drawing.Size(51, 13);
            this.label3.TabIndex = 3;
            this.label3.Text = "Minimum:";
            //
            // label4
            //
            this.label4.AutoSize = true;
            this.label4.Location = new System.Drawing.Point(24, 105);
            this.label4.Name = "label4";
            this.label4.Size = new System.Drawing.Size(54, 13);
            this.label4.TabIndex = 4;
            this.label4.Text = "Maximum:";
            //
            // tbxOperations
            //
            this.tbxOperations.Location = new System.Drawing.Point(80, 40);
            this.tbxOperations.Name = "tbxOperations";
            this.tbxOperations.Size = new System.Drawing.Size(60, 20);
            this.tbxOperations.TabIndex = 5;
            //
            // tbxTimeMin
            //
            this.tbxTimeMin.Location = new System.Drawing.Point(80, 77);
            this.tbxTimeMin.Name = "tbxTimeMin";
            this.tbxTimeMin.Size = new System.Drawing.Size(60, 20);
            this.tbxTimeMin.TabIndex = 5;
            //
            // tbxTimeMax
            //
            this.tbxTimeMax.Location = new System.Drawing.Point(80, 102);
            this.tbxTimeMax.Name = "tbxTimeMax";
            this.tbxTimeMax.Size = new System.Drawing.Size(60, 20);
            this.tbxTimeMax.TabIndex = 5;
            //
            // tableLayoutPanel1
            //
            this.tableLayoutPanel1.Anchor = ((System.Windows.Forms.AnchorStyles)((((System.Windows.Forms.AnchorStyles.Top
                | System.Windows.Forms.AnchorStyles.Bottom)
                | System.Windows.Forms.AnchorStyles.Left)
                | System.Windows.Forms.AnchorStyles.Right)));
            this.tableLayoutPanel1.AutoScroll = true;
            this.tableLayoutPanel1.ColumnCount = 1;
            this.tableLayoutPanel1.ColumnStyles.Add(new System.Windows.Forms.ColumnStyle(System.Windows.Forms.SizeType.Percent, 100F));
            this.tableLayoutPanel1.Location = new System.Drawing.Point(146, 12);
            this.tableLayoutPanel1.Name = "tableLayoutPanel1";
            this.tableLayoutPanel1.RowCount = 1;
            this.tableLayoutPanel1.RowStyles.Add(new System.Windows.Forms.RowStyle());
            this.tableLayoutPanel1.Size = new System.Drawing.Size(452, 407);
            this.tableLayoutPanel1.TabIndex = 6;
            //
            // Form1
            //
            this.AutoScaleDimensions = new System.Drawing.SizeF(6F, 13F);
            this.AutoScaleMode = System.Windows.Forms.AutoScaleMode.Font;
            this.ClientSize = new System.Drawing.Size(610, 431);
            this.Controls.Add(this.tableLayoutPanel1);
            this.Controls.Add(this.tbxTimeMax);
            this.Controls.Add(this.tbxTimeMin);
            this.Controls.Add(this.tbxOperations);
            this.Controls.Add(this.label4);
            this.Controls.Add(this.label3);
            this.Controls.Add(this.label2);
            this.Controls.Add(this.label1);
            this.Controls.Add(this.button1);
            this.Name = "Form1";
            this.Text = "Form1";
            this.ResumeLayout(false);
            this.PerformLayout();
        }

        #endregion

        private System.Windows.Forms.Button button1;
        private System.Windows.Forms.Label label1;
        private System.Windows.Forms.Label label2;
        private System.Windows.Forms.Label label3;
        private System.Windows.Forms.Label label4;
        private System.Windows.Forms.TextBox tbxOperations;
        private System.Windows.Forms.TextBox tbxTimeMin;
        private System.Windows.Forms.TextBox tbxTimeMax;
        private System.Windows.Forms.TableLayoutPanel tableLayoutPanel1;
    }

    // A convenience control, containing a label and an actual progress bar.
    // By wrapping those in a single UserControl, they can be managed as a single
    // unit by the TableLayoutPanel in the main form.
    public class ItemStatus : UserControl
    {
        // This class is designed to be a viewer of a DownloadItem instance.
        // You need one before creating an instance of this class.
        public ItemStatus(DownloadItem item, string strName)
        {
            InitializeComponent();

            _item = item;
            lblItemName.Text = strName;
            _item.ProgressChanged += _ProgressChangedHandler;
            _item.Done += _DoneHandler;
        }

        private DownloadItem _item;

        // Event handlers for the DownloadItem events. Very simple: when progress
        // has changed, update the ProgressBar instance in this control, and when
        // the processing is done, remove this control instance from its container.
        private void _ProgressChangedHandler(object sender, DownloadItem.ProgressChangedEventArgs e)
        {
            this.BeginInvoke((MethodInvoker)delegate()
            {
                progressBar.Value =
                    progressBar.Minimum +
                    (int)(e.PercentDone * (progressBar.Maximum - progressBar.Minimum));
            });
        }

        private void _DoneHandler(object sender, EventArgs e)
        {
            Control ctlParent = this.Parent;
            DownloadItem item = (DownloadItem)sender;

            item.ProgressChanged -= _ProgressChangedHandler;
            this.BeginInvoke((MethodInvoker)delegate()
            {
                ctlParent.Controls.Remove(this);
                this.Dispose();
            });
        }

        /// <summary>
        /// Required designer variable.
        /// </summary>
        private System.ComponentModel.IContainer components = null;

        /// <summary>
        /// Clean up any resources being used.
        /// </summary>
        /// <param name="disposing">true if managed resources should be disposed; otherwise, false.</param>
        protected override void Dispose(bool disposing)
        {
            if (disposing && (components != null))
            {
                components.Dispose();
            }
            base.Dispose(disposing);
        }

        #region Component Designer generated code

        /// <summary>
        /// Required method for Designer support - do not modify
        /// the contents of this method with the code editor.
        /// </summary>
        private void InitializeComponent()
        {
            this.lblItemName = new System.Windows.Forms.Label();
            this.progressBar = new System.Windows.Forms.ProgressBar();
            this.SuspendLayout();
            //
            // lblItemName
            //
            this.lblItemName.AutoSize = true;
            this.lblItemName.Location = new System.Drawing.Point(4, 4);
            this.lblItemName.Name = "lblItemName";
            this.lblItemName.Size = new System.Drawing.Size(35, 13);
            this.lblItemName.TabIndex = 0;
            this.lblItemName.Text = "label1";
            //
            // progressBar
            //
            this.progressBar.Anchor = ((System.Windows.Forms.AnchorStyles)((((System.Windows.Forms.AnchorStyles.Top
                | System.Windows.Forms.AnchorStyles.Bottom)
                | System.Windows.Forms.AnchorStyles.Left)
                | System.Windows.Forms.AnchorStyles.Right)));
            this.progressBar.Location = new System.Drawing.Point(7, 21);
            this.progressBar.Name = "progressBar";
            this.progressBar.Size = new System.Drawing.Size(354, 23);
            this.progressBar.Style = System.Windows.Forms.ProgressBarStyle.Continuous;
            this.progressBar.TabIndex = 1;
            //
            // ItemStatus
            //
            this.AutoScaleDimensions = new System.Drawing.SizeF(6F, 13F);
            this.AutoScaleMode = System.Windows.Forms.AutoScaleMode.Font;
            this.Controls.Add(this.progressBar);
            this.Controls.Add(this.lblItemName);
            this.Name = "ItemStatus";
            this.Size = new System.Drawing.Size(368, 52);
            this.ResumeLayout(false);
            this.PerformLayout();
        }

        #endregion

        private System.Windows.Forms.Label lblItemName;
        private System.Windows.Forms.ProgressBar progressBar;
    }

    // And of course the main entry point.
    static class Program
    {
        /// <summary>
        /// The main entry point for the application.
        /// </summary>
        [STAThread]
        static void Main()
        {
            Application.EnableVisualStyles();
            Application.SetCompatibleTextRenderingDefault(false);
            Application.Run(new Form1());
        }
    }
}
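The DownloadManager comments suggest that an IDownloadItem whose Process() blocks should be run from a BackgroundWorker so the queue never stalls. As a minimal sketch of that idea, here is an adapter that uses ThreadPool.QueueUserWorkItem instead of BackgroundWorker (a substitution, not something from the thread). IQueueItem is a hypothetical copy of the IDownloadItem shape so the block compiles on its own, and BlockingWorkAdapter is likewise a made-up name.

```csharp
using System;
using System.Threading;

// Hypothetical copy of the IDownloadItem shape, so this block stands alone.
interface IQueueItem
{
    void Process();
    event EventHandler Done;
}

// Makes a blocking piece of work look asynchronous: Process() returns at
// once, and Done is raised from a thread-pool thread when the work ends.
class BlockingWorkAdapter : IQueueItem
{
    readonly Action _work;
    public event EventHandler Done;

    // Any exception thrown by the work, captured instead of crashing the pool thread.
    public Exception Error { get; private set; }

    public BlockingWorkAdapter(Action work) { _work = work; }

    public void Process()
    {
        ThreadPool.QueueUserWorkItem(delegate
        {
            try
            {
                _work();
            }
            catch (Exception ex)
            {
                Error = ex;
            }
            EventHandler handler = Done;
            if (handler != null) handler(this, EventArgs.Empty);
        });
    }

    static void Main()
    {
        ManualResetEvent finished = new ManualResetEvent(false);
        BlockingWorkAdapter item = new BlockingWorkAdapter(delegate { Thread.Sleep(200); });
        item.Done += delegate { finished.Set(); };
        item.Process(); // returns immediately
        Console.WriteLine("Process() returned");
        finished.WaitOne();
        Console.WriteLine("Done raised, error: " + (item.Error == null ? "none" : item.Error.Message));
    }
}
```

Because Done is raised on a pool thread, a UI subscriber still has to marshal back with BeginInvoke, as ItemStatus does.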

Nov 30 '07 #13

jehugaleahsa

On Nov 29, 5:00 pm, Peter Duniho <NpOeStPe...@NnOwSlPiAnMk.com> wrote:

On 2007-11-28 11:00:17 -0800, Peter Duniho <NpOeStPe...@NnOwSlPiAnMk.comsaid:

[...]
For fun, I'm thinking about working on a simple download simulation
that uses a queue to manage the downloads. If and when it's finished,
I'll post the code here in case you or anyone else is interested.
Might not be done today, as I've got a busy day, but maybe tomorrow.

Hi. I finished the simulation. It effectively demonstrates what I'm
talking about with respect to using a queue. You'll see that the queue
class itself is _very_ simple, and yet it contains everything you need
to implement the basic functionality you've described.

It doesn't provide dynamic resizing of the number of allowed concurrent
operations, nor of cancelling an existing operation (whether it's
started working or not). Those things would not be difficult to add
though. The basic logic is easily extended to handle those scenarios.

The application itself is a GUI application. It wasn't strictly
required for the demonstration, but it makes for an easy-to-control
mechanism for adding new work items and providing user feedback for how
those work items are processed.

It's funny...even though I know the program isn't doing any actual
work, there is something oddly satisfying about watching all those
progress bars work their way toward completion. I keep wanting to add
more work items, so I can see more progress bars finish. :)

Anyway, I'm copying the code here (see below). You can create a new,
empty project, add a new source file to the project and copy all of
this verbatim into that single file. You'll need to add references to
System, System.Drawing, and System.Windows.Forms to the project. Then
it should just compile and run.

Enjoy! I tried to provide sufficient comments for the classes and code
to explain the details, bu please feel free to post specific questions
if it's not all clear.

Pete

using System;

using System.Collections.Generic;

using System.Windows.Forms;

using System.Threading;

namespace TestMultiDownloadQueue

{

    // DummyAsync is probably the most complicated class here, and the
    // least interesting. It only exists to provide a class with an
    // async API similar to those available from classes like
    // HttpWebRequest, Socket, Stream, and TcpClient but without
    // requiring any actual work to be done.
    //
    // It has the basic "Begin" and "End" methods to start and complete
    // a simulated async operation. The constructor is passed a TimeSpan
    // that indicates the total time that should be consumed by the
    // simulated operation. The DummyAsync class decides how to divide the
    // total time into partial operations, and each call to the "Begin"
    // method will consume one interval of the total time. In that way,
    // it simulates an async operation that takes a certain amount of time
    // to complete and which requires multiple async calls to the same
    // instance to finish all of the work.
    class DummyAsync
    {
        // Total duration of this simulated operation
        private TimeSpan _tsDuration;

        // Duration of a single simulated processing unit
        // (i.e. one call to BeginDummy())
        private TimeSpan _tsInterval;

        // A list of all of the current async operations for
        // this instance
        private List<AsyncInfo> _rgai = new List<AsyncInfo>();

        // The callback delegate that the client of this class must
        // implement.
        public delegate void DummyCallback(IAsyncInfo iai);

        // Could've used IAsyncResult, but didn't want to bother with
        // supporting the full IAsyncResult model (with a wait handle, etc.)
        public interface IAsyncInfo
        {
            object AsyncState { get; }
        }

        // An actual async state class. Contains the context information
        // for the client, as well as this class's own internal state
        // information
        private class AsyncInfo : IAsyncInfo
        {
            // I don't normally make public fields, but for this very simple
            // class in a very specific test harness, I decided to not bother
            // wrapping them in properties.
            public DummyCallback callback;
            public object context;
            public TimeSpan ts;

            // Stores a one-shot timer that will call us back when the
            // simulated processing is done
            private System.Threading.Timer _timer;

            public AsyncInfo(DummyCallback callback, object context)
            {
                this.callback = callback;
                this.context = context;
            }

            public void Start(TimeSpan ts, TimerCallback callback)
            {
                this.ts = ts;
                _timer = new System.Threading.Timer(callback, this,
                    this.ts, TimeSpan.Zero);
            }

            #region IAsyncInfo Members

            public object AsyncState
            {
                get { return context; }
            }

            #endregion
        }

        public DummyAsync(TimeSpan tsDuration)
        {
            _tsDuration = tsDuration;
            _tsInterval = new TimeSpan(_tsDuration.Ticks / 100);

            // No need to abuse our timers
            if (_tsInterval < new TimeSpan(0, 0, 0, 0, 100))
            {
                _tsInterval = new TimeSpan(0, 0, 0, 0, 100);
            }
        }

        // When the client calls this method, we create a new state object,
        // tell it to start a timer, update our duration to indicate how
        // much additional processing might be required, and save the new
        // state object to our list of current async operations.
        public void BeginDummy(DummyCallback callback, object context)
        {
            AsyncInfo ai = new AsyncInfo(callback, context);

            ai.Start(new TimeSpan(Math.Min(_tsDuration.Ticks,
                _tsInterval.Ticks)), _TimerCallback);
            _tsDuration = new TimeSpan(Math.Max(0, _tsDuration.Ticks -
                _tsInterval.Ticks));
            _rgai.Add(ai);
        }

        // The client must call this method after being notified of the
        // completion of our simulated async operation via the callback.
        // This method removes the state object from our list of current
        // async operations and returns to the user the time spent on
        // the simulated processing, representing some amount of progress
        // toward completion of the simulated operation.
        public TimeSpan EndDummy(IAsyncInfo iai)
        {
            AsyncInfo ai = (AsyncInfo)iai;

            _rgai.Remove(ai);
            return ai.ts;
        }

        // Our _own_ callback method, used for the timer that is used to
        // simulate the work. Not to be confused with the client's callback,
        // though the client's callback is in fact called from here.
        private void _TimerCallback(object context)
        {
            AsyncInfo ai = (AsyncInfo)context;

            ai.callback(ai);
        }
    }

    // DownloadItem represents a single work item that the controller
    // class managing the downloads knows about. For the manager's
    // benefit, it implements a specific interface that the manager
    // requires. The rest of the class is specific to our simulated work,
    // but it would be trivial to replace that code with code that uses
    // some class that does actual work via async methods.
    public class DownloadItem : DownloadManager.IDownloadItem
    {
        // The ProgressChanged event is not required for the demonstration
        // of the queue, but is useful for providing user feedback regarding
        // the progress of the operation. A real-world download object
        // would likely have something similar, whether explicitly in its
        // own class or implemented already by a class it uses (like
        // BackgroundWorker).
        //
        // This ProgressChanged event, unlike the one in BackgroundWorker,
        // provides a real number as the percentage information, allowing
        // finer granularity than 1% intervals. The float ranges from 0.0
        // to 1.0, so strictly speaking it's not a "percent" value. But
        // the name is still reasonably suggestive as to the relationship
        // between the value and the progress.
        #region ProgressChanged event members

        public class ProgressChangedEventArgs : EventArgs
        {
            private float _percentDone;

            public ProgressChangedEventArgs(float percentDone)
            {
                _percentDone = percentDone;
            }

            public float PercentDone
            {
                get { return _percentDone; }
            }
        }

        public delegate void ProgressChangedEventHandler(object sender,
            ProgressChangedEventArgs e);

        public event ProgressChangedEventHandler ProgressChanged;

        #endregion

        public DownloadItem(TimeSpan tsDuration)
        {
            _tsDuration = tsDuration;
        }

        #region IDownloadItem Members

        // Even in a real-world async operation object, the Process() method
        // could be nearly as simple as this. This class takes a duration at
        // construction, whereas a real-world class might take some kind of web
        // address object or a string that represents one. This class
        // creates an instance of our DummyAsync class, whereas a real-world
        // class might create an HttpWebRequest instance. Finally, this class
        // calls the "Begin" method for the async API of the DummyAsync class,
        // whereas a real-world class would call the "Begin" method of whatever
        // async-capable class it's using.
...
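
The Process() body is cut off above, so the following is only a hedged sketch of the chaining pattern its comments describe, not Pete's actual code: every completion callback reports progress as a float from 0.0 to 1.0 and, if work remains, issues the next "Begin" call on the same object. ChunkedDownloadSketch and its members are hypothetical names, and ThreadPool.QueueUserWorkItem stands in for a real Begin method such as Stream.BeginRead or HttpWebRequest.BeginGetResponse.

```csharp
using System;
using System.Threading;

// Hypothetical sketch: a download-like object whose work is split into
// chunks. Each chunk is started by a Begin-style call and completes on a
// thread-pool thread, which reports progress and starts the next chunk.
public class ChunkedDownloadSketch
{
    private readonly int _totalChunks;
    private int _chunksDone;
    private readonly ManualResetEvent _finished = new ManualResetEvent(false);

    // Progress from 0.0 to 1.0, like DownloadItem's ProgressChanged event.
    public event Action<float> ProgressChanged;

    public ChunkedDownloadSketch(int totalChunks)
    {
        if (totalChunks <= 0)
        {
            throw new ArgumentOutOfRangeException("totalChunks");
        }
        _totalChunks = totalChunks;
    }

    // Signaled once the last chunk has completed.
    public WaitHandle Finished
    {
        get { return _finished; }
    }

    // Like Process(): kick off the first async step and return immediately.
    public void Process()
    {
        BeginChunk();
    }

    // Stand-in for a real "Begin" call; the pool thread simply runs the
    // completion callback.
    private void BeginChunk()
    {
        ThreadPool.QueueUserWorkItem(ChunkCompleted);
    }

    // Completion callback: record progress, then chain the next step.
    // Chunks run one after another, so _chunksDone is never touched by
    // two threads at once.
    private void ChunkCompleted(object state)
    {
        _chunksDone++;

        Action<float> handler = ProgressChanged;
        if (handler != null)
        {
            handler((float)_chunksDone / _totalChunks);
        }

        if (_chunksDone < _totalChunks)
        {
            BeginChunk();
        }
        else
        {
            _finished.Set();
        }
    }
}
```

ProgressChanged is raised on a thread-pool thread here, just as it would be from a real async callback, so a Windows Forms handler would need Control.Invoke or Control.BeginInvoke before touching a ProgressBar.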

I like it. I am going to play with it right now. Watching progress
bars is fun. :-)

Nov 30 '07 #14

jehugaleahsa

On Nov 29, 7:47 am, "Ben Voigt [C++ MVP]" <r...@nospam.nospam> wrote:

<jehugalea...@gmail.com> wrote in message

news:15**********************************@o42g2000 hsc.googlegroups.com...

On Nov 28, 3:07 pm, "jehugalea...@gmail.com" <jehugalea...@gmail.com>
wrote:
With Semaphores, for example:

public class DownloadManager
{
    WebExtractor extractor = new WebExtractor(/* Arguments */);
    bool cancelled = false;
    object cancelSync = new object();
    Semaphore semaphore = new Semaphore(5, 5);

    public void DownloadFiles()
    {
        // BEGIN THREAD
        // WebExtractor.Start yield returns Downloads as they are found.
        foreach (Download download in extractor.Start())
        {
            // Wait for a free slot, then add event handlers and start
            semaphore.WaitOne();
            download.StatusChanged += status_Changed;
            download.Start();
            lock (cancelSync)
            {
                if (cancelled)
                {
                    break;
                }
            }
        }
        // END THREAD
    }

    private void status_Changed(object sender, StatusChangedEventArgs e)
    {
        if (e.Status == DownloadStatus.Complete)
        {
            semaphore.Release();
        }
    }

    public void Cancel()
    {
        // BEGIN THREAD
        lock (cancelSync)
        {
            cancelled = true;
        }
        // END THREAD
    }
}

Actually, it appears that using Semaphores with WebClient is a no-no.

That surprises me.

I thought you might have some issues with the spidering/page parsing not
running until there is a download slot available, and the code you posted
clearly won't cancel the spider until one of the downloads completes
(perhaps you can cancel each download somehow).

What exactly is going wrong? Does it help to use BeginInvoke to perform the
download from a thread other than the one holding the semaphore?
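
Ben's cancellation concern can be addressed by waiting on the semaphore and a cancel signal together. The sketch below assumes the bool flag is replaced by a ManualResetEvent; the CancellableSlot helper and the cancel handle are hypothetical, not from the thread. WaitHandle.WaitAny returns as soon as either handle is signaled, so Cancel() no longer has to wait for a download to finish.

```csharp
using System.Threading;

// Hypothetical helper: wait for a free download slot OR a cancel request,
// whichever comes first, instead of blocking on the semaphore alone.
public static class CancellableSlot
{
    // Returns true if a slot was taken from 'slots'; false if 'cancel'
    // was signaled (in which case no slot is taken).
    public static bool TryAcquire(Semaphore slots, WaitHandle cancel)
    {
        // WaitAny returns the index of the handle that satisfied the wait.
        // If several are signaled, the smallest index is reported, so
        // listing 'cancel' first gives a pending cancel priority.
        int index = WaitHandle.WaitAny(new WaitHandle[] { cancel, slots });
        return index == 1;
    }
}
```

In the posted loop, semaphore.WaitOne() would become something like "if (!CancellableSlot.TryAcquire(semaphore, cancelEvent)) break;", and Cancel() would call cancelEvent.Set(). This only stops new downloads from starting; downloads already running would still need their own cancellation (for example WebClient.CancelAsync).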

I actually got the Semaphore to work this evening. For some unknown
reason, my download progress events never seemed to fire. I am not sure
what it was, but I removed a little piece of code that checked the file
size before downloading and it worked great.

So, yeah, it is okay to use WebClient and Semaphore. It had something
to do with using HttpWebRequest and WebClient. The WebRequest would
hang indefinitely unless I set its Timeout to some small number,
like 2 or 3. Even 20 (Timeout is in milliseconds) totally hung
the system. I will have to investigate it later. However, your
solution worked very well. I now have 5 concurrent downloads and they
all update the interface individually. I am really happy.

Thank you for your suggestion of Semaphores!

Nov 30 '07 #15
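
For reference, here is a hedged, self-contained sketch of the throttling pattern the thread settled on: a Semaphore caps how many items produced by a yield return enumerator run at once. ThrottledRunner, FindItems, and the Action<int> work delegate are hypothetical stand-ins for DownloadManager, WebExtractor.Start(), and Download.Start(). Here the work runs synchronously on a pool thread, whereas the real Download is itself asynchronous. One deliberate difference from the posted code: the slot is returned in a finally block whenever an item finishes. The posted status_Changed releases only on DownloadStatus.Complete, so if a Download can also end in an error state, that branch would need to release too.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class ThrottledRunner
{
    // Hypothetical producer standing in for WebExtractor.Start().
    public static IEnumerable<int> FindItems(int count)
    {
        for (int i = 0; i < count; i++)
        {
            yield return i;
        }
    }

    // Runs work(item) for every item, never more than maxConcurrent at a
    // time. Returns the highest concurrency actually observed.
    public static int Run(IEnumerable<int> items, int maxConcurrent, Action<int> work)
    {
        Semaphore slots = new Semaphore(maxConcurrent, maxConcurrent);
        object sync = new object();
        int running = 0, peak = 0;

        foreach (int item in items)
        {
            slots.WaitOne();              // block the producer until a slot frees
            lock (sync)
            {
                running++;
                peak = Math.Max(peak, running);
            }

            int current = item;           // copy for the closure
            ThreadPool.QueueUserWorkItem(delegate
            {
                try
                {
                    work(current);
                }
                finally
                {
                    lock (sync) { running--; }
                    slots.Release();      // always give the slot back
                }
            });
        }

        // Wait for the outstanding items by reclaiming every slot.
        for (int i = 0; i < maxConcurrent; i++)
        {
            slots.WaitOne();
        }
        return peak;
    }
}
```

Because the producer loop blocks in WaitOne, the enumerator does no parsing while all slots are busy, which is the spidering delay Ben points out. Calling Run from its own thread (the "BEGIN THREAD" region in the posted code) keeps the interface responsive.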
