Bytes IT Community

pushing the envelope with sockets

I've got a client/server app that I used to send large amounts of data via
UDP to the client. We use it in various scenarios, one of which includes
rendering a media file on the client as it is transferred via the underlying
UDP transport. In this scenario it is very important to keep CPU usage as
low as possible.

Both the client and server were originally written in C++, but I've
re-written the client in C#, partly to simplify it, but also just to see if
C# is up to the task. I can come pretty close to keeping up in terms of
throughput (within tolerable limits), but I'm finding CPU usage is an issue
at times.

I'm using asynchronous I/O, and in the receive handler I issue another
BeginReceiveFrom immediately in order to have an I/O ready to receive as
quickly as possible. This works very well, but I find that my CPU usage will
suddenly (and apparently randomly) increase dramatically (ie, from < 30% to
about 60% on average). I've added performance counters, and identified that
the majority of the increase in CPU time is spent in the BeginReceiveFrom
call. This is surprising to me for a couple of reasons. First, this is a
non-blocking call, so I would expect it to return quickly whether data is
received or not. I also don't see any corresponding increase/decrease in
either the number packets received per second or the number of reads I
complete per second (which tracks very closely to UDP packets/second), so I
don't believe the number of BeginReceiveFrom calls I'm making is changing
commensurately.

So my question is, is there anything in the Socket implementation that might
be causing this unexpected increase in CPU time? I've looked at
BeginReceiveFrom with .Net Reflector, and I'm wondering if perhaps the call
to ThreadPool.RegisterWaitForSingleObject might be stalling. I'm issuing on
the order of about 3500 I/Os per second, but I don't see my ThreadPool
availability being impinged. Any ideas? Thanks.
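For concreteness, the receive pattern described above looks roughly like this (a minimal sketch, not the actual application code; the class and member names are illustrative, and real code would add error handling):

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading;

// Async UDP receive loop: the completion handler re-issues
// BeginReceiveFrom immediately so an I/O is always pending.
class UdpReceiver
{
    Socket sock;
    byte[] buf = new byte[2048];              // ~1KB packets, with headroom
    EndPoint remote = new IPEndPoint(IPAddress.Any, 0);
    public int PacketsReceived;               // bumped per completed read

    public UdpReceiver(int port)
    {
        sock = new Socket(AddressFamily.InterNetwork, SocketType.Dgram,
                          ProtocolType.Udp);
        sock.Bind(new IPEndPoint(IPAddress.Loopback, port));
    }

    public void Start()
    {
        sock.BeginReceiveFrom(buf, 0, buf.Length, SocketFlags.None,
                              ref remote, OnReceive, null);
    }

    void OnReceive(IAsyncResult ar)
    {
        int read = sock.EndReceiveFrom(ar, ref remote);
        // Re-issue first, then process, to minimize the window with no
        // pending I/O (during which a datagram could be dropped).
        sock.BeginReceiveFrom(buf, 0, buf.Length, SocketFlags.None,
                              ref remote, OnReceive, null);
        Interlocked.Increment(ref PacketsReceived);
        // ... hand the `read` bytes off for processing here ...
    }
}
```

Note that in real code the handler would swap in a fresh (or pooled) buffer before re-issuing, since the new receive would otherwise overwrite the data still being processed.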
May 23 '06 #1
5 Replies




Dan Ritchie wrote:
> I'm using asynchronous I/O, and in the receive handler I issue another
> BeginReceiveFrom immediately in order to have an I/O ready to receive as
> quickly as possible. This works very well, but I find that my CPU usage will
Async IO can potentially avoid spending threads in massively parallel
applications, but why would it reduce CPU-load?
> suddenly (and apparently randomly) increase dramatically (ie, from < 30% to
> about 60% on average). I've added performance counters, and identified that
During IO, or changing from blocking to non-blocking IO?

30 vs. 60% over how long? Comparing relative CPU usage doesn't make much
sense if it's not known what it's relative to.

Try timing:

- receiving the way you do now
- receiving into a large buffer, but less often -- while still not
exceeding the OS-buffer-size, something like:

byte[] large_buf = ...;
while (true) {
    Receive(large_buf, ...);
    Thread.Sleep(TimeSpan.FromSeconds(0.1));
}

Which spends more cpu? (it's not the same as which completes first!)
> the majority of the increase in CPU time is spent in the BeginReceiveFrom
> call. This is surprising to me for a couple of reasons. First, this is a
> non-blocking call, so I would expect it to return quickly whether data is
> received or not. I also don't see any corresponding increase/decrease in
BeginReceiveFrom may complete synchronously, or it may need to queue in
an IO-completion-port for the next receive. The latter will probably
cost a different amount of CPU (although I haven't measured it).

Have you tried reading synchronously with a very large buffer, just to
see how that measures up to your async-IO?

If you re-issue BeginReceive immediately you must be allocating a new
receive buffer for each receive. Perhaps a less CPU-intensive approach
would be to have only one buffer and process it before re-issuing the receive?
> either the number packets received per second or the number of reads I
> complete per second (which tracks very closely to UDP packets/second), so I
> don't believe the number of BeginReceiveFrom calls I'm making is changing
> commensurately.
You may be seeing switch overhead. The faster you BeginReceive, the
fewer packets the OS will have queued up for you and the more calls to
Begin/End-Receive will be executed.
> So my question is, is there anything in the Socket implementation that might
> be causing this unexpected increase in CPU time? I've looked at
> BeginReceiveFrom with .Net Reflector, and I'm wondering if perhaps the call
> to ThreadPool.RegisterWaitForSingleObject might be stalling. I'm issuing on
> the order of about 3500 I/Os per second, but I don't see my ThreadPool
> availability being impinged. Any ideas? Thanks.


I would try reading in large chunks instead of small ones if low
cpu-usage is the goal, at least just to see if there's a difference.

Minimizing latency, or using the thread-count resource efficiently, is a
different matter from minimizing CPU usage.

--
Helge
May 23 '06 #2

Thanks for the reply (comments below):

"Helge Jensen" wrote:


> Dan Ritchie wrote:
>> I'm using asynchronous I/O, and in the receive handler I issue another
>> BeginReceiveFrom immediately in order to have an I/O ready to receive as
>> quickly as possible. This works very well, but I find that my CPU usage will
> Async IO can potentially avoid spending threads in massively parallel
> applications, but why would it reduce CPU-load?


It wouldn't, but CPU is not the ONLY issue. With UDP, if an I/O has not
been issued when a packet is received the packet is dropped. Using async I/O
and issuing subsequent reads immediately in the handler is the only way I was
able to keep up with the amount of data coming in (~30 Mbps in ~1Kb packets).
>> suddenly (and apparently randomly) increase dramatically (ie, from < 30% to
>> about 60% on average). I've added performance counters, and identified that
> During IO, or changing from blocking to non-blocking IO?


This is overall CPU usage. All I/O is asynchronous (non-blocking). The
point here is that without any apparent change in either the number of UDP
packets received or the number of reads being completed, CPU usage suddenly
jumps after holding steady for some time.

> 30 vs. 60% of how long? comparing relative CPU-usage doesn't make much
> sense if it's not known what it's relative to.
30% or 60% of available CPU (ie, out of 100% for a given time slice in
perfmon). CPU usage holds steady at under 30% on average for, say about 30
seconds, then jumps to 60% on average and stays that way. Again, the point
is there is nothing that changes in the load to explain the additional usage.

> Try timing:
>
> - receiving the way you do now
> - receiving into a large buffer, but less often -- while still not
>   exceeding the OS-buffer-size, something like:
>
> byte[] large_buf = ...;
> while (true) {
>     Receive(large_buf, ...);
>     Thread.Sleep(TimeSpan.FromSeconds(0.1));
> }
>
> Which spends more cpu? (it's not the same as which completes first!)
>> the majority of the increase in CPU time is spent in the BeginReceiveFrom
>> call. This is surprising to me for a couple of reasons. First, this is a
>> non-blocking call, so I would expect it to return quickly whether data is
>> received or not. I also don't see any corresponding increase/decrease in
> BeginReceiveFrom may complete synchronously, or it may need to queue in
> an IO-completion-port for the next receive. The latter will probably
> spend more or less CPU (although I haven't measured it).


Either way the call itself should not block. They always complete
asynchronously (ie, the handler is invoked), although it is possible for the
handler to be invoked on the calling thread.

> Have you tried reading synchronously with a very large buffer, just to
> see how that measures up to your async-IO?
Yes. Async I/O does better. It's the only way I can keep up with the send
rate.

> If you re-issue BeginReceive immediately you must be allocating a new
> receive-buffer for each receive. Perhaps a less CPU-intensive approach
> would be to only have one buffer and process that before re-issuing receive?
See above, I must re-issue I/Os quickly in order to avoid UDP packet loss.
And again, the CPU usage is not consistent throughout the run (in fact it
changes suddenly after remaining steady) despite the fact that the I/O load
is consistent. Through measurement, I've determined that the vast majority
of the additional CPU time is spent inside BeginReceiveFrom, not in
allocating buffers, etc. And if you're wondering, the amount of time it
spends in garbage collection remains consistent as well. Before the jump in
CPU usage, the amount of time spent in BeginReceiveFrom is a small percentage
of the total CPU usage. When the total CPU usage jumps, the amount of time
spent in BeginReceiveFrom becomes a significant portion (about half) of the
total CPU usage.
>> either the number packets received per second or the number of reads I
>> complete per second (which tracks very closely to UDP packets/second), so I
>> don't believe the number of BeginReceiveFrom calls I'm making is changing
>> commensurately.
> You may be seeing switch-overhead. The faster you BeginReceive, the
> fewer packets the OS will have queued up for you and the more calls to
> Begin/End-Receive will be executed.


There's no queueing here beyond issuing the next I/O. That's because the
next I/O isn't issued until the previous one completes (the handler issues
the next I/O).
>> So my question is, is there anything in the Socket implementation that might
>> be causing this unexpected increase in CPU time? I've looked at
>> BeginReceiveFrom with .Net Reflector, and I'm wondering if perhaps the call
>> to ThreadPool.RegisterWaitForSingleObject might be stalling. I'm issuing on
>> the order of about 3500 I/Os per second, but I don't see my ThreadPool
>> availability being impinged. Any ideas? Thanks.


> I would try reading in large chunks instead of small ones if low
> cpu-usage is the goal, at least just to see if there's a difference.
>
> It's another matter if low latency or efficient usage of the
> number-of-threads resource is required than if you are trying to
> minimize CPU-usage.
>
> --
> Helge

May 23 '06 #3



"Helge Jensen" wrote:


Dan Ritchie wrote:
Thanks for the reply (comments below):

> It wouldn't, but CPU is not the ONLY issue. With UDP, if an I/O has not
> been issued when a packet is received the packet is dropped. Using async I/O
> and issuing subsequent reads immediately in the handler is the only way I was
> able to keep up with the amount of data coming in (~30 Mbps in ~1Kb packets).


Then you must be doing some handling before issuing the next read?

I still don't understand why you expect async-io to be able to *receive*
faster than sync-io.

The OS *does* buffer UDP packets; they are not discarded just because no
receive is pending on the socket (up to the buffer size), and you can
even change what happens if the buffer runs full. If your data comes in
"too fast", probably the simplest way to increase performance is to make
the buffer larger via Socket.ReceiveBufferSize.
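Concretely, that would be something along these lines (the sizes are illustrative, and the OS may clamp or round whatever value you ask for):

```csharp
using System.Net;
using System.Net.Sockets;

class BufferSizeDemo
{
    static void Main()
    {
        Socket sock = new Socket(AddressFamily.InterNetwork,
                                 SocketType.Dgram, ProtocolType.Udp);
        sock.Bind(new IPEndPoint(IPAddress.Loopback, 0));

        // .NET 2.0 convenience property:
        sock.ReceiveBufferSize = 1024 * 1024;   // ask for ~1 MB of OS buffering

        // Equivalent lower-level call:
        sock.SetSocketOption(SocketOptionLevel.Socket,
                             SocketOptionName.ReceiveBuffer, 1024 * 1024);
    }
}
```

A bigger OS buffer widens the window in which a burst of datagrams can arrive before the application gets around to its next receive, which is often a cheaper fix than restructuring the I/O.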

I also don't understand how async-io should be able to resume listening
faster than sync io.

The following code is stripped of everything but the IO. Can you point
out where you dispatch to processing the input, and why using async IO
would be better than doing the *processing* of the input asynchronously?

(new_buffer may be freshly allocated, or reused from a pool or whatnot.)

async:

void continueRead(IAsyncResult r)
{
    int read = S.EndReceive(r);
    if (read != 0)
        S.BeginReceive(new_buffer, 0, new_buffer.Length,
                       SocketFlags.None, continueRead, new_buffer);
    // process(read)
}

sync:

int read;
do
{
    read = s.Receive(new_buffer);
    // AsyncProcess(read)
} while (read != 0);

> 30% or 60% of available CPU (ie, out of 100% for a given time slice in
> perfmon). CPU usage holds steady at under 30% on average for, say about 30
> seconds, then jumps to 60% on average and stays that way. Again, the point
> is there is nothing that changes in the load to explain the additional usage.


Okay, so during *one* run, the CPU spent increases even though the input
data is expected to be equivalent wrt. processing.
>>> the majority of the increase in CPU time is spent in the BeginReceiveFrom
>>> call. This is surprising to me for a couple of reasons. First, this is a
>>> non-blocking call, so I would expect it to return quickly whether data is
>>> received or not. I also don't see any corresponding increase/decrease in
Are you using async IO on non-blocking sockets? That is effectively a
busy loop, which will consume all CPU if no input is available.
Non-blocking sockets and IO completion ports aren't a good combination.

That hypothesis could fit with your observations: after your code has
caught up with the OS-buffered data, it starts burning CPU by busy-waiting
when no data is available. How much CPU is spent when *no* data is
received? Just to confirm you are not having that problem.
>> BeginReceiveFrom may complete synchronously, or it may need to queue in
>> an IO-completion-port for the next receive. The latter will probably
>> spend more or less CPU (although I haven't measured it).
> Either way the call itself should not block. They always complete


I will assume you are using blocking sockets here, since async IO and
non-blocking sockets are not a good combination.

If data is available synchronously, the call to BeginReceive may fill the
buffer using the invoking thread. You can check
IAsyncResult.CompletedSynchronously.

If no data is available in the OS buffer, BeginReceive will need to queue
a new wait on the IO completion port, update some data structures... This
*may* require additional processing, I don't know for sure.

An async IO operation *may* be able to receive the data directly into
its allocated buffer. I don't know for sure.

However, I suspect that the increased CPU usage is due to
context-switching a lot more when the OS buffer is empty at the time you
issue your receive.
>> Have you tried reading synchronously with a very large buffer, just to
>> see how that measures up to your async-IO?
> Yes. Async I/O does better. It's the only way I can keep up with the send
> rate.


That's pretty weird, 30Mbps should be easily received on a modestly
modern machine by either sync or async IO.
>> If you re-issue BeginReceive immediately you must be allocating a new
>> receive-buffer for each receive. Perhaps a less CPU-intensive approach
>> would be to only have one buffer and process that before re-issuing receive?
> See above, I must re-issue I/Os quickly in order to avoid UDP packet loss.


You could try increasing the buffer-size.
> And again, the CPU usage is not consistent throughout the run (in fact it
> changes suddenly after remaining steady) despite the fact that the I/O load
> is consistent. Through measurement, I've determined that the vast majority
> of the additional CPU time is spent inside BeginReceiveFrom, not in
> allocating buffers, etc. And if you're wondering, the amount of time it


That's exactly why I suspect that BeginReceive is more expensive when it
cannot complete from the buffer.
>> You may be seeing switch-overhead. The faster you BeginReceive, the
>> fewer packets the OS will have queued up for you and the more calls to
>> Begin/End-Receive will be executed.
> There's no queueing here beyond issuing the next I/O. That's because the
> next I/O isn't issued until the previous one completes (the handler issues
> the next I/O).


But there will still (possibly) be additional context-switching if no
data is ready. If it were me, I would try to prove that context-switching
isn't the problem with a simple test application that does no processing
of the received data at all.

I have attached a test program which explores async vs. sync IO. It's
not to be trusted 100% as it all runs locally, but I think it does
support my views enough to justify further arguments for using async IO.

The output of a run of the program on my machine is also attached.

--
Helge

May 25 '06 #4

Sorry Helge, ignore my previous post (no additional content). I wrote a
rather lengthy reply, during which time my authentication expired and
apparently the wrong data was posted.

Anyway, I am using .Net 1.1, and from what I can see, UDP packets are not
buffered in 1.1. The Socket.ReceiveBufferSize property you mention is new to
2.0, so it would seem 2.0 does indeed buffer UDP. I have been thinking about
trying 2.0 to solve the problem, and even if the same problem persists, it
might then be possible to fix it with synchronous I/O.
"Helge Jensen" wrote:


Dan Ritchie wrote:
Thanks for the reply (comments below):

It wouldn't, but CPU is not the ONLY issue. With UDP, if an I/O has not
been issued when a packet is received the packet is dropped. Using async I/O
and issuing subsequent reads immediately in the handler is the only way I was
able to keep up with the amount of data coming in (~30 Mbps in ~1Kb packets).


Then you must be doing some handling before issuing the next read?

I still don't understand why you expect async-io to be able to *receive*
faster than sync-io.

The OS *does* buffer UDP-pacakges, they are not discarded if no-one is
receiving on the socket they are received (upto the buffer size), you
can even change what happens if the buffer runs full. If
your data comes in "too fast", probably the simplest way to increase
performance is to make the buffer larger via Socket.ReceiveBufferSize.

I also don't understand how async-io should be able to resume listening
faster than sync io.

The following code is cleaned from non-IO. Can you suggest to me where
you dispatch to processing the input, and why using async-IO would be
better than doing the *processing* of the input asynchonously.

(new_buffer may be freshly allocated, or reused from a pool or whatnot.

async:

void continueRead(IAsyncResult r)
{
int read = S.EndReceive(r);
if ( read != 0 )
S.BeginReceive(new_buffer, 0, new_buffer.Length,
SocketFlags.None, continueRead, new_buffer);
// process(read)
}

sync:

int read;
do
{
read = s.Receive(new_buffer);
// AsyncProcess(read)
} while (read != 0);

30% or 60% of available CPU (ie, out of 100% for a given time slice in
perfmon). CPU usage holds steady at under 30% on average for, say about 30
seconds, then jumps to 60% on average and stays that way. Again, the point
is there is nothing that changes in the load to explain the additional usage.


Okay, so during *one* run, the cpu spent increases when input-data is
excepted to be considered equivalent wrt. processing.
the majority of the increase in CPU time is spent in the BeginReceiveFrom
call. This is surprising to me for a couple of reasons. First, this is a
non-blocking call, so I would expect it to return quickly whether data is
received or not. I also don't see any corresponding increase/decrease in
Are you using async-IO on non-blocking sockets? that is effectively a
busy loop, which will consume all CPU if no input is available.
non-blocking sockets and IO-completion-ports isn't a good combination.

That hypothesis could fit with your observations. After your
code has been able to catch-up the OS-buffered data it starts burning
CPU by busy-waiting when no data is available. How much CPU is spent
when *no* data is received? just to confirm you are not having that problem.
BeginReceiveFrom may complete synchronously, or it may need to queue in
an IO-completion-port for the next receive. The latter will probably
spend more or less CPU (although I haven't measured it).


Either way the call itself should not block. They always complete


I will assume you are using blocking sockets here, since async-io and
non-blocking sockets is not a good combination.

If data is available synchonously the call to BeginReceive may fill the
buffer using the invoking thread. You can check
IAsyncResult.CompletedSynchonously.

If no data is available in the OS-buffer BeginReceive will need to begin
a new IO-completion port, update some data-structures... This *may*
require additional processing, I don't know for sure.

An async-IO operation *may* be able to receive the data directly into
it's allocated buffer. I don't known for sure.

However, I suspect that the increased cpu-usage is due to
context-switching a lot more when the OS-buffer is empty when you issue
your receive.
Have you tried reading synchronously with a very large buffer, just to
see how that measures up to your async-IO?


Yes. Async I/O does better. It's the only way I can keep up with the send
rate.


That's pretty weird, 30Mbps should be easily received on a modestly
modern machine by either sync or async IO.
If you re-issue BeginReceive immidiatly you must be allocating a new
receive-buffer for each receive. Perhaps a less CPU-intensive approach
would be to only have one buffer and process that before re-issuing receive?


See above, I must re-issue I/Os quickly in order to avoid UDP packet loss.


You could try increasing the buffer-size.
And again, the CPU usage is not consistent throughout the run (in fact it
changes suddenly after remaining steady) despite the fact that the I/O load
is consistent. Through measurement, I've determined that the vast majority
of the additional CPU time is spent inside BeginReceiveFrom, not in
allocating buffers, etc. And if you're wondering, the amount of time it


That's exactly why i suspect that BeginReceive is more expensive when it
cannot complete from the buffer.
You may be seeing switch-overhead. The faster you BeginReceive, the
fewer packages the OS will have queued up for you and the more calls to
Begin/End-Receive will be executed.


There's no queueing here beyond issuing the next I/O. That's because the
next I/O isn't issued until the previous one completes (the handler issues
the next I/O).


But there will still (possibly) be additional context-switching if no
data is ready. If it were me I would try to prove that context-switching
isn't the problem is a simple test application that does no processing
of the received data at all.

I have attached a test-program which explores async vs. sync IO. It's
not to be trusted 100% as it all runs locally, but I think it does
support my views enough to justify further arguments for using async IO.

The output of a run of the the program on my machine is also attached.

--
Helge

May 25 '06 #5

Dan Ritchie wrote:
> Sorry Helge, ignore my previous post (no additional content). I wrote a
> rather lengthy reply, during which time my authentication expired and
> apparently the wrong data was posted.
That sometimes happens to me too when using web-based interfaces. I use
a real NNTP client, Thunderbird, for USENET though.
> Anyway, I am using .Net 1.1, and from what I can see, UDP packets are not
> buffered in 1.1. The Socket.ReceiveBufferSize property you mention is new to
> 2.0, so it would seem 2.0 does indeed buffer UDP. I have been thinking about
> trying 2.0 to solve the problem, and even if the same problem persists, it
> might then be possible to fix it with synchronous I/O.


It's not the runtime that buffers IO on sockets, it's the OS. You can
use Socket.SetSocketOption to achieve the same effect (which I'll bet
ReceiveBufferSize is a convenience wrapper for).

I'm still not convinced that async IO will help you achieve fewer
packet drops. Perhaps you are seeing a side-effect of the restructuring
of your code that was needed to fit it to async IO?

You could try doing your I/O synchronously, and the processing of it
asynchronously, for example using the producer/consumer pattern.
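A sketch of that pattern (the names are illustrative; it uses a plain Queue guarded by a lock, in keeping with what was available in .NET 1.1):

```csharp
using System.Collections;
using System.Net;
using System.Net.Sockets;
using System.Threading;

// Producer/consumer: one thread blocks in Receive() and enqueues each
// datagram; a second thread dequeues and does the (possibly slow) work.
class PumpAndProcess
{
    Queue queue = new Queue();     // guarded by lock(queue)
    Socket sock;
    public int Processed;          // packets fully processed so far

    public PumpAndProcess(int port)
    {
        sock = new Socket(AddressFamily.InterNetwork, SocketType.Dgram,
                          ProtocolType.Udp);
        sock.Bind(new IPEndPoint(IPAddress.Loopback, port));
        Thread rx = new Thread(ReceiveLoop);
        rx.IsBackground = true;
        rx.Start();
        Thread worker = new Thread(ProcessLoop);
        worker.IsBackground = true;
        worker.Start();
    }

    void ReceiveLoop()
    {
        while (true)
        {
            byte[] buf = new byte[2048];
            sock.Receive(buf);     // blocking; the OS buffers while we loop
            lock (queue) { queue.Enqueue(buf); Monitor.Pulse(queue); }
        }
    }

    void ProcessLoop()
    {
        while (true)
        {
            byte[] buf;
            lock (queue)
            {
                while (queue.Count == 0) Monitor.Wait(queue);
                buf = (byte[])queue.Dequeue();
            }
            // ... real processing of buf would go here ...
            Interlocked.Increment(ref Processed);
        }
    }
}
```

The receive thread never does anything slow, so the socket is drained as fast as the scheduler allows, while all variable-cost work happens on the worker thread.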

If I were you, I would try writing a small test program, 1-200 lines,
which receives and mock-processes the received data, and do some
experimentation. You could post that small program and maybe someone
will have some comments. It's a lot harder to discuss precisely when
there is no concrete example.

--
Helge
May 25 '06 #6
