Bytes IT Community

Problem with writing fast UDP server

Hi guys,

I am struggling to write a fast UDP server. It has to handle around
10000 UDP packets per second. I started building it with non-blocking
sockets and threads. Unfortunately my approach does not work at all.
I wrote a simple test case: client and server. The client sends 2200
packets within 0.137447118759 secs. tcpdump received 2189 packets,
which is not bad at all.
But the server only handles 700 -- 870 packets when it is
non-blocking, and only 670 -- 700 are received with blocking sockets.
The client and the server are on the same local network, and tcpdump
shows a pretty correct amount of packets received.

I included a snippet of the UDP server code.

import socket
import threading

class PacketReceive(threading.Thread):
    def __init__(self, tname, socket, queue):
        self._tname = tname
        self._socket = socket
        self._queue = queue
        threading.Thread.__init__(self, name=self._tname)

    def run(self):
        print 'Started thread: ', self.getName()
        cnt_msgs = 0
        while True:
            try:
                msg = self._socket.recv(512)
                cnt_msgs += 1
                # self._queue.put(msg)
                print 'thread: %s, cnt_msgs: %d' % (self.getName(), cnt_msgs)
            except socket.error:
                pass
I was also using a Queue, but this didn't help either.
Any idea what I am doing wrong?

I read that the Python socket module was causing some delays with a
TCP server. The recommendation was to set a socket option to disable
delays: "sock.setsockopt(SOL_TCP, TCP_NODELAY, 1)". I couldn't find
any similar option for UDP sockets.
Is there anything I have to change in the socket options to make it
work faster?
Why can't the server process all incoming packets? Is there a bug in
the socket layer? BTW, I am using Python 2.5 on Ubuntu 8.10.

Cheers
K
Nov 20 '08 #1
11 Replies


Krzysztof Retel <Kr*************@googlemail.com> writes:
But the server only handles 700 -- 870 packets, when it is non-
blocking, and only 670 – 700 received with blocking sockets.
What are your other threads doing? Have you tried the same code
without any threading?
Nov 20 '08 #2

On Nov 20, 3:34 pm, Hrvoje Niksic <hnik...@xemacs.org> wrote:
Krzysztof Retel <Krzysztof.Re...@googlemail.com> writes:
But the server only handles 700 -- 870 packets, when it is non-
blocking, and only 670 – 700 received with blocking sockets.

What are your other threads doing? *Have you tried the same code
without any threading?
I have only this one thread, which I can run a couple of times.
I tried without threading and got the same result: not all packets
were processed.

Nov 20 '08 #3

On 20 Nov, 16:03, Krzysztof Retel <Krzysztof.Re...@googlemail.com>
wrote:
> Any idea what I am doing wrong?
Stupid question: did you try removing the print (e.g. printing once
every 100 messages)?

Ciao
----
FB
Nov 20 '08 #4

On Nov 20, 4:00 pm, bieff...@gmail.com wrote:
> Stupid question: did you try removing the print (e.g. printing once
> every 100 messages)?
:) Of course I did. Nothing has changed.

I wonder if there is a kind of setting for socket to allow no delays?
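There is no UDP equivalent of TCP_NODELAY, but one socket option that does matter for a fast UDP receiver is SO_RCVBUF, the kernel receive buffer: when a burst of packets overflows it, the extras are silently dropped before the application ever calls recv(). A minimal sketch (the 4 MB figure is an arbitrary illustration; the kernel caps the granted size, e.g. via net.core.rmem_max on Linux):

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# Ask the kernel for a larger receive buffer so bursts queue up instead of
# being dropped.  The kernel caps the value, so read back what was granted.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4 * 1024 * 1024)
actual = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print("receive buffer is now %d bytes" % actual)
sock.close()
```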
Nov 20 '08 #5

On Thu, 20 Nov 2008 14:24:20 -0200, Krzysztof Retel
<Kr*************@googlemail.com> wrote:
> I wonder if there is a kind of setting for socket to allow no delays?
I've used this script to test sending UDP packets. I've not seen any
delays.

<code>
"""a very simple UDP test

Usage:

%(name)s client <remotehost> <message to send|length of message>
    to continuously send messages to <remotehost> until Ctrl-C

%(name)s server
    to listen for messages until Ctrl-C

Uses port %(port)d. Once stopped, shows some statistics.
Creates udpstress-client.csv or udpstress-server.csv with
pairs (size,time)
"""

import os, sys
import socket
import time

PORT = 21758
BUFSIZE = 4096
socket.setdefaulttimeout(10.0)

def server(port):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(('', port))
    print "Receiving at port %d" % (port)
    history = []
    print "Waiting for first packet to arrive...",
    sock.recvfrom(BUFSIZE)
    print "ok"
    t0 = time.clock()
    while 1:
        try:
            try:
                data, remoteaddr = sock.recvfrom(BUFSIZE)
            except socket.timeout:
                print "Timed out"
                break
            except KeyboardInterrupt:  # #1755388 #926423
                raise
            t1 = time.clock()
            if not data:
                break
            history.append((len(data), t1-t0))
            t0 = t1
        except KeyboardInterrupt:
            print "Stopped"
            break
    sock.close()
    return history

def client(remotehost, port, data):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    history = []
    print "Sending %d-bytes packets to %s:%d" % (len(data), remotehost, port)
    t0 = time.clock()
    while 1:
        try:
            nbytes = sock.sendto(data, (remotehost, port))
            t1 = time.clock()
            if not nbytes:
                break
            history.append((nbytes, t1-t0))
            t0 = t1
        except KeyboardInterrupt:
            print "Stopped"
            break
    sock.close()
    return history

def show_stats(history, which):
    npackets = len(history)
    bytes_total = sum([item[0] for item in history])
    bytes_avg = float(bytes_total) / npackets
    bytes_max = max([item[0] for item in history])
    time_total = sum([item[1] for item in history])
    time_max = max([item[1] for item in history])
    time_min = min([item[1] for item in history])
    time_avg = float(time_total) / npackets
    speed_max = max([item[0]/item[1] for item in history if item[1]>0])
    speed_min = min([item[0]/item[1] for item in history if item[1]>0])
    speed_avg = float(bytes_total) / time_total
    print "Packet count      %8d" % npackets
    print "Total bytes       %8d bytes" % bytes_total
    print "Total time        %8.1f secs" % time_total
    print "Avg size / packet %8d bytes" % bytes_avg
    print "Max size / packet %8d bytes" % bytes_max
    print "Max time / packet %8.1f us" % (time_max*1e6)
    print "Min time / packet %8.1f us" % (time_min*1e6)
    print "Avg time / packet %8.1f us" % (time_avg*1e6)
    print "Max speed         %8.1f Kbytes/sec" % (speed_max/1024)
    print "Min speed         %8.1f Kbytes/sec" % (speed_min/1024)
    print "Avg speed         %8.1f Kbytes/sec" % (speed_avg/1024)
    print
    open("udpstress-%s.csv" % which, "w").writelines(
        ["%d,%f\n" % item for item in history])

if len(sys.argv) > 1:
    if "client".startswith(sys.argv[1].lower()):
        remotehost = sys.argv[2]
        data = sys.argv[3]
        if data.isdigit():  # means length of message
            data = "x" * int(data)
        history = client(remotehost, PORT, data)
        show_stats(history, "client")
        sys.exit(0)
    elif "server".startswith(sys.argv[1].lower()):
        history = server(PORT)
        show_stats(history, "server")
        sys.exit(0)

print >>sys.stderr, __doc__ % {
    "name": os.path.basename(sys.argv[0]),
    "port": PORT}
</code>

Start the server before the client.

--
Gabriel Genellina

Nov 21 '08 #6

Gabriel Genellina wrote:
> Krzysztof Retel wrote:
> > I wonder if there is a kind of setting for socket to allow no delays?
Is the program CPU-bound? If so, CPython is too slow for what you want
to do.

John Nagle
Nov 21 '08 #7

On Nov 20, 9:03 am, Krzysztof Retel <Krzysztof.Re...@googlemail.com>
wrote:
> I am struggling writing fast UDP server. It has to handle around 10000
> UDP packets per second.
First and foremost, you are not being realistic here. Attempting to
squeeze 10,000 packets per second out of 10Mb/s (assumed) Ethernet is
not realistic. The theoretical maximum is 14,880 frames per second,
and that assumes each frame is only 84 bytes, making it useless for
data transport. Using your numbers, each frame requires (90B + 84B)
174B, which works out to a theoretical maximum of ~7200 frames per
second. These are obviously rough numbers, but I believe you get the
point. It's late here, so I'll double-check my numbers tomorrow.

In your case, you would not want to use TCP_NODELAY, even if you were
to use TCP, as it would actually limit your throughput. UDP does not
have such an option because each datagram is an ethernet frame - which
is not true for TCP, as TCP is a stream. In this case, use of TCP may
significantly reduce the number of frames required for transport -
assuming TCP_NODELAY is NOT used. If you want to increase your
throughput, use larger datagrams. If you are on a reliable connection,
which we can safely assume since you are currently using UDP, use of
TCP without TCP_NODELAY may yield better performance because of its
buffering strategy.

Assuming you are using 10Mb ethernet, you are nearing its frame-
saturation limits. If you are using 100Mb ethernet, you'll obviously
have a lot more elbow room, but not nearly as much as one would hope,
because 100Mb is only achievable when frames are completely filled.
It's been a while since I last looked at 100Mb numbers, but it's not
likely most people will see numbers near its theoretical limits,
simply because that number has so many caveats associated with it -
and small frames are its nemesis. Since you are using very small
datagrams, you are wasting a lot of potential throughput. And if you
have other computers on your network, the situation is made yet more
difficult. Additionally, many switches and/or routers also have
bandwidth limits which may or may not pose a wall for your
application. And to make matters worse, you are allocating large
buffers (4K) to send/receive 90 bytes of data, creating yet more work
for your computer.

Options to try:
- See how TCP measures up for you
- Attempt to place multiple data objects within a single datagram,
  thereby optimizing available ethernet bandwidth
- You didn't say if you are CPU-bound, but you are creating a tuple
  and appending it to a list on every datagram. You may find that
  allocating smaller buffers and optimizing your history accounting
  helps if you're CPU-bound.
- Don't forget, localhost does not suffer from frame limits - it's
  basically testing your memory/bus speed
- If this is for local use only, consider using a different IPC
  mechanism - unix domain sockets or memory-mapped files
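The batching idea can be sketched with the stdlib struct module. The record layout here (a 4-byte sequence number plus an 86-byte payload, 90 bytes total) is made up purely for illustration:

```python
import struct

# Hypothetical fixed-size record: a sequence number and an 86-byte payload.
RECORD = struct.Struct("!I86s")   # 4 + 86 = 90 bytes per record
PER_DATAGRAM = 16                 # 16 * 90 = 1440 bytes fits a 1500-MTU frame

def pack_records(records):
    """Pack up to PER_DATAGRAM (seqno, payload) records into one datagram."""
    return b"".join(RECORD.pack(seq, payload) for seq, payload in records)

def unpack_records(datagram):
    """Split a datagram back into (seqno, payload) records."""
    return [RECORD.unpack_from(datagram, off)
            for off in range(0, len(datagram), RECORD.size)]

records = [(i, b"x" * 86) for i in range(16)]
dgram = pack_records(records)
assert len(dgram) == 1440          # one send() instead of sixteen
```

One sendto() of this datagram replaces sixteen, cutting per-packet protocol overhead by roughly a factor of sixteen.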
Nov 21 '08 #8

On Nov 21, 5:49 am, Greg Copeland <gtcopel...@gmail.com> wrote:
> You didn't say if you are CPU-bound, but you are creating a tuple
> and appending it to a list on every datagram.
Greg, thanks very much for your reply.
I am not sure what you mean by CPU-bound. How can I find out whether
it is CPU-bound?

May I also ask you for a list of references about sockets and
networking? I just want to develop my networking knowledge.

Cheers
K
Nov 21 '08 #9

On Fri, 21 Nov 2008 08:14:19 -0800 (PST), Krzysztof Retel wrote:
I am not sure what do you mean by CPU-bound? How can I find out if I
run it on CPU-bound?
CPU-bound is the state in which performance is limited by the
availability of processor cycles. On a Unix box, you might
run the "top" utility and look to see whether the "%CPU" figure
indicates 100% CPU use. Alternatively, you might have a
tool for plotting use of system resources.
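Another way to check, from inside the program itself on a Unix box, is to compare CPU time consumed against wall-clock time over an interval; this sketch uses the stdlib resource module (the helper name is made up):

```python
import resource
import time

def cpu_fraction(work, *args):
    """Run work(*args); return CPU seconds used / wall seconds elapsed.
    A result near 1.0 (or higher, with threads) suggests CPU-bound;
    near 0.0 suggests the time went to waiting on I/O or sleeping."""
    r0 = resource.getrusage(resource.RUSAGE_SELF)
    t0 = time.time()
    work(*args)
    t1 = time.time()
    r1 = resource.getrusage(resource.RUSAGE_SELF)
    cpu = (r1.ru_utime - r0.ru_utime) + (r1.ru_stime - r0.ru_stime)
    return cpu / (t1 - t0)

# A sleep is pure waiting, so its CPU fraction should be close to zero.
print(cpu_fraction(time.sleep, 0.5))
```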

--
To email me, substitute nowhere->spamcop, invalid->net.
Nov 21 '08 #10

On Nov 21, 4:48 pm, Peter Pearson <ppear...@nowhere.invalid> wrote:
> CPU-bound is the state in which performance is limited by the
> availability of processor cycles.
Thanks. I checked: it is not CPU-bound.

Nov 21 '08 #11

On Nov 21, 11:05 am, Krzysztof Retel <Krzysztof.Re...@googlemail.com>
wrote:
> Thanks. I run it without CPU-bound
With clearer eyes, I did confirm my math above is correct. I don't
have a networking reference to provide; you'll likely find good
results via Google. :)

If you are not CPU-bound, you are likely I/O-bound. That means your
computer is waiting for I/O to complete - likely on the sending side.
In this case, it likely means you have reached the ethernet bandwidth
limits available to your computer. Since you didn't correct me when I
assumed you're running 10Mb ethernet, I'll continue to assume that's a
safe assumption. So, assuming you are running on 10Mb ethernet, try
converting your application to use TCP. I'd bet, unless you have
requirements which prevent its use, you'll suddenly have enough
bandwidth (in this case, frames) to achieve your desired results.

This is untested and off the top of my head but it should get you
pointed in the right direction pretty quickly. Make the following
changes to the server:

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
to
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

Make this:
print "Waiting for first packet to arrive...",
sock.recvfrom(BUFSIZE)

look like:
print "Waiting for first packet to arrive...",
sock.listen(1)
cliSock, cliAddr = sock.accept()

Change your calls to sock.recvfrom(BUFSIZE) to cliSock.recv(BUFSIZE).
Notice the change to "cliSock" - accept() returns a new connected
socket (plus the client's address), and all further reads use it.

Keep in mind TCP is stream-based, not datagram-based, so you may need
to add additional logic to determine data boundaries for re-assembly
of your data on the receiving end. There are several strategies to
address that, but for now I'll gloss over it.

One more thing: change your calls to time.clock() to time.time().

On your client, make the following changes.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
to
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect( (remotehost,port) )

nbytes = sock.sendto(data, (remotehost,port))
to
nbytes = sock.send(data)
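Putting those fragments together, a minimal localhost sketch of the TCP version might look like this (the port number and the server-in-a-thread arrangement are just for the demo; message framing is glossed over, as noted above):

```python
import socket
import threading
import time

HOST, PORT = "127.0.0.1", 21760   # illustrative port, not from the thread

def server(nbytes_expected, result):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((HOST, PORT))
    srv.listen(1)
    cliSock, cliAddr = srv.accept()         # accept() returns (socket, addr)
    received = b""
    while len(received) < nbytes_expected:  # TCP is a stream: recv until done
        chunk = cliSock.recv(4096)
        if not chunk:
            break
        received += chunk
    result.append(received)
    cliSock.close()
    srv.close()

result = []
t = threading.Thread(target=server, args=(16 * 90, result))
t.start()
time.sleep(0.2)                   # crude: give the server time to bind/listen

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect((HOST, PORT))
for i in range(16):               # sixteen 90-byte writes, as discussed below
    cli.sendall(b"x" * 90)        # the stack may coalesce these into few frames
cli.close()
t.join()
print("server received %d bytes" % len(result[0]))
```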

Now, rerun your tests on your network. I expect you'll be faster now
because TCP can be pretty smart about buffering. Let's say you write
sixteen 90-byte blocks to the socket. If they are timely enough, it is
possible all of those will be shipped across ethernet as a single
frame. So what took 16 frames via UDP can now *potentially* be done in
a single ethernet frame (assuming a 1500-byte MTU). I say potentially
because the exact behaviour is OS/stack and NIC-driver specific and is
often tunable to boot. Likewise, on the receiving end, what previously
required 16 calls to recvfrom, each returning 90B, can *potentially*
be completed in a single call to recv, returning 1440B. Remember,
fewer frames means less protocol overhead, which makes more bandwidth
available to your applications. When sending 90B datagrams, you're
wasting over 48% of your available bandwidth because of protocol
overhead (actually a lot more, because I'm not accounting for UDP
headers).

Because of the differences between UDP and TCP, unlike your original
UDP implementation, which can receive from multiple clients, the TCP
implementation can only receive from a single client. If you need to
receive from multiple clients concurrently, look at Python's select
module to take up the slack.
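For reference, the select-based pattern looks roughly like this (a sketch, not tailored to the code above; the function name, port handling and 1-second poll interval are illustrative):

```python
import select
import socket

def serve_multi(port, handle):
    """Accept any number of TCP clients; pass each received chunk to handle()."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", port))
    srv.listen(5)
    sockets = [srv]                       # listening socket + client sockets
    while sockets:
        readable, _, _ = select.select(sockets, [], [], 1.0)
        for s in readable:
            if s is srv:                  # new client knocking
                conn, addr = srv.accept()
                sockets.append(conn)
            else:
                data = s.recv(4096)
                if data:
                    handle(data)
                else:                     # client closed its end
                    sockets.remove(s)
                    s.close()
```

The single select() call multiplexes the listening socket and every client socket in one thread, avoiding the thread-per-client approach entirely.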

Hopefully you'll be up and running. Please report back your findings.
I'm curious as to your results.
Nov 21 '08 #12
