Fixing socket.makefile()

Bryan Olson

Here's the problem: Suppose we use:

import socket
[...]
f = some_socket.makefile()

Then:

f.read() is efficient, but verbose, and incorrect (or at
least does not play well with others);

f.readline() is correct, but verbose and inefficient.

To justify the "verbose" part, just look at the code in the
Python library's socket.py. Below, I'll explain playing well
with others, and then (in)efficiency.

Consider the operations:

f = some_socket.makefile()
ch = f.read(1)
print "The first char is", ch
ch = some_socket.recv(1)
print "The second char is", ch

The code above does *not* (usually) print the first and second
characters from the socket.

The problem is that makefile() returns a Python object that has
its own local buffer. The recv() call reads directly from the
socket, oblivious to any data queued in the file object's
buffer. The problem is not limited to recv(); select(), and
perhaps other calls, will ignore the buffer and look directly at
the socket. Output buffering appears to have a similar problem.
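
The mismatch is easy to reproduce on one machine. The following is only a sketch in present-day Python 3, not the Python 2 code discussed in this thread: it assumes socket.socketpair() is available on the platform, and the function name demo_buffer_mismatch is made up for illustration. Python 3's makefile() also returns a buffered object, so the same over-read shows up.

```python
import socket


def demo_buffer_mismatch():
    """Show that recv() cannot see bytes held in the file object's buffer."""
    a, b = socket.socketpair()
    with a, b:
        a.sendall(b"AB")
        with b.makefile("rb") as f:
            # read(1) returns one byte, but the file object pulls every
            # byte currently available on the socket into its own buffer.
            first = f.read(1)
            # Send a third byte so the direct recv() below has data to read.
            a.sendall(b"C")
            # recv() bypasses the file object and reads the socket itself.
            second = b.recv(1)
            # The byte that was skipped is still in the file object's buffer.
            leftover = f.read(1)
    return first, second, leftover
```

On a typical Unix system this returns (b"A", b"C", b"B"): the direct recv() skips over "B", which only the file object can still deliver.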

Now look up socket.makefile().readline(). It gets one byte at a
time. It will get the byte from the Python buffer if the buffer
is non-empty, otherwise it will try to recv() one byte at a
time, directly from the socket. By itself, readline() never
over-reads the socket; if select() and recv() would work
correctly before the readline(), they'll work after. While
correct, reading one byte at a time is painfully slow.

The Python Library Reference is silent on whether the
socket.makefile operations are supposed to interact correctly
with the direct socket operations. If they are supposed to play
well together, then read() is wrong. If they are not, then
readline() is absurdly slow.

Enough of my whining. The good news is that we can have both
efficiency and correctness, and we can fix the bloat at the same
time. Operating systems already do efficient buffering for
sockets. That efficiency varies, but any smart operating system
copies buffers to user-space in large chunks, and answers
recv()'s from the buffers without system calls, when possible.
Python's socket module now supports MSG_PEEK, which enables
Python code to examine a socket's native buffer.
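
As a quick illustration of what MSG_PEEK does, here is a minimal sketch in present-day Python 3. It assumes socket.socketpair() and MSG_PEEK are both supported on the platform; the function name peek_then_read is hypothetical.

```python
import socket


def peek_then_read():
    """Peek at pending data, then read it for real."""
    a, b = socket.socketpair()
    with a, b:
        a.sendall(b"hello")
        # MSG_PEEK copies the data out but leaves it queued in the
        # operating system's receive buffer.
        peeked = b.recv(5, socket.MSG_PEEK)
        # A normal recv() therefore still sees the same bytes.
        consumed = b.recv(5)
    return peeked, consumed
```

Both values come back as b"hello", because the peek does not consume anything.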

Below my sig, I show code to replace the corresponding member
functions in the class socket._fileobject. The updated version
passes the tests in test_socket.py.

Make sense? Worth doing? I thought I'd talk it up here before
jumping into the devel list.
--
--Bryan

# class _fileobject(object):
# (These methods belong inside socket.py, where sys and MSG_PEEK
# are already in scope.)

    def __init__(self, sock, mode='rb', bufsize=-1):
        self._sock = sock
        if bufsize <= 0:
            bufsize = self.default_bufsize
        self.bufsize = bufsize
        self.softspace = False

    def read(self, size=-1):
        if size <= 0:
            size = sys.maxint
        blocks = []
        while size > 0:
            b = self._sock.recv(min(size, self.bufsize))
            size -= len(b)
            if not b:
                break
            blocks.append(b)
        return "".join(blocks)

    def readline(self, size=-1):
        if size < 0:
            size = sys.maxint
        blocks = []
        read_size = min(20, size)
        found = 0
        while size and not found:
            # Peek first, so we never consume bytes past the newline.
            b = self._sock.recv(read_size, MSG_PEEK)
            if not b:
                break
            found = b.find('\n') + 1
            length = found or len(b)
            size -= length
            blocks.append(self._sock.recv(length))
            read_size = min(read_size * 2, size, self.bufsize)
        return "".join(blocks)

    def write(self, data):
        self._sock.sendall(str(data))

    def writelines(self, lines):
        # This version mimics the current writelines, which calls
        # str() on each line, but comments that we should reject
        # non-string non-buffers. Let's omit the next line.
        lines = [str(s) for s in lines]
        self._sock.sendall(''.join(lines))

    def flush(self):
        pass
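
For readers on Python 3, the peek-based readline() above can be sketched as a standalone function. This is an adaptation for illustration, not the posted patch: the name peek_readline and the bufsize default of 8192 are assumptions, and it works on bytes rather than Python 2 strings. It also assumes a blocking stream socket on a platform that supports MSG_PEEK.

```python
import socket
import sys


def peek_readline(sock, size=-1, bufsize=8192):
    """Read one line from sock, leaving later bytes in the OS buffer."""
    if size < 0:
        size = sys.maxsize
    blocks = []
    read_size = min(20, size)
    found = 0
    while size and not found:
        # Look at the pending data without consuming it.
        peeked = sock.recv(read_size, socket.MSG_PEEK)
        if not peeked:
            break  # the peer closed the connection
        # found is the line length including the newline, or 0 if none yet.
        found = peeked.find(b"\n") + 1
        length = found or len(peeked)
        size -= length
        # Consume only what belongs to this line.
        blocks.append(sock.recv(length))
        # Grow the peek window, capped by the remaining size and bufsize.
        read_size = min(read_size * 2, size, bufsize)
    return b"".join(blocks)
```

After peek_readline() returns, a direct recv() or select() on the same socket still sees everything that followed the newline, which is the property the original post is after.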

Jul 18 '05 #1


Alan Kennedy

[Bryan Olson]

> The problem is that makefile() returns a Python object that has
> its own local buffer. The recv() call reads directly from the
> socket, oblivious to any data queued in the file object's
> buffer. The problem is not limited to recv(); select(), and
> perhaps other calls, will ignore the buffer and look directly at
> the socket. Output buffering appears to have a similar problem.

and

> The Python Library Reference is silent on whether the
> socket.makefile operations are supposed to interact correctly
> with the direct socket operations. If they are supposed to play
> well together, then read() is wrong. If they are not, then
> readline() is absurdly slow.

I'm glad you asked these questions ;-)

I also am interested in the answers, because I'm just coming to the end of
my implementation of cpython 2.3 compatible socket, select and asyncore
modules for jython, i.e. asynchronous socket support, using the new
java.nio APIs in jdk1.4+.

Points to make in relation to jython include

1. The problem you describe doesn't arise very often, I think. Most
users who use makefile() on sockets are going to use the file-based
interface exclusively and not the underlying socket interface.

2. The problem does not exist in jython, because jython implements the
socket.makefile() method by returning wrappers on the java.net.Socket's
InputStream and OutputStream, meaning that calling either file or socket
interface sends data through the same underlying streams.

3. I am eager to have the behaviour of cpython explicitly defined, since
I am working hard to make my jython implementation 100% cpython
compatible, right down to the exceptions. I want all cpython socket code
to not know that it's running on jython.

4. I'm particularly interested in seeing documentation on how read and
write operations on socket.makefile()s should behave when the socket is
in non-blocking mode: Should it raise an exception? Which exception? The
same exception on every platform?

P.S. To those who know I've been working on this for *ages* now: apologies
(Hi Irmen :-) My finances have prevented me from spending too much time
working on this voluntary project. However, you may be encouraged to
know that I now have it passing most of the cpython 2.3 test_socket.py
unit tests (including the ones that use select.select). It's only a
matter of a month or two more now .....

Out of interest: Does anyone know if developing asynch-socket support
for jython is the sort of work that might fall under the auspices of the
PSF grant scheme?

--
alan kennedy
------------------------------------------------------
email alan: http://xhaus.com/contact/alan

Jul 18 '05 #2

Donn Cave

In article <pJ*****************@newssvr27.news.prodigy.com>,
Bryan Olson <fa*********@nowhere.org> wrote:
....

> The problem is that makefile() returns a Python object that has
> its own local buffer. The recv() call reads directly from the
> socket, oblivious to any data queued in the file object's
> buffer. The problem is not limited to recv(); select(), and
> perhaps other calls, will ignore the buffer and look directly at
> the socket. Output buffering appears to have a similar problem.
>
> Now look up socket.makefile().readline(). It gets one byte at a
> time. It will get the byte from the Python buffer if the buffer
> is non-empty, otherwise it will try to recv() one byte at a
> time, directly from the socket. By itself, readline() never
> over-reads the socket; if select() and recv() would work
> correctly before the readline(), they'll work after. While
> correct, reading one byte at a time is painfully slow.

I don't get this. Has socket.py changed this much since 2.2?
The readline I'm looking at says self._sock.recv(self._rbufsize),
so you would only get this behavior if you specified a buffer
size of 1 or less, and read() does the same - so you could do
this to yourself, but not specially just with readline.

At any rate, I think it would put this in better perspective
to recall that pipes, terminals and in general any "slow"
device has the same issues, and that they work out the same
in Python as in the original C, with socket file descriptors
in place of socket objects and stdio file pointers in place
of file objects.

It's definitely a problem, and some kind of solution might be
well received, but it needs to be portable (so forget MSG_PEEK
unless you're really confident that it will be supported on
every platform that now supports sockets to some useful degree),
and it would be nice to apply to the problem in general and not
just sockets. I think the root of the problem really is that
select() doesn't look at process buffers in fileobject instances,
and it can't be made to do that because that information isn't
available from the stdio file pointer underneath the fileobject.
So, you need a replacement for fileobject, to start with.
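
The select() side of this can be shown directly. A minimal sketch in present-day Python 3, assuming socket.socketpair() is available; the function name select_misses_buffered_data is made up for illustration.

```python
import select
import socket


def select_misses_buffered_data():
    """Show select() reporting 'nothing to read' while the file object holds data."""
    a, b = socket.socketpair()
    with a, b:
        a.sendall(b"AB")
        with b.makefile("rb") as f:
            # Reading one byte drains both bytes from the socket
            # into the file object's buffer.
            first = f.read(1)
            # select() only inspects the socket, which is now empty,
            # so with a zero timeout it reports nothing readable.
            readable, _, _ = select.select([b], [], [], 0)
            # The second byte is nevertheless available through f.
            leftover = f.read(1)
    return first, bool(readable), leftover
```

On a typical Unix system this returns (b"A", False, b"B"): select() says there is nothing to read even though the file object can still deliver a byte.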

Donn Cave, do**@u.washington.edu

Jul 18 '05 #3

John J. Lee

Alan Kennedy <al****@hotmail.com> writes:
[...]

> I also am interested in the answers, because I'm just coming to the end of
> my implementation of cpython 2.3 compatible socket, select and
> asyncore modules for jython, i.e. asynchronous socket support, using
> the new java.nio APIs in jdk1.4+.

Hooray!

[...]

> P.S. To those who know I've been working on this for *ages* now: apologies
> (Hi Irmen :-) My finances have prevented me from spending too much
> time working on this voluntary project. However, you may be encouraged
> to know that I now have it passing most of the cpython 2.3
> test_socket.py unit tests (including the ones that use
> select.select). It's only a matter of a month or two more now .....

Having Pyro running on both ends of the Jython / CPython divide will be
very handy.

> Out of interest: Does anyone know if developing asynch-socket support
> for jython is the sort of work that might fall under the auspices of
> the PSF grant scheme?

[...]

I don't see why not. This sort of fundamental-but-undramatic stuff is
really valuable.

Maybe the PSF should look into funding research aimed at cloning
Martin v. Loewis? Or, if he's really a bot, maybe he could be
reimplemented using your new code, for superior scalability?

John

Jul 18 '05 #4

Bryan Olson

Donn Cave wrote:
[...]

> I don't get this. Has socket.py changed this much since 2.2?
> The readline I'm looking at says self._sock.recv(self._rbufsize),
> so you would only get this behavior if you specified a buffer
> size of 1 or less, and read() does the same - so you could do
> this to yourself, but not specially just with readline.

Hi Donn; yes, looks like I got confused on that one.

> At any rate, I think it would put this in better perspective
> to recall that pipes, terminals and in general any "slow"
> device has the same issues, and that they work out the same
> in Python as in the original C, with socket file descriptors
> in place of socket objects and stdio file pointers in place
> of file objects.

And it gets worse. I've seen layered handlers with buffers of
buffers of buffers.

> It's definitely a problem, and some kind of solution might be
> well received, but it needs to be portable (so forget MSG_PEEK
> unless you're really confident that it will be supported on
> every platform that now supports sockets to some useful degree),

Hold on ... Gooogle...Google...Google... Well, support for
MSG_PEEK seems to be universal except for a couple of reported bugs
and versions of BeOS without BONE (BeOS Network Environment).
I've never used BeOS, but apparently BeOS'ers are used to the
idea that they need BONE to get network stuff working.

Actually testing against the wide range of platforms is beyond
my own capabilities.

> and it would be nice to apply to the problem in general and
> not just sockets.

Agreed, but for now I'd like to call that out-of-scope. I came
upon this particular problem when writing an HTTP/1.1 thingy.
The socket module works well, but I found the higher-level
library classes not-so-useful.

> I think the root of the problem really is that
> select() doesn't look at process buffers in fileobject instances,
> and it can't be made to do that because that information isn't
> available from the stdio file pointer underneath the fileobject.
> So, you need a replacement for fileobject, to start with.

Really we want a general, portable, extensible event-handler.
It should be unified with all the asynchronous things, such
as socket/file activity, thread locks and semaphores, and GUI
event loops.
--
--Bryan

Jul 18 '05 #5
