BaseHTTPServer weirdness

Ron Garret

I'm trying to figure out how to use BaseHTTPServer. Here's my little
test app:

=================================

#!/usr/bin/python

from BaseHTTPServer import *

import cgi

class myHandler(BaseHTTPRequestHandler):

    def do_GET(r):
        s = ''
        try:
            s = cgi.parse_qs(r.rfile.read(int(r.headers.get("Content-length"))), 1)
        except:
            pass

        r.send_response(200)
        r.send_header("Content-type", "text/html")
        r.end_headers()
        r.wfile.write("""
<form method=post action=foo>
<input type=text name=text1 value="">
<input type=text name=text2 value="">
<input type=submit>
</form>%s
""" % s)

    def do_POST(r):
        r.do_GET()

d = HTTPServer(('', 1024), myHandler)
d.serve_forever()

===================================

Two questions:

1. The line:

s = cgi.parse_qs(r.rfile.read(int(r.headers.get("Content-length"))), 1)

feels like a horrible hack. It seems like this would be a better
alternative:

s = cgi.parse(r.rfile)

but that doesn't actually work. Why? What is the Right Way to parse
form data in a BaseHTTPServer?

2. Despite the fact that I'm passing a 1 for the keep_blank_values
argument to cgi.parse_qs, it doesn't actually keep blank values. Is
this a bug, or am I doing something wrong?
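For comparison, here is a minimal sketch of reading a urlencoded POST body by hand inside a handler. It assumes Python 3, where BaseHTTPServer became http.server, parse_qs moved to urllib.parse, and the cgi module was removed in 3.13. The names read_urlencoded_form, FormHandler and serve, and port 8000, are illustrative assumptions. Only application/x-www-form-urlencoded bodies are handled.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs


def read_urlencoded_form(headers, rfile):
    """Hypothetical helper: parse an application/x-www-form-urlencoded body.

    `headers` needs a .get() that finds Content-Type and Content-Length
    (a handler's self.headers does, case-insensitively); `rfile` is the
    request body stream. Returns {} for any other content type.
    """
    ctype = (headers.get("Content-Type") or "").split(";")[0].strip().lower()
    if ctype != "application/x-www-form-urlencoded":
        return {}
    # Read exactly Content-Length bytes. An unbounded read() on the socket
    # would typically block, because the browser keeps the connection open
    # while it waits for the response.
    length = int(headers.get("Content-Length") or 0)
    # Urlencoded bodies are normally plain ASCII; parse_qs then decodes the
    # percent-escapes (as UTF-8 by default).
    body = rfile.read(length).decode("ascii", errors="replace")
    # keep_blank_values=True keeps fields that were submitted empty.
    return parse_qs(body, keep_blank_values=True)


class FormHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        fields = read_urlencoded_form(self.headers, self.rfile)
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        self.wfile.write(repr(fields).encode("utf-8"))


def serve(port=8000):
    # Not called here: serve_forever() blocks, handling requests until interrupted.
    HTTPServer(("", port), FormHandler).serve_forever()
```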

Thanks,
rg

Sep 11 '06 #1


Steve Holden

Ron Garret wrote:


> 1. The line:
>
> s = cgi.parse_qs(r.rfile.read(int(r.headers.get("Content-length"))), 1)
>
> feels like a horrible hack. It seems like this would be a better
> alternative:
>
> s = cgi.parse(r.rfile)
>
> but that doesn't actually work. Why? What is the Right Way to parse
> form data in a BaseHTTPServer?

The normal way is

s = cgi.parse()

since the CGI script sees the client network socket (after consumption
of HTTP headers) as its standard input. However, I'm not sure how much it
currently does in the way of handling strange inputs like gzip-compressed
data.

> 2. Despite the fact that I'm passing a 1 for the keep_blank_values
> argument to cgi.parse_qs, it doesn't actually keep blank values. Is
> this a bug, or am I doing something wrong?

Sounds like a bug, but then since your parsing looks buggy I'm surprised
you get anything at all. Try using a keyword argument
keep_blank_values=1 just in case the order has changed or something
daft. But fix your parsing first.

The other thing to note is that, since you are putting a dictionary's
string representation straight into your HTML, any odd characters in it
may give you strange output in the browser, so you should view the page
source to ensure that's not the case. Which it probably isn't ...

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden

Sep 11 '06 #2

Ron Garret

In article <ma*************************************@python.org>,
Steve Holden <st***@holdenweb.com> wrote:

> The normal way is
>
> s = cgi.parse()
>
> since the CGI script sees the client network socket (after consumption
> of HTTP headers) as its standard input.

Doesn't work. (I even tried sys.stdin=r.rfile; s=cgi.parse()) Don't
forget, this is not a CGI script, it's a handler for a BaseHTTPServer.

>> 2. Despite the fact that I'm passing a 1 for the keep_blank_values
>> argument to cgi.parse_qs, it doesn't actually keep blank values. Is
>> this a bug, or am I doing something wrong?
>
> Sounds like a bug, but then since your parsing looks buggy I'm surprised
> you get anything at all. Try using a keyword argument
> keep_blank_values=1 just in case the order has changed or something
> daft. But fix your parsing first.
>
> The other thing to note is that, since you are putting a dictionary's
> string representation straight into your HTML, any odd characters in it
> may give you strange output in the browser, so you should view the page
> source to ensure that's not the case. Which it probably isn't ...

I know that's not a problem because it does work when I use parse_qs.
(I know about escaping HTML and all that, but this is just a little test
program.)

rg

Sep 11 '06 #3

Steve Holden

Ron Garret wrote:

> Doesn't work. (I even tried sys.stdin=r.rfile; s=cgi.parse()) Don't
> forget, this is not a CGI script, it's a handler for a BaseHTTPServer.

Right. My bad. However, there's clearly something screwy going on,
because otherwise you'd expect to see at least an empty dictionary in
the output.

>>> 2. Despite the fact that I'm passing a 1 for the keep_blank_values
>>> argument to cgi.parse_qs, it doesn't actually keep blank values. Is
>>> this a bug, or am I doing something wrong?
>>
>> Sounds like a bug, but then since your parsing looks buggy I'm surprised
>> you get anything at all. Try using a keyword argument
>> keep_blank_values=1 just in case the order has changed or something
>> daft. But fix your parsing first.

Reading the source of the 2.4.3 library shows that someone added an
environ=os.environ argument, which will be the second argument on a
positional call, so that clears that mystery up. The documentation
should really show these as keyword arguments rather than implying they
are positionals. It'd be nice if you could report this as a
documentation bug, though I believe by now the 2.5rc2 release will be
frozen.
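For reference, in the cgi.py source cgi.parse_qs takes keep_blank_values as its second positional parameter, while cgi.parse takes environ second. A positional 1 therefore means different things to the two functions. Passing the flag by keyword, as suggested above, avoids depending on either order. Here is a small sketch using the Python 3 successor, urllib.parse.parse_qs, with a made-up form body:

```python
from urllib.parse import parse_qs

body = "text1=&text2=hello"  # a form submitted with text1 left empty

# Default behaviour: the empty field is silently dropped.
dropped = parse_qs(body)

# Keyword form: the intent is explicit and independent of parameter order.
kept = parse_qs(body, keep_blank_values=True)

print(dropped)  # {'text2': ['hello']}
print(kept)     # {'text1': [''], 'text2': ['hello']}
```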

>> The other thing to note is that, since you are putting a dictionary's
>> string representation straight into your HTML, any odd characters in it
>> may give you strange output in the browser, so you should view the page
>> source to ensure that's not the case. Which it probably isn't ...
>
> I know that's not a problem because it does work when I use parse_qs.
> (I know about escaping HTML and all that, but this is just a little test
> program.)

I suspect that the remainder of your problems (cgi.parse appears to be
returning a *string*, dammit) are due to the fact that the process you
are running the HTTP server in doesn't have the environment variables
set that a server would set if it really were being called in a CGI
context, and which the CGI library expects to be set. You could try
passing them as an explicit environ argument and see if that works.

But basically, you aren't providing a CGI environment, and that's why
cgi.parse() isn't working.
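One way to act on that suggestion, sketched under assumptions: a hypothetical helper, cgi_style_environ, copies the request method, headers and query string into the CGI variables that cgi.parse() consults (REQUEST_METHOD, CONTENT_TYPE, CONTENT_LENGTH and QUERY_STRING). It runs on Python 3 without importing cgi. The commented Python 2 usage at the end is an untested assumption, not a confirmed recipe.

```python
def cgi_style_environ(method, headers, query=""):
    """Hypothetical helper: build the CGI variables that cgi.parse() reads.

    `headers` is any mapping with .items() (a BaseHTTPRequestHandler's
    self.headers qualifies); header names are matched case-insensitively.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    environ = {"REQUEST_METHOD": method.upper(), "QUERY_STRING": query}
    if "content-type" in lowered:
        environ["CONTENT_TYPE"] = lowered["content-type"]
    if "content-length" in lowered:
        environ["CONTENT_LENGTH"] = lowered["content-length"]
    return environ


# Untested assumption: inside a Python 2 handler this might be used as
#   env = cgi_style_environ(self.command, self.headers,
#                           self.path.partition("?")[2])
#   form = cgi.parse(fp=self.rfile, environ=env, keep_blank_values=1)
```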

regards
Steve

Sep 11 '06 #4

Ron Garret

In article <ma*************************************@python.org>,
Steve Holden <st***@holdenweb.com> wrote:

> But basically, you aren't providing a CGI environment, and that's why
> cgi.parse() isn't working.

Clearly. So what should I be doing? Surely I'm not the first person to
have this problem?

I have managed to work around this for now by copying and modifying the
code in cgi.parse, but this still feels like a Horrible Hack to me.

rg

Sep 11 '06 #5

Damjan

>> But basically, you aren't providing a CGI environment, and that's why
>> cgi.parse() isn't working.
>
> Clearly. So what should I be doing?

Probably you'll need to read the source of cgi.parse_qs (like Steve did) and
see what it needs from os.environ and then provide that (either in
os.environ or in a custom environ dictionary).

BUT why don't you use WSGI?
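Here is a minimal sketch of the WSGI route, assuming Python 3 and only the standard library (wsgiref and urllib.parse). A WSGI server puts CONTENT_TYPE, CONTENT_LENGTH and the body stream (wsgi.input) into the environ dictionary, so no CGI emulation is needed. The names form_app and serve, and port 8000, are illustrative. Like the original test app, it only handles urlencoded bodies.

```python
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server


def form_app(environ, start_response):
    """Echo back the urlencoded fields of a POST request."""
    fields = {}
    is_form = environ.get("CONTENT_TYPE", "").startswith(
        "application/x-www-form-urlencoded")
    if environ.get("REQUEST_METHOD") == "POST" and is_form:
        # CONTENT_LENGTH may be missing or empty; treat that as no body.
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
        except ValueError:
            length = 0
        body = environ["wsgi.input"].read(length).decode("ascii", errors="replace")
        fields = parse_qs(body, keep_blank_values=True)
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [repr(fields).encode("utf-8")]


def serve(port=8000):
    # Not called here: this blocks, serving form_app until interrupted.
    make_server("", port, form_app).serve_forever()
```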

--
damjan

Sep 11 '06 #6

Steve Holden

Ron Garret wrote:

> Clearly. So what should I be doing? Surely I'm not the first person to
> have this problem?
>
> I have managed to work around this for now by copying and modifying the
> code in cgi.parse, but this still feels like a Horrible Hack to me.

Let me get this right. You are aware that CGIHTTPServer module exists.
But you don't want to use that. Instead you want to use your own code.
So you have ended up duplicating some of the functionality of the cgi
library. And it feels like a hack.

Have I missed anything? :-)

regards
Steve

Sep 12 '06 #7

Kent Johnson

Steve Holden wrote:

> Let me get this right. You are aware that CGIHTTPServer module exists.
> But you don't want to use that. Instead you want to use your own code.
> So you have ended up duplicating some of the functionality of the cgi
> library. And it feels like a hack.
>
> Have I missed anything? :-)

Hey, be nice. Wanting to write a request handler that actually handles a
POST request doesn't seem so unreasonable.

Except...when there are about a bazillion Python web frameworks to
choose from, why start from BaseHTTPServer? Why not use one of the
simpler frameworks like Karrigell or Snakelets or CherryPy?

Here is the query-handling code from Karrigell's CustomHTTPServer.py,
good at least for a second opinion:

def do_POST(self):
    """Begin serving a POST request. The request data must be readable
    on a file-like object called self.rfile"""
    ctype, pdict = cgi.parse_header(self.headers.getheader('content-type'))
    self.body = cgi.FieldStorage(fp=self.rfile,
        headers=self.headers, environ = {'REQUEST_METHOD':'POST'},
        keep_blank_values = 1, strict_parsing = 1)
    # throw away additional data [see bug #427345]
    while select.select([self.rfile._sock], [], [], 0)[0]:
        if not self.rfile._sock.recv(1):
            break
    self.handle_data()

Here is CherryPy's version from CP 2.1:

# Create a copy of headerMap with lowercase keys because
# FieldStorage doesn't work otherwise
lowerHeaderMap = {}
for key, value in request.headerMap.items():
    lowerHeaderMap[key.lower()] = value

# FieldStorage only recognizes POST, so fake it.
methenv = {'REQUEST_METHOD': "POST"}
try:
    forms = _cpcgifs.FieldStorage(fp=request.rfile,
                                  headers=lowerHeaderMap,
                                  environ=methenv,
                                  keep_blank_values=1)

where _cpcgifs.FieldStorage is cgi.FieldStorage with some extra accessors.

HTH,
Kent

Sep 12 '06 #8

Ron Garret

In article <ma*************************************@python.org>,
Steve Holden <st***@holdenweb.com> wrote:

> Let me get this right. You are aware that CGIHTTPServer module exists.
> But you don't want to use that.

That's right. I don't want to run CGI scripts. I don't want to launch
a new process for every request. I want all requests handled in the
server process.

> Instead you want to use your own code.

No, the whole reason I'm asking this question is because I *don't* want
to write my own code. It seems to me that the code to do what I want
ought to be out there (or in there) somewhere and I shouldn't have to
reinvent this wheel. But I can't find it.

> So you have ended up duplicating some of the functionality of the cgi
> library. And it feels like a hack.

Yep.

rg

Sep 12 '06 #9

Ron Garret

In article <3q***************@newsreading01.news.tds.net>,
Kent Johnson <ke**@kentsjohnson.com> wrote:

> Hey, be nice. Wanting to write a request handler that actually handles a
> POST request doesn't seem so unreasonable.
>
> Except...when there are about a bazillion Python web frameworks to
> choose from, why start from BaseHTTPServer? Why not use one of the
> simpler frameworks like Karrigell or Snakelets or CherryPy?

It may come to that. I just thought that what I'm trying to do is so
basic that it ought to be part of the standard library. I mean, what do
people use BaseHTTPServer for if you can't parse form input?

Here's what I actually ended up doing:

import cgi

def parse(r):
    ctype = r.headers.get('content-type')
    if not ctype:
        return None
    ctype, pdict = cgi.parse_header(ctype)
    if ctype == 'multipart/form-data':
        return cgi.parse_multipart(r.rfile, pdict)
    elif ctype == 'application/x-www-form-urlencoded':
        clength = int(r.headers.get('Content-length'))
        # cgi.maxlen is the module's size limit; 0 (the default) means unlimited
        if cgi.maxlen and clength > cgi.maxlen:
            raise ValueError('Maximum content length exceeded')
        return cgi.parse_qs(r.rfile.read(clength), 1)
    else:
        return None

which is copied more or less directly from cgi.py. But it still seems
to me like this (or something like it) ought to be standardized in one
of the *HTTPServer.py modules.
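On current Python the cgi module, including cgi.parse_header and cgi.parse_multipart, was removed in 3.13. A rough sketch of the same dispatch might look like this. It swaps in urllib.parse.parse_qs for urlencoded bodies and the email package's MIME parser for multipart/form-data. The names parse_form and maxlen are hypothetical, modelled on the code above. Unlike cgi.parse_multipart, it reads exactly Content-Length bytes. Multipart values come back as bytes, and file uploads get no special treatment.

```python
from email.parser import BytesParser
from email.policy import default as default_policy
from urllib.parse import parse_qs


def parse_form(headers, rfile, maxlen=0):
    """Rough Python 3 analogue of parse() above (hypothetical name).

    `headers` needs a .get() that finds Content-Type and Content-Length;
    `rfile` is the request body stream. Returns None for other content types.
    """
    ctype = headers.get("Content-Type")
    if not ctype:
        return None
    length = int(headers.get("Content-Length") or 0)
    if maxlen and length > maxlen:
        raise ValueError("Maximum content length exceeded")
    body = rfile.read(length)
    mime_type = ctype.split(";")[0].strip().lower()

    if mime_type == "application/x-www-form-urlencoded":
        text = body.decode("ascii", errors="replace")
        return parse_qs(text, keep_blank_values=True)

    if mime_type == "multipart/form-data":
        # Prepend the Content-Type header (it carries the boundary) so the
        # email parser sees a complete MIME message.
        raw = b"Content-Type: " + ctype.encode("latin-1") + b"\r\n\r\n" + body
        message = BytesParser(policy=default_policy).parsebytes(raw)
        fields = {}
        for part in message.iter_parts():
            name = part.get_param("name", header="content-disposition")
            fields.setdefault(name, []).append(part.get_payload(decode=True))
        return fields

    return None
```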

But what do I know?

rg

Sep 12 '06 #10

Ron Garret

In article <45***********************@news.sunsite.dk>,
Damjan <gd*****@gmail.com> wrote:

>>> But basically, you aren't providing a CGI environment, and that's why
>>> cgi.parse() isn't working.
>>
>> Clearly. So what should I be doing?
>
> Probably you'll need to read the source of cgi.parse_qs (like Steve did) and
> see what it needs from os.environ and then provide that (either in
> os.environ or in a custom environ dictionary).

I ended up just copying and hacking the code. It was only a dozen lines
or so. But it still feels wrong.

> BUT why don't you use WSGI?

Because BaseHTTPServer does everything I need except for this one thing.
Why use a sledge hammer to squish a gnat?

rg

Sep 12 '06 #11

Steve Holden

Ron Garret wrote:

> which is copied more or less directly from cgi.py. But it still seems
> to me like this (or something like it) ought to be standardized in one
> of the *HTTPServer.py modules.
>
> But what do I know?

I wouldn't necessarily say you are wrong here. It's just that the cgi
module has sort of "just growed", so it isn't conveniently factored for
reusability in other contexts. Several people (including me) have taken
a look at it with a view to possible re-engineering and backed away
because of the difficulty of maintaining compatibility. Python 3K will
be an ideal opportunity to replace it, but until then it's probably
going to stay in the same rather messy but working state.

regards
Steve

Sep 12 '06 #12

Ron Garret

In article <ma*************************************@python.org>,
Steve Holden <st***@holdenweb.com> wrote:

> I wouldn't necessarily say you are wrong here. It's just that the cgi
> module has sort of "just growed", so it isn't conveniently factored for
> reusability in other contexts. Several people (including me) have taken
> a look at it with a view to possible re-engineering and backed away
> because of the difficulty of maintaining compatibility. Python 3K will
> be an ideal opportunity to replace it, but until then it's probably
> going to stay in the same rather messy but working state.

It's not necessary to re-engineer cgi; just cutting and pasting and
editing the code as I've done would seem to suffice.

But all I'm really looking for here at this point is confirmation that
I'm not in fact doing something stupid. In the past I've found that
nine times out of ten if I find myself wanting to rewrite or add
something to a Python module it's an indication that I'm doing something
wrong.

rg

Sep 12 '06 #13

Eddie Corns

Ron Garret <rN*******@flownet.com> writes:

> It's not necessary to re-engineer cgi; just cutting and pasting and
> editing the code as I've done would seem to suffice.
>
> But all I'm really looking for here at this point is confirmation that
> I'm not in fact doing something stupid. In the past I've found that
> nine times out of ten if I find myself wanting to rewrite or add
> something to a Python module it's an indication that I'm doing something
> wrong.

Well, if it's any consolation, that's exactly what I did: cut about 7 lines
from CGIHTTPServer into my do_POST method. Maybe we're both stoopid. This
was at least 3 years ago, before I moved on to Quixote and then
CherryPy/TurboGears, but I recall thinking at the time that it was probably
just one of those little cracks that show up from time to time in the library
(there aren't so very many of them).

Eddie

Sep 12 '06 #14
