
Urllib2, problems with a webserver

My aim is to write a small application that uses free SMS-sending services
more conveniently than a web browser does. I found: (which resembles the Perl
variant). With mechanize I should be able to interact with the website
from Python: supplying usernames, filling in the message form, etc.
So far so good: I have installed it and tested it locally, and it
seems to work well.
But this piece of code:
from mechanize import Browser

b = Browser()
b.open("")
assert b.viewing_html()

print b.geturl()
print b.title()

gives me this error:

Traceback (most recent call last):
  File "./", line 11, in ?
    b.open("")
  File "/usr/lib/python2.3/site-packages/mechanize/", line 106, in open
    def open(self, url, data=None): return self._open(url, data)
  File "/usr/lib/python2.3/site-packages/mechanize/", line 133, in _open
  File "/usr/lib/python2.3/site-packages/mechanize/", line 464, in _parse_html
    for token in p.tags(*(self.urltags.keys()+["base"])):
  File "/usr/lib/python2.3/site-packages/", line 90, in
    yield fn(*args, **kwds)
  File "/usr/lib/python2.3/site-packages/", line 194, in get_tag
    tok = self.get_token()
  File "/usr/lib/python2.3/site-packages/", line 177, in get_token
  File "/usr/lib/python2.3/", line 108, in feed
  File "/usr/lib/python2.3/", line 148, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.3/", line 239, in parse_starttag
    endpos = self.check_for_whole_start_tag(i)
  File "/usr/lib/python2.3/", line 314, in check_for_whole_start_tag
    self.error("malformed start tag")
  File "/usr/lib/python2.3/", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParser.HTMLParseError: malformed start tag, at line 2, column 1365

By commenting out line 314 (the self.error("malformed start tag") call in
check_for_whole_start_tag) and adding "return", I can continue, and
everything seems to work (though I have not tested it much). This is of
course not an acceptable solution...

How come I get this error?

Can the server software be an issue?
According to netcraft the server runs Microsoft-IIS/5.0


Jul 18 '05 #1
1 Reply

Erling Ringen Elvsrud <er*******> writes:
HTMLParser.HTMLParseError: malformed start tag, at line 2, column 1365 [...] How come I get this error?


Bad HTML. (OK, I haven't actually looked at the HTML, but the odds are
100/1 against HTMLParser being at fault.)
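
To look at the HTML yourself, it helps to print the text around the position the parser reports ("line 2, column 1365" in the error above). Here is a minimal sketch, assuming the page source has already been saved into a string. The name context_at is hypothetical, and it assumes the usual HTMLParser convention of 1-based line numbers and 0-based columns.

```python
def context_at(html, line, column, width=40):
    """Return the text around a (line, column) parser position.

    Assumes line is 1-based and column is 0-based, and that lines are
    separated by "\n", which is how HTMLParser counts them.
    """
    lines = html.split("\n")
    text = lines[line - 1]
    start = max(0, column - width)
    return text[start:column + width]


if __name__ == "__main__":
    # Stand-in page; the real page source would be loaded here instead.
    page = "<html>\n<body><a href=x y>link</a></body>"
    print(repr(context_at(page, 2, 6, width=10)))
```

For the error in this thread the call would be context_at(page, 2, 1365), which should show whatever start tag the parser choked on.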

I hope eventually to rewrite mechanize to use htmllib.HTMLParser
everywhere, and not use HTMLParser.HTMLParser. The former is less
fussy. That just means rewriting pullparser to support both classes,
I think. Not too hard (see ClientForm for how to do it -- why not
write a patch?-).

In the meantime, the best thing to do is to pre-process the HTML.
Inconvenient, I know. Also a bit inconvenient: the only way to
do this at the moment with mechanize is to write a tiny urllib2 handler
class. The handler method you want is .http_response(), which exists only
in the as-yet-unreleased Python 2.4 and in ClientCookie, which has a
near-identical interface to urllib2 (mechanize uses ClientCookie, not
urllib2). See posts to the wwwsearch-general mailing list for sample code.

Don't mix urllib2 and ClientCookie, BTW (with the exception of classes
that exist in urllib2 but not in ClientCookie: you can use those
urllib2 classes with ClientCookie).
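
The pre-processing handler described above might look roughly like this. It is only a sketch under several assumptions: it is written against Python 3's urllib.request (the successor to urllib2) rather than ClientCookie, the class name PreprocessHandler and the fix callable are hypothetical, and the cleanup shown (dropping NUL bytes) is a placeholder, since the real fix depends on what is actually malformed in the page.

```python
import io
import urllib.request
import urllib.response


class PreprocessHandler(urllib.request.BaseHandler):
    """Response processor that rewrites the body before it is parsed.

    `fix` is a hypothetical user-supplied callable: bytes in, bytes out.
    """

    def __init__(self, fix):
        self.fix = fix

    def http_response(self, request, response):
        # Read the whole body, clean it, and wrap it in a fresh
        # response object carrying the original headers, URL and status.
        body = self.fix(response.read())
        cleaned = urllib.response.addinfourl(
            io.BytesIO(body),
            response.info(),
            response.geturl(),
            response.getcode(),
        )
        # Later processors (e.g. HTTPErrorProcessor) read .msg.
        cleaned.msg = getattr(response, "msg", "")
        return cleaned

    # Apply the same processing to HTTPS responses.
    https_response = http_response


def strip_nul_bytes(raw):
    # Placeholder cleanup; replace with whatever the bad page needs.
    return raw.replace(b"\x00", b"")
```

With plain urllib.request it would be installed via urllib.request.build_opener(PreprocessHandler(strip_nul_bytes)). How to hook the equivalent into mechanize depends on the version in use.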

Jul 18 '05 #2
