Urllib2, problems with a webserver

Hello,
My aim is to write a small application to use free SMS-sending services
in a more convenient way than with a web browser. I found
http://wwwsearch.sourceforge.net/mechanize/ (which resembles the Perl
variant). With mechanize I should be able to interact with the website
through Python: supplying usernames, filling in the message form, etc.
All is well so far: I have installed it and tested it locally, and it
seems to work well.
But this piece of code:
-------------------------------------
from mechanize import Browser

b = Browser()
b.open("http://freesms.no:88/")
assert b.viewing_html()

print b.geturl()
print b.title()
-------------------------------------

gives me this error:

Traceback (most recent call last):
  File "./sms_sender.py", line 11, in ?
    b.open("http://freesms.no:88/")
  File "/usr/lib/python2.3/site-packages/mechanize/_mechanize.py", line 106, in open
    def open(self, url, data=None): return self._open(url, data)
  File "/usr/lib/python2.3/site-packages/mechanize/_mechanize.py", line 133, in _open
    self._parse_html(self.response)
  File "/usr/lib/python2.3/site-packages/mechanize/_mechanize.py", line 464, in _parse_html
    for token in p.tags(*(self.urltags.keys()+["base"])):
  File "/usr/lib/python2.3/site-packages/pullparser.py", line 90, in iter_until_exception
    yield fn(*args, **kwds)
  File "/usr/lib/python2.3/site-packages/pullparser.py", line 194, in get_tag
    tok = self.get_token()
  File "/usr/lib/python2.3/site-packages/pullparser.py", line 177, in get_token
    self.feed(data)
  File "/usr/lib/python2.3/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/usr/lib/python2.3/HTMLParser.py", line 148, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.3/HTMLParser.py", line 239, in parse_starttag
    endpos = self.check_for_whole_start_tag(i)
  File "/usr/lib/python2.3/HTMLParser.py", line 314, in check_for_whole_start_tag
    self.error("malformed start tag")
  File "/usr/lib/python2.3/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParser.HTMLParseError: malformed start tag, at line 2, column 1365
--------------------------------------

By commenting out line 314 of HTMLParser.py and adding "return"
I manage to continue, and everything seems to work (although I have
not tested it much). This is of course not an acceptable solution...

How come I get this error?

Can the server software be an issue?
According to Netcraft, the server runs Microsoft-IIS/5.0.
Thanks,

Erling


Jul 18 '05 #1
1 Reply


Erling Ringen Elvsrud <er*******@killozapHALLO.com.invalid> writes:
[...]
HTMLParser.HTMLParseError: malformed start tag, at line 2, column 1365
[...]
How come I get this error?

[...]

Bad HTML. (OK, I haven't actually looked at the HTML, but it's 100/1
that HTMLParser is at fault.)
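
One way to look at the markup the parser rejected is to print the text
around the reported position. The sketch below is only an illustration:
`context_at` is a hypothetical helper, not part of mechanize or
HTMLParser, and it assumes the position comes from the parser's
getpos(), which counts lines from 1 and columns from 0.

```python
def context_at(text, line, column, width=40):
    """Return the characters around a (line, column) parser position.

    Hypothetical helper: `line` is 1-based and `column` is 0-based,
    matching what HTMLParser.getpos() reports.
    """
    lines = text.split("\n")          # the parser counts lines by "\n"
    target = lines[line - 1]          # convert the 1-based line to an index
    start = max(column - width, 0)    # do not run off the start of the line
    return target[start:column + width]


# Small offline demonstration with made-up markup.
sample = "<html>\n" + "x" * 10 + "<a b>"
snippet = context_at(sample, 2, 10, width=5)
print(snippet)
```

For the error in this thread the call would be something like
context_at(html, 2, 1365), where html is the page source as a string.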

I hope eventually to rewrite mechanize to use htmllib.HTMLParser
everywhere, and not use HTMLParser.HTMLParser. The former is less
fussy. That just means rewriting pullparser to support both classes,
I think. Not too hard (see ClientForm for how to do it -- why not
write a patch?-).

In the meantime, the best thing to do is to pre-process the HTML.
Inconvenient, I know. Also a bit inconvenient: the only way to do this
at the moment with mechanize is to write a tiny urllib2-style handler
class. The handler method you want is .http_response(), which only
exists in the as-yet-unreleased Python 2.4 and in ClientCookie.
ClientCookie has a near-identical interface to urllib2, and mechanize
uses ClientCookie, not urllib2. See posts to the wwwsearch-general
mailing list for sample code.
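
As a rough illustration of the handler approach, here is a sketch
written against the modern standard library rather than the setup in
this thread: it assumes Python 3's urllib.request, whose handlers
support the same http_response() hook, instead of ClientCookie on
Python 2.3. `HTMLFixupProcessor` and `strip_nul_bytes` are hypothetical
names, and the cleanup shown is a placeholder, since the actual defect
in the page is not known.

```python
import email.message
import io
import urllib.request
import urllib.response


def strip_nul_bytes(data):
    """Placeholder cleanup (an assumption): drop NUL bytes from the body."""
    return data.replace(b"\x00", b"")


class HTMLFixupProcessor(urllib.request.BaseHandler):
    """Hypothetical handler that rewrites HTML bodies before parsing."""

    def __init__(self, fixer):
        self.fixer = fixer  # callable taking bytes and returning bytes

    def http_response(self, request, response):
        # Leave non-HTML responses untouched.
        if "html" not in response.headers.get("Content-Type", ""):
            return response
        body = self.fixer(response.read())
        # Wrap the cleaned bytes in a new response-like object. The
        # headers are reused as-is, so any Content-Length may be stale.
        fixed = urllib.response.addinfourl(
            io.BytesIO(body), response.headers,
            response.geturl(), response.getcode(),
        )
        fixed.msg = getattr(response, "msg", "")
        return fixed

    https_response = http_response


# Offline demonstration with a fabricated response (no network access).
headers = email.message.Message()
headers["Content-Type"] = "text/html"
raw = urllib.response.addinfourl(
    io.BytesIO(b"<p>hel\x00lo</p>"), headers, "http://example.invalid/", 200
)
fixed = HTMLFixupProcessor(strip_nul_bytes).http_response(None, raw)
body = fixed.read()
print(body)
```

In real use the handler would be passed to
urllib.request.build_opener(HTMLFixupProcessor(strip_nul_bytes)), with
the fixer replaced by whatever repair the page actually needs.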

Don't mix urllib2 and ClientCookie, BTW (with the exception of classes
that exist in urllib2 but not in ClientCookie: you can use those
urllib2 classes with ClientCookie).

HTH
John
Jul 18 '05 #2
