BeautifulSoup error

Hi, all,

This piece of code used to work well. I guess the error started after
some upgrade.

import urllib
from BeautifulSoup import BeautifulSoup
url = 'http://www.google.com'
port = urllib.urlopen(url).read()
soup = BeautifulSoup()
soup.feed(port)

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.3/sgmllib.py", line 94, in feed
    self.rawdata = self.rawdata + data
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb8 in position 565: ordinal not in range(128)


Any ideas on how to solve this?

version info:

Python 2.3.5 (#2, Mar 7 2006, 12:43:17)
[GCC 4.0.3 20060212 (prerelease) (Debian 4.0.2-9)] on linux2

python-beautifulsoup: 3.0.1-1

--
William

"I'd love to go out with you, but I have to floss my cat."
Jun 16 '06 #1
William Xu wrote:
Hi, all,

This piece of code used to work well. I guess the error started after
some upgrade.

import urllib
from BeautifulSoup import BeautifulSoup
url = 'http://www.google.com'
port = urllib.urlopen(url).read()
soup = BeautifulSoup()
soup.feed(port)

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.3/sgmllib.py", line 94, in feed
    self.rawdata = self.rawdata + data
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb8 in position 565: ordinal not in range(128)

Any ideas to solve this?


According to the documentation
<http://www.crummy.com/software/BeautifulSoup/documentation.html>,
chapter "Beautiful Soup Gives You Unicode, Dammit", Beautiful Soup fully
supports Unicode, so it's probably a bug.
version info:

Python 2.3.5 (#2, Mar 7 2006, 12:43:17)
[GCC 4.0.3 20060212 (prerelease) (Debian 4.0.2-9)] on linux2

python-beautifulsoup: 3.0.1-1


Upgrading python-beautifulsoup is a good idea, since there were two
bug-fix releases after 3.0.1.

Jun 16 '06 #2
"Serge Orlov" <Se*********@gmail.com> writes:

[...]
Upgrading python-beautifulsoup is a good idea, since there were two
bug-fix releases after 3.0.1.


I just downloaded the latest version, 3.0.3, from its homepage, and it
still seems to have the same problem.

--
William

PL/I -- "the fatal disease" -- belongs more to the problem set than to the
solution set.
-- Edsger W. Dijkstra, SIGPLAN Notices, Volume 17, Number 5
Jun 16 '06 #3
William Xu wrote:
Hi, all,

This piece of code used to work well. I guess the error started after
some upgrade.
import urllib
from BeautifulSoup import BeautifulSoup
url = 'http://www.google.com'
port = urllib.urlopen(url).read()
soup = BeautifulSoup()
soup.feed(port)

Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "/usr/lib/python2.3/sgmllib.py", line 94, in feed


Look at the traceback: the call ends up in sgmllib, not in the
BeautifulSoup module! In fact, there is no feed method in the current
BeautifulSoup documentation. Maybe it used to work well, but now it's
definitely going to fail. As I understand the documentation, you need
to write

soup = BeautifulSoup(port)
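
For readers wondering where the 'ascii' codec in the traceback comes from: in Python 2, adding a unicode string and a byte string makes the interpreter decode the bytes as ASCII first, and that fails on any byte above 0x7f. That the buffer was unicode at that point is an inference from the error message, not something the thread confirms. The sketch below is written for Python 3, where the decode has to be spelled out explicitly; page_bytes and try_ascii_decode are hypothetical names, and latin-1 is only an assumed encoding for the sake of the example.

```python
# Hypothetical stand-in for the downloaded page: it contains 0xb8,
# the byte reported in the traceback.
page_bytes = b"<html>caf\xb8</html>"


def try_ascii_decode(data):
    """Return (True, text) if data is pure ASCII, else (False, reason)."""
    try:
        return True, data.decode("ascii")
    except UnicodeDecodeError as exc:
        # exc.reason carries the tail of the message seen in the traceback.
        return False, exc.reason


ok, detail = try_ascii_decode(page_bytes)
print(ok, detail)

# Decoding with an explicit single-byte encoding succeeds.
# latin-1 is an assumption here; the real page may use another encoding.
text = page_bytes.decode("latin-1")
print(repr(text))
```

The point is only that the failure is a decoding step applied to raw bytes, which is why handing the bytes to the constructor (which handles decoding) avoids it.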

Jun 16 '06 #4
"Serge Orlov" <Se*********@gmail.com> writes:

[...]
Look at the traceback: the call ends up in sgmllib, not in the
BeautifulSoup module! In fact, there is no feed method in the current
BeautifulSoup documentation. Maybe it used to work well, but now it's
definitely going to fail. As I understand the documentation, you need
to write

soup = BeautifulSoup(port)


Ah, yes ! Things change ! :-)

BeautifulSoup had its own feed() method before 3.0.0; since then, the
only feed() left is the one inherited from SGMLParser. As explained in
the changelog,

http://www.crummy.com/software/Beaut...CHANGELOG.html

Release 3.0.0 (2006/05/28), "Who would not give all else for two p"

Beautiful Soup no longer implements a feed method. You need to pass a
string or a filehandle into the soup constructor, not with feed after
the soup has been created. There is still a feed method, but it's the
feed method implemented by SGMLParser and calling it will bypass
Beautiful Soup and cause problems.
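
The changelog note above can be illustrated with a toy model. This is a minimal sketch, not Beautiful Soup's actual code: BaseParser and Soup are hypothetical stand-ins for SGMLParser and BeautifulSoup, and the latin-1 default is an assumption. It is written for Python 3, where mixing text and bytes raises TypeError instead of the implicit ASCII decode (and UnicodeDecodeError) that Python 2 produced.

```python
class BaseParser:
    """Hypothetical stand-in for SGMLParser: feed() only appends to a buffer."""

    def __init__(self):
        self.rawdata = ""  # text buffer

    def feed(self, data):
        self.rawdata = self.rawdata + data


class Soup(BaseParser):
    """Hypothetical stand-in for BeautifulSoup 3: the constructor decodes first."""

    def __init__(self, markup=b"", encoding="latin-1"):
        BaseParser.__init__(self)
        # Decode the bytes to text before handing them to the base parser.
        BaseParser.feed(self, markup.decode(encoding))


page = b"<p>\xb8</p>"

# Constructor path: the bytes are decoded before they reach the buffer.
good = Soup(page)
print(repr(good.rawdata))

# Inherited feed() path: the decoding step is bypassed.
bad = Soup()
try:
    bad.feed(page)
    failed = False
except TypeError:
    failed = True
print(failed)
```

The design point is the same one the changelog makes: the subclass does its work in the constructor, so calling the inherited feed() directly skips that work.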

Thanks for all the help !

--
William

Thrashing is just virtual crashing.
Jun 16 '06 #5
