
Once again a unicode question

Hello,

I'm puzzled by this test I made while trying to convert an HTML page
to plain text. I cannot pass unicode to feed, nor str, so
how can I do this?

..nicoe@smarties:~$ python2.4
..Python 2.4.1c2 (#2, Mar 19 2005, 01:04:19)
..[GCC 3.3.5 (Debian 1:3.3.5-12)] on linux2
..Type "help", "copyright", "credits" or "license" for more information.
..>>> import formatter
..>>> import htmllib
..>>> html2txt = htmllib.HTMLParser(formatter.AbstractFormatter(formatter.DumbWriter()))
..>>> html2txt.feed(u'D\xe9but')
..Traceback (most recent call last):
.. File "<stdin>", line 1, in ?
.. File "/usr/lib/python2.4/sgmllib.py", line 95, in feed
.. self.goahead(0)
.. File "/usr/lib/python2.4/sgmllib.py", line 120, in goahead
.. self.handle_data(rawdata[i:j])
.. File "/usr/lib/python2.4/htmllib.py", line 65, in handle_data
.. self.formatter.add_flowing_data(data)
.. File "/usr/lib/python2.4/formatter.py", line 197, in add_flowing_data
.. self.writer.send_flowing_data(data)
.. File "/usr/lib/python2.4/formatter.py", line 421, in send_flowing_data
.. write(word)
..UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 1: ordinal not in range(128)
..>>> html2txt.feed(u'D\xe9but'.encode('latin1'))
..Traceback (most recent call last):
.. File "<stdin>", line 1, in ?
.. File "/usr/lib/python2.4/sgmllib.py", line 94, in feed
.. self.rawdata = self.rawdata + data
..UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 1: ordinal not in range(128)
..>>> html2txt.feed('Début')
..Traceback (most recent call last):
.. File "<stdin>", line 1, in ?
.. File "/usr/lib/python2.4/sgmllib.py", line 94, in feed
.. self.rawdata = self.rawdata + data
..UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
..>>>

--
(°> Nicolas Évrard
/ ) Liège - Belgique
^^
Jul 18 '05 #1
Nicolas Evrard wrote:
> Hello,
>
> I'm puzzled by this test I made while trying to convert an HTML page
> to plain text. I cannot pass unicode to feed, nor str, so
> how can I do this?

It seems the parser is left in a broken state after the first exception.
Start with a fresh parser and feed only byte strings (str) to it.
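
To illustrate why the second and third calls fail even though byte strings are the right input, here is a minimal sketch in modern Python 3 (where htmllib and implicit str/unicode mixing no longer exist). ToyParser, py2_concat and try_feed are hypothetical names made up for this illustration, not part of sgmllib or htmllib. The toy only mimics the two steps visible in the tracebacks: the buffer is extended first, then the data goes through a write that only accepts ASCII. The Python 2 rule that a byte string is implicitly decoded as ASCII when joined with a unicode string is simulated by hand.

```python
def py2_concat(left, right):
    """Mimic Python 2 concatenation: bytes mixed with text are decoded as ASCII."""
    if isinstance(left, bytes) and isinstance(right, bytes):
        return left + right
    if isinstance(left, bytes):
        left = left.decode("ascii")    # implicit decode in Python 2
    if isinstance(right, bytes):
        right = right.decode("ascii")  # implicit decode in Python 2
    return left + right


class ToyParser:
    """Hypothetical stand-in for the buffering seen in the traceback."""

    def __init__(self):
        self.rawdata = b""
        self.output = []

    def feed(self, data):
        # The buffer grows before anything is processed, so a failure
        # further down leaves the new data sitting in the buffer.
        self.rawdata = py2_concat(self.rawdata, data)
        self.goahead()

    def goahead(self):
        data = self.rawdata
        if isinstance(data, str):
            # Stands in for the write that failed in the first traceback:
            # text is pushed through the ASCII codec on output.
            data = data.encode("ascii")
        self.output.append(data)
        self.rawdata = b""  # only reached when the write succeeded


def try_feed(parser, data):
    """Return 'ok', or the name of the Unicode error raised by feed()."""
    try:
        parser.feed(data)
    except UnicodeError as exc:
        return type(exc).__name__
    return "ok"


poisoned = ToyParser()
print(try_feed(poisoned, "D\xe9but"))                    # UnicodeEncodeError
print(try_feed(poisoned, "D\xe9but".encode("latin-1")))  # UnicodeDecodeError

fresh = ToyParser()
print(try_feed(fresh, "D\xe9but".encode("latin-1")))     # ok
```

After the first failed call the toy buffer still holds text, so every later byte string with a non-ASCII byte trips the implicit ASCII decode. On the real Python 2.4 parser the corresponding fix would presumably be to build a new htmllib.HTMLParser and pass it already encoded byte strings such as u'D\xe9but'.encode('latin1').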

Serge.
Jul 18 '05 #2
* Serge Orlov [23:45 26/03/05 CET]:
> Nicolas Evrard wrote:
>> Hello,
>>
>> I'm puzzled by this test I made while trying to convert an HTML page
>> to plain text. I cannot pass unicode to feed, nor str, so
>> how can I do this?
>
> It seems the parser is left in a broken state after the first exception.
> Start with a fresh parser and feed only byte strings (str) to it.

That was it, thank you very much.

--
(°> Nicolas Évrard
/ ) Liège - Belgique
^^
Jul 18 '05 #3
