Text Encoding - Like Wrestling Oiled Pigs

So I've got a problem.

I've got a database of information that is encoded in Windows/CP1252.
What I want to do is dump this to a UTF-8 encoded text file (an RSS
feed).

While the overall problem seems to be related to the conversion, the
only error I'm getting is a

"UnicodeDecodeError: 'ascii' codec can't decode byte 0x92 in position
163: ordinal not in range(128)"

So somewhere I'm missing an implicit conversion to ASCII, which is
completely aggravating my brain.

So, what fundamental issue am I completely overlooking?

Code follows.

def GenerateNoticeRSS():
    output = codecs.open(FILEBASE + 'noticeboard.xml','w','utf-8')
    conn = psycopg.connect(DSN)
    curs = conn.cursor()
    sql_query = "select story.subject as subject, story.content as
content, story.summary as summary, story.sid as sid, posts.bid as
board, posts.date_to_publish as date from story$
    curs.execute(sql_query)
    rows = curs.fetchall()
    output.write('<?xml version="1.0" encoding="utf-8"?>\n')
    output.write('<rss version="2.0">\n')

    output.write('<channel>\n')
    output.write('<title>U of L Notice Board</title>\n')
    output.write('<link>http://www.uleth.ca/notice</link>\n')
    output.write('<description>University of Lethbridge News and Events</description>\n')
    for each in rows:
        output.write('<item>\n')
        output.write('<title>' + rssTitlePrefix(each[4]) +
                     unicode(each[0]) + '</title>\n')
        output.write('<link>http://www.uleth.ca/notice/display.html?b=' +
                     str(each[4]) + '&amp;s=' + str(each[3]) + '</link>\n')
        output.write('<guid>http://www.uleth.ca/notice/display.html?b=' +
                     str(each[4]) + '&amp;s=' + str(each[3]) + '</guid>\n')
        descript = each[2] + '<BR><BR>' + each[1]

        output.write(u'<description>' + unicode(descript) +
                     u'</description>\n') # this is the line that causes the error.
        output.write('</item>\n')
    output.write('</channel>\n')
    output.write('</rss>\n')
    output.close()
    return 0

Dec 8 '06 #1

ap******@gmail.com wrote:
> So I've got a problem.
>
> I've got a database of information that is encoded in Windows/CP1252.
> What I want to do is dump this to a UTF-8 encoded text file (an RSS
> feed).
>
> While the overall problem seems to be related to the conversion, the
> only error I'm getting is a
>
> "UnicodeDecodeError: 'ascii' codec can't decode byte 0x92 in position
> 163: ordinal not in range(128)"
>
> So somewhere I'm missing an implicit conversion to ASCII, which is
> completely aggravating my brain.
>
> So, what fundamental issue am I completely overlooking?

That nowhere in your *code* do you mention "I've got a database of
information that is encoded in Windows/CP1252". This is not recorded
anywhere in your database. Python is fantastic, but we don't expect a
readauthorsmind() function until Python 4000 :-)
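
To illustrate why the encoding has to be stated in code, here is a minimal sketch. It is written for Python 3 (an assumption; the thread's code is Python 2), and the one-byte sample is hypothetical. The same byte decodes to different characters depending on which codec is named, so nothing in the bytes themselves identifies them as CP1252.

```python
# Hypothetical sample: the single byte reported in the error message.
raw = b"\x92"

# Under CP1252 the byte is a right single quotation mark (U+2019).
as_cp1252 = raw.decode("cp1252")

# Under Latin-1 the very same byte is a C1 control character (U+0092).
as_latin1 = raw.decode("latin-1")

# Both decodes succeed, but they disagree about what the byte means.
same_result = (as_cp1252 == as_latin1)
```

Only the author knows which interpretation is the right one, which is why it must be spelled out.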
> Code follows.

[snip]

> sql_query = "select story.subject as subject, story.content as
> content, story.summary as summary, story.sid as sid, posts.bid as
> board, posts.date_to_publish as date from story$

The above line has been mangled ... fortunately it doesn't affect the
diagnostic outcome.

[snip]

> output.write(u'<description>' + unicode(descript) +
> u'</description>\n') # this is the line that causes the error.

What is happening is that unicode(descript) has not been told what
encoding to use to decode your "Windows/CP1252" text, so it falls back to
the default encoding, "ascii", which cannot decode byte 0x92. You need to
write unicode(descript, 'cp1252').
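
The failure and the fix can be sketched as follows. This is a rough Python 3 equivalent, not the original Python 2 code: in Python 3, bytes.decode('cp1252') plays the role of unicode(descript, 'cp1252'). The sample bytes are a hypothetical stand-in for a row from the database, and the final encode step is an assumption standing in for what the UTF-8 output file does on write.

```python
# Hypothetical stand-in for a database value: "It's here" with a
# CP1252 curly apostrophe (byte 0x92) instead of a plain ASCII one.
raw = b"It\x92s here"

# Decoding as ASCII fails on byte 0x92, as in the reported error.
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# Naming the real source encoding makes the decode succeed.
text = raw.decode("cp1252")

# Encoding to UTF-8 happens only on the way out, after decoding.
utf8_bytes = ("<description>" + text + "</description>\n").encode("utf-8")
```

The order matters: decode from the source encoding first, work with text in between, and encode to UTF-8 only when writing.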

Cheers,
John

Dec 8 '06 #2
