Python nuube needs Unicode help

HELP!
Guy who was here before me wrote a script to parse files in Python.

Includes line:
print u
where u is a line from a file we are parsing.
However, we have started receiving data from Brazil. If I open the file to
parse in vi, it looks like:

<Utt id="3" transcribe="yes" audioRoot="A1"
audio="313-20070102144528.wav" grammarSet="G3" rawText="não"
recValue="{data:CHOICE=NO;}" conf="970" rawText2="" conf2="0"
transcribedText="não" parsableText="não"/>

Clearly those "n&#227" are some non-ASCII characters, but how do I get
print to understand that?

I keep getting:
"UnicodeEncodeError: 'ascii' codec can't encode character u'\xe3' in
position 40: ordinal not in range(128)"

Jan 11 '07 #1
Does the error happen at the

print u

line? If so, you are trying to print a unicode object, which means it
has to be converted (the right term is encoded) to a byte string. If you
don't do that explicitly, it is done implicitly, using the default
encoding - which is ASCII.

If you have non-ascii characters, you end up with the error you see.

What to do? Use something like this:

print u.encode('utf-8')

instead.
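
To make the difference between the implicit ASCII encoding and an explicit one concrete, here is a minimal sketch. The sample text and the helper name try_encode are made up for illustration and are not from the original script; the code is written to run on Python 2.6 or later as well as Python 3.

```python
# Sample text standing in for one parsed value; u"\xe3" is the
# a-with-tilde character named in the traceback.
text = u"n\xe3o"


def try_encode(value, encoding):
    """Return the encoded bytes, or the exception raised by the codec."""
    try:
        return value.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc


# ASCII has no code for u"\xe3", so this is the failure print runs into.
ascii_result = try_encode(text, "ascii")

# UTF-8 can represent every Unicode character, so this succeeds.
utf8_result = try_encode(text, "utf-8")

print(repr(ascii_result))
print(repr(utf8_result))
```

UTF-8 is only one possible choice here; the right encoding is whatever the terminal or output file expects.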

Diez
Jan 11 '07 #2
Progress! You managed to change the error message.

File "./acc_test_script_generator.py", line 106, in loadData
print u.encode('utf-8')
AttributeError: Utterance instance has no attribute 'encode'

I'm missing something really obvious here, but I don't know what it
is...
Jan 11 '07 #3
At Thursday 11/1/2007 18:27, gh************@gmail.com wrote:
>HELP!
>Guy who was here before me wrote a script to parse files in Python.
>
>Includes line:
>print u
>where u is a line from a file we are parsing.
>However, we have started receiving data from Brazil. If I open the file to
>parse in vi, it looks like:
>
><Utt id="3" transcribe="yes" audioRoot="A1"
>audio="313-20070102144528.wav" grammarSet="G3" rawText="não"
>recValue="{data:CHOICE=NO;}" conf="970" rawText2="" conf2="0"
>transcribedText="não" parsableText="não"/>

Is this part of an XML document? You should use a
true XML parser instead of doing that by hand.
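
As a rough sketch of what using a real parser could look like, the snippet below feeds one element to xml.etree.ElementTree from the standard library (available since Python 2.5). The sample string is modelled on the element quoted above; the closing "/>" and the spelling of the character as the reference &#227; are assumptions about how the real file is written.

```python
import xml.etree.ElementTree as ET

# One element modelled on the quoted data. The "/>" terminator and the
# numeric character reference &#227; are assumptions about the real file.
sample = (
    '<Utt id="3" transcribe="yes" audioRoot="A1" '
    'audio="313-20070102144528.wav" grammarSet="G3" rawText="n&#227;o" '
    'recValue="{data:CHOICE=NO;}" conf="970" rawText2="" conf2="0" '
    'transcribedText="n&#227;o" parsableText="n&#227;o"/>'
)

utt = ET.fromstring(sample)

# The parser resolves the character reference and hands back text,
# so there is no hand-written attribute splitting to get wrong.
raw_text = utt.get("rawText")
conf = int(utt.get("conf"))

print(repr(raw_text))
print(conf)
```

The attribute values come back as text, so they still have to be encoded explicitly before being written to a byte-oriented destination.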

>Clearly those "n&#227" are some non-Ascii characters, but how do I get
>print to understand that?

Understanding how Unicode works may be very
useful: http://www.amk.ca/python/howto/unicode

>I keep getting:
>"UnicodeEncodeError: 'ascii' codec can't encode character u'\xe3' in
>position 40: ordinal not in range(128)"

py> u = u"áéíóú"
py> print u, repr(u)
áéíóú u'\xe1\xe9\xed\xf3\xfa'
py> print str(u)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)
py> print u.encode('cp850')
áéíóú

(cp850 is my console encoding)
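
Since the console encoding differs from machine to machine, one option is to look it up instead of hard-coding it. The following is only a sketch: encode_for_console is a hypothetical helper name, and falling back to ASCII with replacement characters is one possible policy, not something the thread prescribes.

```python
import sys


def encode_for_console(text, encoding=None):
    """Encode text for the console, replacing characters it cannot show."""
    if encoding is None:
        # sys.stdout.encoding may be missing or None when output is piped.
        encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(encoding, "replace")


vowels = u"\xe1\xe9\xed\xf3\xfa"  # the accented vowels from the session above

# With an encoding that has these characters, each one maps to a byte.
as_cp850 = encode_for_console(vowels, "cp850")

# With plain ASCII, "replace" substitutes "?" instead of raising.
as_ascii = encode_for_console(vowels, "ascii")

print(repr(as_cp850))
print(repr(as_ascii))
```

Called without the second argument, the helper uses whatever encoding the current standard output reports.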
--
Gabriel Genellina
Softlab SRL


Jan 12 '07 #4
At Thursday 11/1/2007 20:42, gh************@gmail.com wrote:
>Progress! You managed to change the error message.
>
>File "./acc_test_script_generator.py", line 106, in loadData
>print u.encode('utf-8')
>AttributeError: Utterance instance has no attribute 'encode'
>
>I'm missing something really obvious here, but I don't know what it
>is...

Then you're not "printing a line from a file we are parsing", which
should be a string or unicode object. You're printing some
"Utterance" instance; probably it has a __str__ method, and there,
you're mixing unicode+strings.
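
The real Utterance class is not shown in the thread, so the following is only a hypothetical stand-in to illustrate the point: build the printable form from text pieces only, and encode once at the point of output. The attribute names and the as_text method are invented for the example.

```python
class Utterance(object):
    """Hypothetical stand-in; the real class is not shown in the thread."""

    def __init__(self, raw_text, conf):
        self.raw_text = raw_text  # text (unicode) value
        self.conf = conf

    def as_text(self):
        # Build the result from text pieces only, so nothing is mixed
        # with byte strings along the way.
        return u"rawText=%s conf=%d" % (self.raw_text, self.conf)


utt = Utterance(u"n\xe3o", 970)

# Encode once, at the point of output.
encoded = utt.as_text().encode("utf-8")
print(repr(encoded))
```

Keeping the encode call at the output boundary means the object itself never has to guess which encoding the caller wants.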
--
Gabriel Genellina
Softlab SRL


Jan 12 '07 #5
