HELP!
Guy who was here before me wrote a script to parse files in Python.
Includes line:
print u
where u is a line from a file we are parsing.
However, we have started receiving data from Brazil. If I open the file
to parse in vi, it looks like:
<Utt id="3" transcribe="yes" audioRoot="A1"
audio="313-20070102144528.wav" grammarSet="G3" rawText="não"
recValue="{data:CHOICE=NO;}" conf="970" rawText2="" conf2="0"
transcribedText="não" parsableText="não"/>
Clearly those "ã" are non-ASCII characters, but how do I get
print to understand that?
I keep getting:
"UnicodeEncodeError: 'ascii' codec can't encode character u'\xe3' in
position 40:
ordinal not in range(128)"
Does the error happen at the

    print u

line? If so, you are trying to print a unicode object, which means it
has to be converted (the right term is encoded) to a byte string. If
you don't do that explicitly, it is done implicitly, using the default
encoding, which is ASCII. If the text contains non-ASCII characters,
you end up with the error you see.

What to do? Use something like this instead:

    print u.encode('utf-8')
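A minimal sketch of the difference, written so that it also runs on Python 3 (the thread itself is about Python 2, where `print u` performs the ASCII encoding implicitly). The sample string is an assumption modeled on the data in the question, and the explicit `.encode('ascii')` call stands in for that implicit step.

```python
# Sample text modeled on the data in the question (an assumption, not the real file).
line = u'rawText="n\xe3o"'

# Encoding with the ASCII codec fails as soon as a non-ASCII character appears.
try:
    line.encode('ascii')
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# Encoding with UTF-8 succeeds and yields a byte string.
utf8_bytes = line.encode('utf-8')
```

Whether UTF-8 is the right target depends on where the output goes; it is used here only because it can represent every character.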
Diez
Progress! You managed to change the error message.

  File "./acc_test_script_generator.py", line 106, in loadData
    print u.encode('utf-8')
AttributeError: Utterance instance has no attribute 'encode'

I'm missing something really obvious here, but I don't know what it
is...
At Thursday 11/1/2007 18:27, gh************@gmail.com wrote:
> HELP! Guy who was here before me wrote a script to parse files in
> Python. Includes line: print u where u is a line from a file we are
> parsing. However, we have started receiving data from Brazil. If I
> open the file to parse in vi, it looks like:
> <Utt id="3" transcribe="yes" audioRoot="A1"
> audio="313-20070102144528.wav" grammarSet="G3" rawText="não"
> recValue="{data:CHOICE=NO;}" conf="970" rawText2="" conf2="0"
> transcribedText="não" parsableText="não"/>
Is this part of an XML document? You should use a
true XML parser instead of doing that by hand.
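As a rough illustration of that suggestion, here is a sketch using the standard library's `xml.etree.ElementTree`. It assumes the element is self-closing and that the file is UTF-8 encoded; the thread does not confirm either, and the sample string is copied from the question rather than read from the real file.

```python
import xml.etree.ElementTree as ET

# One element copied from the question; the closing "/>" and the UTF-8
# encoding are assumptions about the real input file.
sample = (u'<Utt id="3" transcribe="yes" audioRoot="A1" '
          u'audio="313-20070102144528.wav" grammarSet="G3" rawText="n\xe3o" '
          u'recValue="{data:CHOICE=NO;}" conf="970" rawText2="" conf2="0" '
          u'transcribedText="n\xe3o" parsableText="n\xe3o"/>')

# Parse the UTF-8 bytes; the parser decodes attribute values to text.
utt = ET.fromstring(sample.encode('utf-8'))

raw_text = utt.get('rawText')   # decoded text, including the non-ASCII character
conf = int(utt.get('conf'))     # attribute values arrive as text; convert as needed
```

The parser takes care of decoding, so the rest of the script works with text and only has to encode when writing output.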
> Clearly those "ã" are non-ASCII characters, but how do I get
> print to understand that?
Understanding how Unicode works may be very
useful: http://www.amk.ca/python/howto/unicode
> I keep getting: "UnicodeEncodeError: 'ascii' codec can't encode
> character u'\xe3' in position 40: ordinal not in range(128)"
py> u = u"áéíóú"
py> print u, repr(u)
áéíóú u'\xe1\xe9\xed\xf3\xfa'
py> print str(u)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)
py> print u.encode('cp850')
áéíóú
(cp850 is my console encoding)
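The console encoding differs from machine to machine, so hard-coding 'cp850' only suits this particular setup. A hedged sketch of looking it up instead: `encode_for_console` is a hypothetical helper name, the UTF-8 fallback is an assumption, and `sys.stdout.encoding` may be unset when output is redirected.

```python
import sys

def encode_for_console(text, encoding=None, fallback='utf-8'):
    """Encode text for output (hypothetical helper, not from the thread).

    Uses the given encoding, else the one reported by sys.stdout, else
    the fallback. Characters the codec cannot represent become '?'.
    """
    if encoding is None:
        encoding = getattr(sys.stdout, 'encoding', None) or fallback
    return text.encode(encoding, 'replace')

sample = u'\xe1\xe9\xed\xf3\xfa'                 # the same five accented vowels
as_cp850 = encode_for_console(sample, 'cp850')   # bytes for a cp850 console
as_ascii = encode_for_console(sample, 'ascii')   # every character is replaced
```

The 'replace' error handler trades fidelity for never raising, which may or may not be acceptable for a given script.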
--
Gabriel Genellina
Softlab SRL
At Thursday 11/1/2007 20:42, gh************@gmail.com wrote:
> Progress! You managed to change the error message.
>   File "./acc_test_script_generator.py", line 106, in loadData
>     print u.encode('utf-8')
> AttributeError: Utterance instance has no attribute 'encode'
> I'm missing something really obvious here, but I don't know what it
> is...
Then you're not "printing a line from a file we are parsing", which
would be a string or unicode object. You're printing some "Utterance"
instance; it probably has a __str__ method, and in there you're mixing
unicode objects with byte strings.
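The real Utterance class is not shown in the thread, so the following is only a hypothetical stand-in illustrating one way around the problem: keep every piece as text inside the object and encode once, at the point of output. The attribute names and the `describe` method are invented for illustration. The sketch runs on Python 3; on Python 2 a `__unicode__` method would typically play the role of `describe`.

```python
class Utterance(object):
    """Hypothetical stand-in for the class in the script (not the real one)."""

    def __init__(self, utt_id, raw_text):
        self.utt_id = utt_id
        self.raw_text = raw_text  # text (unicode), never an encoded byte string

    def describe(self):
        # Build the result from text pieces only, so nothing is encoded implicitly.
        return u'Utt id=%s rawText=%s' % (self.utt_id, self.raw_text)

u = Utterance(3, u'n\xe3o')

# The instance itself has no encode(); the text it produces does.
encoded = u.describe().encode('utf-8')
```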
--
Gabriel Genellina
Softlab SRL