Unicode and exception strings

Assuming an exception like:

x = ValueError(u'\xf8')

AFAIK the common way to get a string representation of the exception
as a message is to simply cast it to a string: str(x). This will
result in a "UnicodeError: ASCII encoding error: ordinal not in
range(128)".

The common way to fix this is with something like
u'\xf8'.encode("ascii", 'replace'). However, I can't find any way to
tell ValueError's __str__ method which encoding to use.

Is it possible to solve this without using sys.setdefaultencoding()
from sitecustomize?

Regards,
Rune Frøysa
Jul 18 '05 #1
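
A minimal sketch of what the error handlers mentioned in the question do when a Unicode string is encoded to ASCII. It uses only the standard `encode` method; the variable names are illustrative. Note that under Python 3 `str(x)` no longer raises for this exception, but the encoding behaviour shown here is the same.

```python
text = u"\xf8"  # LATIN SMALL LETTER O WITH STROKE, not representable in ASCII

# The default "strict" handler refuses characters outside the target encoding.
try:
    text.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# "replace" substitutes a question mark for each unencodable character.
replaced = text.encode("ascii", "replace")

# "backslashreplace" keeps the information as an escape sequence instead.
escaped = text.encode("ascii", "backslashreplace")
```

The `backslashreplace` handler is often the more useful choice for logging, since the original character can still be recovered from the output.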
On 09 Jan 2004 13:18:39 +0100, Rune Froysa <ru*********@usit.uio.no>
wrote:
> Assuming an exception like:
>
> x = ValueError(u'\xf8')
>
> AFAIK the common way to get a string representation of the exception
> as a message is to simply cast it to a string: str(x). This will
> result in a "UnicodeError: ASCII encoding error: ordinal not in
> range(128)".
>
> The common way to fix this is with something like
> u'\xf8'.encode("ascii", 'replace'). However, I can't find any way to
> tell ValueError's __str__ method which encoding to use.


Rune, I'm not understanding what your problem is.

Is there any reason you're not using, for example, just repr(u'\xf8')?

In one program I have that occasionally runs into a line that includes
some (UTF-8) Unicode-encoded Chinese characters, I have something like
this:

try:
    _display_text = _display_text + "%s\n" % line
except UnicodeDecodeError:
    try:
        # decode those UTF8 nasties
        _display_text = _display_text + "%s\n" % line.decode('utf-8'))
    except UnicodeDecodeError:
        # if that still doesn't work, punt
        # (I don't think we'll ever reach this, but just in case)
        _display_text = _display_text + "%s\n" % repr(line)

I don't know if this will help you or not.

Jul 18 '05 #2
On Fri, 09 Jan 2004 19:44:21 GMT, Terry Carroll <ca*****@tjc.com> wrote:
> In one program I have that occasionally runs into a line that includes
> some (UTF-8) Unicode-encoded Chinese characters, I have something like
> this:

Sorry, a stray parenthesis crept in here (since this is a pared down
version of my actual code). It should read:

try:
    _display_text = _display_text + "%s\n" % line
except UnicodeDecodeError:
    try:
        # decode those UTF8 nasties
        _display_text = _display_text + "%s\n" % line.decode('utf-8')
    except UnicodeDecodeError:
        # if that still doesn't work, punt
        # (I don't think we'll ever reach this, but just in case)
        _display_text = _display_text + "%s\n" % repr(line)

> I don't know if this will help you or not.


Jul 18 '05 #3
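
The fallback pattern in the post above can be sketched as a small self-contained function. This is an adaptation, not the poster's actual code: `to_display_text` is a hypothetical name, and the sketch assumes the input is either a byte string or already text.

```python
def to_display_text(line):
    """Return `line` as text, decoding byte strings as UTF-8 when possible."""
    if not isinstance(line, bytes):
        # Assumption: anything that is not a byte string is already text.
        return line
    try:
        # Decode those UTF-8 bytes.
        return line.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: punt and show an escaped representation instead.
        return repr(line)


chinese = to_display_text(b"\xe4\xb8\xad")   # valid UTF-8 for one CJK character
fallback = to_display_text(b"\xff")          # 0xff can never start a UTF-8 sequence
unchanged = to_display_text(u"plain text")
```

With such a helper the accumulation step would become something like `_display_text = _display_text + to_display_text(line) + u"\n"`.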
Terry Carroll <ca*****@tjc.com> writes:
> On 09 Jan 2004 13:18:39 +0100, Rune Froysa <ru*********@usit.uio.no>
> wrote:
>> Assuming an exception like:
>>
>> x = ValueError(u'\xf8')
>>
>> AFAIK the common way to get a string representation of the exception
>> as a message is to simply cast it to a string: str(x). This will
>> result in a "UnicodeError: ASCII encoding error: ordinal not in
>> range(128)".
>>
>> The common way to fix this is with something like
>> u'\xf8'.encode("ascii", 'replace'). However, I can't find any way to
>> tell ValueError's __str__ method which encoding to use.
>
> Rune, I'm not understanding what your problem is.
>
> Is there any reason you're not using, for example, just repr(u'\xf8')?


The problem is that I have little control over the message string that
is passed to ValueError(). All my program knows is that it has caught
one such error, and that its message string is in unicode format. I
need to access the message string (for logging etc.).
> _display_text = _display_text + "%s\n" % line.decode('utf-8'))


This does not work, as I'm unable to get at the 'line', which is
stored internally in the ValueError class (and generated by its
__str__ method).

Regards,
Rune Frøysa
Jul 18 '05 #4
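
Exception instances keep the arguments they were constructed with in their `args` attribute, so the message can be read without going through `__str__`. A rough sketch, assuming the message is the first argument; `exception_text` is a hypothetical helper name, not part of any library.

```python
def exception_text(exc, encoding="ascii", errors="replace"):
    """Return the first argument of `exc` as an encoded byte string."""
    if not exc.args:
        return b""
    message = exc.args[0]
    if isinstance(message, bytes):
        # Already a byte string: nothing to encode.
        return message
    if not isinstance(message, type(u"")):
        # Assumption: non-string arguments are simply formatted as text.
        message = u"%s" % (message,)
    # The caller chooses the encoding and the error handler explicitly.
    return message.encode(encoding, errors)


x = ValueError(u"\xf8")
ascii_message = exception_text(x)
utf8_message = exception_text(x, "utf-8")
```

This sidesteps the default encoding entirely, so nothing needs to be changed in sitecustomize.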
On Wed, 14 Jan 2004 01:32:36 GMT, Terry Carroll <ca*****@tjc.com> wrote:
> You can try to extract it as above, and then decode it with the codecs
> module, but if it's only the first byte, it won't decode correctly:
>
> >>> import codecs
> >>> d = codecs.getdecoder('utf-8')
> >>> x.args[0]
> u'\xf8'
> >>> d.decode(x.args[0])
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> AttributeError: 'builtin_function_or_method' object has no attribute 'decode'

Oops. Copy-and-pasted the wrong line here. Let's try that again:

>>> x = ValueError(u'\xf8')
>>> import codecs
>>> d = codecs.getdecoder('utf-8')
>>> d(x.args[0])
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf8' in position 0: ordinal not in range(128)


*That's* the exception I was trying to show, not the AttributeError you
get when you use the decoder wrongly!

Jul 18 '05 #5
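
For comparison, a sketch of the decoder applied to the kind of input it expects. The function returned by `codecs.getdecoder` takes a byte string and returns a `(text, bytes_consumed)` pair. Passing it a Unicode object, as in the session above, makes Python 2 first encode that object with the default ASCII codec, which is where the `UnicodeEncodeError` comes from. The variable names here are illustrative.

```python
import codecs

decode_utf8 = codecs.getdecoder("utf-8")

# Build a real UTF-8 byte string for the character in the exception message.
raw = u"\xf8".encode("utf-8")

# The decoder returns the decoded text and the number of bytes it consumed.
text, consumed = decode_utf8(raw)
```

The message held in `x.args[0]` is already a Unicode object, so it needs encoding for output rather than decoding.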
