
July 18th, 2005, 10:56 PM
unicode(obj, errors='foo') raises TypeError - bug?
This works as expected (this is on an ASCII terminal):
>>> unicode('asdf\xff', errors='replace')
u'asdf\ufffd'
This does not work as I expect it to:
>>> class C:
...     def __str__(self):
...         return 'asdf\xff'
...
>>> o = C()
>>> unicode(o, errors='replace')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: coercing to Unicode: need string or buffer, instance found
Shouldn't it work the same as calling unicode(str(self), errors='replace')?
It doesn't matter what value you use for 'errors' (ignore, replace, strict);
you'll get the same TypeError.
What am I doing wrong? Is this a bug in Python?
July 18th, 2005, 10:56 PM
Re: unicode(obj, errors='foo') raises TypeError - bug?
Mike Brown wrote:
> >>> class C:
> ...     def __str__(self):
> ...         return 'asdf\xff'
> ...
> >>> o = C()
> >>> unicode(o, errors='replace')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> TypeError: coercing to Unicode: need string or buffer, instance found
>
[snip]
>
> What am I doing wrong? Is this a bug in Python?
No, this is documented behavior[1]:
"""
unicode([object[, encoding [, errors]]])
...
For objects which provide a __unicode__() method, it will call this
method without arguments to create a Unicode string. For all other
objects, the 8-bit string version or representation is requested and
then converted to a Unicode string using the codec for the default
encoding in 'strict' mode.
"""
Note that the documentation basically says that it will call str() on
your object, and then convert it in 'strict' mode. You should either
define __unicode__ or call str() manually on the object.
STeVe
[1] http://docs.python.org/lib/built-in-funcs.html
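The two workarounds suggested above can be sketched on a current interpreter. This is only an analogue, under the assumption that Python 3's `str` stands in for Python 2's `unicode` and `bytes` for the 8-bit string; the `__bytes__` method and the class `D` are illustrative stand-ins, not code from the thread.

```python
class C:
    # Hypothetical Python 3 counterpart of the class in the question:
    # __bytes__ plays the role of the Python 2 __str__ returning 'asdf\xff'.
    def __bytes__(self):
        return b"asdf\xff"


class D(C):
    # Counterpart of "define __unicode__": the object decides for itself
    # how its bytes become text, so callers pass no encoding or errors.
    def __str__(self):
        return bytes(self).decode("ascii", errors="replace")


o = C()

# Passing an arbitrary instance together with errors= is rejected,
# much like the TypeError in the original report.
try:
    str(o, errors="replace")
    raised = False
except TypeError:
    raised = True

# Counterpart of "call str() manually": get the byte string first,
# then decode it with the error handler you want.
manual = bytes(o).decode("ascii", errors="replace")

# With the conversion method defined, the plain call is enough.
via_method = str(D())
```

Both routes yield the text with the undecodable byte replaced by U+FFFD, which is the result the original poster expected from the single call.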
July 18th, 2005, 10:56 PM
Re: unicode(obj, errors='foo') raises TypeError - bug?
Steven Bethard wrote:
> Mike Brown wrote:
>
>> >>> class C:
>> ...     def __str__(self):
>> ...         return 'asdf\xff'
>> ...
>> >>> o = C()
>> >>> unicode(o, errors='replace')
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in ?
>> TypeError: coercing to Unicode: need string or buffer, instance found
>>
> [snip]
>
>> What am I doing wrong? Is this a bug in Python?
>
> No, this is documented behavior[1]:
>
> """
> unicode([object[, encoding [, errors]]])
> ...
> For objects which provide a __unicode__() method, it will call this
> method without arguments to create a Unicode string. For all other
> objects, the 8-bit string version or representation is requested and
> then converted to a Unicode string using the codec for the default
> encoding in 'strict' mode.
> """
>
> Note that the documentation basically says that it will call str() on
> your object, and then convert it in 'strict' mode. You should either
> define __unicode__ or call str() manually on the object.
Not a bug, I guess, since it is documented, but it seems a bit bizarre that the encoding and errors
parameters are ignored when object does not have a __unicode__ method.
Kent
>
> STeVe
>
> [1] http://docs.python.org/lib/built-in-funcs.html
July 18th, 2005, 10:57 PM
Re: unicode(obj, errors='foo') raises TypeError - bug?
Kent Johnson wrote:
> Steven Bethard wrote:
>
>> No, this is documented behavior[1]:
>>
>> """
>> unicode([object[, encoding [, errors]]])
>> ...
>> For objects which provide a __unicode__() method, it will call
>> this method without arguments to create a Unicode string. For all
>> other objects, the 8-bit string version or representation is requested
>> and then converted to a Unicode string using the codec for the default
>> encoding in 'strict' mode.
>> """
>>
>> Note that the documentation basically says that it will call str() on
>> your object, and then convert it in 'strict' mode. You should either
>> define __unicode__ or call str() manually on the object.
>
> Not a bug, I guess, since it is documented, but it seems a bit bizarre
> that the encoding and errors parameters are ignored when object does not
> have a __unicode__ method.
Yeah, I agree it's weird. I suspect if someone supplied a patch for
this behavior it would be accepted -- I don't think this should break
backwards compatibility (much).
STeVe
July 18th, 2005, 10:57 PM
Re: unicode(obj, errors='foo') raises TypeError - bug?
Steven Bethard wrote:
> Yeah, I agree it's weird. I suspect if someone supplied a patch for
> this behavior it would be accepted -- I don't think this should break
> backwards compatibility (much).
Notice that the "right" thing to do would be to pass encoding and errors
to __unicode__. If the string object needs to be told what encoding it
is in, why not any other object as well?
Unfortunately, this apparently was overlooked, and now it is too late
to change it (or else the existing __unicode__ methods would all break
if they suddenly get an encoding argument).
As for using encoding and errors on the result of str() conversion
of the object: how can the caller know what encoding the result of
str() is in, reasonably? It seems more correct to assume that the
str() result is in the system default encoding.
If you can follow so far(*): if it is the right thing to ignore the
encoding argument for the case that the object was str() converted,
why should the errors argument not be ignored? It is inconsistent
to ignore one parameter to the decoding but not the other.
Regards,
Martin
(*) I admit that the reasoning for ignoring the encoding is
somewhat flawed. There are some types (e.g. numbers) where
str() always uses the system encoding (i.e. ASCII - actually,
it always uses ASCII, no matter what the system encoding is).
There may be types where the encoding of the str() result
is not ASCII, and the caller happens to know what it is,
but I'm not aware of any such type.
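The footnote's point about numbers is easy to spot-check. The sketch below assumes a Python 3 interpreter (the thread itself concerns Python 2) and samples only a handful of built-in numeric values, so it illustrates the claim rather than proving it.

```python
# A few built-in numeric values, chosen arbitrarily for illustration.
samples = [0, -17, 2 ** 100, 3.14, 1e-300, float("inf"), complex(1, -2)]

# encode("ascii") raises UnicodeEncodeError on any non-ASCII character,
# so building this list at all shows every str() result was pure ASCII.
encoded = [str(value).encode("ascii") for value in samples]
```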
July 18th, 2005, 10:57 PM
Re: unicode(obj, errors='foo') raises TypeError - bug?
Martin v. Löwis wrote:
> Steven Bethard wrote:
>
>> Yeah, I agree it's weird. I suspect if someone supplied a patch for
>> this behavior it would be accepted -- I don't think this should break
>> backwards compatibility (much).
>
> Notice that the "right" thing to do would be to pass encoding and errors
> to __unicode__. If the string object needs to be told what encoding it
> is in, why not any other object as well?
>
> Unfortunately, this apparently was overlooked, and now it is too late
> to change it (or else the existing __unicode__ methods would all break
> if they suddenly get an encoding argument).
Could this be handled with a try / except in unicode()? Something like this:
>>> class A:
...     def u(self):  # __unicode__ with no args
...         print 'A.u()'
...
>>> class B:
...     def u(self, enc, err):  # __unicode__ with two args
...         print 'B.u()', enc, err
...
>>> def convert(obj, enc='ascii', err='strict'):  # unicode() function delegates to u()
...     try:
...         obj.u(enc, err)
...     except TypeError:
...         obj.u()
...
>>> a = A()
>>> b = B()
>>> convert(a)
A.u()
>>> convert(a, 'utf-8', 'replace')
A.u()
>>> convert(b)
B.u() ascii strict
>>> convert(b, 'utf-8', 'replace')
B.u() utf-8 replace
>
> As for using encoding and errors on the result of str() conversion
> of the object: how can the caller know what encoding the result of
> str() is in, reasonably?
The same way that the caller will know the encoding of a byte string, or of the result of
str(some_object) - in my experience, usually by careful detective work on the source of the string
or object followed by attempts to better understand and control the encoding used throughout the
application.
> It seems more correct to assume that the
> str() result is in the system default encoding.
To assume that in absence of any guidance, sure, that is consistent. But to ignore the guidance the
programmer attempts to provide?
One thing that hasn't been pointed out in this thread yet is that the OP could just define
__unicode__() on his class to do what he wants...
Kent
July 18th, 2005, 10:57 PM
Re: unicode(obj, errors='foo') raises TypeError - bug?
Kent Johnson wrote:
> Could this be handled with a try / except in unicode()? Something like
> this:
Perhaps. However, this would cause a significant performance hit, and
possibly undesired side effects. So due process would require changing the
interface of __unicode__ first, and then changing the actual calls to it.
> One thing that hasn't been pointed out in this thread yet is that the OP
> could just define __unicode__() on his class to do what he wants...
Actually, Steven Bethard wrote "You should either define __unicode__ or
call str() manually on the object."
Regards,
Martin
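One possible reading of the side-effect concern is sketched below. It is written for Python 3 and reuses the stand-in method name `u` from the try / except proposal; the `Broken` class and its `calls` counter are hypothetical. A blanket `except TypeError` cannot tell "this method does not accept the extra arguments" apart from a `TypeError` raised inside a method that does accept them. The method body has already run once by the time the fallback kicks in, and the original error is then replaced by a misleading one from the retry.

```python
class Broken:
    """Hypothetical class whose two-argument method fails internally."""

    def __init__(self):
        self.calls = 0

    def u(self, enc, err):
        self.calls += 1      # a visible side effect of the first attempt
        return "text" + 1    # bug inside the method: raises TypeError


def convert(obj, enc="ascii", err="strict"):
    # Same fallback strategy as the proposal: try the new signature,
    # then retry without arguments on any TypeError.
    try:
        return obj.u(enc, err)
    except TypeError:
        return obj.u()


obj = Broken()
try:
    convert(obj)
    message = None
except TypeError as exc:
    # The caller sees a complaint about missing arguments from the retry,
    # not the concatenation error that actually went wrong.
    message = str(exc)
```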