Bytes | Software Development & Data Engineering Community

unicode(obj, errors='foo') raises TypeError - bug?

This works as expected (this is on an ASCII terminal):

>>> unicode('asdf\xff', errors='replace')
u'asdf\ufffd'

This does not work as I expect it to:

>>> class C:
...     def __str__(self):
...         return 'asdf\xff'
...
>>> o = C()
>>> unicode(o, errors='replace')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: coercing to Unicode: need string or buffer, instance found

Shouldn't it work the same as calling unicode(str(self), errors='replace')?

It doesn't matter what value you use for 'errors' (ignore, replace, strict);
you'll get the same TypeError.

What am I doing wrong? Is this a bug in Python?
Jul 18 '05 #1
Mike Brown wrote:
>>> class C:
...     def __str__(self):
...         return 'asdf\xff'
...
>>> o = C()
>>> unicode(o, errors='replace')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: coercing to Unicode: need string or buffer, instance found

[snip]
What am I doing wrong? Is this a bug in Python?


No, this is documented behavior[1]:

"""
unicode([object[, encoding [, errors]]])
...
For objects which provide a __unicode__() method, it will call this
method without arguments to create a Unicode string. For all other
objects, the 8-bit string version or representation is requested and
then converted to a Unicode string using the codec for the default
encoding in 'strict' mode.
"""

Note that the documentation basically says that it will call str() on
your object, and then convert it in 'strict' mode. You should either
define __unicode__ or call str() manually on the object.
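unicode() no longer exists in Python 3, but the behavior described above carries over to str(). The sketch below is a Python 3 rendering under stated assumptions: bytes stands in for Python 2's 8-bit str, and a __bytes__ method stands in for the __str__ in class C. Once errors= is given, str() insists on a bytes-like object, so the call fails with TypeError; converting explicitly and decoding yourself works, as does giving the class its own text conversion (the Python 3 counterpart of defining __unicode__; class D is hypothetical).

```python
# Python 3 sketch: bytes plays the role of Python 2's 8-bit str,
# and __bytes__ plays the role of the thread's __str__ (an assumption).
class C:
    def __bytes__(self):
        return b'asdf\xff'

o = C()

# With errors= given, str() requires a bytes-like object; an object that
# merely knows how to convert itself is rejected with TypeError.
try:
    str(o, errors='replace')
    raised = False
except TypeError as exc:
    raised = True
    print('TypeError:', exc)

# Workaround 1: convert explicitly, then decode with the handler you want.
text = bytes(o).decode('ascii', errors='replace')
print(ascii(text))  # prints 'asdf\ufffd'

# Workaround 2 (hypothetical class D, the analogue of defining __unicode__):
# the object produces its own text and chooses the error handling itself.
class D:
    def __bytes__(self):
        return b'asdf\xff'

    def __str__(self):
        return bytes(self).decode('ascii', errors='replace')

print(ascii(str(D())))  # prints 'asdf\ufffd'
```

Either way, the error handler is applied at the one place where the bytes are actually decoded, instead of being passed through the generic conversion function.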

STeVe

[1] http://docs.python.org/lib/built-in-funcs.html
Jul 18 '05 #2
Steven Bethard wrote:
Mike Brown wrote:
>>> class C:
...     def __str__(self):
...         return 'asdf\xff'
...
>>> o = C()
>>> unicode(o, errors='replace')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: coercing to Unicode: need string or buffer, instance found

[snip]

What am I doing wrong? Is this a bug in Python?

No, this is documented behavior[1]:

"""
unicode([object[, encoding [, errors]]])
...
For objects which provide a __unicode__() method, it will call this
method without arguments to create a Unicode string. For all other
objects, the 8-bit string version or representation is requested and
then converted to a Unicode string using the codec for the default
encoding in 'strict' mode.
"""

Note that the documentation basically says that it will call str() on
your object, and then convert it in 'strict' mode. You should either
define __unicode__ or call str() manually on the object.


Not a bug, I guess, since it is documented, but it seems a bit bizarre that the encoding and errors
parameters are ignored when the object does not have a __unicode__ method.

Kent

Jul 18 '05 #3
Kent Johnson wrote:
Steven Bethard wrote:
No, this is documented behavior[1]:

"""
unicode([object[, encoding [, errors]]])
...
For objects which provide a __unicode__() method, it will call
this method without arguments to create a Unicode string. For all
other objects, the 8-bit string version or representation is requested
and then converted to a Unicode string using the codec for the default
encoding in 'strict' mode.
"""

Note that the documentation basically says that it will call str() on
your object, and then convert it in 'strict' mode. You should either
define __unicode__ or call str() manually on the object.


Not a bug, I guess, since it is documented, but it seems a bit bizarre
that the encoding and errors parameters are ignored when the object does not
have a __unicode__ method.


Yeah, I agree it's weird. I suspect if someone supplied a patch for
this behavior it would be accepted -- I don't think this should break
backwards compatibility (much).

STeVe
Jul 18 '05 #4
Steven Bethard wrote:
Yeah, I agree it's weird. I suspect if someone supplied a patch for
this behavior it would be accepted -- I don't think this should break
backwards compatibility (much).


Notice that the "right" thing to do would be to pass encoding and errors
to __unicode__. If the string object needs to be told what encoding it
is in, why not any other object as well?

Unfortunately, this apparently was overlooked, and now it is too late
to change it (or else the existing __unicode__ methods would all break
if they suddenly get an encoding argument).

As for using encoding and errors on the result of str() conversion
of the object: how can the caller know what encoding the result of
str() is in, reasonably? It seems more correct to assume that the
str() result is in the system default encoding.

If you can follow so far(*): if it is the right thing to ignore the
encoding argument for the case that the object was str() converted,
why should the errors argument not be ignored? It is inconsistent
to ignore one parameter to the decoding but not the other.

Regards,
Martin

(*) I admit that the reasoning for ignoring the encoding is
somewhat flawed. There are some types (e.g. numbers) where
str() always uses the system encoding (i.e. ASCII - actually,
it always uses ASCII, no matter what the system encoding is).
There may be types where the encoding of the str() result
is not ASCII, and the caller happens to know what it is,
but I'm not aware of any such type.
Jul 18 '05 #5
Martin v. Löwis wrote:
Steven Bethard wrote:
Yeah, I agree it's weird. I suspect if someone supplied a patch for
this behavior it would be accepted -- I don't think this should break
backwards compatibility (much).

Notice that the "right" thing to do would be to pass encoding and errors
to __unicode__. If the string object needs to be told what encoding it
is in, why not any other object as well?

Unfortunately, this apparently was overlooked, and now it is too late
to change it (or else the existing __unicode__ methods would all break
if they suddenly get an encoding argument).


Could this be handled with a try / except in unicode()? Something like this:

>>> class A:
...     def u(self):  # __unicode__ with no args
...         print 'A.u()'
...
>>> class B:
...     def u(self, enc, err):  # __unicode__ with two args
...         print 'B.u()', enc, err
...
>>> def convert(obj, enc='ascii', err='strict'):  # unicode() function delegates to u()
...     try:
...         obj.u(enc, err)
...     except TypeError:
...         obj.u()
...
>>> a = A()
>>> b = B()
>>> convert(a)
A.u()
>>> convert(a, 'utf-8', 'replace')
A.u()
>>> convert(b)
B.u() ascii strict
>>> convert(b, 'utf-8', 'replace')
B.u() utf-8 replace

As for using encoding and errors on the result of str() conversion
of the object: how can the caller know what encoding the result of
str() is in, reasonably?

The same way that the caller will know the encoding of a byte string, or of the result of
str(some_object) - in my experience, usually by careful detective work on the source of the string
or object followed by attempts to better understand and control the encoding used throughout the
application.

It seems more correct to assume that the str() result is in the system default encoding.


To assume that in the absence of any guidance, sure, that is consistent. But to ignore the guidance the
programmer attempts to provide?

One thing that hasn't been pointed out in this thread yet is that the OP could just define
__unicode__() on his class to do what he wants...

Kent
Jul 18 '05 #6
Kent Johnson wrote:
Could this be handled with a try / except in unicode()? Something like
this:

Perhaps. However, this would cause a significant performance hit, and
possibly undesired side effects. So due process would require changing
the interface of __unicode__ first, and then changing the actual calls to it.

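The side effects are not spelled out here. One plausible example, sketched in Python 3 and reusing the u()/convert() names from Kent's session (the Buggy class is invented for illustration): `except TypeError` cannot tell "this method takes no arguments" apart from a TypeError raised by a bug inside a two-argument method, so the real error is replaced by a misleading one about missing arguments. The sketch also shows that every call to an old-style, no-argument method first pays for raising and catching an exception.

```python
class A:
    def u(self):  # old-style: takes no arguments
        return 'A.u()'

class Buggy:
    def u(self, enc, err):  # new-style: two arguments, but with a bug inside
        return len(42)  # raises TypeError: object of type 'int' has no len()

def convert(obj, enc='ascii', err='strict'):
    # Kent's proposed fallback: try the two-argument form, then the old form.
    try:
        return obj.u(enc, err)
    except TypeError:
        return obj.u()

# Works, but only after the two-argument call raised and was caught.
print(convert(A()))

try:
    convert(Buggy())
except TypeError as exc:
    # The reported error is about the retry's missing arguments; the real
    # bug survives only as the implicitly chained __context__.
    reported = str(exc)
    original = str(exc.__context__)
    print('reported:', reported)
    print('original:', original)
```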
One thing that hasn't been pointed out in this thread yet is that the OP
could just define __unicode__() on his class to do what he wants...


Actually, Steven Bethard wrote "You should either define __unicode__ or
call str() manually on the object."

Regards,
Martin
Jul 18 '05 #7

This thread has been closed and replies have been disabled. Please start a new discussion.

