Re: str(bytes) in Python 3.0

Gabriel Genellina wrote:
> On the last line, str(x), I would expect 'abc' - same as str(x, 'ascii')
> above. But I get the same as repr(x) - is this on purpose?

Yes, it's on purpose but it's a bug in your application to call str() on
a bytes object or to compare bytes and unicode directly. Several months
ago I added a bytes warning option to Python. Start Python as "python
-bb" and try it again. ;)
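
A minimal sketch of the behaviour described here, assuming a current CPython 3 interpreter rather than the 3.0 alpha under discussion. The helper name `run_with_bb` is made up for this example; it just starts a child interpreter with the -bb flag mentioned above.

```python
import subprocess
import sys

x = b"abc"

# One-argument str() does not decode; it returns the repr of the bytes object.
print(str(x))            # same text as repr(x)
# Passing an encoding asks str() to decode the bytes.
print(str(x, "ascii"))


def run_with_bb(snippet):
    """Run `snippet` in a child interpreter started with -bb (hypothetical helper).

    With -bb, BytesWarning is turned into an error, so calling str() on a
    bytes object is expected to make the child exit with a non-zero status.
    """
    return subprocess.run(
        [sys.executable, "-bb", "-c", snippet],
        capture_output=True,
        text=True,
    )
```

Calling `run_with_bb("str(b'abc')")` and looking at `returncode` and `stderr` should show the warning from the post reported as an error.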

Christian

Jun 27 '08 #1
On 12 Apr., 14:44, Christian Heimes <li...@cheimes.de> wrote:
> Gabriel Genellina wrote:
>> On the last line, str(x), I would expect 'abc' - same as str(x, 'ascii')
>> above. But I get the same as repr(x) - is this on purpose?
>
> Yes, it's on purpose but it's a bug in your application to call str() on
> a bytes object or to compare bytes and unicode directly. Several months
> ago I added a bytes warning option to Python. Start Python as "python
> -bb" and try it again. ;)
>
> Christian

And making an utf-8 encoding default is not possible without writing a
new function?
Jun 27 '08 #2
> And making an utf-8 encoding default is not possible without writing a
> new function?

There is no default encoding anymore in Python 3. This is by design,
learning from the problems in Python 2.x.
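
One way to see what "no default encoding" means in practice, as a sketch assuming a released CPython 3: text and bytes are never converted into each other implicitly, so the encoding has to be named by the caller.

```python
data = b"abc"
text = "abc"

# Mixing the two types without naming an encoding is rejected outright.
try:
    text + data
except TypeError as exc:
    print("refused:", exc)

# The conversion has to be spelled out in one direction or the other.
assert text + data.decode("ascii") == "abcabc"
assert text.encode("ascii") + data == b"abcabc"
```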

Regards,
Martin

Jun 27 '08 #3
On Apr 12, 10:06 am, Kay Schluehr <kay.schlu...@gmx.net> wrote:
> On 12 Apr., 14:44, Christian Heimes <li...@cheimes.de> wrote:
>> Gabriel Genellina wrote:
>>> On the last line, str(x), I would expect 'abc' - same as str(x, 'ascii')
>>> above. But I get the same as repr(x) - is this on purpose?
>> Yes, it's on purpose but it's a bug in your application to call str() on
>> a bytes object or to compare bytes and unicode directly. Several months
>> ago I added a bytes warning option to Python. Start Python as "python
>> -bb" and try it again. ;)
>> Christian
>
> And making an utf-8 encoding default is not possible without writing a
> new function?

I believe the Zen in effect here is, "In the face of ambiguity, refuse
the temptation to guess." How do you know if the bytes are utf-8
encoded?

I'm not sure if str() returning the repr() of a bytes object (when not
passed an encoding) is the right thing, but it's probably better than
throwing an exception. The problem is, str can't decide whether it's
a type conversion operator or a formatted printing function--if it
were strongly one or the other it would be a lot more obvious what to
do.
Carl Banks
Jun 27 '08 #4
Christian Heimes <li***@cheimes.de> writes:
> Gabriel Genellina wrote:
>> On the last line, str(x), I would expect 'abc' - same as str(x, 'ascii')
>> above. But I get the same as repr(x) - is this on purpose?
>
> Yes, it's on purpose but it's a bug in your application to call str() on
> a bytes object or to compare bytes and unicode directly. Several months
> ago I added a bytes warning option to Python. Start Python as "python
> -bb" and try it again. ;)

Why hasn't the one-argument str(bytes_obj) been designed to raise an
exception in Python 3?
John
Jun 27 '08 #5
Carl Banks wrote:
> I believe the Zen in effect here is, "In the face of ambiguity, refuse
> the temptation to guess." How do you know if the bytes are utf-8
> encoded?

Indeed

> I'm not sure if str() returning the repr() of a bytes object (when not
> passed an encoding) is the right thing, but it's probably better than
> throwing an exception. The problem is, str can't decide whether it's
> a type conversion operator or a formatted printing function--if it
> were strongly one or the other it would be a lot more obvious what to
> do.

I was against it and I also wanted to have 'egg' == b'egg' raise an
exception but I was overruled by Guido. At least I was allowed to
implement the byte warning feature (-b and -bb arguments). I *highly*
recommend that everybody runs her unit tests with the -bb option.
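
A small sketch of the comparison case, assuming a current CPython 3. The function name `mixed_compare` is only illustrative. Without -b or -bb the comparison quietly evaluates to False; the `except` branch can only be reached when the interpreter was started with -bb.

```python
import sys


def mixed_compare(text, data):
    """Compare a str with a bytes object, reporting a BytesWarning if one is raised."""
    try:
        return text == data
    except BytesWarning as exc:  # only raised when started with -bb
        return f"BytesWarning: {exc}"


print("bytes_warning flag:", sys.flags.bytes_warning)
print(mixed_compare("egg", b"egg"))
```

To follow the recommendation for a unittest-based suite, something like `python -bb -m unittest` should turn every such comparison in the tests into a failure.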

Christian
Jun 27 '08 #6
John J. Lee wrote:
> Why hasn't the one-argument str(bytes_obj) been designed to raise an
> exception in Python 3?

See for yourself:

$ ./python
Python 3.0a4+ (py3k:0, Apr 11 2008, 15:31:31)
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> str(b'')
"b''"
[38544 refs]
>>> bytes("")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: string argument without an encoding
[38585 refs]

$ ./python -b
Python 3.0a4+ (py3k:0, Apr 11 2008, 15:31:31)
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> str(b'')
__main__:1: BytesWarning: str() on a bytes instance
"b''"
[38649 refs]
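
The asymmetry in the session above can be sketched as a script, assuming a current CPython 3: bytes() insists on an encoding for a str argument, while str() accepts bytes and falls back to the repr.

```python
# bytes() refuses a str argument unless an encoding is named...
try:
    bytes("abc")
except TypeError as exc:
    print("TypeError:", exc)

# ...but works once the encoding is given explicitly.
assert bytes("abc", "ascii") == b"abc"

# str() on a bytes object does not fail; it returns the repr.
assert str(b"") == "b''"
```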

Christian

Jun 27 '08 #8
On 12 Apr., 16:29, Carl Banks <pavlovevide...@gmail.com> wrote:
>> And making an utf-8 encoding default is not possible without writing a
>> new function?
>
> I believe the Zen in effect here is, "In the face of ambiguity, refuse
> the temptation to guess." How do you know if the bytes are utf-8
> encoded?

How many "encodings" would you define for a Rectangle constructor?

Making things infinitely configurable is very nice and shows that the
programmer has worked hard. Sometimes however it suffices to provide a
mandatory default and some supplementary conversion methods. This
still won't exhaust all possible cases but provides a reasonable
coverage.
Jun 27 '08 #9
On Apr 12, 5:51 pm, Kay Schluehr <kay.schlu...@gmx.net> wrote:
> On 12 Apr., 16:29, Carl Banks <pavlovevide...@gmail.com> wrote:
>>> And making an utf-8 encoding default is not possible without writing a
>>> new function?
>> I believe the Zen in effect here is, "In the face of ambiguity, refuse
>> the temptation to guess." How do you know if the bytes are utf-8
>> encoded?
>
> How many "encodings" would you define for a Rectangle constructor?
>
> Making things infinitely configurable is very nice and shows that the
> programmer has worked hard. Sometimes however it suffices to provide a
> mandatory default and some supplementary conversion methods. This
> still won't exhaust all possible cases but provides a reasonable
> coverage.

There is no sensible default because many incompatible encodings are
in common use; programmers need to take responsibility for tracking or
guessing string encodings according to their needs, in ways that
depend on application architecture, characteristics of users and data,
and various risk and quality trade-offs.

In languages that, like Java, have a default encoding for convenience,
documents are routinely mangled by sloppy programmers who think that
they live in an ASCII or UTF-8 fairy land and that they don't need
tight control of the encoding of all text that enters and leaves the
system.
Ceasing to support this obsolete attitude with lenient APIs is the
only way forward; being forced to learn that encodings are important
is better than, say, discovering unrecoverable data corruption in a
working system.
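
The kind of silent mangling described here can be sketched in a few lines. Python is used for consistency with the rest of the thread, and the sample text is only an illustration: bytes written as UTF-8 are read back under a wrong assumption, and no error is raised.

```python
original = "¿Cómo estás?"

# The text leaves one system encoded as UTF-8...
wire_bytes = original.encode("utf-8")

# ...and is read elsewhere under the assumption that it is Latin-1.
mangled = wire_bytes.decode("latin-1")
print(mangled)

# No exception was raised, yet the text is now wrong.
assert mangled != original

# It can be repaired only if the mistaken encoding is known.
assert mangled.encode("latin-1").decode("utf-8") == original
```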

Regards,
Lorenzo Gatti
Jun 27 '08 #10
On Apr 12, 8:52 am, j...@pobox.com (John J. Lee) wrote:
> Christian Heimes <li...@cheimes.de> writes:
>> Gabriel Genellina wrote:
>>> On the last line, str(x), I would expect 'abc' - same as str(x, 'ascii')
>>> above. But I get the same as repr(x) - is this on purpose?
>> Yes, it's on purpose but it's a bug in your application to call str() on
>> a bytes object or to compare bytes and unicode directly. Several months
>> ago I added a bytes warning option to Python. Start Python as "python
>> -bb" and try it again. ;)
>
> Why hasn't the one-argument str(bytes_obj) been designed to raise an
> exception in Python 3?
>
> John

Because it's a fundamental rule that you should be able to call str()
on any object and get a sensible result.

The reason that calling str() on a bytes object returns a bytes
literal rather than an unadorned character string is that there are no
default encodings or decodings: there is no way of determining what
the corresponding string should be.
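
A quick sketch of that rule, assuming a current CPython 3. The class `Opaque` is a made-up stand-in for "any object".

```python
class Opaque:
    """Hypothetical class with no __str__ or __repr__ of its own."""


samples = [42, None, [1, b"x"], Opaque(), b"abc"]

for obj in samples:
    # str() always yields a str, falling back to repr() when needed.
    assert isinstance(str(obj), str)

# For bytes, the result is the text of a bytes literal, not decoded text.
assert str(b"abc") == "b'abc'"
```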

John Roth
Jun 27 '08 #11
On Apr 12, 9:29 am, Carl Banks <pavlovevide...@gmail.com> wrote:
> On Apr 12, 10:06 am, Kay Schluehr <kay.schlu...@gmx.net> wrote:
>> On 12 Apr., 14:44, Christian Heimes <li...@cheimes.de> wrote:
>>> Gabriel Genellina wrote:
>>>> On the last line, str(x), I would expect 'abc' - same as str(x, 'ascii')
>>>> above. But I get the same as repr(x) - is this on purpose?
>>> Yes, it's on purpose but it's a bug in your application to call str() on
>>> a bytes object or to compare bytes and unicode directly. Several months
>>> ago I added a bytes warning option to Python. Start Python as "python
>>> -bb" and try it again. ;)
>> And making an utf-8 encoding default is not possible without writing a
>> new function?
>
> I believe the Zen in effect here is, "In the face of ambiguity, refuse
> the temptation to guess." How do you know if the bytes are utf-8
> encoded?

True, you can't KNOW that. Maybe the author of those bytes actually
MEANT to say '¿Cómo estás?' instead of 'Cmo ests?'. However,
it's statistically unlikely for a non-UTF-8-encoded string to just
happen to be valid UTF-8.
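
A rough sketch of that observation, assuming a current CPython 3. The function name `looks_like_utf8` is illustrative. Text with accented characters encoded as Latin-1 fails a strict UTF-8 decode, and decoding with errors="ignore" drops the offending bytes.

```python
latin1_bytes = "¿Cómo estás?".encode("latin-1")
utf8_bytes = "¿Cómo estás?".encode("utf-8")


def looks_like_utf8(data):
    """Return True if `data` decodes as strict UTF-8 (a heuristic, not proof)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(looks_like_utf8(latin1_bytes))
print(looks_like_utf8(utf8_bytes))

# Ignoring errors silently loses the accented characters.
print(latin1_bytes.decode("utf-8", errors="ignore"))
```

Pure ASCII input passes this check under either encoding, so a successful decode still does not prove what the author intended.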
Jun 27 '08 #12
Dan Bishop wrote:
> On Apr 12, 9:29 am, Carl Banks <pavlovevide...@gmail.com> wrote:
>> On Apr 12, 10:06 am, Kay Schluehr <kay.schlu...@gmx.net> wrote:
>>> On 12 Apr., 14:44, Christian Heimes <li...@cheimes.de> wrote:
>>>> Gabriel Genellina wrote:
>>>>> On the last line, str(x), I would expect 'abc' - same as str(x, 'ascii')
>>>>> above. But I get the same as repr(x) - is this on purpose?
>>>> Yes, it's on purpose but it's a bug in your application to call str() on
>>>> a bytes object or to compare bytes and unicode directly. Several months
>>>> ago I added a bytes warning option to Python. Start Python as "python
>>>> -bb" and try it again. ;)
>>> And making an utf-8 encoding default is not possible without writing a
>>> new function?
>> I believe the Zen in effect here is, "In the face of ambiguity, refuse
>> the temptation to guess." How do you know if the bytes are utf-8
>> encoded?
>
> True, you can't KNOW that. Maybe the author of those bytes actually
> MEANT to say '¿Cómo estás?' instead of 'Cmo ests?'. However,
> it's statistically unlikely for a non-UTF-8-encoded string to just
> happen to be valid UTF-8.

So you propose to perform a statistical analysis on your input to
determine whether it's UTF-8 or some other encoding?

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

Jun 27 '08 #13
On Sat, 12 Apr 2008 11:25:59 -0300, Martin v. Löwis <ma****@v.loewis.de>
wrote:
>> And making an utf-8 encoding default is not possible without writing a
>> new function?
>
> There is no default encoding anymore in Python 3. This is by design,
> learning from the problems in Python 2.x.

So sys.getdefaultencoding() will disappear? Currently it returns "utf-8".
In case it stays, what is it used for?
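
For reference, a sketch of how this looks in released versions of CPython 3, which may differ from the alpha discussed here. The function still exists and reports "utf-8", which is also the encoding that str.encode() and bytes.decode() use when called without arguments. It does not cause any implicit str/bytes conversion.

```python
import locale
import sys

# Encoding used by str.encode() / bytes.decode() when no argument is given.
print(sys.getdefaultencoding())

# Separate settings cover file names and text I/O without an explicit encoding.
print(sys.getfilesystemencoding())
print(locale.getpreferredencoding(False))

assert "é".encode() == "é".encode("utf-8")
```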

--
Gabriel Genellina

Jun 27 '08 #14

"John Roth" <jo*******@gmail.com> wrote in message
news:29**********************************@u3g2000hsc.googlegroups.com...
| On Apr 12, 8:52 am, j...@pobox.com (John J. Lee) wrote:
| > Christian Heimes <li...@cheimes.de> writes:
| >> Gabriel Genellina wrote:
| >>> On the last line, str(x), I would expect 'abc' - same as str(x, 'ascii')
| >>> above. But I get the same as repr(x) - is this on purpose?
| >>
| >> Yes, it's on purpose but it's a bug in your application to call str() on
| >> a bytes object or to compare bytes and unicode directly. Several months
| >> ago I added a bytes warning option to Python. Start Python as "python
| >> -bb" and try it again. ;)
| >
| > Why hasn't the one-argument str(bytes_obj) been designed to raise an
| > exception in Python 3?
| >
| > John
|
| Because it's a fundamental rule that you should be able to call str()
| on any object and get a sensible result.
|
| The reason that calling str() on a bytes object returns a bytes
| literal rather than an unadorned character string is that there are no
| default encodings or decodings: there is no way of determining what
| the corresponding string should be.

In having a double meaning, str is much like type. Calling type(obj) echoes
the existing class of the object, while type(o, p, q) attempts to construct
a new class. Similarly, str(obj) gives a string representing the obj (which,
for a string, is the string ;-), while str(obj, obj2) attempts to construct
a new string.
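
The parallel can be sketched side by side, assuming a current CPython 3. The class name `Point` and its attribute are made up for the example.

```python
# One argument: describe the object that already exists.
assert type(3) is int
assert str(b"abc") == "b'abc'"

# Several arguments: build something new from the parts given.
Point = type("Point", (object,), {"dims": 2})  # hypothetical class
assert Point.dims == 2
assert str(b"abc", "ascii") == "abc"
```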

tjr

Jun 27 '08 #15
On Apr 12, 11:51 am, Kay Schluehr <kay.schlu...@gmx.net> wrote:
> On 12 Apr., 16:29, Carl Banks <pavlovevide...@gmail.com> wrote:
>>> And making an utf-8 encoding default is not possible without writing a
>>> new function?
>> I believe the Zen in effect here is, "In the face of ambiguity, refuse
>> the temptation to guess." How do you know if the bytes are utf-8
>> encoded?
>
> How many "encodings" would you define for a Rectangle constructor?

I'm not sure what you're insinuating. If you are arguing that it's
inappropriate for a constructor to take an "encoding" argument (as you
put it), be my guest. I wasn't commenting on that specifically.

I was commenting on your suggestion of having str assume utf-8
encoding, which IMO would be very unPythonic, whether you can pass
encodings to it or not.
Whatever happened to the decode method anyway? Why has str() been
coopted for this purpose? I had expected that str objects would
retain the encode method, bytes the decode method, and everyone would
live happily ever after. If decode is a confusing name (and I know I
have to engage a few extra neurons to figure out which way it goes),
why not rename it to something like to_unicode instead of overloading
the constructors more.
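
For what it's worth, a sketch against released CPython 3 (which may differ from the alpha discussed here) suggests the split described above is what shipped: bytes objects have decode, str objects have encode, and the two-argument str() gives the same result as decode.

```python
raw = b"caf\xc3\xa9"

# bytes -> str goes through decode(); str(raw, encoding) is equivalent.
text = raw.decode("utf-8")
assert text == str(raw, "utf-8")

# str -> bytes goes through encode().
assert text.encode("utf-8") == raw

# Each type has only the method for its own direction.
assert not hasattr(str, "decode")
assert not hasattr(bytes, "encode")
```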

Carl Banks
Jun 27 '08 #16
On 13 Apr., 09:24, Carl Banks <pavlovevide...@gmail.com> wrote:
> On Apr 12, 11:51 am, Kay Schluehr <kay.schlu...@gmx.net> wrote:
>> On 12 Apr., 16:29, Carl Banks <pavlovevide...@gmail.com> wrote:
>>>> And making an utf-8 encoding default is not possible without writing a
>>>> new function?
>>> I believe the Zen in effect here is, "In the face of ambiguity, refuse
>>> the temptation to guess." How do you know if the bytes are utf-8
>>> encoded?
>> How many "encodings" would you define for a Rectangle constructor?
>
> I'm not sure what you're insinuating. If you are arguing that it's
> inappropriate for a constructor to take an "encoding" argument (as you
> put it), be my guest. I wasn't commenting on that specifically.
>
> I was commenting on your suggestion of having str assume utf-8
> encoding, which IMO would be very unPythonic, whether you can pass
> encodings to it or not.

That's o.k. I don't primarily advocate default values or such things,
just a reduction of mental and scripting overhead. We shouldn't lose
sight of that goal. I played a bit with several encodings, but this
didn't give me much of an impression of how it will feel in real code.

I can see though the inadequacy of my original claim, mainly due to the
overlooked fact that there isn't even a mapping of the range \x00 -
\xff to utf-8 but only one from \x00 - \x7f. Same with the ASCII
encoding, which is limited to 7 bits as well.

One has to be careful not just because "you can select the wrong
encoding": stringification with a utf-8 encoding can simply raise an
exception even though there is no type error! A default value should
work under all circumstances, provided you pass in an object of the
correct type.
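
The point about byte ranges can be checked with a short sketch, assuming a current CPython 3. The helper name `decodable_alone` is illustrative. It decodes each byte value on its own, so it says nothing about valid multi-byte UTF-8 sequences.

```python
all_bytes = bytes(range(256))

# Latin-1 maps every byte value to a code point, so this never fails.
assert len(all_bytes.decode("latin-1")) == 256


def decodable_alone(encoding):
    """Return the byte values that decode on their own under `encoding`."""
    ok = []
    for value in range(256):
        try:
            bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            continue
        ok.append(value)
    return ok


# Under UTF-8 and ASCII, only 0x00-0x7f are valid as single bytes.
assert decodable_alone("utf-8") == list(range(0x80))
assert decodable_alone("ascii") == list(range(0x80))

# The failure is a ValueError subclass, not a TypeError.
assert issubclass(UnicodeDecodeError, ValueError)
assert not issubclass(UnicodeDecodeError, TypeError)
```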

Jun 27 '08 #17
