Re: str(bytes) in Python 3.0

Gabriel Genellina wrote:
> On the last line, str(x), I would expect 'abc' - same as str(x, 'ascii')
> above. But I get the same as repr(x) - is this on purpose?

Yes, it's on purpose, but it's a bug in your application to call str() on
a bytes object or to compare bytes and unicode directly. Several months
ago I added a bytes warning option to Python. Start Python as "python
-bb" and try it again. ;)

Christian

Jun 27 '08 #1
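
For readers following along, the behavior in question can be reproduced with the minimal sketch below. It assumes a released Python 3 interpreter rather than the 3.0 alpha used in the thread; the name `x` follows the original question, the other names are only illustrative.

```python
# One-argument str() on bytes returns the repr text;
# the two-argument form actually decodes the bytes.
x = b'abc'

as_repr = str(x)                # same text as repr(x)
as_text = str(x, 'ascii')       # decodes with the given encoding
via_method = x.decode('ascii')  # explicit decode, same result

print(as_repr)     # b'abc'
print(as_text)     # abc
print(via_method)  # abc
```

Only the calls that name an encoding produce the text 'abc'.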
On 12 Apr., 14:44, Christian Heimes <li...@cheimes.de> wrote:
> Gabriel Genellina wrote:
> > On the last line, str(x), I would expect 'abc' - same as str(x, 'ascii')
> > above. But I get the same as repr(x) - is this on purpose?
>
> Yes, it's on purpose but it's a bug in your application to call str() on
> a bytes object or to compare bytes and unicode directly. Several months
> ago I added a bytes warning option to Python. Start Python as "python
> -bb" and try it again. ;)
>
> Christian

And making an utf-8 encoding default is not possible without writing a
new function?
Jun 27 '08 #2
> And making an utf-8 encoding default is not possible without writing a
> new function?

There is no default encoding anymore in Python 3. This is by design,
learning from the problems in Python 2.x.

Regards,
Martin

Jun 27 '08 #3
On Apr 12, 10:06 am, Kay Schluehr <kay.schlu...@gmx.net> wrote:
> On 12 Apr., 14:44, Christian Heimes <li...@cheimes.de> wrote:
> > Gabriel Genellina wrote:
> > > On the last line, str(x), I would expect 'abc' - same as str(x, 'ascii')
> > > above. But I get the same as repr(x) - is this on purpose?
> >
> > Yes, it's on purpose but it's a bug in your application to call str() on
> > a bytes object or to compare bytes and unicode directly. Several months
> > ago I added a bytes warning option to Python. Start Python as "python
> > -bb" and try it again. ;)
> >
> > Christian
>
> And making an utf-8 encoding default is not possible without writing a
> new function?

I believe the Zen in effect here is, "In the face of ambiguity, refuse
the temptation to guess." How do you know if the bytes are utf-8
encoded?

I'm not sure if str() returning the repr() of a bytes object (when not
passed an encoding) is the right thing, but it's probably better than
throwing an exception. The problem is, str can't decide whether it's
a type conversion operator or a formatted printing function--if it
were strongly one or the other it would be a lot more obvious what to
do.
Carl Banks
Jun 27 '08 #4
Christian Heimes <li***@cheimes.de> writes:
> Gabriel Genellina wrote:
> > On the last line, str(x), I would expect 'abc' - same as str(x, 'ascii')
> > above. But I get the same as repr(x) - is this on purpose?
>
> Yes, it's on purpose but it's a bug in your application to call str() on
> a bytes object or to compare bytes and unicode directly. Several months
> ago I added a bytes warning option to Python. Start Python as "python
> -bb" and try it again. ;)

Why hasn't the one-argument str(bytes_obj) been designed to raise an
exception in Python 3?
John
Jun 27 '08 #5
Carl Banks wrote:
> I believe the Zen in effect here is, "In the face of ambiguity, refuse
> the temptation to guess." How do you know if the bytes are utf-8
> encoded?

Indeed.

> I'm not sure if str() returning the repr() of a bytes object (when not
> passed an encoding) is the right thing, but it's probably better than
> throwing an exception. The problem is, str can't decide whether it's
> a type conversion operator or a formatted printing function--if it
> were strongly one or the other it would be a lot more obvious what to
> do.

I was against it and I also wanted to have 'egg' == b'egg' raise an
exception but I was overruled by Guido. At least I was allowed to
implement the byte warning feature (-b and -bb arguments). I *highly*
recommend that everybody runs her unit tests with the -bb option.

Christian
Jun 27 '08 #6
John J. Lee wrote:
> Why hasn't the one-argument str(bytes_obj) been designed to raise an
> exception in Python 3?

See for yourself:

$ ./python
Python 3.0a4+ (py3k:0, Apr 11 2008, 15:31:31)
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> str(b'')
"b''"
[38544 refs]
>>> bytes("")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: string argument without an encoding
[38585 refs]

$ ./python -b
Python 3.0a4+ (py3k:0, Apr 11 2008, 15:31:31)
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> str(b'')
__main__:1: BytesWarning: str() on a bytes instance
"b''"
[38649 refs]

Christian

Jun 27 '08 #8
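
The -b and -bb options only take effect at interpreter start-up, so one way to try them from a script is to launch child interpreters. The sketch below is an illustration, not part of the thread: `run_snippet` is a hypothetical helper, and it assumes Python 3.7 or later for `capture_output`.

```python
import subprocess
import sys


def run_snippet(flag, code):
    """Run `code` in a fresh interpreter started with `flag` ('-b' or '-bb')."""
    return subprocess.run(
        [sys.executable, flag, "-c", code],
        capture_output=True,
        text=True,
    )


# With -b, str() on bytes is reported as a warning and the program continues.
warned = run_snippet("-b", "str(b'')")

# With -bb, the same warning is raised as an exception.
failed = run_snippet("-bb", "str(b'')")

# Comparing str with bytes is covered by the same option.
compared = run_snippet("-bb", "s = 'egg'; b = b'egg'; s == b")

for name, result in (("-b", warned), ("-bb", failed), ("-bb ==", compared)):
    print(name, "exit code:", result.returncode)
    print(result.stderr.strip())
```

Running a test suite the same way (for example `python -bb -m unittest`) follows the recommendation made earlier in the thread.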
On 12 Apr., 16:29, Carl Banks <pavlovevide...@gmail.com> wrote:
> > And making an utf-8 encoding default is not possible without writing a
> > new function?
>
> I believe the Zen in effect here is, "In the face of ambiguity, refuse
> the temptation to guess." How do you know if the bytes are utf-8
> encoded?

How many "encodings" would you define for a Rectangle constructor?

Making things infinitely configurable is very nice and shows that the
programmer has worked hard. Sometimes however it suffices to provide a
mandatory default and some supplementary conversion methods. This
still won't exhaust all possible cases but provides a reasonable
coverage.
Jun 27 '08 #9
On Apr 12, 5:51 pm, Kay Schluehr <kay.schlu...@gmx.net> wrote:
> On 12 Apr., 16:29, Carl Banks <pavlovevide...@gmail.com> wrote:
> > > And making an utf-8 encoding default is not possible without writing a
> > > new function?
> >
> > I believe the Zen in effect here is, "In the face of ambiguity, refuse
> > the temptation to guess." How do you know if the bytes are utf-8
> > encoded?
>
> How many "encodings" would you define for a Rectangle constructor?
>
> Making things infinitely configurable is very nice and shows that the
> programmer has worked hard. Sometimes however it suffices to provide a
> mandatory default and some supplementary conversion methods. This
> still won't exhaust all possible cases but provides a reasonable
> coverage.

There is no sensible default because many incompatible encodings are
in common use; programmers need to take responsibility for tracking or
guessing string encodings according to their needs, in ways that
depend on application architecture, characteristics of users and data,
and various risk and quality trade-offs.

In languages that, like Java, have a default encoding for convenience,
documents are routinely mangled by sloppy programmers who think that
they live in an ASCII or UTF-8 fairy land and that they don't need
tight control of the encoding of all text that enters and leaves the
system.
Ceasing to support this obsolete attitude with lenient APIs is the
only way forward; being forced to learn that encodings are important
is better than, say, discovering unrecoverable data corruption in a
working system.

Regards,
Lorenzo Gatti
Jun 27 '08 #10
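
The "tight control of the encoding of all text that enters and leaves the system" described above can be sketched as follows. This is only an illustration under assumptions: the temporary file and the sample text are made up for the example, and UTF-8 is simply the encoding chosen here, not one the thread prescribes.

```python
import os
import tempfile

text = "¿Cómo estás?"

# Create a scratch file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    # Name the encoding when text leaves the program...
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)

    # ...and name the same encoding when it comes back in.
    with open(path, "r", encoding="utf-8") as handle:
        round_tripped = handle.read()

    # Binary mode shows the bytes that were actually stored.
    with open(path, "rb") as handle:
        raw = handle.read()
finally:
    os.remove(path)

print(round_tripped == text)  # True
print(raw)
```

Omitting `encoding=` would make the result depend on the platform's locale settings, which is the kind of silent default the post warns about.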
On Apr 12, 8:52 am, j...@pobox.com (John J. Lee) wrote:
> Christian Heimes <li...@cheimes.de> writes:
> > Gabriel Genellina wrote:
> > > On the last line, str(x), I would expect 'abc' - same as str(x, 'ascii')
> > > above. But I get the same as repr(x) - is this on purpose?
> >
> > Yes, it's on purpose but it's a bug in your application to call str() on
> > a bytes object or to compare bytes and unicode directly. Several months
> > ago I added a bytes warning option to Python. Start Python as "python
> > -bb" and try it again. ;)
>
> Why hasn't the one-argument str(bytes_obj) been designed to raise an
> exception in Python 3?
>
> John

Because it's a fundamental rule that you should be able to call str()
on any object and get a sensible result.

The reason that calling str() on a bytes object returns a bytes
literal rather than an unadorned character string is that there are no
default encodings or decodings: there is no way of determining what
the corresponding string should be.

John Roth
Jun 27 '08 #11
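
The rule that str() should work on any object can be seen in a small sketch. `Point` is a hypothetical class invented for this illustration; it defines only `__repr__`, so str() falls back to that text, which matches what one-argument str() yields for a bytes object.

```python
class Point:
    """Hypothetical class that defines __repr__ but not __str__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"Point({self.x}, {self.y})"


p = Point(1, 2)

# With no __str__ defined, str() uses the repr text.
point_text = str(p)

# One-argument str() on bytes likewise yields the repr text, not decoded text.
bytes_text = str(b'abc')

print(point_text)  # Point(1, 2)
print(bytes_text)  # b'abc'
```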
On Apr 12, 9:29 am, Carl Banks <pavlovevide...@gmail.com> wrote:
> On Apr 12, 10:06 am, Kay Schluehr <kay.schlu...@gmx.net> wrote:
> > On 12 Apr., 14:44, Christian Heimes <li...@cheimes.de> wrote:
> > > Gabriel Genellina wrote:
> > > > On the last line, str(x), I would expect 'abc' - same as str(x, 'ascii')
> > > > above. But I get the same as repr(x) - is this on purpose?
> > >
> > > Yes, it's on purpose but it's a bug in your application to call str() on
> > > a bytes object or to compare bytes and unicode directly. Several months
> > > ago I added a bytes warning option to Python. Start Python as "python
> > > -bb" and try it again. ;)
> >
> > And making an utf-8 encoding default is not possible without writing a
> > new function?
>
> I believe the Zen in effect here is, "In the face of ambiguity, refuse
> the temptation to guess." How do you know if the bytes are utf-8
> encoded?

True, you can't KNOW that. Maybe the author of those bytes actually
MEANT to say 'Â¿CÃ³mo estÃ¡s?' instead of '¿Cómo estás?'. However,
it's statistically unlikely for a non-UTF-8-encoded string to just
happen to be valid UTF-8.
Jun 27 '08 #12
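
The asymmetry behind this argument can be checked with a short sketch. It uses the Spanish phrase from the post; the variable names are only illustrative, and Latin-1 stands in for "some other encoding".

```python
text = "¿Cómo estás?"

utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

# UTF-8 bytes misread as Latin-1 decode without error, but give mojibake.
mojibake = utf8_bytes.decode("latin-1")

# Latin-1 bytes misread as UTF-8 are rejected, because the accented
# characters are not valid UTF-8 sequences.
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False

print(mojibake)
print(latin1_is_valid_utf8)  # False
```

A successful UTF-8 decode is therefore decent evidence, though still not proof, that the bytes really were UTF-8.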
Dan Bishop wrote:
> On Apr 12, 9:29 am, Carl Banks <pavlovevide...@gmail.com> wrote:
> > On Apr 12, 10:06 am, Kay Schluehr <kay.schlu...@gmx.net> wrote:
> > > On 12 Apr., 14:44, Christian Heimes <li...@cheimes.de> wrote:
> > > > Gabriel Genellina wrote:
> > > > > On the last line, str(x), I would expect 'abc' - same as str(x, 'ascii')
> > > > > above. But I get the same as repr(x) - is this on purpose?
> > > >
> > > > Yes, it's on purpose but it's a bug in your application to call str() on
> > > > a bytes object or to compare bytes and unicode directly. Several months
> > > > ago I added a bytes warning option to Python. Start Python as "python
> > > > -bb" and try it again. ;)
> > >
> > > And making an utf-8 encoding default is not possible without writing a
> > > new function?
> >
> > I believe the Zen in effect here is, "In the face of ambiguity, refuse
> > the temptation to guess." How do you know if the bytes are utf-8
> > encoded?
>
> True, you can't KNOW that. Maybe the author of those bytes actually
> MEANT to say 'Â¿CÃ³mo estÃ¡s?' instead of '¿Cómo estás?'. However,
> it's statistically unlikely for a non-UTF-8-encoded string to just
> happen to be valid UTF-8.

So you propose to perform a statistical analysis on your input to
determine whether it's UTF-8 or some other encoding?

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

Jun 27 '08 #13
On Sat, 12 Apr 2008 11:25:59 -0300, Martin v. Löwis <ma****@v.loewis.de>
wrote:
> > And making an utf-8 encoding default is not possible without writing a
> > new function?
>
> There is no default encoding anymore in Python 3. This is by design,
> learning from the problems in Python 2.x.

So sys.getdefaultencoding() will disappear? Currently it returns "utf-8".
In case it stays, what is it used for?

--
Gabriel Genellina

Jun 27 '08 #14
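
As a partial answer for later readers: in released Python 3 versions the function still exists and reports "utf-8", and UTF-8 is also the default for `str.encode()` and `bytes.decode()` when they are called with no argument. The sketch below only illustrates that; it says nothing about what was decided at the time of the thread, and the variable names are illustrative.

```python
import sys

default = sys.getdefaultencoding()

# Calling encode()/decode() with no argument uses UTF-8.
encoded = "é".encode()
decoded = b"\xc3\xa9".decode()

# One-argument str() still does not decode; it returns the repr text.
not_decoded = str(b"\xc3\xa9")

print(default)      # utf-8
print(encoded)      # b'\xc3\xa9'
print(decoded)      # é
print(not_decoded)  # b'\xc3\xa9'
```

So "no default encoding" is best read as "no implicit conversion between bytes and str", not as the absence of any default argument values.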

"John Roth" <jo*******@gmail.com> wrote in message
news:29**********************************@u3g2000hsc.googlegroups.com...
| On Apr 12, 8:52 am, j...@pobox.com (John J. Lee) wrote:
| > Christian Heimes <li...@cheimes.de> writes:
| > > Gabriel Genellina wrote:
| > > > On the last line, str(x), I would expect 'abc' - same as str(x, 'ascii')
| > > > above. But I get the same as repr(x) - is this on purpose?
| > >
| > > Yes, it's on purpose but it's a bug in your application to call str() on
| > > a bytes object or to compare bytes and unicode directly. Several months
| > > ago I added a bytes warning option to Python. Start Python as "python
| > > -bb" and try it again. ;)
| >
| > Why hasn't the one-argument str(bytes_obj) been designed to raise an
| > exception in Python 3?
| >
| > John
|
| Because it's a fundamental rule that you should be able to call str()
| on any object and get a sensible result.
|
| The reason that calling str() on a bytes object returns a bytes
| literal rather than an unadorned character string is that there are no
| default encodings or decodings: there is no way of determining what
| the corresponding string should be.

In having a double meaning, str is much like type. type(obj) echoes the
existing class of the object. type(o,p,q) attempts to construct a new
class. Similarly, str(obj) gives a string representing the obj (which, for
a string, is the string;-). str(obj,obj2) attempts to construct a new
string.

tjr

Jun 27 '08 #15
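
The parallel between type and str drawn above can be written out as a short sketch. The class name `Rectangle` and its `sides` attribute are made up for the illustration.

```python
# One argument: report on an existing object.
kind = type(42)       # the existing class, int
text_form = str(42)   # a string representing the object

# More arguments: construct something new.
Rectangle = type("Rectangle", (object,), {"sides": 4})  # builds a new class
decoded = str(b"abc", "ascii")                          # builds a new string

print(kind)               # <class 'int'>
print(text_form)          # 42
print(Rectangle().sides)  # 4
print(decoded)            # abc
```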
On Apr 12, 11:51 am, Kay Schluehr <kay.schlu...@gmx.net> wrote:
> On 12 Apr., 16:29, Carl Banks <pavlovevide...@gmail.com> wrote:
> > > And making an utf-8 encoding default is not possible without writing a
> > > new function?
> >
> > I believe the Zen in effect here is, "In the face of ambiguity, refuse
> > the temptation to guess." How do you know if the bytes are utf-8
> > encoded?
>
> How many "encodings" would you define for a Rectangle constructor?

I'm not sure what you're insinuating. If you are arguing that it's
inappropriate for a constructor to take an "encoding" argument (as you
put it), be my guest. I wasn't commenting on that specifically.

I was commenting on your suggestion of having str assume utf-8
encoding, which IMO would be very unPythonic, whether you can pass
encodings to it or not.
Whatever happened to the decode method anyway? Why has str() been
coopted for this purpose? I had expected that str objects would
retain the encode method, bytes the decode method, and everyone would
live happily ever after. If decode is a confusing name (and I know I
have to engage a few extra neurons to figure out which way it goes),
why not rename it to something like to_unicode instead of overloading
the constructors more.

Carl Banks
Jun 27 '08 #16
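
For what it's worth, released Python 3 versions ended up with the split described in this post: str keeps encode, bytes keeps decode, and the two-argument str() form remains as an alternative spelling. A minimal sketch, assuming a current Python 3 interpreter (the thread itself concerns a 3.0 alpha, where details may have differed):

```python
text = "naïve"

# str -> bytes goes through str.encode()
data = text.encode("utf-8")

# bytes -> str goes through bytes.decode(), or equivalently str(data, encoding)
via_method = data.decode("utf-8")
via_constructor = str(data, "utf-8")

# Each type has only the method that makes sense for it.
str_has_decode = hasattr(str, "decode")
bytes_has_encode = hasattr(bytes, "encode")

print(via_method == via_constructor == text)  # True
print(str_has_decode, bytes_has_encode)       # False False
```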
On 13 Apr., 09:24, Carl Banks <pavlovevide...@gmail.com> wrote:
> On Apr 12, 11:51 am, Kay Schluehr <kay.schlu...@gmx.net> wrote:
> > On 12 Apr., 16:29, Carl Banks <pavlovevide...@gmail.com> wrote:
> > > > And making an utf-8 encoding default is not possible without writing a
> > > > new function?
> > >
> > > I believe the Zen in effect here is, "In the face of ambiguity, refuse
> > > the temptation to guess." How do you know if the bytes are utf-8
> > > encoded?
> >
> > How many "encodings" would you define for a Rectangle constructor?
>
> I'm not sure what you're insinuating. If you are arguing that it's
> inappropriate for a constructor to take an "encoding" argument (as you
> put it), be my guest. I wasn't commenting on that specifically.
>
> I was commenting on your suggestion of having str assume utf-8
> encoding, which IMO would be very unPythonic, whether you can pass
> encodings to it or not.

That's o.k. I don't primarily advocate default values or such things,
just a reduction of mental and scripting overhead. We shouldn't lose
sight of that goal. I played a bit with several encodings, but this
didn't give me much of an impression of how it will feel in real code.

I can see, though, the inadequacy of my original claim, mainly due to the
overlooked fact that there isn't even a mapping of the byte range \x00 -
\xff to utf-8 but only one from \x00 - \x7f. The same holds for the ASCII
encoding, which is limited to 7 bits as well.

One has to be careful not just because "you can select the wrong
encoding": stringification with a utf-8 encoding can simply raise an
exception even though there is no type error! A default value should
work under all circumstances, provided you pass in an object of the
correct type.

Jun 27 '08 #17
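
The point about byte ranges can be checked directly. The sketch below is an illustration with made-up variable names: it decodes all 256 single-byte values with Latin-1 (which accepts every byte) and with UTF-8 (which accepts only \x00 - \x7f as standalone bytes), and shows the `errors="replace"` handler as one way to avoid the exception.

```python
all_bytes = bytes(range(256))

# Latin-1 maps every byte value to a code point, so this never fails.
latin1_text = all_bytes.decode("latin-1")

# The 7-bit range decodes the same way under UTF-8.
ascii_text = all_bytes[:128].decode("utf-8")

# The full range is not valid UTF-8, so strict decoding raises.
try:
    all_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# An error handler trades the exception for a replacement character.
lenient = b"caf\xe9".decode("utf-8", errors="replace")

print(len(latin1_text))  # 256
print(strict_ok)         # False
print(lenient)
```

A Latin-1 default would never raise, but it would silently produce mojibake for UTF-8 input, so it does not make the guess any safer.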
