str() should convert ANY object to a string without EXCEPTIONS !

est
From python manual

str( [object])

Return a string containing a nicely printable representation of an
object. For strings, this returns the string itself. The difference
with repr(object) is that str(object) does not always attempt to
return a string that is acceptable to eval(); its goal is to return a
printable string. If no argument is given, returns the empty string,
''.
now we try this under windows:
>>> str(u'\ue863')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\ue863' in position 0: ordinal not in range(128)

FAIL.

also almighty Linux

Python 2.3.4 (#1, Feb 6 2006, 10:38:46)
[GCC 3.4.5 20051201 (Red Hat 3.4.5-2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> str(u'\ue863')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\ue863' in position 0: ordinal not in range(128)

Python 2.4.4 (#2, Apr 5 2007, 20:11:18)
[GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> str(u'\ue863')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\ue863' in position 0: ordinal not in range(128)

Python 2.5 (release25-maint, Jul 20 2008, 20:47:25)
[GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> str(u'\ue863')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\ue863' in position 0: ordinal not in range(128)

The problem is, why the f**k set ASCII encoding to range(128) ????????
while str() is internally byte array it should be handled in
range(256) !!!!!!!!!!

http://bugs.python.org/issue3648

One possible solution (Windows only):
>>> str(u'\ue863'.encode('mbcs'))
'\xfe\x9f'
>>> print u'\ue863'.encode('mbcs')
þŸ
I now spending 60% of my developing time dealing with ASCII range(128)
errors. It was PAIN!!!!!!

Please fix this issue.

http://bugs.python.org/issue3648

Please.
Sep 28 '08 #1
In message
<98**********************************@q26g2000prq. googlegroups.com>, est
wrote:
> The problem is, why the f**k set ASCII encoding to range(128) ????????

Because that's how ASCII is defined.

> while str() is internally byte array it should be handled in
> range(256) !!!!!!!!!!

But that's for random bytes. How would you convert an arbitrary object to
random bytes?
Sep 28 '08 #2
On Sat, 27 Sep 2008 22:37:09 -0700, est wrote:
> The problem is, why the f**k set ASCII encoding to range(128) ????????

Because that's how ASCII is defined. ASCII is a 7-bit code.

> while str() is internally byte array it should be handled in range(256)
> !!!!!!!!!!

Yes `str` can handle that, but that's not the point. The point is how to
translate the contents of a `unicode` object into that range. There are
many different possibilities and Python refuses to guess and tries the
lowest common denominator -- ASCII -- instead.
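
The implicit step described above can be made visible. This is a minimal sketch in Python 3 syntax (an assumption, since the thread itself uses Python 2, where `str()` on a `unicode` object behaved roughly like an ASCII `encode`). The names `text` and `failure` are only illustrative.

```python
# Python 3 sketch of what Python 2's str(u'\ue863') attempted implicitly.
text = '\ue863'  # one character outside the ASCII range

try:
    text.encode('ascii')  # the lowest-common-denominator codec
except UnicodeEncodeError as exc:
    failure = exc  # keep a reference; 'exc' is unbound after the except block
    print(exc.encoding, exc.start, exc.reason)
else:
    failure = None
```

The exception carries the codec name and the offending position, which is the same information shown in the tracebacks earlier in the thread.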

> I now spending 60% of my developing time dealing with ASCII range(128)
> errors. It was PAIN!!!!!!
>
> Please fix this issue.
>
> http://bugs.python.org/issue3648
>
> Please.

The issue was closed as 'invalid'. Dealing with Unicode can be a pain
and frustrating, but that's not a Python problem; it's the subject itself
that needs some thought. If you think through the relationship
between characters, encodings, and bytes, and stop dreaming of a magic
solution that works without dealing with this stuff explicitly, the pain
will go away -- or ease at least.

Ciao,
Marc 'BlackJack' Rintsch
Sep 28 '08 #3
est wrote:
> From python manual
>
> str( [object])
>
> Return a string containing a nicely printable representation of an
> object. For strings, this returns the string itself. The difference
> with repr(object) is that str(object) does not always attempt to
> return a string that is acceptable to eval(); its goal is to return a
> printable string. If no argument is given, returns the empty string,
> ''.
> now we try this under windows:
> >>> str(u'\ue863')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\ue863' in position 0: ordinal not in range(128)

In 3.0 this is fixed:
>>> str('\ue863') # u prefix is gone
'\ue863'
>>> str(b'123') # b prefix is added
"b'123'"

Problems like this at least partly motivated the change to unicode
instead of bytes as the string type.

tjr

Sep 28 '08 #4
On Sun, 28 Sep 2008 19:03:42 +1300, Lawrence D'Oliveiro wrote:
> In message
> <98**********************************@q26g2000prq. googlegroups.com>, est
> wrote:
>> The problem is, why the f**k set ASCII encoding to range(128) ????????
>
> Because that's how ASCII is defined.
>
>> while str() is internally byte array it should be handled in range(256)
>> !!!!!!!!!!
>
> But that's for random bytes. How would you convert an arbitrary object
> to random bytes?

from random import randint
''.join(chr(randint(0, 255)) for i in xrange(len(input)))

of course. How else should you get random bytes? :)

--
Steven
Sep 28 '08 #5
On Sun, 28 Sep 2008 03:55:46 -0400, Terry Reedy wrote:
> est wrote:
>> From python manual
>>
>> str( [object])
>>
>> Return a string containing a nicely printable representation of an
>> object. For strings, this returns the string itself. The difference
>> with repr(object) is that str(object) does not always attempt to return
>> a string that is acceptable to eval(); its goal is to return a
>> printable string. If no argument is given, returns the empty string,
>> ''.
>> now we try this under windows:
>> >>> str(u'\ue863')
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> UnicodeEncodeError: 'ascii' codec can't encode character u'\ue863' in position 0: ordinal not in range(128)
>
> In 3.0 this is fixed:
> >>> str('\ue863') # u prefix is gone
> '\ue863'
> >>> str(b'123') # b prefix is added
> "b'123'"
>
> Problems like this at least partly motivated the change to unicode
> instead of bytes as the string type.

I'm not sure that "fixed" is the right word. Isn't that more or less the
same as telling the OP to use unicode() instead of str()? It merely
avoids the problem of converting Unicode to ASCII by leaving your string
as Unicode, rather than fixing it. Perhaps that's the right thing to do,
but it's a bit like the old joke:

"Doctor, it hurts when I do this."
"Then don't do it!"

As for the second example you give:

> >>> str(b'123') # b prefix is added
> "b'123'"

Perhaps I'm misinterpreting it, but from here it looks to me that str()
is doing what repr() used to do, and I'm really not sure that's a good
thing. I would have expected that str(b'123') in Python 3 should do the
same thing as unicode('123') does now:

>>> unicode('123')
u'123'

(except without the u prefix).
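
For comparison, here is a small Python 3 sketch of the two behaviours discussed above: `str()` on a bytes object gives the repr-style text, while decoding has to be requested explicitly. The variable names are illustrative only.

```python
raw = b'123'

as_repr = str(raw)             # repr-style text, including the b prefix
as_text = raw.decode('ascii')  # explicit decoding to text
also_text = str(raw, 'ascii')  # str() with an encoding argument also decodes

print(as_repr, as_text, also_text)
```

Passing the encoding to `str()` is the closest Python 3 spelling of what `unicode('123')` did in Python 2.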
--
Steven
Sep 28 '08 #6
est
> Because that's how ASCII is defined.
> Because that's how ASCII is defined. ASCII is a 7-bit code.

Then why can't python use another default encoding internally
range(256)?

> Python refuses to guess and tries the lowest common denominator -- ASCII -- instead.

That's the problem. ASCII is INCOMPLETE!

If Python choose another default encoding which handles range(256),
80% of python unicode encoding problems are gone.

It's not HARD to process unicode, it's just python & python community
refuse to correct it.

> stop dreaming of a magic solution

It's not 'magic' it's a BUG. Just print 0x7F to 0xFF to console,
what's wrong????

> Isn't that more or less the same as telling the OP to use unicode() instead of str()?

sockets could handle str() only. If you throw unicode objects to a
socket, it will automatically call str() and cause an error.
Sep 28 '08 #7
On Sat, 27 Sep 2008 22:37:09 -0700, est wrote:
> >>> str(u'\ue863')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\ue863' in position 0: ordinal not in range(128)
>
> FAIL.

What result did you expect?

[...]
> The problem is, why the f**k set ASCII encoding to range(128) ????????
> while str() is internally byte array it should be handled in range(256)
> !!!!!!!!!!

To quote Terry Pratchett:

"What sort of person," said Salzella patiently, "sits down and
*writes* a maniacal laugh? And all those exclamation marks, you
notice? Five? A sure sign of someone who wears his underpants
on his head." -- (Terry Pratchett, Maskerade)

In any case, even if the ASCII encoding used all 256 possible bytes, you
still have a problem. Your unicode string is a single character with
ordinal value 59491:
>>> ord(u'\ue863')
59491

You can't fit 59491 (or more) characters into 256, so obviously some
unicode chars aren't going to fit into ASCII without some sort of
encoding. You show that yourself:

u'\ue863'.encode('mbcs') # Windows only

But of course 'mbcs' is only one possible encoding. There are others.
Python refuses to guess which encoding you want. Here's another:

u'\ue863'.encode('utf-8')
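
The counting argument can be checked directly. The following is a hedged Python 3 sketch (the thread uses Python 2): latin-1 is used here as an example of a codec that fills all 256 byte values, and it still cannot represent the character in question.

```python
ch = '\ue863'
code_point = ord(ch)  # 59491

# latin-1 maps bytes 0..255 one-to-one onto code points 0..255 and nothing more.
latin1_text = bytes(range(256)).decode('latin-1')
covers_256 = all(ord(c) == i for i, c in enumerate(latin1_text))

try:
    ch.encode('latin-1')
    fits_in_one_byte = True
except UnicodeEncodeError:
    fits_in_one_byte = False

utf8_bytes = ch.encode('utf-8')  # a multi-byte sequence instead
print(code_point, covers_256, fits_in_one_byte, len(utf8_bytes))
```

So a full range(256) codec removes the error only for code points below 256; anything above still needs a multi-byte encoding.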


--
Steven
Sep 28 '08 #8
est
On Sep 28, 4:38 pm, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:
> On Sat, 27 Sep 2008 22:37:09 -0700, est wrote:
>> >>> str(u'\ue863')
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> UnicodeEncodeError: 'ascii' codec can't encode character u'\ue863' in position 0: ordinal not in range(128)
>>
>> FAIL.
>
> What result did you expect?
>
> [...]
>> The problem is, why the f**k set ASCII encoding to range(128) ????????
>> while str() is internally byte array it should be handled in range(256)
>> !!!!!!!!!!
>
> To quote Terry Pratchett:
>
>     "What sort of person," said Salzella patiently, "sits down and
>     *writes* a maniacal laugh? And all those exclamation marks, you
>     notice? Five? A sure sign of someone who wears his underpants
>     on his head." -- (Terry Pratchett, Maskerade)
>
> In any case, even if the ASCII encoding used all 256 possible bytes, you
> still have a problem. Your unicode string is a single character with
> ordinal value 59491:
> >>> ord(u'\ue863')
> 59491
>
> You can't fit 59491 (or more) characters into 256, so obviously some
> unicode chars aren't going to fit into ASCII without some sort of
> encoding. You show that yourself:
>
> u'\ue863'.encode('mbcs') # Windows only
>
> But of course 'mbcs' is only one possible encoding. There are others.
> Python refuses to guess which encoding you want. Here's another:
>
> u'\ue863'.encode('utf-8')
>
> --
> Steven

OK, I am tired of arguing these things since python 3.0 fixed it
somehow.

Can anyone tell me how to customize a default encoding, let's say
'ansi' which handles range(256) ?
Sep 28 '08 #9
Lie
On Sep 28, 12:37 pm, est <electronix...@gmail.com> wrote:
> From python manual
>
> str( [object])
>
> Return a string containing a nicely printable representation of an
> object. For strings, this returns the string itself. The difference
> with repr(object) is that str(object) does not always attempt to
> return a string that is acceptable to eval(); its goal is to return a
> printable string. If no argument is given, returns the empty string,
> ''.
>
> now we try this under windows:
> >>> str(u'\ue863')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\ue863' in position 0: ordinal not in range(128)
>
> FAIL.

And it is correct to fail: ASCII is only defined within range(128);
the rest (i.e. range(128, 256)) is not defined in ASCII. The
range(128, 256) values are extension slots, with many conflicting meanings.

> also almighty Linux
>
> Python 2.3.4 (#1, Feb 6 2006, 10:38:46)
> [GCC 3.4.5 20051201 (Red Hat 3.4.5-2)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> str(u'\ue863')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode character u'\ue863' in position 0: ordinal not in range(128)
>
> Python 2.4.4 (#2, Apr 5 2007, 20:11:18)
> [GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> str(u'\ue863')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode character u'\ue863' in position 0: ordinal not in range(128)
>
> Python 2.5 (release25-maint, Jul 20 2008, 20:47:25)
> [GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> str(u'\ue863')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\ue863' in position 0: ordinal not in range(128)

If that str() function had returned anything but an error on this, I'd
file a bug report.

> The problem is, why the f**k set ASCII encoding to range(128) ????????
> while str() is internally byte array it should be handled in
> range(256) !!!!!!!!!!

A string is a byte array, but Unicode and ASCII strings are NOT. A Unicode
string is a character array whose code points range up to 0x10FFFF; how
many bytes each character occupies depends on the encoding. An ASCII string
is a character array defined over range(128). Other than the Unicode
encodings (utf-8, utf-16, and utf-32) and ASCII, there are many other
encodings ('EBCDIC', 'iso-8859-1', ..., 'iso-8859-16', 'KOI8', 'GB18030',
'Shift-JIS', etc.), each with conflicting byte-to-character mappings.
Fortunately, most of these encodings do share a common ground: ASCII.
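
The conflicting mappings are easy to see by decoding the same bytes with several codecs. This is a rough Python 3 sketch; the four codec names are just a sample chosen for illustration, not a list taken from the post.

```python
high_byte = b'\xe9'   # a byte outside the ASCII range
ascii_bytes = b'abc'  # bytes inside the ASCII range
codecs_to_try = ['latin-1', 'cp1251', 'koi8_r', 'cp437']

high = {name: high_byte.decode(name) for name in codecs_to_try}
low = {name: ascii_bytes.decode(name) for name in codecs_to_try}

for name in codecs_to_try:
    # ascii() escapes non-ASCII characters so the output is safe on any console
    print(name, ascii(low[name]), ascii(high[name]))
```

The ASCII bytes decode to the same text under every codec in the sample, while the single high byte becomes a different character each time.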

Actually, when a strictly stupid str() receives a Unicode string (i.e. a
character array), it should return something like <unicode s at
0x423549af813e4954>, but it doesn't. str() is smarter than that: it
tries to convert whatever fits into ASCII, i.e. characters lower than
128. Why ASCII? Because the characters in range(128, 256) vary widely
between encodings and it doesn't know which encoding you want to use, so
if you don't tell it which encoding to use, it will not guess (Python Zen:
In the face of ambiguity, refuse the temptation to guess).

If you're trying to convert a character array (Unicode) into a byte
string, it's done by specifying which codec you want to use. str()
tries to convert your character array (Unicode) to byte string using
ASCII codec. s.encode(codec) would convert a given character array
into byte string using codec.
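
As a hedged illustration of `s.encode(codec)`, written in Python 3 syntax (an assumption, since the post targets Python 2): the codec is named explicitly, and the optional `errors` argument controls what happens to characters the codec cannot represent.

```python
text = 'caf\u00e9 \ue863'

strict_utf8 = text.encode('utf-8')        # every character is representable
round_trip = strict_utf8.decode('utf-8')  # decoding with the same codec restores the text

lossy = text.encode('ascii', errors='replace')             # unrepresentable -> '?'
escaped = text.encode('ascii', errors='backslashreplace')  # unrepresentable -> escapes

print(strict_utf8, lossy, escaped)
```

The default handler is `'strict'`, which raises the `UnicodeEncodeError` seen throughout this thread; the other handlers trade that error for lossy or escaped output.
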
> http://bugs.python.org/issue3648
>
> One possible solution(Windows Only)
> >>> str(u'\ue863'.encode('mbcs'))
> '\xfe\x9f'

Actually str() is not needed; you need only: u'\ue863'.encode('mbcs')

> >>> print u'\ue863'.encode('mbcs')
> 䶮
>
> I now spending 60% of my developing time dealing with ASCII range(128)
> errors. It was PAIN!!!!!!

Despair not, there is a quick hack:

# Only use this as a temporary solution, FIX YOUR CODE PROPERLY.
str_ = str  # keep a reference to the built-in
# Encode unicode objects with mbcs (Windows only); pass everything else to the built-in.
str = lambda s='': s.encode('mbcs') if isinstance(s, unicode) else str_(s)

> Please fix this issue.
>
> http://bugs.python.org/issue3648
>
> Please.
Sep 28 '08 #10
On Sep 28, 11:21 am, est <electronix...@gmail.com> wrote:
> On Sep 28, 4:38 pm, Steven D'Aprano <st...@REMOVE-THIS-
> Can anyone tell me how to customize a default encoding, let's say
> 'ansi' which handles range(256) ?

I assume you are using Python 2.5.
Edit the file /usr/lib/python2.5/site.py

There is a function called

def setencoding():
    [...]
    encoding = "ascii"
    [...]

Change encoding = "ascii" to encoding = "utf-8".

On Windows you may have to use "mbcs" or something like that. I have
no idea what Windows uses as its encoding.

As long as all systems don't use the same encoding (let's say utf-8,
since it is becoming the standard on Unixes and on the web), using
ascii as a default encoding makes sense.

Sep 28 '08 #11
On Sun, 28 Sep 2008 01:35:11 -0700, est wrote:
>> Because that's how ASCII is defined.
>> Because that's how ASCII is defined. ASCII is a 7-bit code.
>
> Then why can't python use another default encoding internally
> range(256)?

Because that doesn't suffice. Unicode code points can be >255.

> If Python choose another default encoding which handles range(256), 80%
> of python unicode encoding problems are gone.

80% of *your* problems with it *seems* to be gone then.

> It's not HARD to process unicode, it's just python & python community
> refuse to correct it.

It is somewhat hard to deal with unicode because many don't want to think
about it or don't grasp the relationship between encodings, byte values,
and characters. Including you.

>> stop dreaming of a magic solution
>
> It's not 'magic' it's a BUG. Just print 0x7F to 0xFF to console, what's
> wrong????

What do you mean by "just print 0x7F to 0xFF"? For example if I have ``s
= u'Smørebrød™'`` what bytes should ``str(s)`` produce and why those and
not others?
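
The question can be answered concretely for a few candidate codecs. This Python 3 sketch (an assumption; the thread is about Python 2) writes the example string with escapes and records which codecs can encode it at all; the dictionary name `results` is illustrative.

```python
# The example string from the post, spelled with escapes for o-slash and the trademark sign.
s = 'Sm\u00f8rebr\u00f8d\u2122'

results = {}
for name in ['utf-8', 'cp1252', 'latin-1', 'ascii']:
    try:
        results[name] = s.encode(name)
    except UnicodeEncodeError:
        results[name] = None  # this codec has no representation for some character
    print(name, results[name])
```

Two codecs succeed with different byte sequences and two fail, so there is no single obvious answer for an implicit conversion to pick.
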

>> Isn't that more or less the same as telling the OP to use unicode()
>> instead of str()?
>
> sockets could handle str() only. If you throw unicode objects to a
> socket, it will automatically call str() and cause an error.

Because *you* have to tell explicitly how the unicode object should be
encoded as bytes. Python can't do this automatically because it has *no
idea* what the process at the other end of the socket expects.
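
A minimal sketch of encoding explicitly at the socket boundary, in Python 3 syntax. It assumes both ends have agreed on UTF-8 (the constant `WIRE_ENCODING` is hypothetical) and uses `socket.socketpair()` only to keep the example self-contained.

```python
import socket

message = 'Sm\u00f8rebr\u00f8d'
WIRE_ENCODING = 'utf-8'  # assumption: sender and receiver agreed on this beforehand

left, right = socket.socketpair()
try:
    # The sender decides how text becomes bytes.
    left.sendall(message.encode(WIRE_ENCODING))
    left.shutdown(socket.SHUT_WR)  # signal end of data

    chunks = []
    while True:
        chunk = right.recv(1024)
        if not chunk:
            break
        chunks.append(chunk)
    # The receiver decodes with the same agreed codec.
    received = b''.join(chunks).decode(WIRE_ENCODING)
finally:
    left.close()
    right.close()

print(received == message)
```

The encoding is part of the protocol between the two ends, which is why the socket layer cannot choose it on the program's behalf.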

Now you are complaining that Python chooses ASCII. If it is changed to
something else, like MBCS, others start complaining why it is MBCS and
not something different. See: No fix, just moving the problem to someone
else.

Ciao,
Marc 'BlackJack' Rintsch
Sep 28 '08 #12
Lie
On Sep 28, 4:21 pm, est <electronix...@gmail.com> wrote:
> On Sep 28, 4:38 pm, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:
>> On Sat, 27 Sep 2008 22:37:09 -0700, est wrote:
>>> >>> str(u'\ue863')
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>> UnicodeEncodeError: 'ascii' codec can't encode character u'\ue863' in position 0: ordinal not in range(128)
>>> FAIL.
>> What result did you expect?
>> [...]
>>> The problem is, why the f**k set ASCII encoding to range(128) ????????
>>> while str() is internally byte array it should be handled in range(256)
>>> !!!!!!!!!!
>> To quote Terry Pratchett:
>>     "What sort of person," said Salzella patiently, "sits down and
>>     *writes* a maniacal laugh? And all those exclamation marks, you
>>     notice? Five? A sure sign of someone who wears his underpants
>>     on his head." -- (Terry Pratchett, Maskerade)
>> In any case, even if the ASCII encoding used all 256 possible bytes, you
>> still have a problem. Your unicode string is a single character with
>> ordinal value 59491:
>> >>> ord(u'\ue863')
>> 59491
>> You can't fit 59491 (or more) characters into 256, so obviously some
>> unicode chars aren't going to fit into ASCII without some sort of
>> encoding. You show that yourself:
>> u'\ue863'.encode('mbcs') # Windows only
>> But of course 'mbcs' is only one possible encoding. There are others.
>> Python refuses to guess which encoding you want. Here's another:
>> u'\ue863'.encode('utf-8')
>> --
>> Steven
>
> OK, I am tired of arguing these things since python 3.0 fixed it
> somehow.

I'm against saying Python 3.0 fixed it. Python 3.0's default string type
is Unicode and its default encoding is utf-8, and that is why your problem
magically disappears.

> Can anyone tell me how to customize a default encoding, let's say
> 'ansi' which handles range(256) ?

Python used to have sys.setdefaultencoding, but that feature was an
accident. sys.setdefaultencoding was intended for testing purposes, while
the developers had not yet decided what to use as the default
encoding (what use is a default when you can change it?).
sys.setdefaultencoding has been removed; programmers should encode
characters manually if they want to use something other than the
default encoding (ASCII).
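
For contrast, a short sketch of how this looks when run under Python 3 (an assumption; the post describes Python 2): the default encoding is reported as utf-8, and text and bytes are never mixed implicitly, so the conversion has to be spelled out.

```python
import sys

default = sys.getdefaultencoding()  # reported default on Python 3

try:
    'abc' + b'def'  # no implicit conversion between text and bytes
    mixed_ok = True
except TypeError:
    mixed_ok = False

explicit = 'abc' + b'def'.decode('ascii')  # the conversion is written out by hand
print(default, mixed_ok, explicit)
```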
Sep 28 '08 #13
est
On Sep 28, 6:15 pm, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
> On Sun, 28 Sep 2008 01:35:11 -0700, est wrote:
>>> Because that's how ASCII is defined.
>>> Because that's how ASCII is defined. ASCII is a 7-bit code.
>>
>> Then why can't python use another default encoding internally
>> range(256)?
>
> Because that doesn't suffice. Unicode code points can be >255.
>
>> If Python choose another default encoding which handles range(256), 80%
>> of python unicode encoding problems are gone.
>
> 80% of *your* problems with it *seems* to be gone then.
>
>> It's not HARD to process unicode, it's just python & python community
>> refuse to correct it.
>
> It is somewhat hard to deal with unicode because many don't want to think
> about it or don't grasp the relationship between encodings, byte values,
> and characters. Including you.
>
>>> stop dreaming of a magic solution
>>
>> It's not 'magic' it's a BUG. Just print 0x7F to 0xFF to console, what's
>> wrong????
>
> What do you mean by "just print 0x7F to 0xFF"? For example if I have ``s
> = u'Smørebrød™'`` what bytes should ``str(s)`` produce and why those and
> not others?
>
>>> Isn't that more or less the same as telling the OP to use unicode()
>>> instead of str()?
>>
>> sockets could handle str() only. If you throw unicode objects to a
>> socket, it will automatically call str() and cause an error.
>
> Because *you* have to tell explicitly how the unicode object should be
> encoded as bytes. Python can't do this automatically because it has *no
> idea* what the process at the other end of the socket expects.
>
> Now you are complaining that Python chooses ASCII. If it is changed to
> something else, like MBCS, others start complaining why it is MBCS and
> not something different. See: No fix, just moving the problem to someone
> else.
>
> Ciao,
> Marc 'BlackJack' Rintsch

Well, you succeeded in putting all the blame on me alone. Great.

When you guys are dealing with CJK characters in the future, you'll
find out what I mean.

In fact Boa Constructor keeps prompting ASCII and range(128) errors on
my Windows. That's pretty cool.
Sep 28 '08 #14
Lie
On Sep 28, 3:35 pm, est <electronix...@gmail.com> wrote:
>> Because that's how ASCII is defined.
>> Because that's how ASCII is defined. ASCII is a 7-bit code.
>
> Then why can't python use another default encoding internally
> range(256)?
>
>> Python refuses to guess and tries the lowest common denominator -- ASCII -- instead.
>
> That's the problem. ASCII is INCOMPLETE!

What do you propose? Use mbcs and smack out Linux computers? Use KOI
and make non-Russians suicide? Use GB and shoot dead non-Chinese? Use
latin-1 and make email servers scream?

> If Python choose another default encoding which handles range(256),
> 80% of python unicode encoding problems are gone.
>
> It's not HARD to process unicode, it's just python & python community
> refuse to correct it.

Python's unicode support is already correct. Only your brainwaves have
not been tuned to it yet.

>> stop dreaming of a magic solution
>
> It's not 'magic' it's a BUG. Just print 0x7F to 0xFF to console,
> what's wrong????
>
>> Isn't that more or less the same as telling the OP to use unicode() instead of str()?
>
> sockets could handle str() only. If you throw unicode objects to a
> socket, it will automatically call str() and cause an error.
Sep 28 '08 #15
est
On Sep 28, 7:12 pm, Lie <Lie.1...@gmail.com> wrote:
> On Sep 28, 3:35 pm, est <electronix...@gmail.com> wrote:
>>> Because that's how ASCII is defined.
>>> Because that's how ASCII is defined. ASCII is a 7-bit code.
>> Then why can't python use another default encoding internally
>> range(256)?
>>> Python refuses to guess and tries the lowest common denominator -- ASCII -- instead.
>> That's the problem. ASCII is INCOMPLETE!
>
> What do you propose? Use mbcs and smack out Linux computers? Use KOI
> and make non-Russians suicide? Use GB and shoot dead non-Chinese? Use
> latin-1 and make email servers scream?
>
>> If Python choose another default encoding which handles range(256),
>> 80% of python unicode encoding problems are gone.
>> It's not HARD to process unicode, it's just python & python community
>> refuse to correct it.
>
> Python's unicode support is already correct. Only your brainwaves have
> not been tuned to it yet.
>
>>> stop dreaming of a magic solution
>> It's not 'magic' it's a BUG. Just print 0x7F to 0xFF to console,
>> what's wrong????
>>> Isn't that more or less the same as telling the OP to use unicode() instead of str()?
>> sockets could handle str() only. If you throw unicode objects to a
>> socket, it will automatically call str() and cause an error.

Have you ever programmed with CJK characters before?
Sep 28 '08 #16
Steven D'Aprano wrote:
> >>> str(b'123') # b prefix is added
> "b'123'"
> Perhaps I'm misinterpreting it, but from here it looks to me that str()
> is doing what repr() used to do, and I'm really not sure that's a good
> thing. I would have expected that str(b'123') in Python 3 should do the
> same thing as unicode('123') does now:

No, you are getting it right and yes, it's problematic. Guido wanted
str(b'') to succeed. But the behavior can easily mask bugs in code.
Therefore a byte warning mode was implemented.

$ ./python -b
>>> str(b'123')
__main__:1: BytesWarning: str() on a bytes instance
"b'123'"

$ ./python -bb
>>> str(b'123')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
BytesWarning: str() on a bytes instance
>>> b'' == ''
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
BytesWarning: Comparison between bytes and string
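
The byte warning mode can also be exercised from a script. The sketch below assumes a Python 3.7+ interpreter is what `sys.executable` points to; `run_with_flags` is a hypothetical helper written for this example, not part of any library.

```python
import subprocess
import sys

def run_with_flags(flags, snippet):
    """Run a one-line snippet in a fresh interpreter started with the given flags."""
    result = subprocess.run(
        [sys.executable, *flags, '-c', snippet],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr

plain = run_with_flags([], "str(b'123')")        # default mode: silently succeeds
strict = run_with_flags(['-bb'], "str(b'123')")  # -bb turns the warning into an error

print(plain[0], strict[0])
```

Running a test suite with `-bb` is one way to catch places where bytes are accidentally passed to `str()`.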

Sep 28 '08 #17
In article <00**********************@news.astraweb.com>,
Steven D'Aprano <st***@REMOVE-THIS-cybersource.com.au> wrote:
> from random import randint
> ''.join(chr(randint(0, 255)) for i in xrange(len(input)))
>
> of course. How else should you get random bytes? :)

That's a UUOL (Useless Usage Of Len; by analogy to UUOC). This works just as
well:

''.join(chr(randint(0, 255)) for i in input)
Sep 28 '08 #18
In message
<91**********************************@b38g2000prf. googlegroups.com>, est
wrote:
> Well, you succeeded in putting all the blame on me alone. Great.

Take it as a hint.

> When you guys are dealing with CJK characters in the future, you'll
> find out what I mean.

Speaking as somebody who HAS dealt with CJK characters in the past--see
above.
Sep 28 '08 #19
On Sun, 28 Sep 2008 07:01:12 -0300, Olivier Lauzanne
<ne**********@gmail.com> wrote:
> On Sep 28, 11:21 am, est <electronix...@gmail.com> wrote:
>> Can anyone tell me how to customize a default encoding, let's say
>> 'ansi' which handles range(256) ?
>
> I assume you are using python2.5
> Edit the file /usr/lib/python2.5/site.py
>
> There is a method called
> def setencoding():
>     [...]
>     encoding = "ascii"
>     [...]
>
> Change encoding = "ascii" to encoding = "utf-8"
>
> On windows you may have to use "mbcs" or something like that. I have
> no idea what windows use at its encoding.

*Not* a good idea at all.
You're just masking errors, and making your programs incompatible with all
other Pythons installed around the world.

--
Gabriel Genellina

Sep 30 '08 #20
