Bytes | Software Development & Data Engineering Community

WTF? Printing unicode strings

>>> u'\xbd'
u'\xbd'
>>> print _
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in
position 0: ordinal not in range(128)

May 18 '06 #1
Ron Garret wrote:
> >>> u'\xbd'
> u'\xbd'
> >>> print _
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in
> position 0: ordinal not in range(128)


Not sure if this really helps you, but:
>>> u'\xbd'
u'\xbd'
>>> print _

May 18 '06 #2
Ron Garret wrote:
> >>> u'\xbd'
> u'\xbd'
> >>> print _
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in
> position 0: ordinal not in range(128)


so stdout on your machine is ascii, and you don't understand why you
cannot print a non-ascii unicode character to it? wtf?
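Fredrik's point can be checked without a terminal: printing a unicode string means encoding it with the output stream's encoding, and U+00BD has no ASCII byte. A minimal sketch for a modern Python 3 interpreter (the helper name `encoded_or_none` is made up for illustration):

```python
def encoded_or_none(text, encoding):
    """Return `text` encoded with `encoding`, or None when the codec has no
    byte sequence for some character (the UnicodeEncodeError case)."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

half = u'\xbd'  # VULGAR FRACTION ONE HALF
for enc in ('ascii', 'latin-1', 'utf-8'):
    print(enc, encoded_or_none(half, enc))
```

ASCII yields None, Latin-1 a single byte (0xBD), UTF-8 two bytes (0xC2 0xBD); which of these a `print` needs depends on the stream it writes to.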

</F>

May 18 '06 #3
In article <ma***************************************@python. org>,
Fredrik Lundh <fr*****@pythonware.com> wrote:
> Ron Garret wrote:
> > >>> u'\xbd'
> > u'\xbd'
> > >>> print _
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in ?
> > UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in
> > position 0: ordinal not in range(128)
>
> so stdout on your machine is ascii, and you don't understand why you
> cannot print a non-ascii unicode character to it? wtf?
>
> </F>


I forgot to mention:
>>> sys.getdefaultencoding()
'utf-8'
>>> print u'\xbd'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in
position 0: ordinal not in range(128)

May 18 '06 #4
Ron Garret wrote:
> I forgot to mention:
> >>> sys.getdefaultencoding()
> 'utf-8'


A) You shouldn't be able to do that.
B) Don't do that.
C) It's not relevant to the encoding of stdout, which determines how unicode
strings get converted to bytes when printing them:

>>> import sys
>>> sys.stdout.encoding
'UTF-8'
>>> sys.getdefaultencoding()
'ascii'
>>> print u'\xbd'
½
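Robert's point C can be reproduced with an in-memory stream: the encoding a text stream was created with decides which bytes come out. A sketch assuming Python 3's `io` module (`bytes_written` is a hypothetical helper, not part of any library):

```python
import io

def bytes_written(text, encoding):
    """Send `text` through a text stream created with `encoding` and return
    the bytes that reach the underlying byte buffer."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(text)   # encoding happens here (or at the latest on flush)
    stream.flush()
    return raw.getvalue()

print(bytes_written(u'\xbd', 'utf-8'))
print(bytes_written(u'\xbd', 'latin-1'))
try:
    bytes_written(u'\xbd', 'ascii')
except UnicodeEncodeError as exc:
    print('ascii stream:', exc)
```

The same character produces two bytes, one byte, or an error, depending only on the stream's own encoding.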

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

May 18 '06 #5
In article <ma***************************************@python. org>,
Robert Kern <ro*********@gmail.com> wrote:
> Ron Garret wrote:
> > I forgot to mention:
> > >>> sys.getdefaultencoding()
> > 'utf-8'
>
> A) You shouldn't be able to do that.

What can I say? I can.

> B) Don't do that.

OK. What should I do instead?

> C) It's not relevant to the encoding of stdout which determines how unicode
> strings get converted to bytes when printing them:
>
> >>> import sys
> >>> sys.stdout.encoding
> 'UTF-8'
> >>> sys.getdefaultencoding()
> 'ascii'
> >>> print u'\xbd'
> ½


OK, so how am I supposed to change the encoding of sys.stdout? It comes
up as US-ASCII on my system. Simply setting it doesn't work:
>>> import sys
>>> sys.stdout.encoding='utf-8'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: readonly attribute


rg
May 18 '06 #6
Ron Garret wrote:
> In article <ma***************************************@python. org>,
>  Robert Kern <ro*********@gmail.com> wrote:
>
> > Ron Garret wrote:
> > > I forgot to mention:
> > > >>> sys.getdefaultencoding()
> > > 'utf-8'
> >
> > A) You shouldn't be able to do that.
>
> What can I say? I can.

See B).

> > B) Don't do that.
>
> OK. What should I do instead?

See below.

> > C) It's not relevant to the encoding of stdout which determines how unicode
> > strings get converted to bytes when printing them:
> >
> > >>> import sys
> > >>> sys.stdout.encoding
> > 'UTF-8'
> > >>> sys.getdefaultencoding()
> > 'ascii'
> > >>> print u'\xbd'
> > ½
>
> OK, so how am I supposed to change the encoding of sys.stdout? It comes
> up as US-ASCII on my system. Simply setting it doesn't work:


You will have to use a terminal that accepts UTF-8.
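Switching the terminal (and, as later posts show, the locale) is the real fix. For completeness: instead of assigning to the read-only `encoding` attribute, a commonly used workaround, not one proposed in this thread, is to wrap the underlying byte stream in a `codecs` writer. A sketch for Python 3, with an in-memory buffer standing in for the terminal; in a live Python 3 process the byte stream is `sys.stdout.buffer`, while on the Python 2 interpreters in this thread `sys.stdout` itself was the byte stream. This only changes which bytes Python emits: a terminal that is not set to UTF-8 will still display them wrongly.

```python
import codecs
import io

# codecs.getwriter returns a StreamWriter class; wrapping a byte stream in
# it gives an object whose write() accepts unicode and emits UTF-8 bytes.
Utf8Writer = codecs.getwriter('utf-8')

buffer = io.BytesIO()        # stand-in for the byte stream behind stdout
out = Utf8Writer(buffer)
out.write(u'\xbd\n')

print(buffer.getvalue())
```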

--
Robert Kern

May 18 '06 #7
Ron Garret wrote:
> > B) Don't do that.
>
> OK. What should I do instead?


The exact answer depends on what OS and terminal you are using and what
your program is supposed to do: are you going to distribute the program,
or is it just for internal use?

May 18 '06 #8
In article <11**********************@j73g2000cwa.googlegroups .com>,
"Serge Orlov" <Se*********@gmail.com> wrote:
> Exact answer depends on what OS and terminal you are using and what
> your program is supposed to do, are you going to distribute the program
> or it's just for internal use.


I'm using an OS X terminal to ssh to a Linux machine.

But what about this:
>>> f2=open('foo','w')
>>> f2.write(u'\xFF')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in
position 0: ordinal not in range(128)


That should have nothing to do with my terminal, right?

I just found http://www.amk.ca/python/howto/unicode, which seems to be
enlightening. The answer seems to be something like:

import codecs
f = codecs.open('foo','w','utf-8')

but that seems pretty awkward.
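The `codecs.open` approach can be exercised end to end: write through it, inspect the raw bytes on disk, and read back through the same codec. A Python 3 sketch using a temporary file in place of 'foo' (in Python 3 the built-in `open(path, 'w', encoding='utf-8')` does the same job and is the usual choice today):

```python
import codecs
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # codecs.open returns a wrapper whose write() encodes unicode as UTF-8.
    with codecs.open(path, 'w', 'utf-8') as f:
        f.write(u'\xff')
    # Look at the raw bytes that actually reached the disk.
    with open(path, 'rb') as f:
        raw = f.read()
    # Reading back through the same codec restores the unicode string.
    with codecs.open(path, 'r', 'utf-8') as f:
        text = f.read()
finally:
    os.remove(path)

print(raw)
print(text == u'\xff')
```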

rg
May 18 '06 #9
Ron Garret wrote:
> I'm using an OS X terminal to ssh to a Linux machine.
Click on the "Terminal" menu, then "Window Settings...". Choose "Display" from
the combobox. At the bottom you will see a combobox titled "Character Set
Encoding". Choose "Unicode (UTF-8)".
> But what about this:
> >>> f2=open('foo','w')
> >>> f2.write(u'\xFF')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in
> position 0: ordinal not in range(128)
>
> That should have nothing to do with my terminal, right?


Correct, that is a different problem. f.write() expects a string of bytes, not a
unicode string. In order to convert unicode strings to byte strings without an
explicit .encode() method call, Python uses the default encoding which is
'ascii'. It's not easily changeable for a good reason. Your modules won't work
on anyone else's machine if you hack that setting.
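The explicit `.encode()` call Robert mentions means turning the unicode string into bytes yourself and writing those. A Python 3 sketch, with `io.BytesIO` standing in for a file opened in binary mode (`'wb'`):

```python
import io

text = u'\xff'                      # LATIN SMALL LETTER Y WITH DIAERESIS
f = io.BytesIO()                    # stand-in for open('foo', 'wb')
f.write(text.encode('utf-8'))       # explicit encoding: no implicit ASCII step
data = f.getvalue()

print(data)                         # the bytes a real file would contain
print(data.decode('utf-8') == text) # decoding with the same codec round-trips
```

In Python 3 the implicit step is gone altogether: handing a `str` to a binary-mode file raises `TypeError` instead of trying ASCII.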
> I just found http://www.amk.ca/python/howto/unicode, which seems to be
> enlightening. The answer seems to be something like:
>
> import codecs
> f = codecs.open('foo','w','utf-8')
>
> but that seems pretty awkward.


<shrug> About as clean as it gets when dealing with text encodings.

--
Robert Kern

May 18 '06 #10
Ron Garret wrote:

> But what about this:
> >>> f2=open('foo','w')
> >>> f2.write(u'\xFF')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in
> position 0: ordinal not in range(128)
>
> That should have nothing to do with my terminal, right?


Correct. But first try to answer this: given that you want to write the
Unicode character value 255 to a file, how is that character to be
represented in the file?

For example, one might think that one could just get a byte whose value
is 255 and write that to a file, but what happens if one chooses a
Unicode character whose value is greater than 255? One could use two
bytes or three bytes or as many as one needs, but what if the lowest 8
bits of that value are all set? How would one know, if one reads a file
back and gets a byte whose value is 255 whether it represents a
character all by itself or is part of another character's
representation? It gets complicated!

The solution is that you choose an encoding which allows you to store
the characters in the file, thus answering indirectly the question
above: encodings determine how the characters are represented in the
file and allow you to read the file and get back the characters you put
into it. One of the most common encodings suitable for the storage of
Unicode character values is UTF-8, which has been designed with the
above complications in mind, but as long as you remember to choose an
encoding, you don't have to think about it: Python takes care of the
difficult stuff on your behalf. In the above code you haven't made that
choice.
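Paul's question about character 255 has a concrete answer in UTF-8. A Python 3 sketch that prints the UTF-8 bytes for a few characters, including U+00FF and U+01FF, whose lowest 8 bits are all set. Each lead byte announces the sequence length, every following byte has the bit pattern 10xxxxxx, and the byte 0xFF never occurs, so a reader always knows where a character starts:

```python
def is_continuation(byte):
    """UTF-8 continuation bytes have the bit pattern 10xxxxxx."""
    return byte & 0xC0 == 0x80

for ch in (u'A', u'\xff', u'\u01ff', u'\u20ac'):
    data = bytearray(ch.encode('utf-8'))
    # The first byte is never a continuation byte; all later ones are.
    assert not is_continuation(data[0])
    assert all(is_continuation(b) for b in data[1:])
    # 0xFF is not a legal byte anywhere in UTF-8.
    assert 0xFF not in data
    # Decoding gives back exactly the character that was encoded.
    assert data.decode('utf-8') == ch
    print('U+%04X -> %s' % (ord(ch), ' '.join('%02X' % b for b in data)))
```

U+00FF becomes C3 BF and U+01FF becomes C7 BF: both end in the byte BF, but their different lead bytes keep them apart.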

So, to answer the above question, you can either...

* Use the encode method on Unicode objects to turn them into plain
strings, then write them to a file - at that point, you are
writing specific byte values.
* Use the codecs.open function and other codecs module features to
write Unicode objects directly to files and streams - here, the
module's infrastructure deals with byte-level issues.
* If you're using something like an XML library, you can often pass a
normal file or stream object to some function or method whilst
stating the output encoding.
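Paul's third option, letting a library do the encoding, looks like this with `xml.etree.ElementTree` from the standard library. A Python 3 sketch; the element name and text are invented for the example. The caller only names the output encoding, and `ElementTree.write` produces the bytes:

```python
import io
import xml.etree.ElementTree as ET

root = ET.Element('note')
root.text = u'\xbd cup of sugar'   # contains U+00BD

out = io.BytesIO()                 # stand-in for a file opened in 'wb' mode
# The library encodes the document; we only state the output encoding.
ET.ElementTree(root).write(out, encoding='utf-8')
print(out.getvalue())
```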

There is no universally correct answer to which encoding should be used
when writing Unicode character values to files, contrary to some
beliefs and opinions which, for example, lead to people pretending that
everything is in UTF-8 in order to appease legacy applications with the
minimum of tweaks necessary to stop them from breaking completely.
Thus, Python doesn't make a decision for you here.

Paul

May 18 '06 #11
In article <ma***************************************@python. org>,
Robert Kern <ro*********@gmail.com> wrote:
> Ron Garret wrote:
> > I'm using an OS X terminal to ssh to a Linux machine.
>
> Click on the "Terminal" menu, then "Window Settings...". Choose "Display"
> from the combobox. At the bottom you will see a combobox title "Character Set
> Encoding". Choose "Unicode (UTF-8)".


It was already set to UTF-8.
> > But what about this:
> > >>> f2=open('foo','w')
> > >>> f2.write(u'\xFF')
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in ?
> > UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in
> > position 0: ordinal not in range(128)
> >
> > That should have nothing to do with my terminal, right?
>
> Correct, that is a different problem. f.write() expects a string of bytes, not a
> unicode string. In order to convert unicode strings to byte strings without an
> explicit .encode() method call, Python uses the default encoding which is
> 'ascii'. It's not easily changeable for a good reason. Your modules won't work
> on anyone else's machine if you hack that setting.


OK.
> > I just found http://www.amk.ca/python/howto/unicode, which seems to be
> > enlightening. The answer seems to be something like:
> >
> > import codecs
> > f = codecs.open('foo','w','utf-8')
> >
> > but that seems pretty awkward.
>
> <shrug> About as clean as it gets when dealing with text encodings.


OK. Thanks.

rg
May 19 '06 #12
Ron Garret wrote:
> I'm using an OS X terminal to ssh to a Linux machine.


In theory it should work out of the box. The OS X terminal should set the
environment variable LANG=en_US.utf-8, then ssh should transfer this
variable to Linux, and Python will know that your terminal uses UTF-8.
Unfortunately, AFAIK the OS X terminal doesn't set that variable, and most
(all?) ssh clients don't transfer it between machines. As a workaround
you can set that variable on Linux yourself. This should work on the
command line right away:

LANG=en_US.utf-8 python -c "print unichr(0xbd)"

Or put the following line in ~/.bashrc and log out and back in:

export LANG=en_US.utf-8

May 19 '06 #13
Ron Garret wrote:
> In article <ma***************************************@python. org>,
>  Robert Kern <ro*********@gmail.com> wrote:
>
> > Ron Garret wrote:
> > > I'm using an OS X terminal to ssh to a Linux machine.
> >
> > Click on the "Terminal" menu, then "Window Settings...". Choose "Display"
> > from the combobox. At the bottom you will see a combobox title "Character Set
> > Encoding". Choose "Unicode (UTF-8)".
>
> It was already set to UTF-8.


Then take a look at your LANG environment variable on your Linux machine. For
example, I have LANG=en_US.UTF-8 on my Linux machine, and I can ssh into it from
a UTF-8-configured Terminal.app and print unicode strings just fine.

--
Robert Kern

May 19 '06 #14
In article <11**********************@y43g2000cwc.googlegroups .com>,
"Serge Orlov" <Se*********@gmail.com> wrote:
> In theory it should work out of the box. OS X terminal should set
> enviromental variable LANG=en_US.utf-8, then ssh should transfer this
> variable to Linux and python will know that your terminal is utf-8.
> Unfortunately AFAIK OS X terminal doesn't set that variable and most
> (all?) ssh clients don't transfer it between machines. As a workaround
> you can set that variable on linux yourself . This should work in the
> command line right away:
>
> LANG=en_US.utf-8 python -c "print unichr(0xbd)"
>
> Or put the following line in ~/.bashrc and logout/login
>
> export LANG=en_US.utf-8


No joy.

ron@www01:~$ LANG=en_US.utf-8 python -c "print unichr(0xbd)"
Traceback (most recent call last):
File "<string>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in
position 0: ordinal not in range(128)
ron@www01:~$

rg
May 19 '06 #15
Ron Garret wrote:
> No joy.
>
> ron@www01:~$ LANG=en_US.utf-8 python -c "print unichr(0xbd)"
> Traceback (most recent call last):
>   File "<string>", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in
> position 0: ordinal not in range(128)
> ron@www01:~$


What version of Python and what shell do you run? What do the following
commands print?

python -V
echo $SHELL
$SHELL --version

May 19 '06 #16
In article <11**********************@g10g2000cwb.googlegroups .com>,
"Serge Orlov" <Se*********@gmail.com> wrote:
> What version of python and what shell do you run? What the following
> commands print:
>
> python -V
> echo $SHELL
> $SHELL --version


ron@www01:~$ python -V
Python 2.3.4
ron@www01:~$ echo $SHELL
/bin/bash
ron@www01:~$ $SHELL --version
GNU bash, version 2.05b.0(1)-release (i386-pc-linux-gnu)
Copyright (C) 2002 Free Software Foundation, Inc.
ron@www01:~$
May 19 '06 #17
Ron Garret wrote:
> ron@www01:~$ python -V
> Python 2.3.4
> ron@www01:~$ echo $SHELL
> /bin/bash
> ron@www01:~$ $SHELL --version
> GNU bash, version 2.05b.0(1)-release (i386-pc-linux-gnu)
> Copyright (C) 2002 Free Software Foundation, Inc.
> ron@www01:~$


That's recent enough. I guess the distribution you're using set LC_*
variables for no good reason. Either unset all environment variables
starting with LC_ and set the LANG variable, or override the LC_CTYPE variable:

LC_CTYPE=en_US.utf-8 python -c "print unichr(0xbd)"

Should be working now :)
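Serge's two alternatives follow from the POSIX rule for which variable decides the character-type category (LC_CTYPE): a non-empty LC_ALL wins, then LC_CTYPE, then LANG, and with none of them set the implementation's default (normally the C/POSIX locale) applies. A Python 3 sketch of that lookup over a plain dictionary; `effective_ctype` is a hypothetical helper, since the real resolution happens inside the C library:

```python
def effective_ctype(env):
    """Return the locale name that governs LC_CTYPE for the given
    environment mapping, following POSIX precedence."""
    for var in ('LC_ALL', 'LC_CTYPE', 'LANG'):
        value = env.get(var)
        if value:                 # unset or empty variables are skipped
            return value
    return 'C'                    # assumed implementation default

print(effective_ctype({'LANG': 'en_US.utf-8'}))
print(effective_ctype({'LANG': 'en_US.utf-8', 'LC_CTYPE': 'C'}))
print(effective_ctype({'LC_ALL': 'POSIX', 'LC_CTYPE': 'en_US.utf-8'}))
```

This is also why overriding LC_CTYPE has no effect when LC_ALL is set.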

May 19 '06 #18
Serge Orlov wrote:
> That's recent enough. I guess the distribution you're using set LC_*
> variables for no good reason. Either unset all enviromental variables
> starting with LC_ and set LANG variable or overide LC_CTYPE variable:
>
> LC_CTYPE=en_US.utf-8 python -c "print unichr(0xbd)"
>
> Should be working now :)


I've pulled myself together and installed Linux in VMware Player.
Apparently there is another way Linux distributors can screw up. I
chose the Debian 3.1 minimal network install, and after answering all the
installation questions I found that only the ASCII and Latin-1 English
locales were installed:
$ locale -a
C
en_US
en_US.iso88591
POSIX

In 2006, I would expect a UTF-8 English locale to be present even in a
minimal install. I had to edit /etc/locale.gen and run locale-gen as
root. After that, Python started to print unicode characters.
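Serge's diagnosis was that no UTF-8 locale was installed, which is what `locale -a` reveals. A similar check can be made from Python by trying to switch LC_CTYPE. A Python 3 sketch; `locale_available` is a hypothetical helper, and while on glibc an uninstalled locale makes `setlocale` fail with `locale.Error`, the treatment of unknown names can differ between C libraries:

```python
import locale

def locale_available(name):
    """Return True if the C library accepts `name` for LC_CTYPE."""
    saved = locale.setlocale(locale.LC_CTYPE)        # query, don't change
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)     # put the original back

for name in ('C', 'en_US.iso88591', 'en_US.UTF-8'):
    print(name, locale_available(name))
```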

May 19 '06 #19
Ron Garret wrote:
> I forgot to mention:
> >>> sys.getdefaultencoding()
> 'utf-8'
> >>> print u'\xbd'
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in
> position 0: ordinal not in range(128)


This is the default encoding used for implicit conversions between byte
strings and unicode strings; it has nothing to do with printing.

For the output encoding, see sys.stdout.encoding:
>>> import sys
>>> sys.stdout.encoding
'cp850'


See you,

Laurent.
May 19 '06 #20
Fredrik Lundh wrote:
> Ron Garret wrote:
> > >>> u'\xbd'
> > u'\xbd'
> > >>> print _
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in ?
> > UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in
> > position 0: ordinal not in range(128)
>
> so stdout on your machine is ascii, and you don't understand why you
> cannot print a non-ascii unicode character to it? wtf?
>
> </F>


AFAIK, I'm all ASCII (at least, I never made explicit changes to the
default Python install), so how am I able to print out the character?
May 19 '06 #21
John Salerno wrote:
> AFAIK, I'm all ASCII (at least, I never made explicit changes to the
> default Python install), so how am I able to print out the character?


Because sys.stdout.encoding is determined not by your Python configuration
but by your terminal's.

--
Robert Kern

May 19 '06 #22

Robert> Because sys.stdout.encoding isn't determined by your Python
Robert> configuration, but your terminal's.

Learn something every day. I take it "646" is an alias for "ascii" (or vice
versa)?

% python
Python 2.4.2 (#1, Feb 23 2006, 12:48:31)
[GCC 3.4.1] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdout.encoding
'646'
>>> import codecs
>>> codecs.lookup("646")
(<built-in function ascii_encode>, <built-in function ascii_decode>, <class encodings.ascii.StreamReader at 0x819aa4c>, <class encodings.ascii.StreamWriter at 0x819aa1c>)

Skip
May 19 '06 #23
sk**@pobox.com wrote:
> Robert> Because sys.stdout.encoding isn't determined by your Python
> Robert> configuration, but your terminal's.
>
> Learn something every day. I take it "646" is an alias for "ascii" (or vice
> versa)?

Hmm, not that this helps me any :)
>>> import sys
>>> sys.stdout.encoding
'cp1252'
>>> import codecs
>>> codecs.lookup('cp1252')
(<bound method Codec.encode of <encodings.cp1252.Codec instance at
0x009D6670>>, <bound method Codec.decode of <encodings.cp1252.Codec
instance at 0x009D6698>>, <class encodings.cp1252.StreamReader at
0x009CF360>, <class encodings.cp1252.StreamWriter at 0x009CF330>)

May 19 '06 #24

John> Hmm, not that this helps me any :)
John> >>> import sys
John> >>> sys.stdout.encoding
John> 'cp1252'

Sure it does. You can print Unicode objects which map to cp1252. I assume
that means you're on Windows or that for some perverse reason you have your
Mac's Terminal window set to cp1252. (Does it go there? I'm at work right
now so I can't check).
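Skip's point in code: a cp1252 console can show exactly those characters that have a byte in cp1252, which includes ½ and the euro sign but not, for example, Greek letters. A Python 3 sketch with a hypothetical helper `fits`:

```python
def fits(text, encoding):
    """True if every character of `text` has a byte in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

samples = [u'\xbd',    # VULGAR FRACTION ONE HALF: byte 0xBD in cp1252
           u'\u20ac',  # EURO SIGN: byte 0x80 in cp1252
           u'\u0394']  # GREEK CAPITAL LETTER DELTA: not in cp1252
for ch in samples:
    print('U+%04X' % ord(ch), fits(ch, 'cp1252'))
```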

Skip
May 19 '06 #25
sk**@pobox.com wrote:
> John> Hmm, not that this helps me any :)
> John> >>> import sys
> John> >>> sys.stdout.encoding
> John> 'cp1252'
>
> Sure it does. You can print Unicode objects which map to cp1252. I assume
> that means you're on Windows or that for some perverse reason you have your
> Mac's Terminal window set to cp1252. (Does it go there? I'm at work right
> now so I can't check).
>
> Skip


You're right, I'm on XP. I just couldn't make sense of the lookup call,
although some of the names looked like .NET classes.
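What `codecs.lookup` returns, and Skip's question whether "646" is an alias for "ascii", can both be checked from the interpreter. A Python 3 sketch: `codecs.lookup` follows the alias table in `encodings.aliases` and returns a `CodecInfo` object whose `.name` is the canonical codec name; its `.encode` and `.decode` entries are the plain functions shown in the tuples printed earlier in the thread, each returning a (result, length consumed) pair:

```python
import codecs
from encodings.aliases import aliases

print(aliases['646'])             # the alias table entry for '646'

for label in ('646', 'US-ASCII', 'UTF-8', 'cp1252'):
    # lookup() normalizes the label and follows aliases to the real codec.
    print(label, '->', codecs.lookup(label).name)

info = codecs.lookup('646')
# The encode function returns (encoded bytes, number of characters consumed).
print(info.encode(u'abc'))
```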
May 19 '06 #26
sk**@pobox.com wrote:
> Robert> Because sys.stdout.encoding isn't determined by your Python
> Robert> configuration, but your terminal's.
>
> Learn something every day. I take it "646" is an alias for "ascii" (or vice
> versa)?
>
> % python
> Python 2.4.2 (#1, Feb 23 2006, 12:48:31)
> [GCC 3.4.1] on sunos5
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import sys
> >>> sys.stdout.encoding
> '646'
> >>> import codecs
> >>> codecs.lookup("646")
> (<built-in function ascii_encode>, <built-in function ascii_decode>, <class encodings.ascii.StreamReader at 0x819aa4c>, <class encodings.ascii.StreamWriter at 0x819aa1c>)


Yes. In encodings/aliases.py in the standard library:

"""
aliases = {

# Please keep this list sorted alphabetically by value !

# ascii codec
'646' : 'ascii',

"""

--
Robert Kern

May 19 '06 #27
In article <11**********************@i40g2000cwc.googlegroups .com>,
"Serge Orlov" <Se*********@gmail.com> wrote:
> That's recent enough. I guess the distribution you're using set LC_*
> variables for no good reason.


Nope:

ron@www01:~$ export | grep LC
ron@www01:~$
> Either unset all enviromental variables
> starting with LC_ and set LANG variable or overide LC_CTYPE variable:
>
> LC_CTYPE=en_US.utf-8 python -c "print unichr(0xbd)"
>
> Should be working now :)


Nope:

ron@www01:~$ LC_CTYPE=en_US.utf-8 python -c "print unichr(0xbd)"
Traceback (most recent call last):
File "<string>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in
position 0: ordinal not in range(128)

rg
May 19 '06 #28
In article <11**********************@g10g2000cwb.googlegroups .com>,
"Serge Orlov" <Se*********@gmail.com> wrote:
> I've pulled myself together and installed linux in vwware player.
> Apparently there is another way linux distributors can screw up. I
> chose debian 3.1 minimal network install and after answering all
> installation questions I found that only ascii and latin-1 english
> locales were installed:
> $ locale -a
> C
> en_US
> en_US.iso88591
> POSIX
>
> In 2006, I would expect utf-8 english locale to be present even in
> minimal install. I had to edit /etc/locale.gen and run locale-gen as
> root. After that python started to print unicode characters.


That's it. Thanks!

rg
May 19 '06 #29
sk**@pobox.com wrote:
> Learn something every day. I take it "646" is an alias for "ascii" (or vice
> versa)?


Usage of "646" as an alias for ASCII is primarily a Sun invention. When
ASCII became an international standard, its standard number became
ISO/IEC 646:1968. It's not *quite* the same as ASCII, as it leaves a
certain number of code points unassigned that ASCII defines (most
notably, the dollar sign, and the square and curly braces). What Sun
means is probably the "International Reference Version" of ISO 646,
which is (now) identical to ASCII.

Regards,
Martin

May 21 '06 #30

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

4
by: Pekka Niiranen | last post by:
Hi, I have a multiuser script, that I would like to convert to Python. The users open simultaneous telnet -sessions from win2000 to an unix machine and possibly edit unicode textfiles. Currently...
11
by: Marian Aldenhvel | last post by:
Hi, I am very new to Python and have run into the following problem. If I do something like dir = os.listdir(somepath) for d in dir: print d The program fails for filenames that contain...
2
by: Fuzzyman | last post by:
How does the print statement decode unicode strings it is passed? (By that I mean which encoding does it use). Under windows it doesn't appear to use defaultencoding. On my system the default...
4
by: webdev | last post by:
lo all, some of the questions i'll ask below have most certainly been discussed already, i just hope someone's kind enough to answer them again to help me out.. so i started a python 2.3...
2
by: Neil Schemenauer | last post by:
python-dev@python.org.] The PEP has been rewritten based on a suggestion by Guido to change str() rather than adding a new built-in function. Based on my testing, I believe the idea is...
1
by: sheldon.regular | last post by:
I am new to unicode so please bear with my stupidity. I am doing the following in a Python IDE called Wing with Python 23. äöü äöü '\xc3\xa4\xc3\xb6\xc3\xbc' u'\xe4\xf6\xfc'...
3
by: 7stud | last post by:
Can anyone tell me why I can print out the individual variables in the following code, but when I print them out combined into a single string, I get an error? symbol = u'ibm' price = u'4 \xbd'...
5
by: Xah Lee | last post by:
If i have a nested list, where the atoms are unicode strings, e.g. # -*- coding: utf-8 -*- ttt=, ,...] print ttt how can i print it without getting the u'\u1234' notation? i.e. i want it...
2
by: David | last post by:
Hi list. I've never used unicode in a Python script before, but I need to now. I'm not sure where to start. I'm hoping that a kind soul can help me out here. My current (almost non-existent)...