How to print a unicode string?

I'd like to print out a unicode string.

I'm running Python inside Emacs, which understands utf-8, so I want to
force Python to send utf-8 to sys.stdout.

From what I've googled, I think I need to set my locale. I don't
understand how.

import locale
print locale.getlocale()
--(None,None)
print locale.getdefaultlocale()
--('en_GB','cp1252')
print locale.normalize('en_GB.utf-8')
--en_GB.UTF8
locale.setlocale(locale.LC_ALL,'en_GB.UTF8')
-- locale.Error: unsupported locale setting

I'd be grateful for advice.
Damon.
Jun 27 '08 #1
da**********@gmail.com writes:
From what I've googled, I think I need to set my locale. I don't
understand how.

import locale
print locale.getlocale()
--(None,None)
print locale.getdefaultlocale()
--('en_GB','cp1252')
print locale.normalize('en_GB.utf-8')
--en_GB.UTF8
locale.setlocale(locale.LC_ALL,'en_GB.UTF8')
-- locale.Error: unsupported locale setting

I'd be grateful for advice.
Just because the locale library knows the normalised name for it
doesn't mean it's available on your OS. Have you confirmed that your
OS (independent of Python) supports the locale you're trying to set?

--
\ "Oh, I love your magazine. My favorite section is 'How To |
`\ Increase Your Word Power'. That thing is really, really, |
_o__) really... good." -- Homer, _The Simpsons_ |
Ben Finney
Jun 27 '08 #2
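One way to check from Python which locale names the underlying C library will accept is to try each one and catch locale.Error. This is only a minimal sketch, written for Python 3. The helper name usable_locales and the candidate names are hypothetical, and the result depends entirely on the OS.

```python
import locale


def usable_locales(candidates, category=locale.LC_ALL):
    """Return the candidate locale names that setlocale() accepts here."""
    # Calling setlocale() without a name only queries the current setting.
    saved = locale.setlocale(category)
    accepted = []
    try:
        for name in candidates:
            try:
                locale.setlocale(category, name)
                accepted.append(name)
            except locale.Error:
                # The C library has no such locale installed.
                pass
    finally:
        # Put the process back the way we found it.
        locale.setlocale(category, saved)
    return accepted


# Hypothetical candidates; "C" should be accepted on any platform.
print(usable_locales(["C", "en_GB.UTF8", "en_GB.utf-8"]))
```

If the name you want is not in the returned list, setlocale will fail for it no matter how locale.normalize spells it.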
On Apr 19, 12:51 am, Ben Finney wrote:
Just because the locale library knows the normalised name for it
doesn't mean it's available on your OS. Have you confirmed that your
OS (independent of Python) supports the locale you're trying to set?
No. How do I find out which locales my OS supports? (I'm running
Windows XP.) Why does it matter what locales my OS supports, when all
I want is to set the encoding to be used for the output, and the
output is all going to Emacs, and I know that Emacs supports utf8?
(Emacs 22.2.1 i386-mingw-nt5.1.2600.)

Damon.
Jun 27 '08 #3
From what I've googled, I think I need to set my locale.

Not on this operating system. On Windows, you need to change
your console. If it is a cmd.exe-style console, use chcp.
For IDLE, changing the output encoding is not supported.

If you want to output into a file, use codecs.open.

If you absolutely want to output UTF-8 to the terminal even
though the terminal will not be able to render it correctly,
use

sys.stdout = codecs.getwriter("UTF-8")(sys.stdout)

HTH,
Martin
Jun 27 '08 #4
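To see what the codecs.getwriter wrapper does, here is a small sketch for Python 3 that wraps an in-memory byte stream instead of the real sys.stdout, so the encoded bytes can be inspected. The names raw and utf8_out are made up for the illustration. On Python 3 the stream to wrap would be sys.stdout.buffer rather than sys.stdout itself.

```python
import codecs
import io

# Stand-in for a binary output stream (sys.stdout.buffer on Python 3).
raw = io.BytesIO()

# getwriter() returns a StreamWriter class; calling it wraps the stream.
utf8_out = codecs.getwriter("utf-8")(raw)
utf8_out.write("La Pe\xf1a\n")

# The n-tilde (U+00F1) has been written as the two bytes C3 B1.
encoded = raw.getvalue()
print(repr(encoded))
```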
On Apr 18, 5:38 pm, damonwisc...@gmail.com wrote:
I'd like to print out a unicode string.

I'm running Python inside Emacs, which understands utf-8, so I want to
force Python to send utf-8 to sys.stdout.

From what I've googled, I think I need to set my locale. I don't
understand how.

import locale
print locale.getlocale()
--(None,None)
print locale.getdefaultlocale()
--('en_GB','cp1252')
print locale.normalize('en_GB.utf-8')
--en_GB.UTF8
locale.setlocale(locale.LC_ALL,'en_GB.UTF8')
-- locale.Error: unsupported locale setting

I'd be grateful for advice.
Damon.
u_str = u'hell\u00F6 w\u00F6rld' #o's with umlauts

print u_str.encode('utf-8')

--output:--
hellö wörld
Jun 27 '08 #5
On Apr 18, 6:36 pm, 7stud <bbxx789_0...@yahoo.com> wrote:
On Apr 18, 5:38 pm, damonwisc...@gmail.com wrote:
I'd like to print out a unicode string.
I'm running Python inside Emacs, which understands utf-8, so I want to
force Python to send utf-8 to sys.stdout.
From what I've googled, I think I need to set my locale. I don't
understand how.
import locale
print locale.getlocale()
--(None,None)
print locale.getdefaultlocale()
--('en_GB','cp1252')
print locale.normalize('en_GB.utf-8')
--en_GB.UTF8
locale.setlocale(locale.LC_ALL,'en_GB.UTF8')
-- locale.Error: unsupported locale setting
I'd be grateful for advice.
Damon.

u_str = u'hell\u00F6 w\u00F6rld' #o's with umlauts

print u_str.encode('utf-8')

--output:--
hellö wörld
Or maybe you want this:

u_str = u'hell\u00F6 w\u00F6rld'
regular_str = u_str.encode('utf-8')
print repr(regular_str)

--output:--
'hell\_x_c3\_x_b6 w\_x_c3\_x_b6rld'
#underscores added to keep your browser from rendering the utf-8
characters
Jun 27 '08 #6
On Apr 19, 1:36 am, 7stud <bbxx789_0...@yahoo.com> wrote:
u_str = u'hell\u00F6 w\u00F6rld' #o's with umlauts
print u_str.encode('utf-8')

--output:--
hellö wörld
Maybe on your system. On my system, those same commands produce
hell\303\266 w\303\266rld

Those \303\266 symbols are single characters -- when I move around
with cursor keys, the cursor jumps across them with a single
key-press.

As I wrote, I'm running Python inside Emacs 22.2.1 (using
python-mode).

Damon.
Jun 27 '08 #7
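The \303\266 shown there are octal escapes for two bytes, and those two bytes are exactly the UTF-8 encoding of ö. That suggests Python sent the right bytes and the receiving buffer simply did not decode them as UTF-8. A quick check in Python 3 syntax (the variable names are just for illustration):

```python
# The two bytes Emacs displayed, written with the same octal escapes.
octal_display = b"\303\266"

# The UTF-8 encoding of o-umlaut (U+00F6).
utf8_bytes = "\u00f6".encode("utf-8")
same = octal_display == utf8_bytes

# Decoded as UTF-8 the pair is one character; decoded as cp1252
# (the encoding reported by the locale module earlier) it is two.
as_utf8 = octal_display.decode("utf-8")
as_cp1252 = octal_display.decode("cp1252")

print(same, len(as_utf8), len(as_cp1252))
```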
da**********@gmail.com writes:
On Apr 19, 12:51 am, Ben Finney wrote:
Just because the locale library knows the normalised name for it
doesn't mean it's available on your OS. Have you confirmed that
your OS (independent of Python) supports the locale you're trying
to set?

No. How do I find out which locales my OS supports? (I'm running
Windows XP.)
Can't help you there.
Why does it matter what locales my OS supports, when all I want is
to set the encoding to be used for the output
Because the Python 'locale' module is all about using the OS's
(actually, the underlying C library's) locale support.

The locale you request with 'locale.setlocale' needs to be supported
by the locale database, which is independent of any specific
application, be it Python, Emacs, or otherwise.

--
\ "Two rules to success in life: 1. Don't tell people everything |
`\ you know." -- Sassan Tat |
_o__) |
Ben Finney
Jun 27 '08 #8
On Apr 19, 1:14 am, "Martin v. Löwis" <mar...@v.loewis.de> wrote:
From what I've googled, I think I need to set my locale.

Not on this operating system. On Windows, you need to change
your console. If it is a cmd.exe-style console, use chcp.
For IDLE, changing the output encoding is not supported.

If you want to output into a file, use codecs.open.

If you absolutely want to output UTF-8 to the terminal even
though the terminal will not be able to render it correctly,
use

sys.stdout = codecs.getwriter("UTF-8")(sys.stdout)
Thank you for the suggestion. As I said, I am running Python through
Emacs 22.2.1, so I doubt it is a cmd.exe-style console, and it most
certainly is not IDLE. I want to output to the Emacs buffer, via the
python-mode plugin for Emacs, not to a file.

I tried your suggestion of setting sys.stdout, and it works perfectly.
As I said, the output is going to Emacs, and Emacs _does_ know how to
render UTF-8.

How can I make this a global setting? Is it possible to change an
environment variable, so that Python uses this coding automatically?
Or pass a command-line argument when Emacs python-mode invokes the
Python interpreter? Or execute this line of Python in a startup script
which is invoked whenever a new Python session is started?

Thank you again for your help,
Damon.
Jun 27 '08 #9
On Apr 19, 1:53 am, Ben Finney wrote:
Damon Wischik writes:
>Why does it matter what locales my OS supports, when all I want is
to set the encoding to be used for the output

Because the Python 'locale' module is all about using the OS's
(actually, the underlying C library's) locale support.

The locale you request with 'locale.setlocale' needs to be supported
by the locale database, which is independent of any specific
application, be it Python, Emacs, or otherwise.
Let me try to ask a better question. It seems that the logical choice
of locale (en_GB.utf8) is not supported by my operating system.
Nonetheless, I want Python to output in utf-8, because I know for
certain that the terminal I am using (Emacs 22.2.1 with python-mode)
will display utf-8 correctly. It therefore seems that I cannot use the
locale mechanism to indicate to Python the encoding I want for
sys.stdout. What other mechanisms are there for me to indicate what I
want to Python?

Another poster pointed me to
>sys.stdout = codecs.getwriter("UTF-8")(sys.stdout)
and this works great. All I want now is some reassurance that this is
the most appropriate way for me to achieve what I want (e.g. least
likely to break with future versions of Python, most in keeping with
the design of Python, easiest for me to maintain, etc.).

Damon.
Jun 27 '08 #10
On Apr 19, 12:38 am, Damon Wischik wrote:
I'd like to print out a unicode string.

I'm running Python inside Emacs, which understands utf-8, so I want to
force Python to send utf-8 to sys.stdout.
Thank you everyone who has sent suggestions. Here is my solution (for
making Python output utf-8, and persuading Emacs 22.2.1 with
python-mode to print it).

1. Set the registry key HKEY_CURRENT_USER\Software\GNU\Emacs\Home to
have value "d:\documents\home". This makes Emacs look for a .emacs
file in this directory (the home directory).

2. Put a file called .emacs in the home directory. It should
include these lines:
(setenv "PYTHONPATH" "d:/documents/home")
(prefer-coding-system 'utf-8)
The first line means that python will look in my home directory for
libraries etc. The second line tells Emacs to default to utf-8 for its
buffers. Without the second line, Emacs may default to a different
coding, and it will not know what to do when it receives utf-8.

3. Put a file called sitecustomize.py in the home directory. This file
should contain these lines:
import codecs
import sys
sys.stdout = codecs.getwriter("UTF-8")(sys.stdout)

4. Now it should all work. If I enter
print u'La Pe\xf1a'
then it comes out with an n-tilde.

NB. An alternative solution is to edit site.py in the Python install
directory, and replace the line
encoding = "ascii" # Default value set by _PyUnicode_Init()
with
encoding = 'utf8'
But the trouble with this is that it will be overwritten if I install
a new version of Python.
NB. I also have these lines in my .emacs file, to load python-mode,
and to make it so that ctrl+enter executes the current paragraph:
; Python file association
(load "c:/program files/emacs-plugins/python-mode-1.0/python-mode.el")
(setq auto-mode-alist
      (cons '("\\.py$" . python-mode) auto-mode-alist))
(setq interpreter-mode-alist
      (cons '("python" . python-mode)
            interpreter-mode-alist))
(autoload 'python-mode "python-mode" "Python editing mode." t)
; Note: the command for invoking Python is specified at the end,
; as a custom variable.
;; DJW's command to select the current paragraph, then execute-region.
(defun py-execute-paragraph (vis)
  "Send the current paragraph to Python
Don't know what vis does."
  (interactive "P")
  (save-excursion
    (forward-paragraph)
    (let ((end (point)))
      (backward-paragraph)
      (py-execute-region (point) end))))
(setq py-shell-switch-buffers-on-execute nil)
(global-set-key [(ctrl return)] 'py-execute-paragraph)

(custom-set-variables
 ;; custom-set-variables was added by Custom -- don't edit or cut/paste it!
 ;; Your init file should contain only one such instance.
 '(py-python-command "c:/program files/Python25/python.exe"))
Damon.
Jun 27 '08 #11
On Apr 18, 7:14 pm, "Martin v. Löwis" <mar...@v.loewis.de> wrote:
From what I've googled, I think I need to set my locale.

Not on this operating system. On Windows, you need to change
your console. If it is a cmd.exe-style console, use chcp.
For IDLE, changing the output encoding is not supported.

If you want to output into a file, use codecs.open.

If you absolutely want to output UTF-8 to the terminal even
though the terminal will not be able to render it correctly,
use

sys.stdout = codecs.getwriter("UTF-8")(sys.stdout)
And in Py3k?
>
HTH,
Martin
Jun 27 '08 #12
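On the Py3k question: in Python 3, sys.stdout is already a text stream, so the usual approach is to change its encoding rather than wrap it in a codecs writer. A minimal sketch follows, assuming Python 3.7 or later for reconfigure(). The helper name force_utf8 is hypothetical, and the demonstration uses an in-memory stream so the result can be inspected.

```python
import io


def force_utf8(stream):
    """Return a text stream that encodes its output as UTF-8."""
    if hasattr(stream, "reconfigure"):
        # io.TextIOWrapper.reconfigure() exists from Python 3.7 on.
        stream.reconfigure(encoding="utf-8")
        return stream
    # Older Python 3: re-wrap the underlying binary buffer instead.
    return io.TextIOWrapper(stream.buffer, encoding="utf-8")


# Demonstration on a stream that starts out as ASCII-only.
raw = io.BytesIO()
text_out = force_utf8(io.TextIOWrapper(raw, encoding="ascii"))
text_out.write("La Pe\xf1a")
text_out.flush()
print(repr(raw.getvalue()))
```

In a real session the call would be sys.stdout = force_utf8(sys.stdout), for example from sitecustomize.py.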
Is it possible to change an
environment variable, so that Python uses this coding automatically?
No.
Or pass a command-line argument when Emacs python-mode invokes the
Python interpreter?
No.
Or execute this line of Python in a startup script
which is invoked whenever a new Python session is started?
Yes, you can add the code I suggested to sitecustomize.py.

Regards,
Martin
Jun 27 '08 #13
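Those answers describe the Python of this thread (2.5). Later releases (Python 2.6 and Python 3) read a PYTHONIOENCODING environment variable at startup, which covers the environment-variable case. The sketch below, for Python 3, launches a child interpreter with that variable set and captures the bytes it prints. The one-line child program is just an illustration.

```python
import os
import subprocess
import sys

# Copy the current environment and ask the child for UTF-8 on stdio.
env = dict(os.environ, PYTHONIOENCODING="utf-8")

# The child prints a string containing n-tilde (U+00F1).
result = subprocess.run(
    [sys.executable, "-c", "print('La Pe\\xf1a')"],
    env=env,
    stdout=subprocess.PIPE,
    check=True,
)

# Strip the trailing newline, which differs between platforms.
child_bytes = result.stdout.strip()
print(repr(child_bytes))
```

From Emacs, the same variable could be set with setenv in the .emacs file, in the same way PYTHONPATH is set earlier in the thread.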
On 2008-04-19 03:09, da**********@gmail.com wrote:
Another poster pointed me to
>>sys.stdout = codecs.getwriter("UTF-8")(sys.stdout)
and this works great. All I want now is some reassurance that this is
the most appropriate way for me to achieve what I want (e.g. least
likely to break with future versions of Python, most in keeping with
the design of Python, easiest for me to maintain, etc.).
While the above works nicely for Unicode objects you write
to sys.stdout, you are going to have problems with non-ASCII
8-bit strings, e.g. binary data.

Python will have to convert these to Unicode before applying
the UTF-8 codec and uses the default encoding for this, which
is ASCII.

You could wrap sys.stdout using a codecs.EncodedFile() which provides
transparent recoding, but then you have problems with Unicode objects,
since the recoder assumes that it has to work with strings on input
(to e.g. the .write() method).

There's no ideal solution - it really depends a lot on what
your application does and how it uses strings and Unicode.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Apr 19 2008)
>>Python/Zope Consulting and Support ... http://www.egenix.com/
mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
__________________________________________________ ______________________

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::
eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
Jun 27 '08 #14
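The failure described above comes from the implicit ASCII decoding step that Python 2 applies to 8-bit strings before the UTF-8 codec sees them. Python 3 no longer does that conversion implicitly, but the same step can be spelled out explicitly to show why it fails. A small sketch with a made-up byte string:

```python
# An 8-bit string holding a non-ASCII byte (n-tilde in Latin-1).
data = b"La Pe\xf1a"

error = None
try:
    # Python 2 did this conversion implicitly, using the default
    # encoding (ASCII), when such a string reached the UTF-8 writer.
    data.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc

# The exception records the offset of the first undecodable byte.
print(type(error).__name__, error.start)
```

Decoding with the encoding the bytes are actually in (data.decode("latin-1") here) before writing avoids the problem, which is why the right answer depends on where the strings come from.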
