unicode newbie - printing mixed languages to the terminal

Hi list.

I've never used unicode in a Python script before, but I need to now.
I'm not sure where to start. I'm hoping that a kind soul can help me
out here.

My current (almost non-existent) knowledge of unicode:
From the docs I know about the unicode string type, and how to declare
string types. What I don't understand yet is what encodings are and
when you'd want/need to use them. What I'd like is to just be able to
print out unicode strings in mixed languages, and they'd appear on the
terminal the same way they get shown in a web browser (when you have
appropriate fonts installed), without any fuss.

Here is an example of how I'd like my script to work:

$ ./test.py

Random hiragana: <some jp characters>
Random romaji: kakikukeko

Is this possible?
From my limited knowledge, I *think* I need to do two things:
1) In my Python script, run .encode() on my unicode variable before
printing it out (I assume I need to encode into Japanese)

Question: How does this work when you have multiple languages in a
single unicode string? Do you need to separate them into separate
strings (one per language) and print them separately?

Or is it the case that you can (unlike a web browser) *only*
display/print one language at a time? (I really want mixed language -
English AND Japanese).

2) Set up the terminal to display the output. From various online docs
it looks like I need to set the LANG environment variable to Japanese,
and then start konsole (or gnome-terminal if that will work better).
But again, it looks like this limits me to 1 language.

If what I want to do is very hard, I'll output html instead and view
it in a web browser. But I'd prefer to use the terminal instead if
possible :-)

Thanks in advance.

David.
Jun 27 '08 #1
I suggest you read http://www.amk.ca/python/howto/unicode to demystify
what Unicode is and does, and how to use it in Python.

Printing text from different languages is possible if and only if the
output device (terminal, in this case) supports a character encoding
that accommodates all the characters you wish to print. UTF-8 is a
fairly ubiquitous candidate that fits that criterion, since it
encompasses Unicode in its entirety (as opposed to latin-1, for example,
which only includes a very small subset of Unicode).
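
As a minimal sketch of that point (assuming Python 3, where str is the
Unicode string type; the helper name try_encode and the sample string are
made up for illustration), a single string mixing Japanese and English
encodes to UTF-8 as a whole, while latin-1 cannot represent it:

```python
def try_encode(text, encoding):
    """Return the encoded bytes, or None if the encoding cannot represent text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


# One string holding both scripts; the escapes spell "ka ki ku ke ko" in hiragana.
mixed = "hiragana: \u304b\u304d\u304f\u3051\u3053, romaji: kakikukeko"

utf8_bytes = try_encode(mixed, "utf-8")      # UTF-8 covers all of Unicode
latin1_bytes = try_encode(mixed, "latin-1")  # latin-1 has no hiragana

print("utf-8 ok:", utf8_bytes is not None)      # utf-8 ok: True
print("latin-1 ok:", latin1_bytes is not None)  # latin-1 ok: False
```

There is no need to split the string by language; the encoding either
covers every character in it or it does not.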

HTH,

--
Carsten Haese
http://informixdb.sourceforge.net
Jun 27 '08 #2
> I suggest you read http://www.amk.ca/python/howto/unicode to demystify what
> Unicode is and does, and how to use it in Python.

That document really helped.

This page helped me to set up the console: http://www.jw-stumpel.nl/stestu.html#T3

I ran:

dpkg-reconfigure locales # and enabled the en_ZA.utf8 locale
update-locale LANG=en_ZA.utf8

(And then rebooted, but I don't know if that was necessary).

I can now print mixed language unicode to the console from Python.
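
For anyone following along, here is a rough sketch of the kind of script
this allows. It assumes Python 3 and a UTF-8 locale like the one above; the
helper names output_encoding and can_print are hypothetical, not from any
library, and the check is only a best guess at what the terminal accepts:

```python
import locale
import sys


def output_encoding():
    """Best guess at the encoding print() will use for the terminal."""
    return sys.stdout.encoding or locale.getpreferredencoding(False)


def can_print(text, encoding):
    """True if every character in text can be encoded with the given encoding."""
    try:
        text.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        return False
    return True


# Japanese and English in one string; the escapes are hiragana ka ki ku ke ko.
line = "Random hiragana: \u304b\u304d\u304f\u3051\u3053  Random romaji: kakikukeko"

enc = output_encoding()
if can_print(line, enc):
    print(line)
else:
    # Fall back to an ASCII-only message so the script never crashes on print.
    print("terminal encoding %s cannot show this text" % enc)
```

With LANG set to a UTF-8 locale the first branch should run; under a
latin-1 or ASCII locale the fallback message appears instead.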

Thanks for your help.

David.
Jun 27 '08 #3
