
printing list containing unicode string

If i have a nested list, where the atoms are unicode strings, e.g.

# -*- coding: utf-8 -*-
ttt=[[u"→",u"↑"], [u"αβγ"],...]
print ttt

how can i print it without getting the u'\u1234' notation?
i.e. i want it to print just like this: [[u"→"], ...]

I can of course write a loop and call encode("utf-8") on each
string, but is there an easier way?

Thx.

Xah
xa*@xahlee.org
http://xahlee.org/

Sep 10 '07 #1
On Mon, 2007-09-10 at 06:59 -0700, Xah Lee wrote:
If i have a nested list, where the atoms are unicode strings, e.g.

# -*- coding: utf-8 -*-
ttt=[[u"→",u"↑"], [u"αβγ"],...]
print ttt

how can i print it without getting the u'\u1234' notation?
i.e. i want it to print just like this: [[u"→"], ...]

I can of course write a loop and call encode("utf-8") on each
string, but is there an easier way?

It's not quite clear why you want to do this, but this is how you could
do it:

print repr(ttt).decode("unicode_escape").encode("utf-8")

However, I am getting the impression that this is a "How can I use 'X'
to achieve 'Y'?" question instead of the preferable "How can I achieve
'Y'?" type of question. In other words, printing the repr() of a list
might not be the best solution to reach the actual goal, which you have
not stated.

HTH,

--
Carsten Haese
http://informixdb.sourceforge.net
Sep 10 '07 #2
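
For readers on Python 3, here is a minimal sketch of the same round trip. It assumes Python 3, whereas the thread itself uses Python 2. In Python 3, repr() already keeps non-ASCII characters, and ascii() produces the escaped form that Python 2's repr() gave (minus the u prefix). The names escaped and readable are illustrative only.

```python
# Python 3 sketch (assumption: the thread's original code targets Python 2).
ttt = [["→", "↑"], ["αβγ"]]

# ascii() escapes non-ASCII characters, much like Python 2's repr() did.
escaped = ascii(ttt)

# Decoding the escapes mirrors repr(ttt).decode("unicode_escape") from the thread.
# Caveat: this also expands escapes such as \n or \\ that appear inside the strings.
readable = escaped.encode("ascii").decode("unicode_escape")

print(escaped)   # escaped form
print(readable)  # characters restored
print(ttt)       # Python 3 prints the characters directly
```

Because the unicode_escape step rewrites every backslash escape, it is probably only safe for eyeballing output, not for producing text that will be parsed again.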
On Sep 10, 8:12 am, Carsten Haese <cars...@uniqsys.com> wrote:
Xah Lee wrote:

If i have a nested list, where the atoms are unicode strings, e.g.

# -*- coding: utf-8 -*-
ttt=[[u"→",u"↑"], [u"αβγ"],...]
print ttt

how can i print it without getting the u'\u1234' notation?
i.e. i want it to print just like this: [[u"→"], ...]


Carsten Haese wrote:

It's not quite clear why you want to do this, but this is how you
could do it:

print repr(ttt).decode("unicode_escape").encode("utf-8")


Super! Thanks a lot.

About why i want to... i think it's just simpler and easier on the
eye?

here's an example output from my program:
[[u' ', 1022], [u'', 472], [u' ', 128], [u'w', 300], [u's', 12],
[u'|', 184],...]

wouldn't it be preferable if Python printed like this by default?

Xah
xa*@xahlee.org
http://xahlee.org/

Sep 10 '07 #3
Google groups seems to be stripping my quotation marks lately.
Here's a retry to post my previous message.

--------------------------------------------------------------

Xah Lee wrote:

If i have a nested list, where the atoms are unicode strings, e.g.
# -*- coding: utf-8 -*-
ttt=[[u"→",u"↑"], [u"αβγ"],...]
print ttt

how can i print it without getting the u'\u1234' notation?
i.e. i want it to print just like this: [[u"→"], ...]


Carsten Haese wrote:

It's not quite clear why you want to do this, but this is how you
could do it:

print repr(ttt).decode("unicode_escape").encode("utf-8")


Super! Thanks a lot.

About why i want to... i think it's just simpler and easier on the
eye?

here's an example output from my program:
[[u' ', 1022], [u'', 472], [u' ', 128], [u'w', 300], [u's', 12],
[u'|', 184],...]

wouldn't it be preferable if Python printed like this by default?

Xah
x...@xahlee.org
http://xahlee.org/

Sep 10 '07 #4
Xah Lee wrote:
This post is about some notes and corrections to an online article
regarding Unicode and Python.

--------------

by happenstance i was reading:

Unicode HOWTO
http://www.amk.ca/python/howto/unicode

Here are some problems i see:

No conspicuous authorship. (however, oddly, it has a conspicuous
listing of acknowledged names.) (This problem is an indirect
consequence of the communism fanaticism ushered in by the OpenSource
movement) (Originally i was just going to write to the author with some
corrections.)

It's very wasteful of space. In most texts, the majority of the
code points are less than 127, or less than 255, so a lot of space is
occupied by zero bytes.

Not true. In Asia, most chars have Unicode code points above 255.
Considered globally, *possibly* today there are more computer files in
Chinese than in all Latin-alphabet based languages.

That's an interesting point. I'd be interested to see numbers on
that, and how those numbers have changed over the past five years.
Sadly, such data is most likely impossible to obtain.

However, it should be pointed out that most *code*, whether written in
the United States, New Zealand, India, China, or Botswana, is written
in English. This is partly because English has become a standard of
sorts, much as Italian was a standard for musical notation, partly
because of the US's former (and perhaps current, but certainly fading)
dominance in the field, and partly because of the lack of solid support
for Unicode among many programming languages and compilers. Thus the
author's bias, while inaccurate, is still understandable.
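
The space trade-off argued over above can be checked directly. This is a minimal Python 3 sketch (an assumption, since the thread uses Python 2); the sample strings and the helper name encoded_sizes are made up for illustration and say nothing about real-world corpora.

```python
# Compare encoded sizes of mostly-ASCII text and Chinese text (Python 3).
# The sample strings are arbitrary illustrations, not measurements.
samples = {"english": "hello world", "chinese": "你好世界"}

def encoded_sizes(text):
    """Return the byte length of text in three Unicode encodings (no BOM)."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(text.encode(enc)) for enc in encodings}

sizes = {name: encoded_sizes(text) for name, text in samples.items()}
for name, result in sizes.items():
    print(name, result)
```

For the ASCII sample the wider encodings are mostly zero bytes, while for the Chinese sample UTF-16 comes out smaller than UTF-8, which is roughly the point both sides are making.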

Many Internet standards are defined in terms of textual data, and
can't handle content with embedded zero bytes.

Not sure what he means by "can't handle content with embedded zero
bytes". Overall i think this sentence is silly, and he's probably
thinking of unix/linux.

Encodings don't have to handle every possible Unicode
character, ....

This is inane. An encoding, by definition, turns numbers into binary
numbers (in our context, it means an encoding handles all Unicode chars
by definition). What he really meant to say is something like this:
"Practically speaking, most computer languages in western society
don't need to support Unicode with respect to the language's source
file"


UTF-8 has several convenient properties:
1. It can handle any Unicode code point.
...
As mentioned before, by definition, any Unicode encoding encodes the
whole Unicode char set. Mentioning the above as a "convenient property"
is inane.

No, it's not inane. UCS-2, for example, is a fixed-width, 2-byte
encoding that can handle any Unicode code point up to 0xFFFF, but
cannot represent the code points above that. UCS-2 was developed
for applications in which having fixed-width characters is essential,
but it has the limitation of not being able to handle every Unicode
code point. IIRC, when it was developed, it did handle every code point,
and then Unicode grew. There is also UCS-4, a fixed-width 4-byte
encoding, which removes this limitation. UTF-16 is based on a two-byte
unit, but is variable width, like UTF-8, which makes it flexible enough
to handle any code point, but harder to process, and a bear to seek
through to a certain point.
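
The fixed-width versus variable-width distinction can be seen by looking at the 16-bit units UTF-16 emits. This is a Python 3 sketch (an assumption, the thread uses Python 2); utf16_units is a hypothetical helper written for this example, and Python has no built-in "ucs-2" codec, so UTF-16 stands in for both.

```python
# Python 3 sketch: a single 2-byte unit covers only U+0000..U+FFFF.
bmp_char = "→"              # U+2192, inside the Basic Multilingual Plane
astral_char = "\U0001F600"  # U+1F600, outside it

def utf16_units(ch):
    """Return the list of 16-bit code units UTF-16 uses for the string ch."""
    raw = ch.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]

print([hex(u) for u in utf16_units(bmp_char)])     # one unit
print([hex(u) for u in utf16_units(astral_char)])  # a surrogate pair
print(len(bmp_char.encode("utf-8")), len(astral_char.encode("utf-8")))
```

A code point above 0xFFFF needs two units (a surrogate pair), which is exactly what a strict fixed-width 2-byte scheme cannot express.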

(I'm politely ignoring your ill-reasoned attacks on non-Microsoft OSes).

Cheers,
Cliff
Sep 11 '07 #5
On Mon, 10 Sep 2007 19:26:20 -0700, Xah Lee wrote:
・ Many Internet standards are defined in terms of textual data, and
can't handle content with embedded zero bytes.

Not sure what he means by "can't handle content with embedded zero
bytes". Overall i think this sentence is silly, and he's probably
thinking of unix/linux.

No, he's probably thinking of all the text-based protocols (HTTP, SMTP, …)
and of the fact that one of the most used programming languages, C,
can't cope with embedded null bytes in strings.
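
A small sketch of the null-byte problem, assuming Python 3 and using ctypes only as a convenient stand-in for C string handling (the variable names are illustrative):

```python
import ctypes

text = "hi"
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")  # every ASCII char gains a zero byte

# .value reads the buffer the way C string functions do:
# everything up to, but not including, the first NUL byte.
seen_by_c_utf8 = ctypes.create_string_buffer(utf8_bytes).value
seen_by_c_utf16 = ctypes.create_string_buffer(utf16_bytes).value

print(utf16_bytes)
print(seen_by_c_utf8, seen_by_c_utf16)
```

The UTF-8 bytes survive intact, while the UTF-16 bytes are cut off after the first character, which is why NUL-terminated string code cannot carry such data unchanged.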

・ Encodings don't have to handle every possible Unicode
character, ....

This is inane. An encoding, by definition, turns numbers into binary
numbers (in our context, it means an encoding handles all Unicode chars
by definition).

How do you encode Chinese characters with the ISO-8859-1 encoding? This
encoding obviously doesn't handle *all* Unicode characters.
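
The ISO-8859-1 point is easy to try out. This is a minimal Python 3 sketch (an assumption, the thread uses Python 2); can_encode is a hypothetical helper, not a standard library function.

```python
def can_encode(text, encoding):
    """Return True if every character in text is representable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

print(can_encode("café", "iso-8859-1"))  # Latin-1 covers é
print(can_encode("中文", "iso-8859-1"))   # but not Chinese characters
print(can_encode("中文", "utf-8"))        # UTF-8 covers any code point
```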

UTF-8 has several convenient properties:
1. It can handle any Unicode code point.
...
As mentioned before, by definition, any Unicode encoding encodes the
whole Unicode char set. Mentioning the above as a "convenient property"
is inane.

You are being silly here.

Ciao,
Marc 'BlackJack' Rintsch
Sep 11 '07 #6
