printing list containing unicode string

If i have a nested list, where the atoms are unicode strings, e.g.

# -*- coding: utf-8 -*-
ttt=[[u"→",u"↑"], [u"αβγ"],...]
print ttt

how can i print it without getting the u'\u1234' notation?
i.e. i want it to print just like this: [[u"→"], ...]

I can of course write a loop and then for each string use
encode("utf-8"), but is there an easier way?

Thx.

Xah
xa*@xahlee.org
http://xahlee.org/

Sep 10 '07 #1
On Mon, 2007-09-10 at 06:59 -0700, Xah Lee wrote:
If i have a nested list, where the atoms are unicode strings, e.g.

# -*- coding: utf-8 -*-
ttt=[[u"→",u"↑"], [u"αβγ"],...]
print ttt

how can i print it without getting the u'\u1234' notation?
i.e. i want it to print just like this: [[u"→"], ...]

I can of course write a loop and then for each string use
encode("utf-8"), but is there an easier way?

It's not quite clear why you want to do this, but this is how you could
do it:

print repr(ttt).decode("unicode_escape").encode("utf-8")

However, I am getting the impression that this is a "How can I use 'X'
to achieve 'Y'?" question instead of the preferable "How can I achieve
'Y'?" type of question. In other words, printing the repr() of a list
might not be the best solution to reach the actual goal, which you have
not stated.
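For readers on Python 3, where str has no decode method, the two stages of the one-liner above can be sketched roughly as follows. This is only an illustration, not the code from the post: ascii() stands in for Python 2's repr(), and the sample list is the one from the question with the elided part left out.

```python
# Python 3 sketch (the one-liner in the post is Python 2).
ttt = [["→", "↑"], ["αβγ"]]

# Stage 1: ASCII-only text with \uXXXX escapes, comparable to what
# Python 2's repr() gave for a list of unicode strings.
escaped = ascii(ttt)

# Stage 2: turn the escape sequences back into the characters they name.
# This mirrors the .decode("unicode_escape") step of the one-liner.
readable = escaped.encode("ascii").decode("unicode_escape")
```

Printing escaped shows the \u2192 style notation, while printing readable shows the arrows and Greek letters themselves. In Python 3 this round trip is rarely needed, since repr() of a str already leaves printable non-ASCII characters alone.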

HTH,

--
Carsten Haese
http://informixdb.sourceforge.net
Sep 10 '07 #2
On Sep 10, 8:12 am, Carsten Haese <cars...@uniqsys.com> wrote:
Xah Lee wrote:

If i have a nested list, where the atoms are unicode strings, e.g.

# -*- coding: utf-8 -*-
ttt=[[u"→",u"↑"], [u"αβγ"],...]
print ttt

how can i print it without getting the u'\u1234' notation?
i.e. i want it to print just like this: [[u"→"], ...]


Carsten Haese wrote:

It's not quite clear why you want to do this, but this is how you could
do it:

print repr(ttt).decode("unicode_escape").encode("utf-8")


Super! Thanks a lot.

About why i want to... i think it's just simpler and easier on the
eye?

Here's an example of output from my program:
[[u' ', 1022], [u'', 472], [u' ', 128], [u'w', 300], [u's', 12],
[u'|', 184],...]

Wouldn't it be preferable if Python printed like this by default?
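The loop-based approach mentioned in the original question can also be written as a small recursive formatter instead of post-processing repr(). A minimal sketch in Python 3 syntax follows; the function name show and the sample data are made up for illustration and do not come from the thread.

```python
def show(obj):
    """Format nested lists, leaving string contents unescaped."""
    if isinstance(obj, list):
        # Recurse into sublists and join the pieces the way repr() does.
        return "[" + ", ".join(show(item) for item in obj) + "]"
    if isinstance(obj, str):
        # Quote the text as-is; unlike repr(), embedded quotes or
        # backslashes are not escaped here.
        return "'" + obj + "'"
    # Numbers and everything else fall back to repr().
    return repr(obj)


sample = [["→", 1022], ["αβγ", 472], ["w", 300]]
formatted = show(sample)
```

Here formatted is the text [['→', 1022], ['αβγ', 472], ['w', 300]]. Under Python 2 the string check would presumably be against unicode rather than str, and the result may still need an explicit encode before it is written to a byte-oriented terminal.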

Xah
xa*@xahlee.org
http://xahlee.org/

Sep 10 '07 #3
Xah Lee wrote:
This post is about some notes and corrections to an online article
regarding Unicode and Python.

--------------

by happenstance i was reading:

Unicode HOWTO
http://www.amk.ca/python/howto/unicode

Here are some problems i see:

No conspicuous authorship. (however, oddly, it has a conspicuous
acknowledgement of names listing.) (This problem is an indirect
consequence of the communism fanaticism ushered in by the OpenSource
movement) (Originally i was just going to write to the author on some
corrections.)

It's very wasteful of space. In most texts, the majority of the
code points are less than 127, or less than 255, so a lot of space is
occupied by zero bytes.

Not true. In Asia, most chars have a unicode number above 255. Considered
globally, *possibly* today there are more computer files in Chinese
than in all Latin-alphabet based languages.

That's an interesting point. I'd be interested to see numbers on
that, and how those numbers have changed over the past five years.
Sadly, such data is most likely impossible to obtain.

However, it should be pointed out that most *code*, whether written in
the United States, New Zealand, India, China, or Botswana, is written
in English. This is partly because English has become a standard of
sorts, much as Italian was a standard for musical notation, partly
because of the US's former (and perhaps current, but certainly fading)
dominance in the field, and partly because of the lack of solid support
for Unicode among many programming languages and compilers. Thus the
author's bias, while inaccurate, is still understandable.

Many Internet standards are defined in terms of textual data, and
can't handle content with embedded zero bytes.

Not sure what he means by "can't handle content with embedded zero
bytes". Overall i think this sentence is silly, and he's probably
thinking in unix/linux.

Encodings don't have to handle every possible Unicode
character, ....

This is inane. An encoding, by definition, turns numbers into binary
numbers (in our context, it means an encoding handles all unicode chars
by definition). What he really meant to say is something like this:
"Practically speaking, most computer languages in western society
don't need to support unicode with respect to the language's source
file"


UTF-8 has several convenient properties:
1. It can handle any Unicode code point.
...

As mentioned before, by definition, any Unicode encoding encodes the whole
Unicode char set. Mentioning the above as a "convenient property"
is inane.

No, it's not inane. UCS-2, for example, is a fixed-width, 2-byte
encoding that can handle any Unicode code point up to 0xFFFF, but
cannot handle the code points above that range. UCS-2 was developed
for applications in which fixed-width characters are essential,
but it has the limitation of not being able to handle every Unicode
code point. IIRC, when it was developed, it did handle every code point,
and then Unicode grew. There is also UCS-4, which removes this
limitation. UTF-16 is based on a two-byte unit but is variable
width, like UTF-8, which makes it flexible enough to handle any code
point, but harder to process, and a bear to seek through to a certain
point.
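The width differences described above can be checked with Python's built-in codecs. The following is a rough sketch in Python 3 syntax; the sample characters and the helper name byte_lengths are chosen for illustration only, and UTF-32 stands in for UCS-4 here.

```python
# Sample characters keyed by a plain ASCII label for their code point.
samples = {
    "U+0041": "A",
    "U+00E9": "é",
    "U+4E2D": "中",
    "U+1D11E": "\U0001D11E",  # a code point above 0xFFFF
}


def byte_lengths(ch):
    # The -le variants are used so that no byte order mark is counted.
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return tuple(len(ch.encode(enc)) for enc in encodings)


sizes = {label: byte_lengths(ch) for label, ch in samples.items()}
for label, (u8, u16, u32) in sizes.items():
    print(label, "utf-8:", u8, "utf-16:", u16, "utf-32:", u32)
```

The character above 0xFFFF takes four bytes in UTF-16 (a surrogate pair) while the others take two, which is the variable-width behaviour mentioned above. UTF-32 uses four bytes for every sample.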

(I'm politely ignoring your ill-reasoned attacks on non-Microsoft OSes).

Cheers,
Cliff
Sep 11 '07 #5
On Mon, 10 Sep 2007 19:26:20 -0700, Xah Lee wrote:
Many Internet standards are defined in terms of textual data, and
can't handle content with embedded zero bytes.

Not sure what he means by "can't handle content with embedded zero
bytes". Overall i think this sentence is silly, and he's probably
thinking in unix/linux.

No, he's probably thinking of all the text-based protocols (HTTP, SMTP, …)
and of the fact that one of the most used programming languages, C,
can't cope with embedded null bytes in strings.
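To make the zero-byte point concrete, here is a small sketch in Python 3 syntax. The request line is an arbitrary example, and c_string_view is a hypothetical stand-in for C code that treats the first NUL byte as the end of a string; neither comes from the thread.

```python
request = "GET /index.html HTTP/1.0"

as_utf8 = request.encode("utf-8")
as_utf16 = request.encode("utf-16-le")

# ASCII text encoded as UTF-8 contains no zero bytes,
# while in UTF-16 every second byte of ASCII text is zero.
utf8_has_nul = b"\x00" in as_utf8
utf16_has_nul = b"\x00" in as_utf16


def c_string_view(data):
    """Hypothetical stand-in for C code that stops at the first NUL byte."""
    return data.split(b"\x00", 1)[0]


seen_utf8 = c_string_view(as_utf8)    # the whole request line
seen_utf16 = c_string_view(as_utf16)  # only the first byte survives
```

With the UTF-16 bytes, code that stops at the first NUL would see only b'G', which is the kind of breakage the HOWTO sentence seems to be pointing at.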
Encodings don't have to handle every possible Unicode
character, ....

This is inane. An encoding, by definition, turns numbers into binary
numbers (in our context, it means an encoding handles all unicode chars
by definition).

How do you encode Chinese characters with the ISO-8859-1 encoding? This
encoding obviously doesn't handle *all* Unicode characters.
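The ISO-8859-1 question can be tried directly. A minimal sketch in Python 3 syntax, where the helper name try_encode and the sample strings are assumptions made for the illustration:

```python
def try_encode(text, encoding):
    """Return the encoded bytes, or None if the encoding cannot represent the text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


latin = try_encode("é", "iso-8859-1")        # fits in one byte
chinese = try_encode("中文", "iso-8859-1")    # not representable
chinese_utf8 = try_encode("中文", "utf-8")    # works, three bytes per character
```

Here latin is b'\xe9' and chinese is None, because ISO-8859-1 only covers code points up to 255. UTF-8 encodes the same two characters in six bytes.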

UTF-8 has several convenient properties:
1. It can handle any Unicode code point.
...

As mentioned before, by definition, any Unicode encoding encodes the whole
Unicode char set. Mentioning the above as a "convenient property"
is inane.

You are being silly here.

Ciao,
Marc 'BlackJack' Rintsch
Sep 11 '07 #6
