Bytes | Software Development & Data Engineering Community

printing list containing unicode string

If I have a nested list, where the atoms are unicode strings, e.g.

# -*- coding: utf-8 -*-
ttt=[[u"→",u"↑"], [u"αβγ"],...]
print ttt

how can I print it without getting the u'\u1234' notation?
i.e. I want it to print just like this: [[u"→"], ...]

I can of course write a loop and then use "encode("utf-8")" on each
string, but is there an easier way?

Thx.

Xah
xa*@xahlee.org
∑ http://xahlee.org/

Sep 10 '07 #1
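
The explicit loop Xah mentions could look roughly like the following minimal sketch. It assumes Python 3 (the thread itself is Python 2), and `format_nested` is a hypothetical helper, not a library function. It also assumes the data holds only nested lists, strings, and values whose `repr()` is acceptable, and it does not escape quote characters inside the strings.

```python
def format_nested(obj):
    """Render nested lists with strings shown as plain text in double quotes.

    Hypothetical helper for illustration; not part of any library.
    """
    if isinstance(obj, list):
        # Recurse into each element and join them the way repr() would.
        return "[" + ", ".join(format_nested(item) for item in obj) + "]"
    if isinstance(obj, str):
        # Use the characters directly instead of repr(), so nothing is
        # turned into a \uXXXX escape. Embedded quotes are NOT escaped.
        return '"' + obj + '"'
    # Anything else (numbers, None, ...) falls back to repr().
    return repr(obj)


ttt = [["→", "↑"], ["αβγ"]]
print(format_nested(ttt))  # [["→", "↑"], ["αβγ"]]
```

Walking the structure yourself gives full control over the output format, at the cost of re-implementing a small part of what `repr()` already does.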
On Mon, 2007-09-10 at 06:59 -0700, Xah Lee wrote:
> If I have a nested list, where the atoms are unicode strings, e.g.
>
> # -*- coding: utf-8 -*-
> ttt=[[u"→",u"↑"], [u"αβγ"],...]
> print ttt
>
> how can I print it without getting the u'\u1234' notation?
> i.e. I want it to print just like this: [[u"→"], ...]
>
> I can of course write a loop and then use "encode("utf-8")" on each
> string, but is there an easier way?

It's not quite clear why you want to do this, but this is how you could
do it:

print repr(ttt).decode("unicode_escape").encode("utf-8")
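
For readers on a current interpreter, here is a minimal Python 3 sketch of the same idea (an assumption; the thread is about Python 2, where `print` is a statement and `repr()` of a `unicode` string escapes non-ASCII characters). In Python 3 the built-in `ascii()` produces that escaped form, and decoding with the `unicode_escape` codec turns the escapes back into characters. The second half shows a limitation of the trick: the codec decodes every backslash escape, so a string containing a literal backslash comes out altered.

```python
ttt = [["→", "↑"], ["αβγ"]]

# ascii() is roughly Python 3's counterpart of Python 2's repr() on
# unicode: every non-ASCII character becomes a \uXXXX escape.
escaped = ascii(ttt)

# unicode_escape turns the escapes back into real characters.
readable = escaped.encode("ascii").decode("unicode_escape")
print(readable)            # [['→', '↑'], ['αβγ']]

# Caveat: the codec decodes *every* backslash escape, not only \uXXXX,
# so a string holding a literal backslash is changed.
tricky = ["a\\b"]          # a, one backslash, b
mangled = ascii(tricky).encode("ascii").decode("unicode_escape")
print(repr(tricky))        # ['a\\b']
print(mangled)             # ['a\b']  (backslash no longer doubled)
```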

However, I am getting the impression that this is a "How can I use 'X'
to achieve 'Y'?" question instead of the preferable "How can I achieve
'Y'?" type of question. In other words, printing the repr() of a list
might not be the best solution to reach the actual goal, which you have
not stated.

HTH,

--
Carsten Haese
http://informixdb.sourceforge.net
Sep 10 '07 #2
On Sep 10, 8:12 am, Carsten Haese <cars...@uniqsys.com> wrote:

Xah Lee wrote:

> If I have a nested list, where the atoms are unicode strings, e.g.
>
> # -*- coding: utf-8 -*-
> ttt=[[u"→",u"↑"], [u"αβγ"],...]
> print ttt
>
> how can I print it without getting the u'\u1234' notation?
> i.e. I want it to print just like this: [[u"→"], ...]

Carsten Haese wrote:

> It's not quite clear why you want to do this, but this is how you
> could do it:
>
> print repr(ttt).decode("unicode_escape").encode("utf-8")

Super! Thanks a lot.

About why I want to... I think it's just simpler and easier on the
eye?

Here's an example output from my program:
[[u' ', 1022], [u'↑', 472], [u' ', 128], [u'→w', 300], [u'→s', 12],
[u'→|', 184],...]

Wouldn't it be preferable if Python printed like this by default...
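
For what it's worth, Python 3 adopted this behavior (PEP 3138): `repr()` of a `str` leaves printable non-ASCII characters unescaped, and the built-in `ascii()` gives the old escaped form. A small sketch, assuming Python 3 and reusing a few of the pairs from the output above:

```python
pairs = [["↑", 472], ["→w", 300], ["→s", 12]]

# print() shows a list via repr() of its items, which in Python 3 keeps
# printable non-ASCII characters as they are.
print(pairs)         # [['↑', 472], ['→w', 300], ['→s', 12]]

# ascii() escapes every non-ASCII character, like Python 2's repr().
print(ascii(pairs))  # [['\u2191', 472], ['\u2192w', 300], ['\u2192s', 12]]
```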

Xah
xa*@xahlee.org
∑ http://xahlee.org/

Sep 10 '07 #3
Xah Lee wrote:
> This post is about some notes and corrections to an online article
> regarding Unicode and Python.
>
> --------------
>
> By happenstance I was reading:
>
> Unicode HOWTO
> http://www.amk.ca/python/howto/unicode
>
> Here are some problems I see:
>
> ・ No conspicuous authorship. (However, oddly, it does have a
> conspicuous list of acknowledged names.) (This problem is an indirect
> consequence of the communist fanaticism ushered in by the Open Source
> movement.) (Originally I was just going to write to the author with
> some corrections.)
>
> ・ It's very wasteful of space. In most texts, the majority of the
> code points are less than 127, or less than 255, so a lot of space is
> occupied by zero bytes.
>
> Not true. In Asia, most chars have Unicode code points above 255.
> Considered globally, *possibly* today there are more computer files in
> Chinese than in all Latin-alphabet-based languages.

That's an interesting point. I'd be interested to see numbers on
that, and how those numbers have changed over the past five years.
Sadly, such data is most likely impossible to obtain.

However, it should be pointed out that most *code*, whether written in
the United States, New Zealand, India, China, or Botswana, is written
in English. This is partly because English has become a standard of
sorts, much as Italian became a standard for musical notation, owing
partly to the US's former (and perhaps current, but certainly fading)
dominance in the field, and partly to the lack of solid Unicode support
in many programming languages and compilers. Thus the author's bias,
while inaccurate, is still understandable.

> ・ Many Internet standards are defined in terms of textual data, and
> can't handle content with embedded zero bytes.
>
> Not sure what he means by "can't handle content with embedded zero
> bytes". Overall I think this sentence is silly, and he's probably
> thinking in Unix/Linux terms.
>
> ・ Encodings don't have to handle every possible Unicode
> character, ....
>
> This is inane. An encoding, by definition, turns numbers into binary
> numbers (in our context, it means an encoding handles all Unicode chars
> by definition). What he really meant to say is something like this:
> "Practically speaking, most computer languages in western society
> don't need to support Unicode with respect to the language's source
> file"
>
> ・ UTF-8 has several convenient properties:
> 1. It can handle any Unicode code point.
> ...
> As mentioned before, by definition, any Unicode encoding encodes the
> entire Unicode character set. Mentioning the above as a "convenient
> property" is inane.

No, it's not inane. UCS-2, for example, is a fixed-width, 2-byte
encoding that can handle any Unicode code point up to 0xffff, but
cannot handle the code points above that (the supplementary planes).
UCS-2 was developed for applications in which fixed-width characters
are essential, but it has the limitation of not being able to handle
every Unicode code point. IIRC, when it was developed, it did handle
every code point, and then Unicode grew. UCS-4 exists to get around
this limitation. UTF-16 is based on a two-byte unit, but is variable
width, like UTF-8, which makes it flexible enough to handle any code
point, but harder to process, and a bear to seek through to a certain
point.
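
A short Python 3 sketch of why a fixed 2-byte unit runs out: a code point above 0xffff, here U+20000 (the first CJK Extension B ideograph, chosen arbitrarily), does not fit in one 16-bit unit, so UTF-16 splits it into a surrogate pair. `utf16_surrogates` is a hypothetical helper applying the standard surrogate arithmetic; its result is checked against Python's own `utf-16-le` codec.

```python
def utf16_surrogates(code_point):
    """Split a code point above U+FFFF into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF use surrogates")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


ch = "\U00020000"
high, low = utf16_surrogates(ord(ch))
print(hex(high), hex(low))             # 0xd840 0xdc00

encoded = ch.encode("utf-16-le")
print(len(encoded))                    # 4 bytes: two 16-bit code units
assert encoded == high.to_bytes(2, "little") + low.to_bytes(2, "little")
```

Because some characters take one 16-bit unit and others take two, finding the n-th character in UTF-16 data requires scanning from the start, which is the seeking problem mentioned above.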

(I'm politely ignoring your ill-reasoned attacks on non-Microsoft OSes).

Cheers,
Cliff
Sep 11 '07 #5
On Mon, 10 Sep 2007 19:26:20 -0700, Xah Lee wrote:
> ・ Many Internet standards are defined in terms of textual data, and
> can't handle content with embedded zero bytes.
>
> Not sure what he means by "can't handle content with embedded zero
> bytes". Overall I think this sentence is silly, and he's probably
> thinking in Unix/Linux terms.

No, he's probably thinking of all the text-based protocols (HTTP, SMTP, …)
and of the fact that C, one of the most widely used programming languages,
can't cope with embedded null bytes in strings.
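
To illustrate the C point with a Python 3 sketch: `c_strlen` below is a hypothetical stand-in that mimics C's `strlen()` by counting bytes up to the first NUL. UTF-8 only produces a zero byte for U+0000 itself, whereas UTF-16 and UTF-32 put zero bytes inside ordinary ASCII text, so code that relies on NUL termination sees a truncated string.

```python
def c_strlen(data):
    """Mimic C strlen(): number of bytes before the first NUL byte.

    Hypothetical helper. A real strlen() would read past the end of a
    buffer that has no NUL; here we just return the full length instead.
    """
    nul = data.find(b"\x00")
    return len(data) if nul == -1 else nul


text = "HTTP"
for enc in ("utf-8", "utf-16-le", "utf-32-le"):
    data = text.encode(enc)
    print(enc, len(data), c_strlen(data))
# utf-8 4 4
# utf-16-le 8 1
# utf-32-le 16 1
```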

> ・ Encodings don't have to handle every possible Unicode
> character, ....
>
> This is inane. An encoding, by definition, turns numbers into binary
> numbers (in our context, it means an encoding handles all Unicode chars
> by definition).

How do you encode Chinese characters with the ISO-8859-1 encoding? This
encoding obviously doesn't handle *all* Unicode characters.
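
Marc's counterexample is easy to check in Python 3. `can_encode` is a hypothetical helper that simply attempts the encoding. ISO-8859-1 covers only the first 256 code points, while UTF-8 can encode every Unicode scalar value up to U+10FFFF, which is the HOWTO's "convenient property".

```python
def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


for enc in ("iso-8859-1", "utf-8"):
    print(enc, can_encode("é", enc), can_encode("中", enc), can_encode("\U0010FFFF", enc))
# iso-8859-1 True False False
# utf-8 True True True
```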

> ・ UTF-8 has several convenient properties:
> 1. It can handle any Unicode code point.
> ...
> As mentioned before, by definition, any Unicode encoding encodes the
> entire Unicode character set. Mentioning the above as a "convenient
> property" is inane.

You are being silly here.

Ciao,
Marc 'BlackJack' Rintsch
Sep 11 '07 #6
