printing list containing unicode string

Xah Lee

If i have a nested list, where the atoms are unicode strings, e.g.

# -*- coding: utf-8 -*-
ttt=[[u"→",u"↑"], [u"αβγ"],...]
print ttt

how can i print it without getting the u'\u1234' notation?
i.e. i want it to print just like this: [[u"→"], ...]

I can of course write a loop then for each string use
"encode("utf-8")", but is there an easier way?

Thx.

Xah
xa*@xahlee.org
∑ http://xahlee.org/

Sep 10 '07 #1

Carsten Haese

On Mon, 2007-09-10 at 06:59 -0700, Xah Lee wrote:

If i have a nested list, where the atoms are unicode strings, e.g.

# -*- coding: utf-8 -*-
ttt=[[u"→",u"↑"], [u"αβγ"],...]
print ttt

how can i print it without getting the u'\u1234' notation?
i.e. i want it to print just like this: [[u"→"], ...]

I can of course write a loop then for each string use
"encode("utf-8")", but is there an easier way?

It's not quite clear why you want to do this, but this is how you could
do it:

print repr(ttt).decode("unicode_escape").encode("utf-8")

However, I am getting the impression that this is a "How can I use 'X'
to achieve 'Y'?" question instead of the preferable "How can I achieve
'Y'?" type of question. In other words, printing the repr() of a list
might not be the best solution to reach the actual goal, which you have
not stated.

HTH,

--
Carsten Haese
http://informixdb.sourceforge.net

Sep 10 '07 #2
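
For readers on Python 3, where `print ttt` is no longer valid syntax and `repr()` already keeps printable non-ASCII characters, the idea behind the one-liner in the reply above can still be sketched. This is a minimal illustration rather than the poster's own code: it assumes Python 3, uses `ascii()` as a stand-in for Python 2's escaping `repr()`, and the names `escaped` and `restored` are made up for the example.

```python
# Python 3 sketch; the sample data mirrors the list from the question.
ttt = [["→", "↑"], ["αβγ"]]

# ascii() escapes every non-ASCII character, much like Python 2's repr() did.
escaped = ascii(ttt)

# Decoding with "unicode_escape" turns the \uXXXX sequences back into characters.
restored = escaped.encode("ascii").decode("unicode_escape")

print(escaped)   # the \uXXXX form the question wants to avoid
print(restored)  # the readable form
```

One caveat worth keeping in mind: `unicode_escape` interprets every backslash escape, not only `\uXXXX`, so a string that contains a newline or a literal backslash would no longer come out as a valid Python literal. That is probably fine for eyeballing output, less so for anything that gets parsed again.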

Xah Lee

On Sep 10, 8:12 am, Carsten Haese <cars...@uniqsys.com> wrote:
Xah Lee wrote:

If i have a nested list, where the atoms are unicode strings, e.g.

# -*- coding: utf-8 -*-
ttt=[[u"→",u"↑"], [u"αβγ"],...]
print ttt

how can i print it without getting the u'\u1234' notation?
i.e. i want it to print just like this: [[u"→"], ...]

Carsten Haese wrote:

It's not quite clear why you want to do this, but this is how you
could do it:

print repr(ttt).decode("unicode_escape").encode("utf-8")

Super! Thanks a lot.

About why i want to... i think it's just simpler and easier on the
eye?

here's an example output from my program:
[[u' ', 1022], [u'↑', 472], [u' ', 128], [u'→w', 300], [u'→s', 12],
[u'→|', 184],...]

wouldn't it be preferable if Python printed like this by default...

Xah
xa*@xahlee.org
∑ http://xahlee.org/

Sep 10 '07 #3

Xah Lee

Google groups seems to be stripping my quotation marks lately.
Here's a retry to post my previous message.

--------------------------------------------------------------

Xah Lee wrote:

If i have a nested list, where the atoms are unicode strings, e.g.
# -*- coding: utf-8 -*-
ttt=[[u"→",u"↑"], [u"αβγ"],...]
print ttt

how can i print it without getting the u'\u1234' notation?
i.e. i want it to print just like this: [[u"→"], ...]

Carsten Haese wrote:

It's not quite clear why you want to do this, but this is how you
could do it:

print repr(ttt).decode("unicode_escape").encode("utf-8")

Super! Thanks a lot.

About why i want to... i think it's just simpler and easier on the
eye?

here's an example output from my program:
[[u' ', 1022], [u'↑', 472], [u' ', 128], [u'→w', 300], [u'→s', 12],
[u'→|', 184],...]

wouldn't it be preferable if Python printed like this by default...

Xah
x...@xahlee.org
∑ http://xahlee.org/

Sep 10 '07 #4

J. Cliff Dyer

Xah Lee wrote:

This post is about some notes and corrections to an online article
regarding Unicode and Python.

--------------

by happenstance i was reading:

Unicode HOWTO
http://www.amk.ca/python/howto/unicode

Here's some problems i see:

· No conspicuous authorship. (however, oddly, it has a conspicuous
acknowledgement of names listing.) (This problem is an indirect
consequence of communism fanaticism ushered in by the OpenSource movement)
(Originally i was just going to write to the author on some
corrections.)

· It's very wasteful of space. In most texts, the majority of the
code points are less than 127, or less than 255, so a lot of space is
occupied by zero bytes.

Not true. In Asia, most chars have Unicode numbers above 255. Considered
globally, *possibly* today there are more computer files in Chinese
than in all latin-alphabet based lang.

That's an interesting point. I'd be interested to see numbers on
that, and how those numbers have changed over the past five years.
Sadly, such data is most likely impossible to obtain.

However, it should be pointed out that most *code*, whether written in
the United States, New Zealand, India, China, or Botswana, is written
in English. This is partly because English has become a standard of
sorts, much as Italian was a standard for musical notation, partly
because of the US's former (and perhaps current, but certainly fading)
dominance in the field, and partly because many programming languages
and compilers lack solid support for Unicode. Thus the author's bias,
while inaccurate, is still understandable.

· Many Internet standards are defined in terms of textual data, and
can't handle content with embedded zero bytes.

Not sure what he means by "can't handle content with embedded zero
bytes". Overall i think this sentence is silly, and he's probably
thinking in unix/linux.

· Encodings don't have to handle every possible Unicode
character, ....

This is inane. An encoding, by definition, turns numbers into binary
numbers (in our context, it means an encoding handles all unicode chars
by definition). What he really meant to say is something like this:
"Practically speaking, most computer languages in western society
don't need to support unicode with respect to the language's source
file"

·
UTF-8 has several convenient properties:
1. It can handle any Unicode code point.
...
As mentioned before, by definition, any Unicode encoding encodes the
whole Unicode char set. Mentioning the above as a "convenient property"
is inane.

No, it's not inane. UCS-2, for example, is a fixed-width, 2-byte
encoding that can handle any Unicode code point up to 0xffff, but
cannot handle the code points beyond that range (the supplementary
planes). UCS-2 was developed for applications in which having
fixed-width characters is essential, but it has the limitation of not
being able to handle every Unicode code point. IIRC, when it was
developed, it did handle every code point, and then Unicode grew.
There is also UCS-4, which does not have this limitation. UTF-16 is
based on a two-byte unit, but is variable width, like UTF-8, which
makes it flexible enough to handle any code point, but harder to
process, and a bear to seek through to a certain point.

(I'm politely ignoring your ill-reasoned attacks on non-Microsoft OSes).

Cheers,
Cliff

Sep 11 '07 #5
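
The width trade-off described in the reply above can be checked directly. The following is a rough Python 3 sketch, assuming the standard `utf-8`, `utf-16-be` and `utf-32-be` codecs. Python ships no UCS-2 codec, so UTF-32 stands in here for the fixed-width UCS-4 case, and the helper `encoded_widths` and the two sample characters are hypothetical choices for illustration.

```python
def encoded_widths(ch):
    """Return the byte length of one character in three encodings."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-be")),
        "utf-32": len(ch.encode("utf-32-be")),
    }

alpha = "\u03b1"     # Greek small letter alpha, inside the 16-bit range
clef = "\U0001d11e"  # musical symbol G clef, beyond U+FFFF

print(encoded_widths(alpha))
print(encoded_widths(clef))
```

The character below U+10000 takes a single 16-bit unit in UTF-16, while the one above it takes two units (a surrogate pair). That is what makes UTF-16 variable width, and it is also why a strict two-byte encoding cannot represent such a character at all.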

Marc 'BlackJack' Rintsch

On Mon, 10 Sep 2007 19:26:20 -0700, Xah Lee wrote:

· Many Internet standards are defined in terms of textual data, and
can't handle content with embedded zero bytes.

Not sure what he means by "can't handle content with embedded zero
bytes". Overall i think this sentence is silly, and he's probably
thinking in unix/linux.

No, he's probably thinking of all the text-based protocols (HTTP, SMTP, …)
and that one of the most used programming languages, C, can't cope with
embedded null bytes in strings.

· Encodings don't have to handle every possible Unicode
character, ....

This is inane. An encoding, by definition, turns numbers into binary
numbers (in our context, it means an encoding handles all unicode chars
by definition).

How do you encode Chinese characters with the ISO-8859-1 encoding? This
encoding obviously doesn't handle *all* Unicode characters.

·
UTF-8 has several convenient properties:
1. It can handle any Unicode code point.
...
As mentioned before, by definition, any Unicode encoding encodes the
whole Unicode char set. Mentioning the above as a "convenient property"
is inane.

You are being silly here.

Ciao,
Marc 'BlackJack' Rintsch

Sep 11 '07 #6
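
Both points in the reply above are easy to try out. A small Python 3 sketch follows, assuming only the built-in codecs; the helper name `can_encode` and the sample strings are invented for the example.

```python
def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

# Embedded zero bytes: UTF-16 pads ASCII letters with a zero byte, which
# NUL-terminated C strings would treat as the end of the string.
utf16_bytes = "A".encode("utf-16-le")
utf8_bytes = "A".encode("utf-8")
print(b"\x00" in utf16_bytes, b"\x00" in utf8_bytes)

# Coverage: ISO-8859-1 has only 256 characters, UTF-8 covers all of Unicode.
chinese = "\u4e2d\u6587"  # two Chinese characters
print(can_encode(chinese, "iso-8859-1"), can_encode(chinese, "utf-8"))
```

So an encoding is free to cover only part of Unicode, as ISO-8859-1 does, and an encoding that pads with zero bytes can trip up software that treats a zero byte as the end of a string.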
