Bytes | Software Development & Data Engineering Community

Printing UTF-8

I am new to Unicode so please bear with my stupidity.

I am doing the following in a Python IDE called Wing with Python 2.3.
>>> s = "äöü"
>>> print s
äöü
>>> print s
äöü
>>> s
'\xc3\xa4\xc3\xb6\xc3\xbc'
>>> s.decode('utf-8')
u'\xe4\xf6\xfc'
>>> u = s.decode('utf-8')
>>> u
u'\xe4\xf6\xfc'
>>> print u.encode('utf-8')
äöü
>>> print u.encode('latin1')
äöü

Why can't I get äöü printed from utf-8 when I can from latin1? How
can I use utf-8 exclusively and still be able to print the characters?

I also did the same thing on the same machine in a command window...
ActivePython 2.3.2 Build 230 (ActiveState Corp.) based on
Python 2.3.2 (#49, Oct 24 2003, 13:37:57) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> s = "äöü"
>>> print s
äöü
>>> s
'\x84\x94\x81'
>>> s.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'utf8' codec can't decode byte 0x84 in position 0: unexpected code byte
>>> u = s.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'utf8' codec can't decode byte 0x84 in position 0: unexpected code byte
>>>
Why such a difference between the IDE and the command window, in what
each can do and in the internal representation of the Unicode?

Thanks,
Shel

Sep 21 '06 #1
sh*************@gmail.com wrote:
> I am new to Unicode so please bear with my stupidity.
>
> I am doing the following in a Python IDE called Wing with Python 2.3.
> >>> s = "äöü"

From later evidence, this string is encoded as utf-8. Looks like Wing
must be using an implicit "# coding: utf-8" for interactive input ...

> >>> print s
> äöü

... but uses some other encoding for output. Try doing this, and see
what you get:

    import sys
    print sys.stdout.encoding

> >>> print s
> äöü
> >>> s
> '\xc3\xa4\xc3\xb6\xc3\xbc'

Yup, looks like utf-8 ...

> >>> s.decode('utf-8')
> u'\xe4\xf6\xfc'

Yup, decodes from utf-8 without error.

> >>> u = s.decode('utf-8')
> >>> u
> u'\xe4\xf6\xfc'

and those Unicode characters actually look like what you started with:

| >>> import unicodedata as ucd
| >>> [ucd.name(x) for x in u'\xe4\xf6\xfc']
| ['LATIN SMALL LETTER A WITH DIAERESIS', 'LATIN SMALL LETTER O WITH DIAERESIS', 'LATIN SMALL LETTER U WITH DIAERESIS']
| >>>

So, 3 yups, it must be utf-8.
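For readers on Python 3, the same three checks can be replayed in a few lines. This is a sketch that assumes Python 3 naming, where the thread's 8-bit `str` is spelled `bytes` and its `unicode` type is spelled `str`; the byte string is the one Wing produced for "äöü".

```python
import unicodedata

# Python 3 replay of the three checks (assumption: bytes here plays the role
# of the thread's Python 2 str, and str plays the role of unicode).
s = b"\xc3\xa4\xc3\xb6\xc3\xbc"  # the bytes Wing stored for "äöü"

# Check 2: the bytes decode as UTF-8 without error.
u = s.decode("utf-8")

# Check 3: the decoded code points are the characters that were typed.
names = [unicodedata.name(ch) for ch in u]

# Check 1, run in reverse: encoding the typed text as UTF-8 gives the same bytes.
roundtrip = "\u00e4\u00f6\u00fc".encode("utf-8")

print(u == "\u00e4\u00f6\u00fc")  # True
print(names)
print(roundtrip == s)             # True
```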

> >>> print u.encode('utf-8')
> äöü
> >>> print u.encode('latin1')
> äöü
>
> Why can't I get äöü printed from utf-8 when I can from latin1?

Because str objects are just strings of anonymous bytes. They don't
have an attribute that says what encoding their creator had in mind.
Consequently, output channels like stdout have to assume an encoding,
and that encoding is applied to all output. On Windows, in a GUI, this
encoding depends on your locale, and in your case is probably cp1252.
cp1252 is identical to latin1 except in the range 0x80-0x9F, where
latin1 has control codes and cp1252 has extra printable symbols such
as the euro sign and the trademark symbol. Try repeating the above
exercise, but this time include a trademark symbol in your s string,
and add

    print u.encode("cp1252")

at the end of the exercise.
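To make the explanation concrete, here is a small Python 3 sketch (an assumption: the thread itself uses Python 2.3, so `bytes` here stands in for its `str`). It shows that the same three letters become different bytes under different encodings, what a cp1252 display makes of the UTF-8 bytes, and why the trademark exercise separates cp1252 from latin1.

```python
# Python 3 sketch: one text, several byte representations.
text = "\u00e4\u00f6\u00fc"  # "äöü"

utf8_bytes = text.encode("utf-8")      # b'\xc3\xa4\xc3\xb6\xc3\xbc'
latin1_bytes = text.encode("latin-1")  # b'\xe4\xf6\xfc'
cp1252_bytes = text.encode("cp1252")   # same as latin-1 for these letters

# What a display that assumes cp1252 shows when handed the UTF-8 bytes:
mojibake = utf8_bytes.decode("cp1252")  # 'Ã¤Ã¶Ã¼'

# The trademark sign exists in cp1252 (byte 0x99) but not in latin-1.
tm = "\u2122"
tm_cp1252 = tm.encode("cp1252")  # b'\x99'
try:
    tm.encode("latin-1")
    latin1_has_tm = True
except UnicodeEncodeError:
    latin1_has_tm = False

print(utf8_bytes, latin1_bytes, tm_cp1252, latin1_has_tm)
```

Printing `utf8_bytes` to a cp1252 window amounts to the `mojibake` decode above, which is why only the latin1 (or cp1252) bytes come out looking like äöü.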
> How can I use utf-8 exclusively and still be able to print the characters?
print exclusiveutf8.decode('utf-8').encode(whateverittakes)

Why do you want to use utf-8 exclusively? Use it for what?

Basic principle when working with non-ASCII data: decode 8-bit input
into Unicode; process using Unicode-aware software (in Python's case,
the built-in unicode type); if 8-bit output is required, encode your
Unicode data with whatever encoding is required.
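The principle can be sketched as three small functions. The function names, the choice of upper-casing as the "processing" step, and the cp1252/cp850 output encodings are illustrative assumptions; the code targets Python 3, whose `bytes` and `str` correspond to Python 2's `str` and `unicode`.

```python
# Hypothetical sketch of "decode on input, work in Unicode, encode on output".

def decode_input(raw: bytes, encoding: str) -> str:
    """Turn 8-bit input into Unicode text, using the encoding of its source."""
    return raw.decode(encoding)

def process(text: str) -> str:
    """Example Unicode-aware processing step: upper-case the text."""
    return text.upper()

def encode_output(text: str, encoding: str) -> bytes:
    """Turn Unicode text back into bytes for one particular output channel."""
    return text.encode(encoding)

# The same three letters arriving from the two sources in the thread:
from_wing = decode_input(b"\xc3\xa4\xc3\xb6\xc3\xbc", "utf-8")
from_console = decode_input(b"\x84\x94\x81", "cp850")

result = process(from_wing)
for_gui = encode_output(result, "cp1252")    # bytes for a cp1252 GUI
for_console = encode_output(result, "cp850")  # bytes for a cp850 console

print(from_wing == from_console, for_gui, for_console)
```

Both inputs decode to the same text even though their bytes differ, so the processing step never needs to know where the data came from; only the final `encode` call depends on the destination.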
>
> I also did the same thing on the same machine in a command window...
> ActivePython 2.3.2 Build 230 (ActiveState Corp.) based on
> Python 2.3.2 (#49, Oct 24 2003, 13:37:57) [MSC v.1200 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> s = "äöü"
> >>> print s
> äöü
> >>> s
> '\x84\x94\x81'
> >>> s.decode('utf-8')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x84 in position 0: unexpected code byte
> >>> u = s.decode('utf-8')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x84 in position 0: unexpected code byte
> >>>

> Why such a difference between the IDE and the command window, in what
> each can do
Because the command window is the child of MS-DOS, which was the child
of CP/M, and maintains the ancient traditions (like ctrl-Z being taken
as EOF, for example).
> and in the internal representation of the Unicode?
Unicode? There's no Unicode involved here. In each case you are sending
a string of bytes (0 <= ordinal <= 255) to an output device, each to be
rendered as a bitmap on the screen. Wing evidently causes the renderer
to reach for the latin1 or cp1252 table; the command window is probably
(in your case) using cp850 (or something similar).

On my box, in a command window:
| >>> sys.stdout.encoding
| 'cp850'
| >>> '\x84\x94\x81'.decode('cp850')
| u'\xe4\xf6\xfc'
... which is what you had before.
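The point that only bytes and code tables are involved can be checked in Python 3 (an assumption: the thread uses Python 2.3, but the codecs behave the same way). This sketch encodes "äöü" with the console's cp850 table and a GUI's cp1252 table, then shows that the console bytes fail as UTF-8 at the same position as in the original post.

```python
# Python 3 sketch: one piece of text, two code tables, and a failing UTF-8 decode.
text = "\u00e4\u00f6\u00fc"  # "äöü"

console_bytes = text.encode("cp850")  # b'\x84\x94\x81', the command window's bytes
gui_bytes = text.encode("cp1252")     # b'\xe4\xf6\xfc', what a cp1252 GUI would use

# Reading the console bytes back with the console's own table recovers the text.
recovered = console_bytes.decode("cp850")

# Reading them as UTF-8 fails: 0x84 has the bit pattern 10xxxxxx, which marks
# a UTF-8 continuation byte, so it cannot begin a character.
try:
    console_bytes.decode("utf-8")
    error_position = None
except UnicodeDecodeError as exc:
    error_position = exc.start

print(console_bytes, gui_bytes, error_position)  # b'\x84\x94\x81' b'\xe4\xf6\xfc' 0
```

Python 3 words the error as "invalid start byte" rather than Python 2.3's "unexpected code byte", but it reports the same byte at the same position.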

HTH,
John

Sep 21 '06 #2
