Bytes | Software Development & Data Engineering Community

Printing UTF-8

I am new to unicode so please bear with my stupidity.

I am doing the following in a Python IDE called Wing with Python 2.3.
>>> s = "äöü"
>>> print s
äöü
>>> print s
äöü
>>> s
'\xc3\xa4\xc3\xb6\xc3\xbc'
>>> s.decode('utf-8')
u'\xe4\xf6\xfc'
>>> u = s.decode('utf-8')
>>> u
u'\xe4\xf6\xfc'
>>> print u.encode('utf-8')
äöü
>>> print u.encode('latin1')


Why can't I get äöü printed from utf-8 when I can from latin1? How
can I use utf-8 exclusively and still be able to print the characters?

I also did the same thing on the same machine in a command window...
ActivePython 2.3.2 Build 230 (ActiveState Corp.) based on
Python 2.3.2 (#49, Oct 24 2003, 13:37:57) [MSC v.1200 32 bit (Intel)]
on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> s = "äöü"
>>> print s
>>> s
'\x84\x94\x81'
>>> s.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'utf8' codec can't decode byte 0x84 in position 0: unexpected code byte
>>> u = s.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'utf8' codec can't decode byte 0x84 in position 0: unexpected code byte
>>>
Why is there such a difference between the IDE and the command window in
what they can do and in the internal representation of the unicode?

Thanks,
Shel

Sep 21 '06 #1
sh*************@gmail.com wrote:
> I am new to unicode so please bear with my stupidity.
>
> I am doing the following in a Python IDE called Wing with Python 2.3.
> >>> s = "äöü"
From later evidence, this string is encoded as utf-8. Looks like Wing
must be using an implicit "# coding: utf-8" for interactive input ...
> >>> print s
> äöü
... but uses some other encoding for output. Try doing this, and see
what you get:
import sys
print sys.stdout.encoding
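A minimal sketch of the mismatch described here, written for Python 3 (where `bytes` corresponds to the 2.x `str` and `str` to `unicode`). That Wing's output channel uses cp1252 is an assumption borrowed from the guess further down this reply; the point is only that the same six bytes read as six characters under cp1252 and as three under UTF-8.

```python
import sys

raw = b'\xc3\xa4\xc3\xb6\xc3\xbc'   # the bytes shown by `s` in the Wing session

# The encoding this interpreter's stdout claims to use (None if unknown).
print(sys.stdout.encoding)

# Read the UTF-8 bytes back as cp1252 (assumed GUI encoding): every byte
# becomes its own character, giving the six characters Ã ¤ Ã ¶ Ã ¼.
as_cp1252 = raw.decode('cp1252')

# Read them back as UTF-8, the encoding they were written in:
# two bytes per character, giving the three characters ä ö ü.
as_utf8 = raw.decode('utf-8')

print(len(as_cp1252), len(as_utf8))       # 6 3
print(ascii(as_cp1252), ascii(as_utf8))   # '\xc3\xa4\xc3\xb6\xc3\xbc' '\xe4\xf6\xfc'
```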
> >>> print s
> äöü
> >>> s
> '\xc3\xa4\xc3\xb6\xc3\xbc'
Yup, looks like utf-8 ...
> >>> s.decode('utf-8')
> u'\xe4\xf6\xfc'
Yup, decodes from utf-8 without error
> >>> u = s.decode('utf-8')
> >>> u
> u'\xe4\xf6\xfc'
and those Unicode characters actually look like what you started with:

| >>> import unicodedata as ucd
| >>> [ucd.name(x) for x in u'\xe4\xf6\xfc']
| ['LATIN SMALL LETTER A WITH DIAERESIS', 'LATIN SMALL LETTER O WITH DIAERESIS',
| 'LATIN SMALL LETTER U WITH DIAERESIS']
| >>>

So, 3 yups, it must be utf-8.
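The three checks can be rolled into one helper. This is a Python 3 sketch; `looks_like_utf8` is a hypothetical name, and a clean decode is evidence rather than proof that the writer meant UTF-8 (plain ASCII passes too).

```python
import unicodedata

def looks_like_utf8(raw):
    """Return (text, names) if raw decodes as UTF-8, else None (hypothetical helper)."""
    try:
        text = raw.decode('utf-8')
    except UnicodeDecodeError:
        return None
    # Name each code point so a human can compare it with what was typed.
    names = [unicodedata.name(ch, '<unnamed>') for ch in text]
    return text, names

wing_bytes = b'\xc3\xa4\xc3\xb6\xc3\xbc'   # repr(s) in Wing
console_bytes = b'\x84\x94\x81'            # repr(s) in the command window

text, names = looks_like_utf8(wing_bytes)
for name in names:
    print(name)                            # LATIN SMALL LETTER A WITH DIAERESIS, ...
print(looks_like_utf8(console_bytes))      # None: 0x84 cannot start a UTF-8 sequence
```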

> >>> print u.encode('utf-8')
> äöü
> >>> print u.encode('latin1')

> Why can't I get äöü printed from utf-8 when I can from latin1?
Because str objects are just strings of anonymous bytes. They don't
have an attribute that says what encoding their creator had in mind.
Consequently output channels like stdout have an encoding which is
applied to all output. On Windows, in a GUI, this encoding depends on
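your locale, and in your case is probably cp1252. cp1252 is very
similar to latin1, but it assigns printable symbols (curly quotes, the
euro sign, the trademark sign, and others) to the 0x80-0x9F range, where
latin1 has only control codes. Try repeating the above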
exercise, but this time include a trademark symbol in your s string,
and add
print u.encode("cp1252")
at the end of the exercise.
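The suggested experiment, sketched in Python 3: U+2122 TRADE MARK SIGN has a byte in cp1252 (0x99) but none in latin1, so the two encodings agree on äöü and part ways on the trademark sign.

```python
u = '\u00e4\u00f6\u00fc\u2122'   # äöü followed by the trademark sign

cp1252_bytes = u.encode('cp1252')
print(cp1252_bytes)              # b'\xe4\xf6\xfc\x99'

try:
    u.encode('latin1')
except UnicodeEncodeError as exc:
    # latin1 only covers U+0000..U+00FF, so the fourth character fails.
    print('latin1 cannot encode', ascii(u[exc.start]), 'at position', exc.start)
    # latin1 cannot encode '\u2122' at position 3

# Without the trademark sign, both encodings produce identical bytes.
print(u[:3].encode('latin1') == u[:3].encode('cp1252'))   # True
```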
> How can I use utf-8 exclusively and still be able to print the characters?
print exclusiveutf8.decode('utf-8').encode(whateverittakes)
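The one-liner above is deliberately schematic (`exclusiveutf8` and `whateverittakes` are placeholders). A Python 3 sketch of the same idea, using a hypothetical helper `utf8_to_output` that takes its target from `sys.stdout.encoding` and falls back to an assumed `'ascii'` when that is unknown:

```python
import sys

def utf8_to_output(utf8_bytes, target=None):
    """Re-encode UTF-8 bytes for an output channel (hypothetical helper).

    target defaults to sys.stdout.encoding; 'ascii' is an assumed fallback.
    Characters the target cannot represent become '?'.
    """
    target = target or getattr(sys.stdout, 'encoding', None) or 'ascii'
    return utf8_bytes.decode('utf-8').encode(target, errors='replace')

data = b'\xc3\xa4\xc3\xb6\xc3\xbc'       # UTF-8 for "äöü"
print(utf8_to_output(data, 'cp1252'))   # b'\xe4\xf6\xfc'
print(utf8_to_output(data, 'cp850'))    # b'\x84\x94\x81'
print(utf8_to_output(data, 'ascii'))    # b'???'
```

The `errors='replace'` choice trades silent data loss for never crashing on output; `'strict'` would be the safer default if losing characters is unacceptable.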

Why do you want to use utf-8 exclusively? Use it for what?

Basic principle when working with non-ASCII data: decode 8-bit input
into Unicode; process using Unicode-aware software (in Python's case,
the built-in unicode type); if 8-bit output is required, encode your
Unicode data with whatever encoding is required.
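The basic principle, sketched in Python 3 (where `str` is the Unicode type). The UTF-8 input and cp850 output encodings are illustrative assumptions; the point is that the raw bytes and the decoded text disagree on something as simple as length or upper-casing.

```python
raw_in = b'\xc3\xa4\xc3\xb6\xc3\xbc'   # 8-bit input, assumed to be UTF-8

# 1. Decode at the input boundary.
text = raw_in.decode('utf-8')

# 2. Process as Unicode. The bytes are 6 anonymous values; the text is 3 letters.
print(len(raw_in), len(text))          # 6 3
shouted = text.upper()                 # 'ÄÖÜ'
print(raw_in.upper() == raw_in)        # True: bytes.upper() only changes ASCII a-z

# 3. Encode at the output boundary, for whatever the consumer needs (cp850 assumed).
raw_out = shouted.encode('cp850')
print(raw_out)                         # b'\x8e\x99\x9a'
```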
>
> I also did the same thing on the same machine in a command window...
> ActivePython 2.3.2 Build 230 (ActiveState Corp.) based on
> Python 2.3.2 (#49, Oct 24 2003, 13:37:57) [MSC v.1200 32 bit (Intel)]
> on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> s = "äöü"
> >>> print s
> >>> s
> '\x84\x94\x81'
> >>> s.decode('utf-8')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x84 in position 0: unexpected code byte
> >>> u = s.decode('utf-8')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x84 in position 0: unexpected code byte
> >>>

> Why is there such a difference between the IDE and the command window in
> what they can do
Because the command window is the child of MS-DOS, which was the child
of CP/M, and maintains the ancient traditions (like ctrl-Z being taken
as EOF, for example).
> and in the internal representation of the unicode?
Unicode? There's no Unicode involved here. In each case you are sending
a string of bytes (0 <= ordinal <= 255) to an output device, each to be
rendered as a bitmap on the screen. Wing evidently causes the renderer
to reach for the latin1 or cp1252 table; the command window is probably
(in your case) using cp850 (or something similar).

On my box, in a command window:
| >>> sys.stdout.encoding
| 'cp850'
| >>> '\x84\x94\x81'.decode('cp850')
| u'\xe4\xf6\xfc'
... which is what you had before.
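Putting the two sessions side by side: a Python 3 sketch that encodes the same three characters with each encoding discussed in this thread and shows why the command-window bytes are rejected by the UTF-8 decoder.

```python
text = '\u00e4\u00f6\u00fc'   # äöü

for enc in ('utf-8', 'cp1252', 'cp850'):
    print(enc, text.encode(enc))
# utf-8 b'\xc3\xa4\xc3\xb6\xc3\xbc'   (Wing's bytes)
# cp1252 b'\xe4\xf6\xfc'
# cp850 b'\x84\x94\x81'               (command-window bytes)

console = b'\x84\x94\x81'
print(console.decode('cp850') == text)   # True
try:
    console.decode('utf-8')
except UnicodeDecodeError as exc:
    # 0x84 is 10xxxxxx, a UTF-8 continuation byte, so it cannot start a character.
    print(exc.reason, 'at', exc.start)   # invalid start byte at 0
```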

HTH,
John

Sep 21 '06 #2
