Using Unicode scripts

Hi,

I am writing my Python programs using a Unicode text editor. The files are
encoded in UTF-8. Python's default encoding seems to be Latin 1 (ISO-8859-1)
or maybe Windows-1252 (CP1252) which aren't compatible with UTF-8.

For example, if I type print "é", it prints é. If I use a unicode string:
a=u"é" and if I choose to encode it in UTF-8, I get 4 Latin 1 characters,
which makes sense if the interpreter thinks I typed in u"é".

How can I solve this problem?

Thank you

PS. I have no problem using Unicode strings in Python, I know how to
manipulate and convert them, I'm just looking for how to specify the default
encoding for the scripts I write.
Jul 18 '05 #1
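
The symptom described in the question can be reproduced at the byte level. The following is a minimal sketch in Python 3 (the thread itself concerns Python 2.x, so this only illustrates the effect and is not the poster's code); the variable names are made up for the example.

```python
# Python 3 sketch of UTF-8 bytes being misread as Latin 1.
e_acute = "\u00e9"                      # the character é, U+00E9

utf8_bytes = e_acute.encode("utf-8")    # two bytes: 0xC3 0xA9
misread = utf8_bytes.decode("latin-1")  # each byte read as one Latin 1 character

# Encoding the misread text to UTF-8 again gives 4 bytes instead of 2,
# which matches the "4 Latin 1 characters" seen in the question.
double_encoded = misread.encode("utf-8")

print(ascii(misread))         # '\xc3\xa9'
print(len(double_encoded))    # 4
```

The two characters in `misread` are exactly the "é" pair from the question.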
"yzzzzz" <yz****@netcour rier.com> writes:
> Hi,
>
> I am writing my Python programs using a Unicode text editor. The files are
> encoded in UTF-8. Python's default encoding seems to be Latin 1 (ISO-8859-1)
> or maybe Windows-1252 (CP1252) which aren't compatible with UTF-8.
>
> For example, if I type print "é", it prints é. If I use a unicode string:
> a=u"é" and if I choose to encode it in UTF-8, I get 4 Latin 1 characters,
> which makes sense if the interpreter thinks I typed in u"é".
>
> How can I solve this problem?
>
> Thank you
>
> PS. I have no problem using Unicode strings in Python, I know how to
> manipulate and convert them, I'm just looking for how to specify the default
> encoding for the scripts I write.

Use Python 2.3, and read PEP 263.

Thomas
Jul 18 '05 #2
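
As a rough illustration of what PEP 263 provides: a source file can declare its encoding in a comment on the first or second line. The sketch below is written for Python 3 and uses a hypothetical two-line script held in a bytes object. It relies on `tokenize.detect_encoding` and `compile`, both of which honour the declaration; the variable names are assumptions for the example.

```python
import io
import tokenize

# Hypothetical script contents: a PEP 263 declaration on the first line,
# then a unicode literal whose é is stored as the UTF-8 bytes 0xC3 0xA9.
source = b'# -*- coding: utf-8 -*-\na = u"\xc3\xa9"\n'

# detect_encoding looks for a UTF-8 BOM or a PEP 263 coding comment.
encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)

# Compiling the bytes decodes them with the declared encoding.
namespace = {}
exec(compile(source, "<example>", "exec"), namespace)

print(encoding)               # utf-8
print(len(namespace["a"]))    # 1
```

With the declaration in place the literal is read as a single character rather than two.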
yzzzzz wrote:
> Hi,

Hi "yzzzzz",

> I am writing my Python programs using a Unicode text editor. The files are
> encoded in UTF-8. Python's default encoding seems to be Latin 1 (ISO-8859-1)
> or maybe Windows-1252 (CP1252) which aren't compatible with UTF-8.
>
> For example, if I type print "é", it prints é. If I use a unicode string:
> a=u"é" and if I choose to encode it in UTF-8, I get 4 Latin 1 characters,
> which makes sense if the interpreter thinks I typed in u"é".
>
> How can I solve this problem?

You might want to read the thread on this list/newsgroup I started
yesterday called "Unicode problem".

Is it feasible for you to upgrade to Python 2.3? If so, I'd recommend you
do it already. 2.3 is pretty close to release now and it has support for
source files in Unicode format. If your Unicode editor saves the text
file with a BOM (it should), then under Python 2.3 your scripts will work
as expected.

> Thank you
>
> PS. I have no problem using Unicode strings in Python, I know how to
> manipulate and convert them, I'm just looking for how to specify the default
> encoding for the scripts I write.

See http://www.python.org/peps/pep-0263.html for how this is
implemented in Python 2.3.

-- Gerhard

Jul 18 '05 #3
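
The BOM route mentioned above can be sketched the same way. This is a Python 3 illustration, not Python 2.3 behaviour verified here; the script contents and variable names are hypothetical.

```python
import codecs
import io
import tokenize

# U+FEFF (ZERO WIDTH NO-BREAK SPACE) encoded as UTF-8 is the 3-byte BOM.
bom = "\ufeff".encode("utf-8")

# Hypothetical script saved with a BOM and no coding declaration.
source = bom + b'a = u"\xc3\xa9"\n'

# With a BOM present and no coding comment, the file is treated as UTF-8.
encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)

print(bom == codecs.BOM_UTF8)   # True
print(encoding)                 # utf-8-sig
```

This is also why pasting a ZWNBSP at the very start of a UTF-8 file has the same effect as letting the editor write the BOM.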
OK, problem solved!
I got the new Python, it all works. I just had to add the UTF-8 BOM myself
(UltraEdit doesn't do it by default) but that wasn't too difficult to do
(copy and paste a ZWNBSP).

One last question: I'm using Windows, so the console's encoding is CP437. If
I try to print a unicode string, the string is converted to CP437 and
printed, and that works fine. However, if I try to print a normal
(non-unicode) string from a UTF-8 encoded file with BOM, for example print
"é", it sends out the two UTF-8 bytes é, which appear as lines in the CP437
charset. But if I print the exact same character in a Latin 1 encoded file,
it comes out as the Latin 1 byte for "é", which shows up as a theta in CP437.
This means that Python doesn't take into account the specified encoding
(Latin 1 or UTF-8) and prints out the raw bytes as they appear in the source
file, regardless of the encoding used. Is this normal? (This isn't really a
problem for me as I am only going to use unicode strings now.)
Jul 18 '05 #4
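
The behaviour asked about in the last post appears consistent with PEP 263, which describes 8-bit string literals as being converted back to the file's own encoding, so a Python 2 byte string ends up holding the bytes as they appear in the source file. A Python 3 sketch of what then reaches a CP437 console, with made-up variable names:

```python
e_acute = "\u00e9"  # the character é

# A unicode string printed to a CP437 console is encoded to CP437 first.
as_cp437 = e_acute.encode("cp437")          # the single byte 0x82, é in CP437

# A byte string from a UTF-8 source file reaches the console as raw bytes,
# which the console then interprets as CP437.
utf8_on_console = e_acute.encode("utf-8").decode("cp437")

# The same literal from a Latin 1 source file is the single byte 0xE9.
latin1_on_console = e_acute.encode("latin-1").decode("cp437")

print(ascii(utf8_on_console))     # '\u251c\u2310'
print(ascii(latin1_on_console))   # '\u0398'
```

U+251C is a box-drawing character and U+2310 is a reversed not sign (the "lines"), and U+0398 is the capital theta.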
