Hi,
I am writing my python programs using a Unicode text editor. The files are
encoded in UTF-8. Python's default encoding seems to be Latin 1 (ISO-8859-1)
or maybe Windows-1252 (CP1252) which aren't compatible with UTF-8.
For example, if I type print "é", it prints Ã©. If I use a unicode string:
a=u"é" and if I choose to encode it in UTF-8, I get 4 Latin 1 characters,
which makes sense if the interpreter thinks I typed in u"Ã©".
How can I solve this problem?
Thank you
PS. I have no problem using Unicode strings in Python, I know how to
manipulate and convert them, I'm just looking for how to specify the default
encoding for the scripts I write.
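For readers following along, the symptom described in this post can be reproduced in a few lines. This is only a sketch, written in Python 3 syntax (the thread itself is about Python 2.2/2.3), and the variable names are illustrative. It takes the UTF-8 bytes of one accented letter, reads them as Latin 1, and encodes the misread text to UTF-8 again.

```python
# Sketch only: Python 3 syntax, illustrative variable names.
typed = "\u00e9"                          # the single character e-acute

utf8_bytes = typed.encode("utf-8")        # bytes a UTF-8 editor saves: C3 A9
misread = utf8_bytes.decode("latin-1")    # the same bytes read as Latin 1
reencoded = misread.encode("utf-8")       # encoding the misread text to UTF-8

print(utf8_bytes)        # b'\xc3\xa9'
print(ascii(misread))    # '\xc3\xa9' : two characters, A-tilde and copyright sign
print(len(reencoded))    # 4
```

The count of 4 is consistent with the "4 Latin 1 characters" mentioned in the post: each of the two misread characters takes two bytes in UTF-8.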
"yzzzzz" <yz****@netcour rier.com> writes: Hi,
[...] How can I solve this problem?
Use Python 2.3, and read PEP 263.
Thomas
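As a rough illustration of what PEP 263 provides: a comment such as `# -*- coding: utf-8 -*-` on the first or second line of a source file tells the interpreter how to decode that file. The sketch below is in Python 3 syntax and is not taken from the thread. `run_source` is a hypothetical helper that compiles raw source bytes, so that the same bytes can be tried under two different declarations.

```python
# Sketch only: Python 3 syntax; run_source is a hypothetical helper.
def run_source(source_bytes):
    """Compile raw source bytes and return the value bound to `text`."""
    namespace = {}
    exec(compile(source_bytes, "<demo>", "exec"), namespace)
    return namespace["text"]

body = b'text = "\xc3\xa9"\n'            # the UTF-8 bytes of e-acute in a literal

as_utf8 = run_source(b"# -*- coding: utf-8 -*-\n" + body)
as_latin1 = run_source(b"# -*- coding: latin-1 -*-\n" + body)

print(len(as_utf8))      # 1
print(len(as_latin1))    # 2
```

With the UTF-8 declaration the literal is one character; with the Latin 1 declaration the same two bytes become two characters, which is the mismatch the original question describes.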
Hi "yzzzzz",
yzzzzz wrote: [...] How can I solve this problem?
You might want to read the thread I started on this list/newsgroup
yesterday, called "Unicode problem".
Is it feasible for you to upgrade to Python 2.3? If so, I'd recommend you
do it now. 2.3 is pretty close to release and it has support for
source files in Unicode format. If your Unicode editor saves the text
file with a BOM (it should) then under Python 2.3 your scripts will work
as expected.
yzzzzz wrote: PS. [...] I'm just looking for how to specify the default encoding for the scripts I write.
See http://www.python.org/peps/pep-0263.html (PEP 263). This is how it is
implemented in Python 2.3.
-- Gerhard
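To make the BOM remark concrete, here is a small sketch in Python 3 syntax (an assumption, since the thread predates Python 3). It uses the standard `codecs` and `tokenize` modules to show what the UTF-8 BOM is and that Python's source-encoding detection recognizes it even when no coding comment is present. The sample source bytes are made up for the example.

```python
# Sketch only: Python 3 syntax; the sample source is invented for illustration.
import codecs
import io
import tokenize

source = b'text = "\xc3\xa9"\n'           # UTF-8 source with no coding comment
with_bom = codecs.BOM_UTF8 + source       # same source, BOM prepended

# The UTF-8 BOM is just U+FEFF (ZWNBSP) encoded as UTF-8.
print(codecs.BOM_UTF8 == "\ufeff".encode("utf-8"))   # True

encoding, _ = tokenize.detect_encoding(io.BytesIO(with_bom).readline)
print(encoding)                                      # utf-8-sig
```

`utf-8-sig` is the codec name Python uses for "UTF-8 with a leading BOM", so the BOM alone is enough to identify the file as UTF-8.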
OK, problem solved!
I got the new Python, it all works. I just had to add the UTF-8 BOM myself
(UltraEdit doesn't do it by default) but that wasn't too difficult to do
(copy and paste a ZWNBSP).
One last question: I'm using Windows, so the console's encoding is CP437. If
I try to print a unicode string, the string is converted to CP437 and
printed and that works fine. However if I try to print a normal
(non-unicode) string from a UTF-8 encoded file with BOM, for example print
"é", it sends out the two UTF-8 bytes Ã© which appear as lines in the CP437
charset. But if I print the exact same character in a Latin 1 encoded file,
it comes out as the Latin 1 byte for "é" which shows up as a theta in CP437.
This means that Python doesn't take into account the specified encoding
(Latin 1 or UTF-8) and prints out the raw bytes as they appear in the source
file, regardless of the encoding used. Is this normal? (this isn't really a
problem for me as I am only going to use unicode strings now)
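The byte-level effect described in this last question can be checked with a short sketch. It is written in Python 3 syntax, where byte strings and text are separate types, so it only imitates what a CP437 console does with the bytes of a Python 2 plain string. CP437 is the console encoding named in the post; everything else is illustrative.

```python
# Sketch only: Python 3 syntax; CP437 is the console encoding named in the post.
import unicodedata

char = "\u00e9"                                         # e-acute

# A unicode string is converted to the console encoding before printing.
print(char.encode("cp437"))                             # b'\x82'

# Raw bytes from the source file, shown the way a CP437 console would draw them.
from_utf8_file = char.encode("utf-8").decode("cp437")
from_latin1_file = char.encode("latin-1").decode("cp437")

print([unicodedata.name(c) for c in from_utf8_file])
print(unicodedata.name(from_latin1_file))               # GREEK CAPITAL LETTER THETA
```

The two names printed for the UTF-8 case are a box-drawing character and a reversed not sign, which fits the "lines" seen on the console. The Latin 1 byte maps to a theta, as reported. In both cases the console is simply drawing the bytes that were in the source file.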