Hi,
I am writing my python programs using a Unicode text editor. The files are
encoded in UTF-8. Python's default encoding seems to be Latin 1 (ISO-8859-1)
or maybe Windows-1252 (CP1252) which aren't compatible with UTF-8.
For example, if I type print "é", it prints é. If I use a unicode string:
a=u"é" and if I choose to encode it in UTF-8, I get 4 Latin 1 characters,
which makes sense if the interpreter thinks I typed in u"é".
How can I solve this problem?
Thank you
PS. I have no problem using Unicode strings in Python, I know how to
manipulate and convert them, I'm just looking for how to specify the default
encoding for the scripts I write.
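The symptoms in this question can be reproduced at the byte level. What follows is only a minimal sketch, written in Python 3 syntax (an assumption, since the thread is about Python 2.2/2.3), where byte strings and text are separate types. The variable names are illustrative.

```python
# "é" (U+00E9) saved in a UTF-8 file is stored as two bytes: 0xC3 0xA9.
utf8_bytes = "\u00e9".encode("utf-8")

# An interpreter that assumes Latin 1 reads those two bytes as two characters.
misread = utf8_bytes.decode("latin-1")

# Encoding the misread text as UTF-8 again yields four bytes, which matches
# the "4 Latin 1 characters" described in the question.
double_encoded = misread.encode("utf-8")

# ascii() keeps the output printable on any console encoding.
print(len(utf8_bytes), ascii(misread), len(double_encoded))
```

The point of the sketch is that nothing is wrong with the file itself: the same two bytes are simply being interpreted with the wrong table.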
"yzzzzz" <yz****@netcourrier.com> writes:
> I'm just looking for how to specify the default encoding for the
> scripts I write.
Use Python 2.3, and read PEP 263.
Thomas
yzzzzz wrote:
> Hi,
Hi "yzzzzz",
> I am writing my python programs using a Unicode text editor. The files are
> encoded in UTF-8. Python's default encoding seems to be Latin 1 (ISO-8859-1)
> or maybe Windows-1252 (CP1252) which aren't compatible with UTF-8.
> For example, if I type print "é", it prints é. If I use a unicode string:
> a=u"é" and if I choose to encode it in UTF-8, I get 4 Latin 1 characters,
> which makes sense if the interpreter thinks I typed in u"é".
> How can I solve this problem?
You might want to read the thread I started yesterday on this
list/newsgroup, called "Unicode problem".
Is it feasible for you to upgrade to Python 2.3? If so I'd recommend you
do it already. 2.3 is pretty close to release now and it has support for
source files in Unicode format. If your Unicode editor saves the text
file with a BOM (it should) then under Python 2.3 your scripts will work
as expected.
> Thank you
> PS. I have no problem using Unicode strings in Python, I know how to
> manipulate and convert them, I'm just looking for how to specify the default
> encoding for the scripts I write.
See http://www.python.org/peps/pep-0263.html. This is how it is
implemented in Python 2.3.
-- Gerhard
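PEP 263 lets a source file state its encoding either with a special comment on the first or second line or with a leading UTF-8 BOM. As a rough illustration only (Python 3, using the standard library's tokenize.detect_encoding; the two sample scripts and the helper name declared_encoding are made up for this sketch):

```python
import io
import tokenize

# Hypothetical script that declares its encoding with a PEP 263 comment.
with_cookie = b"# -*- coding: latin-1 -*-\ns = '\xe9'\n"

# Hypothetical script with no declaration, saved as UTF-8 with a leading BOM.
with_bom = b"\xef\xbb\xbfs = '\xc3\xa9'\n"


def declared_encoding(source_bytes):
    """Return the encoding inferred from a BOM or a coding comment."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


print(declared_encoding(with_cookie))
print(declared_encoding(with_bom))
```

In a real script the only thing needed is the comment line itself (for example "# -*- coding: utf-8 -*-") at the top of the file, or the BOM written by the editor.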
OK, problem solved!
I got the new Python, it all works. I just had to add the UTF-8 BOM myself
(UltraEdit doesn't do it by default) but that wasn't too difficult to do
(copy and paste a ZWNBSP).
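For editors that do not write the BOM themselves, the same result as pasting a ZWNBSP can be had by prepending the three BOM bytes to the file contents. A minimal sketch, in Python 3 syntax and with a hypothetical helper name (add_utf8_bom) and made-up file contents:

```python
import codecs


def add_utf8_bom(data):
    """Return data with a UTF-8 BOM prepended, unless one is already there."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data


# Hypothetical file contents, as read in binary mode.
source = b"print('hello')\n"
marked = add_utf8_bom(source)

# The BOM is just U+FEFF (ZWNBSP) encoded as UTF-8.
zwnbsp_bytes = "\ufeff".encode("utf-8")
print(marked[:3] == zwnbsp_bytes)
```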
One last question: I'm using Windows, so the console's encoding is CP437. If
I try to print a unicode string, the string is converted to CP437 and
printed and that works fine. However if I try to print a normal
(non-unicode) string from a UTF-8 encoded file with BOM, for example print
"é", it sends out the two UTF-8 bytes é which appear as lines in the CP437
charset. But if I print the exact same character in a Latin 1 encoded file,
it comes out as the Latin 1 byte for "é" which shows up as a theta in CP437.
This means that Python doesn't take into account the specified encoding
(Latin 1 or UTF-8) and prints out the raw bytes as they appear in the source
file, regardless of the encoding used. Is this normal? (this isn't really a
problem for me as I am only going to use unicode strings now)
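The console behaviour described in this last question can be checked by decoding the raw bytes with the CP437 table. This is a sketch in Python 3 syntax (an assumption; in Python 2 the plain string literal is the byte string), and the variable names are illustrative:

```python
utf8_bytes = b"\xc3\xa9"   # "é" as stored in a UTF-8 source file
latin1_byte = b"\xe9"      # "é" as stored in a Latin 1 source file

# A CP437 console interprets raw bytes using its own table.
shown_utf8 = utf8_bytes.decode("cp437")      # two line-like glyphs
shown_latin1 = latin1_byte.decode("cp437")   # Greek capital theta

# A unicode string is converted for the console instead,
# which gives CP437's own byte for "é".
converted = "\u00e9".encode("cp437")

# ascii() keeps the output printable on any console encoding.
print(ascii(shown_utf8), ascii(shown_latin1), converted)
```

This is consistent with what was observed: byte strings are written out unchanged, and only unicode strings are converted to the console's encoding.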