Using Unicode scripts

yzzzzz

Hi,

I am writing my python programs using a Unicode text editor. The files are
encoded in UTF-8. Python's default encoding seems to be Latin 1 (ISO-8859-1)
or maybe Windows-1252 (CP1252) which aren't compatible with UTF-8.

For example, if I type print "é", it prints Ã©. If I use a unicode string:
a=u"é" and if I choose to encode it in UTF-8, I get 4 Latin 1 characters,
which makes sense if the interpreter thinks I typed in u"Ã©".

How can I solve this problem?

Thank you

PS. I have no problem using Unicode strings in Python, I know how to
manipulate and convert them, I'm just looking for how to specify the default
encoding for the scripts I write.

Jul 18 '05 #1

Subscribe Post Reply

2762

Thomas Heller

"yzzzzz" <yz****@netcourrier.com> writes:

Hi,

I am writing my python programs using a Unicode text editor. The files are
encoded in UTF-8. Python's default encoding seems to be Latin 1 (ISO-8859-1)
or maybe Windows-1252 (CP1252) which aren't compatible with UTF-8.

For example, if I type print "é", it prints Ã©. If I use a unicode string:
a=u"é" and if I choose to encode it in UTF-8, I get 4 Latin 1 characters,
which makes sense if the interpreter thinks I typed in u"Ã©".

How can I solve this problem?

Thank you

PS. I have no problem using Unicode strings in Python, I know how to
manipulate and convert them, I'm just looking for how to specify the default
encoding for the scripts I write.

Use Python 2.3, and read PEP 263.

Thomas

Jul 18 '05 #2

Gerhard Häring

yzzzzz wrote:

Hi,
Hi "yzzzzz",
I am writing my python programs using a Unicode text editor. The files are
encoded in UTF-8. Python's default encoding seems to be Latin 1 (ISO-8859-1)
or maybe Windows-1252 (CP1252) which aren't compatible with UTF-8.

For example, if I type print "é", it prints Ã©. If I use a unicode string:
a=u"é" and if I choose to encode it in UTF-8, I get 4 Latin 1 characters,
which makes sense if the interpreter thinks I typed in u"Ã©".

How can I solve this problem?
You might want to read the thread on this list/newsgroup I started
yesterday called "Unicode problem"

Is it feasible for you to upgrade to Python 2.3? If so I'd recommend you
do it already. 2.3 is pretty close to release now and it has support for
source files in Unicode format. If your Unicode editor saves the text
file with a BOM (it should) then under Python 2.3 your scripts will work
as expected.
Thank you

PS. I have no problem using Unicode strings in Python, I know how to
manipulate and convert them, I'm just looking for how to specify the default
encoding for the scripts I write.

See http://www.python.org/peps/pep-0263.html This is how it is
implemented in Python 2.3.

-- Gerhard

Jul 18 '05 #3

yzzzzz

OK, problem solved!
I got the new Python, it all works. I just had to add the UTF-8 BOM myself
(UltraEdit doesn't do it by default) but that wasn't too difficult to do
(copy and paste a ZWNBSP).

One last question: I'm using windows, so the console's encoding is CP437. If
I try to print a unicode string, the string is converted to CP437 and
printed and that works fine. However if I try to print a normal
(non-unicode) string from a UTF-8 encoded file with BOM, for example print
"é", it sends out the two UTF-8 bytes Ã© which appear as lines in the CP437
charset. But if I print the exact same character in a Latin 1 encoded file,
it comes out as the Latin 1 byte for "é" which shows up as a theta in CP437.
This means that Python doesn't take into account the specified encoding
(Latin 1 or UTF-8) and prints out the raw bytes as they appear in the source
file, regardless of the encoding used. Is this normal? (this isn't really a
problem for me as I am only going to use unicode strings now)

Jul 18 '05 #4

Similar topics

convert Unicode to lower/uppercase?

by: Hallvard B Furuseth | last post by:

Has someone got a Python routine or module which converts Unicode strings to lowercase (or uppercase)? What I actually need to do is to compare a number of strings in a case-insensitive manner,...

Python

Becoming Unicode Aware

by: Michael Foord | last post by:

I'm trying to become 'unicode-aware'... *sigh*. What's that quote - 'a native speaker of ascii will never learn to speak unicode like a native'. The trouble is I think I've been a native speaker of...

Python

unicode string literals and "u" prefix

by: nico | last post by:

In my python scripts, I use a lot of accented characters as I work in french. In order to do this, I put the line # -*- coding: UTF-8 -*- at the beginning of the script file. Then, when I need...

Python

csv module and unicode, when or workaround?

by: Chris | last post by:

hi, to convert excel files via csv to xml or whatever I frequently use the csv module which is really nice for quick scripts. problem are of course non ascii characters like german umlauts, EURO...

Python

[perl-python] unicode study with unicodedata module

by: Xah Lee | last post by:

python has this nice unicodedata module that deals with unicode nicely. #-*- coding: utf-8 -*- # python from unicodedata import * # each unicode char has a unique name. # one can use the...

Python

Unicode font in international application

by: Joerg | last post by:

I am in the process of creating an international GUI application with C# on ..NET1.1 (Win2k), which is supposed to implement a particular look/design. In order to achieve this, I plan amongst...

C# / C Sharp

encodeURI and unicode

by: Csaba Gabor | last post by:

If I do alert(encodeURI(String.fromCharCode(250))); (in FF 1.5+ or IE6 on my winXP Pro) then I get: %C3%BA Now I was sort of expecting something like %u... (and a single (4 digit?) unicode hex...

Javascript

unicode "table of character" implementation in python

by: Nicolas Pontoizeau | last post by:

Hi, I am handling a mixed languages text file encoded in UTF-8. Theres is mainly French, English and Asian languages. I need to detect every asian characters in order to enclose it by a special...

Python

Using findstr on SQL 2005 ERRORLOG file

by: Teresa Masino | last post by:

We have set up a couple of SQL Server 2005 systems and I have found that the format of the ERRORLOG files and the SQL Agent's log files are Unicode or some format that findstr cannot parse...

Microsoft SQL Server

Easy Steps to Fix "Canon Printer Won't Connect to WiFi Network"

by: taylorcarr | last post by:

A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...

General

Basic Javascript concepts

by: aa123db | last post by:

Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...

Javascript

Batch import of multiple excel files into the database

by: ryjfgjl | last post by:

If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...

Data Management

Merging data from multiple Excel files

by: ryjfgjl | last post by:

In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...

Data Management

Migrating Website to Cloud - Emmanuel Katto

by: emmanuelkatto | last post by:

Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel

General

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

General

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing