How to use 8bit character sets?

copx

For some reason Python (on Windows) doesn't use the system's default
character set and that's a serious problem for me.
I need to process German textfiles (containing umlauts and other > 7bit
ASCII characters) and generally work with strings which need to be processed
using the local encoding (I need to display the text using a Tk-based GUI
for example). The only solution I managed to find was converting between
unicode and latin-1 all the time (the textfiles aren't unicode, the output
of the program isn't supposed to be unicode either). Everything worked fine
until I tried to run the program on a Windows 9x machine. It seems that
Python on Win9x doesn't really support unicode (IIRC Win9x doesn't have real
unicode support so that's not surprising).
Is it possible to tell Python to use an 8bit charset (latin-1 in my case)
for textfile and string processing by default?

copx

Jul 19 '05 #1

Chris Curvey

Check out sitecustomize.py.

http://diveintopython.org/xml_processing/unicode.html

Jul 19 '05 #2

copx

"Chris Curvey" <cc*****@gmail.com> wrote in message
news:11**********************@g14g2000cwa.googlegroups.com...

> Check out sitecustomize.py.
>
> http://diveintopython.org/xml_processing/unicode.html

Thanks, but I'm looking for a way to do this at the application level (i.e. I
want my app to run in an unmodified interpreter environment).

copx

Jul 19 '05 #3

John Machin

copx wrote:

> For some reason Python (on Windows) doesn't use the system's default
> character set and that's a serious problem for me.
> I need to process German textfiles (containing umlauts and other > 7bit
> ASCII characters) and generally work with strings which need to be processed
> using the local encoding (I need to display the text using a Tk-based GUI
> for example). The only solution I managed to find was converting between
> unicode and latin-1 all the time (the textfiles aren't unicode, the output
> of the program isn't supposed to be unicode either). Everything worked fine
> until I tried to run the program on a Windows 9x machine. It seems that
> Python on Win9x doesn't really support unicode (IIRC Win9x doesn't have real
> unicode support so that's not surprising).
> Is it possible to tell Python to use an 8bit charset (latin-1 in my case)
> for textfile and string processing by default?
>
> copx

1. Your description of your problem is extremely vague. If you were to
supply a minimal script that "works" [on what platform?? what version of
Python??], with a description of what you understand by "works", and
what happens differently when you run that script on a Win9x box [for
what value(s) of x?? what version of Python??], we might be able to help
you. N.B. somewhere near the top of the script you should have something
like:

import sys
print "Python version:", sys.version
print "platform:", sys.platform
print "default encoding:", sys.getdefaultencoding()
try:
    print "Windows version:", sys.getwindowsversion()
except AttributeError:
    print "sys.getwindowsversion not available"

2. You should read this:

http://www.catb.org/~esr/faqs/smart-questions.html

3. You should not rely on a crutch like a default encoding, especially
one obtained by a kludge like sitecustomize.py. If your app expects to
receive data in encoding x and send data in encoding y, these facts are
properties of the application and the data, NOT the box you are running
on. If you had a requirement to read MacCyrillic from a Classic Mac and
write KOI8 for consumption on a Windows PC, you should be able to do it
on a SPARC Solaris box in Timbuktu or Walla Walla, Wa., without having
to fiddle with site-wide configuration.

4. AFAIK, support for Unicode is provided by Python with no assistance
from the operating system. The multitudinous deficiencies in Win9x
should have no bearing on the problem. Have you tried to run your
program on a Win2K or WinXP box?
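
Point 3 can be illustrated with a minimal sketch in which both encodings are explicit arguments of the application code rather than interpreter-wide defaults. It assumes a current Python 3 interpreter; on the Python 2 releases discussed in this thread, codecs.open played the role that io.open plays here. The function name transcode, the file names, and the choice of KOI8-R as the concrete KOI8 variant are assumptions made for illustration only.

```python
import io
import os
import tempfile


def transcode(src_path, src_encoding, dst_path, dst_encoding):
    """Copy a text file, decoding with src_encoding and re-encoding with dst_encoding."""
    # newline="" disables newline translation, so line endings are copied unchanged.
    with io.open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with io.open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        dst.write(text)


# Demo: MacCyrillic in, KOI8-R out, independent of the machine's locale.
sample = u"\u041f\u0440\u0438\u0432\u0435\u0442"  # the Russian word "Privet"
workdir = tempfile.mkdtemp()
src_file = os.path.join(workdir, "in_maccyrillic.txt")  # hypothetical file name
dst_file = os.path.join(workdir, "out_koi8r.txt")       # hypothetical file name

# Create the input file as raw MacCyrillic bytes.
with open(src_file, "wb") as f:
    f.write(sample.encode("mac_cyrillic"))

transcode(src_file, "mac_cyrillic", dst_file, "koi8_r")

# Read the result back as raw bytes and check that it decodes as KOI8-R.
with open(dst_file, "rb") as f:
    result_bytes = f.read()

print(result_bytes.decode("koi8_r") == sample)
```

Because the encodings travel with the call, the same script should behave identically on any box, which is the property point 3 asks for.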

HTH,

John

Jul 19 '05 #4

Martin v. Löwis

copx wrote:

> For some reason Python (on Windows) doesn't use the system's default
> character set and that's a serious problem for me.

I very much doubt this statement: Python does "use" the system's default
character set on Windows. What makes you think it doesn't?

> Is it possible to tell Python to use an 8bit charset (latin-1 in my case)
> for textfile and string processing by default?

That is the default.

Regards,
Martin

Jul 19 '05 #5

John Roth

"Martin v. Löwis" <ma****@v.loewis.de> wrote in message
news:42***********************@news.freenet.de...

> copx wrote:
>> For some reason Python (on Windows) doesn't use the system's default
>> character set and that's a serious problem for me.
>
> I very much doubt this statement: Python does "use" the system's default
> character set on Windows. What makes you think it doesn't?
>
>> Is it possible to tell Python to use an 8bit charset (latin-1 in my case)
>> for textfile and string processing by default?
>
> That is the default.

As far as I can tell, there are actually two defaults, which tends
to confuse things. One is used whenever a unicode to 8-bit
conversion is needed on output to stdout, stderr or similar;
that's usually Latin-1 (or whatever the installation has set up.)
The other is used whenever the unicode to 8-bit conversion
doesn't have a context - that's usually Ascii-7.
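
For anyone who wants to check which defaults are in play on a particular machine, here is a small sketch that prints them. It assumes a current Python 3 interpreter, where the reported values will generally differ from the Python 2 behaviour described above; the variable names are only illustrative.

```python
import locale
import sys

# Default used by str.encode() / bytes.decode() when no encoding is given
# (on Python 2 this was also the encoding used for implicit conversions).
implicit_default = sys.getdefaultencoding()

# Encoding attached to the stdout stream. getattr is used because stdout may
# have been replaced by an object that has no such attribute.
stdout_encoding = getattr(sys.stdout, "encoding", None)

# Encoding suggested by the locale settings (on Windows, the ANSI code page).
locale_encoding = locale.getpreferredencoding(False)

print("implicit default:", implicit_default)
print("stdout encoding: ", stdout_encoding)
print("locale encoding: ", locale_encoding)
```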

John Roth

> Regards,
> Martin

Jul 19 '05 #6

Martin v. Löwis

John Roth wrote:

>> That is the default.
>
> As far as I can tell, there are actually two defaults, which tends
> to confuse things.

Notice that there are two defaults already in the operating system:
Windows has the notion of the "ANSI code page" and the "OEM code
page", which are used in different contexts.
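
To make the difference concrete, here is a short sketch showing how the same umlaut maps to different bytes under the two code pages. It assumes cp1252 as the ANSI code page and cp850 as the OEM code page, which is the usual pairing on a Western European (e.g. German) Windows installation; other installations may use other pairs.

```python
umlaut = u"\u00e4"  # LATIN SMALL LETTER A WITH DIAERESIS

ansi_bytes = umlaut.encode("cp1252")  # assumed ANSI code page
oem_bytes = umlaut.encode("cp850")    # assumed OEM code page

# The two code pages assign different byte values to the same character.
print(repr(ansi_bytes), repr(oem_bytes))

# Decoding ANSI bytes with the OEM code page yields a different character.
misread = ansi_bytes.decode("cp850")
print(misread == umlaut)
```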

> One is used whenever a unicode to 8-bit
> conversion is needed on output to stdout, stderr or similar;
> that's usually Latin-1 (or whatever the installation has set up.)

You mean, in Python? No, this is not how it works. On output
of 8-bit strings to stdout, no conversion is ever performed:
the byte strings are written to stdout as-is.
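
A minimal sketch of that behaviour, written for a current Python 3 interpreter (an assumption, since the statement above is about Python 2 byte strings): text is encoded once, explicitly, and the resulting bytes pass through a binary stream unchanged. An io.BytesIO object stands in for stdout here; on Python 3 the binary side of the real stream would be sys.stdout.buffer.

```python
import io

text = u"Gr\u00fc\u00dfe"      # the German word "Grüße"
data = text.encode("latin-1")  # explicit 8-bit representation

# Stand-in for a binary output stream; nothing is converted on write.
stream = io.BytesIO()
stream.write(data)

written = stream.getvalue()
print(written == data)
```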

> The other is used whenever the unicode to 8-bit conversion
> doesn't have a context - that's usually Ascii-7.

Again, you seem to be talking about Unicode conversions -
it's not clear that the OP is actually interested in
Unicode conversion in the first place.

Regards,
Martin

Jul 19 '05 #7

John Roth

"Martin v. Löwis" <ma****@v.loewis.de> wrote in message
news:42************@v.loewis.de...

> John Roth wrote:
>>> That is the default.
>>
>> As far as I can tell, there are actually two defaults, which tends
>> to confuse things.
>
> Notice that there are two defaults already in the operating system:
> Windows has the notion of the "ANSI code page" and the "OEM code
> page", which are used in different contexts.
>
>> One is used whenever a unicode to 8-bit
>> conversion is needed on output to stdout, stderr or similar;
>> that's usually Latin-1 (or whatever the installation has set up.)
>
> You mean, in Python? No, this is not how it works. On output
> of 8-bit strings to stdout, no conversion is ever performed:
> the byte strings are written to stdout as-is.

That's true, but I was talking about outputting unicode strings,
not 8-bit strings. As you say below, the OP may not have
been talking about that.

>> The other is used whenever the unicode to 8-bit conversion
>> doesn't have a context - that's usually Ascii-7.
>
> Again, you seem to be talking about Unicode conversions -
> it's not clear that the OP is actually interested in
> Unicode conversion in the first place.
>
> Regards,
> Martin

John Roth

Jul 19 '05 #8
