Faulty encoding settings

How do I cope with faulty encoding settings?

I'm writing an application that needs all internal character data
to be stored in iso-8859-1. It also must allow input and output
using stdin and stdout.

This works just fine with the Windows binary of Python.
sys.stdin.encoding is correctly set to the encoding of the
current terminal ('cp437').

import sys

# Python 2: readline() returns a byte string in the terminal's encoding.
s = sys.stdin.readline()
# Convert to iso-8859-1.
s = s.decode(sys.stdin.encoding).encode('iso-8859-1')

Granted, users are constrained to entering characters in the
cp437 charset, but that's better than the following.
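
One caveat with this construction: cp437 contains characters (box-drawing characters, for instance) that have no iso-8859-1 equivalent, so the `.encode('iso-8859-1')` step can raise `UnicodeEncodeError`. Below is a minimal sketch of one way to handle that, assuming the input arrives as raw bytes. The helper name `to_latin1` and the choice of `errors='replace'` are illustrative assumptions, not part of the original post.

```python
def to_latin1(raw, source_encoding, errors='replace'):
    """Re-encode raw bytes from source_encoding to iso-8859-1.

    Hypothetical helper. With errors='replace', characters that have
    no iso-8859-1 equivalent are replaced with '?'.
    """
    text = raw.decode(source_encoding)
    return text.encode('iso-8859-1', errors)


# 0x82 is e-acute in cp437; the same character is 0xE9 in iso-8859-1.
converted = to_latin1(b'caf\x82', 'cp437')

# 0xC9 is a box-drawing character in cp437 with no iso-8859-1
# equivalent, so it is replaced under errors='replace'.
lossy = to_latin1(b'\xc9', 'cp437')
```

Passing `errors='strict'` instead would make the unmappable case raise, which may be preferable if silent data loss is unacceptable.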

The Cygwin binary I have (2.4.3) reports sys.stdin.encoding as
'US-ASCII', which is quite wrong. A Cygwin terminal uses, as far
as I can tell, iso-8859-1. This renders the above construction
useless if the user enters any character with a code above 127.
Using raw_input instead of readline addresses the problem by making
it impossible to enter non-ASCII text.
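
The failure described here can be reproduced without a Cygwin terminal. The sketch below assumes the terminal really sends iso-8859-1 bytes and uses a hard-coded sample line in place of real input. It is written to run on a current Python rather than 2.4.

```python
# Sample input line: "naive" spelled with i-diaeresis, which is the
# single byte 0xEF in iso-8859-1.
raw = b'na\xefve\n'

# Decoding with the reported encoding fails on the first byte > 127.
try:
    raw.decode('us-ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding with the encoding the terminal actually uses succeeds.
latin1_text = raw.decode('iso-8859-1')
```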

Please advise.

This is only a temporary problem, as eventually this application
will use Tkinter as an interface instead. But of course then I'll
probably have a bunch of new problems. ;)

--
Neil Cerutti
Oct 17 '06 #1


In <sl*******************@FIAD06.norwich.edu>, Neil Cerutti wrote:
> I'm writing an application that needs all internal character data
> to be stored in iso-8859-1. It also must allow input and output
> using stdin and stdout.
>
> This works just fine with the Windows binary of Python.
> sys.stdin.encoding is correctly set to the encoding of the
> current terminal ('cp437').
>
> s = sys.stdin.readline()
> # Convert to iso-8859-1.
> s = s.decode(sys.stdin.encoding).encode('iso-8859-1')
>
> Granted, users are constrained to entering characters in the
> cp437 charset, but that's better than the following.
>
> The Cygwin binary I have (2.4.3) reports sys.stdin.encoding as
> 'US-ASCII', which is quite wrong. A Cygwin terminal uses, as far
> as I can tell, iso-8859-1. This renders the above construction
> useless if the user enters any character codes above 128.
> Using raw_input instead of readline addresses the problem by making
> it impossible to enter non-ascii text.
>
> Please advise.

Give the user the ability to explicitly give an encoding. Using the
encoding attribute of files is quite fragile. If you redirect stdin or
stdout the encoding is set to None for example because the interpreter
can't tell what encoding the "other side" of the redirection produces or
expects.
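
A minimal sketch of that suggestion follows: prefer an encoding the user supplies, then whatever the stream reports, then a fixed default. The function name `choose_encoding`, the `--encoding` option, and the iso-8859-1 default are all assumptions made for illustration. The example uses `argparse`, so it targets a current Python rather than the 2.4 discussed in the thread.

```python
import argparse
import io


def choose_encoding(stream, override=None, default='iso-8859-1'):
    """Pick an encoding: explicit override first, then the stream's
    own encoding attribute, then a default (all names hypothetical)."""
    if override:
        return override
    # stream.encoding may be missing or None, e.g. when redirected.
    return getattr(stream, 'encoding', None) or default


parser = argparse.ArgumentParser()
# '--encoding' is an illustrative option name, not from the thread.
parser.add_argument('--encoding', default=None)

# Simulated command line; a real program would call parse_args().
args = parser.parse_args(['--encoding', 'cp437'])
explicit = choose_encoding(io.BytesIO(), args.encoding)

# No override and no usable stream encoding: fall back to the default.
fallback = choose_encoding(io.BytesIO())
```

In a real program the first argument would be `sys.stdin` or `sys.stdout`; `io.BytesIO()` stands in here as a stream with no `encoding` attribute.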

BTW, US-ASCII isn't wrong, just limiting, as everything in the ASCII
range is the same in ISO-8859-1.
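
That the two encodings agree on the ASCII range can be checked directly. A small sketch:

```python
# All 128 byte values in the ASCII range (0x00 through 0x7F).
ascii_bytes = bytes(bytearray(range(128)))

# Both codecs map every one of these bytes to the same character.
same = ascii_bytes.decode('us-ascii') == ascii_bytes.decode('iso-8859-1')
```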

Ciao,
Marc 'BlackJack' Rintsch
Oct 17 '06 #2

Neil Cerutti wrote:
> The Cygwin binary I have (2.4.3) reports sys.stdin.encoding as
> 'US-ASCII', which is quite wrong. A Cygwin terminal uses, as far
> as I can tell, iso-8859-1. This renders the above construction
> useless if the user enters any character codes above 128.
> Using raw_input instead of readline addresses the problem by making
> it impossible to enter non-ascii text.
>
> Please advise.

In principle, setting the LANG environment variable should help.
Unfortunately, Cygwin doesn't implement locales correctly (neither
in the Unix way, nor in the Windows way), hence Python's machinery
fails.

If you believe that a Cygwin terminal always uses Latin-1 (try
entering €, though - it could be windows-1252 instead), you should
be able to hard-code that, by determining that it is a Cygwin
Python, or that you are running in a Cygwin terminal.
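
A rough sketch of the hard-coding approach, assuming that checking `sys.platform` is an acceptable way to detect a Cygwin Python. The function name `guess_terminal_encoding` and its parameters are hypothetical. The last two lines show one byte where the two candidate encodings differ: windows-1252 maps 0x80 to the euro sign, while iso-8859-1 maps it to a control character.

```python
import sys


def guess_terminal_encoding(platform=None, reported=None):
    """Return 'iso-8859-1' on a Cygwin Python, else the reported value.

    Hypothetical helper; the Latin-1 choice is the assumption under
    discussion, not an established fact about Cygwin terminals.
    """
    if platform is None:
        platform = sys.platform
    if platform.startswith('cygwin'):
        return 'iso-8859-1'
    return reported


on_cygwin = guess_terminal_encoding('cygwin', 'US-ASCII')
on_windows = guess_terminal_encoding('win32', 'cp437')

# Byte 0x80 tells the two candidate encodings apart.
as_cp1252 = b'\x80'.decode('windows-1252')  # euro sign, U+20AC
as_latin1 = b'\x80'.decode('iso-8859-1')    # control character, U+0080
```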

Regards,
Martin
Oct 17 '06 #3

On 2006-10-17, Marc 'BlackJack' Rintsch <bj****@gmx.net> wrote:
> In <sl*******************@FIAD06.norwich.edu>, Neil Cerutti wrote:
>> I'm writing an application that needs all internal character data
>> to be stored in iso-8859-1. It also must allow input and output
>> using stdin and stdout.
>
> Give the user the ability to explicitly give an encoding.
> Using the encoding attribute of files is quite fragile. If you
> redirect stdin or stdout the encoding is set to None for
> example because the interpreter can't tell what encoding the
> "other side" of the redirection produces or expects.

Thanks for that sensible idea.

On the other hand, if Python's implementors couldn't figure out
what the encoding is, I doubt the average user has a prayer. ;-)

--
Neil Cerutti
Oct 17 '06 #4
