
codecs latin1 unicode standard output file

Hello,

with Python 2.3 I can write umlauts (a,o,u umlaut) to a file with this piece
of code:

import codecs

f = codecs.open("klotentest.txt", "w", "latin-1")
print >>f, unicode("My umlauts are ä, ö, ü", "latin-1")

This works fine, but it is not exactly what I want. I would like to write this
to standard output, so that I can use the same code to produce output lines on
the console or pipe them into a file. This was possible before Python 2.3.
Isn't it possible anymore with the same code?
--
Marko Faldix
M+R Infosysteme
Hubert-Wienen-Str. 24 52070 Aachen
Tel.: 0241-93878-16 Fax.:0241-875095
E-Mail: markopointfaldix@mplusrpointde
Jul 18 '05 #1
8 Replies


"Marko Faldix" <ma********************@mplusr.de> writes:
Hello,

with Python 2.3 I can write umlauts (a,o,u umlaut) to a file with this piece
of code:

import codecs

f = codecs.open("klotentest.txt", "w", "latin-1")
print >>f, unicode("My umlauts are ä, ö, ü", "latin-1")
This works fine, but it is not exactly what I want. I would like to write this
to standard output, so that I can use the same code to produce output lines on
the console or pipe them into a file. This was possible before Python 2.3.
Isn't it possible anymore with the same code?


If your locale is set up in an appropriate way, you should be able
to print latin-1 characters to stdout without any intervention at all.

If that doesn't work, we need more details.
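
For instance, a quick way to check what Python thinks your locale is (a
hypothetical session; the exact values depend on your system):

import locale

# A (language code, encoding) guess derived from the environment/system
# settings, e.g. ('de_DE', 'cp1252') on a German Windows box.
print locale.getdefaultlocale()

# The encoding Python prefers for text files (available since Python 2.3).
print locale.getpreferredencoding()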

Cheers,
mwh

--
Also, remember to put the galaxy back when you've finished, or an
angry mob of astronomers will come round and kneecap you with a
small telescope for littering.
-- Simon Tatham, ucam.chat, from Owen Dunn's review of the year
Jul 18 '05 #2

Hi,

"Michael Hudson" <mw*@python.net> schrieb im Newsbeitrag
news:m3************@pc150.maths.bris.ac.uk...
"Marko Faldix" <ma********************@mplusr.de> writes:
Hello,

with Python 2.3 I can write umlauts (a,o,u umlaut) to a file with this piece of code:

import codecs

f = codecs.open("klotentest.txt", "w", "latin-1")
print >>f, unicode("My umlauts are ä, ö, ü", "latin-1")
This works fine, but it is not exactly what I want. I would like to write this to standard output, so that I can use the same code to produce output lines on the console or pipe them into a file. This was possible before Python 2.3. Isn't it possible anymore with the same code?


If your locale is set up in an appropriate way, you should be able
to print latin-1 characters to stdout without any intervention at all.

If that doesn't work, we need more details.

Cheers,
mwh

I try to describe. It's a Windows machine with Python 2.3.2 installed. Using
command line (cmd). Put these lines of code in a file called klotentest1.py:

# -*- coding: iso-8859-1 -*-

print unicode("My umlauts are ä, ö, ü", "latin-1")
print "My umlauts are ä, ö, ü"

Calling this on command line:

klotentest1.py

Indeed, result of first print is as desired, result of second print delivers
strange letters but no error.
Now I call this on command line:

klotentest1.py > klotentest1.txt

This fails:
Traceback (most recent call last):
File "C:\home\marko\moeller_port\moeller_port_exec_svn\ klotentest1.py", line
3, in ?
print unicode("My umlauts are , , ", "latin-1")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position
15: ordinal not in range(128)
(By the way: the error is the same if I call it this way: python
klotentest1.py > klotentest1.txt)

From my point of view, Python shouldn't act differently depending on whether
the result is piped to a file or not.

Marko Faldix

Jul 18 '05 #3

Marko Faldix wrote:
I try to describe. It's a Windows machine with Python 2.3.2 installed. Using
command line (cmd). Put these lines of code in a file called klotentest1.py:

# -*- coding: iso-8859-1 -*-

print unicode("My umlauts are ä, ö, ü", "latin-1")
print "My umlauts are ä, ö, ü"

Calling this on command line:

klotentest1.py

Indeed, result of first print is as desired, result of second print delivers
strange letters but no error.
your console device doesn't use iso-8859-1; it probably uses cp850.
if you print an 8-bit string to the console, Python assumes that you
know what you're doing...
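
one hedged workaround, assuming the console really is cp850, is to do the
conversion yourself:

# encode the unicode string explicitly for the (assumed) cp850 console code page
text = u"My umlauts are \xe4, \xf6, \xfc"
print text.encode("cp850")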
Now I call this on command line:

klotentest1.py > klotentest1.txt

This fails:
Traceback (most recent call last):
File "C:\home\marko\moeller_port\moeller_port_exec_svn\ klotentest1.py", line
3, in ?
print unicode("My umlauts are , , ", "latin-1")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position
15: ordinal not in range(128)

From my point of view, Python shouldn't act differently depending on whether
the result is piped to a file or not.


when you print to a console with a known encoding, Python 2.3 auto-
magically converts Unicode strings to 8-bit strings using the console
encoding.

files don't have an encoding, which is why the second case fails.
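
you can see the difference by asking the stream what it thinks its encoding is
(a rough illustration; the exact value depends on your console, and it may be
None or missing when output is redirected):

import sys

# on an interactive windows console this typically reports something like 'cp850';
# with stdout piped to a file there is no known encoding, so unicode output falls
# back to the default 'ascii' codec and the umlauts raise UnicodeEncodeError
print getattr(sys.stdout, "encoding", None)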

also note that in 2.2 and earlier, your example always failed.

</F>


Jul 18 '05 #4

Hi,

"Fredrik Lundh" <fr*****@pythonware.com> schrieb im Newsbeitrag
news:ma*************************************@pytho n.org...
Marko Faldix wrote:
I try to describe. It's a Windows machine with Python 2.3.2 installed. Using command line (cmd). Put these lines of code in a file called klotentest1.py:
# -*- coding: iso-8859-1 -*-

print unicode("My umlauts are ä, ö, ü", "latin-1")
print "My umlauts are ä, ö, ü"

Calling this on command line:

klotentest1.py

Indeed, result of first print is as desired, result of second print delivers strange letters but no error.


your console device doesn't use iso-8859-1; it probably uses cp850.
if you print an 8-bit string to the console, Python assumes that you
know what you're doing...
Now I call this on command line:

klotentest1.py > klotentest1.txt

This fails:
Traceback (most recent call last):
File "C:\home\marko\moeller_port\moeller_port_exec_svn\ klotentest1.py", line 3, in ?
print unicode("My umlauts are , , ", "latin-1")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 15: ordinal not in range(128)

From my point of view, Python shouldn't act differently depending on whether the result is piped to a file or not.


when you print to a console with a known encoding, Python 2.3 auto-
magically converts Unicode strings to 8-bit strings using the console
encoding.

files don't have an encoding, which is why the second case fails.

also note that in 2.2 and earlier, your example always failed.

</F>


So I just have to use only this:

print "My umlauts are , , "

without any encoding assignment, for standard output on the console AND for
redirecting to a file. In the latter case, it looks nice with e.g. Notepad,
just strange on the console, so the console settings are what need adjusting,
not the Python code. Right?
Marko Faldix


Jul 18 '05 #5

"Marko Faldix" <ma********************@mplusr.de> writes:
print "My umlauts are ä, ö, ü"

without any encoding assignment, for standard output on the console AND for
redirecting to a file. In the latter case, it looks nice with e.g. Notepad,
just strange on the console, so the console settings are what need adjusting,
not the Python code. Right?


Wrong. On your operating system, notepad.exe and the console use
*different* encodings. If you think this is stupid, please complain to
Microsoft. If you print byte strings, it will come out wrong either in
the terminal, or in notepad - there is *no way* to have the same byte
string show correctly in both encodings.
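
A small illustration of why (assuming the console uses cp850 and notepad uses
cp1252, which is typical on a Western European Windows installation):

# the same byte value names different characters in the two encodings
b = "\x84"
print repr(b.decode("cp850"))    # u'\xe4'   -- an a-umlaut on the console
print repr(b.decode("cp1252"))   # u'\u201e' -- a low double quote in notepad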

If you want to output to a file, you should open the file in
locale.getpreferredencoding(). If you want to output to a terminal,
Python should automatically find out what the terminal's encoding is
(to make things worse, the user can override the terminal encoding
on Windows, on a per-terminal basis, using chcp.exe).
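
For the file case, a minimal sketch (assuming Python 2.3; the actual encoding
depends on what locale.getpreferredencoding() reports on your machine):

import codecs, locale

enc = locale.getpreferredencoding()              # e.g. 'cp1252' on a German Windows box
out = codecs.open("klotentest.txt", "w", enc)    # the file now has an explicit encoding
print >>out, u"My umlauts are \xe4, \xf6, \xfc"  # unicode is encoded on the way out
out.close()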

Regards,
Martin

Jul 18 '05 #6

On Mon, 15 Dec 2003 12:38:50 +0100, "Fredrik Lundh" <fr*****@pythonware.com> wrote:
Marko Faldix wrote:
I try to describe. It's a Windows machine with Python 2.3.2 installed. Using
command line (cmd). Put these lines of code in a file [1] called klotentest1.py:

# -*- coding: iso-8859-1 -*-   [2]

print unicode("My umlauts are ä, ö, ü", "latin-1")
print "My umlauts are ä, ö, ü"   [3]
[...]
Calling this on command line:

klotentest1.py

Indeed, result of first print is as desired, result of second print delivers
strange letters but no error.
your console device doesn't use iso-8859-1; it probably uses cp850.
if you print an 8-bit string to the console, Python assumes that you
know what you're doing...

I think the OP is suggesting that given [1] & [2], [3] should implicitly carry the [2] info
and be converted for output just like the result of unicode(...) is.

(I know that's not the way it works now, and I know it's not an easy problem ;-)
Now I call this on command line:

klotentest1.py > klotentest1.txt

This fails:
Traceback (most recent call last):
File "C:\home\marko\moeller_port\moeller_port_exec_svn\ klotentest1.py", line
3, in ?
print unicode("My umlauts are , , ", "latin-1")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position
15: ordinal not in range(128)

From my point of view, Python shouldn't act differently depending on whether
the result is piped to a file or not.


when you print to a console with a known encoding, Python 2.3 auto-
magically converts Unicode strings to 8-bit strings using the console
encoding.

files don't have an encoding, which is why the second case fails.

I think the OP is thinking files [1] with # -*- coding: iso-8859-1 -*- [2]
_do_ have an encoding, so in some way [3] should be an unambiguous character sequence,
not just a byte sequence (I have to get back to a previous thread with Martin, where
I owe a reply. This same issue is key there). (I realize that's not the way it works now,
and that it's a hard problem, to repeat myself ;-)

Regards,
Bengt Richter
Jul 18 '05 #7


"Marko Faldix" <ma********************@mplusr.de> wrote in message news:br************@ID-108329.news.uni-berlin.de...
From my point of view, Python shouldn't act differently depending on whether the result is piped to a file or not.


when you print to a console with a known encoding, Python 2.3 auto-
magically converts Unicode strings to 8-bit strings using the console
encoding.

files don't have an encoding, which is why the second case fails.

also note that in 2.2 and earlier, your example always failed.

</F>


So I just have to use only this:

print "My umlauts are , , "

without any encoding assignment, for standard output on the console AND for
redirecting to a file. In the latter case, it looks nice with e.g. Notepad,
just strange on the console, so the console settings are what need adjusting,
not the Python code. Right?


No, the right code is
=============================
# -*- coding: iso-8859-1 -*-
import locale, codecs, sys

if not sys.stdout.isatty():
    sys.stdout = codecs.lookup(locale.getpreferredencoding())[3](sys.stdout)

print u"My umlauts are ä, ö, ü"
=============================
The difference between console and file output is that while
there's only one way to output on a cp850 console, there
are many ways to output the same character to a file (latin-1,
utf-8, utf-7, utf-16le, utf-16be, cp850 and maybe more).
So Python refuses to guess.
Another rule to follow is to store non-ASCII characters in
unicode strings. Otherwise you will either have to track
the encodings yourself or assume that all 8-bit strings
in your program have the same encoding. That's not
a good idea. I'm not sure if you will have proper .upper()
and .lower() methods on 8-bit strings (I don't have Python
here to check).
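
For what it's worth, a quick hedged check of that last point (Python 2.x,
default "C" locale assumed):

# unicode strings use the Unicode case tables, so the mapping works per character
print repr(u"\xe4".upper())   # u'\xc4'  (a-umlaut -> A-umlaut)

# plain 8-bit strings use the C library's locale-dependent toupper(); in the
# default "C" locale non-ASCII bytes are normally left unchanged
print repr("\xe4".upper())    # '\xe4'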

-- Serge.
Jul 18 '05 #8

Bengt Richter wrote:
I think the OP is thinking files [1] with # -*- coding: iso-8859-1 -*- [2]
_do_ have an encoding, so in some way [3] should be an unambiguous character sequence,
not just a byte sequence


The OP could easily overcome this aspect of the problem with a Unicode
literal (and in fact, he originally did convert the string literal to
a Unicode object before further processing).

This does not solve the problem, though: Writing the Unicode object to
a file still gives an encoding error, since he did not specify the
encoding of the file.
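
In code, the contrast looks roughly like this (a sketch, assuming Python 2.3):

import codecs

u = u"My umlauts are \xe4, \xf6, \xfc"

codecs.open("ok.txt", "w", "latin-1").write(u + u"\n")   # works: the file has an encoding
open("fails.txt", "w").write(u + u"\n")                  # UnicodeEncodeError: implicit ascii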

Regards,
Martin

Jul 18 '05 #9
