
codecs latin1 unicode standard output file

Hello,

with Python 2.3 I can write umlauts (a,o,u umlaut) to a file with this piece
of code:

import codecs

f = codecs.open("klotentest.txt", "w", "latin-1")
print >>f, unicode("My umlauts are ä, ö, ü", "latin-1")

This works fine, but it is not exactly what I want. I would like to write this
to standard output, so that I can use the same code to produce output lines on
the console or pipe them into a file. This was possible before Python 2.3.
Isn't it possible anymore with the same code?
--
Marko Faldix
M+R Infosysteme
Hubert-Wienen-Str. 24 52070 Aachen
Tel.: 0241-93878-16 Fax.:0241-875095
E-Mail: markopointfaldix@mplusrpointde
Jul 18 '05 #1
8 Replies


"Marko Faldix" <ma********************@mplusr.de> writes:
Hello,

with Python 2.3 I can write umlauts (a,o,u umlaut) to a file with this piece
of code:

import codecs

f = codecs.open("klotentest.txt", "w", "latin-1")
print >>f, unicode("My umlauts are ä, ö, ü", "latin-1")
This works fine, but it is not exactly what I want. I would like to write this
to standard output, so that I can use the same code to produce output lines on
the console or pipe them into a file. This was possible before Python 2.3.
Isn't it possible anymore with the same code?


If your locale is set up in an appropriate way, you should be able
to print latin-1 characters to stdout without any intervention at all.

If that doesn't work, we need more details.
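
For instance, a quick way to check what Python thinks your locale is (a
hypothetical session; the exact values depend on your system):

import locale

# A (language code, encoding) guess derived from the environment/system
# settings, e.g. ('de_DE', 'cp1252') on a German Windows box.
print locale.getdefaultlocale()

# The encoding Python prefers for text files (available since Python 2.3).
print locale.getpreferredencoding()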

Cheers,
mwh

--
Also, remember to put the galaxy back when you've finished, or an
angry mob of astronomers will come round and kneecap you with a
small telescope for littering.
-- Simon Tatham, ucam.chat, from Owen Dunn's review of the year
Jul 18 '05 #2

Hi,

"Michael Hudson" <mw*@python.net> schrieb im Newsbeitrag
news:m3************@pc150.maths.bris.ac.uk...
"Marko Faldix" <ma********************@mplusr.de> writes:
Hello,

with Python 2.3 I can write umlauts (a,o,u umlaut) to a file with this piece of code:

import codecs

f = codecs.open("klotentest.txt", "w", "latin-1")
print >>f, unicode("My umlauts are ä, ö, ü", "latin-1")
This works fine, but it is not exactly what I want. I would like to write this to standard output, so that I can use the same code to produce output lines on the console or pipe them into a file. This was possible before Python 2.3. Isn't it possible anymore with the same code?


If your locale is set up in an appropriate way, you should be able
to print latin-1 characters to stdout without any intervention at all.

If that doesn't work, we need more details.

Cheers,
mwh

I try to describe. It's a Windows machine with Python 2.3.2 installed. Using
command line (cmd). Put these lines of code in a file called klotentest1.py:

# -*- coding: iso-8859-1 -*-

print unicode("My umlauts are ä, ö, ü", "latin-1")
print "My umlauts are ä, ö, ü"

Calling this on command line:

klotentest1.py

Indeed, result of first print is as desired, result of second print delivers
strange letters but no error.
Now I call this on command line:

klotentest1.py > klotentest1.txt

This fails:
Traceback (most recent call last):
File "C:\home\marko\moeller_port\moeller_port_exec_svn\ klotentest1.py", line
3, in ?
print unicode("My umlauts are , , ", "latin-1")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position
15: ordinal not in range(128)
(By the way: the error is the same if I call it this way: python
klotentest1.py > klotentest1.txt)

From my point of view, Python shouldn't act differently depending on whether
the result is piped to a file or not.

Marko Faldix

Jul 18 '05 #3

Marko Faldix wrote:
I try to describe. It's a Windows machine with Python 2.3.2 installed. Using
command line (cmd). Put these lines of code in a file called klotentest1.py:

# -*- coding: iso-8859-1 -*-

print unicode("My umlauts are ä, ö, ü", "latin-1")
print "My umlauts are ä, ö, ü"

Calling this on command line:

klotentest1.py

Indeed, result of first print is as desired, result of second print delivers
strange letters but no error.
your console device doesn't use iso-8859-1; it probably uses cp850.
if you print an 8-bit string to the console, Python assumes that you
know what you're doing...
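
one hedged workaround, assuming the console really is cp850, is to do the
conversion yourself:

# encode the unicode string explicitly for the (assumed) cp850 console code page
text = u"My umlauts are \xe4, \xf6, \xfc"
print text.encode("cp850")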
Now I call this on command line:

klotentest1.py > klotentest1.txt

This fails:
Traceback (most recent call last):
File "C:\home\marko\moeller_port\moeller_port_exec_svn\ klotentest1.py", line
3, in ?
print unicode("My umlauts are , , ", "latin-1")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position
15: ordinal not in range(128)

From my point of view, Python shouldn't act differently depending on whether
the result is piped to a file or not.


when you print to a console with a known encoding, Python 2.3 auto-
magically converts Unicode strings to 8-bit strings using the console
encoding.

files don't have an encoding, which is why the second case fails.
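
you can see the difference by asking the stream what it thinks its encoding is
(a rough illustration; the exact value depends on your console, and it may be
None or missing when output is redirected):

import sys

# on an interactive windows console this typically reports something like 'cp850';
# with stdout piped to a file there is no known encoding, so unicode output falls
# back to the default 'ascii' codec and the umlauts raise UnicodeEncodeError
print getattr(sys.stdout, "encoding", None)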

also note that in 2.2 and earlier, your example always failed.

</F>


Jul 18 '05 #4

Hi,

"Fredrik Lundh" <fr*****@pythonware.com> schrieb im Newsbeitrag
news:ma*************************************@pytho n.org...
Marko Faldix wrote:
I try to describe. It's a Windows machine with Python 2.3.2 installed. Using command line (cmd). Put these lines of code in a file called klotentest1.py:
# -*- coding: iso-8859-1 -*-

print unicode("My umlauts are ä, ö, ü", "latin-1")
print "My umlauts are ä, ö, ü"

Calling this on command line:

klotentest1.py

Indeed, result of first print is as desired, result of second print delivers strange letters but no error.


your console device doesn't use iso-8859-1; it probably uses cp850.
if you print an 8-bit string to the console, Python assumes that you
know what you're doing...
Now I call this on command line:

klotentest1.py > klotentest1.txt

This fails:
Traceback (most recent call last):
File "C:\home\marko\moeller_port\moeller_port_exec_svn\ klotentest1.py", line 3, in ?
print unicode("My umlauts are , , ", "latin-1")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 15: ordinal not in range(128)

From my point of view, Python shouldn't act differently depending on whether the result is piped to a file or not.


when you print to a console with a known encoding, Python 2.3 auto-
magically converts Unicode strings to 8-bit strings using the console
encoding.

files don't have an encoding, which is why the second case fails.

also note that in 2.2 and earlier, your example always failed.

</F>


So I just have to use only this:

print "My umlauts are , , "

without any encoding assignment, for standard output on the console AND for
redirecting to a file. In the latter case, it looks nice with e.g. Notepad,
just strange on the console, so the console settings are what need adjusting,
not the Python code. Right?
Marko Faldix


Jul 18 '05 #5

"Marko Faldix" <ma********************@mplusr.de> writes:
print "My umlauts are ä, ö, ü"

without any encoding assignment, for standard output on the console AND for
redirecting to a file. In the latter case, it looks nice with e.g. Notepad,
just strange on the console, so the console settings are what need adjusting,
not the Python code. Right?


Wrong. On your operating system, notepad.exe and the console use
*different* encodings. If you think this is stupid, please complain to
Microsoft. If you print byte strings, it will come out wrong either in
the terminal, or in notepad - there is *no way* to have the same byte
string show correctly in both encodings.
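
A small illustration of why (assuming the console uses cp850 and notepad uses
cp1252, which is typical on a Western European Windows installation):

# the same byte value names different characters in the two encodings
b = "\x84"
print repr(b.decode("cp850"))    # u'\xe4'   -- an a-umlaut on the console
print repr(b.decode("cp1252"))   # u'\u201e' -- a low double quote in notepad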

If you want to output to a file, you should open the file in
locale.getpreferredencoding(). If you want to output to a terminal,
Python should automatically find out what the terminal's encoding is
(to make things worse, the user can override the terminal encoding
on Windows, on a per-terminal basis, using chcp.exe).
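
For the file case, a minimal sketch (assuming Python 2.3; the actual encoding
depends on what locale.getpreferredencoding() reports on your machine):

import codecs, locale

enc = locale.getpreferredencoding()              # e.g. 'cp1252' on a German Windows box
out = codecs.open("klotentest.txt", "w", enc)    # the file now has an explicit encoding
print >>out, u"My umlauts are \xe4, \xf6, \xfc"  # unicode is encoded on the way out
out.close()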

Regards,
Martin

Jul 18 '05 #6

On Mon, 15 Dec 2003 12:38:50 +0100, "Fredrik Lundh" <fr*****@pythonware.com> wrote:
Marko Faldix wrote:
I try to describe. It's a Windows machine with Python 2.3.2 installed. Using
command line (cmd). Put these lines of code in a file [1] called klotentest1.py:

# -*- coding: iso-8859-1 -*-   [2]

print unicode("My umlauts are ä, ö, ü", "latin-1")
print "My umlauts are ä, ö, ü"   [3]
[...]
Calling this on command line:

klotentest1.py

Indeed, result of first print is as desired, result of second print delivers
strange letters but no error.
your console device doesn't use iso-8859-1; it probably uses cp850.
if you print an 8-bit string to the console, Python assumes that you
know what you're doing...

I think the OP is suggesting that given [1] & [2], [3] should implicitly carry the [2] info
and be converted for output just like the result of unicode(...) is.

(I know that's not the way it works now, and I know it's not an easy problem ;-)
Now I call this on command line:

klotentest1.py > klotentest1.txt

This fails:
Traceback (most recent call last):
File "C:\home\marko\moeller_port\moeller_port_exec_svn\ klotentest1.py", line
3, in ?
print unicode("My umlauts are , , ", "latin-1")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position
15: ordinal not in range(128)

From my point of view, Python shouldn't act differently depending on whether
the result is piped to a file or not.


when you print to a console with a known encoding, Python 2.3 auto-
magically converts Unicode strings to 8-bit strings using the console
encoding.

files don't have an encoding, which is why the second case fails.

I think the OP is thinking files [1] with # -*- coding: iso-8859-1 -*- [2]
_do_ have an encoding, so in some way [3] should be an unambiguous character sequence,
not just a byte sequence (I have to get back to a previous thread with Martin, where
I owe a reply. This same issue is key there). (I realize that's not the way it works now,
and that it's a hard problem, to repeat myself ;-)

Regards,
Bengt Richter
Jul 18 '05 #7


"Marko Faldix" <ma********************@mplusr.de> wrote in message news:br************@ID-108329.news.uni-berlin.de...
From my point of view, Python shouldn't act differently depending on whether the result is piped to a file or not.


when you print to a console with a known encoding, Python 2.3 auto-
magically converts Unicode strings to 8-bit strings using the console
encoding.

files don't have an encoding, which is why the second case fails.

also note that in 2.2 and earlier, your example always failed.

</F>


So I just have to use only this:

print "My umlauts are , , "

without any encoding assignment, for standard output on the console AND for
redirecting to a file. In the latter case, it looks nice with e.g. Notepad,
just strange on the console, so the console settings are what need adjusting,
not the Python code. Right?


No, the right code is
=============================
# -*- coding: iso-8859-1 -*-
import locale, codecs, sys

if not sys.stdout.isatty():
    sys.stdout = codecs.lookup(locale.getpreferredencoding())[3](sys.stdout)

print u"My umlauts are ä, ö, ü"
=============================
The difference between console and file output is that while
there's only one way to output on a cp850 console, there
are many ways to output the same character to a file (latin-1,
utf-8, utf-7, utf-16le, utf-16be, cp850 and maybe more).
So Python refuses to guess.
Another rule to follow is to store non-ASCII characters in
unicode strings. Otherwise you will either have to track
the encodings yourself or assume that all 8-bit strings
in your program have the same encoding. That's not
a good idea. I'm not sure if you will have proper .upper()
and .lower() methods on 8-bit strings (I don't have Python
here to check).
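
For what it's worth, a quick hedged check of that last point (Python 2.x,
default "C" locale assumed):

# unicode strings use the Unicode case tables, so the mapping works per character
print repr(u"\xe4".upper())   # u'\xc4'  (a-umlaut -> A-umlaut)

# plain 8-bit strings use the C library's locale-dependent toupper(); in the
# default "C" locale non-ASCII bytes are normally left unchanged
print repr("\xe4".upper())    # '\xe4'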

-- Serge.
Jul 18 '05 #8

Bengt Richter wrote:
I think the OP is thinking files [1] with # -*- coding: iso-8859-1 -*- [2]
_do_ have an encoding, so in some way [3] should be an unambiguous character sequence,
not just a byte sequence


The OP could easily overcome this aspect of the problem with a Unicode
literal (and in fact, he originally did convert the string literal to
a Unicode object before further processing).

This does not solve the problem, though: Writing the Unicode object to
a file still gives an encoding error, since he did not specify the
encoding of the file.
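
In code, the contrast looks roughly like this (a sketch, assuming Python 2.3):

import codecs

u = u"My umlauts are \xe4, \xf6, \xfc"

codecs.open("ok.txt", "w", "latin-1").write(u + u"\n")   # works: the file has an encoding
open("fails.txt", "w").write(u + u"\n")                  # UnicodeEncodeError: implicit ascii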

Regards,
Martin

Jul 18 '05 #9
