Unicode & Pythonwin / win32 / console?

Hello,

I'm using Pythonwin and py2.3 (py2.4). I did not come clear with this:
I want to use win32 functions like win32ui.MessageBox,
listctrl.InsertItem ..... to get unicode strings on the screen - best
results according to the platform/language settings (mainly XP Home,
W2K, ...).

Also unicode strings should be displayed as nice as possible at the
console with normal print-s to stdout (on varying platforms, different
windows/countries and linux, ...; I py2exe/cxfreeze apps) ...

Any hints how to do this and make it as complete and automated as
possible?

Robert

Jan 9 '06 #1
Robert wrote:
I'm using Pythonwin and py2.3 (py2.4). I did not come clear with this:
I want to use win32 functions like win32ui.MessageBox,
listctrl.InsertItem ..... to get unicode strings on the screen - best
results according to the platform/language settings (mainly XP Home,
W2K, ...).
Not sure what your question is - is there even a question in this
paragraph? (notice I didn't understand the term "to come clear with")
Also unicode strings should be displayed as nice as possible at the
console with normal print-s to stdout (on varying platforms, different
windows/countries and linux, ...; I py2exe/cxfreeze apps) ...

Any hints how to do this and make it as complete and automated as
possible?


No need to do anything - it should work out of the box.

Regards,
Martin
Jan 9 '06 #2

Martin v. Löwis schrieb:
Robert wrote:
I'm using Pythonwin and py2.3 (py2.4). I did not come clear with this:
I want to use win32 functions like win32ui.MessageBox,
listctrl.InsertItem ..... to get unicode strings on the screen - best
results according to the platform/language settings (mainly XP Home,
W2K, ...).


Not sure what your question is - is there even a question in this
paragraph? (notice I didn't understand the term "to come clear with")
Also unicode strings should be displayed as nice as possible at the
console with normal print-s to stdout (on varying platforms, different
windows/countries and linux, ...; I py2exe/cxfreeze apps) ...

Any hints how to do this and make it as complete and automated as
possible?


No need to do anything - it should work out of the box.


hmm...? never got any non-questionable results:

PythonWin 2.4.2 (#67, Sep 28 2005, 12:41:11) [MSC v.1310 32 bit
(Intel)] on win32.
Portions Copyright 1994-2004 Mark Hammond (mh******@skippinet.com.au) -
see 'Help/About PythonWin' for further copyright information.
>>> import win32ui,glob
>>> s=glob.glob(u'/devel/test/*')[-2]
>>> s
u'/devel/test\\\u041f\u043e\u0448\u0443\u043a.txt'
>>> win32ui.MessageBox(s)
Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode characters in position
12-16: ordinal not in range(128)
>>> print s
/devel/test\?????.txt


[on a WinXP Home - German; same with py2.3, win98, ... ]
The Windows Explorer displays correct cyrillic letters for this file
name.
win32ui.MessageBox(s) seems to try the ascii codec by default. But
mbcs/latin-1 (?) encoded 8bit strings like 'aousäöüß'
(== 'aous\xe4\xf6\xfc\xdf') work (on this machine).

Robert

Jan 10 '06 #3
Robert wrote:
Also unicode strings should be displayed as nice as possible at the
console with normal print-s to stdout (on varying platforms, different
windows/countries and linux, ...; I py2exe/cxfreeze apps) ...

Any hints how to do this and make it as complete and automated as
possible?


No need to do anything - it should work out of the box. [...]
>>> win32ui.MessageBox(s)
Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode characters in position
12-16: ordinal not in range(128)

Can't comment on that - this is a PythonWin issue.

>>> print s
/devel/test\?????.txt


I see. You need to do "chcp 1251" in your console first, for this
to print this string correctly (and potentially also set the
console font to Lucida Console).

However, if you would do the same on a Russian Windows installation,
the user will not need to change anything - cyrillic letters come
out right. Likewise for Umlauts in a German windows installation,
and Greek letters in a Greek installation.
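
A minimal sketch of the point above, written for a current Python 3 rather than the py2.3/2.4 used in this thread: whether a string prints cleanly depends on whether the console's code page contains its characters. The helper name can_encode is made up for this illustration; the file name is the one from the session quoted earlier.

```python
def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)  # the default error mode is 'strict'
    except UnicodeEncodeError:
        return False
    return True


# File name from the thread: five Cyrillic letters after the backslash.
name = "/devel/test\\\u041f\u043e\u0448\u0443\u043a.txt"

for codepage in ("cp1251", "cp1252", "ascii"):
    print(codepage, can_encode(name, codepage))
```

On a console set to cp1251 the name is representable; on the German default cp1252 it is not, which is where the error or the question marks come from.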

Regards,
Martin
Jan 10 '06 #4

Martin v. Löwis schrieb:
Robert wrote:
>>> win32ui.MessageBox(s)
Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode characters in position
12-16: ordinal not in range(128)

Can't comment on that - this is a PythonWin issue.

>>> print s
/devel/test\?????.txt


I see. You need to do "chcp 1251" in your console first, for this
to print this string correctly (and potentially also set the
console font to Lucida Console).


is in a PythonWin Interactive session - ok results for cyrillic chars
(tolerant mbcs/utf-8 encoding!).
But if I do this on Win console (as you probably mean), I get also
encoding Errors - no matter if chcp1251, because cyrillic chars raise
the encoding errors also.
I think this is not a good behaviour of python to be so picky. In
11**********************@g44g2000cwa.googlegroups.com I showed, how I
solved this so far. Any better/portable idea?
However, if you would do the same on a Russian Windows installation,
the user will not need to change anything - cyrillic letters come
out right. Likewise for Umlauts in a German windows installation,
and Greek letters in a Greek installation.


Yes. But the original problem is, that occasionally unicode strings
(filenames in my case) arise which are not defined on the local
platform, but have to be displayed (in 'replace' - encoding-mode)
without breaking the app flow. That's the pain of the default behaviour
of current python - and there is no simple switch. Why should "print
xy" not print something _always_ as good and as far as possible?

Robert

Jan 11 '06 #5
Robert wrote:
is in a PythonWin Interactive session - ok results for cyrillic chars
(tolerant mbcs/utf-8 encoding!).
But if I do this on Win console (as you probably mean), I get also
encoding Errors - no matter if chcp1251, because cyrillic chars raise
the encoding errors also.
If you do "chcp 1251" (not "chcp1251") in the console, and then
run python.exe in the same console, what is the value of
sys.stdout.encoding?
I think this is not a good behaviour of python to be so picky.
I think it is good.

Errors should never pass silently.
Unless explicitly silenced.
In
11**********************@g44g2000cwa.googlegroups.com I showed, how I
solved this so far. Any better/portable idea?
Not sure why you aren't using sys.stdout.encoding on Linux. I would do

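import codecs, sys
try:
    # writer class for the console's own encoding
    c = codecs.getwriter(sys.stdout.encoding)
except:
    # encoding missing or unknown: fall back to ASCII
    c = codecs.getwriter('ascii')
# 'replace' writes '?' for unencodable characters instead of raising
sys.stdout = c(sys.stdout, 'replace')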

Also, I wouldn't edit site.py, but instead add sitecustomize.py.
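
The same idea can be tried without touching the real stdout. This is only a sketch for a current Python 3, under the assumption that an in-memory io.BytesIO stands in for the console; make_tolerant_writer is a hypothetical helper name, not something from the thread or the standard library.

```python
import codecs
import io


def make_tolerant_writer(byte_stream, encoding=None, errors="replace"):
    """Wrap a byte stream so that unencodable characters never raise."""
    try:
        writer_cls = codecs.getwriter(encoding)
    except (TypeError, LookupError):
        # encoding is None or unknown: fall back to plain ASCII
        writer_cls = codecs.getwriter("ascii")
    return writer_cls(byte_stream, errors)


buf = io.BytesIO()  # stands in for the console in this sketch
out = make_tolerant_writer(buf, "ascii")
out.write("/devel/test\\\u041f\u043e\u0448\u0443\u043a.txt\n")
print(buf.getvalue())
```

The silencing is explicit and happens in one place, which is the property asked for above.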
Yes. But the original problem is, that occasionally unicode strings
(filenames in my case) arise which are not defined on the local
platform, but have to be displayed (in 'replace' - encoding-mode)
without breaking the app flow. That's the pain of the default behaviour
of current python - and there is no simple switch. Why should "print
xy" not print something _always_ as good and as far as possible?


Because the author of the application wouldn't know that there
is a bug in the application, and that information was silently
discarded. Users might only find out much later that they have
question marks in places where users originally entered data,
and they would have no way of retrieving the original data.

If you can accept that data loss: fine, but you should silence
the errors explicitly.

Regards,
Martin
Jan 11 '06 #6

Martin v. Löwis schrieb:
Robert wrote:
is in a PythonWin Interactive session - ok results for cyrillic chars
(tolerant mbcs/utf-8 encoding!).
But if I do this on Win console (as you probably mean), I get also
encoding Errors - no matter if chcp1251, because cyrillic chars raise
the encoding errors also.


If you do "chcp 1251" (not "chcp1251") in the console, and then
run python.exe in the same console, what is the value of
sys.stdout.encoding?


correctly: 'cp1252' in my case; cyrillic-chars break "print" (on PC
linux 2.2 tty sys.stdout.encoding does not exist)

I live with this in site(customize):

# tolerant unicode output ... #
_stdout=sys.stdout
if sys.platform=='win32' and not \
       sys.modules.has_key('pywin.framework.startup'):
    _stdoutenc=getattr(_stdout,'encoding',sys.getdefaultencoding())
    class StdOut:
        def write(self,s):
            _stdout.write(s.encode(_stdoutenc,'backslashreplace'))
    sys.stdout=StdOut()
elif sys.platform.startswith('linux'):
    import locale
    _stdoutenc=locale.getdefaultlocale()[1]
    class StdOut:
        def write(self,s):
            _stdout.write(s.encode(_stdoutenc,'backslashreplace'))
    sys.stdout=StdOut()
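
For comparison, a small sketch (current Python 3, not the py2.4 code above) of what the three error modes discussed in this thread do to the same file name. The function name encode_for_output is invented for the example, and cp1252 is assumed as the target because that is the encoding reported on the German machine.

```python
def encode_for_output(text, encoding, errors):
    """Encode text for a byte stream using the given error handler."""
    return text.encode(encoding, errors)


name = "/devel/test\\\u041f\u043e\u0448\u0443\u043a.txt"

# 'replace' loses the characters, 'backslashreplace' keeps the code points visible
lossy = encode_for_output(name, "cp1252", "replace")
escaped = encode_for_output(name, "cp1252", "backslashreplace")

# 'strict' (the default) refuses to encode at all
try:
    encode_for_output(name, "cp1252", "strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(lossy)
print(escaped)
print(strict_failed)
```

With 'backslashreplace' the output is ugly, but a reader can still see which characters were there; with 'replace' only the count survives.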

I think this is not a good behaviour of python to be so picky.


I think it is good.

Errors should never pass silently.
Unless explicitly silenced.


A political question. Arguments:

* Webbrowsers for example have to display defective HTML as good as
possible, unknown unicode chars as "?" and so on... Users got very
angry in the beginning of browsers when 'strict' programmers displayed
their exception error boxes ...

* at least the "print" statement has to go through - the costs (for
angry users and developers; e.g.
http://blog.ianbicking.org/do-i-hate...ate-ascii.html)
are much higher when apps suddenly break in simple print/display-output
when the system picks up alien unicode chars somewhere (e.g.
occasionally in filenames,...). No one is really angry when
occasionally chinese chars are displayed cryptically on non-chinese
computers. One can investigate, add fonts, ... to improve, or do
nothing in most cases, but apps do not break on every print statement!
This is not only true for tty-output, but also for log-file redirect
and almost any common situation for print/normal
stdout/file-(write)-output.

* anything is nice-printable in python by default, why not
unicode-strings!? If the decision for default 'strict' encoding on
stdout stands, we have at least to discuss about print-repr for
unicode.

* the need for having technical strings 'strict' is much more rare. And
programmers are anyway very aware in such situations . e.g. by
asciifile.write( us.encode(xy,'strict') ) .

* on Windows for example the (good) mbcs_encode is tolerant anyway:
unknown chars are mapped to '?'. I never had any objection to this.

Some recommendations - soft to hard:

* make print-repr for unicode strings tolerant (and in PythonWin
always tolerant 'mbcs' encoding)

* make stdout/files to have 'replace'-mode encoding by default.
(similar as done with my code above)

* set site.py/encoding=('ascii', 'replace') # if not
utf-8/mbcs/locale ;enable a tuple
* save sys._setdefaultencoding by default

* I would also live perfectly with .encode(enc) to run 'replace' by
default, and 'strict' on demand. None of my apps and scripts would
break because of this, but win. A programmer is naturally very aware
when he wants 'strict'. Can you name realistic cases where 'replace'
behavior would be so critical that a program damages something?

In
11**********************@g44g2000cwa.googlegroups.com I showed, how I
solved this so far. Any better/portable idea?


Not sure why you aren't using sys.stdout.encoding on Linux. I would do

try:
    c = codecs.getwriter(sys.stdout.encoding)
except:
    c = codecs.getwriter('ascii')
sys.stdout = c(sys.stdout, 'replace')

Also, I wouldn't edit site.py, but instead add sitecustomize.py.


I have more problems with the shape of sys.path in different
situations, multiple sitecustomize.py on other apps, environments, OS /
users, cxfreeze,py2exe ... sitecustomize not stackable easily: a
horror solution. The need is for a callable _function_ or for general
change in python behaviour.

Modifying site.py is better and more stable for me (I have my
patch/module-todo-list handy each time I install a new python), as I
always want tolerant behaviour. In code I check for
site.encoding/_setdefaultencoding (I save this). Thus I get one central
error if setup is not correct, but not evil unicode-errors somewhere
deep in the app once on a russian computer in the future...
Yes. But the original problem is, that occasionally unicode strings
(filenames in my case) arise which are not defined in the local
platform encodings, but have to be displayed (in 'replace' encoding mode)
without breaking the app flow. That's the pain of the default behaviour
of current python - and there is no simple switch. Why should "print
xy" not print something _always_ as good and as far as possible?


Because the author of the application wouldn't know that there
is a bug in the application, and that information was silently
discarded. Users might only find out much later that they have
question marks in places where users originally entered data,
and they would have no way of retrieving the original data.

If you can accept that data loss: fine, but you should silence
the errors explicitly.


this is black/white theoretical - not real and practical (as python
wants to be). see above.

Robert

Jan 12 '06 #7
Robert wrote:
* Webbrowsers for example have to display defective HTML as good as
possible, unknown unicode chars as "?" and so on... Users got very
angry in the beginning of browsers when 'strict' programmers displayed
their exception error boxes ...
Right. If you would develop a webbrowser in Python, you should do the
same.
No one is really angry when
occasionally chinese chars are displayed cryptically on non-chinese
computers.
That is not true. Japanese are *frequently* upset when their
characters don't render correctly. They even have a word for that:
moji-bake. I assume it is similar for Chinese.
* anything is nice-printable in python by default, why not
unicode-strings!? If the decision for default 'strict' encoding on
stdout stands, we have at least to discuss about print-repr for
unicode.
If you want to see this change really badly, you need to write a PEP.
* on Windows for example the (good) mbcs_encode is tolerant anyway:
unknown chars are mapped to '?'. I never had any objection to this.
Apparently, you haven't been dealing with character sets long enough.
I have seen *a lot* of objections to the way the CP_ACP encoding
deals with errors, e.g.

http://groups.google.com/group/comp....e=source&hl=en

When windows converts these file names in CP_ACP, then the
file names in a directory are not round-trippable. This is
a source of permanent pain.
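
A rough illustration of the round-trip problem, as a sketch in current Python 3. It assumes cp1252 as a stand-in for CP_ACP on a Western European Windows and uses Python's own 'replace' handler, which is not necessarily identical to what Windows does internally. The helper name survives_round_trip is hypothetical.

```python
def survives_round_trip(text, encoding):
    """Encode with '?' substitution, decode again, and compare with the input."""
    return text.encode(encoding, "replace").decode(encoding) == text


cyrillic = "\u041f\u043e\u0448\u0443\u043a.txt"
german = "aous\u00e4\u00f6\u00fc\u00df.txt"

for label, value in (("german", german), ("cyrillic", cyrillic)):
    print(label, survives_round_trip(value, "cp1252"))
```

A name that fails this check can be listed but not reliably opened again through the lossy form.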
* I would also live perfectly with .encode(enc) to run 'replace' by
default, and 'strict' on demand. None of my apps and scripts would
break because of this, but win. A programmer is naturally very aware
when he wants 'strict'. Can you name realistic cases where 'replace'
behavior would be so critical that a program damages something?


File names. Replace an unencodable filename with a question mark,
and you get a pattern that matches multiple files. For example, do

get_deletable_files.py | xargs rm

and you remove many more files than you want to. Pretty catastrophic.
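
The wildcard effect can be reproduced in a few lines. This is a sketch in current Python 3 with made-up file names (only the Cyrillic one comes from the thread), and it assumes the consumer of the output treats '?' as a glob wildcard, which fnmatch models here.

```python
import fnmatch

# Hypothetical directory listing; only the first name is from the thread.
names = ["\u041f\u043e\u0448\u0443\u043a.txt", "notes.txt", "draft.txt", "a.txt"]

# Lossy step: 'replace' turns each unencodable character into '?'.
printed = names[0].encode("ascii", "replace").decode("ascii")

# A consumer that reads its input as a glob pattern sees '?' as
# "any single character", so the lossy name now matches unrelated files.
matched = [n for n in names if fnmatch.fnmatchcase(n, printed)]

print(printed)
print(matched)
```

One file name went in; after the silent replacement, the pattern selects every file with a five-character stem.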

Regards,
Martin
Jan 13 '06 #8
