I'm trying to get python, unicode and kdialog to play nicely together.
This is a linux machine, and kdialog is a way to generate dialog boxes in
kde with which users can interact (for example input text), and you can
use the outputted text in your script.
Anyway, what I'm doing is reading from a utf-8 encoded text file using the
codecs module, and using the following:
data = codecs.open('file', 'r', 'utf-8')
I then manipulate the data to break it down into text snippets.
Then I run this command:
test = os.popen('kdialog --inputbox %s' %(data))
Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\u017a' in
position 272: ordinal not in range(128)
I would really like kdialog to display the text as utf-8. However, it seems
that python is trying to pass the utf-8 encoded data as ascii, which
obviously fails because it can't deal with the utf-8 encoded text. Is it
possible to pass the text out to kdialog as utf-8, rather than ascii?
Or have I completely misunderstood the whole process, in which case, can
you please enlighten me.
Matt
Dumbkiwi wrote:
> I'm trying to get python, unicode and kdialog to play nicely together. This is a linux machine, and kdialog is a way to generate dialog boxes in kde with which users can interact (for example input text), and you can use the outputted text in your script.
> Anyway, what I'm doing is reading from a utf-8 encoded text file using the codecs module, and using the following:
> data = codecs.open('file', 'r', 'utf-8')
> data is now a unicode string. I then manipulate the data to break it down into text snippets.
> Then I run this command:
> test = os.popen('kdialog --inputbox %s' %(data))
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u017a' in position 272: ordinal not in range(128)
> I would really like kdialog to display the text as utf-8. However, it seems that python is trying to pass the utf-8 encoded data as ascii, which obviously fails because it can't deal with the utf-8 encoded text. Is it possible to pass the text out to kdialog as utf-8, rather than ascii?
Just encode the data in the target encoding before passing it to os.popen():
test = os.popen('kdialog --inputbox %s' % data.encode("utf-8"))
Peter
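For readers following along on Python 3 (the thread itself uses Python 2.3/2.4), here is a minimal sketch of the encode-before-passing advice above. The helper name build_kdialog_args and the sample string are hypothetical; only `kdialog --inputbox` comes from the thread. The sketch builds the argument list as bytes and does not launch kdialog.

```python
def build_kdialog_args(text, encoding="utf-8"):
    """Return the kdialog command as a list of byte strings.

    Hypothetical helper: it encodes explicitly instead of relying on an
    implicit ASCII conversion, which is what raised UnicodeEncodeError.
    """
    return [b"kdialog", b"--inputbox", text.encode(encoding)]


sample = "\u017a"  # the character named in the traceback (z with acute)

# The failing call in the thread amounts to encoding the text as ASCII.
try:
    sample.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

args = build_kdialog_args(sample)
print(ascii_ok)  # False
print(args)      # [b'kdialog', b'--inputbox', b'\xc5\xba']
```

A list like this could be handed to subprocess.Popen(args, stdout=subprocess.PIPE), which also avoids shell quoting of the text. Whether kdialog then renders the bytes correctly depends on the environment it runs in.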
On Tue, 26 Apr 2005 11:41:01 +0200, Peter Otten wrote:
> Just encode the data in the target encoding before passing it to os.popen():
> test = os.popen('kdialog --inputbox %s' % data.encode("utf-8"))
I had tried that, but then the text looks like crap. The text I'm using
for this is Polish, and there are a lot of non-English characters in
there. Using this method results in some strange characters - basically it
looks like a file encoded in utf-8, but displayed using iso-8859-1.
Is this the best I can do?
Thanks for your help.
Matt
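The symptom described above ("utf-8, but displayed using iso-8859-1") can be reproduced in a few lines. This is a Python 3 sketch using the character from the traceback as sample input; it only illustrates the effect and says nothing about where in the kdialog pipeline the wrong decoding happens.

```python
polish = "\u017a"  # z with acute, the character from the traceback

utf8_bytes = polish.encode("utf-8")

# Decoding UTF-8 bytes with the wrong codec yields one wrong character per byte.
misread = utf8_bytes.decode("iso-8859-1")
# Decoding with the matching codec round-trips cleanly.
correct = utf8_bytes.decode("utf-8")

print(len(utf8_bytes))    # 2
print(ascii(misread))     # '\xc5\xba' (A with ring, masculine ordinal sign)
print(correct == polish)  # True
```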
Dumbkiwi wrote:
>> Just encode the data in the target encoding before passing it to os.popen():
>> test = os.popen('kdialog --inputbox %s' % data.encode("utf-8"))
> I had tried that, but then the text looks like crap. The text I'm using for this is Polish, and there are a lot of non-English characters in there. Using this method results in some strange characters - basically it looks like a file encoded in utf-8, but displayed using iso-8859-1.
> Is this the best I can do?
I've just tried the setup you described (with German umlauts instead of
Polish characters) on my Suse 9.1, and it works as expected with both
Python 2.3 and 2.4. Perhaps the target encoding you need is not UTF-8. I
would try other popular encodings used for Polish text (no idea what these
are). sys.stdout.encoding might give you a clue.
Peter
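To follow up on the sys.stdout.encoding hint, a small Python 3 sketch that gathers the encodings Python reports for the current environment. The function name encoding_report is hypothetical, and the values printed will differ from machine to machine.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python reports for the current environment."""
    return {
        # May be None when stdout is not a regular text stream.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding implied by the locale settings (LANG, LC_ALL, ...).
        "locale": locale.getpreferredencoding(False),
        # Default string encoding of the interpreter.
        "default": sys.getdefaultencoding(),
    }


for name, value in sorted(encoding_report().items()):
    print(name, "=", value)
```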
Peter Otten <__*******@web.de> wrote in message news:<d4*************@news.t-online.com>...
> I've just tried the setup you described (with German umlauts instead of Polish characters) on my Suse 9.1, and it works as expected with both Python 2.3 and 2.4. Perhaps the target encoding you need is not UTF-8. I would try other popular encodings used for Polish text (no idea what these are). sys.stdout.encoding might give you a clue.
Both sys.stdout.encoding and sys.stdin.encoding give:
ANSI_X3.4-1968
which is ascii (I think).
I'd be interested to see what your default encoding is, and why your
output was different.
Anyway, from your post, I've done some more digging, and found the
command:
sys.setappdefaultencoding()
which I've used, and it's fixed the problem (I think).
Thanks for your help.
Matt
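The guess that ANSI_X3.4-1968 is ASCII can be checked from Python itself. A minimal sketch (Python 3), using only the encoding name quoted in the post above:

```python
import codecs

reported = "ANSI_X3.4-1968"  # value quoted in the thread

# codecs.lookup() resolves aliases to the canonical codec name.
canonical = codecs.lookup(reported).name
print(canonical)  # ascii
```

So the streams were indeed ASCII-only, which is consistent with the 'ascii' codec named in the original traceback.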
dumbkiwi wrote:
> I'd be interested to see what your default encoding is,
ascii
> and why your output was different.
If only I knew.
> Anyway, from your post, I've done some more digging, and found the command:
> sys.setappdefaultencoding()
That is an alias for sys.setdefaultencoding() created by your IDE (Eric),
and therefore may not always be available.
Peter
On 26 Apr 2005 13:39:26 -0700, dm*****@gmail.com (dumbkiwi) wrote:
> Peter Otten <__*******@web.de> wrote in message news:<d4*************@news.t-online.com>...
>> Just encode the data in the target encoding before passing it to os.popen():
> Anyway, from your post, I've done some more digging, and found the command:
> sys.setappdefaultencoding()
> which I've used, and it's fixed the problem (I think).
Dumb Kiwi, eh? Maybe not so dumb -- where'd you find
sys.setappdefaultencoding()? I'm just a dumb Aussie [1]; I looked in
the 2.4.1 docs and also did import sys; dir(sys) and I can't spot it.
In any case, how could the magical sys.setappdefaultencoding() fix
your problem? From your description, your problem appeared to be that
you didn't know what encoding to use.
What is the essential difference between
send(u_data.encode('polish'))
and
sys.setappdefaultencoding('polish')
...
send(u_data)
[1]: Now that's *TWO* contenders for TautologyOTW :-)
Cheers,
John
Peter Otten wrote:
>> Anyway, from your post, I've done some more digging, and found the command:
>> sys.setappdefaultencoding()
> That is an alias for sys.setdefaultencoding() created by your IDE (Eric), and therefore may not always be available.
Hmmm. That's disappointing. I've also discovered that you can do:
import sys
reload(sys)
and then get access to sys.setdefaultencoding().
Will that get me into trouble?
Matt
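On the question of whether changing the default encoding is safe: the alternative raised earlier in the thread is to encode explicitly at the point where text leaves the program. Below is a hedged Python 3 sketch of that approach. Here send is a hypothetical stand-in for whatever writes to the external program, and the sample text is made up for illustration. In Python 3, sys.setdefaultencoding() is not available, so explicit encoding is the only option there.

```python
import sys

sent = []  # stands in for the external program's input


def send(payload):
    """Hypothetical sink: accepts only bytes, like a pipe to another process."""
    if not isinstance(payload, bytes):
        raise TypeError("encode text explicitly before sending")
    sent.append(payload)


u_data = "za\u017c\u00f3\u0142\u0107"  # sample Polish text

# The encoding is chosen at the call site, so it is visible and local;
# no process-wide default has to be changed.
send(u_data.encode("utf-8"))

print(sent[0])  # b'za\xc5\xbc\xc3\xb3\xc5\x82\xc4\x87'
print(hasattr(sys, "setdefaultencoding"))  # False on Python 3
```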
John Machin wrote:
>> Anyway, from your post, I've done some more digging, and found the command:
>> sys.setappdefaultencoding()
>> which I've used, and it's fixed the problem (I think).
> Dumb Kiwi, eh? Maybe not so dumb -- where'd you find sys.setappdefaultencoding()? I'm just a dumb Aussie [1]; I looked in the 2.4.1 docs and also did import sys; dir(sys) and I can't spot it.
Hmmm. See post above, seems to be something generated by eric3. So
this may not be the fix I'm looking for.
> In any case, how could the magical sys.setappdefaultencoding() fix your problem? From your description, your problem appeared to be that you didn't know what encoding to use.
I knew what encoding to use, the problem was that the text was being
passed to kdialog as ascii. The .encode('utf-8') at least allows
kdialog to run, but the text still looks like crap. Using
sys.setappdefaultencoding() seemed to help. The text looked a bit
better - although not entirely perfect - but I think that's because the
font I was using didn't have the correct characters (they came up as
square boxes).
> What is the essential difference between
> send(u_data.encode('polish'))
> and
> sys.setappdefaultencoding('polish')
> ...
> send(u_data)
Not sure - I'm new to character encoding, and most of this seems like
black magic to me.
> [1]: Now that's *TWO* contenders for TautologyOTW :-)
> Cheers,
> John
Matt
On 26 Apr 2005 19:16:25 -0700, dm*****@gmail.com wrote:
>> In any case, how could the magical sys.setappdefaultencoding() fix your problem? From your description, your problem appeared to be that you didn't know what encoding to use.
> I knew what encoding to use,
Would you mind telling us (a) what that encoding is (b) how you came
to that knowledge (c) why you just didn't do
test = os.popen('kdialog --inputbox %s' %(data.encode('that_encoding')))
instead of
test = os.popen('kdialog --inputbox %s' %(data.encode('utf-8')))
> the problem was that the text was being passed to kdialog as ascii.
It wasn't being passed to kdialog; there was an attempt which failed.
> The .encode('utf-8') at least allows kdialog to run, but the text still looks like crap. Using sys.setappdefaultencoding() seemed to help. The text looked a bit better - although not entirely perfect - but I think that's because the font I was using didn't have the correct characters (they came up as square boxes).
And the font you *were* using is what? And the font you are now using
is what? What facilities do you have to use different fonts?
>> What is the essential difference between
>> send(u_data.encode('polish'))
>> and
>> sys.setappdefaultencoding('polish')
>> ...
>> send(u_data)
> Not sure - I'm new to character encoding, and most of this seems like black magic to me.
The essential difference is that setting a default encoding is a daft
idea.
>> [1]: Now that's *TWO* contenders for TautologyOTW :-)
Before I retract that back to one contender, I'll give it one more
shot:
1. Your data: you say it is Polish text, and is utf-8. This implies
that it is in Unicode, encoded as utf-8. What evidence do you have?
Have you been able to display it anywhere so that it "looks good"?
If it's not confidential, can you show us a dump of the first say 100
bytes of text, in an unambiguous form, like this:
print repr(open('polish.text', 'rb').read(100))
2. Your script: You say "I then manipulate the data to break it down
into text snippets" - uh-huh ... *what* manipulations? Care to tell
us? Care to show us the code?
3. kdialog: I know nothing of KDE and its toolkit. I would expect
either (a) it should take utf-8 and be able to display *any* of the
first 64K (nominal) Unicode characters, given a Unicode font or (b)
you can encode your data in a legacy charset, *AND* tell it what that
charset is, and have a corresponding font or (c) you have both
options. Which is correct, and what are the details of how you can
tell kdialog what to do -- configuration? command-line arguments?
HTHYTHYS,
John
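The dump suggested in point 1 shows the raw bytes; a complementary check is to see whether those bytes decode as UTF-8 at all. Below is a Python 3 sketch with a hypothetical helper name, decodes_as_utf8. The incremental decoder is used because a fixed-size read such as read(100) can cut a multi-byte character in half, which should not count as invalid data.

```python
import codecs


def decodes_as_utf8(raw):
    """Return True if raw is valid UTF-8, allowing a cut-off final character.

    With final=False the decoder tolerates a multi-byte sequence truncated
    at the end of the chunk but still rejects bytes that are invalid UTF-8.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(raw, final=False)
    except UnicodeDecodeError:
        return False
    return True


utf8_sample = "\u017a".encode("utf-8")         # two bytes
latin2_sample = "\u017a".encode("iso-8859-2")  # one byte in a legacy Polish charset

print(decodes_as_utf8(utf8_sample))      # True
print(decodes_as_utf8(utf8_sample[:1]))  # True (truncated, not invalid)
print(decodes_as_utf8(latin2_sample))    # False
```

Applied to the file from the thread, this would be decodes_as_utf8(open('polish.text', 'rb').read(100)). A True result is evidence for UTF-8 but not proof, since pure ASCII text passes as well.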
John Machin wrote:
>>> In any case, how could the magical sys.setappdefaultencoding() fix your problem? From your description, your problem appeared to be that you didn't know what encoding to use.
>> I knew what encoding to use,
> Would you mind telling us (a) what that encoding is (b) how you came to that knowledge (c) why you just didn't do
(a) utf-8
(b) I asked the author of the text, and it displays properly in other
parts of the script when not using kdialog. Is there a way to test it
otherwise - I presume that there is.
> test = os.popen('kdialog --inputbox %s' %(data.encode('that_encoding')))
> instead of
> test = os.popen('kdialog --inputbox %s' %(data.encode('utf-8')))
Because, "that_encoding" == "utf-8" (as far as I was aware).
>> the problem was that the text was being passed to kdialog as ascii.
> It wasn't being passed to kdialog; there was an attempt which failed.
Quite right.
>> The .encode('utf-8') at least allows kdialog to run, but the text still looks like crap. Using sys.setappdefaultencoding() seemed to help. The text looked a bit better - although not entirely perfect - but I think that's because the font I was using didn't have the correct characters (they came up as square boxes).
> And the font you *were* using is what? And the font you are now using is what? What facilities do you have to use different fonts?
The font I was using was bitstream vera sans. The font I'm now using
is verdana.
>>> What is the essential difference between
>>> send(u_data.encode('polish'))
>>> and
>>> sys.setappdefaultencoding('polish') ... send(u_data)
>> Not sure - I'm new to character encoding, and most of this seems like black magic to me.
> The essential difference is that setting a default encoding is a daft idea.
Because it achieves nothing more than what I can do with
.encode('that_encoding')?
>>> [1]: Now that's *TWO* contenders for TautologyOTW :-)
> Before I retract that back to one contender, I'll give it one more shot:
Aaah, there's nothing better than a bit of cheerful snarkiness on a
newsgroup.
> 1. Your data: you say it is Polish text, and is utf-8. This implies that it is in Unicode, encoded as utf-8. What evidence do you have?
See above.
> Have you been able to display it anywhere so that it "looks good"?
Yes. What I am doing here is a theme for a superkaramba widget (see http://netdragon.sourceforge.net). It displays fine everywhere else on
the widget, it's just in the kdialog boxes that it doesn't display
correctly.
> If it's not confidential, can you show us a dump of the first say 100 bytes of text, in an unambiguous form, like this:
> print repr(open('polish.text', 'rb').read(100))
Can't do it now, because I'm at work. I can do it when I get home
tonight.
> 2. Your script: You say "I then manipulate the data to break it down into text snippets" - uh-huh ... *what* manipulations? Care to tell us? Care to show us the code?
Manipulation is simply breaking the text down into dictionary pairs.
It is basically a translation file for my widget, with English text,
and a corresponding Polish text. I use the re module to parse the
file, and create dictionary pairs between the English text, and the
corresponding Polish text.
> 3. kdialog: I know nothing of KDE and its toolkit. I would expect either (a) it should take utf-8 and be able to display *any* of the first 64K (nominal) Unicode characters, given a Unicode font or (b) you can encode your data in a legacy charset, *AND* tell it what that charset is, and have a corresponding font or (c) you have both options. Which is correct, and what are the details of how you can tell kdialog what to do -- configuration? command-line arguments?
That's what I was hoping someone here might be able to tell me. Having
searched online, I cannot find any information about kdialog and
encoding. I have left a message on the relevant kde mailing list, but
have had no response. The command line options are found with kdialog
--help, but as you don't have kde, it will be difficult for you to look
at those. Having examined them at length, there is no option for
encoding.
> HTHYTHYS,
> John
Thanks for your help and interest.
Matt
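Since kdialog offers no encoding option, one thing that might be worth trying is the locale the child process runs under. This is an assumption, not something confirmed in the thread: programs of this kind commonly interpret their command-line arguments using the locale's encoding, so a C/POSIX locale (consistent with the ANSI_X3.4-1968 reported earlier) could explain the garbled text. The helper name utf8_child_env and the locale name pl_PL.UTF-8 are hypothetical, and the locale would have to be installed on the machine.

```python
import os


def utf8_child_env(base_env=None, locale_name="pl_PL.UTF-8"):
    """Return a copy of the environment that requests a UTF-8 locale.

    locale_name is an assumption; `locale -a` lists what is installed.
    """
    env = dict(os.environ if base_env is None else base_env)
    # LC_ALL overrides LANG and the individual LC_* variables.
    env["LC_ALL"] = locale_name
    return env


env = utf8_child_env({"LANG": "C", "PATH": "/usr/bin"})
print(env["LC_ALL"])  # pl_PL.UTF-8
print(env["LANG"])    # C (left untouched)
```

The resulting dictionary could be passed as the env argument of subprocess.Popen when launching kdialog with UTF-8 encoded arguments. If the text still displays incorrectly, the locale was probably not the cause.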