
kdialog and unicode

I'm trying to get Python, Unicode and kdialog to play nicely together.
This is a Linux machine; kdialog is a KDE tool that generates dialog boxes
users can interact with (for example, to input text), and your script can
then use the text it outputs.

Anyway, what I'm doing is reading from a utf-8 encoded text file using the
codecs module, and using the following:

data = codecs.open('file', 'r', 'utf-8')

I then manipulate the data to break it down into text snippets.
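A minimal Python 3 sketch of the reading step (the thread itself is Python 2.3/2.4). Worth noting: codecs.open() returns a file-like stream object, and it is the .read() call that produces the decoded Unicode text; the "manipulate the data" step presumably does that. The temporary file and the sample text are invented for illustration.

```python
import codecs
import os
import tempfile

# Self-contained setup: write a small UTF-8 file (sample text is made up).
sample = "Zażółć gęślą jaźń"
path = os.path.join(tempfile.mkdtemp(), "polish.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(sample)

# codecs.open() returns a stream; .read() decodes the UTF-8 bytes into a str.
with codecs.open(path, "r", "utf-8") as stream:
    data = stream.read()

# The built-in open() with an encoding argument does the same job in Python 3.
with open(path, encoding="utf-8") as f:
    data_builtin = f.read()
```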

Then I run this command:
test = os.popen('kdialog --inputbox %s' %(data))

Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\u017a' in
position 272: ordinal not in range(128)

I would really like kdialog to display the text as UTF-8. However, it seems
that Python is trying to pass the UTF-8 encoded data as ASCII, which
obviously fails because ASCII can't represent these characters. Is it
possible to pass the text out to kdialog as UTF-8, rather than ASCII?
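The error can be reproduced without kdialog. A minimal Python 3 sketch, assuming the traceback comes from the implicit ASCII encode that Python 2 performs when a unicode command string is handed to os.popen(): encoding the same character, u'\u017a' (ź), with the ascii codec fails in the same way. The helper name is made up.

```python
def first_non_ascii(text):
    """Return (index, char) of the first character the ascii codec rejects, or None."""
    try:
        text.encode("ascii")  # roughly what Python 2 did implicitly before running the command
    except UnicodeEncodeError as exc:
        return exc.start, text[exc.start]
    return None

command = "kdialog --inputbox %s" % "ja\u017a\u0144"  # prompt text "jaźń"
print(ascii(first_non_ascii(command)))
```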

Or have I completely misunderstood the whole process, in which case, can
you please enlighten me.

Matt
Jul 19 '05 #1
Dumbkiwi wrote:
> Anyway, what I'm doing is reading from a utf-8 encoded text file using the
> codecs module, and using the following:
>
> data = codecs.open('file', 'r', 'utf-8')

data is now a unicode string.

> Then I run this command:
> test = os.popen('kdialog --inputbox %s' %(data))
>
> Is it possible to pass the text out to kdialog as utf-8, rather than ascii?

Just encode the data in the target encoding before passing it to os.popen():

test = os.popen('kdialog --inputbox %s' % data.encode("utf-8"))
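A minimal Python 3 sketch of the same idea. It goes one step further and passes an argument list to subprocess instead of a shell string, so the encoding is chosen explicitly and the prompt reaches kdialog as a single argument. The helper names are hypothetical; that kdialog exits non-zero on Cancel and prints the entered text in the same encoding are assumptions, not something established in this thread.

```python
import shutil
import subprocess

def kdialog_argv(prompt, encoding="utf-8"):
    """Argument list for `kdialog --inputbox`, with the prompt encoded explicitly.

    On POSIX, subprocess accepts bytes arguments, so the encoding is decided
    here instead of being left to an implicit default.
    """
    return [b"kdialog", b"--inputbox", prompt.encode(encoding)]

def ask(prompt, encoding="utf-8"):
    """Show the input box and return the entered text, or None.

    Assumptions: kdialog exits non-zero on Cancel and prints the entered
    text in the same encoding, followed by a newline.
    """
    if shutil.which("kdialog") is None:
        return None  # kdialog is not installed on this machine
    result = subprocess.run(kdialog_argv(prompt, encoding), capture_output=True)
    if result.returncode != 0:
        return None
    return result.stdout.decode(encoding).rstrip("\n")
```

Because no shell is involved, a prompt such as "Zamknij okno" arrives as one argument; with the os.popen() string form, the shell would split it at the space.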

Peter

Jul 19 '05 #2
On Tue, 26 Apr 2005 11:41:01 +0200, Peter Otten wrote:
> Just encode the data in the target encoding before passing it to
> os.popen():
>
> test = os.popen('kdialog --inputbox %s' % data.encode("utf-8"))


I had tried that, but then the text looks like crap. The text I'm using
for this is Polish, and there are a lot of non-English characters in
there. Using this method results in some strange characters - basically it
looks like a file encoded in utf-8, but displayed using iso-8859-1.

Is this the best I can do?
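The symptom described here (UTF-8 text that looks as if it were displayed as ISO-8859-1) is easy to reproduce, which suggests the bytes were right and the receiving side decoded them with the wrong codec. A minimal Python 3 sketch; the function names are made up:

```python
def utf8_shown_as_latin1(text):
    """Encode as UTF-8, then (wrongly) decode as ISO-8859-1, as a mismatched viewer would."""
    return text.encode("utf-8").decode("iso-8859-1")

def undo_latin1_misread(garbled):
    """ISO-8859-1 maps every byte to one character, so the original bytes can be recovered."""
    return garbled.encode("iso-8859-1").decode("utf-8")

# U+017A (the character from the traceback) turns into two characters, U+00C5 and U+00BA.
print(ascii(utf8_shown_as_latin1("\u017a")))  # '\xc5\xba'
```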

Thanks for your help.

Matt
Jul 19 '05 #3
Dumbkiwi wrote:
> I had tried that, but then the text looks like crap. The text I'm using
> for this is Polish, and there are a lot of non-English characters in
> there. Using this method results in some strange characters - basically it
> looks like a file encoded in utf-8, but displayed using iso-8859-1.
>
> Is this the best I can do?


I've just tried the setup you described (with German umlauts instead of
Polish characters) on my Suse 9.1, and it works as expected with both
Python 2.3 and 2.4. Perhaps the target encoding you need is not UTF-8. I
would try other popular encodings used for Polish text (no idea what these
are). sys.stdout.encoding might give you a clue.
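A short Python 3 sketch for collecting the encoding Peter mentions, plus two related ones. For a child process such as kdialog, the locale encoding (driven by LANG, LC_ALL or LC_CTYPE) is probably a more relevant clue than sys.stdout.encoding, though the thread does not establish that. The function name is made up.

```python
import locale
import sys

def encoding_report():
    """Return the encodings this Python process believes its environment uses."""
    return {
        # None if stdout has been replaced by an object without an encoding.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding implied by the current locale settings (LANG/LC_ALL/LC_CTYPE).
        "locale": locale.getpreferredencoding(False),
        # Encoding used for file names and, on POSIX, str command-line arguments.
        "filesystem": sys.getfilesystemencoding(),
    }

for name, enc in encoding_report().items():
    print(name, enc)
```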

Peter
Jul 19 '05 #4
Peter Otten <__*******@web.de> wrote in message news:<d4*************@news.t-online.com>...
> I've just tried the setup you described (with German umlauts instead of
> Polish characters) on my Suse 9.1, and it works as expected with both
> Python 2.3 and 2.4. Perhaps the target encoding you need is not UTF-8. I
> would try other popular encodings used for Polish text (no idea what these
> are). sys.stdout.encoding might give you a clue.


Both sys.stdout.encoding and sys.stdin.encoding give:

ANSI_X3.4-1968

which is ascii (I think).
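ANSI_X3.4-1968 is the IANA-registered name for US-ASCII, and Python's codec registry accepts it as an alias, so the guess is right. A quick check in Python 3 (the helper name is made up):

```python
import codecs

def canonical_codec_name(label):
    """Resolve an encoding label (alias) to Python's canonical codec name."""
    return codecs.lookup(label).name

print(canonical_codec_name("ANSI_X3.4-1968"))  # ascii
```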

I'd be interested to see what your default encoding is, and why your
output was different.

Anyway, from your post, I've done some more digging, and found the
command:

sys.setappdefaultencoding()

which I've used, and it's fixed the problem (I think).

Thanks for your help.

Matt
Jul 19 '05 #5
dumbkiwi wrote:
> I'd be interested to see what your default encoding is,

ascii

> and why your output was different.

If only I knew.

> Anyway, from your post, I've done some more digging, and found the
> command:
>
> sys.setappdefaultencoding()


That is an alias for sys.setdefaultencoding() created by your IDE (Eric),
and therefore may not always be available.

Peter

Jul 19 '05 #6
On 26 Apr 2005 13:39:26 -0700, dm*****@gmail.com (dumbkiwi) wrote:
> Anyway, from your post, I've done some more digging, and found the
> command:
>
> sys.setappdefaultencoding()
>
> which I've used, and it's fixed the problem (I think).


Dumb Kiwi, eh? Maybe not so dumb -- where'd you find
sys.setappdefaultencoding()? I'm just a dumb Aussie [1]; I looked in
the 2.4.1 docs and also did import sys; dir(sys) and I can't spot it.

In any case, how could the magical sys.setappdefaultencoding() fix
your problem? From your description, your problem appeared to be that
you didn't know what encoding to use.

What is the essential difference between

send(u_data.encode('polish'))

and

sys.setappdefaultencoding('polish')
...
send(u_data)
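In Python 2, setting a default encoding makes the same conversion happen implicitly wherever a unicode object meets a byte string, so the choice becomes hidden and process-wide. Python 3 removes the question: there is no implicit str-to-bytes conversion, the default encoding is fixed at UTF-8, and sys.setdefaultencoding() does not exist, so the encoding must be named where text leaves the program. A minimal sketch; `send` is a hypothetical stand-in for a pipe or command line, and 'polish' above is a placeholder rather than a real codec name (real encodings used for Polish include 'iso-8859-2' and 'cp1250', besides 'utf-8').

```python
import sys

def send(payload):
    """Hypothetical sink standing in for a pipe or command line: it accepts bytes only."""
    if not isinstance(payload, bytes):
        raise TypeError("send() needs bytes; encode the text first")
    return len(payload)

u_data = "Zażółć gęślą jaźń"

# Explicit: the encoding is named exactly where text becomes bytes.
sent = send(u_data.encode("utf-8"))

# Python 3 fixes the default encoding at UTF-8 and offers no global switch.
print(sys.getdefaultencoding(), hasattr(sys, "setdefaultencoding"))
```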

[1]: Now that's *TWO* contenders for TautologyOTW :-)

Cheers,

John

Jul 19 '05 #7

Peter Otten wrote:
> That is an alias for sys.setdefaultencoding() created by your IDE
> (Eric), and therefore may not always be available.


Hmmm. That's disappointing. I've also discovered that you can do:

import sys
reload(sys)

and then get access to sys.setdefaultencoding().

Will that get me into trouble?

Matt

Jul 19 '05 #8

John Machin wrote:
> Dumb Kiwi, eh? Maybe not so dumb -- where'd you find
> sys.setappdefaultencoding()? I'm just a dumb Aussie [1]; I looked in
> the 2.4.1 docs and also did import sys; dir(sys) and I can't spot it.


Hmmm. See post above, seems to be something generated by eric3. So
this may not be the fix I'm looking for.

> In any case, how could the magical sys.setappdefaultencoding() fix
> your problem? From your description, your problem appeared to be that
> you didn't know what encoding to use.

I knew what encoding to use, the problem was that the text was being
passed to kdialog as ascii. The .encode('utf-8') at least allows
kdialog to run, but the text still looks like crap. Using
sys.setappdefaultencoding() seemed to help. The text looked a bit
better - although not entirely perfect - but I think that's because the
font I was using didn't have the correct characters (they came up as
square boxes).

> What is the essential difference between
>
> send(u_data.encode('polish'))
>
> and
>
> sys.setappdefaultencoding('polish')
> ...
> send(u_data)

Not sure - I'm new to character encoding, and most of this seems like
black magic to me.

> [1]: Now that's *TWO* contenders for TautologyOTW :-)


Matt

Jul 19 '05 #9
On 26 Apr 2005 19:16:25 -0700, dm*****@gmail.com wrote:

>> In any case, how could the magical sys.setappdefaultencoding() fix
>> your problem? From your description, your problem appeared to be that
>> you didn't know what encoding to use.
>
> I knew what encoding to use,


Would you mind telling us (a) what that encoding is (b) how you came
to that knowledge (c) why you just didn't do

test = os.popen('kdialog --inputbox %s'
%(data.encode('that_encoding')))

instead of

test = os.popen('kdialog --inputbox %s' %(data.encode('utf-8')))

> the problem was that the text was being
> passed to kdialog as ascii.

It wasn't being passed to kdialog; there was an attempt which failed.

> The .encode('utf-8') at least allows
> kdialog to run, but the text still looks like crap. Using
> sys.setappdefaultencoding() seemed to help. The text looked a bit
> better - although not entirely perfect - but I think that's because the
> font I was using didn't have the correct characters (they came up as
> square boxes).

And the font you *were* using is what? And the font you are now using
is what? What facilities do you have to use different fonts?

>> What is the essential difference between
>>
>> send(u_data.encode('polish'))
>>
>> and
>>
>> sys.setappdefaultencoding('polish')
>> ...
>> send(u_data)
>
> Not sure - I'm new to character encoding, and most of this seems like
> black magic to me.

The essential difference is that setting a default encoding is a daft
idea.

>> [1]: Now that's *TWO* contenders for TautologyOTW :-)


Before I retract that back to one contender, I'll give it one more
shot:

1. Your data: you say it is Polish text, and is utf-8. This implies
that it is in Unicode, encoded as utf-8. What evidence do you have?
Have you been able to display it anywhere so that it "looks good"?
If it's not confidential, can you show us a dump of the first say 100
bytes of text, in an unambiguous form, like this:

print repr(open('polish.text', 'rb').read(100))

2. Your script: You say "I then manipulate the data to break it down
into text snippets" - uh-huh ... *what* manipulations? Care to tell
us? Care to show us the code?

3. kdialog: I know nothing of KDE and its toolkit. I would expect
either (a) it should take utf-8 and be able to display *any* of the
first 64K (nominal) Unicode characters, given a Unicode font or (b)
you can encode your data in a legacy charset, *AND* tell it what that
charset is, and have a corresponding font or (c) you have both
options. Which is correct, and what are the details of how you can
tell kdialog what to do -- configuration? command-line arguments?
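A Python 3 version of the check in point 1, extended to test whether the bytes are valid UTF-8. It uses an incremental decoder with final=False so that a multi-byte character cut off by the 100-byte limit is not reported as an error. 'polish.text' is the example file name from point 1; the function name is made up.

```python
import codecs
import os

def looks_like_utf8(raw):
    """True if raw is valid UTF-8, allowing an incomplete character at the very end."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(raw, final=False)
    except UnicodeDecodeError:
        return False
    return True

if os.path.exists("polish.text"):
    with open("polish.text", "rb") as f:
        head = f.read(100)
    print(repr(head))              # unambiguous: non-ASCII bytes appear as \xNN
    print(looks_like_utf8(head))
```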

HTHYTHYS,

John
Jul 19 '05 #10

John Machin wrote:
>> I knew what encoding to use,
>
> Would you mind telling us (a) what that encoding is (b) how you came
> to that knowledge (c) why you just didn't do

(a) utf-8
(b) I asked the author of the text, and it displays properly in other
parts of the script when not using kdialog. Is there a way to test it
otherwise - I presume that there is.

> test = os.popen('kdialog --inputbox %s'
> %(data.encode('that_encoding')))
>
> instead of
>
> test = os.popen('kdialog --inputbox %s' %(data.encode('utf-8')))

Because "that_encoding" == "utf-8" (as far as I was aware).

>> the problem was that the text was being
>> passed to kdialog as ascii.
>
> It wasn't being passed to kdialog; there was an attempt which failed.

Quite right.

> And the font you *were* using is what? And the font you are now using
> is what? What facilities do you have to use different fonts?


The font I was using was Bitstream Vera Sans. The font I'm now using
is Verdana.

> The essential difference is that setting a default encoding is a daft
> idea.

Because it achieves nothing more than what I can do with
.encode('that_encoding')?

>> [1]: Now that's *TWO* contenders for TautologyOTW :-)
>
> Before I retract that back to one contender, I'll give it one more
> shot:

Aaah, there's nothing better than a bit of cheerful snarkiness on a
newsgroup.

> 1. Your data: you say it is Polish text, and is utf-8. This implies
> that it is in Unicode, encoded as utf-8. What evidence do you have?

See above.

> Have you been able to display it anywhere so that it "looks good"?

Yes. What I am doing here is a theme for a superkaramba widget (see
http://netdragon.sourceforge.net). It displays fine everywhere else on
the widget, it's just in the kdialog boxes that it doesn't display
correctly.

> If it's not confidential, can you show us a dump of the first say 100
> bytes of text, in an unambiguous form, like this:

Can't do it now, because I'm at work. I can do it when I get home
tonight.

> print repr(open('polish.text', 'rb').read(100))
>
> 2. Your script: You say "I then manipulate the data to break it down
> into text snippets" - uh-huh ... *what* manipulations? Care to tell
> us? Care to show us the code?

Manipulation is simply breaking the text down into dictionary pairs.
It is basically a translation file for my widget, with English text
and the corresponding Polish text. I use the re module to parse the
file and create dictionary pairs between the English text and the
corresponding Polish text.
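The file format is not shown in the thread, so this is a hypothetical sketch: it assumes one "English = Polish" pair per line, with '#' starting a comment line. The point it illustrates is that the dictionary values stay as Unicode str objects, and encoding happens only when a value is handed to kdialog. The pattern and function names are made up.

```python
import re

# Hypothetical format: "English text = Polish text", one pair per line; '#' lines are comments.
PAIR = re.compile(r"^\s*(?P<en>[^#=]+?)\s*=\s*(?P<pl>.+?)\s*$")

def parse_translations(text):
    """Build {english: polish} from the hypothetical line format above."""
    table = {}
    for line in text.splitlines():
        m = PAIR.match(line)
        if m:
            table[m.group("en")] = m.group("pl")  # values remain str (Unicode)
    return table

sample = "# widget strings\nOpen = Otwórz\nClose window = Zamknij okno\n"
translations = parse_translations(sample)
```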

> 3. kdialog: I know nothing of KDE and its toolkit. I would expect
> either (a) it should take utf-8 and be able to display *any* of the
> first 64K (nominal) Unicode characters, given a Unicode font or (b)
> you can encode your data in a legacy charset, *AND* tell it what that
> charset is, and have a corresponding font or (c) you have both
> options. Which is correct, and what are the details of how you can
> tell kdialog what to do -- configuration? command-line arguments?

That's what I was hoping someone here might be able to tell me. Having
searched online, I cannot find any information about kdialog and
encoding. I have left a message on the relevant KDE mailing list, but
have had no response. The command-line options are listed by kdialog
--help, but as you don't have KDE, it will be difficult for you to look
at those. I have examined them at length, and there is no option for
encoding.
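One possibility the thread never confirms: Qt-based programs commonly decode their command-line arguments with the locale's encoding, and the locale reported earlier in the thread was ANSI_X3.4-1968 (ASCII). If kdialog behaves that way, UTF-8 bytes on its command line could be misread no matter what Python does, which would fit the "UTF-8 shown as ISO-8859-1" symptom. A hedged sketch of running kdialog with a UTF-8 locale in its environment; the locale name 'en_US.UTF-8' is an assumption and must actually be installed (`locale -a` lists what is available), and the helper names are made up.

```python
import os
import shutil
import subprocess

def utf8_env(locale_name="en_US.UTF-8", base=None):
    """Copy of the environment with LC_ALL forced to a UTF-8 locale (name is an assumption)."""
    env = dict(os.environ if base is None else base)
    env["LC_ALL"] = locale_name
    return env

def ask_with_utf8_locale(prompt, locale_name="en_US.UTF-8"):
    """Run kdialog --inputbox under a UTF-8 locale; None if kdialog is missing or cancelled."""
    if shutil.which("kdialog") is None:
        return None
    result = subprocess.run(
        [b"kdialog", b"--inputbox", prompt.encode("utf-8")],  # bytes: no implicit encoding
        env=utf8_env(locale_name),
        capture_output=True,
    )
    if result.returncode != 0:  # assumed: non-zero exit means Cancel
        return None
    return result.stdout.decode("utf-8").rstrip("\n")
```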


Thanks for your help and interest.

Matt

Jul 19 '05 #11
