Help wanted: displaying Japanese characters in a GUI using Qt and Python

prats

I want to write a GUI application in Python using Qt. This application
is supposed to take in Japanese characters. I am using PyQt as the
wrapper for using Qt from Python. I am able to take input in Japanese,
but I am unable to display it back in the GUI; it shows junk
characters. Can anyone suggest how to debug this issue?

The code used for transferring data from the view to the document is:

"
codec = QTextCodec.codecForName('ISO-2022-JP')
encoded_string = codec.fromUnicode( string )
return str(encoded_string)
"

Here string is a QString object containing the data from the view.
I think encoded_string is a QCString object and contains the
Unicode-coded characters of the Japanese string entered in the GUI?

How do I display the data from the document back in the view?

I would be really grateful if somebody helps me in this regard.

Regards,
Pratik

Apr 19 '06 #1

David Boddie

[Posting via Google's web interface again and hoping that double
newlines will prevent insane concatenation of lines...]
prats wrote:

I want to write a GUI application in Python using Qt. This application
is supposed to take in Japanese characters. I am using PyQt as the
wrapper for using Qt from Python. I am able to take input in Japanese,
but I am unable to display it back in the GUI; it shows junk
characters. Can anyone suggest how to debug this issue?
The code used for transferring data from the view to the document is:
"
codec = QTextCodec.codecForName('ISO-2022-JP')
encoded_string = codec.fromUnicode( string )
return str(encoded_string)
"
Here string is a QString object containing the data from the view.
I think encoded_string is a QCString object and contains the
Unicode-coded characters of the Japanese string entered in the GUI?

Actually, it contains the original text in the ISO-2022-JP encoding,
not a Unicode representation. You're just storing an anonymous sequence
of bytes in your encoded_string variable, which you then return. Any
user interface element that receives these later on has to guess which
encoding is used to represent the text, and it sounds like it can't do
that.
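
A minimal sketch of the distinction described above, assuming Python 3 and the standard library codecs instead of QTextCodec (the thread itself uses Python 2 and PyQt). The sample string is made up for illustration; a Python 3 str stands in for a QString here.

```python
# Illustrative only: a Python 3 str plays the role of the QString.
text = "日本語 text"  # made-up sample string

# Encoding produces bytes in ISO-2022-JP, not a Unicode string.
encoded = text.encode("iso2022_jp")
print(type(encoded).__name__)

# ISO-2022-JP is a 7-bit encoding: every byte is below 0x80, and escape
# sequences (starting with ESC, 0x1b) switch between character sets.
print(all(b < 0x80 for b in encoded))
print(b"\x1b" in encoded)

# The bytes only become text again when decoded with the right codec.
print(encoded.decode("iso2022_jp") == text)

# A receiver that guesses the wrong codec gets junk rather than an error.
print(repr(encoded.decode("latin-1")))
```

The last line is the "junk characters" case: the bytes are intact, but nothing tells the receiver how to interpret them.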

How do I display the data from the document back in the view?

If you're using the text in the GUI, you shouldn't need to pass it
through the codec at all. It should be possible to display the original
string in any widget that can display text. Keep the text in a QString
and it should just work.

David

Apr 19 '06 #2

prats

No, I need to replace the text given by the user in the GUI with new
text that is already in ISO-2022-JP encoding, and then redisplay
this new text. Let me explain in detail. I have a text file (say) whose
contents are base64 encoded and use the ISO-2022-JP charset.
I want to display this data in the GUI.
What I did first was read in the text and then decode it using the
'decodestring' function of Python's base64 module.
"
import base64
decoded_string = base64.decodestring(encoded_string)
"
Here encoded_string is the text that was read from the file.

How do I display this decoded_string in the GUI?

~pratik

Apr 20 '06 #3

prats

Hi all,
this is a continuation of my previous post.
The text I want to display is (in base64 encoding):
"
SW4gdGhpcyBzYW1wbGUsIGUtbWFpbCB0aXRsZSBhbmQgdGV4dC BhcmUgd3JpdHRlbiBpbiBKYXBh
bmVzZS4gDQpPdXIgbGFuZ3VhZ2UgaGFzIHRocmVlIHR5cGVzIG NhbGxlZCCBZ0thdGFrYW5hgWgs
IIFnSGlyYWdhbmGBaCBhbmQgDQqBZ0thbmppgWguIA0KVGhpcy BlLW1haWwgY29udGFpbnMgYWxs
IHRoZSB0eXBlcy4gDQqDQ4NOg1aDQYLNgmiCd4Jrgm6CYIJjgl KBRIJQgk+CyYLEg4GBW4OLi0CU
XILwIA0Kk/qWe4zqkc6JnoKzgrmC6YLngrWCooLFgreC5oFCIA0KgqCCooKk gqaCqIFBg0GDQ4NF
g0eDSSANCoKpgquCrYKvgrGBQYNKg0yDToNQg1IgDQqCs4K1gr eCuYK7gUGDVINWg1iDWoNcIA0K
gr2Cv4LCgsSCxoFBg16DYINjg2WDZyANCoLIgsmCyoLLgsyBQY Npg2qDa4Nsg20gDQqCzYLQgtOC
1oLZgUGDboNxg3SDd4N6IA0KgtyC3YLegt+C4IFBg32DfoOAg4 GDgiANCoLiguSC5oFBg4aDhoOI
IA0KgueC6ILpguqC64FBg4mDioOLg4yDjSANCoLtgvCC8YFBg4 +DSYOTIA0KgmCCYYJigmOCZIJl
gmaCZ4JogmmCaoJrgmyCbYJugm+CcIJxgnKCc4J0gnWCdoJ3gn iCeSANCoKBgoKCg4KEgoWChoKH
goiCiYKKgouCjIKNgo6Cj4KQgpGCkoKTgpSClYKWgpeCmIKZgp ogDQqBQoFBgWmBaoGBgXuBW4GW
gY+BdYF2gUmBlIGQgZOBlYFggYSBg4GbgX6BooGggZmB9CANCp OMi56Tc49hkkqL5pHjgViW2IJS
gXyCUYJUgXyCUiANCoKggqKCqIKikbmV25BWj2iDcoOLglCCVY pLIA0Kg0ODToNWg0GKlI6uie+O
0CANCg==
"

This text contains both English and Japanese characters, i.e. a few
English characters followed by some Japanese characters.

The decoded_string variable contains the first few English characters
and then all junk characters, which I guess are the "ISO-2022-JP"
encoded characters. But how do I get those Japanese characters back in
a format that displays properly in the GUI? Do I need to
change any system settings for that purpose? I am using Windows XP and
I have Japanese fonts installed on my PC. I have also set the default
font to Japanese.

Please help me in this regard.
~pratik

Apr 20 '06 #4

Serge Orlov

prats wrote:

Hi all,
this is a continuation of my previous post.
The text I want to display is (in base64 encoding):
This text contains both English and Japanese characters, i.e. a few
English characters followed by some Japanese characters.

The decoded_string variable contains the first few English characters
and then all junk characters, which I guess are the "ISO-2022-JP"
encoded characters.

Your guess is wrong. Save your data in a file:

"
import base64
bytes = base64.decodestring(encoded_string)
f = open("jp.txt","wb")
f.write(bytes)
f.close()
"
Start Firefox, set View->Encoding->Auto-detect->Japanese and open
jp.txt. Now open the View->Encoding menu and see that your data is
encoded in Shift-JIS. To work with non-ASCII characters you need to
convert your text to unicode:

text = bytes.decode("shift-jis")

That's it. As David already said, you need to keep your text in
unicode.

Do I need to change any system settings for that purpose?
I am using Windows XP and I have Japanese fonts installed
on my PC. I have also set the default font to Japanese.

AFAIK you _only_ need to turn on "Install files for East Asian languages" in
regional settings. You don't need to mess with the default font. The
following code works perfectly in IDLE on Windows XP English edition:

"
import base64
bytes = base64.decodestring(encoded_string)
print bytes.decode("shift-jis")
"
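
For readers on a current interpreter, here is a hedged sketch of the same two steps, assuming Python 3, where base64.decodestring no longer exists (it was removed in 3.9) and base64.b64decode is used instead. The sample text is made up and encoded on the spot so the example is self-contained; it is not the payload from this thread.

```python
import base64

# Made-up sample: English followed by Japanese, stored as Shift-JIS and
# then base64 encoded, mimicking the shape of the payload in this thread.
original = "Hiragana: ひらがな, Katakana: カタカナ"
encoded_string = base64.b64encode(original.encode("shift_jis"))

# Step 1: undo the base64 transfer encoding; the result is raw bytes.
raw = base64.b64decode(encoded_string)

# Step 2: turn the bytes into text by naming the real character encoding.
text = raw.decode("shift_jis")
print(text)

# Decoding with the mislabelled charset fails, because ISO-2022-JP
# never contains bytes with the high bit set.
try:
    raw.decode("iso2022_jp")
    mislabel_ok = True
except UnicodeDecodeError:
    mislabel_ok = False
print("decodes as ISO-2022-JP:", mislabel_ok)
```

Handing the resulting string to a PyQt widget is left out so the sketch runs without Qt installed.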

Apr 20 '06 #5

prats

I think I could not make myself clear. I have a GUI written in Python
and Qt, with PyQt as the Python wrapper for Qt. Now I have a string
which is base64 encoded. This string contains both Japanese and English
characters. I need to decode them and display them properly in the GUI,
i.e. with both English and Japanese characters.
I need a way to display them. The Qt docs say that QStrings are capable of
displaying all characters. So I need a way to get a QString from the
base64 encoded string.
~pratik

Apr 20 '06 #6

Serge Orlov

prats wrote:

I think I could not make myself clear.

On the contrary. You've given enough information for me to do what you
want: decoding your text and displaying it in a GUI. The fact that I
used another GUI is not important; read below why.

I have a GUI written in Python
and Qt, with PyQt as the Python wrapper for Qt. Now I have a string
which is base64 encoded. This string contains both Japanese and English
characters. I need to decode them and display them properly in the GUI,
i.e. with both English and Japanese characters.
I need a way to display them. The Qt docs say that QStrings are capable of
displaying all characters.

(nitpick: not displaying but holding) And so is the Python unicode
string. It was introduced more than 5 years ago if my memory serves me
right. It is the recommended way to hold non-ASCII characters in Python
and all toolkits are expected to play nice with it. I would be really
surprised if PyQt doesn't work with it.

So I need a way to get a QString from the
base64 encoded string.

Why don't you try to use unicode?

Apr 20 '06 #7

prats

Sorry, I did not read your point correctly. It works fine. Thanks for
your help.
I have one more query. I was told that the text I was supposed to show
was written using the "ISO-2022-JP" charset. But it didn't work when I
decoded it using that charset, while it worked fine with the "shift-jis"
encoding. Is that the default charset used by Python, i.e. are
bytes "shift-jis" by default?
~pratik

Apr 20 '06 #8

John Machin

On 20/04/2006 8:15 PM, prats wrote:

Sorry, I did not read your point correctly. It works fine. Thanks for
your help.
I have one more query. I was told that the text I was supposed to show
was written using the "ISO-2022-JP" charset.

Where more than one encoding is in use for a language, some people just
guess. I've seen this with ASCII/EBCDIC and GB[K]/Big5.

But it didn't work when I
decoded it using that charset, while it worked fine with the "shift-jis"
encoding. Is that the default charset used by Python, i.e. are
bytes "shift-jis" by default?

That may be Ruby's default, although I doubt it. Python was originally
written in Old High Dutch, but PEP 0.0001 did away with the ij ligature
so that Python could be expressed in ASCII, which has been the default
encoding ever since.

Apr 20 '06 #9

Serge Orlov

prats wrote:

Sorry, I did not read your point correctly. It works fine. Thanks for
your help.
I have one more query. I was told that the text I was supposed to show
was written using the "ISO-2022-JP" charset. But it didn't work when I
decoded it using that charset, while it worked fine with the "shift-jis"
encoding. Is that the default charset used by Python, i.e. are
bytes "shift-jis" by default?

No, the default charset in Python is ASCII. There is no absolutely
reliable way to find out the encoding of arbitrary bytes. But if you
have more than ten bytes and you know some properties of the text (like
you're sure your text contains only English and Japanese) then the
first thing you can do is to rule out invalid encodings:

def valid_en_jp_encodings(bytes):
    try:
        bytes.decode("ascii")
        return ["ascii"]
    except UnicodeDecodeError:
        pass
    encodings = "utf-8", "shift-jis", "iso-2022-jp", "euc-jp"
    valid = []
    for encoding in encodings:
        try:
            bytes.decode(encoding)
            valid.append(encoding)
        except UnicodeDecodeError:
            pass
    return valid

If this function returns a list with only one item you're lucky. If it
returns more than one item things get more complicated. You can
try to use http://chardet.feedparser.org/ to guess the encoding, or you
can present the list of valid encodings to the user and let him/her make
a choice. There is also the possibility that this function returns an
empty list; you will need to display an error message in that case.
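
A short usage sketch of the same elimination idea, assuming Python 3. The helper name decodable_as and the sample bytes are made up for illustration; the candidate list mirrors the one in the function above.

```python
def decodable_as(data, candidates=("utf_8", "shift_jis", "iso2022_jp", "euc_jp")):
    """Return the candidate encodings that decode `data` without error."""
    valid = []
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        valid.append(name)
    return valid


# Made-up sample bytes: Japanese text stored as Shift-JIS.
sample = "日本語のテキスト".encode("shift_jis")
survivors = decodable_as(sample)
print(survivors)

# Each survivor is only a candidate; show what it would decode to so a
# person (or a heuristic) can pick the readable one.
for name in survivors:
    print(name, "->", sample.decode(name))
```

Note that this only rules encodings out. Passing the check means the bytes are legal in that encoding, not that the decoded text is what the author wrote.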

Apr 20 '06 #10

David Boddie

Out of interest, I've written some code to show your example text and
added it to the PyQt Wiki:

http://www.diotavelli.net/PyQtWiki/D..._Japanese_Text

I used the codec for Shift-JIS to obtain a unicode representation of
the string, as Serge suggested.

David

Apr 20 '06 #11

prats

The text I got was from an Outlook message. The snippet of the mail
message is:
"
Content-Type: text/plain;
charset="iso-2022-jp"
Content-Transfer-Encoding: base64

SW4gdGhpcyBzYW1wbGUsIGUtbWFpbCB0aXRsZSBhbmQgdGV4dC BhcmUgd3JpdHRlbiBpbiBKYXBh
bmVzZS4gDQpPdXIgbGFuZ3VhZ2UgaGFzIHRocmVlIHR5cGVzIG NhbGxlZCCBZ0thdGFrYW5hgWgs
IIFnSGlyYWdhbmGBaCBhbmQgDQqBZ0thbmppgWguIA0KVGhpcy BlLW1haWwgY29udGFpbnMgYWxs
IHRoZSB0eXBlcy4gDQqDQ4NOg1aDQYLNgmiCd4Jrgm6CYIJjgl KBRIJQgk+CyYLEg4GBW4OLi0CU
XILwIA0Kk/qWe4zqkc6JnoKzgrmC6YLngrWCooLFgreC5oFCIA0KgqCCooKk gqaCqIFBg0GDQ4NF
g0eDSSANCoKpgquCrYKvgrGBQYNKg0yDToNQg1IgDQqCs4K1gr eCuYK7gUGDVINWg1iDWoNcIA0K
gr2Cv4LCgsSCxoFBg16DYINjg2WDZyANCoLIgsmCyoLLgsyBQY Npg2qDa4Nsg20gDQqCzYLQgtOC
1oLZgUGDboNxg3SDd4N6IA0KgtyC3YLegt+C4IFBg32DfoOAg4 GDgiANCoLiguSC5oFBg4aDhoOI
IA0KgueC6ILpguqC64FBg4mDioOLg4yDjSANCoLtgvCC8YFBg4 +DSYOTIA0KgmCCYYJigmOCZIJl
gmaCZ4JogmmCaoJrgmyCbYJugm+CcIJxgnKCc4J0gnWCdoJ3gn iCeSANCoKBgoKCg4KEgoWChoKH
goiCiYKKgouCjIKNgo6Cj4KQgpGCkoKTgpSClYKWgpeCmIKZgp ogDQqBQoFBgWmBaoGBgXuBW4GW
gY+BdYF2gUmBlIGQgZOBlYFggYSBg4GbgX6BooGggZmB9CANCp OMi56Tc49hkkqL5pHjgViW2IJS
gXyCUYJUgXyCUiANCoKggqKCqIKikbmV25BWj2iDcoOLglCCVY pLIA0Kg0ODToNWg0GKlI6uie+O
0CANCg==
"
Outlook could display the message properly. Doesn't this mean the
encoding is "iso-2022-jp", or else Outlook could not have decoded it?
I am unable to explain this behaviour.

~Pratik

Apr 20 '06 #12
