OK, so I'm a newbie Python programmer and I'm trying to create a simple Python web application. The program is simply going to read in pairs of words, parse them into a dictionary file, then randomly display a key and prompt the user for the correct answer. Basically, it's a digital flash card system with a modular "dictionary" file.
The problem is this: I'm trying to create this program to help me study foreign languages (specifically Japanese at the moment), and when I save the txt file which houses the word pairs, it is automatically encoded into UTF. However, when getting user input, the input is natively sent to the program in Shift-JIS encoding. I downloaded CJKcodecs for Python to encode a string into any number of Japanese encodings; however, the problem is I don't know how to "decode" the UTF and then recode it into Shift-JIS so that I can compare the dictionary values with the input values. OR, I could convert the input from Shift-JIS to UTF, but either way I don't know how to decode any of the codecs. I'm sure there's just some simple function call, but I have been unable to find it.
Any help would be appreciated!
Thanks =)
Antioch: however the problem is I don't know how to "decode" the UTF and then recode it into Shift-JIS so that I can compare the dictionary values with the input values.
I don't have a Shift-JIS codec installed, so this breaks for me, but it should
work if you have the codec installed:
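# UTF-8 encoded byte string
y = '\xe3\x81\x8b\xe3\x82\x8f\xe3\x81\x95\xe3\x81\x8d'
print y
print repr(y)
# decode the UTF-8 bytes into a unicode object
u = unicode(y, "utf-8")
print repr(u)
# re-encode the unicode object as Shift-JIS
s = u.encode("shift-jis")
print s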
Neil
Antioch wrote: I don't know how to "decode" the UTF and then recode it into Shift-JIS so that I can compare the dictionary values with the input values. OR, I could convert the input from Shift-JIS to UTF, but either way I don't know how to decode any of the codecs.
If s is a string encoded in UTF-8, converting it to Shift-JIS will be something
like:
s2 = unicode(s, 'utf-8').encode('shift-jis')
For the reverse:
s = unicode(s2, 'shift-jis').encode('utf-8')
You have to make sure s contains only characters that Shift-JIS can represent, or
the encoding / decoding to / from Shift-JIS will fail and you'll get a
UnicodeError, which is a subclass of ValueError.
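As a rough sketch of the round trip and the failure case described above: this assumes Python 3, where `bytes.decode(encoding)` plays the role of `unicode(s, encoding)` and the `shift_jis` codec ships with the standard library. The helper name `transcode` is made up for illustration.

```python
def transcode(data, source, target):
    """Decode bytes from `source`, then re-encode the text as `target`."""
    return data.decode(source).encode(target)


# UTF-8 -> Shift-JIS and back again.
utf8_bytes = "かわさき".encode("utf-8")
sjis_bytes = transcode(utf8_bytes, "utf-8", "shift_jis")
round_trip = transcode(sjis_bytes, "shift_jis", "utf-8")

# A character with no Shift-JIS representation (here a Hangul syllable)
# raises UnicodeEncodeError, which is a subclass of ValueError.
try:
    transcode("한".encode("utf-8"), "utf-8", "shift_jis")
    failed = False
    error_name = None
except ValueError as exc:
    failed = True
    error_name = type(exc).__name__
```

Catching `ValueError` keeps the sketch in line with the description above; catching `UnicodeError` would be the narrower choice.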
For further details, see the unicode function @ http://www.python.org/doc/current/li...cs.html#l2h-71 , the decode
and encode methods on strings @ http://www.python.org/doc/current/li...g-methods.html and the codecs module
@ http://www.python.org/doc/current/li...le-codecs.html
HTH
--
- Eric Brunel <eric (underscore) brunel (at) despammed (dot) com> -
PragmaDev : Real Time Software Development Tools - http://www.pragmadev.com
Antioch <se************@SPAMTRAP.hotmail.com> wrote: However, when getting user input, the input is natively sent to the program in Shift-JIS encoding.
You can change this by setting the encoding of the web page containing
the <form>, either by having the server send an HTTP response header
'Content-Type: text/html;charset=utf-8', or by including a
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
element inside the <head>.
If you don't set the charset like this, the browser will have to guess
what encoding to use, which in the absence of any properly-encoded
Japanese text on the form page could be anything, and may default to
different encodings dependent on the user's locale.
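A minimal sketch of the header approach, assuming Python 3 and a WSGI-style application (the thread does not say how the page is actually served, so the `form_page` function, the `/check` action and the `answer` field name are all hypothetical):

```python
def form_page(environ, start_response):
    """Serve the flash-card form with an explicit UTF-8 charset."""
    body = (
        '<html><head><title>Flash cards</title></head><body>'
        '<form method="post" action="/check">'
        '<input type="text" name="answer" />'
        '<input type="submit" />'
        '</form></body></html>'
    ).encode("utf-8")
    # The charset parameter tells the browser how the page is encoded,
    # which is also the encoding it will normally use for the submission.
    start_response("200 OK", [
        ("Content-Type", "text/html;charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

The same `Content-Type` value would apply in a CGI script that prints its own headers.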
[Detour:]
The browser will send the form submission in the specified encoding,
*unless* the user deliberately goes to the browser's encodings menu
and selects a different one. Unlikely, but possible. The 'proper' way
around this is to write <form accept-charset="utf-8">, which should mean
that the browser should send the submission as UTF-8 regardless of the
encoding of the page containing the form. Unfortunately Internet
Explorer on Windows is broken and stupid, and prefers to use this as
a 'backup' encoding: it will use the current page's encoding for fields
which can be encoded in that, and the accept-charset encoding on
fields that contain characters that can't be encoded in the current
page's charset. Thus you can get a mixture of encodings with absolutely
no way to determine which is which.
The IE-compatible but utterly hideous workaround is to avoid
accept-charset and include a hidden field with name '_charset_' in the
form. IE will fill it in with the currently selected encoding when the
form is submitted. Whether it is worth doing this is debatable.
[end detour.]
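For the hidden `_charset_` field workaround mentioned in the detour, the server side might look roughly like this. It is a sketch under assumptions: Python 3, form data arriving as a percent-encoded query string, and a hypothetical helper name `decode_form`; only `_charset_` comes from the discussion above.

```python
import codecs
from urllib.parse import parse_qs, quote


def decode_form(query_string, fallback="utf-8"):
    """Decode form fields using the encoding named in the _charset_ field."""
    # First pass: Latin-1 maps every byte to a character, so this pass
    # cannot fail and is enough to read the ASCII-only _charset_ value.
    raw = parse_qs(query_string, encoding="latin-1")
    charset = raw.get("_charset_", [fallback])[0]
    try:
        codecs.lookup(charset)
    except LookupError:
        # Unknown or bogus encoding name: fall back rather than crash.
        charset = fallback
    # Second pass: decode every field with the reported encoding.
    return parse_qs(query_string, encoding=charset, errors="strict")


# Simulated submission from a browser that used Shift-JIS.
sample = "_charset_=Shift_JIS&answer=" + quote("かわさき".encode("shift_jis"))
fields = decode_form(sample)
```

With `errors="strict"`, a submission whose bytes do not match the reported encoding raises `UnicodeDecodeError` instead of silently producing wrong text.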
Antioch: the problem is I don't know how to "decode" the UTF and then recode it into Shift-JIS so that I can compare the dictionary values
I'd definitely recommend storing the dictionary values as Unicode strings
rather than trying to compare encoded versions.
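A sketch of what comparing as Unicode could look like, assuming Python 3 (where `str` is already Unicode) and a dictionary file with one tab-separated pair per line. The file layout and the names `load_pairs` and `is_correct` are assumptions for illustration, not something stated in the thread.

```python
import io


def load_pairs(stream):
    """Read 'prompt<TAB>answer' lines from a text stream into a dict."""
    pairs = {}
    for line in stream:
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition("\t")
        pairs[key] = value
    return pairs


def is_correct(pairs, key, raw_answer, input_encoding="shift_jis"):
    """Decode the submitted bytes, then compare text with text."""
    answer = raw_answer.decode(input_encoding).strip()
    return pairs.get(key) == answer


# The dictionary file is UTF-8 on disk; decoding happens once, on read.
dictionary_bytes = "river\tかわ\ncat\tねこ\n".encode("utf-8")
pairs = load_pairs(
    io.TextIOWrapper(io.BytesIO(dictionary_bytes), encoding="utf-8")
)
```

Because both sides are decoded at the boundary, the comparison does not care which encoding the file or the browser happened to use.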
Antioch: I don't know how to decode any of the codecs. I'm sure there's just some simple function call
Yep:
characterString= unicode(jisString, 'shift_jis')
utfString= characterString.encode('utf-8')
--
Andrew Clover
mailto:an*@doxdesk.com http://www.doxdesk.com/