Sorry if my terminology is wrong, but I'm having intermittent
problems dealing with accented characters in Python (only those from
the 8-bit Latin-1 character set, I think).
I've written an anagram finder that produces anagrams from a
dictionary of words. The user can load their own dictionary
(http://www.voidspace.org.uk/atlantibots/nanagram.html).
It's particularly hard for me to understand what is happening,
because Python's behaviour *seems* intermittent.
For example, if I run my program from IDLE and give it the word
'degré' (containing an e-acute), I get this error:
Exception in Tkinter callback
Traceback (most recent call last):
  [snip..]
  File "D:\Python Projects\Nanagram1.3\Nanagram-GUI.pyw", line 123, in prepare
    if letter in self.valid_letters:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x83 in position 26: ordinal not in range(128)
That line tests each character of the user's input in order to remove
invalid characters (like "-" and "'"). It crashes when it reaches the
e-acute.
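A minimal sketch of what I think the filtering step could look like if the raw input is decoded to text once, up front, so that the membership test always compares text with text and never triggers an implicit 'ascii' decode. Assumptions: the input arrives as Latin-1 bytes, `VALID_LETTERS` and `strip_invalid` are hypothetical names (not Nanagram's actual code), and the sketch uses Python 3 syntax.

```python
import string

# Hypothetical set of letters the anagram finder accepts (not Nanagram's real list).
VALID_LETTERS = set(string.ascii_lowercase)


def strip_invalid(raw: bytes, encoding: str = "latin-1") -> str:
    """Decode raw input explicitly, then keep only the valid letters.

    Decoding once at the boundary means the 'in' test below compares
    text with text, so no implicit ASCII decode is ever attempted.
    """
    text = raw.decode(encoding).lower()
    return "".join(ch for ch in text if ch in VALID_LETTERS)


# 0xE9 is e-acute in Latin-1; the space, hyphen and apostrophe are dropped too.
print(strip_invalid(b"degr\xe9 hello-ma'"))  # prints: degrhelloma
```

This reproduces the "double-click" behaviour described below (the e-acute is discarded as invalid), but deterministically rather than depending on how the program was launched.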
*However*, if I run it by double-clicking on the file, it appears to
work fine. For example, if I ask it to find anagrams of 'degré hello ma',
it strips out the e-acute (treating it as an invalid character) and
finds anagrams of the rest:
gleam holder
hallo merged
What I'd like to do is switch the default to an 8-bit codec (Latin-1,
I think?) and then offer the user the choice of either mapping accented
characters to their nearest unaccented equivalent (e-acute to e, for
example) *or* treating them as separate characters.
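The two options could be sketched roughly as follows, assuming the word has already been decoded to text. The approach here (Unicode NFKD normalization, then dropping combining marks) is one common technique I'm substituting for illustration; `fold_accents` and `prepare_word` are hypothetical helper names, and the code is Python 3.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Map accented letters to their nearest unaccented equivalent.

    NFKD splits 'é' into 'e' followed by a combining acute accent;
    dropping the combining marks leaves just the base letter.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def prepare_word(text: str, fold: bool) -> str:
    # fold=True: e-acute counts as a plain 'e'.
    # fold=False: e-acute stays a distinct letter of its own.
    return fold_accents(text) if fold else text


print(prepare_word("degré", fold=True))   # prints: degre
print(prepare_word("degré", fold=False))  # prints: degré
```

Letters with no Unicode decomposition, such as 'ø', pass through `fold_accents` unchanged, so a real dictionary might need a small extra mapping table on top of this.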
However, I can't work out how to change the default codec, whatever
the locale.
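As far as I understand it, the usual alternative to changing the interpreter-wide default is to name the codec explicitly wherever bytes enter the program, for example when loading the user's dictionary file. A minimal Python 3 sketch, assuming the dictionary is a Latin-1 text file with one word per line; `load_dictionary` is a hypothetical helper, and the temporary file exists only to make the example self-contained.

```python
import os
import tempfile
from typing import List


def load_dictionary(path: str, encoding: str = "latin-1") -> List[str]:
    """Read a user-supplied word list, decoding with an explicit codec.

    Passing encoding= to open() means the result does not depend on
    the locale or on any interpreter-wide default encoding.
    """
    with open(path, encoding=encoding) as f:
        return [line.strip() for line in f if line.strip()]


# Demo: write a tiny Latin-1 word list to a temporary file, then read it back.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"degr\xe9\nhello\nma\n")
    demo_path = tmp.name

words = load_dictionary(demo_path)
os.remove(demo_path)
print(words)  # prints: ['degré', 'hello', 'ma']
```

Because every word is already text after loading, it can be passed straight to a filtering or accent-folding step without any further decoding.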
Can anyone help, or point me to a useful resource? (I've tried
Google, before you suggest it.)
Fuzzy