Paul Prescod <paul@prescod.net> wrote in message news:<mailman.193.1077530419.27104.python-list@python.org>...
> Fuzzyman wrote:
> > Sorry if my terminology is wrong..... but I'm having intermittent
> > problems dealing with accented characters in python. (Only from the 8
> > bit latin-1 character set I think..)
>
> I would say that if you get a 100% failure rate in IDLE and a 100%
> success rate from a console program then your problem is not
> intermittent but environment specific.
If that was the case then I'm sure you'd be right... good not to
quibble about terminology eh ;-)
(in a few other test cases the success-fail pattern was the opposite
way round)

>
> > For example - if I run my program from IDLE and give it the word
> > 'degré' (containing e-acute) then I get the error :
>
> What do you mean by "give it the word"? Through raw_input()? Through a file?
>
Right - it is fetching the words from a Tkinter entry box using the
get() method.

> However you are getting this information, it seems to me that in IDLE
> you are getting a Unicode object rather than an 8-bit string object.
> Convert it to an 8-bit string:
>
> mydata.encode("latin-1")
Great - that might do the job.
I'll try it.
Thanks.
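
A minimal sketch of the suggested conversion, written for current Python 3, where `str` plays the role of the Unicode object and `bytes` the role of the 8-bit string. The name `mydata` follows the quoted line; the sample word is an assumption standing in for whatever the entry box returns.

```python
# Assumed sample input: a word containing e-acute, held as text (Unicode).
mydata = "degr\u00e9"

# Encode explicitly to latin-1; each character becomes exactly one byte.
encoded = mydata.encode("latin-1")

# Decoding with the same codec round-trips back to the original text.
decoded = encoded.decode("latin-1")

print(len(mydata), len(encoded), decoded == mydata)
```

Naming the codec on every call keeps the result independent of whatever the default encoding happens to be in IDLE or at the console.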

>
> > if letter in self.valid_letters:
> > UnicodeDecodeError: 'ascii' codec can't decode byte 0x83 in position
> > 26: ordinal not in range(128)
>
> Something looks suspicious here. I wouldn't expect self.valid_letters to
> have a 0x83 character in it because I would expect it to be hard-coded
> to ASCII in your program like:
>
self.valid_letters is *in fact* string.lowercase - which I thought
included the 8-bit latin-1 letters as well. (The letters are converted
to lowercase using the .lower() string method.)
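
One plausible reading of the traceback, offered as an assumption rather than a diagnosis: string.lowercase is locale-dependent, so on some systems it carries non-ASCII bytes after the 26 ASCII letters, and testing a Unicode letter against that 8-bit string forces an implicit ASCII decode of the whole string. The sketch below reproduces only the decode step in current Python 3; the byte values in `valid_letters` are hypothetical.

```python
# Hypothetical 8-bit string: 26 ASCII letters followed by non-ASCII bytes.
valid_letters = b"abcdefghijklmnopqrstuvwxyz" + b"\x83\x9a\x9c"

failed_at = None
bad_byte = None
try:
    valid_letters.decode("ascii")
except UnicodeDecodeError as err:
    # err.start is the index of the first byte the codec could not decode.
    failed_at = err.start
    bad_byte = valid_letters[err.start]

print(failed_at, hex(bad_byte))
```

Under that assumption, "position 26" refers to an offset inside valid_letters, not inside the single letter being tested.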

> valid_letters = "abcdefghijklmnopqrstuvwxyzABCDEF..."
>
> On the other hand I wouldn't expect "letter" to have more than one
> character so how could it have a problem at position 26?
>
I'm iterating over the string.

> > What I'd like to do is switch by default to an 8 bit codec (latin-1 I
> > think ?????) and then offer the user the choice of either mapping the
> > accented characters to their nearest equivalent (e-acute to e for
> > example) *or* treating them as separate characters.............
>
> Why change the default codec rather than explicitly using the codec you
> care about? If you want to work in the 8-bit world rather than the
> Unicode world, just use the "encode" function on the Unicode object. If
> you want to work in the Unicode world.
>
Great - sounds good.
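
A rough sketch of the two options described above (mapping accented letters to their nearest equivalent, or keeping them distinct), using the standard `unicodedata` module on Unicode text in current Python 3. The function name `strip_accents` is hypothetical. Decomposition only helps for characters that have one: a letter such as o-with-stroke has none and passes through unchanged.

```python
import unicodedata

def strip_accents(text):
    """Map accented letters to their base letter where a decomposition exists."""
    # NFD splits e-acute into 'e' followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep everything else.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

word = "degr\u00e9"
folded = strip_accents(word)   # option 1: accents mapped to nearest equivalent
distinct = word                # option 2: accented letters kept as they are
```

Working on Unicode text first and encoding to latin-1 only at the end keeps both options available without touching the default codec.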

> > I can't work out how to change the default codec (no matter what the
> > locale) ?
>
> I'd advise against fixing the problem in that way. Convert data
> appropriately when you bring it from the outside world into the Python
> program and ignore the default codec.
>
> Paul Prescod
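
Converting at the boundary could look roughly like this in current Python 3. The helper name `to_text` and the choice of latin-1 as the assumed input encoding are illustrative and not taken from the program under discussion.

```python
def to_text(value, encoding="latin-1"):
    """Return text whether the caller passed 8-bit data or text."""
    if isinstance(value, bytes):
        # 8-bit data from the outside world: decode with an explicit codec.
        return value.decode(encoding)
    return value

# Both forms of the same word normalize to identical text, so later
# comparisons never depend on the default codec.
from_bytes = to_text(b"degr\xe9")
from_text = to_text("degr\u00e9")
same = from_bytes == from_text
```

Calling such a helper once, where the value leaves the entry box, means the rest of the program sees a single string type regardless of environment.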
Thanks for your help.
Fuzzyman
http://www.voidspace.org.uk/atlantib...thonutils.html