On Nov 28, 8:45 am, kyoso...@gmail.com wrote:
On Nov 27, 3:35 pm, Martin Landa <landa.mar...@gmail.com> wrote:
Hi all,
sorry for a newbie question. I have a unicode string (or, better said,
a latin2-encoded one) containing non-ASCII characters, e.g.
s = "Ukázka_možnosti_využití_programu_OpenJUMP_v_SOA"
I would like to convert this string to plain ASCII (using some lookup
table for latin2) to get
Ukazka_moznosti_vyuziti_programu_OpenJUMP_v_SOA
Thanks for any hints! Regards, Martin Landa
With a little googling, I found this:
http://www.peterbe.com/plog/unicode-to-ascii
and if the OP has the patience to read *ALL* the comments on that blog
entry, he will find that comment[-2] points to
http://effbot.python-hosting.com/fil...xt/unaccent.py
and comment[-1] (from the blog owner) is "Brilliant! Thank you."
The bottom line is that there is no universal easy solution; you need
to handcraft a translation table suited to your particular purpose
(e.g. do you want u-with-umlaut to become u or ue?). The
unicodedata.normalize function is useful for off-line preparation of a
set of candidate mappings for that table; it should not be applied
either on-line or blindly.
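A rough sketch of that workflow, assuming Python 3 (where str is already
Unicode). The names candidate_mappings, TABLE and to_ascii are made up for
illustration, and the table entries are only one possible set of choices,
not a recommendation:

```python
import unicodedata


def candidate_mappings(chars):
    """Off-line helper: propose an ASCII replacement for each non-ASCII char.

    A value of None means normalization gave no ASCII result, so a human
    has to pick the replacement.
    """
    candidates = {}
    for ch in sorted(set(chars)):
        if ord(ch) < 128:
            continue  # already ASCII, nothing to map
        # NFKD splits e.g. 'á' into 'a' plus a combining acute accent.
        decomposed = unicodedata.normalize("NFKD", ch)
        stripped = "".join(
            c for c in decomposed if not unicodedata.combining(c)
        )
        is_ascii = bool(stripped) and all(ord(c) < 128 for c in stripped)
        candidates[ch] = stripped if is_ascii else None
    return candidates


# Hand-reviewed table (hypothetical choices). The candidates above are
# only a starting point: 'ü' would be proposed as 'u', but here 'ue' was
# chosen on purpose, and 'ł' has no decomposition at all.
TABLE = str.maketrans({
    "á": "a", "í": "i", "ž": "z",
    "ü": "ue", "ł": "l",
})


def to_ascii(text):
    """On-line step: apply only the reviewed table, never normalize()."""
    return text.translate(TABLE)


s = "Ukázka_možnosti_využití_programu_OpenJUMP_v_SOA"
print(candidate_mappings(s))
print(to_ascii(s))
```

Characters missing from TABLE pass through unchanged, so it is worth
checking the result with something like result.encode("ascii") to catch
anything the table does not cover yet.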
Cheers,
John