
string conversion latin2 to ascii

Hi all,

Sorry for the newbie question. I have a unicode string (or, better
said, a latin2-encoded string) containing non-ascii characters, e.g.

s = "Ukázka_možnosti_využití_programu_OpenJUMP_v_SOA"

I would like to convert this string to plain ascii (using some lookup
table for latin2)

to get

Ukazka_moznosti_vyuziti_programu_OpenJUMP_v_SOA

Thanks for any hints! Regards, Martin Landa
Nov 27 '07 #1


On Nov 27, 3:35 pm, Martin Landa <landa.mar...@gmail.com> wrote:
> I would like to convert this string to plain ascii (using some lookup
> table for latin2)

With a little googling, I found this:

http://www.peterbe.com/plog/unicode-to-ascii

You might also find this article useful:

http://www.reportlab.com/i18n/python..._tutorial.html

Mike
Nov 27 '07 #2

> Sorry for the newbie question. I have a unicode string (or, better
> said, a latin2-encoded string) containing non-ascii characters, e.g.
>
> s = "Ukázka_možnosti_využití_programu_OpenJUMP_v_SOA"
That's not a Unicode string (at least in Python 2); it is
a latin-2 encoded byte string; it has nothing to do with Unicode.
> I would like to convert this string to plain ascii (using some lookup
> table for latin2)
>
> to get
>
> Ukazka_moznosti_vyuziti_programu_OpenJUMP_v_SOA
I recommend using string.translate. You need a translation
table there, which is best generated with string.maketrans.

# -*- coding: iso-8859-2 -*-
import string
table = string.maketrans("áží", "azi")  # 3 latin-2 bytes -> 3 ASCII bytes (Python 2)
print s.translate(table)

HTH,
Martin
Nov 27 '07 #3
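
The recipe above is Python 2 and works on latin-2 bytes. In Python 3 the natural equivalent is to decode the bytes to `str` first and build the table with `str.maketrans`. A minimal sketch, assuming the input really is ISO-8859-2 and that the 15 Czech accented letters are all you need; the names `CZECH_TO_ASCII` and `to_ascii` are illustrative, not from the thread.

```python
# Python 3 sketch of the translate() approach: decode first, then map characters.

# Simulated input: the example string as latin-2 (ISO-8859-2) encoded bytes.
raw = "Ukázka_možnosti_využití_programu_OpenJUMP_v_SOA".encode("iso-8859-2")

# Step 1: bytes -> str, so that each accented letter is a single character.
text = raw.decode("iso-8859-2")

# Step 2: a hand-made lookup table for the 15 Czech accented letters
# (lower and upper case). Other latin-2 letters, e.g. Polish 'ł' or
# Slovak 'ľ', are NOT covered and pass through unchanged.
CZECH_TO_ASCII = str.maketrans(
    "áčďéěíňóřšťúůýžÁČĎÉĚÍŇÓŘŠŤÚŮÝŽ",
    "acdeeinorstuuyzACDEEINORSTUUYZ",
)


def to_ascii(s: str) -> str:
    """Replace every letter listed in CZECH_TO_ASCII; leave all else alone."""
    return s.translate(CZECH_TO_ASCII)


print(to_ascii(text))  # Ukazka_moznosti_vyuziti_programu_OpenJUMP_v_SOA
```

Because unlisted characters are left alone, the output is only guaranteed to be ASCII if the table covers every non-ASCII character that can occur in the input.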

On Nov 28, 8:45 am, kyoso...@gmail.com wrote:
> With a little googling, I found this:
>
> http://www.peterbe.com/plog/unicode-to-ascii

and if the OP has the patience to read *ALL* the comments on that blog
entry, he will find that comment[-2] points to

http://effbot.python-hosting.com/fil...xt/unaccent.py

and comment[-1] (from the blog owner) is "Brilliant! Thank you."

The bottom line is that there is no universal easy solution; you need
to handcraft a translation table suited to your particular purpose
(e.g. do you want u-with-umlaut to become u or ue?). The
unicodedata.normalize function is useful for off-line preparation of a
set of candidate mappings for that table; it should not be applied
either on-line or blindly.

Cheers,
John
Nov 27 '07 #4
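
John's advice (use `unicodedata.normalize` offline to prepare candidate mappings, then handcraft the final table) can be sketched in Python 3 as follows. This assumes the characters of interest are exactly the printable non-ASCII characters of ISO-8859-2; `candidate_mappings` is a hypothetical helper name, and NFKD is one reasonable choice of normalization form.

```python
import unicodedata


def candidate_mappings(encoding: str = "iso-8859-2") -> dict:
    """Propose an ASCII replacement for each printable non-ASCII character
    of a single-byte encoding. The result is meant for a human to review,
    not for direct use."""
    candidates = {}
    for byte in range(0x80, 0x100):
        try:
            ch = bytes([byte]).decode(encoding)
        except UnicodeDecodeError:
            continue  # byte value not defined in this encoding
        if not ch.isprintable():
            continue  # skip C1 control characters and the no-break space
        # NFKD splits e.g. 'ž' into 'z' + COMBINING CARON; keep only the ASCII part.
        decomposed = unicodedata.normalize("NFKD", ch)
        candidates[ch] = decomposed.encode("ascii", "ignore").decode("ascii")
    return candidates


if __name__ == "__main__":
    # Print the proposals for review; empty results (e.g. for 'ł', which has
    # no Unicode decomposition) need a manual decision.
    for ch, guess in sorted(candidate_mappings().items()):
        print(f"{ch!r} -> {guess!r}")
```

On latin-2 this proposes 'z' for 'ž' and 'u' for 'ü', while 'ł' comes back empty. Those are exactly the entries where a person has to decide (u or ue? l?) before freezing the reviewed dict into a `str.maketrans` table.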

* Martin Landa <la**********@gmail.com>, 2007-11-27:
> I would like to convert this string to plain ascii (using some lookup
> table for latin2)

You may try python-elinks
<http://freshmeat.net/projects/python-elinks/>:

>>> import elinks
>>> print "Ukázka_mo\236nosti_vyu\236ití_programu_OpenJUMP_v_SOA".decode('Windows-1250').encode('ASCII', 'elinks')
Ukazka_moznosti_vyuziti_programu_OpenJUMP_v_SOA
--
Jakub Wilk
Nov 28 '07 #5
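
In Jakub's example the second argument to `encode` is the name of a codec error handler, so importing `elinks` evidently registers one called 'elinks'. The same mechanism is available in the standard library through `codecs.register_error`. A minimal Python 3 sketch of such a handler; this is not python-elinks' implementation, and the handler name 'translit' and the table contents are hypothetical.

```python
import codecs

# Hypothetical transliteration table (lower-case keys only); extend as needed.
TRANSLIT = {
    "á": "a", "č": "c", "ď": "d", "é": "e", "ě": "e", "í": "i", "ň": "n",
    "ó": "o", "ř": "r", "š": "s", "ť": "t", "ú": "u", "ů": "u", "ý": "y",
    "ž": "z",
}


def _replace(ch: str) -> str:
    """Look up one character case-insensitively; fall back to '?'."""
    base = TRANSLIT.get(ch.lower())
    if base is None:
        return "?"
    return base.upper() if ch.isupper() else base


def translit_handler(err):
    """Codec error handler: substitute the characters the codec could not
    encode, then resume encoding right after them."""
    if not isinstance(err, UnicodeEncodeError):
        raise err
    bad = err.object[err.start:err.end]
    return "".join(_replace(ch) for ch in bad), err.end


# Make the handler available by name to str.encode(..., errors="translit").
codecs.register_error("translit", translit_handler)

if __name__ == "__main__":
    # Same bytes as in the elinks example: \x9e is 'ž' in Windows-1250 (cp1250).
    raw = b"Uk\xe1zka_mo\x9enosti_vyu\x9eit\xed_programu_OpenJUMP_v_SOA"
    print(raw.decode("cp1250").encode("ascii", "translit").decode("ascii"))
```

A registered handler is global to the interpreter, which suits one-off scripts but means its name must not clash with another library's. Calling `str.translate` first and then `encode("ascii", "replace")` gives much the same result without global state.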

On Nov 27, 5:08 pm, John Machin <sjmac...@lexicon.net> wrote:
> The bottom line is that there is no universal easy solution; you need
> to handcraft a translation table suited to your particular purpose
> (e.g. do you want u-with-umlaut to become u or ue?).

Sorry... I didn't know about translation tables, or I would have
mentioned them instead. My bad.

Mike
Nov 28 '07 #6
