Peter Bulychev wrote:
Hello.
I want to convert a unicode character into an ascii one.
The method ".encode('ASCII')" can convert only those unicode
characters which fit into the 0..127 range.
But there are still lots of characters beyond this range, which can be
manually converted to some visibly similar ascii characters. For
instance, there are several quotation marks in unicode, which can be
converted into ascii quotation mark.
Please be more specific. There is no general solution. Unicode can
handle Latin, Cyrillic (Russian), Chinese, Japanese and Arabic characters
in the same string. There are thousands of possible non-ASCII characters
and many of them are not similar to any ASCII character.
If you only want this to work for a subset, please define that subset.
Laszlo
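To make the "no general solution" point concrete, here is a minimal Python 3 sketch (the thread itself uses Python 2 style u'' literals). It assumes Unicode NFKD normalization as one possible partial approach, which nobody in the thread proposed; the helper name strip_to_ascii is made up for illustration.

```python
import unicodedata


def strip_to_ascii(text):
    """Hypothetical helper: NFKD-decompose, then drop anything still non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# A strict encode refuses any code point above 127.
try:
    "caf\u00e9".encode("ascii")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

print("strict encode succeeded:", strict_ok)

# Accented Latin, curly quotes, and Cyrillic behave very differently.
for sample in ["caf\u00e9", "\u201cquoted\u201d", "\u041c\u043e\u0441\u043a\u0432\u0430"]:
    print(repr(sample), "->", repr(strip_to_ascii(sample)))
```

The accented letter keeps its base character, but the curly quotes are dropped rather than turned into an ASCII quotation mark, and the Cyrillic word disappears entirely. That is why a hand-chosen mapping for a defined subset is still needed.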
Peter Bulychev wrote:
I want to convert a unicode character into an ascii one.
You have to make some arbitrary choices of what to translate. Based
on some materials on effbot's site, and a recipe, I made ftp://alan.smcvt.edu/hefferon/unicode2ascii.py
which has at least some of what you are looking for.
$ grep HYPHEN unicode2ascii.py
u'\N{SOFT HYPHEN}':u'-',
u'\N{HYPHEN}':u'-',
u'\N{NON-BREAKING HYPHEN}':u'-',
u'\N{SOFT HYPHEN}': '-',
No doubt I have some terrible gaffes and some things missing.
Corrections appreciated.
Jim
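A table like the one in the grep output can be applied with str.translate. The following is only a sketch in Python 3: the entries are a hypothetical excerpt chosen for illustration, not the actual contents of unicode2ascii.py, and the names ASCII_LOOKALIKES and asciify are assumptions.

```python
# Hypothetical excerpt in the spirit of unicode2ascii.py; the real file's
# contents are not reproduced here.
ASCII_LOOKALIKES = {
    "\N{HYPHEN}": "-",
    "\N{NON-BREAKING HYPHEN}": "-",
    "\N{LEFT SINGLE QUOTATION MARK}": "'",
    "\N{RIGHT SINGLE QUOTATION MARK}": "'",
    "\N{LEFT DOUBLE QUOTATION MARK}": '"',
    "\N{RIGHT DOUBLE QUOTATION MARK}": '"',
}

# str.translate wants code points (ints) as keys; str.maketrans converts
# a dict keyed by single characters into that form.
TRANSLATION = str.maketrans(ASCII_LOOKALIKES)


def asciify(text):
    """Replace characters listed in the table; leave everything else alone."""
    return text.translate(TRANSLATION)


print(asciify("\u201cwell\u2010known\u201d"))
```

Characters missing from the table pass through unchanged, so a final .encode('ascii') would still raise on them. That makes gaps in the table visible instead of hiding them.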
On Jul 2, 9:55 am, Jim <jim.heffe...@gmail.com> wrote:
Peter Bulychev wrote:
I want to convert a unicode character into an ascii one.
You have to make some arbitrary choices of what to translate. Based
on some materials on effbot's site, and a recipe, I made ftp://alan.smcvt.edu/hefferon/unicode2ascii.py
which has at least some of what you are looking for.
$ grep HYPHEN unicode2ascii.py
u'\N{SOFT HYPHEN}':u'-',
u'\N{HYPHEN}':u'-',
u'\N{NON-BREAKING HYPHEN}':u'-',
u'\N{SOFT HYPHEN}': '-',
No doubt I have some terrible gaffes and some things missing.
Corrections appreciated.
Comments on the above grep output:
1. You have SOFT HYPHEN twice, mapping it to u'-' and '-'
2. The idea of a soft hyphen is as a hint to a hyphenator about where
to insert a hyphen if one is necessary and the hyphenator is suspected
of acting cluelessly without the hint. IMHO, asciification should
substitute u'', not u'-'.
3. Read PEP 8. s/:/: /
Cheers,
John
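Points 1 and 2 can be illustrated with a short Python 3 sketch. The three-entry table below is made up for the demonstration; it shows that a repeated key in a dict literal is accepted silently, and what mapping the soft hyphen to the empty string does to a word.

```python
# A dict literal with a repeated key is legal Python: the last value wins
# and no error is raised, so a duplicate entry can go unnoticed.
table = {
    "\N{SOFT HYPHEN}": "-",
    "\N{HYPHEN}": "-",
    "\N{SOFT HYPHEN}": "",  # silently overrides the first entry
}

print(len(table))
print(repr(table["\N{SOFT HYPHEN}"]))

# With the soft hyphen mapped to the empty string, the invisible break hint
# is removed instead of turning into a visible hyphen.
word = "hyphen\N{SOFT HYPHEN}ation"
result = word.translate(str.maketrans(table))
print(result)
```

If both entries in the original file live in the same dict literal, whichever comes last decides the result, so the order of the sources matters.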
On Jul 1, 8:29 pm, John Machin <sjmac...@lexicon.net> wrote:
On Jul 2, 9:55 am, Jim <jim.heffe...@gmail.com> wrote:
Comments on the above grep output:
1. You have SOFT HYPHEN twice, mapping it to u'-' and '-'
Hmph. I'll correct that. Thanks.
2. The idea of a soft hyphen is as a hint to a hyphenator about where
to insert a hyphen if one is necessary and the hyphenator is suspected
of acting cluelessly without the hint. IMHO, asciification should
substitute u'', not u'-'.
Thanks also here. I'll think about it.
3. Read PEP 8. s/:/: /
I don't like the spacing in 8, personally.
Thanks,
Jim
Jim <ji**********@gmail.com> writes:
I don't like the spacing in [PEP 8], personally.
Nevertheless, your Python code will be much less effort to read by
others (and yourself in future) if it is written in conformance with
PEP 8.
Writing all your Python code to conform with that standard is the
simplest step you can take to ensure that your code won't cause other
Python programmers undue reading effort.
--
\ “There's no excuse to be bored. Sad, yes. Angry, yes. |
`\ Depressed, yes. Crazy, yes. But there's no excuse for boredom, |
_o__) ever.” —Viggo Mortensen |
Ben Finney
On Jul 1, 8:42 pm, Jim <jim.heffe...@gmail.com> wrote:
On Jul 1, 8:29 pm, John Machin <sjmac...@lexicon.net> wrote:
Comments on the above grep output:
1. You have SOFT HYPHEN twice, mapping it to u'-' and '-'
Hmph. I'll correct that. Thanks.
Well, maybe not. I forgot that I got the by-hand conversions from
three different sources and that's why that character appears in two
different places. (I thought that listing all cases for each source
was less confusing. Arguable, for sure.)
2. The idea of a soft hyphen is as a hint to a hyphenator about where
to insert a hyphen if one is necessary and the hyphenator is suspected
of acting cluelessly without the hint. IMHO, asciification should
substitute u'', not u'-'.
Thanks also here. I'll think about it.
Googling "soft hyphen" showed me that the question is not perfectly
clear-- some people seem to have very elaborate opinions on the
topic-- but I've gone with your suggestion. Thank you.
Again, I'd appreciate additional corrections. Not only do I speak only
ASCII :-( but I admit to entering the data while watching a basketball
game, so no doubt there are some real blunders.
Thanks,
Jim
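When conversions come from several sources, as described above, one way to keep the per-source listings while still catching disagreements is to merge them programmatically. This is a sketch only, in Python 3: merge_tables, source_a and source_b are hypothetical names and the entries are invented, not taken from unicode2ascii.py.

```python
def merge_tables(*sources):
    """Merge (name, table) pairs; return the merged table and any conflicts."""
    merged = {}
    conflicts = []
    for name, table in sources:
        for char, replacement in table.items():
            if char in merged and merged[char] != replacement:
                conflicts.append((char, merged[char], replacement, name))
            merged[char] = replacement  # later sources win
    return merged, conflicts


# Hypothetical per-source tables.
source_a = {"\N{SOFT HYPHEN}": "-", "\N{HYPHEN}": "-"}
source_b = {"\N{SOFT HYPHEN}": "", "\N{NON-BREAKING HYPHEN}": "-"}

merged, conflicts = merge_tables(("source_a", source_a), ("source_b", source_b))
for char, old, new, name in conflicts:
    print("U+%04X: %r overridden by %r from %s" % (ord(char), old, new, name))
```

Entries that agree across sources merge quietly; only real disagreements, such as the two soft hyphen mappings, are reported.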