Replace accented chars with unaccented ones

Hi

I would like to replace accented chars (like "é", "è" or "à") with
unaccented ones ("é" -> "e", "è" -> "e", "à" -> "a").

I have tried the string.replace method, but it seems to dislike non-ASCII chars...

Can you help me please ?
Thanks.
Jul 18 '05
> r += xlate[ord(i)]
> r += i


Perhaps I'm going to have to create a signature and drop information
about this in every post to c.l.py, but repeated string additions are
slow as hell for any reasonably long string. It is much
faster to place characters into a list and ''.join() them.

    >>> import time
    >>> def test_s(l):
    ...     t = time.time()
    ...     for i in xrange(100):
    ...         a = ''
    ...         for j in xrange(l):
    ...             a += '0'
    ...     return time.time()-t
    ...
    >>> def test_l(l):
    ...     t = time.time()
    ...     for i in xrange(100):
    ...         a = ''.join(['0' for j in xrange(l)])
    ...     return time.time()-t
    ...
    >>> i = 128
    >>> while i < 4097:
    ...     print test_s(i), test_l(i)
    ...     i *= 2
    ...
0.0150001049042 0.0309998989105
0.0469999313354 0.047000169754
0.140999794006 0.109000205994
0.343999862671 0.203000068665
0.905999898911 0.40700006485
2.56200003624 0.828000068665

At 256 characters long, it looks about even. Anything longer and
''.join(lst) is significantly faster.

When we do something like the below, the overhead of creating short
lists is significant, but it is still faster when l is greater than
roughly 2048:

    a = []
    for i in xrange(l):
        a += ['0']

- Josiah
Jul 18 '05 #11
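
For readers on Python 3, the comparison in the post above can be sketched roughly as follows. This is only an illustration, not Josiah's original script: the function names build_by_concat and build_by_join are made up here, and timeit replaces the hand-rolled time.time() loop. Current CPython optimises some in-place string += cases, so the gap you measure may be smaller than the 2004 figures.

```python
import timeit

def build_by_concat(n):
    """Build an n-character string with repeated += (hypothetical name)."""
    s = ''
    for _ in range(n):
        s += '0'
    return s

def build_by_join(n):
    """Build the same string by collecting pieces and joining once."""
    return ''.join(['0' for _ in range(n)])

# Time 100 runs of each approach at a few lengths; numbers vary by machine.
for n in (128, 1024, 4096):
    t_concat = timeit.timeit(lambda: build_by_concat(n), number=100)
    t_join = timeit.timeit(lambda: build_by_join(n), number=100)
    print(n, t_concat, t_join)
```

Both functions return the same string, so only the timings differ.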
> Using the .translate() method on unicode strings should be
> even more performant:
>
> # prepare mapping table to match .translate interface
> table = {}
> for k, v in replacement_pairs:
>     table[ord(k)] = v
>
> def multi_replace(inp):
>     return inp.translate(table)


Even better *smile*.

- Josiah
Jul 18 '05 #12
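
A minimal Python 3 sketch of the .translate() idea quoted above. The contents of replacement_pairs are an assumption made for illustration (the thread never shows them); in Python 3 every str is Unicode, so str.translate accepts a dict keyed by code point directly.

```python
# Hypothetical replacement pairs (not from the thread); extend as needed.
replacement_pairs = [
    ('\u00e9', 'e'),  # e with acute
    ('\u00e8', 'e'),  # e with grave
    ('\u00e0', 'a'),  # a with grave
]

# str.translate expects a mapping from code point (int) to replacement.
table = {ord(k): v for k, v in replacement_pairs}

def multi_replace(inp):
    """Replace every character that has an entry in table."""
    return inp.translate(table)

print(multi_replace('d\u00e9j\u00e0 vu, tr\u00e8s \u00e9l\u00e9gant'))
```

Characters without an entry in the table pass through unchanged, so the table only needs to list the accented characters you care about.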
Josiah Carlson <jc******@nospam.uci.edu> wrote in message news:<c3**********@news.service.uci.edu>...
>> r += xlate[ord(i)]
>> r += i
>
> Perhaps I'm going to have to create a signature and drop information
> about this in every post to c.l.py, but repeated string additions are
> slow as hell for any reasonably long string. It is much
> faster to place characters into a list and ''.join() them.


True. Is this better?

    ... body of latin1_to_ascii() ...
    r = []
    for i in unicrap:
        if xlate.has_key(ord(i)):
            r.append(xlate[ord(i)])
        elif ord(i) >= 0x80:
            pass
        else:
            r.append(i)
    return ''.join(r)

Yours,
Noah
Jul 18 '05 #13
Noah wrote:
> Josiah Carlson <jc******@nospam.uci.edu> wrote in message news:<c3**********@news.service.uci.edu>...
>>> r += xlate[ord(i)]
>>> r += i
>>
>> Perhaps I'm going to have to create a signature and drop information
>> about this in every post to c.l.py, but repeated string additions are
>> slow as hell for any reasonably long string. It is much
>> faster to place characters into a list and ''.join() them.
>
> True. Is this better?
>
> ... body of latin1_to_ascii() ...
> r = []
> for i in unicrap:
>     if xlate.has_key(ord(i)):
>         r.append(xlate[ord(i)])
>     elif ord(i) >= 0x80:
>         pass
>     else:
>         r.append(i)
> return ''.join(r)


I'd use:

    ''.join([xlate.get(ord(i), i) for i in unicrap
             if ord(i) in xlate or ord(i) < 0x80])

Using r.append(), in general, while being faster than string addition,
is significantly slower than using list comprehensions.

- Josiah
Jul 18 '05 #14
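
The filter-and-map step discussed in the two posts above could look like this in Python 3. It is a sketch under assumptions: the real xlate table is not shown in this part of the thread, so the tiny table below is invented for illustration, and the parameter is called text rather than unicrap.

```python
# Hypothetical, deliberately tiny table: code point -> ASCII replacement.
xlate = {
    0xe9: 'e',   # e with acute
    0xe8: 'e',   # e with grave
    0xe0: 'a',   # a with grave
    0xc6: 'AE',  # capital AE ligature
}

def latin1_to_ascii(text):
    """Map known characters through xlate, keep plain ASCII,
    and silently drop any other character at or above 0x80."""
    return ''.join(
        xlate.get(ord(ch), ch)
        for ch in text
        if ord(ch) in xlate or ord(ch) < 0x80
    )

print(latin1_to_ascii('caf\u00e9 \u00c6on'))
```

Note that unmapped non-ASCII characters vanish without warning, exactly as in the loop version with `pass`; whether that is acceptable depends on the data.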
Nicolas Bouillon <bo***@bouil.org.invalid> wrote:
> Hi
>
> I would like to replace accented chars (like "é", "è" or "à") with
> unaccented ones ("é" -> "e", "è" -> "e", "à" -> "a").
>
> I have tried the string.replace method, but it seems to dislike non-ASCII chars...
>
> Can you help me please ?
> Thanks.


You could try experimenting with the 'unicodedata' module:

    >>> import unicodedata
    >>> [unicodedata.name(x) for x in u'123 abc @#$ \u00ff']
    ['DIGIT ONE', 'DIGIT TWO', 'DIGIT THREE', 'SPACE', 'LATIN SMALL LETTER
    A', 'LATIN SMALL LETTER B', 'LATIN SMALL LETTER C', 'SPACE',
    'COMMERCIAL AT', 'NUMBER SIGN', 'DOLLAR SIGN', 'SPACE', 'LATIN SMALL
    LETTER Y WITH DIAERESIS']
    >>> unicodedata.lookup('latin capital letter a with grave')
    u'\xc0'

You could strip the ' WITH...' part of the name when applicable and look
the shortened name up again to get the unaccented character. You would
only need to process characters with ord >= 160.

HTH,

AdSR
Jul 18 '05 #15
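
The name-stripping idea from the post above can be sketched in Python 3 like this. The helper names strip_accent and unaccent are made up for illustration; only unicodedata.name and unicodedata.lookup come from the standard library. Characters whose name has no ' WITH ' part, or whose shortened name does not exist, are left untouched.

```python
import unicodedata

def strip_accent(ch):
    """Return the base character for ch if its Unicode name contains
    ' WITH '; otherwise return ch unchanged (hypothetical helper)."""
    if ord(ch) < 160:
        return ch  # per the post, only characters from 160 up need work
    name = unicodedata.name(ch, '')  # '' when the character has no name
    base, sep, _ = name.partition(' WITH ')
    if not sep:
        return ch
    try:
        return unicodedata.lookup(base)
    except KeyError:
        return ch  # the shortened name is not a real character name

def unaccent(text):
    """Apply strip_accent to every character of text."""
    return ''.join(strip_accent(ch) for ch in text)

print(unaccent('d\u00e9j\u00e0 vu'))
```

This relies on the naming convention of the Unicode character database rather than on a hand-written table, so it covers far more characters than the examples in the thread, but it is slower per character than a precomputed translation table.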
