Remove accent marks from text?

MC
Is there a string function in .NET that will remove the accent marks from letters? I know that's a slightly vague request... and that I could implement it by table lookup (and will do so unless something's already there). But can it be accomplished by switching a string among "cultures" or something like that?
Oct 27 '08 #1
4 Replies


"MC" wrote:
Is there a string function in .NET that will remove the accent marks from letters? I know that's a slightly vague request... and that I could implement it by table lookup (and will do so unless something's already there). But can it be accomplished by switching a string among "cultures" or something like that?
Hi,

You can remove non-spacing characters (and possibly modifier characters)
from the string if you normalize it first. This will effectively remove accents
(diacritics) as well.

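// Requires: using System.Globalization; using System.Text;
// FormD splits each accented letter into a base letter plus combining marks.
string normalizedString = regularString.Normalize(NormalizationForm.FormD);

StringBuilder sb = new StringBuilder(normalizedString);

// Walk backwards so that removing a character never skips the one after it
// (a forward loop would miss the second of two consecutive marks).
for (int i = sb.Length - 1; i >= 0; i--)
{
    if (CharUnicodeInfo.GetUnicodeCategory(sb[i]) == UnicodeCategory.NonSpacingMark)
        sb.Remove(i, 1);
}
regularString = sb.ToString();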

--
Happy Coding!
Morten Wennevik [C# MVP]
Oct 27 '08 #2

MC
> You can remove non spacing characters (and possibly modifier characters)
> from the string if you normalize it. This will effectively remove accents
> (diacritics) as well.

Thanks. I should have been clearer. Not only do I want to remove non-spacing characters, I also want to change accented letters to the corresponding unaccented letters. (This is for matching up foreign names... somebody long ago decided the database needed to be in plain ASCII.)
Oct 27 '08 #3

"MC" <fo**************@www.ai.uga.edu.slash.mc> wrote in message
news:%2****************@TK2MSFTNGP04.phx.gbl...
>> You can remove non spacing characters (and possibly modifier characters)
>> from the string if you normalize it. This will effectively remove accents
>> (diacritics) as well.
>
> Thanks. I should have been clearer. Not only do I want to remove non-spacing
> characters, I also want to change accented letters to the corresponding
> unaccented letters. (This is for matching up foreign names... somebody long
> ago decided the database needed to be in plain ASCII.)

Here's hoping no one has used alternate spellings, like <letter>+e for
German umlauted letters. And will the eszett (ß) get translated to "ss"...?
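
For what it's worth, letters such as ß, ø, æ and ł have no canonical decomposition in Unicode, so decomposing and stripping marks leaves them untouched, and normalization knows nothing about spellings like "ue" for "ü". Below is a rough sketch of handling them with a small explicit table on top of the strip-the-marks approach. The class name AsciiFolding, the method name Fold and the table entries are illustrative assumptions, and the table is deliberately incomplete:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class AsciiFolding
{
    // Illustrative, incomplete table of letters that FormD does not decompose.
    private static readonly Dictionary<char, string> Special =
        new Dictionary<char, string>
        {
            { '\u00DF', "ss" }, // ß
            { '\u00F8', "o" },  // ø
            { '\u00D8', "O" },  // Ø
            { '\u00E6', "ae" }, // æ
            { '\u00C6', "AE" }, // Æ
            { '\u0142', "l" },  // ł
            { '\u0141', "L" },  // Ł
        };

    public static string Fold(string input)
    {
        var sb = new StringBuilder(input.Length);
        // Decompose first so accented letters become base letter + marks.
        foreach (char c in input.Normalize(NormalizationForm.FormD))
        {
            if (Special.TryGetValue(c, out string replacement))
                sb.Append(replacement);           // table lookup for special letters
            else if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);                     // keep everything except combining marks
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Fold("Stra\u00DFe")); // Straße -> Strasse
        Console.WriteLine(Fold("M\u00FCller")); // Müller -> Muller
        Console.WriteLine(Fold("S\u00F8ren"));  // Søren  -> Soren
    }
}
```

Note that "Müller" comes out as "Muller", not "Mueller"; matching against names stored with the "ue" spelling would need a separate rule.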
Oct 27 '08 #4

"MC" wrote:
>> You can remove non spacing characters (and possibly modifier characters)
>> from the string if you normalize it. This will effectively remove accents
>> (diacritics) as well.
>
> Thanks. I should have been clearer. Not only do I want to remove non-spacing characters, I also want to change accented letters to the corresponding unaccented letters. (This is for matching up foreign names... somebody long ago decided the database needed to be in plain ASCII.)

That is exactly what you achieve by first normalizing (using FormD) and then
removing the non-spacing characters. In the normalized string, each accented
letter becomes an unaccented base letter followed by one or more non-spacing
combining characters which, when combined, form the original character.
Remove the non-spacing characters and all that remains is the unaccented text.
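
To make the decomposition concrete, here is a small sketch that prints the UTF-16 code units of a name before and after FormD normalization, then filters out the non-spacing marks. The class name DecompositionDemo, the helper names and the sample name are made up for illustration:

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

public static class DecompositionDemo
{
    // Hypothetical helper: formats each UTF-16 code unit as U+XXXX.
    public static string CodeUnits(string s) =>
        string.Join(" ", s.Select(c => "U+" + ((int)c).ToString("X4")));

    // Decompose, then keep every character that is not a non-spacing mark.
    public static string StripMarks(string s)
    {
        string decomposed = s.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string name = "Jos\u00E9"; // "José" with a precomposed é (U+00E9)

        // Before: the accented letter is a single code unit.
        Console.WriteLine(CodeUnits(name));
        // prints: U+004A U+006F U+0073 U+00E9

        // After FormD: 'e' (U+0065) followed by COMBINING ACUTE ACCENT (U+0301).
        Console.WriteLine(CodeUnits(name.Normalize(NormalizationForm.FormD)));
        // prints: U+004A U+006F U+0073 U+0065 U+0301

        Console.WriteLine(StripMarks(name));
        // prints: Jose
    }
}
```

Building a new StringBuilder instead of removing characters in place avoids any index bookkeeping while looping.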

--
Happy Coding!
Morten Wennevik [C# MVP]
Oct 28 '08 #5
