Translate UTF16 into lower ascii

Bob
Is there an easy way to translate odd UTF8/16 characters (like letters
with umlauts, vowels with accent symbols above) into the closest
'look-alike' lower ascii equivalent (A-Z, a-z)?

This is something that has probably been done, but I can't think of a
good search key for finding the code.

Nov 21 '08 #1
Bob
On Fri, 21 Nov 2008 02:44:24 -0500, "Michael B. Trausch"
<mi**@trausch.us> wrote:
>On Fri, 21 Nov 2008 01:56:04 -0500
>Bob <Bo*@nospam.com> wrote:
>>Is there an easy way to translate odd UTF8/16 characters (like letters
>>with umlauts, vowels with accent symbols above) into the closest
>>'look-alike' lower ascii equivalent (A-Z, a-z)?
>>
>>This is something that has probably been done, but I can't think of a
>>good search key for finding the code.
>
>There may be a library out there somewhere, but I am sure that it is so
>obscure that I can't find it.
>
>Your best bet would be to try to transliterate what you can and drop
>what you can't transliterate. A table-based approach would be the only
>way I can see being able to do it reasonably. Maybe looking for a list
>of transliterations that you could preprocess into a table would be
>ideal?
>
>--- Mike
Very likely that someone has already done this, as there are occasions
that plain 'lower ascii' must be used, like on cell phone keypads. If
someone wanted to enter the name "Andre" on a cell phone, there would
be no access to an E with the accent over it.

Now, to find it...
Nov 21 '08 #2
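For readers landing here later, the table-based approach Mike describes above can be sketched roughly as follows. This is only an illustration in Python, not code from anyone in the thread; the table `TRANSLIT_TABLE` and the function `transliterate` are hypothetical names, and the table is deliberately tiny. A usable one would need far more entries.

```python
# Hypothetical, deliberately tiny lookup table (an assumption for
# illustration only; a real table would need many more entries).
TRANSLIT_TABLE = {
    "\u00e4": "a",   # a with umlaut
    "\u00f6": "o",   # o with umlaut
    "\u00fc": "u",   # u with umlaut
    "\u00c4": "A",   # A with umlaut
    "\u00d6": "O",   # O with umlaut
    "\u00dc": "U",   # U with umlaut
    "\u00e9": "e",   # e with acute accent
    "\u00e8": "e",   # e with grave accent
    "\u00c9": "E",   # E with acute accent
    "\u00f1": "n",   # n with tilde
    "\u00df": "ss",  # German sharp s
}


def transliterate(text: str) -> str:
    """Transliterate what the table covers and drop what it does not."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)                  # already plain ASCII
        elif ch in TRANSLIT_TABLE:
            out.append(TRANSLIT_TABLE[ch])  # known look-alike
        # anything else is silently dropped
    return "".join(out)


print(transliterate("Andr\u00e9 M\u00fcller"))  # Andre Muller
```

The obvious cost of this design is that someone has to maintain the table, but it does let you handle cases such as the sharp s that have no accent to strip.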
Bob
On Fri, 21 Nov 2008 09:00:53 +0100, Jérémy Jeanson
<je************@free.fr> wrote:
>System.Text.ASCIIEncoding has some methods to convert and translate
>chars. You can find many examples in MSDN.
Entirely appropriate to hear from someone with two accents in their
name. <G> Good example here, as I wouldn't know how to type your name
as you have it spelled above. And you wouldn't want to drop the two
E's...you'd translate to lower ascii E when necessary.

I presume that you're referring to the Decoder.Convert functions via
ASCIIEncoding classes. I didn't see anything that looked like it would
do this.
Nov 21 '08 #3
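To illustrate why a plain ASCII encoder on its own does not give the look-alike result Bob wants, here is a small Python sketch. It is an analogy, not the .NET API discussed above: Python's `str.encode` with the `"replace"` or `"ignore"` error handlers stands in for an ASCII encoder, which, as far as I know, also substitutes `?` for unencodable characters by default in .NET.

```python
# "J\u00e9r\u00e9my" is the name Jeremy written with two accented e's.
name = "J\u00e9r\u00e9my"

# Encoding straight to ASCII can only substitute or drop the accented
# letters; it never maps them to their unaccented look-alikes.
replaced = name.encode("ascii", errors="replace").decode("ascii")
ignored = name.encode("ascii", errors="ignore").decode("ascii")

print(replaced)  # J?r?my
print(ignored)   # Jrmy
```

Neither result keeps the two E's, so an extra transliteration or normalization step is needed before the ASCII conversion.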
Bob wrote:
>Very likely that someone has already done this, as there are occasions
>that plain 'lower ascii' must be used, like on cell phone keypads. If
>someone wanted to enter the name "Andre" on a cell phone, there would

Really? I have all umlauts available on my mobile (and it is not a
special or expensive model). It depends on the language setting, if it
is set to English then there are no special characters of course. Think
about Chinese or Japanese mobiles, they do not have 2000+ tiny keys -
but I guess you can send Chinese text using the keypad somehow...

Michael
Nov 21 '08 #4
MC

"Bob" <Bo*@nospam.com> wrote in message news:8d********************************@4ax.com...
>Is there an easy way to translate odd UTF8/16 characters (like letters
>with umlauts, vowels with accent symbols above) into the closest
>'look-alike' lower ascii equivalent (A-Z, a-z)?
>
>This is something that has probably been done, but I can't think of a
>good search key for finding the code.
Check an earlier thread here about "Remove accents" or somesuch.

The key idea is to "normalize" the Unicode in such a way that the accents become combining characters (e.g., the acute accent is a separate character from the letter it appears on), then remove the combining characters (which have codes in a particular, high-numbered range).
Nov 21 '08 #5
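The normalize-then-strip idea from the last post can be sketched like this. It is a minimal Python illustration using the standard `unicodedata` module, not code from the thread; the helper names `strip_accents` and `to_lower_ascii` are made up for the example. In .NET the corresponding pieces would presumably be `String.Normalize(NormalizationForm.FormD)` and a check on each character's Unicode category, but that is an assumption about how one would port it.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose accented letters, then remove the combining marks."""
    # NFD splits e.g. U+00E9 into "e" followed by U+0301 (combining acute).
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns 0 for non-combining characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def to_lower_ascii(text: str) -> str:
    """Strip accents, then drop anything that is still not plain ASCII."""
    return strip_accents(text).encode("ascii", errors="ignore").decode("ascii")


print(to_lower_ascii("J\u00e9r\u00e9my M\u00fcller"))  # Jeremy Muller
```

One limitation: letters that have no canonical decomposition, such as the German sharp s, are not accented forms of an ASCII letter, so this sketch drops them instead of transliterating them. A small lookup table for those cases would have to be added separately.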
