Hi, I have a situation where I need to convert the characters entered in
a text field to upper case using C. The configuration is a UTF-8
environment in which the user can enter any character (single-, double-,
triple-byte, etc.). I need to convert to upper case only those characters
which have an upper-case form, i.e. if a user enters both English and Japanese
characters in the text field, then I should convert only the English
characters, not the Japanese ones.
I have seen that the C functions toupper() and tolower() handle
multibyte characters from Solaris 8 onwards. I am not sure about other platforms.
Can anyone suggest the best approach for the above scenario?

cs******@gmail.com writes:
> Hi, I have a situation where I need to convert the characters entered in
Hi. Do you know you triple-posted this?
> a text field to upper case using C. The configuration is a UTF-8
> environment in which the user can enter any character (single-, double-,
> triple-byte, etc.). I need to convert to upper case only those characters
> which have an upper-case form, i.e. if a user enters both English and
> Japanese characters in the text field, then I should convert only the
> English characters, not the Japanese ones.
> I have seen that the C functions toupper() and tolower() handle multibyte
> characters from Solaris 8 onwards. I am not sure about other platforms.
> Can anyone suggest the best approach for the above scenario?
The encodings supported by your C implementation for operations in
toupper() and tolower() are implementation-defined: you'll need to
look it up in your documentation.
You might need to use setlocale() with something like "en_US.UTF8",
and restore the locale afterwards. This may or may not work. And you
still won't be able to work with multibyte strings directly: you'll
have to convert to and from wchar_t strings.
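The round trip described above might look roughly like the sketch below. It is only an illustration under assumptions: the helper name upcase_in_locale() is made up, the locale name you pass in ("en_US.UTF8", "C.UTF-8", ...) may not exist on a given system, and whether towupper() knows about non-ASCII letters is up to the implementation.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper: uppercase the multibyte string `in` under the named
 * locale and write the result to `out`. Returns 0 on success, -1 on failure
 * (locale unavailable, invalid input, or `out` too small). The previous
 * LC_CTYPE locale is restored before returning.
 */
int upcase_in_locale(const char *locale_name, const char *in,
                     char *out, size_t out_size)
{
    int rc = -1;
    char *saved;
    wchar_t *wide = NULL;
    size_t cap, n, i, written;
    const char *current;

    assert(locale_name != NULL && in != NULL && out != NULL);

    /* The string returned by setlocale(..., NULL) may be overwritten by
       later calls, so keep a private copy for restoring. */
    current = setlocale(LC_CTYPE, NULL);
    if (current == NULL)
        return -1;
    saved = malloc(strlen(current) + 1);
    if (saved == NULL)
        return -1;
    strcpy(saved, current);

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        goto done;                      /* locale not available here */

    /* A multibyte string never holds more characters than bytes. */
    cap = strlen(in) + 1;
    wide = malloc(cap * sizeof *wide);
    if (wide == NULL)
        goto done;

    n = mbstowcs(wide, in, cap);        /* multibyte -> wchar_t */
    if (n == (size_t)-1)
        goto done;                      /* invalid sequence for this locale */

    for (i = 0; i < n; i++)
        wide[i] = (wchar_t)towupper((wint_t)wide[i]);

    written = wcstombs(out, wide, out_size);   /* wchar_t -> multibyte */
    if (written == (size_t)-1 || written >= out_size)
        goto done;                      /* unconvertible or no room for '\0' */
    rc = 0;

done:
    if (rc != 0 && out_size > 0)
        out[0] = '\0';                  /* leave a defined result on failure */
    setlocale(LC_CTYPE, saved);
    free(wide);
    free(saved);
    return rc;
}
```

A caller would check the return value and fall back to some other strategy when the locale is not installed, which is exactly the "may or may not work" part.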
Your best bet is to use an (off-topic) specialized library devoted to
manipulating UTF8 strings. IBM has one: http://www-306.ibm.com/software/glob.../icu/index.jsp
This provides u_strToUpper() and u_strToLower() functions in ustring.h.
Please don't post here for questions regarding this library, however, as
it is off-topic for this NG.
HTH,
-Micah

Micah Cowan wrote:
> Your best bet is to use an (off-topic) specialized library devoted to
> manipulating UTF8 strings. IBM has one:

And glib can do it too.

On Thursday 16 March 2006 20:26, Micah Cowan opined (in
<87************@mcowan.barracudanetworks.com>):
> cs******@gmail.com writes:
>> Hi, I have a situation where I need to convert the characters entered
>
> Hi. Do you know you triple-posted this?

It's the blinkin' Google. It does it sometimes.
--
BR, Vladimir
"There is hopeful symbolism in the fact that flags do not wave in a
vacuum."
-- Arthur C. Clarke

cs******@gmail.com wrote:
> Hi, I have a situation where I need to convert the characters entered in
> a text field to upper case using C. The configuration is a UTF-8
> environment in which the user can enter any character (single-, double-,
> triple-byte, etc.). I need to convert to upper case only those characters

Latin-based uppercasing is easy: just convert exactly those characters
that fall in the lower-case ASCII range to their upper-case ASCII
counterparts. This works because of one of the properties of UTF-8: bytes
in the ASCII range never appear inside a multibyte sequence. However, to
perform a correct case change over the whole Unicode range, you simply need
to know which characters have an upper-case or capitalization-case
alternative character (as well as the reverse). This information is
available from the standard Unicode data table.
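As a rough illustration of the ASCII-only approach, here is a minimal sketch. The function name utf8_upcase_ascii() is made up for this example, and it assumes an ASCII-based execution character set. It relies only on the UTF-8 property that every byte of a multibyte sequence is 0x80 or above, so such bytes can never be mistaken for 'a' to 'z'.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: uppercase only the ASCII letters a-z in a UTF-8
 * string, in place. Bytes >= 0x80 belong to multibyte sequences (for
 * example Japanese text) and are left untouched.
 */
void utf8_upcase_ascii(char *s)
{
    unsigned char *p;

    assert(s != NULL);
    for (p = (unsigned char *)s; *p != '\0'; p++) {
        if (*p >= 'a' && *p <= 'z')
            *p = (unsigned char)(*p - ('a' - 'A'));
    }
}
```

This converts the English letters and ignores everything else, including accented Latin letters, which is the limitation raised later in the thread.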
Oh yeah, and this is off topic here in comp.lang.c. ANSI C does not have
a notion of portable internationalization, let alone Unicode (though
some compilers implement wchar_t as Unicode, this cannot be relied
upon).
--
Paul Hsieh http://www.pobox.com/~qed/ http://bstring.sf.net/

On Thu, 16 Mar 2006 10:17:13 -0800, csanjith wrote:
> Hi, I have a situation where I need to convert the characters entered in
> a text field to upper case using C. [...]
> I have seen that the C functions toupper() and tolower() handle multibyte
> characters from Solaris 8 onwards. I am not sure about other platforms.

It would seem improbable that toupper(), on Solaris or elsewhere, could
give the correct output for all valid input when using UTF-8, UTF-16,
UTF-32 or any other Unicode encoding variant.
For all Unicode encoding variants there are some "characters" (or
"graphemes" in Unicode terminology) which can be encoded
equivalently as different sequences of integer values. Assume UTF-32, and
our Unicode string is an array of uint32_t integers. The Latin-1 character
A with acute accent could be described equivalently by a single 32-bit
integer with a value of 0x000000C1 or by combining two integers,
0x00000041-0x00000301. The latter representation could not be passed to
toupper() or tolower(), considering that neither can take an array.
[http://www.unicode.org/faq/char_combmark.html#8]
Hmmmm. Do any Unicode gurus know if uppercasing only the base character of
0x0061-0x0301 (giving 0x0041-0x0301) would accomplish a capital A with
acute accent? Regardless, I strongly suspect that there are many graphemes
in many scripts where such a trick could never work, but where notions
like uppercase or lowercase are still meaningful.
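To make the "trick" concrete, here is a small sketch that uppercases a UTF-32 array one code point at a time. The names upcase_code_point() and utf32_upcase() are invented for illustration, and the mapper deliberately knows only ASCII. With it, the decomposed spelling 0x0061-0x0301 becomes 0x0041-0x0301, while the precomposed spelling 0x00E1 of the same grapheme is left alone, so the two equivalent encodings give different results unless the mapper also knows the precomposed pair.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, deliberately tiny mapper: it only knows ASCII letters. */
static uint32_t upcase_code_point(uint32_t cp)
{
    if (cp >= 0x61 && cp <= 0x7A)       /* a-z */
        return cp - 0x20;
    return cp;                          /* everything else, including
                                           combining marks, is unchanged */
}

/* Uppercase `len` UTF-32 code points in place, one code point at a time. */
void utf32_upcase(uint32_t *s, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        s[i] = upcase_code_point(s[i]);
}
```

Working per code point sidesteps the "cannot pass an array" problem for combining sequences whose base letter has a simple mapping, but it says nothing about graphemes whose case change affects more than the base character.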

On Thu, 16 Mar 2006 17:29:38 -0800, websnarf wrote:
> cs******@gmail.com wrote:
>> Hi, I have a situation where I need to convert the characters entered in
>> a text field to upper case using C. [...]
>
> Latin based uppercasing is easy, just convert to those characters
> exactly in the lower-case or upper-case ASCII range. This is one of
> the properties of UTF-8. [...]

Latin != ASCII. ASCII is 7-bit, ISO Latin encodings are 8-bit. Non-ASCII
code points, in UTF-8, are multibyte. How do you pass an array to toupper()?

> Oh yeah, and this is off topic here in comp.lang.c. ANSI C does not have
> a notion of portable internationalization, let alone Unicode (though
> some compilers implement wchar_t as Unicode, this cannot be relied upon).

The wide-character API is not sufficient to support Unicode.
But you're right, this is off-topic. Anybody know where this would be
on-topic, though?

William Ahern <wi*****@25thandClement.com> wrote:
> On Thu, 16 Mar 2006 10:17:13 -0800, csanjith wrote:
>> Hi, I have a situation where I need to convert the characters entered in
>> a text field to upper case using C. [...]
>
> For all Unicode encoding variants there are some "characters" (or
> "graphemes" in Unicode terminology) which can be encoded equivalently
> as different sequences of integer values. [...] The Latin-1 character
> A with acute accent could be described equivalently by a single 32-bit
> integer with a value of 0x000000C1 or by combining two integers,
> 0x00000041-0x00000301. The latter representation could not be passed to
> toupper() or tolower(), considering that neither can take an array.
> [http://www.unicode.org/faq/char_combmark.html#8]
> Hmmmm. Do any Unicode gurus know if uppercasing only the base character
> of 0x0061-0x0301 would accomplish a capital A with acute accent?

If I read the Unicode Standard correctly, yes, it would. However, the
right question is: do you really _want_ to capitalise an accented
lower-case letter to an accented upper-case letter? In Dutch you wouldn't.
There are (at least) two reasonable C solutions:
- trust that your implementation handles this correctly, for example by
letting the sysadmin of the system the program runs on install
language-specific libraries for the <ctype.h> functions, and just use
tolower() and toupper(), as you would otherwise;
- assume that you know better than J. Random Sysadmin which characters
you want to capitalise, and write your own case-changing functions
with knowledge about the Unicode tables.
Something is to be said for either solution; the first is simpler and
more flexible, in the second case the results are more strictly known.
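For the second option, a home-grown mapper could be sketched as a sorted table searched with bsearch(). Everything here is illustrative: the names case_pair, upper_table and table_upcase() are made up, and the table is only a tiny hand-picked excerpt. A real one would be generated from the Unicode data table and trimmed to the characters you have decided to capitalise.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct case_pair {
    uint32_t lower;
    uint32_t upper;
};

/* Illustrative excerpt only, sorted by `lower` so bsearch() can be used. */
static const struct case_pair upper_table[] = {
    { 0x00E0, 0x00C0 },   /* à -> À */
    { 0x00E9, 0x00C9 },   /* é -> É */
    { 0x00FF, 0x0178 },   /* ÿ -> Ÿ */
    { 0x0101, 0x0100 },   /* ā -> Ā */
    { 0x03B1, 0x0391 },   /* α -> Α */
    { 0x0430, 0x0410 },   /* Cyrillic а -> А */
};

/* Compare a code point key against the `lower` field of a table entry. */
static int compare_pair(const void *key, const void *elem)
{
    uint32_t cp = *(const uint32_t *)key;
    uint32_t lower = ((const struct case_pair *)elem)->lower;

    return (cp > lower) - (cp < lower);
}

/* Return the upper-case form of `cp`, or `cp` itself if none is known. */
uint32_t table_upcase(uint32_t cp)
{
    const struct case_pair *hit;

    if (cp >= 0x61 && cp <= 0x7A)       /* ASCII a-z */
        return cp - 0x20;
    hit = bsearch(&cp, upper_table,
                  sizeof upper_table / sizeof upper_table[0],
                  sizeof upper_table[0], compare_pair);
    return hit != NULL ? hit->upper : cp;
}
```

Because the table is yours, the results are fixed across systems. Leaving an entry out is also how you would express a rule such as "do not capitalise this accented letter".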
Richard

we******@gmail.com wrote On 03/16/06 20:29,:
> cs******@gmail.com wrote:
>> Hi, I have a situation where I need to convert the characters entered in
>> a text field to upper case using C. [...]
>
> Latin based uppercasing is easy, just convert to those characters
> exactly in the lower-case or upper-case ASCII range. [...]

Note that all of "àáâãäåæçèéêëìíîïðñòóôõöøùúûüý" appear
in the alphabets of Latinic (Latinous?) languages, are
outside the ASCII lower-case range, and yet have upper-case
equivalents.
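For what it is worth, those particular letters can still be handled directly on the UTF-8 bytes, as in the sketch below. The function name utf8_upcase_latin1() is made up, the input is assumed to be valid UTF-8, and only ASCII and the Latin-1 Supplement letters are covered. In UTF-8, U+00E0 to U+00FE are encoded as 0xC3 followed by 0xA0 to 0xBE, and their upper-case forms differ only in the second byte (0x20 lower), except that U+00F7 is the division sign and U+00FF maps to U+0178 (0xC5 0xB8).

```c
#include <stddef.h>

/*
 * Hypothetical helper: uppercase ASCII and Latin-1 Supplement letters in a
 * UTF-8 string, in place. Every mapping used here keeps the byte length, so
 * no reallocation is needed. All other sequences are left unchanged.
 */
void utf8_upcase_latin1(char *s)
{
    unsigned char *p = (unsigned char *)s;

    while (*p != '\0') {
        if (*p >= 0x61 && *p <= 0x7A) {
            *p = (unsigned char)(*p - 0x20);        /* a-z -> A-Z */
            p += 1;
        } else if (*p == 0xC3 && p[1] >= 0xA0 && p[1] <= 0xBE
                   && p[1] != 0xB7) {
            /* U+00E0..U+00FE, skipping U+00F7 (division sign) */
            p[1] = (unsigned char)(p[1] - 0x20);
            p += 2;
        } else if (*p == 0xC3 && p[1] == 0xBF) {
            p[0] = 0xC5;                            /* U+00FF -> U+0178 */
            p[1] = 0xB8;
            p += 2;
        } else {
            p += 1;     /* any other byte: leave it alone */
        }
    }
}
```

U+00DF (sharp s) is left as it is, since it has no single-character upper-case form in the simple mappings. Anything beyond Latin-1 would need a table or a library.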
--
Er*********@sun.com