Bytes IT Community

utf-8 to ascii

P: n/a
I have a question: how do I generate two files, one in UTF-8 and the other
in ASCII, with the same column length, so that when I convert
from UTF-8 to ASCII the column length does not change? Any help is
appreciated. Thanks.

Nov 14 '05 #1
5 Replies


P: n/a
"ma************@yahoo.com" <ma************@yahoo.com> wrote:
I have a question. how to generate two files, one in UTF-8, the other
in ASCII with the same column length SO that when i do the conversion
from utf-8 to ascii, the column length does not change .


Depends. What is "column length" in UTF-8? Is it the number of UTF-encoded
characters, the number of chars (bytes) in that encoding, or something else
again? Note that for non-ASCII characters, the first count is smaller
than the second. Also, what are you going to do with those characters?
How will you map U+0641 to ASCII?
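
To make the two counts concrete, here is a minimal sketch in C. The function names are made up for illustration, and it assumes the input is well-formed, NUL-terminated UTF-8; it does no validation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of UTF-encoded characters (code points) in s.
 * Assumption: s is well-formed UTF-8. Every byte that is not a
 * continuation byte (bit pattern 10xxxxxx) starts a new character. */
size_t utf8_code_points(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}

/* Number of bytes (C chars) the same string occupies. */
size_t utf8_bytes(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}
```

For the string "a" followed by U+0641 (bytes 0x61 0xD9 0x81), the first function reports 2 characters and the second reports 3 bytes.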

(Note that a strict interpretation of what you wrote would result in a
trivial implementation: if a UTF-encoded character is not ASCII, it
cannot be converted to ASCII, so the whole conversion fails because of
malformed input - but if all input _is_ ASCII, then it has the same
encoding in UTF-8 as in ASCII in the first place, and no conversion is
necessary. This is not likely to be an acceptable solution ;-) )

Richard
Nov 14 '05 #2

P: n/a
"ma************@yahoo.com" <ma************@yahoo.com> wrote:
# I have a question. how to generate two files, one in UTF-8, the other
# in ASCII with the same column length SO that when i do the conversion
# from utf-8 to ascii, the column length does not change . any help is
# appreciated thanks

If you're restricting yourself to the ASCII codes 0x01 through 0x7E, then
UTF-8 and ASCII are identical. 0x00 is sometimes remapped to an unused Unicode
character and I don't remember if 0x7F is the same in both.

--
SM Ryan http://www.rawbw.com/~wyrmwif/
I think that's kinda of personal; I don't think I should answer that.
Nov 14 '05 #3

P: n/a
On 2005-01-26 18:06:43 -0500, SM Ryan
<wy*****@tango-sierra-oscar-foxtrot-tango.fake.org> said:
"ma************@yahoo.com" <ma************@yahoo.com> wrote:
# I have a question. how to generate two files, one in UTF-8, the other
# in ASCII with the same column length SO that when i do the conversion
# from utf-8 to ascii, the column length does not change . any help is
# appreciated thanks

If you're restricting yourself to the ASCII codes x01 through x7E, the
UTF-8 and ASCII are identical. x00 is sometimes remapped to an unused unicode
character and I don't remember if x7F is the same in both.


<pedantic>
By definition, UTF-8 and ASCII are identical in the range [0, 0x7F],
period. No exception for 0x00 or 0x7F.
</pedantic>
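
This can be seen from the encoding rule itself. The following is a sketch of a UTF-8 encoder for a single code point (utf8_encode is a hypothetical name, not a standard function): anything in [0, 0x7F] takes the first branch and comes out as one byte with the same value.

```c
#include <assert.h>
#include <stddef.h>

/* Encode one code point as UTF-8 into out, which must hold 4 bytes.
 * Returns the number of bytes written, or 0 if cp is a surrogate or
 * lies beyond U+10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp <= 0x7F) {                    /* ASCII range: byte == code point */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                   /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)    /* surrogates are not encodable */
        return 0;
    if (cp <= 0xFFFF) {                  /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Both 0x00 and 0x7F go through the one-byte branch like every other ASCII code.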

--
Clark S. Cox, III
cl*******@gmail.com

Nov 14 '05 #4

P: n/a
On 26 Jan 2005 07:23:16 -0800,
ma************@yahoo.com <ma************@yahoo.com> wrote:

I have a question. how to generate two files, one in UTF-8, the other
in ASCII with the same column length SO that when i do the conversion
from utf-8 to ascii, the column length does not change . any help is
appreciated thanks


The closest ANSI C comes to this issue is mbtowc() and the related functions.
However, a multibyte character may be UTF-8, but it could also be something else.
And a wide character could be Unicode, or it may not be. In the interval
0x00 through 0xff, the Unicode value of a character is identical to its
ISO-8859-1 value.
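
As a rough illustration of the mbtowc() route, the sketch below counts multibyte characters in whatever encoding the current locale uses. mb_char_count is a made-up name. Whether the bytes are treated as UTF-8 depends entirely on the locale selected with setlocale(), and the availability of a UTF-8 locale is system-specific, so treat that part as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Count the multibyte characters in s according to the current
 * locale's LC_CTYPE encoding. Returns (size_t)-1 if s contains a
 * sequence that is invalid in that encoding. */
size_t mb_char_count(const char *s)
{
    size_t count = 0;
    size_t left;

    assert(s != NULL);
    left = strlen(s);
    mbtowc(NULL, NULL, 0);                 /* reset any shift state */
    while (left > 0) {
        wchar_t wc;
        int len = mbtowc(&wc, s, left);    /* bytes used by next character */

        if (len <= 0)                      /* invalid or incomplete sequence */
            return (size_t)-1;
        s += len;
        left -= (size_t)len;
        ++count;
    }
    return count;
}
```

In the default "C" locale this only handles plain ASCII reliably. To have it interpret UTF-8, the program would first need to call setlocale(LC_CTYPE, ...) with a UTF-8 locale that actually exists on the system.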

The bottom line is that the OS or some third party library may provide
the required conversion functions.

Villy
Nov 14 '05 #5

P: n/a
By column length in UTF-8 I mean the number of UTF-encoded
characters.
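
With that definition, one possible approach is to emit exactly one ASCII byte per UTF-8 character. The sketch below copies ASCII bytes unchanged and substitutes '?' for every non-ASCII character. The '?' placeholder and the function name are assumptions, since the thread never settles how non-ASCII characters should be mapped, and the input is assumed to be well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Convert NUL-terminated UTF-8 in src to ASCII in dst, writing one
 * output byte per input character so the character count is kept.
 * Non-ASCII characters become '?' (an arbitrary placeholder).
 * dst must have room for strlen(src) + 1 bytes.
 * Returns the number of characters written, excluding the NUL. */
size_t utf8_to_ascii_same_columns(const char *src, char *dst)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; ++src) {
        unsigned char b = (unsigned char)*src;

        if (b < 0x80)
            dst[n++] = (char)b;        /* ASCII: copy unchanged */
        else if ((b & 0xC0) != 0x80)
            dst[n++] = '?';            /* lead byte of a non-ASCII character */
        /* continuation bytes (10xxxxxx) produce no output */
    }
    dst[n] = '\0';
    return n;
}
```

The output has the same number of characters as the input, although it will usually have fewer bytes. If the files must also match in byte length, the non-ASCII characters would have to be padded instead, which is a different requirement.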

Nov 14 '05 #6
