utf-8 to ascii

I have a question: how can I generate two files, one in UTF-8 and the other
in ASCII, with the same column length, so that when I convert
from UTF-8 to ASCII the column length does not change? Any help is
appreciated. Thanks.

Nov 14 '05 #1
"ma************@yahoo.com" <ma************@yahoo.com> wrote:
> I have a question. how to generate two files, one in UTF-8, the other
> in ASCII with the same column length SO that when i do the conversion
> from utf-8 to ascii, the column length does not change .


Depends. What is "column length" in UTF-8? Is it the number of UTF-8-encoded
characters, the number of bytes (chars) in that encoding, or something else
again? Note that once the text contains non-ASCII characters, the first count
is smaller than the second. Also, what are you going to do with those characters?
How will you map U+0641 to ASCII?
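
To make the two counts concrete, here is a minimal C sketch (not part of the original post). The function name utf8_count_chars is made up for illustration, and the code assumes the input is already well-formed UTF-8; it does no validation. It counts characters by skipping continuation bytes, while strlen() on the same string gives the byte count.

```c
#include <assert.h>
#include <stddef.h>

/* Count code points in a NUL-terminated UTF-8 string.
   Every byte that is NOT a continuation byte (10xxxxxx) starts a new
   character, so counting those bytes counts the characters.
   Assumption: the input is well-formed UTF-8 (no validation here). */
size_t utf8_count_chars(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```

For U+0641, which UTF-8 encodes as the two bytes 0xD9 0x81, this returns 1 while strlen() returns 2.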

(Note that a strict interpretation of what you wrote would result in a
trivial implementation: if a UTF-encoded character is not ASCII, it
cannot be converted to ASCII, so the whole conversion fails because of
malformed input - but if all input _is_ ASCII, then it has the same
encoding in UTF-8 as in ASCII in the first place, and no conversion is
necessary. This is not likely to be an acceptable solution ;-) )
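
For what it is worth, the "trivial implementation" described above might look roughly like the following sketch. The name is_pure_ascii is hypothetical; the function only reports whether a buffer is already plain ASCII, in which case it is also valid UTF-8 byte for byte and there is nothing left to convert.

```c
#include <assert.h>
#include <stddef.h>

/* Strict check: returns 1 if every byte is in 0x00..0x7F, else 0.
   A buffer that passes is simultaneously valid ASCII and valid UTF-8,
   so the "conversion" is just a copy. A buffer that fails contains at
   least one non-ASCII character and cannot be represented in ASCII. */
int is_pure_ascii(const unsigned char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; ++i) {
        if (buf[i] > 0x7F)
            return 0;
    }
    return 1;
}
```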

Richard
Nov 14 '05 #2
"ma************@yahoo.com" <ma************@yahoo.com> wrote:
# I have a question. how to generate two files, one in UTF-8, the other
# in ASCII with the same column length SO that when i do the conversion
# from utf-8 to ascii, the column length does not change . any help is
# appreciated thanks

If you're restricting yourself to the ASCII codes x01 through x7E, the
UTF-8 and ASCII are identical. x00 is sometimes remapped to an unused unicode
character and I don't remember if x7F is the same in both.

--
SM Ryan http://www.rawbw.com/~wyrmwif/
I think that's kinda of personal; I don't think I should answer that.
Nov 14 '05 #3
On 2005-01-26 18:06:43 -0500, SM Ryan
<wy*****@tango-sierra-oscar-foxtrot-tango.fake.org> said:
> "ma************@yahoo.com" <ma************@yahoo.com> wrote:
> # I have a question. how to generate two files, one in UTF-8, the other
> # in ASCII with the same column length SO that when i do the conversion
> # from utf-8 to ascii, the column length does not change . any help is
> # appreciated thanks
>
> If you're restricting yourself to the ASCII codes x01 through x7E, the
> UTF-8 and ASCII are identical. x00 is sometimes remapped to an unused unicode
> character and I don't remember if x7F is the same in both.


<pedantic>
By definition, UTF-8 and ASCII are identical in the range [0, 0x7F],
period. No exception for 0x00 or 0x7F.
</pedantic>
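
A short sketch of a UTF-8 encoder shows why this holds: the first branch handles everything below 0x80 by emitting the value itself as a single byte. The function name utf8_encode and its signature are invented for this illustration and are not from any standard library.

```c
#include <assert.h>
#include <stddef.h>

/* Encode one Unicode scalar value as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written, or 0 if cp is not a scalar
   value (a surrogate, or greater than U+10FFFF). */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                       /* ASCII range: byte == code point */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)      /* surrogates are not encodable */
        return 0;
    if (cp < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 11110xxx + three continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For every value from 0x00 through 0x7F, including 0x00 and 0x7F themselves, the result is one byte equal to the input.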

--
Clark S. Cox, III
cl*******@gmail.com

Nov 14 '05 #4
On 26 Jan 2005 07:23:16 -0800,
ma************@yahoo.com <ma************@yahoo.com> wrote:

> I have a question. how to generate two files, one in UTF-8, the other
> in ASCII with the same column length SO that when i do the conversion
> from utf-8 to ascii, the column length does not change . any help is
> appreciated thanks


The closest ANSI C comes to this issue is mbtowc() and related functions.
However, a multibyte character may be UTF-8, but it could also be something else,
and a wide character may or may not be Unicode. In the interval
0x00 through 0xff, the Unicode value of a character is identical to its
ISO-8859-1 value.

The bottom line is that the OS or some third party library may provide
the required conversion functions.
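
As a rough sketch of the standard-library route, the following counts characters with mbtowc(). The helper name mb_count_chars is hypothetical. The result depends entirely on the current locale: the string is only treated as UTF-8 if the caller has selected a UTF-8 locale with setlocale() beforehand, and which locale names are available is system-specific.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Count the multibyte characters in s according to the encoding of the
   current locale (LC_CTYPE). Returns (size_t)-1 if s contains a byte
   sequence that is invalid in that encoding.
   Assumption: the caller has already called setlocale() as needed. */
size_t mb_count_chars(const char *s)
{
    size_t count = 0;
    wchar_t wc;

    assert(s != NULL);
    mbtowc(NULL, NULL, 0);                 /* reset any shift state */
    while (*s != '\0') {
        int len = mbtowc(&wc, s, MB_CUR_MAX);
        if (len <= 0)                      /* invalid sequence */
            return (size_t)-1;
        s += len;
        ++count;
    }
    return count;
}
```

In the default "C" locale this simply counts ASCII characters; what happens to bytes above 0x7F there varies between implementations.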

Villy
Nov 14 '05 #5
The column length in UTF-8 means the number of UTF-encoded
characters.
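
Under that definition, one way to keep the column length unchanged is to write exactly one ASCII byte per input character. The sketch below is only an illustration: the name utf8_to_ascii_same_width and the choice of '?' as the placeholder are assumptions, and the input is assumed to be well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Copy UTF-8 text from src to dst, producing one ASCII byte per
   character: ASCII bytes are copied as-is, and each non-ASCII
   character is replaced by a single '?'.
   dst must have room for strlen(src) + 1 bytes.
   Returns the number of characters written, not counting the NUL.
   Assumption: src is well-formed UTF-8. */
size_t utf8_to_ascii_same_width(const char *src, char *dst)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; ++src) {
        unsigned char c = (unsigned char)*src;

        if (c < 0x80)
            dst[n++] = (char)c;            /* ASCII: copy unchanged */
        else if ((c & 0xC0) != 0x80)
            dst[n++] = '?';                /* lead byte: one placeholder */
        /* continuation bytes produce no output */
    }
    dst[n] = '\0';
    return n;
}
```

The ASCII output then has as many bytes as the UTF-8 input has characters, so fixed-width columns line up. The conversion is lossy, and it does not account for combining marks or double-width characters if "column" means display width.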

Nov 14 '05 #6
