Bytes | Software Development & Data Engineering Community

utf-8 and ascii


I have a question: how do I generate two files, one in UTF-8 and the other in
ASCII, with the same column length, so that when I convert from UTF-8 to
ASCII or vice versa, the column length does not change? Any help is
appreciated.
Thanks

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!
Nov 16 '05 #1
> I have a question: how do I generate two files, one in UTF-8 and the other in
> ASCII, with the same column length, so that when I convert from UTF-8 to
> ASCII or vice versa, the column length does not change? Any help is
> appreciated.
> Thanks


It is quite important to get the terminology right:

The two files will be identical, because ASCII by definition uses only the
characters in the range 0-127, and in that range UTF-8 is byte-for-byte
identical to ASCII.
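A minimal Python sketch (Python is used here only for illustration; the thread itself is about .NET) that checks the claim above: every code point 0-127 encodes to the same single byte in ASCII and in UTF-8, while a character such as "ç" has no ASCII encoding at all. The helper names are hypothetical.

```python
def ascii_and_utf8_agree() -> bool:
    """Return True if every code point 0-127 has the same bytes in ASCII and UTF-8."""
    return all(chr(cp).encode("ascii") == chr(cp).encode("utf-8") for cp in range(128))


def is_ascii_encodable(text: str) -> bool:
    """Return True if text uses only characters in the ASCII range 0-127."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    print(ascii_and_utf8_agree())            # True
    print(is_ascii_encodable("Republique"))  # True
    print(is_ascii_encodable("Française"))   # False: "ç" is outside 0-127
```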

If what you want is not ASCII but ANSI (a Windows code page such as 1252),
we have to make clear what you mean by "column length": the number of bytes,
or the number of characters?

If you mean the number of characters, it does not change in an ANSI to UTF-8
conversion either.

If you mean the number of bytes, this is not possible, because a character
above 127 in an ANSI encoding takes 2 or 3 bytes in UTF-8 (depending on the
ANSI code page and the character), but never 1 byte.
So the number of bytes in UTF-8 is guaranteed to be greater than or equal to
the number of bytes in the ANSI encoding.
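To make the byte-versus-character distinction concrete, here is a small Python sketch. It assumes the "ANSI" code page is cp1252 (Western European Windows); other code pages give different numbers. It counts characters and bytes for the string from this thread, and collects the UTF-8 length of every character that cp1252 maps to a byte above 127.

```python
ANSI = "cp1252"  # assumption: Western European Windows "ANSI" code page


def byte_lengths(text: str) -> dict:
    """Count characters, plus bytes in the assumed ANSI code page and in UTF-8."""
    return {
        "chars": len(text),
        "ansi": len(text.encode(ANSI)),
        "utf-8": len(text.encode("utf-8")),
    }


def utf8_sizes_of_high_ansi_bytes() -> set:
    """Collect the UTF-8 length of each character that ANSI maps to a byte above 127."""
    sizes = set()
    for b in range(0x80, 0x100):
        try:
            ch = bytes([b]).decode(ANSI)
        except UnicodeDecodeError:
            continue  # a few cp1252 byte values are unassigned
        sizes.add(len(ch.encode("utf-8")))
    return sizes


if __name__ == "__main__":
    print(byte_lengths("Republique Française"))
    print(utf8_sizes_of_high_ansi_bytes())
```

For "Republique Française" this should report 20 characters, 20 cp1252 bytes and 21 UTF-8 bytes, and the set of UTF-8 lengths should be {2, 3}: letters such as "ç" take 2 bytes, characters such as the euro sign take 3.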

--
Mihai Nita [Microsoft MVP, Windows - SDK]
------------------------------------------
Replace _year_ with _ to get the real email
Nov 16 '05 #2

Sorry for the confusion. Here is what I meant to say.
I am generating a file (a .txt file, which is being opened with Notepad).
The file has some data from some tables. The tables have fixed column
lengths, yet when I open the file in Notepad the column length changes. For
example, the data in one of the columns is "Republique Française". The field
length in the table (a FoxPro database) is, say, 75, yet when I open the file
in Notepad it becomes 74. My problem is that when the encoding changes from
ASCII to UTF-8, the field length (or the column length) for that value also
changes. I know it is happening because the number of bytes used per character
in ASCII and UTF-8 is different. Is there some way I can keep the column
length fixed at 75?
Any help is appreciated.

Nov 16 '05 #3
Meenu Mehta <ma************@yahoo.com> wrote:
> Sorry for the confusion. Here is what I meant to say.
> I am generating a file (a .txt file, which is being opened with Notepad).
> The file has some data from some tables. The tables have fixed column
> lengths, yet when I open the file in Notepad the column length changes. For
> example, the data in one of the columns is "Republique Française".

Then it's not ASCII, to start with. There's no cedilla in ASCII.

> The field length in the table (a FoxPro database) is, say, 75, yet when I
> open the file in Notepad it becomes 74. My problem is that when the encoding
> changes from ASCII to UTF-8, the field length (or the column length) for
> that value also changes. I know it is happening because the number of bytes
> used per character in ASCII and UTF-8 is different. Is there some way I can
> keep the column length fixed at 75?


I suspect you just want to create the file using Encoding.Default (which, on
the .NET Framework, is the system's ANSI code page) instead of either
Encoding.ASCII or Encoding.UTF8.
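A rough Python analogue of that advice, assuming cp1252 stands in for Encoding.Default on a Western European Windows machine. The helper name and the 75-character padding are illustrative, not taken from the original poster's code: the record is padded to 75 characters and written once per encoding, and the file sizes are compared.

```python
import os
import tempfile

FIELD_WIDTH = 75  # column width from the question, in characters


def write_fixed_width(path: str, value: str, encoding: str) -> int:
    """Write value space-padded to FIELD_WIDTH characters plus CRLF; return the file size in bytes."""
    with open(path, "w", encoding=encoding, newline="") as f:
        f.write(value.ljust(FIELD_WIDTH) + "\r\n")
    return os.path.getsize(path)


if __name__ == "__main__":
    value = "Republique Française"
    with tempfile.TemporaryDirectory() as tmp:
        for enc in ("cp1252", "utf-8"):  # cp1252 assumed to stand in for Encoding.Default
            size = write_fixed_width(os.path.join(tmp, enc + ".txt"), value, enc)
            print(enc, size)
```

With cp1252 every character is one byte, so the 75-character record plus CRLF should come out at 77 bytes, while the UTF-8 file should be 78 bytes because "ç" takes two. Writing the same value with "ascii" would raise UnicodeEncodeError, which matches the point that the data is not ASCII to begin with.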

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #4
> I am generating a file (a .txt file, which is being opened with Notepad).
> The file has some data from some tables. The tables have fixed column
> lengths, yet when I open the file in Notepad the column length changes.

If the column changes only visually, this might just be a font issue.
Try using a fixed-width font (like Courier).

> For example, the data in one of the columns is "Republique Française". The
> field length in the table (a FoxPro database) is, say, 75, yet when I open
> the file in Notepad it becomes 74. My problem is that when the encoding
> changes from ASCII to UTF-8, the field length (or the column length) for
> that value also changes. I know it is happening because the number of bytes
> used per character in ASCII and UTF-8 is different. Is there some way I can
> keep the column length fixed at 75?

If you are comparing an export from an ANSI database ("ANSI" is the correct
term to use instead of "ASCII") with an export from a UTF-8 database, then the
cause is deeper. The size of the database field counts bytes, while Notepad
(and your users) count characters. There is no solution here except padding
the columns with spaces to the desired width in characters.
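A minimal sketch of that padding in Python, assuming each column has a fixed width counted in characters (the widths below are made up, and the helper name is hypothetical): every value is cut or space-padded to its width, so every line has the same number of characters in Notepad no matter how many bytes the encoding needs.

```python
def pad_columns(values: list[str], widths: list[int]) -> str:
    """Build one fixed-width line: each value is cut or space-padded to its width in characters."""
    return "".join(v[:w].ljust(w) for v, w in zip(values, widths))


if __name__ == "__main__":
    widths = [25, 10]  # hypothetical column widths, in characters
    for row in (["Republique Francaise", "FR"], ["Republique Française", "FR"]):
        line = pad_columns(row, widths)
        # Same character count for both rows; only the UTF-8 byte count differs.
        print(len(line), len(line.encode("utf-8")))
```

Both sample rows should give 35-character lines; only the UTF-8 byte count (35 versus 36) differs.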

Example:

Database field: 8 bytes

XXX = 3 characters, 3 bytes
Field contains "58 58 58 20 20 20 20 20"
Output to text "XXX     "

ççç = 3 characters, 6 bytes
Field contains "c3 a7 c3 a7 c3 a7 20 20"
Output to text "ççç  "
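The example can be reproduced with a short Python sketch that treats the field as 8 bytes of UTF-8 padded with spaces. The helper name and the choice to raise an error when a value does not fit are assumptions for illustration; bytes.hex with a separator needs Python 3.8 or later.

```python
FIELD_BYTES = 8  # field size from the example, in bytes


def fill_byte_field(value: str, size: int = FIELD_BYTES) -> bytes:
    """Encode value as UTF-8 and pad it with spaces (0x20) to exactly `size` bytes."""
    data = value.encode("utf-8")
    if len(data) > size:
        raise ValueError(f"{value!r} needs {len(data)} bytes; the field holds {size}")
    return data + b" " * (size - len(data))


if __name__ == "__main__":
    for value in ("XXX", "ççç"):
        field = fill_byte_field(value)
        # The hex dump shows the stored bytes; the decoded length is what Notepad shows.
        print(field.hex(" "), len(field.decode("utf-8")))
```

The first field should print as 58 58 58 20 20 20 20 20 and decode to 8 characters; the second as c3 a7 c3 a7 c3 a7 20 20, which decodes to only 5 characters, so that column looks 3 characters narrower in Notepad.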

--
Mihai Nita [Microsoft MVP, Windows - SDK]
------------------------------------------
Replace _year_ with _ to get the real email
Nov 16 '05 #5


Similar topics

9
by: lawrence | last post by:
Someone on www.php.net suggested using a seems_utf8() method to test text for UTF-8 character encoding but didn't specify how to write such a method. Can anyone suggest a test that might work?...
38
by: Haines Brown | last post by:
I'm having trouble finding the character entity for the French abbreviation for "number" (capital N followed by a small supercript o, period). My references are not listing it. Where would I...
6
by: jmgonet | last post by:
Hello everybody, I'm having troubles loading a Xml string encoded in UTF-8. If I try this code: ------------------------------ XmlDocument doc=new XmlDocument(); String s="<?xml...
6
by: archana | last post by:
Hi all, can someone tell me difference between unicode and utf 8 or utf 18 and which one is supporting more character set. whic i should use to support character ucs-2. I want to use ucs-2...
7
by: Jimmy Shaw | last post by:
Hi everybody, Is there any SIMPLE way to convert from UTF-16 to UTF-32? I may be mixed up, but is it possible that all UTF-16 "code points" that are 16 bits long appear just the same in UTF-32,...
4
by: shreshth.luthra | last post by:
Hi All, I am having a GUI which accepts a Unicode string and searches a given set of xml files for that string. Now, i have 2 XML files both of them saved in UTF-8 format, having characters...
10
by: Jed | last post by:
I have a form that needs to handle international characters withing the UTF-8 character set. I have tried all the recommended strategies for getting utf-8 characters from form input to email...
23
by: Allan Ebdrup | last post by:
I hava an ajax web application where i hvae problems with UTF-8 encoding oc chineese chars. My Ajax webapplication runs in a HTML page that is UTF-8 Encoded. I copy and paste some chineese chars...
35
by: Bjoern Hoehrmann | last post by:
Hi, For a free software project, I had to write a routine that, given a Unicode scalar value U+0000 - U+10FFFF, returns an integer that holds the UTF-8 encoded form of it, for example, U+00F6...
4
by: =?ISO-8859-2?Q?Boris_Du=B9ek?= | last post by:
Hi, I have an API that returns UTF-8 encoded strings. I have a utf8 codevt facet available to do the conversion from UTF-8 to wchar_t encoding defined by the platform. I have no trouble...