Unicode character in non-unicode text file

(this is follow-on message to one posted yesterday)

I'm trying to reproduce the capabilities in both Notepad and Excel, whereby
a Unicode text file with Unicode characters can be converted to ANSI, while
still preserving the Unicode characters within.

Specifically, I'm using the Unicode character U+2022 (&#x2022;), which is a
largish bullet.

I've tried this:

_writer = new System.IO.StreamWriter( file, false, new UnicodeEncoding());

which creates the text file just fine with regard to the bullet character,
but in Unicode format with BOM and all. When I save this file as ANSI with
Excel or Notepad, life is good.

BUT, when I try this:

_writer = new System.IO.StreamWriter( file, false, new UnicodeEncoding(false, false));

The BOM is gone (good) but the bullet character gets converted into another
character.

What magic does Notepad/Excel use to preserve the character but lose the BOM?

Jul 22 '05 #1
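A minimal sketch of what the two constructor calls in the question produce. The class and helper names (BomDemo, Hex) are made up for illustration. It suggests that the bullet's bytes are the same either way and only the BOM differs, so the "other character" is most likely the BOM-less UTF-16 bytes 0x22 0x20 being read as ANSI (a double quote followed by a space).

```csharp
using System;
using System.Text;

public static class BomDemo
{
    // Formats bytes as space-separated hex, e.g. "FF FE".
    public static string Hex(byte[] bytes) => BitConverter.ToString(bytes).Replace("-", " ");

    public static void Main()
    {
        const string bullet = "\u2022";

        var withBom = new UnicodeEncoding();                // little-endian UTF-16, BOM enabled
        var withoutBom = new UnicodeEncoding(false, false); // little-endian UTF-16, no BOM

        Console.WriteLine("BOM (default):      " + Hex(withBom.GetPreamble()));
        Console.WriteLine("BOM (false, false): " + Hex(withoutBom.GetPreamble()));
        Console.WriteLine("Bullet (default):   " + Hex(withBom.GetBytes(bullet)));
        Console.WriteLine("Bullet (no BOM):    " + Hex(withoutBom.GetBytes(bullet)));
    }
}
```

Both encodings emit 22 20 for the bullet. Dropping the BOM does not make the file ANSI; it only removes the hint that tells readers the file is UTF-16.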
Dbaldi,

Be aware that Notepad can behave differently from one system to another.

It depends on the code page in use.

Have a look at these pages

http://www.geocities.com/Athens/Acad.../fontset.htm#b

http://www.microsoft.com/globaldev/r...ocversion.mspx

I hope this gives you an idea.

Cor
Jul 22 '05 #2
Yes, of course I'm very familiar with code pages. The question really isn't
about Notepad or Excel. I'm simply using them as examples to show that this
can be done *somehow* (creating a non-Unicode file with a Unicode character
preserved).

I'm just trying to understand what Notepad does (assuming the English (US)
locale, 1033) when it saves the Unicode file as ANSI, still managing to
preserve the bullet character.

"Cor Ligthert [MVP]" wrote:
> Dbaldi,
>
> Be aware that Notepad can behave differently from one system to another.
>
> It depends on the code page in use.
>
> Have a look at these pages
>
> http://www.geocities.com/Athens/Acad.../fontset.htm#b
>
> http://www.microsoft.com/globaldev/r...ocversion.mspx
>
> I hope this gives you an idea.
>
> Cor

Jul 22 '05 #3
dbaldi <db****@discussions.microsoft.com> wrote:
> (this is follow-on message to one posted yesterday)
>
> I'm trying to reproduce the capabilities in both Notepad and Excel, whereby
> a Unicode text file with Unicode characters can be converted to ANSI, while
> still preserving the Unicode characters within.

You can't. Excel and Notepad may happen to cope with the example you've
given, but I strongly suspect they don't do so reliably. They can't
possibly make every character in the appropriate ANSI encoding
available and cope with other characters unless they use escaping or
the like which would confuse other applications.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 22 '05 #4
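To illustrate the point above that an ANSI code page cannot hold every Unicode character, here is a small sketch that asks .NET to throw instead of silently substituting. The names AnsiFitCheck and FitsCodePage are hypothetical. It assumes a runtime where CodePagesEncodingProvider is available (.NET Core / .NET 5+); on .NET Framework the registration line should not be needed.

```csharp
using System;
using System.Text;

public static class AnsiFitCheck
{
    // Returns true when every character of text has a mapping in the given code page.
    public static bool FitsCodePage(string text, int codePage)
    {
        // Makes Windows code pages such as 1252 available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Exception fallbacks make unmappable characters throw instead of
        // being replaced by a substitute character.
        Encoding strict = Encoding.GetEncoding(
            codePage,
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            strict.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(FitsCodePage("\u2022 item", 1252)); // the bullet exists in 1252
        Console.WriteLine(FitsCodePage("\u4E2D", 1252));      // a CJK ideograph does not
    }
}
```

A check like this could run before writing, so that a character outside the target code page is reported rather than lost.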
Yes, I understand it's impossible as a generic solution. But my problem scope
does not go beyond this one single character. This bullet is the only
non-ASCII character the system needs to support. I'm trying to explain to my
client why Excel can do it but I can't using System.IO (or find a way to
make it work, of course).

I'm trying to understand how Excel does this:

1) Open Unicode file with this bullet character
2) Save As... ANSI file in Excel - bullet is still there, but Excel still
recognizes the file as ANSI

Jul 22 '05 #5
dbaldi <db****@discussions.microsoft.com> wrote:
> Yes, I understand it's impossible as a generic solution. But my problem scope
> does not go beyond this one single character. This bullet is the only
> non-ASCII character the system needs to support. I'm trying to explain to my
> client why Excel can do it but I can't using System.IO (or find a way to
> make it work, of course).
>
> I'm trying to understand how Excel does this:
>
> 1) Open Unicode file with this bullet character
> 2) Save As... ANSI file in Excel - bullet is still there, but Excel still
> recognizes the file as ANSI

I suggest you examine the file with a hex editor, and see what byte
it's put there, and what character it should be in whichever ANSI code
page you're using.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 22 '05 #6
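If no hex editor is at hand, the inspection suggested above can be approximated in a few lines. This is only a sketch; HexPeek and FirstBytes are made-up names, and the file to inspect is whatever Notepad or Excel saved.

```csharp
using System;
using System.IO;

public static class HexPeek
{
    // Formats up to maxBytes from the start of a file as space-separated hex.
    public static string FirstBytes(string path, int maxBytes)
    {
        byte[] all = File.ReadAllBytes(path);
        int count = Math.Min(maxBytes, all.Length);
        return BitConverter.ToString(all, 0, count).Replace("-", " ");
    }

    public static void Main(string[] args)
    {
        // Pass the file saved by Notepad or Excel as the first argument.
        if (args.Length == 0)
        {
            Console.WriteLine("usage: HexPeek <file>");
            return;
        }
        Console.WriteLine(FirstBytes(args[0], 64));
    }
}
```

A leading FF FE would indicate a UTF-16 little-endian BOM. A single byte where the bullet should be points to a one-byte ANSI mapping.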
> I'm trying to reproduce the capabilities in both Notepad and Excel, whereby
> a Unicode text file with Unicode characters can be converted to ANSI, while
> still preserving the unicode characters within.

Question 1: which "ANSI"?
For Windows, "ANSI" means the default system code page. This means 932 for
Japanese, 1251 for Russian, and so on.

If by ANSI you mean the "Western European ANSI" (1252, Latin 1), then
you should have no problem: U+2022 maps to 0x95.

> _writer = new System.IO.StreamWriter( file, false, new UnicodeEncoding());
> which creates the text file just fine with regards to the bullet character,
> but in Unicode format with BOM and all.

Normal. You ask for UnicodeEncoding, and you get Unicode encoding.
Try System.Text.Encoding with Encoding.Default.

> What magic does Notepad/Excel use to preserve the character
> but lose the BOM?

There is no magic. ANSI does not have a BOM, and the U+2022 bullet maps to
something that exists in 1252 (which I guess is your default system locale).

--
Mihai Nita [Microsoft MVP, Windows - SDK]
------------------------------------------
Replace _year_ with _ to get the real email
Jul 22 '05 #7
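The suggestion in the last reply can be sketched as follows, assuming code page 1252 is the ANSI code page wanted. AnsiWriter and WriteAnsi1252 are hypothetical names, and the registration line is an assumption for .NET Core / .NET 5+ runtimes (on .NET Framework it should not be needed).

```csharp
using System;
using System.IO;
using System.Text;

public static class AnsiWriter
{
    // Writes text to path using Windows-1252, which has no BOM.
    public static void WriteAnsi1252(string path, string text)
    {
        // Makes Windows code pages such as 1252 available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding ansi = Encoding.GetEncoding(1252);

        using (var writer = new StreamWriter(path, false, ansi))
        {
            writer.Write(text);
        }
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "bullet-ansi.txt");
        WriteAnsi1252(path, "\u2022 item");

        // Prints 95-20-69-74-65-6D: the bullet is the single byte 0x95, no BOM.
        Console.WriteLine(BitConverter.ToString(File.ReadAllBytes(path)));
    }
}
```

Naming the code page explicitly is a design choice here. Encoding.Default follows the system ANSI code page on .NET Framework but is UTF-8 on .NET Core and later, so it will not always produce an ANSI file.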
