Unicode character in non-unicode text file

(this is follow-on message to one posted yesterday)

I'm trying to reproduce the capabilities in both Notepad and Excel, whereby
a Unicode text file with Unicode characters can be converted to ANSI, while
still preserving the unicode characters within.

Specifically, I'm using the Unicode character U+2022, which is a largish
bullet.

I've tried this:

_writer = new System.IO.StreamWriter( file, false, new UnicodeEncoding());

which creates the text file just fine with regards to the bullet character,
but in Unicode format with BOM and all. When I save this file to ANSI with
Excel or Notepad, life is good.

BUT, when I try this:

_writer = new System.IO.StreamWriter( file, false, new UnicodeEncoding(false, false));

The BOM is gone (good) but the bullet character gets converted into another
character.

What magic does Notepad/Excel use to preserve the character but lose the BOM?

Jul 22 '05 #1
Dbaldi,

Be aware that Notepad can behave differently for different users around the world.

It depends on the code page in use.

Have a look at these pages

http://www.geocities.com/Athens/Acad.../fontset.htm#b

http://www.microsoft.com/globaldev/r...ocversion.mspx

I hope that this gives an idea

Cor
Jul 22 '05 #2
Yes of course I'm very familiar with code pages. The question really isn't
about Notepad or Excel... I'm simply using them as examples to show that this
can be done (create non-unicode file with unicode character preserved)
*somehow*.

I'm just trying to understand what Notepad does (assuming the English locale, 1033)
when it saves the Unicode file as ANSI, still managing to preserve the bullet
character.

Jul 22 '05 #3
dbaldi <db****@discussions.microsoft.com> wrote:
> (this is follow-on message to one posted yesterday)
>
> I'm trying to reproduce the capabilities in both Notepad and Excel, whereby
> a Unicode text file with Unicode characters can be converted to ANSI, while
> still preserving the unicode characters within.


You can't. Excel and Notepad may happen to cope with the example you've
given, but I strongly suspect they don't do so reliably. They can't
possibly make every character in the appropriate ANSI encoding
available and cope with other characters unless they use escaping or
the like which would confuse other applications.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 22 '05 #4
Yes, I understand it's impossible as a generic solution. But my problem scope
does not go beyond this one single character. This bullet is the only
non-ASCII character the system needs to support. I'm trying to explain to my
client why Excel can do it but I can't using System.IO (or find a way to
make it work, of course).

I'm trying to understand how Excel does this:

1) Open Unicode file with this bullet character
2) Save As... ANSI file in Excel - bullet is still there, but Excel still
recognizes the file as ANSI

Jul 22 '05 #5
dbaldi <db****@discussions.microsoft.com> wrote:
> Yes, I understand it's impossible as a generic solution. But my problem scope
> does not go beyond this one single character. This bullet is the only
> non-ASCII character the system needs to support. I'm trying to explain to my
> client why Excel can do it but I can't using System.IO (or find a way to
> make it work, of course).
>
> I'm trying to understand how Excel does this:
>
> 1) Open Unicode file with this bullet character
> 2) Save As... ANSI file in Excel - bullet is still there, but Excel still
> recognizes the file as ANSI


I suggest you examine the file with a hex editor, and see what byte
it's put there, and what character it should be in whichever ANSI code
page you're using.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 22 '05 #6
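
The hex check suggested above can also be done from code. This is only a minimal sketch: it writes the bullet with the same BOM-less UnicodeEncoding used in the original post and prints the raw bytes. The class name and the temp file are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

class HexDumpSketch
{
    static void Main()
    {
        // Hypothetical scratch file; any writable path would do.
        string path = Path.GetTempFileName();

        // Same encoding as in the original post: UTF-16 little-endian, no BOM.
        using (var writer = new StreamWriter(path, false, new UnicodeEncoding(false, false)))
        {
            writer.Write("\u2022");
        }

        // Print the raw bytes, much as a hex editor would show them.
        byte[] bytes = File.ReadAllBytes(path);
        Console.WriteLine(BitConverter.ToString(bytes)); // 22-20

        File.Delete(path);
    }
}
```

The file still holds UTF-16 bytes (0x22 0x20). A program that reads it as ANSI, with no BOM to tell it otherwise, sees a double quote followed by a space, which would explain the bullet apparently turning into another character.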
> I'm trying to reproduce the capabilities in both Notepad and Excel, whereby
> a Unicode text file with Unicode characters can be converted to ANSI, while
> still preserving the unicode characters within.

Question 1: which "ANSI"?
For Windows, "ANSI" means the default system code page. This means 932 for
Japanese, 1251 for Russian, and so on.

If by ANSI you mean the "Western European ANSI" (1252, Latin 1), then
you should have no problem: U+2022 maps to 0x95.

> _writer = new System.IO.StreamWriter( file, false, new UnicodeEncoding());
> which creates the text file just fine with regards to the bullet character,
> but in Unicode format with BOM and all.

Normal. You ask for UnicodeEncoding, and you get Unicode encoding.
Try System.Text.Encoding with Encoding.Default.

> What magic does Notepad/Excel use to preserve the character
> but lose the BOM?

There is no magic. ANSI does not have a BOM, and the U+2022 bullet maps to
something that exists in 1252 (which I guess is your default system locale).

--
Mihai Nita [Microsoft MVP, Windows - SDK]
------------------------------------------
Replace _year_ with _ to get the real email
Jul 22 '05 #7
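
A minimal sketch of the approach described in the last reply, assuming the target really is code page 1252. It asks for that code page explicitly rather than relying on Encoding.Default, which on newer .NET runtimes is UTF-8 rather than the system ANSI code page. The class name and temp file are illustrative, and the RegisterProvider line is an assumption about the runtime: it is needed on .NET Core / .NET 5+ and can be dropped on .NET Framework.

```csharp
using System;
using System.IO;
using System.Text;

class AnsiBulletSketch
{
    static void Main()
    {
        // Assumption: running on .NET Core / .NET 5+, where code page
        // encodings must be registered before GetEncoding(1252) works.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Western European "ANSI" (Windows-1252); it has no BOM.
        Encoding ansi = Encoding.GetEncoding(1252);

        // Hypothetical scratch file; any writable path would do.
        string path = Path.GetTempFileName();
        using (var writer = new StreamWriter(path, false, ansi))
        {
            writer.Write("\u2022");
        }

        // The bullet is stored as the single byte 0x95, with no BOM.
        Console.WriteLine(BitConverter.ToString(File.ReadAllBytes(path))); // 95

        File.Delete(path);
    }
}
```

Characters with no equivalent in 1252 would be replaced on write (by default with a question mark), so this only covers characters the code page actually contains.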
