Unicode character in non-unicode text file

(this is follow-on message to one posted yesterday)

I'm trying to reproduce the capabilities in both Notepad and Excel, whereby
a Unicode text file with Unicode characters can be converted to ANSI, while
still preserving the unicode characters within.

Specifically, I'm using the Unicode character U+2022 (&#x2022;), which is a
largish bullet.

I've tried this:

_writer = new System.IO.StreamWriter(file, false, new UnicodeEncoding());

which creates the text file just fine with regards to the bullet character,
but in Unicode format with BOM and all. When I save this file to ANSI with
Excel or Notepad, life is good.

BUT, when I try this:

_writer = new System.IO.StreamWriter(file, false, new UnicodeEncoding(false, false));

The BOM is gone (good) but the bullet character gets converted into another
character.

What magic does Notepad/Excel use to preserve the character but lose the BOM?

Jul 22 '05 #1
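
One plausible reading of the symptom above, offered as a sketch and not a confirmed diagnosis: both constructors store the bullet as the same two UTF-16 bytes, and only the BOM differs. Without the BOM, a reader has nothing that marks the file as UTF-16, so it may fall back to ANSI and show the bytes 0x22 0x20 as a quote and a space. The names `BomDemo` and `Encode` are illustrative only.

```csharp
using System;
using System.IO;
using System.Text;

static class BomDemo
{
    // Writes text through a StreamWriter into memory and returns the raw bytes,
    // including any preamble (BOM) the encoding asks for.
    public static byte[] Encode(string text, Encoding encoding)
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = new StreamWriter(stream, encoding))
            {
                writer.Write(text);
            }
            // ToArray still works after the writer has closed the stream.
            return stream.ToArray();
        }
    }

    static void Main()
    {
        // Default UnicodeEncoding: UTF-16 little-endian with a BOM.
        Console.WriteLine(BitConverter.ToString(Encode("\u2022", new UnicodeEncoding())));
        // prints FF-FE-22-20

        // Same encoding, BOM suppressed: the file is still UTF-16, not ANSI.
        Console.WriteLine(BitConverter.ToString(Encode("\u2022", new UnicodeEncoding(false, false))));
        // prints 22-20
    }
}
```

In both cases the file is UTF-16; dropping the BOM does not turn it into an ANSI file.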
Dbaldi,

Be aware that Notepad can behave differently for users in different parts of the world.

It depends on the code page in use.

Have a look at these pages

http://www.geocities.com/Athens/Acad.../fontset.htm#b

http://www.microsoft.com/globaldev/r...ocversion.mspx

I hope that this gives an idea

Cor
Jul 22 '05 #2
Yes of course I'm very familiar with code pages. The question really isn't
about Notepad or Excel... I'm simply using them as examples to show that this
can be done (create non-unicode file with unicode character preserved)
*somehow*.

I'm just trying to understand what Notepad does (assuming the English locale,
LCID 1033) when it saves the Unicode file as ANSI, still managing to preserve
the bullet character.

Jul 22 '05 #3
dbaldi <db****@discussions.microsoft.com> wrote:
> (this is follow-on message to one posted yesterday)
>
> I'm trying to reproduce the capabilities in both Notepad and Excel, whereby
> a Unicode text file with Unicode characters can be converted to ANSI, while
> still preserving the unicode characters within.

You can't. Excel and Notepad may happen to cope with the example you've
given, but I strongly suspect they don't do so reliably. They can't
possibly make every character in the appropriate ANSI encoding
available and cope with other characters unless they use escaping or
the like which would confuse other applications.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 22 '05 #4
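
The point about characters that have no slot in the ANSI code page can be checked directly. The sketch below assumes code page 1252 and asks the encoder to throw, instead of substituting a replacement, whenever a character cannot be mapped. `AnsiCheck` and `FitsCodePage` are hypothetical names, and the provider registration is only needed on .NET Core / .NET 5 and later.

```csharp
using System;
using System.Text;

static class AnsiCheck
{
    static AnsiCheck()
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code pages
        // must be registered first. On .NET Framework they are built in and
        // this line should be removed.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Returns true when every character of text has a byte in the given code page.
    public static bool FitsCodePage(string text, int codePage)
    {
        // The exception fallback makes unmappable characters fail loudly
        // instead of being silently replaced.
        Encoding strict = Encoding.GetEncoding(
            codePage,
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            strict.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }

    static void Main()
    {
        Console.WriteLine(FitsCodePage("\u2022 item", 1252)); // True: the bullet exists in 1252
        Console.WriteLine(FitsCodePage("\u4E2D", 1252));      // False: a CJK ideograph does not
    }
}
```

A check like this shows which characters survive a conversion to ANSI and which would be lost.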
Yes, I understand that a generic solution is impossible. But my problem scope
does not go beyond this one single character. This bullet is the only
non-ASCII character the system needs to support. I'm trying to explain to my
client why Excel can do it but I can't using System.IO (or, of course, find a
way to make it work).

I'm trying to understand how Excel does this:

1) Open Unicode file with this bullet character
2) Save As... ANSI file in Excel - bullet is still there, but Excel still
recognizes the file as ANSI

Jul 22 '05 #5
I suggest you examine the file with a hex editor, and see what byte
it has put there, and what character that byte should be in whichever
ANSI code page you're using.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 22 '05 #6
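
If no hex editor is at hand, a few lines of C# give the same view of the file. This is a minimal sketch; `HexPeek`, `Dump`, and the file name `bullet-ansi.txt` are placeholders for whatever file Notepad or Excel saved.

```csharp
using System;
using System.IO;

static class HexPeek
{
    // Formats the first maxBytes bytes of a file as space-separated hex pairs.
    public static string Dump(string path, int maxBytes)
    {
        byte[] all = File.ReadAllBytes(path);
        int count = Math.Min(all.Length, maxBytes);
        return BitConverter.ToString(all, 0, count).Replace('-', ' ');
    }

    static void Main(string[] args)
    {
        // Placeholder default; pass the real path as the first argument.
        string path = args.Length > 0 ? args[0] : "bullet-ansi.txt";
        Console.WriteLine(Dump(path, 64));
    }
}
```

When reading the output: `FF FE` at the start is a UTF-16 little-endian BOM, `22 20` is the bullet in UTF-16 little-endian, `E2 80 A2` is the bullet in UTF-8, and a single `95` is the bullet in code page 1252.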
> I'm trying to reproduce the capabilities in both Notepad and Excel, whereby
> a Unicode text file with Unicode characters can be converted to ANSI, while
> still preserving the unicode characters within.

Question 1: "what ANSI"?
For Windows, "ANSI" means the default system code page. This means 932 for
Japanese, 1251 for Russian and so on.

If by ANSI you mean the "Western European ANSI" (1252, Latin 1), then
you should have no problem: U+2022 maps to 0x95.

> _writer = new System.IO.StreamWriter(file, false, new UnicodeEncoding());
> which creates the text file just fine with regards to the bullet character,
> but in Unicode format with BOM and all.

Normal. You ask for UnicodeEncoding, and you get Unicode encoding.
Try System.Text.Encoding with Encoding.Default.

> What magic does Notepad/Excel use to preserve the character
> but lose the BOM?

There is no magic. ANSI does not have a BOM, and the U+2022 bullet maps to
something that exists in 1252 (which I guess is your default system locale).

--
Mihai Nita [Microsoft MVP, Windows - SDK]
------------------------------------------
Replace _year_ with _ to get the real email
Jul 22 '05 #7
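
The advice above can be sketched as follows, assuming the target "ANSI" is code page 1252. On .NET Framework, `Encoding.Default` is the system ANSI code page, which matches the suggestion; on .NET Core and .NET 5+, `Encoding.Default` is UTF-8, so this sketch names 1252 explicitly. `AnsiWriter`, `WriteAnsi`, and the temp file name are illustrative.

```csharp
using System;
using System.IO;
using System.Text;

static class AnsiWriter
{
    // Writes text to path as code page 1252 bytes; this encoding has no BOM.
    public static void WriteAnsi(string path, string text)
    {
        // Assumption: .NET Core / .NET 5+, where code page 1252 must be
        // registered. On .NET Framework, remove this line.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding ansi = Encoding.GetEncoding(1252);
        using (var writer = new StreamWriter(path, false, ansi))
        {
            writer.Write(text);
        }
    }

    static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "bullet-ansi.txt");
        WriteAnsi(path, "\u2022 first item");

        // The first byte is 95, the code page 1252 bullet, with no BOM before it.
        Console.WriteLine(BitConverter.ToString(File.ReadAllBytes(path)));
    }
}
```

If the machine's ANSI code page is not 1252, the bullet may map to a different byte or to nothing at all, so the code page should be chosen to match whatever will read the file.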