(This is a follow-on message to one posted yesterday.)
I'm trying to reproduce the capabilities in both Notepad and Excel, whereby
a Unicode text file with Unicode characters can be converted to ANSI, while
still preserving the unicode characters within.
Specifically, I'm using the Unicode character U+2022, which is a largish
bullet.
I've tried this:
_writer = new System.IO.StreamWriter(file, false, new UnicodeEncoding());
which creates the text file just fine with regards to the bullet character,
but in Unicode format with BOM and all. When I save this file to ANSI with
Excel or Notepad, life is good.
BUT, when I try this:
_writer = new System.IO.StreamWriter(file, false, new UnicodeEncoding(false, false));
The BOM is gone (good) but the bullet character gets converted into another
character.
What magic does Notepad/Excel use to preserve the character but lose the BOM?
Yes of course I'm very familiar with code pages. The question really isn't
about Notepad or Excel... I'm simply using them as examples to show that this
can be done (create non-unicode file with unicode character preserved)
*somehow*.
I'm just trying to understand what Notepad does (assuming english page 1033)
when it saves the Unicode file as ANSI, still managing to preserve the bullet
character.
"Cor Ligthert [MVP]" wrote: Dbaldi,
Be aware that Notepad can behave differently for different users around the world.
It depends on the code page in use.
Have a look at these pages
http://www.geocities.com/Athens/Acad.../fontset.htm#b
http://www.microsoft.com/globaldev/r...ocversion.mspx
I hope that this gives an idea
Cor
dbaldi <db****@discussions.microsoft.com> wrote:
> (this is follow-on message to one posted yesterday)
> I'm trying to reproduce the capabilities in both Notepad and Excel, whereby a Unicode text file with Unicode characters can be converted to ANSI, while still preserving the unicode characters within.
You can't. Excel and Notepad may happen to cope with the example you've
given, but I strongly suspect they don't do so reliably. They can't
possibly make every character in the appropriate ANSI encoding
available and cope with other characters unless they use escaping or
the like which would confuse other applications.
--
Jon Skeet - <sk***@pobox.com> http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Yes, I understand it's an impossible generic solution. But my problem scope
does not go beyond this one single character. This bullet is the only
non-ASCII character the system needs to support. I'm trying to explain to my
client why Excel can do it but I can't using System.IO (or, find a way to
make it work of course).
I'm trying to understand how Excel does this:
1) Open Unicode file with this bullet character
2) Save As... ANSI file in Excel - bullet is still there, but Excel still
recognizes the file as ANSI
dbaldi <db****@discussions.microsoft.com> wrote:
> Yes, I understand it's an impossible generic solution. But my problem scope does not go beyond this one single character. This bullet is the only non-ASCII character the system needs to support. I'm trying to explain to my client why Excel can do it but I can't using System.IO (or, find a way to make it work of course).
> I'm trying to understand how Excel does this:
> 1) Open Unicode file with this bullet character
> 2) Save As... ANSI file in Excel - bullet is still there, but Excel still recognizes the file as ANSI
I suggest you examine the file with a hex editor, and see what byte
it's put there, and what character it should be in whichever ANSI code
page you're using.
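As a rough sketch of that check done in code rather than in a hex editor: the helper below prints the bytes each encoding produces for the bullet. The class and method names are made up for illustration, and code page 1252 is an assumption about the machine's ANSI code page. With UnicodeEncoding(false, false) the bullet is still written as two UTF-16 bytes, 22 20, and a program that reads those bytes as ANSI sees a double quote followed by a space, which would explain the "other character".

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the original thread.
public static class BulletBytes
{
    // UTF-16 little-endian without a BOM, i.e. new UnicodeEncoding(false, false).
    public static string Utf16NoBomHex(string text)
    {
        return BitConverter.ToString(new UnicodeEncoding(false, false).GetBytes(text));
    }

    // Windows-1252, assumed here to be the "ANSI" code page in question.
    public static string Windows1252Hex(string text)
    {
        // Required on .NET Core / .NET 5+; .NET Framework knows code page 1252 without it.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return BitConverter.ToString(Encoding.GetEncoding(1252).GetBytes(text));
    }

    // What an ANSI reader sees when handed the BOM-less UTF-16 bytes.
    public static string Utf16BytesReadAsAnsi(string text)
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        byte[] utf16 = new UnicodeEncoding(false, false).GetBytes(text);
        return Encoding.GetEncoding(1252).GetString(utf16);
    }
}
```

For "\u2022", Utf16NoBomHex gives "22-20" and Windows1252Hex gives "95".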
--
Jon Skeet - <sk***@pobox.com> http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
> I'm trying to reproduce the capabilities in both Notepad and Excel, whereby a Unicode text file with Unicode characters can be converted to ANSI, while still preserving the unicode characters within.
Question 1: "what ANSI"
For Windows, "ANSI" means the default system code page. This means 932 for
Japanese, 1251 for Russian and so on.
If by ANSI you mean the "Western European ANSI" (1252, Latin 1), then
you should have no problem: U+2022 maps to 0x95.
> _writer = new System.IO.StreamWriter(file, false, new UnicodeEncoding()); which creates the text file just fine with regards to the bullet character, but in Unicode format with BOM and all.
Normal. You ask for UnicodeEncoding, and you get Unicode encoding.
Try System.Text.Encoding with Encoding.Default.
> What magic does Notepad/Excel use to preserve the character but lose the BOM?
There is no magic. ANSI does not have a BOM, and the U+2022 bullet maps to
something that exists in 1252 (which I guess is your default system code page).
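A minimal sketch of that suggestion, assuming the target really is code page 1252. It asks for 1252 explicitly instead of Encoding.Default, because Encoding.Default is the system ANSI code page only on .NET Framework; on .NET Core and later it is UTF-8. The class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the original thread.
public static class AnsiBulletDemo
{
    private static Encoding GetAnsi1252()
    {
        // Required on .NET Core / .NET 5+; .NET Framework knows code page 1252 without it.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252);
    }

    // Encodes text as Windows-1252 bytes; U+2022 becomes the single byte 0x95.
    public static byte[] ToWindows1252(string text)
    {
        return GetAnsi1252().GetBytes(text);
    }

    // Writes the text as an ANSI (1252) file. Single-byte code pages have no
    // preamble, so no BOM is written.
    public static void WriteAnsiFile(string path, string text)
    {
        using (var writer = new StreamWriter(path, false, GetAnsi1252()))
        {
            writer.Write(text);
        }
    }
}
```

Calling AnsiBulletDemo.WriteAnsiFile(path, "\u2022 item") should produce a 6-byte file whose first byte is 0x95. Characters that have no slot in 1252 are replaced with '?' by default, so this only works for characters the code page actually contains.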
--
Mihai Nita [Microsoft MVP, Windows - SDK]
------------------------------------------
Replace _year_ with _ to get the real email