serious encoding problem

I want to save text in a file and after that I want to display this textfile
using the internet explorer.

If I am displaying "html text" everything is fine but if I want to display
plain text all characters from the extended ascii are looking weird - are
not properly encoded! Using the options in View -> Encoding -> ... in
Internet Explorer, I can switch to another encoding and then the text is
displayed correctly. In the same way, I can make the "html text" look weird.

In my program I am using the AxSHDocVw.AxWebBrowser control to display the
text.
How can this problem be solved? Outlook Express, for instance, displays all
messages correctly, both html and plain text. How can I achieve that
behavior? Is there a way to change the encoding from code?

Thanks really a lot in advance,
timtos.

Nov 13 '05 #1
8 Replies


timtos <ha*****@uni-koblenz.de> wrote:
> I want to save text in a file and after that I want to display this textfile
> using the internet explorer.
>
> If I am displaying "html text" everything is fine but if I want to display
> plain text all characters from the extended ascii are looking weird - are
> not properly encoded!


And what exactly is "the extended ascii"? Ascii is unicode 0-127, and
nothing else. There are various encodings which have the same values
for 0-127, but they differ considerably between each other. You need to
know *exactly* what encoding you really want to use, and then use the
appropriate Encoding instance for output.

See http://www.pobox.com/~skeet/csharp/unicode.html
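
A minimal sketch (not from the thread) of the point above: the same string turns into different bytes depending on the Encoding instance used. It is written as C# top-level statements, which assumes C# 9 or later; the sample string and the choice of ISO-8859-1 as the example 8-bit encoding are illustrative assumptions.

```csharp
using System;
using System.Text;

// 'é' is U+00E9, which lies outside the ASCII range 0-127.
string text = "caf\u00E9";

byte[] ascii = Encoding.ASCII.GetBytes(text);
byte[] latin1 = Encoding.GetEncoding("iso-8859-1").GetBytes(text);
byte[] utf8 = Encoding.UTF8.GetBytes(text);

// ASCII cannot represent 'é' and substitutes '?' (0x3F).
Console.WriteLine(BitConverter.ToString(ascii));   // 63-61-66-3F
// ISO-8859-1 uses one byte for 'é'.
Console.WriteLine(BitConverter.ToString(latin1));  // 63-61-66-E9
// UTF-8 uses two bytes for 'é'.
Console.WriteLine(BitConverter.ToString(utf8));    // 63-61-66-C3-A9
```

The first three bytes agree in all three encodings because they are in the 0-127 range; only the last character differs.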

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet/
If replying to the group, please do not mail me too
Nov 13 '05 #2

Thanks for answering! With "the extended ascii" I meant 128 and above!
I came to the phrase "the extended ascii" because at
http://www.asciitable.com/ at the bottom, it refers to the "Extended
ASCII Codes" and they mean with it what I meant with "the extended ascii"
;-) I'll have a look at your link now. Thanks again for answering.

Greetings,
timtos.

Nov 13 '05 #3

timtos <ha*****@uni-koblenz.de> wrote:
> Thanks for answering! With "the extended ascii" I meant 128 and above!

But there *is* no single "extended ascii".

> I came to the phrase "the extended ascii" because at
> http://www.asciitable.com/ at the bottom, it refers to the "Extended
> ASCII Codes" and they mean with it what I meant with "the extended ascii"
> ;-) I'll have a look at your link now. Thanks again for answering.


That page was written by someone who doesn't really understand what
ASCII is, I suspect. As I said, there are various "extensions" to
ASCII, none of which can uniquely be called "extended ascii". When
other people talk about "extended ascii" they mean different things...
and that's why it's a term which should never, IMO, be used.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet/
If replying to the group, please do not mail me too
Nov 13 '05 #4

> When other people talk about "extended ascii" they mean different
> things... and that's why it's a term which should never, IMO, be used.

And I agree with you ;-)
Thanks for clearing things up!

Greetings,
timtos.

Nov 13 '05 #5

timtos <ha*****@uni-koblenz.de> wrote:
> But I still got my initial problem :-(
> I think I understood the Unicode stuff but perhaps there is something out
> there concerning encoding what I _don't_ understand...


Okay. Do this in several stages. First, work out what the encoding of
the text file is. Then load it into a .NET program, and print out the
unicode value (as an integer) of each character. That way you'll know
you've loaded it properly. Then work out exactly what encoding your
control wants (hopefully it'll be documented) and then you should be
able to encode it appropriately.
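
The first two stages could be sketched roughly as follows. This is an illustration, not code from the thread: the file name, the sample text and the assumption that the file is ISO-8859-1 are all made up for the example, and the file is written first only so that the sketch is self-contained. It uses C# top-level statements (C# 9 or later).

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical sample file, created here only to keep the sketch self-contained.
string path = Path.Combine(Path.GetTempPath(), "encoding-check.txt");
Encoding fileEncoding = Encoding.GetEncoding("iso-8859-1"); // assumed encoding of the file
File.WriteAllText(path, "na\u00EFve", fileEncoding);

// Stage 1: decode the file with the encoding you believe it uses.
string loaded = File.ReadAllText(path, fileEncoding);

// Stage 2: print the Unicode value of each character as an integer.
foreach (char c in loaded)
{
    Console.WriteLine((int)c);
}
```

If the file was decoded correctly, the third value printed is 239 (U+00EF, 'ï'). A different number, or two numbers where one was expected, suggests the file is in another encoding than assumed.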

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet/
If replying to the group, please do not mail me too
Nov 13 '05 #6

First of all, thanks a lot for your help Jon Skeet. You helped a lot!
But a few little questions remain although now the encoding is working:

I created the file with a StreamWriter and used the following call to
initialize it:
sw = fit.CreateText();

So the default encoding was used - UTF8. I thought that was ok because in
the msdn it is written:
"UTF-8 handles all Unicode characters correctly and gives consistent results
on localized versions of the operating system."

But trying to display that file in the AxWebBrowser control, the characters
were messed up because this control uses West-European(Windows) encoding.
Now I use the following call to initialize the StreamWriter object:
sw = new StreamWriter(path, false, System.Text.Encoding.Default);

Now it is working!
But why is UTF-8 not the right way here?
Any thoughts about this problem? Is the way I am going now ok?

Thanks for sharing your thoughts,
timtos.

Nov 13 '05 #7

timtos <ha*****@uni-koblenz.de> wrote:
> First of all, thanks a lot for your help Jon Skeet. You helped a lot!
> But a few little questions remain although now the encoding is working:
>
> I created the file with a StreamWriter and used the following call to
> initialize it:
> sw = fit.CreateText();
>
> So the default encoding was used - UTF8. I thought that was ok because in
> the msdn it is written:
> "UTF-8 handles all Unicode characters correctly and gives consistent results
> on localized versions of the operating system."

UTF-8 itself does indeed handle all Unicode characters.

> But trying to display that file in the AxWebBrowser control, the characters
> were messed up because this control uses West-European(Windows) encoding.

Right.

> Now I use the following call to initialize the StreamWriter object:
> sw = new StreamWriter(path, false, System.Text.Encoding.Default);
>
> Now it is working!
> But why is UTF-8 not the right way here?

Because as you say, the control doesn't use UTF-8.

> Any thoughts about this problem? Is the way I am going now ok?


If you could find some way to make the AxWebBrowser control use UTF-8,
that would be the best solution. UTF-8 is a generally nice encoding.

If you can't change what encoding a control will understand, however,
you must "feed" it stuff encoded with what it *does* understand.
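
As a rough sketch of "feeding" a consumer the bytes it expects, the following writes the same text twice with an explicitly named encoding and dumps the resulting bytes. The file names and sample text are hypothetical, and ISO-8859-1 is used only as a stand-in for whatever single-byte code page the consumer really decodes with. It uses C# top-level statements (C# 9 or later).

```csharp
using System;
using System.IO;
using System.Text;

string dir = Path.GetTempPath();
string utf8Path = Path.Combine(dir, "out-utf8.txt");         // hypothetical file name
string singleBytePath = Path.Combine(dir, "out-latin1.txt"); // hypothetical file name
string text = "Gr\u00FC\u00DFe"; // "Grüße"

// Encoding.UTF8 makes StreamWriter emit a byte order mark (EF BB BF) first.
using (StreamWriter sw = new StreamWriter(utf8Path, false, Encoding.UTF8))
{
    sw.Write(text);
}

// A single-byte encoding, named explicitly instead of relying on a default.
using (StreamWriter sw = new StreamWriter(singleBytePath, false, Encoding.GetEncoding("iso-8859-1")))
{
    sw.Write(text);
}

byte[] utf8Bytes = File.ReadAllBytes(utf8Path);
byte[] singleBytes = File.ReadAllBytes(singleBytePath);
Console.WriteLine(BitConverter.ToString(utf8Bytes));   // EF-BB-BF-47-72-C3-BC-C3-9F-65
Console.WriteLine(BitConverter.ToString(singleBytes)); // 47-72-FC-DF-65
```

Naming the encoding explicitly also avoids depending on Encoding.Default, which is the system ANSI code page on .NET Framework but UTF-8 on .NET Core and later. Whether a byte order mark is enough for a given control to pick UTF-8 is something to verify against that control, not something this sketch shows.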

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet/
If replying to the group, please do not mail me too
Nov 13 '05 #8

"Michael (michka) Kaplan [MS]" <mi*****@online.microsoft.com> wrote:
> Everything Jon said is true. But note that the method of using "Default" as
> the encoding type will not work well on CJK platforms since some of the byte
> combinations that will be produced will be illegal in the default system
> code page.

I don't understand that. How can using Encoding.Default produce illegal
byte sequences in the default system code page - I thought the whole
point of Encoding.Default was that it *was* the default system code
page.

> The way it is working now.... the data is improperly translated to the wrong
> code page, but then later you improperly convert it back using the same
> encoding. So it is a good example of "two wrongs making a right!"


Not entirely sure about this, either - as far as I can see the OP is
*only* encoding text to a file, not decoding it at all. The browser
control is doing that, and so long as the file is encoded with the same
code page that the browser control is decoding with, how are either of
them "wrong" as such?
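
The matching requirement described above can be illustrated with a small sketch (not from the thread). Here ISO-8859-1 plays the role of the decoder's code page, which is an assumption for the example; the writer side is simulated with GetBytes and the reader side with GetString. It uses C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Text;

Encoding latin1 = Encoding.GetEncoding("iso-8859-1"); // stand-in for the decoder's code page
string original = "\u00E9"; // 'é'

// Writer and reader agree: the text survives the round trip.
string matched = latin1.GetString(latin1.GetBytes(original));

// Writer used UTF-8, reader assumes a single-byte code page:
// the two UTF-8 bytes C3 A9 come back as two separate characters.
string mismatched = latin1.GetString(Encoding.UTF8.GetBytes(original));

Console.WriteLine(matched == original);  // True
Console.WriteLine(mismatched.Length);    // 2
foreach (char c in mismatched)
{
    Console.WriteLine((int)c);           // 195, then 169
}
```

The mismatched case is the "weird characters" symptom from the original question: one accented letter shows up as two unrelated ones.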

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet/
If replying to the group, please do not mail me too
Nov 13 '05 #9
