I have a software application I've written called PowerBlog (PowerBlog.net)
that takes the editing capability of the Internet Explorer WebBrowser
control (essentially a DHTMLTextBox), extracts the user-typed HTML, and assigns
it to an XML node's InnerText property (using C#: System.Xml.XmlDocument
obj; obj.InnerText = myHTML). Later I get the InnerText as a string and
write it to disk.
When this text is displayed in a web browser, special characters that are
beyond the standard ASCII charset are not rendered correctly. Frequently, I
have copied text from a web site, pasted it into the DHTMLTextBox, saved, and
published it, and my published output has corrupt characters. However, prior
to publishing, when previewing my document it looks fine -- it is only when
it is published (extracted, written to disk, uploaded to the server via FTP,
downloaded via HTTP) that the corruption occurs.
There are several places where this problem could be occurring, and I don't
know how to figure it out.
- A "design feature" in the XmlNode's InnerText property that converts the
&###; encoding into an actual character.
- An encoding flaw when written to disk (currently I'm using the default,
UTF-8 I guess).
- A flaw in the FTP client class where the file is being corrupted during
upload (I think I'm using binary upload format but perhaps I should
double-check).
- A flaw in IIS (no known strange settings exist)
I still need to do some homework on this but I was wondering if anyone has
any bright ideas before I continue searching this out?
Thanks,
Jon
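The first suspect in the list above (an XML node turning &###; references into actual characters) is easy to check in isolation. The following is a minimal sketch in Python rather than C#, used only to show the general behavior of an XML parser; the sample markup is made up, and whether XmlNode.InnerText behaves identically in PowerBlog is an assumption worth verifying separately.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup: "&#233;" is the numeric reference for "é",
# "&#8364;" is the one for the euro sign.
source = "<p>caf&#233; &#8364;5</p>"

element = ET.fromstring(source)

# The parser hands back real characters, not the "&#...;" references.
inner_text = element.text

# Serialising to a Unicode-capable encoding writes the raw characters
# as multi-byte sequences...
as_utf8 = ET.tostring(element, encoding="utf-8")

# ...while serialising to plain ASCII forces numeric references back in.
as_ascii = ET.tostring(element, encoding="us-ascii")

print(repr(inner_text))
print(as_utf8)
print(as_ascii)
```

If the text read back from the node already contains real non-ASCII characters, then everything downstream (the file on disk, the declared charset of the page) has to agree on one encoding for those characters to survive.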
"Jon Davis" <jo*@REMOVE.ME.PLEASE.jondavis.net> wrote in message
news:Ok**************@tk2msftngp13.phx.gbl...
> - A "design feature" in the XmlNode's InnerText property that converts the
> &###; encoding into an actual character.
> - An encoding flaw when written to disk (currently I'm using the default,
> UTF-8 I guess).
> - A flaw in the FTP client class where the file is being corrupted during
> upload (I think I'm using binary upload format but perhaps I should
> double-check).
> - A flaw in IIS (no known strange settings exist)
For starters I'd rule out the last two options - I think it's almost got to
be the character encoding or the way you're writing it to disk.
As you've noticed, if the source text in your DHTML component is stored in
a different encoding from the one you're using to write to disk, then
you'll lose information, or it will be written incorrectly. Most encodings
store the ASCII characters (0 to 127) the same way, so errors only become
obvious from 128 upwards.
I'd be interested to find out what encoding the DHTML control is using to
store its source. UCS-2 is, as far as I'm aware, the standard Windows
encoding, so you might want to try writing out to disk using this encoding
rather than UTF-8. The StreamWriter class lets you set the encoding before
writing. Hopefully you'll not get any loss of information, which is what is
happening when you try to write UCS-2 as UTF-8! Just a guess, but worth a
try!?
HTH
Tobin
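The suggestion above, choosing the encoding explicitly when writing the file, can be tried out with a small experiment. This is a sketch in Python rather than C#, under the assumption that the text is already held in memory as a proper Unicode string; the sample text and the helper name write_and_read are invented for illustration.

```python
import os
import tempfile

# Made-up sample text containing characters outside ASCII.
text = "caf\u00e9 \u201cquoted\u201d \u20ac5"

def write_and_read(encoding):
    """Write text with an explicit encoding; return (raw bytes, decoded text)."""
    fd, path = tempfile.mkstemp(suffix=".html")
    os.close(fd)
    try:
        # Name the encoding explicitly instead of relying on a default.
        with open(path, "w", encoding=encoding, newline="") as f:
            f.write(text)
        with open(path, "rb") as f:
            raw = f.read()
    finally:
        os.remove(path)
    # Decode with the same encoding that was used for writing.
    return raw, raw.decode(encoding)

utf8_raw, utf8_text = write_and_read("utf-8")
utf16_raw, utf16_text = write_and_read("utf-16")

print(len(utf8_raw), len(utf16_raw))
print(utf8_text == text, utf16_text == text)
```

The bytes on disk differ between the two encodings, but both files decode back to the same text as long as the reader uses the encoding the writer used. That points the search toward a mismatch between writer and reader rather than toward one particular encoding.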
Thanks Tobin. I'll check out UCS-2, et al.
Jon
> Hopefully you'll not get any loss of information, which is what is
> happening when you try to write UCS-2 as UTF-8!
Unlikely. UCS-2 and UTF-8 are two different encodings of the same
character set (Unicode). There is no loss of information when you convert
from one to the other (if the conversion is done correctly).
> - A flaw in the FTP client class where the file is being corrupted during
> upload (I think I'm using binary upload format but perhaps I should
> double-check).
Unlikely. Even if binary mode is not set, the only characters damaged will be
control characters (below 0x20), typically line endings.
> - An encoding flaw when written to disk (currently I'm using the default,
> UTF-8 I guess).
Most probable. As a test, add this to the HTML file, as the first element in
the <head> section:
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8">
Without it, the browser is likely to assume a default such as ISO-8859-1.
Mihai
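Both points in the reply above can be reproduced at the byte level. This is a minimal Python sketch, not the PowerBlog code; the sample word is made up. It shows what UTF-8 bytes look like when a reader assumes ISO-8859-1, and that converting between UTF-16 (which covers UCS-2) and UTF-8 loses nothing when done correctly.

```python
# Made-up sample: one non-ASCII character is enough to show the effect.
text = "caf\u00e9"

# The file as written to disk in UTF-8: "é" becomes two bytes, C3 A9.
utf8_bytes = text.encode("utf-8")

# A reader that assumes ISO-8859-1 maps each byte to one character,
# so the two bytes of "é" show up as two wrong characters.
misread = utf8_bytes.decode("iso-8859-1")

# A reader that is told the charset is UTF-8 recovers the original text.
correct = utf8_bytes.decode("utf-8")

# Round trip UTF-16 -> text -> UTF-8 -> text: nothing is lost.
utf16_bytes = text.encode("utf-16-le")
round_trip = utf16_bytes.decode("utf-16-le").encode("utf-8").decode("utf-8")

print(utf8_bytes)
print(misread)
print(correct == text, round_trip == text)
```

If the corrupt characters on the published page look like the misread string here (two odd characters where one accented character should be), the file itself is probably fine and the missing charset declaration is the likelier culprit.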