Saving XML as UTF-8?

How do I load and save a UTF-8 document in XML in ASP/VBS?
Well, the loading* is not the problem actually -- the file is in UTF-8,
and understood correctly -- but once saved, the UTF-8 is replaced by
what seems to be iso-8859-1 (which Flash doesn't understand, but that's
another problem). Any help greatly appreciated.
* Something like this...
set xDoc = server.createObject("Msxml2.DOMDocument")
xDoc.async = false
xDoc.load sPath
Jul 22 '05 #1


Philipp Lenssen wrote:
How do I load and save a UTF-8 document in XML in ASP/VBS?

Well, the loading* is not the problem actually -- the file is in UTF-8,
and understood correctly -- but once saved, the UTF-8 is replaced by
what seems to be iso-8859-1
* Something like this...
set xDoc = server.createObject("Msxml2.DOMDocument")
xDoc.async = false
xDoc.load sPath


I am pretty sure that if you later call
xDoc.save Server.MapPath(filename)
the encoding is preserved.
Are you by chance saving by writing xDoc.xml to a file with the
FileSystemObject instead?

The MSXML 4 docs say about the save method:

"Character encoding is based on the encoding attribute in the XML
declaration, such as <?xml version="1.0" encoding="windows-1252"?>. When
no encoding attribute is specified, the default setting is UTF-8."

which supports my view that the encoding the document has when being
loaded is preserved when saving.
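
To see which encoding the save method would pick under that rule, you can
inspect the XML declaration of the file on disk. The following is only a
minimal sketch, written in Python rather than ASP because it is a plain
byte-level check that does not involve MSXML. It assumes the file uses an
ASCII-compatible encoding (so not UTF-16), and declared_encoding is a
hypothetical helper name, not part of any library:

```python
import re

# Matches the encoding pseudo-attribute inside a leading XML declaration.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(data: bytes, default: str = "UTF-8") -> str:
    """Return the encoding named in the XML declaration, or the default.

    Assumption: the bytes are in an ASCII-compatible encoding, so the
    declaration itself can be matched as ASCII.
    """
    # Skip a UTF-8 byte order mark if one is present.
    if data.startswith(b"\xef\xbb\xbf"):
        data = data[3:]
    match = _DECL.match(data)
    return match.group(1).decode("ascii") if match else default


print(declared_encoding(b'<?xml version="1.0" encoding="windows-1252"?><a/>'))
print(declared_encoding(b'<?xml version="1.0"?><a/>'))
```

The first call prints windows-1252 and the second prints UTF-8, which mirrors
the default described in the quoted documentation.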


--

Martin Honnen
http://JavaScript.FAQTs.com/
Jul 22 '05 #2
Martin Honnen wrote:
Philipp Lenssen wrote:
How do I load and save a UTF-8 document in XML in ASP/VBS?

I am pretty sure if you then use
xDoc.save Server.MapPath(filename)
later then the encoding is preserved.
Are you by chance saving by writing xDoc.xml with the
FileSystemObject?


Thanks so far Martin, this is my save method:

xDoc.save server.mapPath(sPath)

So no, I'm not using the FSO...
Any idea what's happening?

--
Google Blogoscoped
http://blog.outer-court.com
Jul 22 '05 #3


Philipp Lenssen wrote:

Philipp Lenssen wrote:

How do I load and save a UTF-8 document in XML in ASP/VBS?
this is my save method:

xDoc.save server.mapPath(sPath)


You say the file is saved as iso-8859-1. Does MSXML really save it with
that encoding and put a
<?xml version="1.0" encoding="iso-8859-1"?>
in there? If not, why do you think that MSXML saves as iso-8859-1?

--

Martin Honnen
http://JavaScript.FAQTs.com/
Jul 22 '05 #4
Martin Honnen wrote:
Philipp Lenssen wrote:

Philipp Lenssen wrote:
> How do I load and save a UTF-8 document in XML in ASP/VBS?
>

this is my save method:

xDoc.save server.mapPath(sPath)


You say the file is saved as iso-8859-1, does MSXML really save it
with that encoding and put a <?xml version="1.0"
encoding="iso-8859-1"?> in there, or why do you think that MSXML
saves as iso-8859-1?


Let me put it this way. I use my own Netpadd editor, which doesn't
support UTF-8. I know because whenever I open UTF-8, I see this "i>?"
as first character. So when I want to open UTF-8, I use Notepad.
However, the files that *were* UTF-8 when I put them into the tool I'm
programming (a simple text translation tool) come out "fine" in my
non-UTF-8 Netpadd once they are saved. So they lost their "UTF-8ness"
without me saying so in ASP!

Thanks so far, and hope you have more hints!
--
Google Blogoscoped
http://blog.outer-court.com
Jul 22 '05 #5
UTF-8 does not by itself require special characters at the start of a file.
If the files are plain XML, the first non-whitespace character should be "<".
Files saved as "Unicode" (UTF-16) do start with a 2-byte byte order mark.
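
A quick way to tell these cases apart is to look at the first bytes of the
file. The sketch below is in Python and only illustrative: sniff_bom is a
hypothetical helper name, and UTF-32 marks are deliberately ignored. Note
that the optional 3-byte UTF-8 mark (EF BB BF) shows up as the three
characters "ï»¿" in an editor that treats the file as Latin-1, which may well
be what a non-UTF-8 editor displays at the top of such a file.

```python
import codecs

# Byte order marks to look for, checked in this order.
# Assumption: UTF-32 is not in play, so FF FE is taken to mean UTF-16.
_BOMS = [
    (codecs.BOM_UTF8, "UTF-8 with BOM"),
    (codecs.BOM_UTF16_LE, "UTF-16 little-endian"),
    (codecs.BOM_UTF16_BE, "UTF-16 big-endian"),
]


def sniff_bom(data: bytes) -> str:
    """Describe the byte order mark at the start of data, if any."""
    for bom, label in _BOMS:
        if data.startswith(bom):
            return label
    return "no BOM"


# The UTF-8 mark misread as Latin-1: U+00EF, U+00BB, U+00BF.
bom_as_latin1 = codecs.BOM_UTF8.decode("latin-1")

print(sniff_bom(b"\xef\xbb\xbf<?xml version='1.0'?><a/>"))
print(sniff_bom(b"<?xml version='1.0'?><a/>"))
```

A result of "no BOM" does not mean the file is not UTF-8; it only means the
optional mark is absent.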

What operating system are you running on when you open files in Notepad? The
version of Notepad included with NT, Win2000, and WinXP Pro is capable of
saving files in ANSI, Unicode, or UTF-8.

How are you opening the files from the ASP script? If possible show the
simplest *working* code (just read and then write the file) that duplicates
the problem along with a sample XML file.
--
--Mark Schupp
Head of Development
Integrity eLearning
www.ielearning.com

Jul 22 '05 #6


Philipp Lenssen wrote:
Martin Honnen wrote:
You say the file is saved as iso-8859-1, does MSXML really save it
with that encoding and put a <?xml version="1.0"
encoding="iso-8859-1"?> in there, or why do you think that MSXML
saves as iso-8859-1?


Let me put it this way. I use my own Netpadd editor, which doesn't
support UTF-8. I know because whenever I open UTF-8, I see this "i>?"
as first character. So when I want to open UTF-8, I use Notepad.
The files however that *were* UTF-8 when I put them in this tool which
I'm programming (a simple text translation tool), they are coming out
"fine" for my non-UTF-8 Netpadd once they are saved. So they lost their
"UTF-8ness" without me saying so in ASP!


Frankly, using a tool that doesn't understand UTF-8 to check whether a
file is UTF-8 encoded doesn't sound reliable. What you see might simply
be a byte order mark at the beginning of the file, and that mark is
optional in UTF-8.

I don't really know how to help with that. I would use an XML parser to
check whether the file is properly encoded; simply loading the file in
IE/Win should be enough to check that.
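
If IE is not at hand, the same kind of check can be scripted. This is a rough
sketch in Python, not a definitive test: check_xml_bytes is a hypothetical
helper name, and it only reports whether the bytes decode as UTF-8 and
whether a parser accepts them under the encoding the document declares.

```python
import xml.etree.ElementTree as ET


def check_xml_bytes(data: bytes) -> str:
    """Report whether data is well-formed XML and whether it is valid UTF-8."""
    # Strict decoding fails on byte sequences that are not valid UTF-8,
    # for example a lone 0xE9 from an ISO-8859-1 file.
    try:
        data.decode("utf-8")
        utf8_ok = True
    except UnicodeDecodeError:
        utf8_ok = False

    # The parser honours the encoding declaration and assumes UTF-8
    # when there is none.
    try:
        ET.fromstring(data)
    except ET.ParseError as exc:
        return f"not well-formed: {exc}"

    if utf8_ok:
        return "well-formed, valid UTF-8"
    return "well-formed, but not UTF-8 bytes"


utf8_doc = "<r>caf\u00e9</r>".encode("utf-8")
latin1_doc = (
    '<?xml version="1.0" encoding="iso-8859-1"?><r>caf\u00e9</r>'
).encode("latin-1")

print(check_xml_bytes(utf8_doc))
print(check_xml_bytes(latin1_doc))
```

A file that contains ISO-8859-1 bytes but declares no encoding, or declares
UTF-8, is reported as not well-formed, which is the case worth looking for
here.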

If you have the application online, post a URL (or better two: one to the
original, one to the saved XML) so that someone here can check whether
what you get there is really UTF-8 or ISO-8859-1.

--

Martin Honnen
http://JavaScript.FAQTs.com/
Jul 22 '05 #7
Martin Honnen wrote:
Philipp Lenssen wrote:

If you have the application online then post a URL (or better two,
one to the original, one to the saved XML) then someone here could
check whether it is really UTF-8 or ISO-8859-1 what you get there.


It's already solved, IIRC I posted this here already.

--
Google Blogoscoped
http://blog.outer-court.com
Jul 22 '05 #8