Invalid characters before xml header

Hello,
When I create an XML header using this code:

XmlDeclaration header = doc.CreateXmlDeclaration("1.0", "UTF-8", null);

XmlElement rootElement = doc.DocumentElement;

doc.InsertBefore(header, rootElement);
It adds some invalid characters before the header itself, visible only in
a text editor (IE opens the XML fine). This causes some Perl code I have, which
reads from the XML, to fail.

This is the header:
ï»¿<?xml version="1.0" encoding="UTF-8"?>

How can I fix this ?
Thanks.
Nadav.
Mar 7 '06 #1
Nadav <na****@gmail.com> wrote:
> When I create an XML header using this code:
>
> XmlDeclaration header = doc.CreateXmlDeclaration("1.0", "UTF-8", null);
> XmlElement rootElement = doc.DocumentElement;
> doc.InsertBefore(header, rootElement);
>
> It adds some invalid characters before the header itself, visible only in
> a text editor (IE opens the XML fine). This causes some Perl code I have, which
> reads from the XML, to fail.
>
> This is the header:
> ï»¿<?xml version="1.0" encoding="UTF-8"?>

They're not invalid characters. That's the byte order mark. It's
perfectly valid for it to be there - it sounds like the Perl code is
broken. You may well not be able to fix that though, so I guess we need
to sort out how to suppress the BOM from the written file.

You haven't shown how you're writing out the document - could you do
so?
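
For anyone who wants to confirm that the stray characters really are a UTF-8 byte order mark, here is a minimal sketch. It relies only on the fact that the UTF-8 BOM is the three bytes EF BB BF; the class and method names (`BomCheck`, `HasUtf8Bom`, `FileHasUtf8Bom`) are made up for illustration and are not part of any library.

```csharp
using System.IO;

// Hypothetical helper, not part of the .NET Framework.
public static class BomCheck
{
    // Returns true when the data starts with the UTF-8 byte order mark (EF BB BF).
    public static bool HasUtf8Bom(byte[] data)
    {
        return data.Length >= 3
            && data[0] == 0xEF
            && data[1] == 0xBB
            && data[2] == 0xBF;
    }

    // Convenience wrapper: reads the whole file and checks its first three bytes.
    public static bool FileHasUtf8Bom(string path)
    {
        return HasUtf8Bom(File.ReadAllBytes(path));
    }
}
```

Calling `BomCheck.FileHasUtf8Bom(path)` on the file written by the C# program and on the one written by the Java program should show whether the BOM is the only difference between them.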

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Mar 7 '06 #2
Sure...

XmlDocument doc = new XmlDocument();

XmlNode root = doc.CreateElement("XXXX");

doc.AppendChild (root);

and so on....

and then at last I run the code from before, to add the declaration.

The reason I thought they were invalid chars is that I have the same
software which creates the XML in Java also (I rewrote it in C#), and when I
checked, the output XML files were identical (text and structure). But
still the Java-created XML worked with the Perl code and the C# one didn't. The
reason is probably these chars, which don't exist in the Java XML.

Thanks, Nadav.

Mar 8 '06 #3
I seem to have emailed you instead of posting before, but here it is:

This is the byte order mark (BOM), and it confused me too at first.

You might think that you've already specified the encoding as "UTF-8", but if
you think about it, the reader needs to know the encoding before it can read the
string "UTF-8". Hence the BOM, a magic Unicode value (U+FEFF) usually put
at the start of the file; in UTF-8 it is encoded as the three bytes EF BB BF.

Off the top of my head you have an interaction between XmlTextWriter, the
stream you are writing to and the encoding for that stream.
It IS all documented (just not very clearly) and you definitely can suppress
the BOM (at least for utf-8).
I think you will find that there is a parameter to the encoding constructor
that specifies whether to use the BOM.
Just to confuse things I seem to remember that Encoding.UTF8 and new
UTF8Encoding() are different.
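
The difference mentioned above can be seen by asking each encoding for its preamble, which is the BOM a writer will emit for it. This is a small sketch; `PreambleDemo` and its methods are names invented for the example, while `Encoding.GetPreamble()` and the `UTF8Encoding` constructors are the real framework calls.

```csharp
using System.Text;

// Hypothetical demo class, for illustration only.
public static class PreambleDemo
{
    // Number of BOM bytes the given encoding asks writers to emit.
    public static int PreambleLength(Encoding encoding)
    {
        return encoding.GetPreamble().Length;
    }

    // Summarises the three common ways of getting a UTF-8 encoding.
    public static string Describe()
    {
        return "Encoding.UTF8: " + PreambleLength(Encoding.UTF8)
            + ", new UTF8Encoding(): " + PreambleLength(new UTF8Encoding())
            + ", new UTF8Encoding(true): " + PreambleLength(new UTF8Encoding(true));
    }
}
```

`Encoding.UTF8` and `new UTF8Encoding(true)` report a three-byte preamble, while `new UTF8Encoding()` and `new UTF8Encoding(false)` report none, so the last two are the ones to use when the BOM must be suppressed.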

I seem to remember that when I had this problem it was because I was writing
to a MemoryStream which defaulted to Unicode whereas I think file streams
default to UTF-8 for compatibility reasons.

Be careful - it is entirely possible to have the XML say "UTF-8" and the BOM
say something else - this will cause a self-explanatory error when you try
to load the document.

P.S. Notepad can read and write UTF-8 and Unicode, big or little endian.

see also
http://www.unicode.org/faq/utf_bom.html
http://en.wikipedia.org/wiki/Byte_Order_Mark

Mar 8 '06 #4
Nadav wrote:
> Sure...
>
> XmlDocument doc = new XmlDocument();
> XmlNode root = doc.CreateElement("XXXX");
> doc.AppendChild (root);
>
> and so on....
>
> and then at last I run the code from before, to add the declaration.

But none of that is what actually writes out the document - what saves
it to disk. That's what I'm interested in, as that's when the BOM is
created.

> The reason I thought they were invalid chars is that I have the same
> software which creates the XML in Java also (I rewrote it in C#), and when I
> checked, the output XML files were identical (text and structure). But
> still the Java-created XML worked with the Perl code and the C# one didn't. The
> reason is probably these chars, which don't exist in the Java XML.

Well, they probably do depending on how you tell Java to write it out
(and which XML library you use - there are loads of Java XML
libraries).

Jon

Mar 8 '06 #5
Oh, Sorry bout that :)
it's :

doc.Save(fileDialog.FileName);

Thanks...

Mar 8 '06 #6
Nadav wrote:
> Oh, Sorry bout that :)
> it's :
>
> doc.Save(fileDialog.FileName);

Okay. The simplest way round it is to create a StreamWriter using an
encoding (matching the one the document uses) which doesn't use a BOM
(you can create an instance of UTF8Encoding which doesn't use a BOM):

// Needs: using System.IO; using System.Text;
// UTF8Encoding(false) means "do not emit the byte order mark".
using (StreamWriter writer = new StreamWriter(fileDialog.FileName,
    false, new UTF8Encoding(false)))
{
    doc.Save(writer);
}

That should work...
Jon
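
To check the result without touching the disk, the same idea can be tried against a MemoryStream. This is only a sketch under a few assumptions: the class `XmlSaveWithoutBom` and its method names are invented here, and the root element name "XXXX" is the placeholder from earlier in the thread. The framework calls themselves (`StreamWriter(Stream, Encoding)`, `XmlDocument.Save(TextWriter)`, `UTF8Encoding(bool)`) are the real ones.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper class, for illustration only.
public static class XmlSaveWithoutBom
{
    // Builds a small document the same way as in the thread:
    // root element first, then the XML declaration inserted before it.
    public static XmlDocument BuildDocument()
    {
        XmlDocument doc = new XmlDocument();
        doc.AppendChild(doc.CreateElement("XXXX"));
        XmlDeclaration header = doc.CreateXmlDeclaration("1.0", "UTF-8", null);
        doc.InsertBefore(header, doc.DocumentElement);
        return doc;
    }

    // Saves the document as UTF-8 without a BOM and returns the bytes written.
    public static byte[] SaveToBytes(XmlDocument doc)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            // UTF8Encoding(false) has an empty preamble, so no BOM is written.
            using (StreamWriter writer = new StreamWriter(stream, new UTF8Encoding(false)))
            {
                doc.Save(writer);
            }
            // ToArray still works after the writer has closed the stream.
            return stream.ToArray();
        }
    }
}
```

The first byte returned by `SaveToBytes(BuildDocument())` should be `<` (0x3C) rather than 0xEF, which is what a BOM-sensitive reader such as the Perl script expects.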

Mar 8 '06 #7
Thanks a lot! Works great!

Mar 9 '06 #8
