Invalid characters before xml header

Nadav

Hello,
When I create an XML header using this code:

XmlDeclaration header = doc.CreateXmlDeclaration("1.0", "UTF-8", null);

XmlElement rootElement = doc.DocumentElement;

doc.InsertBefore(header, rootElement);
It adds some invalid characters before the header itself, only visible in
a text editor (IE opens the XML OK). This causes some Perl code I have, which
reads from the XML, to fail.

This is the header:
ï»¿<?xml version="1.0" encoding="UTF-8"?>

How can I fix this?
Thanks.
Nadav.

Mar 7 '06 #1

Jon Skeet [C# MVP]

Nadav <na****@gmail.com> wrote:

> When I create an XML header using this code:
>
> XmlDeclaration header = doc.CreateXmlDeclaration("1.0", "UTF-8", null);
> XmlElement rootElement = doc.DocumentElement;
> doc.InsertBefore(header, rootElement);
>
> It adds some invalid characters before the header itself, only visible in
> a text editor (IE opens the XML OK). This causes some Perl code I have, which
> reads from the XML, to fail.
>
> This is the header:
> ï»¿<?xml version="1.0" encoding="UTF-8"?>

They're not invalid characters. That's the byte order mark. It's
perfectly valid for it to be there - it sounds like the Perl code is
broken. You may well not be able to fix that though, so I guess we need
to sort out how to suppress the BOM from the written file.
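
To confirm that the leading characters really are a UTF-8 byte order mark, the first three bytes of the file can be compared with EF BB BF. The following is only a minimal sketch: the class name `BomCheck`, the method `StartsWithUtf8Bom` and the default file name `output.xml` are illustrative assumptions, not something from the thread.

```csharp
using System;
using System.IO;

public static class BomCheck
{
    // Returns true if the data starts with the UTF-8 byte order mark EF BB BF.
    public static bool StartsWithUtf8Bom(byte[] data)
    {
        return data.Length >= 3
            && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF;
    }

    public static void Main(string[] args)
    {
        // Hypothetical default path; pass the real file name as the first argument.
        string path = args.Length > 0 ? args[0] : "output.xml";
        if (!File.Exists(path))
        {
            Console.WriteLine("File not found: " + path);
            return;
        }

        byte[] data = File.ReadAllBytes(path);
        Console.WriteLine(StartsWithUtf8Bom(data)
            ? "File starts with a UTF-8 BOM"
            : "No UTF-8 BOM found");
    }
}
```

A text editor that decodes those three bytes as Latin-1 shows them as the characters ï»¿, which matches what appears in front of the declaration.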

You haven't shown how you're writing out the document - could you do
so?

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Mar 7 '06 #2

Nadav

Sure...

XmlDocument doc = new XmlDocument();

XmlNode root = doc.CreateElement("XXXX");

doc.AppendChild (root);

and so on....

and then at last I perform the code written before, to add the declaration.

The reason I thought they were invalid chars is that I have the same
software which creates the XML in Java as well (I rewrote it in C#), and when I
checked, the output XML files were identical (text and structure). But
the Java-created XML still worked with the Perl code and the C# one didn't. The
reason is probably these chars, which don't exist in the Java XML.

Thanks, Nadav.

Mar 8 '06 #3

Nick Hounsome

I seem to have emailed you instead of posting before, but here it is:

This is the byte order mark (BOM), and it confused me too at first.

You might think that you've already specified the encoding as "UTF-8", but if
you think about it, the reader needs to know the encoding in order to read the
string "UTF-8". Hence the BOM, which is the magic Unicode value U+FEFF (a single
16-bit unit in UTF-16, the three bytes EF BB BF in UTF-8), usually put
at the start of the file.

Off the top of my head, you have an interaction between XmlTextWriter, the
stream you are writing to and the encoding for that stream.
It IS all documented (just not very clearly) and you definitely can suppress
the BOM (at least for UTF-8).
I think you will find that there is a parameter to the encoding constructor
that specifies whether to use the BOM.
Just to confuse things, I seem to remember that Encoding.UTF8 and new
UTF8Encoding() are different.
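
The difference mentioned above can be checked by printing the preamble (the BOM bytes) that each encoding instance reports. This is just a small sketch; the class name `PreambleDemo` and the helper `Hex` are made up for illustration.

```csharp
using System;
using System.Text;

public static class PreambleDemo
{
    // Formats the preamble bytes as hex, or "(none)" when the encoding emits no BOM.
    public static string Hex(byte[] bytes)
    {
        return bytes.Length == 0 ? "(none)" : BitConverter.ToString(bytes);
    }

    public static void Main()
    {
        // The shared static instance reports a BOM.
        Console.WriteLine("Encoding.UTF8:           " + Hex(Encoding.UTF8.GetPreamble()));

        // The parameterless constructor creates an encoding without a BOM.
        Console.WriteLine("new UTF8Encoding():      " + Hex(new UTF8Encoding().GetPreamble()));

        // The bool parameter states explicitly whether a BOM should be emitted.
        Console.WriteLine("new UTF8Encoding(false): " + Hex(new UTF8Encoding(false).GetPreamble()));
        Console.WriteLine("new UTF8Encoding(true):  " + Hex(new UTF8Encoding(true).GetPreamble()));
    }
}
```

Encoding.UTF8 and new UTF8Encoding(true) report EF-BB-BF, while new UTF8Encoding() and new UTF8Encoding(false) report an empty preamble, so which one is handed to the writer decides whether a BOM ends up in the file.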

I seem to remember that when I had this problem it was because I was writing
to a MemoryStream which defaulted to Unicode whereas I think file streams
default to UTF-8 for compatibility reasons.

Be careful - it is entirely possible to have the XML say "UTF-8" and the BOM
say something else - this will cause a self-explanatory error when you try
to load the document.

P.S. Notepad can read and write UTF-8 and Unicode, big or little endian.

see also
http://www.unicode.org/faq/utf_bom.html
http://en.wikipedia.org/wiki/Byte_Order_Mark

Mar 8 '06 #4

Jon Skeet [C# MVP]

Nadav wrote:

> Sure...
>
> XmlDocument doc = new XmlDocument();
> XmlNode root = doc.CreateElement("XXXX");
> doc.AppendChild (root);
>
> and so on....
>
> and then at last I perform the code written before, to add the declaration.

But none of that is what actually writes out the document - what saves
it to disk. That's what I'm interested in, as that's when the BOM is
created.

> The reason I thought they were invalid chars is that I have the same
> software which creates the XML in Java as well (I rewrote it in C#), and when I
> checked, the output XML files were identical (text and structure). But
> the Java-created XML still worked with the Perl code and the C# one didn't. The
> reason is probably these chars, which don't exist in the Java XML.

Well, they probably do depending on how you tell Java to write it out
(and which XML library you use - there are loads of Java XML
libraries).

Jon

Mar 8 '06 #5

Nadav

Oh, Sorry bout that :)
it's :

doc.Save(fileDialog.FileName);

Thanks...

Mar 8 '06 #6

Jon Skeet [C# MVP]

Nadav wrote:

> Oh, Sorry bout that :)
> it's :
>
> doc.Save(fileDialog.FileName);

Okay. The simplest way round it is to create a StreamWriter using an
encoding (matching the one the document uses) which doesn't use a BOM
(you can create an instance of UTF8Encoding which doesn't use a BOM):

// false = do not emit a BOM
using (StreamWriter writer = new StreamWriter(fileDialog.FileName,
    false, new UTF8Encoding(false)))
{
    doc.Save(writer);
}

That should work...
Jon
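
An alternative to the StreamWriter approach, not mentioned in the thread and assuming .NET 2.0 or later, is to pass the BOM-less encoding through XmlWriterSettings. The sketch below writes to a MemoryStream only so the resulting bytes can be inspected; the class name `NoBomSave` and the method `ToUtf8WithoutBom` are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class NoBomSave
{
    // Serializes the document as UTF-8 using an encoding that emits no BOM.
    public static byte[] ToUtf8WithoutBom(XmlDocument doc)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Encoding = new UTF8Encoding(false); // false = no byte order mark
        settings.Indent = true;

        using (MemoryStream stream = new MemoryStream())
        {
            // Disposing the writer flushes everything into the stream.
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                doc.Save(writer);
            }
            return stream.ToArray();
        }
    }

    public static void Main()
    {
        // Same document shape as in the thread: root element plus declaration.
        XmlDocument doc = new XmlDocument();
        doc.AppendChild(doc.CreateElement("XXXX"));
        doc.InsertBefore(doc.CreateXmlDeclaration("1.0", "UTF-8", null),
                         doc.DocumentElement);

        byte[] bytes = ToUtf8WithoutBom(doc);
        bool hasBom = bytes.Length >= 3
            && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
        Console.WriteLine("BOM present: " + hasBom);
        Console.WriteLine(Encoding.UTF8.GetString(bytes));
    }
}
```

To write a file instead, the same settings object can be passed to XmlWriter.Create together with a file name.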

Mar 8 '06 #7

Nadav

Thanks a lot! Works great!

Mar 9 '06 #8
