End-of-line Handling in the XmlReader (.NET Framework Version 1.1)

According to the XML 1.0 (Third Edition) W3C Recommendation (http://www.w3.org/TR/2004/REC-xml-20...#sec-line-ends) all #xD, #xA, and #xD#xA character combinations should be converted to a single #xA character.

According to the "Reading XML with the XmlReader" section of the ".NET Framework Developer's Guide" online help, the XmlReader does not perform this normalization by default. Setting the Normalization property (a member of XmlTextReader in version 1.1) to true is supposed to enable it. That does not appear to hold in every situation. The test below was performed using the .NET Framework Version 1.1.

Sample XML File:
<?xml version="1.0"?>
<test>
<input>12345</input>
<input>12
3</input>
<input>12
34</input>
<input>12&#xD;&#xA;3</input>
<input>12&#xD;&#xA;34</input>
<input>12&#xD;3</input>
<input>12&#xD;34</input>
<input>12&#xA;3</input>
<input>12&#xA;34</input>
</test>

Sample XSD Schema File:
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="test">
<xsd:complexType>
<xsd:choice minOccurs="0" maxOccurs="unbounded">
<xsd:element name="input">
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:maxLength value="5"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
</xsd:choice>
</xsd:complexType>
</xsd:element>
</xsd:schema>

If the XML file above is loaded using an XmlReader and an XmlValidatingReader object with the XmlReader.Normalization property set to false, the following two errors are generated:

Error 1:
The 'input' element has an invalid value according to its data type. An error occurred at file: Test Case.xml, (7, 5).
<input>12
34</input>
^

Error 2:
The 'input' element has an invalid value according to its data type. An error occurred at file: Test Case.xml, (9, 25).
<input>12&#xD;&#xA;34</input>
^
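
The two failures can be reproduced with a plain character count. This is only a sketch in Python, not the .NET validator: it assumes the file was saved with CR-LF line endings, and the list of values and the name MAX_LENGTH are written out by hand for illustration.

```python
MAX_LENGTH = 5  # from <xsd:maxLength value="5"/> in the schema

# <input> values as seen with normalization off,
# assuming the file uses CR-LF line endings (an assumption).
values = [
    "12345",
    "12\r\n3",   # literal line break in the file
    "12\r\n34",  # literal line break in the file
    "12\r\n3",   # written as &#xD;&#xA;
    "12\r\n34",  # written as &#xD;&#xA;
    "12\r3",     # written as &#xD;
    "12\r34",    # written as &#xD;
    "12\n3",     # written as &#xA;
    "12\n34",    # written as &#xA;
]

# Collect (position, length) for every value longer than the schema allows.
too_long = [
    (position, len(value))
    for position, value in enumerate(values, start=1)
    if len(value) > MAX_LENGTH
]
print(too_long)  # [(3, 6), (5, 6)]
```

The third and fifth <input> elements are the ones on lines 7 and 9 of the file, matching the two reported positions.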

These errors are expected, since the input file was not normalized and the <input> element can be at most 5 characters long. One would assume that setting the XmlReader.Normalization property to true would eliminate both errors, but that is not the case. The following error remains even with the XmlReader.Normalization property set to true:

Error 1:
The 'input' element has an invalid value according to its data type. An error occurred at file: Test Case.xml, (9, 25).
<input>12&#xD;&#xA;34</input>
^

It appears as if the XmlReader does not perform normalization if the CR-LF appears as a &#xD;&#xA;. Am I misinterpreting the XML specification or is the XmlReader not handling this case properly?

----------------------------------------------------------------------------
Excerpt from the XML 1.0 (Third Edition) W3C Recommendation (http://www.w3.org/TR/2004/REC-xml-20...ec-line-ends):

2.11 End-of-Line Handling
XML parsed entities are often stored in computer files which, for editing convenience, are organized into lines. These lines are typically separated by some combination of the characters CARRIAGE RETURN (#xD) and LINE FEED (#xA).

To simplify the tasks of applications, the XML processor MUST behave as if it normalized all line breaks in external parsed entities (including the document entity) on input, before parsing, by translating both the two-character sequence #xD #xA and any #xD that is not followed by #xA to a single #xA character.
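
The translation described in this paragraph can be sketched in a few lines of Python. The function name normalize_line_ends is hypothetical, and the sketch works on the raw text of an entity before any parsing takes place, which is how the excerpt words the rule.

```python
import re


def normalize_line_ends(text: str) -> str:
    """Translate each CR LF pair, and each CR not followed by LF, to one LF."""
    # "\r\n?" matches a CR together with an immediately following LF if present.
    return re.sub(r"\r\n?", "\n", text)


# Raw entity text as read from the file, before parsing.
raw = "<input>12\r\n34</input>"
print(repr(normalize_line_ends(raw)))  # '<input>12\n34</input>'
```

A string such as "12&#xD;&#xA;34" passes through this function unchanged, because at this stage it contains no #xD or #xA characters, only the ampersands, letters, and semicolons that spell the references.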

Nov 12 '05 #1


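You are misreading the spec.
It says the translation occurs "before parsing", on the raw text of the entity.
At that point &#xD;&#xA; is not a line-break sequence; it is still just the text of two character references. The references are expanded later, during parsing, so the #xD and #xA characters they produce are never normalized.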
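
The ordering can be observed with another parser. This is a minimal sketch using Python's xml.etree.ElementTree (backed by expat) instead of the .NET XmlReader from the question, chosen only so the example is self-contained. The two-element document is a cut-down stand-in for the test file.

```python
import xml.etree.ElementTree as ET

# First element: a literal CR-LF in the file bytes.
# Second element: the same characters written as character references.
doc = b"<test><input>12\r\n34</input><input>12&#xD;&#xA;34</input></test>"

literal, referenced = [element.text for element in ET.fromstring(doc)]

print(repr(literal), len(literal))        # '12\n34' 5
print(repr(referenced), len(referenced))  # '12\r\n34' 6
```

The literal line break is normalized to a single #xA, giving 5 characters, while the character references survive as #xD #xA and give 6, which exceeds the maxLength of 5.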
Nov 12 '05 #2
