
End-of-line Handling in the XmlReader
According to the XML 1.0 (Third Edition) W3C Recommendation
(http://www.w3.org/TR/2004/REC-xml-20...#sec-line-ends) all #xD, #xA,
and #xD#xA character combinations should be converted to a single #xA
character.

According to the "Reading XML with the XmlReader" section of the ".NET
Framework Developer's Guide" on-line help, the XmlReader will not perform
this normalization by default. You can cause the XmlReader to perform this
normalization by setting the Normalization property to true. This does not
appear to be the case in every situation.

Sample XML File:

<?xml version="1.0"?>
<test>
<input>12345</input>
<input>12
3</input>
<input>12
34</input>
<input>12&#xD;&#xA;3</input>
<input>12&#xD;&#xA;34</input>
<input>12&#xD;3</input>
<input>12&#xD;34</input>
<input>12&#xA;3</input>
<input>12&#xA;34</input>
</test>

Sample XSD Schema File:

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="test">
    <xsd:complexType>
      <xsd:choice minOccurs="0" maxOccurs="unbounded">
        <xsd:element name="input">
          <xsd:simpleType>
            <xsd:restriction base="xsd:string">
              <xsd:maxLength value="5"/>
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:element>
      </xsd:choice>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

If the XML file above is loaded using an XmlReader and an XmlValidatingReader
object with the XmlReader.Normalization property set to false, the following
two errors are generated:

Error 1:

The 'input' element has an invalid value according to its data type. An
error occurred at file: Test Case.xml, (7, 5).

<input>12
34</input>
^

Error 2:

The 'input' element has an invalid value according to its data type. An
error occurred at file: Test Case.xml, (9, 25).

<input>12&#xD;&#xA;34</input>
^

These errors are expected, since the input file was not normalized and the
<input> element can be at most 5 characters long. One would assume that
setting the XmlReader.Normalization property to true would eliminate both
errors, but it does not. The following error remains even with the
XmlReader.Normalization property set to true:

Error 1:

The 'input' element has an invalid value according to its data type. An
error occurred at file: Test Case.xml, (9, 25).

<input>12&#xD;&#xA;34</input>
^
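
Both runs are consistent with a plain character count against the schema's maxLength of 5. The Python sketch below is only an illustration of that arithmetic, not XmlReader code. It assumes the file was saved with CR-LF line endings, models the character references as already expanded, and uses hypothetical names (SAMPLES, value_length, too_long) that do not come from any library.

```python
MAX_LENGTH = 5  # xsd:maxLength facet from the schema

# (label, character value, written_as_char_refs)
# Assumption: literal line breaks in the file are CR-LF pairs.
SAMPLES = [
    ("12345", "12345", False),
    ("12<CR-LF>3", "12\r\n3", False),
    ("12<CR-LF>34", "12\r\n34", False),
    ("12&#xD;&#xA;3", "12\r\n3", True),
    ("12&#xD;&#xA;34", "12\r\n34", True),
    ("12&#xD;3", "12\r3", True),
    ("12&#xD;34", "12\r34", True),
    ("12&#xA;3", "12\n3", True),
    ("12&#xA;34", "12\n34", True),
]


def value_length(value, from_char_refs, normalize):
    """Hypothetical helper: the length a maxLength check would see.

    When normalize is true, only literal line breaks are folded to #xA;
    values that came from character references are left untouched.
    """
    if normalize and not from_char_refs:
        value = value.replace("\r\n", "\n").replace("\r", "\n")
    return len(value)


def too_long(normalize):
    """Labels of the sample values that exceed MAX_LENGTH."""
    return [label for label, value, refs in SAMPLES
            if value_length(value, refs, normalize) > MAX_LENGTH]


print(too_long(normalize=False))  # ['12<CR-LF>34', '12&#xD;&#xA;34']
print(too_long(normalize=True))   # ['12&#xD;&#xA;34']
```

Under these assumptions the first call matches the two errors reported with Normalization off, and the second matches the single error that remains with it on.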

It appears that the XmlReader does not perform normalization when the CR-LF
is written as the character references &#xD;&#xA;. Am I misinterpreting the
XML specification, or is the XmlReader not handling this case properly?
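
As a cross-check against a different processor, here is a minimal Python sketch. It rests on two assumptions: it uses Python's standard xml.etree.ElementTree (backed by expat) rather than the .NET XmlReader, and it assumes the sample file uses CR-LF line endings. It parses the sample document and reports the length of each <input> value. Note that section 2.11 speaks of line breaks in the entity as read on input, before parsing, whereas character references are only expanded during parsing, so the two forms need not be treated alike.

```python
import xml.etree.ElementTree as ET

# The sample document as bytes; literal line breaks are assumed to be CR-LF.
DOC = (
    b'<?xml version="1.0"?>\r\n'
    b'<test>\r\n'
    b'<input>12345</input>\r\n'
    b'<input>12\r\n3</input>\r\n'
    b'<input>12\r\n34</input>\r\n'
    b'<input>12&#xD;&#xA;3</input>\r\n'
    b'<input>12&#xD;&#xA;34</input>\r\n'
    b'<input>12&#xD;3</input>\r\n'
    b'<input>12&#xD;34</input>\r\n'
    b'<input>12&#xA;3</input>\r\n'
    b'<input>12&#xA;34</input>\r\n'
    b'</test>\r\n'
)

root = ET.fromstring(DOC)
values = [el.text for el in root.findall("input")]

# Show each value with its control characters visible, plus its length.
for value in values:
    print(repr(value), len(value))

lengths = [len(value) for value in values]
```

With this parser, the literal CR-LF inside an element arrives as a single line feed, while the #xD produced by &#xD; is kept, so only the 12&#xD;&#xA;34 value is longer than five characters.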

----------------------------------------------------------------------------------

Excerpt from the XML 1.0 (Third Edition) W3C Recommendation
(http://www.w3.org/TR/2004/REC-xml-20...ec-line-ends):

2.11 End-of-Line Handling

XML parsed entities are often stored in computer files which, for editing
convenience, are organized into lines. These lines are typically separated
by some combination of the characters CARRIAGE RETURN (#xD) and LINE FEED
(#xA).

To simplify the tasks of applications, the XML processor MUST behave as if
it normalized all line breaks in external parsed entities (including the
document entity) on input, before parsing, by translating both the
two-character sequence #xD #xA and any #xD that is not followed by #xA to a
single #xA character.
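
The translation described in this excerpt can be sketched in a few lines of Python. This is an illustration only, with a hypothetical function name. A real processor applies the rule to the raw entity text before parsing, so characters produced later by references such as &#xD; never pass through it.

```python
import re


def normalize_line_ends(text: str) -> str:
    """Translate each #xD #xA pair, and each #xD not followed by #xA,
    to a single #xA. Existing lone #xA characters are left as they are."""
    return re.sub(r"\r\n?", "\n", text)


print(repr(normalize_line_ends("12\r\n34")))  # '12\n34'
print(repr(normalize_line_ends("12\r34")))    # '12\n34'
```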


Nov 12 '05 #1