XmlTextReader, parsing, space as data

I just ran across this.
#1 <DBColumn> 1 </DBColumn>
#2 <DBColumn> </DBColumn>
The data for #1 will be parsed and returned as " 1 ". I get a sequence of
Element/Text/EndElement.
The data for #2 will not be returned. I get a sequence of
Element/Whitespace/EndElement.

Why is the data (which happens to be spaces) between my start and end tags
being misinterpreted?
TIA

Nov 12 '05 #1
Hi Kenneth,

First of all, I would like to confirm my understanding of your issue. From
your description, I understand that you need to preserve whitespace in your
Xml document. If there is any misunderstanding, please feel free to let me
know.

The XmlTextReader only preserves white space that occurs within an
xml:space="preserve" context, so you need to add that attribute to the
parent node. If you're using an XmlDocument, you can simply set the
PreserveWhitespace property to true before calling the Load or LoadXml
method. You can check the following link for more information.

http://msdn.microsoft.com/library/de...us/cpguide/html/cpconHandlingWhiteSpaceWithXmlTextReader.asp

Kevin Yu
=======
"This posting is provided "AS IS" with no warranties, and confers no
rights."

Nov 12 '05 #2
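
For the XmlDocument route mentioned in the reply above, a minimal sketch might look like the following. It is only an illustration, not code from the thread: the class and method names (PreserveWhitespaceDemo, ColumnText) are made up for the example, and it assumes the whitespace-only value lives in a DBColumn element as in the question.

```csharp
using System;
using System.Xml;

public static class PreserveWhitespaceDemo
{
    // Loads the XML into a DOM and returns the text content of the
    // first DBColumn element found.
    public static string ColumnText(string xml, bool preserve)
    {
        XmlDocument doc = new XmlDocument();
        doc.PreserveWhitespace = preserve; // must be set before LoadXml/Load
        doc.LoadXml(xml);
        XmlNode column = doc.SelectSingleNode("//DBColumn");
        return column.InnerText;
    }

    public static void Main()
    {
        string xml = "<DBRow><DBColumn> </DBColumn></DBRow>";
        // Brackets make an empty or whitespace-only result visible.
        Console.WriteLine("[" + ColumnText(xml, false) + "]");
        Console.WriteLine("[" + ColumnText(xml, true) + "]");
    }
}
```

With PreserveWhitespace left at its default of false the whitespace-only node is dropped while loading, so the first line should print an empty pair of brackets; with it set to true the single space should survive. This only helps when a DOM is acceptable, which is not the case for a pure XmlTextReader loop.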
Kevin,

I only want the space (" ") within a data element, not between elements.
When my data has a non-space character in it, the space data is retrieved.
If I just have one or more spaces it gets written to XML but not pulled out.

#1 <DBColumn> 1 </DBColumn> <--- spaces retrieved
I get a sequence of Element/Text/EndElement.

#2 <DBColumn> </DBColumn> <--- just spaces, no data retrieved
I get a sequence of Element/Whitespace/EndElement.

Nov 12 '05 #3
Hi Ken,

I'm afraid this is by design for the XmlTextReader. If an element contains
nothing but white space, that white space is dropped unless the
xml:space="preserve" attribute is set on the element. Please check the MSDN
document I linked in my last post for more information. Thanks!

Kevin Yu
=======
"This posting is provided "AS IS" with no warranties, and confers no
rights."

Nov 12 '05 #4
Because that gets treated as insignificant whitespace.
Set the WhitespaceHandling property of the XmlTextReader to
WhitespaceHandling.All to avoid this behaviour.

--
Oleg Tkachenko [XML MVP, MCAD]
http://blog.tkachenko.com
Nov 12 '05 #5
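
To see what the WhitespaceHandling setting actually changes, the node types can be dumped for the two inputs from the question. This is a small sketch under the assumption that the XML is read from a string; NodeTypeDump and NodeTypes are hypothetical names used only here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class NodeTypeDump
{
    // Returns the sequence of node types the reader reports for the XML.
    public static List<XmlNodeType> NodeTypes(string xml, WhitespaceHandling handling)
    {
        List<XmlNodeType> types = new List<XmlNodeType>();
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        reader.WhitespaceHandling = handling;
        while (reader.Read())
        {
            types.Add(reader.NodeType);
        }
        reader.Close();
        return types;
    }

    public static void Main()
    {
        string[] samples = { "<DBColumn> 1 </DBColumn>", "<DBColumn> </DBColumn>" };
        foreach (string xml in samples)
        {
            Console.WriteLine(xml);
            Console.WriteLine("  All:  " + string.Join("/", NodeTypes(xml, WhitespaceHandling.All)));
            Console.WriteLine("  None: " + string.Join("/", NodeTypes(xml, WhitespaceHandling.None)));
        }
    }
}
```

With WhitespaceHandling.All the whitespace-only element is reported as Element/Whitespace/EndElement, and with WhitespaceHandling.None the middle node disappears entirely. In other words, All makes the space visible to the caller, but it arrives as a Whitespace node rather than a Text node, which matches the sequence described in the question.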
That property was set to WhitespaceHandling.All.

Nov 12 '05 #6
hi,

Looking at the referenced document:
<test>•
••••<item>•
••••••••<item xml:space="preserve">º
ºººººººººººº<item/>º
ºººººººº</item>•
••••</item>•
••••<book>º
ºººººººº<b>This</b>º
ºººººººº<i>is</i>º
ºººººººº<b>a test</b>º
ºººº</book>•
</test>•
The white space shown as (•) is insignificant white space. The white
space shown as (º) is significant white space.
Note: The scope of the xml:space attribute changes what would normally
be considered insignificant white space to be significant white space.
Notice that <b>a test</b> is not shown to be affected by the
xml:space="preserve" attribute.

I did try adding this to my XML (two ways), but it had no effect on the
XmlTextReader.
<DBRow xml:space="preserve">
<DBColumn />
<DBColumn xml:space="preserve"> </DBColumn>
</DBRow>
<DBRow xml:space="preserve">
<DBColumn />
<DBColumn> </DBColumn>
</DBRow>
Let me try this from a different direction:
- I am not using validating readers, xsl, dom, etc.
- I want to output text data and read it back in, and the data can contain
spaces or even conceivably be all spaces
- How do I make sure a space out becomes a space in?
- Note: it seems that having nonspace characters causes space characters to
be read in. Is this always true?

Thanks

Nov 12 '05 #7
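
One way to approach the "space out becomes space in" requirement without a DOM is to treat every textual node found inside a DBColumn as data, whatever node type the reader assigns to it. The sketch below assumes DBColumn elements hold only character data (no child elements); ColumnRoundTrip, Write and Read are hypothetical names for the example and not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

public static class ColumnRoundTrip
{
    // Writes each value as a DBColumn element inside one DBRow.
    public static string Write(IEnumerable<string> values)
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter xw = new XmlTextWriter(sw);
        xw.WriteStartElement("DBRow");
        foreach (string v in values)
        {
            xw.WriteElementString("DBColumn", v);
        }
        xw.WriteEndElement();
        xw.Close();
        return sw.ToString();
    }

    // Reads the DBColumn values back. Text, CDATA, Whitespace and
    // SignificantWhitespace nodes inside a DBColumn all count as data.
    public static List<string> Read(string xml)
    {
        List<string> values = new List<string>();
        XmlTextReader xr = new XmlTextReader(new StringReader(xml));
        xr.WhitespaceHandling = WhitespaceHandling.All;
        StringBuilder current = null; // non-null only while inside a DBColumn
        while (xr.Read())
        {
            switch (xr.NodeType)
            {
                case XmlNodeType.Element:
                    if (xr.Name == "DBColumn")
                    {
                        if (xr.IsEmptyElement) values.Add("");
                        else current = new StringBuilder();
                    }
                    break;
                case XmlNodeType.Text:
                case XmlNodeType.CDATA:
                case XmlNodeType.Whitespace:
                case XmlNodeType.SignificantWhitespace:
                    if (current != null) current.Append(xr.Value);
                    break;
                case XmlNodeType.EndElement:
                    if (xr.Name == "DBColumn" && current != null)
                    {
                        values.Add(current.ToString());
                        current = null;
                    }
                    break;
            }
        }
        xr.Close();
        return values;
    }

    public static void Main()
    {
        string xml = Write(new string[] { "1.1", " ", "", " 1 " });
        foreach (string v in Read(xml)) Console.WriteLine("DBColumn = \"" + v + "\"");
    }
}
```

Because the buffer is only active between a DBColumn start tag and its end tag, white space between elements is ignored while white space inside a column is kept. On the last question in the post: as far as the reader is concerned, a run of characters that contains any non-whitespace character is reported as a single Text node with its surrounding spaces intact, so only the all-whitespace case needs the extra handling.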
Hi Kenneth,

I tried using XmlTextReader to parse your Xml document. I set the reader's
WhitespaceHandling property to WhitespaceHandling.All and all the white
spaces are preserved. If that doesn't work for you, could you please post a
code snippet and a part of the Xml document here, so that I can reproduce
it? Thanks!

Kevin Yu
=======
"This posting is provided "AS IS" with no warranties, and confers no
rights."

Nov 12 '05 #8
In what manner are they preserved for you?
I am looking to get an XmlNodeType.Text node returned with the element's text
data. Is that what you get, or are you getting Whitespace nodes returned?
Here are two data snippets I have tried:
<DataBase>
<Index>1</Index>
<Enabled>True</Enabled>
<Series>2</Series>
<DBValues>
<dummy />
<DBRow xml:space="preserve">
<DBColumn />
<DBColumn>1</DBColumn>
</DBRow>
<DBRow xml:space="preserve">
<DBColumn />
<DBColumn>1</DBColumn>
</DBRow>
<DBRow xml:space="preserve">
<DBColumn />
<DBColumn xml:space="preserve"> </DBColumn>
</DBRow>
<DBRow xml:space="preserve">
<DBColumn />
<DBColumn />
</DBRow>
</DBValues>
</DataBase>


<DataBase>
<Index>1</Index>
<Enabled>True</Enabled>
<Series>2</Series>
<DBValues>
<dummy />
<DBRow>
<DBColumn />
<DBColumn>1.1</DBColumn>
<DBColumn>-2</DBColumn>
<DBColumn>3</DBColumn>
<DBColumn>rrr</DBColumn>
</DBRow>
<DBRow>
<DBColumn />
<DBColumn>-1.1</DBColumn>
<DBColumn>2.0</DBColumn>
<DBColumn>-3</DBColumn>
<DBColumn>sss</DBColumn>
</DBRow>
<DBRow>
<DBColumn />
<DBColumn>10</DBColumn>
<DBColumn> </DBColumn>
<DBColumn>1</DBColumn>
<DBColumn>ttt</DBColumn>
</DBRow>
<DBRow>
<DBColumn />
<DBColumn>one</DBColumn>
<DBColumn>two</DBColumn>
<DBColumn>three</DBColumn>
<DBColumn>vvv</DBColumn>
</DBRow>
<DBRow>
<DBColumn />
<DBColumn>11</DBColumn>
<DBColumn> </DBColumn>
<DBColumn />
<DBColumn />
</DBRow>
</DBValues>
</DataBase>

//here is a sample code snippet,
//all the storing of data and error checking is gone,
//as well as getting data from a string instead of a file
using System;
using System.Collections;
using System.Xml;

namespace ConsoleApplication1
{
    public class MyClass
    {
        /// <summary>
        /// The main entry point for the application.
        /// </summary>
        [STAThread]
        static void Main(string[] args)
        {
            GetData();
            RL();
        }

        private static void RL()
        {
            Console.ReadLine();
        }

        private static void GetData()
        {
            string xmlFrag = " <DataBase>\n"+
                " <Index>1</Index>\n"+
                " <Enabled>True</Enabled>\n"+
                " <Series>2</Series>\n"+
                " <DBValues>\n"+
                " <dummy />\n"+
                " <DBRow>\n"+
                " <DBColumn />\n"+
                " <DBColumn>1.1</DBColumn>\n"+
                " <DBColumn> </DBColumn>\n"+
                " <DBColumn>3</DBColumn>\n"+
                " <DBColumn>rrr</DBColumn>\n"+
                " </DBRow>\n"+
                " <DBRow>\n"+
                " <DBColumn />\n"+
                " <DBColumn>-1.1</DBColumn>\n"+
                " <DBColumn>2.0</DBColumn>\n"+
                " <DBColumn>-3</DBColumn>\n"+
                " <DBColumn>sss</DBColumn>\n"+
                " </DBRow>\n"+
                " <DBRow>\n"+
                " <DBColumn />\n"+
                " <DBColumn>10</DBColumn>\n"+
                " <DBColumn> </DBColumn>\n"+
                " <DBColumn>1</DBColumn>\n"+
                " <DBColumn>ttt</DBColumn>\n"+
                " </DBRow>\n"+
                " <DBRow>\n"+
                " <DBColumn />\n"+
                " <DBColumn>one</DBColumn>\n"+
                " <DBColumn>two</DBColumn>\n"+
                " <DBColumn>three</DBColumn>\n"+
                " <DBColumn>vvv</DBColumn>\n"+
                " </DBRow>\n"+
                " <DBRow>\n"+
                " <DBColumn />\n"+
                " <DBColumn>11</DBColumn>\n"+
                " <DBColumn> </DBColumn>\n"+
                " <DBColumn />\n"+
                " <DBColumn />\n"+
                " </DBRow>\n"+
                " </DBValues>\n"+
                " </DataBase>\n"
                ;

            //Create the XmlNamespaceManager.
            NameTable nt = new NameTable();
            XmlNamespaceManager nsmgr = new XmlNamespaceManager(nt);
            nsmgr.AddNamespace("bk", "urn:sample");

            //Create the XmlParserContext.
            XmlParserContext context = new XmlParserContext(null, nsmgr, null, XmlSpace.None);

            System.Xml.XmlTextReader xr = new System.Xml.XmlTextReader(xmlFrag, XmlNodeType.Element, context);
            xr.WhitespaceHandling = WhitespaceHandling.All;
            try
            {
                while (xr.Read())
                {
                    switch(xr.NodeType)
                    {
                        case XmlNodeType.Element:
                        {
                            switch(xr.Name)
                            {
                                case "DBValues":
                                    _ParseDBValues(xr,1);
                                    break;
                            }
                        }
                        break;
                    }
                }
            }
            catch
            {
            }
        }

        public static void _ParseDBValues(System.Xml.XmlTextReader xr, int _nDBInst)
        {
            try
            {
                while (xr.Read())
                {
                    switch(xr.NodeType)
                    {
                        case XmlNodeType.Element:
                        {
                            switch(xr.Name)
                            {
                                case "DBRow":
                                    Console.WriteLine("ROW");

                                    _ParseDBColObject(xr, _nDBInst);
                                    break;
                            }
                        }
                        break;

                        case XmlNodeType.Text:
                            break;

                        case XmlNodeType.EndElement:
                            if(xr.Name.Equals("DBValues"))
                            {
                                return;
                            }
                            break;
                        default:
                            break;
                    }
                }
            }
            catch
            {
                throw new Exception("Unexpected element in DBValues...");
            }
        }

        public static void _ParseDBColObject(System.Xml.XmlTextReader xr, int _nDBInst)
        {
            // Get data values & range check

            string element = "";
            int _nColIndex = -1;
            // Parse input stream
            try
            {
                while (xr.Read())
                {
                    switch(xr.NodeType)
                    {
                        case XmlNodeType.Element:
                        {
                            element = xr.Name;
                            if(element.Equals("DBColumn"))
                            {
                                _nColIndex++; // track which column we are on
                            }
                        }
                        break;

                        case XmlNodeType.Text:
                        {
                            switch(element)
                            {
                                case "DBColumn":
                                    Console.WriteLine("DBColumn = \"" + xr.Value + "\"");
                                    break;
                            }
                        }
                        break;

                        case XmlNodeType.EndElement:
                            if(xr.Name.Equals("DBRow"))
                            {
                                Console.WriteLine("EndRow");
                                return ;
                            }
                            break;
                        default:
                            break;
                    }
                }
            }
            catch
            {
                throw new Exception("Unexpected element in DBValues...");
            }
            return ;
        }
    }
}

Nov 12 '05 #9
When you say the whitespace is preserved, how exactly do you mean that?
Is the data coming in as XmlNodeType.Text or as whitespace?
// the data was written with code like the following
xw.WriteStartElement("DBValues");
xw.WriteStartElement("dummy");
xw.WriteEndElement();
xw.WriteStartElement("DBRow");
string _strColVal = something;
xw.WriteElementString("DBColumn", _strColVal);
xw.WriteEndElement();
xw.WriteEndElement();
xw.WriteEndElement();
Nov 12 '05 #10
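
If changing the writer is an option, the xml:space="preserve" attribute discussed earlier in the thread can be emitted when the column is written. The following is a hedged sketch rather than the poster's code: PreserveAttributeWriter and WriteColumn are hypothetical names, and it writes a single DBColumn to a string instead of a file.

```csharp
using System;
using System.IO;
using System.Xml;

public static class PreserveAttributeWriter
{
    // Writes one DBColumn element carrying xml:space="preserve".
    public static string WriteColumn(string value)
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter xw = new XmlTextWriter(sw);
        xw.WriteStartElement("DBColumn");
        // The "xml" prefix is reserved and bound to this namespace URI,
        // so no namespace declaration is written for it.
        xw.WriteAttributeString("xml", "space",
            "http://www.w3.org/XML/1998/namespace", "preserve");
        xw.WriteString(value);
        xw.WriteEndElement();
        xw.Close();
        return sw.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(WriteColumn(" "));
    }
}
```

This replaces the WriteElementString call with an explicit start element, attribute, string and end element. The attribute does not turn the content into a Text node on the way back in, so the reading side still has to accept whitespace node types.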
Thanks for the further response, Kenneth.

We'll investigate the code further and update you soon.

Regards,

Steven Cheng
Microsoft Online Support

Get Secure! www.microsoft.com/security
(This posting is provided "AS IS", with no warranties, and confers no
rights.)

Nov 12 '05 #11
Hi Kenneth,

The NodeType I got is SignificantWhitespace. White space between two
nodes, known as insignificant white space, is reported as Whitespace. We
add xml:space="preserve" to make sure that the value of the node is
preserved, but the NodeType is still not Text. You can look for
SignificantWhitespace nodes to get the blank values and ignore the
Whitespace nodes. Note that some of the SignificantWhitespace nodes you see
might belong to the parent node.

This is by design.

Kevin Yu
=======
"This posting is provided "AS IS" with no warranties, and confers no
rights."

Nov 12 '05 #12
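
The distinction described in the last reply can be checked with a short reader loop. This is an illustrative sketch only; SignificantWhitespaceDemo and TextLikeNodes are hypothetical names, and the XML strings are reduced versions of the DBRow/DBColumn data from the thread.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class SignificantWhitespaceDemo
{
    // Returns "NodeType:[value]" for every text-like node the reader reports.
    public static List<string> TextLikeNodes(string xml)
    {
        List<string> found = new List<string>();
        XmlTextReader xr = new XmlTextReader(new StringReader(xml));
        xr.WhitespaceHandling = WhitespaceHandling.All;
        while (xr.Read())
        {
            switch (xr.NodeType)
            {
                case XmlNodeType.Text:
                case XmlNodeType.Whitespace:
                case XmlNodeType.SignificantWhitespace:
                    found.Add(xr.NodeType + ":[" + xr.Value + "]");
                    break;
            }
        }
        xr.Close();
        return found;
    }

    public static void Main()
    {
        // Attribute on the column itself: only that column's space is significant.
        string onColumn =
            "<DBRow><DBColumn xml:space=\"preserve\"> </DBColumn><DBColumn> </DBColumn></DBRow>";
        // Attribute on the row: the spaces between columns become significant too.
        string onRow =
            "<DBRow xml:space=\"preserve\"> <DBColumn> </DBColumn> </DBRow>";

        foreach (string s in TextLikeNodes(onColumn)) Console.WriteLine(s);
        Console.WriteLine("---");
        foreach (string s in TextLikeNodes(onRow)) Console.WriteLine(s);
    }
}
```

For the first string the reader reports one SignificantWhitespace node and one Whitespace node. For the second, all three whitespace runs come back as SignificantWhitespace, including the two that sit between the row and column tags, which is the "might belong to the parent node" case. Putting the attribute on DBColumn itself, or tracking which element is currently open, avoids mistaking that layout white space for column data.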
