How to Parse Mixed Content

Iain
#1: Apr 3 '07
I've spent a while researching this, and my analysis to date indicates it
can't easily be done in .NET.

What I want to do is take XML like this:
<stuff>
<node id="1"/>Now<node id="2"/> <node id="7"/>is<node id="14"/> <node
id="15"/>the<node id="19"/> <node id="20"/>winter<node id="21"/>
</stuff>

and extract a list with node identifiers and text (which can be whitespace,
as in this example).

This seems impossible with XmlSerializer (which is a shame, as this is
embedded in a somewhat complex XML file!).

It *may* be possible with XmlReader, but I'm not too clear on how.

Any help would be much appreciated!

Iain
Martin Honnen
#2: Apr 3 '07

Iain wrote:
Quote:
What I want to do is take XML like this:
<stuff>
<node id="1"/>Now<node id="2"/> <node id="7"/>is<node id="14"/> <node
id="15"/>the<node id="19"/> <node id="20"/>winter<node id="21"/>
</stuff>

and extract a list with node identifiers and text (which can be whitespace,
as in this example).

Here is a .NET 2.0 XmlReader example:

using System;
using System.Xml;

class Program {
  static void Main() {
    using (XmlReader xmlReader = XmlReader.Create(@"file.xml")) {
      while (xmlReader.Read()) {
        if (xmlReader.NodeType == XmlNodeType.Element &&
            xmlReader.Name == "stuff") {
          // Walk the children of <stuff> until its own end tag is reached.
          while (xmlReader.Read() &&
                 !(xmlReader.NodeType == XmlNodeType.EndElement &&
                   xmlReader.Name == "stuff")) {
            switch (xmlReader.NodeType) {
              case XmlNodeType.Element:
                Console.WriteLine("Found element {0} with id: {1}.",
                    xmlReader.Name, xmlReader.GetAttribute("id"));
                break;
              case XmlNodeType.Text:
                Console.WriteLine("Found text node with contents \"{0}\"",
                    xmlReader.Value);
                break;
              case XmlNodeType.Whitespace:
                Console.WriteLine("Found white space \"{0}\"",
                    xmlReader.Value);
                break;
            }
          }
        }
      }
    }
  }
}

Output for that sample is

Found white space "
"
Found element node with id: 1.
Found text node with contents "Now"
Found element node with id: 2.
Found white space " "
Found element node with id: 7.
Found text node with contents "is"
Found element node with id: 14.
Found white space " "
Found element node with id: 15.
Found text node with contents "the"
Found element node with id: 19.
Found white space " "
Found element node with id: 20.
Found text node with contents "winter"
Found element node with id: 21.
Found white space "
"
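
If you want an actual list rather than console output, something along these lines might work. It is only a sketch: the Entry class and the ReadStuff method are names made up for illustration, and attaching each run of text to the node that precedes it is an assumption about the list shape you are after.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical holder for one node id and the text that follows it.
class Entry {
  public string Id;
  public string Text = "";
}

static class MixedContent {
  // Reads the children of <stuff> into a list.
  // Assumes the reader is positioned on the <stuff> start tag.
  public static List<Entry> ReadStuff(XmlReader reader) {
    List<Entry> entries = new List<Entry>();
    if (reader.IsEmptyElement) return entries;
    while (reader.Read() &&
           !(reader.NodeType == XmlNodeType.EndElement &&
             reader.Name == "stuff")) {
      switch (reader.NodeType) {
        case XmlNodeType.Element:
          if (reader.Name == "node") {
            Entry entry = new Entry();
            entry.Id = reader.GetAttribute("id");
            entries.Add(entry);
          }
          break;
        case XmlNodeType.Text:
        case XmlNodeType.Whitespace:
        case XmlNodeType.SignificantWhitespace:
          // Attach text (including pure white space) to the preceding
          // node; text before the first node is dropped.
          if (entries.Count > 0)
            entries[entries.Count - 1].Text += reader.Value;
          break;
      }
    }
    return entries;
  }
}

class Program {
  static void Main() {
    string xml = "<stuff>\n<node id=\"1\"/>Now<node id=\"2\"/> " +
                 "<node id=\"7\"/>is<node id=\"21\"/>\n</stuff>";
    using (XmlReader reader = XmlReader.Create(new StringReader(xml))) {
      reader.MoveToContent();  // position on the <stuff> start tag
      foreach (Entry entry in MixedContent.ReadStuff(reader))
        Console.WriteLine("{0}: \"{1}\"", entry.Id, entry.Text);
    }
  }
}
```

Whether the leading white space before the first node matters is up to you; the sketch simply discards it.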

--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
Iain
#3: Apr 3 '07

On Tue, 03 Apr 2007 16:44:51 +0200, Martin Honnen wrote:
Quote:
Here is a .NET 2.0 XmlReader example:

Wow.

Thanks!

Do you happen to know if I can zip this into an xml tree otherwise
deserialized by XmlSerializer?

Iain
Martin Honnen
#4: Apr 4 '07

Iain wrote:
Quote:
Do you happen to know if I can zip this into an xml tree otherwise
deserialized by XmlSerializer?

I am not sure what you want to achieve, but I don't think there is an
easy way; you would need to implement a custom XmlReader.
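
That said, a different route that might avoid a custom XmlReader is to let the class holding the mixed content implement IXmlSerializable. XmlSerializer then hands the reader to that class for its own element and deals with the rest of the tree itself. The following is only a sketch under assumptions: Stuff, NodeText, Document and the doc/Title element names are all made up for illustration, and I am guessing at the list shape you want.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical pair of a node id and the text that follows it.
public class NodeText {
  public string Id;
  public string Text = "";
}

// Hypothetical stand-in for the class that holds the mixed content.
public class Stuff : IXmlSerializable {
  public List<NodeText> Items = new List<NodeText>();

  public XmlSchema GetSchema() { return null; }

  // XmlSerializer calls this with the reader on the <stuff> start tag.
  public void ReadXml(XmlReader reader) {
    bool isEmpty = reader.IsEmptyElement;
    reader.ReadStartElement();              // consume <stuff>
    if (isEmpty) return;
    while (!reader.EOF && reader.NodeType != XmlNodeType.EndElement) {
      if (reader.NodeType == XmlNodeType.Element) {
        NodeText item = new NodeText();
        item.Id = reader.GetAttribute("id");
        Items.Add(item);
        reader.Skip();                      // step over the whole <node/>
      } else {
        bool isText = reader.NodeType == XmlNodeType.Text ||
                      reader.NodeType == XmlNodeType.Whitespace ||
                      reader.NodeType == XmlNodeType.SignificantWhitespace;
        // Text before the first node is dropped in this sketch.
        if (isText && Items.Count > 0)
          Items[Items.Count - 1].Text += reader.Value;
        reader.Read();
      }
    }
    reader.ReadEndElement();                // consume </stuff>
  }

  // The <stuff> start and end tags are written by XmlSerializer.
  public void WriteXml(XmlWriter writer) {
    foreach (NodeText item in Items) {
      writer.WriteStartElement("node");
      writer.WriteAttributeString("id", item.Id);
      writer.WriteEndElement();
      writer.WriteString(item.Text);
    }
  }
}

// Hypothetical containing document, handled by XmlSerializer as usual.
[XmlRoot("doc")]
public class Document {
  public string Title;
  [XmlElement("stuff")]
  public Stuff Body;
}

class Program {
  static void Main() {
    string xml = "<doc><Title>Sample</Title><stuff><node id=\"1\"/>Now" +
                 "<node id=\"2\"/> <node id=\"7\"/>is</stuff></doc>";
    XmlSerializer serializer = new XmlSerializer(typeof(Document));
    // An explicit XmlReader is passed so its white space settings apply.
    using (XmlReader reader = XmlReader.Create(new StringReader(xml))) {
      Document doc = (Document)serializer.Deserialize(reader);
      foreach (NodeText item in doc.Body.Items)
        Console.WriteLine("{0}: \"{1}\"", item.Id, item.Text);
    }
  }
}
```

As far as I know, the Deserialize overloads that take a Stream or TextReader set up their own reader, which may not report white-space-only nodes, so passing your own XmlReader seems the safer choice when the white space matters.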


--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
Iain
#5: Apr 4 '07

On Wed, 04 Apr 2007 13:42:17 +0200, Martin Honnen wrote:
Quote:
Iain wrote:
Quote:
Do you happen to know if I can zip this into an xml tree otherwise
deserialized by XmlSerializer?

I am not sure what you want to achieve, but I don't think there is an
easy way; you would need to implement a custom XmlReader.

Thanks Martin.

What I wanted to do was use XmlSerializer to serialise and deserialise
everything except the mixed content section, which I would handle with
XmlReader, having overridden XmlSerializer for that particular class in
some way. There appeared to be no obvious way of doing this, so I used
brute force!

I've ended up coding the whole structure (the sample plus a half dozen
other entities) with XmlReader, which is a pain. If I could have done it
with XmlSerializer, it would have taken 10 minutes, not 2 hours.

Now I'm struggling to cope with whitespace.

Nonetheless it's all working (more or less).

Thanks again...

Iain