XmlDocument problem

Serious problem

I'm using Chris Lovett's SgmlReader class
SgmlReader sr = new SgmlReader();
XmlDocument xdoc = new XmlDocument();
sr.DocType = "HTML";
sr.InputStream = new System.IO.StringReader(node.InnerText);
xdoc.Load(sr);
foreach (XmlNode PotentiallyMalicous in xdoc.SelectNodes(
    "//script | //embed //object | //frameset //frame //iframe | //meta | //link | //style | //@style"))
{
    if (node.ParentNode != null)
        PotentiallyMalicous.ParentNode.RemoveChild(PotentiallyMalicous);
    else
        xdoc.RemoveChild(PotentiallyMalicous);
}
item.desc = xdoc.InnerText;
Unfortunately, I'm getting an exception on xdoc.Load(sr), saying:

System.InvalidOperationException: The specified node cannot be inserted as the valid child of this node, because the specified node is the wrong type.
   at System.Xml.XmlDocument.AppendChildForLoad(XmlNode newChild, XmlDocument doc)
   at System.Xml.XmlLoader.LoadDocSequence(XmlDocument parentDoc)
   at System.Xml.XmlLoader.Load(XmlDocument doc, XmlReader reader, Boolean preserveWhitespace)
   at System.Xml.XmlDocument.Load(XmlReader reader)
   at Roar.RssFeed.UpdateItem(XmlNode ItemNode, XmlNamespaceManager nsMgr, ArrayList& NewItems, Boolean Update) in c:\documents and settings\ayende\my documents\visual studio projects\medar\rssfeed.cs:line 233
   at Roar.RssFeed.UpdateFeed(XmlDocument feed, ArrayList& NewItems, Boolean Update)

I've no idea what is causing this.
node.InnerText equals:
<'a href="http://www.ncl.com/fleet/dawn/index.htm"><'img
src="http://monster2.scripting.com/z/images/archiveScriptingCom/2003/12/01/dawn.jpg"
width="125" height="59" border="0" align="right" hspace="15"
vspace="5" alt="A picture named dawn.jpg"><'/a>Two articles, both from the
NY Times, by coincidence happened to show up one after the other in my
aggregator, a stark contrast of how two kinds of Americans live. The first
<'a
href="http://www.nytimes.com/2003/12/01/nyregion/01SHIP.html?ex=1385614800&en=f12a99f582744ee2&ei=5007&partner=USERLAND">article<'/a>
details the luxurious cruise ship Tom DeLay is bringing to the Republican
National Convention in NYC in August, where George Bush will, presumably, be
nominated for a second term as President. It's a very beautiful ship, very
nice. The second <'a
href="http://www.nytimes.com/2003/11/27/international/worldspecial/27LIST.html?ex=1385355600&en=1fa37d9cc6c8ca0f&ei=5007&partner=USERLAND">article<'/a>
is the daily report of US soldiers killed in Iraq. Yesterday only one
soldier died, David Goldberg, 20, an engineer in the Army reserve,
based in Layton, Utah. Needless to say he won't be going to the Republican
National Convention or riding on any cruise ships. "

Any idea what could cause it? Or how to fix it?

Jul 21 '05 #1
I also use SgmlReader and have never had a problem with it.

I think the problem is that you need a "root" node: an XmlDocument must
have exactly one top-level element, like:

<html>
<head>
...
</head>
<body>
...
</body>
</html>

or

<doc>
<chapter name="1">
...
</chapter>
<chapter name="2">
...
</chapter>
</doc>

The structure of your document, by contrast, is a sequence of top-level nodes with no enclosing element:
<a ..><img .../> </a>
<#text>
<a ..></a>
<#text>
<a ..></a>
<#text>

If so, wrap the text in a single element before handing it to the reader:

new System.IO.StringReader("<p>"+node.InnerText+"</p>");
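The rule behind this suggestion can be reproduced without SgmlReader at all. The following is only a minimal sketch using System.Xml; the class name RootNodeDemo and the sample text are made up for illustration. It shows that a text node cannot be appended directly under an XmlDocument, while the same nodes are accepted once they sit inside a single wrapper element such as <p>.

```csharp
using System;
using System.Xml;

class RootNodeDemo   // hypothetical name, for illustration only
{
    static void Main()
    {
        // Case 1: nodes placed directly under the document.
        var bare = new XmlDocument();
        bare.AppendChild(bare.CreateElement("a"));   // first top-level element: accepted
        try
        {
            // A text node is not a valid child of the document itself.
            bare.AppendChild(bare.CreateTextNode("Two articles, both from the NY Times"));
            Console.WriteLine("text node accepted at top level");
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine("rejected: " + ex.Message);
        }

        // Case 2: the same nodes under one wrapper element.
        var wrapped = new XmlDocument();
        XmlElement p = wrapped.CreateElement("p");
        wrapped.AppendChild(p);                      // <p> becomes the only top-level element
        p.AppendChild(wrapped.CreateElement("a"));
        p.AppendChild(wrapped.CreateTextNode("Two articles, both from the NY Times"));
        Console.WriteLine(wrapped.DocumentElement.Name);
        Console.WriteLine(wrapped.OuterXml);
    }
}
```

In the original code the nodes come from SgmlReader rather than from CreateElement and CreateTextNode, so this is an analogy for what xdoc.Load(sr) runs into, not a trace of it.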

"Ayende Rahien" <Ay****@no.spam> wrote in message
news:%2****************@TK2MSFTNGP09.phx.gbl...
I'm using Chris Lovett's SgmlReader class
Unfortunately, I'm getting an exception on xdoc.Load(sr)
Any idea what could cause it? Or how to fix it?

Jul 21 '05 #2

"Zürcher See" <aq****@cannabismail.com> wrote in message
news:uV**************@TK2MSFTNGP12.phx.gbl...
I also use SgmlReader and have never had a problem with it.
Try
new System.IO.StringReader("<p>"+node.InnerText+"</p>");


Dude!
It works!
Thanks a lot!
Jul 21 '05 #3
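
With the load fixed, the removal loop from the first post may still need attention. The sketch below is one possible cleanup, not the poster's final code. It assumes the XPath was meant as a union of all ten names (as posted, "//embed //object" selects object elements nested inside embed), it tests the matched node's own parent rather than node.ParentNode, and it removes style attributes through OwnerElement because an attribute has no ParentNode. The class name StripDemo and the sample markup are hypothetical, and LoadXml stands in for the SgmlReader load.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

class StripDemo   // hypothetical name, for illustration only
{
    static void Main()
    {
        var xdoc = new XmlDocument();
        // Made-up sample input, already wrapped in a single <p> element.
        xdoc.LoadXml("<p><a href=\"#\" style=\"color:red\">link</a>" +
                     "<script>x()</script> text</p>");

        // Assumed intent: every name is a separate branch of the union.
        string xpath = "//script | //embed | //object | //frameset | //frame | " +
                       "//iframe | //meta | //link | //style | //@style";

        // Copy the matches first so removals do not affect the iteration.
        var matches = new List<XmlNode>();
        foreach (XmlNode n in xdoc.SelectNodes(xpath))
            matches.Add(n);

        foreach (XmlNode n in matches)
        {
            XmlAttribute attr = n as XmlAttribute;
            if (attr != null)
            {
                // Attributes hang off OwnerElement, not ParentNode.
                if (attr.OwnerElement != null)
                    attr.OwnerElement.Attributes.Remove(attr);
            }
            else if (n.ParentNode != null)
            {
                n.ParentNode.RemoveChild(n);
            }
        }

        Console.WriteLine(xdoc.OuterXml);
    }
}
```

Whether removing nodes inside the original foreach actually misbehaves depends on how the returned node list is implemented; copying to a list simply sidesteps the question.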
