Remove XML node before validating

Hello,

I need to remove the DTD reference from an XML document. The reason
for this is that we want to validate against a schema instead (which
we have locally). It takes up to a minute to fetch all documents
referred to in the DTD, and as we have no use for them I want to
remove the reference.

I'm using XmlReaderSettings to pass in the XML document and the
schema, but when I loop through the reader it goes and tries to get
the DTD before I can remove it, so I'm assuming there's a better way
to remove it before doing the validation. I've tried using XPath but I
don't know how to find the doctype node. Is it XPath that I should
use?

I'd be very grateful if anyone could point me in the right direction.

Thanks,

AK
Oct 27 '08 #1
4 Replies


ag***********@gmail.com wrote:
> I need to remove the DTD reference from an XML document. The reason
> for this is that we want to validate against a schema instead (which
> we have locally). It takes up to a minute to fetch all documents
> referred to in the DTD, and as we have no use for them I want to
> remove the reference.
>
> I'm using XmlReaderSettings to pass in the XML document and the
> schema, but when I loop through the reader it goes and tries to get
> the DTD before I can remove it, so I'm assuming there's a better way
> to remove it before doing the validation. I've tried using XPath but I
> don't know how to find the doctype node. Is it XPath that I should
> use?

No, the XPath data model has no representation of the DTD, so XPath
certainly does not help here.
If you want the XmlReader (or XmlDocument) to ignore the referenced DTD
then you can try to set the XmlResolver property (of the
XmlReaderSettings you create your XmlReader with
http://msdn.microsoft.com/en-us/libr...lresolver.aspx)
to null. That way the reader will not fetch any resources. That will
only work, however, if the XML document does not reference any entities
defined in the DTD.
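
A minimal sketch of the null-resolver suggestion, for illustration only. The class name, the sample document, and the example.invalid URL are made up. It uses DtdProcessing.Parse, which is the .NET 4.0+ replacement for ProhibitDtd = false, so on the framework version current in this thread the older property would be needed instead.

```csharp
using System;
using System.IO;
using System.Xml;

public static class NullResolverDemo
{
    // Reads the text of the root element without fetching the external DTD.
    public static string ReadRootText(string xml)
    {
        var settings = new XmlReaderSettings();
        // Allow a DOCTYPE to be present (older frameworks: ProhibitDtd = false).
        settings.DtdProcessing = DtdProcessing.Parse;
        // With no resolver, the external subset named in the DOCTYPE is not fetched.
        settings.XmlResolver = null;

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            reader.MoveToContent(); // skips the DOCTYPE node
            return reader.ReadElementContentAsString();
        }
    }

    public static void Main()
    {
        // Hypothetical document; the DTD URL is deliberately unreachable.
        string xml = "<!DOCTYPE score SYSTEM \"http://example.invalid/score.dtd\">" +
                     "<score>42</score>";
        Console.WriteLine(ReadRootText(xml));
    }
}
```

If the document used an entity that only the external DTD declares, the read should fail instead, which is the limitation described above.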
A bit more work but a more complete solution is to set the XmlResolver
to your own implementation of XmlResolver, for instance by subclassing
XmlUrlResolver, that then uses a locally cached copy of the DTDs.
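
The subclassing idea could look roughly like this. It is a sketch under assumptions: CachedDtdResolver is a hypothetical name, the cache is an in-memory dictionary standing in for DTD files saved on disk, and the URL and DTD text are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical resolver: answers DTD requests from a local cache keyed by URI.
public class CachedDtdResolver : XmlUrlResolver
{
    private readonly IDictionary<string, string> cache;

    public CachedDtdResolver(IDictionary<string, string> cache)
    {
        this.cache = cache;
    }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        string text;
        if (cache.TryGetValue(absoluteUri.AbsoluteUri, out text))
        {
            return new MemoryStream(Encoding.UTF8.GetBytes(text));
        }
        // Fail instead of falling back to the network.
        throw new FileNotFoundException("No cached copy of " + absoluteUri);
    }
}

public static class CachedDtdDemo
{
    public static string ReadRootText()
    {
        // Invented DTD and document; the entity is declared only in the DTD.
        var cache = new Dictionary<string, string>
        {
            { "http://example.invalid/score.dtd",
              "<!ELEMENT score (#PCDATA)>\n<!ENTITY title \"Sonata\">" }
        };
        string xml = "<!DOCTYPE score SYSTEM \"http://example.invalid/score.dtd\">" +
                     "<score>&title;</score>";

        var settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Parse; // older: ProhibitDtd = false
        settings.XmlResolver = new CachedDtdResolver(cache);

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            reader.MoveToContent();
            return reader.ReadElementContentAsString();
        }
    }

    public static void Main()
    {
        Console.WriteLine(ReadRootText());
    }
}
```

Unlike a null resolver, this keeps entity references working, because the DTD is still parsed, just from the cached copy.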
--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
Oct 27 '08 #3

AK
On Oct 27, 2:26 pm, Martin Honnen <mahotr...@yahoo.de> wrote:
> No, the XPath data model has no representation of the DTD, so XPath
> certainly does not help here.
> If you want the XmlReader (or XmlDocument) to ignore the referenced DTD
> then you can try to set the XmlResolver property (of the
> XmlReaderSettings you create your XmlReader with
> http://msdn.microsoft.com/en-us/library/system.xml.xmlreadersettings....)
> to null. That way the reader will not fetch any resources. That will
> only work, however, if the XML document does not reference any entities
> defined in the DTD.
> A bit more work but a more complete solution is to set the XmlResolver
> to your own implementation of XmlResolver, for instance by subclassing
> XmlUrlResolver, that then uses a locally cached copy of the DTDs.

Thanks for your answer, it took so long before my post came up that I
actually thought it had gone missing at first, only noticed it now!

This is the code I'm using at the moment:

    // Requires: using System.IO; using System.Xml; using System.Xml.Schema;
    XmlDocument xdoc = new XmlDocument();
    bool docIsValid = false;

    try
    {
        // A null resolver stops the load from fetching the external DTD.
        xdoc.XmlResolver = null;
        xdoc.Load(scorePath);

        docIsValid = true;
    }
    catch (System.Exception ex)
    {
        errorList.Add(ex.Message);
    }

    if (docIsValid)
    {
        // Remove the DOCTYPE node, if present. Using the DocumentType
        // property avoids modifying ChildNodes while enumerating it.
        if (xdoc.DocumentType != null)
        {
            xdoc.RemoveChild(xdoc.DocumentType);
        }

        // Re-serialize the DTD-free document so it can be read again.
        MemoryStream ms = new MemoryStream();
        xdoc.Save(ms);
        ms.Position = 0;

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ProhibitDtd = false;
        settings.XmlResolver = new LocalXmlResolver();
        settings.ValidationEventHandler +=
            new System.Xml.Schema.ValidationEventHandler(settings_ValidationEventHandler);

        XmlSchema x = XmlSchema.Read(
            Utilities.getSchemaFromResources(pvgschema),
            settings_ValidationEventHandler);
        settings.Schemas.Add(x);

        settings.ValidationType = ValidationType.Schema;

        XmlReader reader = XmlReader.Create(ms, settings);

        // Reading to the end drives validation; errors go to the handler.
        while (reader.Read())
        {
        }
    }

Basically I want to validate against a locally saved schema (which is
set to an embedded resource), and never validate against the DTD. The
code above is not ideal as I'm validating the xml file twice, once to
remove the DTD reference then once against the schema, however it does
avoid me having to go get all the documents referenced in the DTD
(which could take up to a minute).

Also, I've saved all the schemas referenced in 'pvgschema' locally
and added them as embedded resources, but it doesn't seem like the
XmlResolver works as I thought, as it still does an HTTP GET for those
schemas on the line settings.Schemas.Add(x);.

Is there a simpler way of doing this?
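
One possible single-pass alternative, sketched under assumptions: it relies on DtdProcessing.Ignore, which only exists from .NET 4.0 onwards (so it postdates this thread), and the schema and documents below are invented stand-ins for the real score files. With the DOCTYPE ignored, there is no need to load the document, strip the DTD, and re-read it.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class SinglePassValidation
{
    // Invented schema: a single <score> element holding an integer.
    const string Xsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
        "<xs:element name='score' type='xs:int'/>" +
        "</xs:schema>";

    // Validates xml against the schema in one read and returns the error messages.
    public static List<string> Validate(string xml)
    {
        var errors = new List<string>();

        var settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Ignore; // skip the DOCTYPE entirely
        settings.XmlResolver = null;                   // never fetch external resources
        settings.ValidationType = ValidationType.Schema;
        using (XmlReader xsdReader = XmlReader.Create(new StringReader(Xsd)))
        {
            settings.Schemas.Add(null, xsdReader);
        }
        settings.ValidationEventHandler += (sender, e) => errors.Add(e.Message);

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { } // reading to the end drives validation
        }
        return errors;
    }

    public static void Main()
    {
        string doctype = "<!DOCTYPE score SYSTEM \"http://example.invalid/score.dtd\">";
        Console.WriteLine(Validate(doctype + "<score>42</score>").Count);
        Console.WriteLine(Validate(doctype + "<score>abc</score>").Count);
    }
}
```

As with a null resolver, this only suits documents that do not depend on entities declared in the DTD.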

Many thanks,

AK
Oct 28 '08 #4

AK
On Oct 28, 1:54 pm, AK <agda.karlb...@gmail.com> wrote:
> Basically I want to validate against a locally saved schema (which is
> set to an embedded resource), and never validate against the DTD. The
> code above is not ideal as I'm validating the xml file twice, once to
> remove the DTD reference then once against the schema, however it does
> avoid me having to go get all the documents referenced in the DTD
> (which could take up to a minute).
>
> Also, I've saved all the schemas referenced in 'pvgschema' locally
> and added them as embedded resources, but it doesn't seem like the
> XmlResolver works as I thought, as it still does an HTTP GET for those
> schemas on the line settings.Schemas.Add(x);.

For the second point, I had made a mistake in the resolver. It now
tries to get the embedded schema but fails, as the schema has an
"xs:redefine schemaLocation" in it and I get the error message
"schemaLocation must successfully resolve if <redefine> contains any
child other than <annotation>". Is it possible to solve this, or would
it be better to remove the redefine from the schema?
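
One thing that may be worth checking before editing the schema: schemaLocation references inside a schema are resolved through the XmlResolver of the XmlSchemaSet (settings.Schemas.XmlResolver), and a schema read from a bare stream has no base URI for relative locations to resolve against. The sketch below is an assumption-laden illustration, not a diagnosis. InMemorySchemaResolver, the file names, and the example.invalid base URI are hypothetical, the dictionary stands in for embedded resources, and it uses xs:include. I would expect xs:redefine to go through the same lookup, but that is not exercised here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Schema;

// Hypothetical resolver: serves schema text by file name from a local table.
public class InMemorySchemaResolver : XmlUrlResolver
{
    private readonly IDictionary<string, string> schemas;

    public InMemorySchemaResolver(IDictionary<string, string> schemas)
    {
        this.schemas = schemas;
    }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        string name = Path.GetFileName(absoluteUri.LocalPath);
        string text;
        if (schemas.TryGetValue(name, out text))
        {
            return new MemoryStream(Encoding.UTF8.GetBytes(text));
        }
        throw new FileNotFoundException("No local copy of " + absoluteUri);
    }
}

public static class LocalSchemaDemo
{
    // Invented schemas: main.xsd pulls a type definition from types.xsd.
    const string MainXsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'" +
        " targetNamespace='urn:demo' xmlns='urn:demo'>" +
        "<xs:include schemaLocation='types.xsd'/>" +
        "<xs:element name='score' type='Score'/>" +
        "</xs:schema>";

    const string TypesXsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'" +
        " targetNamespace='urn:demo'>" +
        "<xs:simpleType name='Score'><xs:restriction base='xs:int'/></xs:simpleType>" +
        "</xs:schema>";

    public static XmlSchemaSet BuildSchemaSet()
    {
        var local = new Dictionary<string, string> { { "types.xsd", TypesXsd } };

        var set = new XmlSchemaSet();
        // The schema set has its own resolver for include/import/redefine.
        set.XmlResolver = new InMemorySchemaResolver(local);

        // Supplying a base URI lets the relative schemaLocation become absolute.
        using (XmlReader reader = XmlReader.Create(
            new StringReader(MainXsd), null, "http://example.invalid/schemas/main.xsd"))
        {
            set.Add(null, reader);
        }
        set.Compile(); // fails if the Score type could not be loaded
        return set;
    }

    public static void Main()
    {
        Console.WriteLine(BuildSchemaSet().GlobalElements.Count);
    }
}
```

The resulting set could then be assigned to settings.Schemas in place of adding the XmlSchema object directly.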

(Apologies if someone has already answered this - I've had troubles
seeing updates and only saw my own answer to this when I came in this
morning even if I posted it yesterday afternoon.)

Many thanks,

AK
Oct 29 '08 #5
