  #1  
August 11th, 2006, 09:35 AM
Confused XML hacker
What is the best way to parse and validate an arbitrary XML document

My application needs to be able to parse and validate either a DTD-based or
schema-based document without knowing in advance which form of grammar the
document is using. (New documents presented to my system are schema-based
while the older ones are DTD-based; conversion is not an option, as these
documents represent legally binding contracts and must be processed as is.)

In the .NET 1.1 version of my code I used an XmlValidatingReader instance
configured with ValidationType.Auto, which handled both document types. Now
that I am porting the code to 2.0 I am trying to use the new
XmlReader.Create method, but these readers must be configured as either DTD
validating or schema validating and cannot do both at once.

I found several examples on the web of nested XmlReaders and have tried
this approach (see the following code), but the data in the <!DOCTYPE node
of a DTD-based instance seems to be lost as it passes through the readers.
My application needs the PUBLIC name intact to determine the grammar
version.

XmlReaderSettings settings;

// Inner reader: DTD validation
settings = new XmlReaderSettings ();
settings.XmlResolver = GetResolver ();
settings.ValidationType = ValidationType.DTD;
settings.ProhibitDtd = false;

if (eventHandler != null)
    settings.ValidationEventHandler += eventHandler;

XmlReader inner = XmlReader.Create (stream, settings);

// Outer reader: schema validation, wrapped around the inner reader
settings = new XmlReaderSettings ();
settings.Schemas = GetSchemaSet ();
settings.ValidationType = ValidationType.Schema;
settings.ProhibitDtd = false;

if (eventHandler != null)
    settings.ValidationEventHandler += eventHandler;

XmlReader reader = XmlReader.Create (inner, settings);

document.Load (reader);
return (document);

Can someone please confirm whether I am using the right approach and whether
the problems with DOCTYPE are bugs and, if so, whether there is a workaround?
If not, then it looks like I will have to remain with the old and now
'obsolete' XmlValidatingReader interface, which handles this scenario
correctly.

The changes to the XML API in 2.0 seem to forget that there is a substantial
amount of information coded using DTD-based grammars that will be with us for
many years. Some of the contracts I process represent 10 or 20 year deals.
Not everyone using XML is writing web services where the XML is simply a
transient wire format. Efficient support for legacy documents and mixed
DTD/schema environments is commercially very important.
  #2  
August 11th, 2006, 06:45 PM
John Saunders
Re: What is the best way to parse and validate an arbitrary XML document

Could you first attempt validation using a schema and, if that fails, using
a DTD? Or should I read your post more carefully?

John
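
A variation on the try-one-then-the-other idea is to peek at the document prolog before choosing a validation mode, rather than validating twice. The following is only a minimal sketch under several assumptions: GrammarProbe, HasDocType and CreateValidatingReader are hypothetical names that do not appear in the original post; the stream must be seekable so it can be rewound; and it assumes DTD-based documents always carry a DOCTYPE while schema-based ones never do. The resolver and schemas parameters stand in for whatever GetResolver () and GetSchemaSet () return in the original code. ProhibitDtd is used to match the 2.0 API discussed in this thread; later framework versions mark it obsolete in favour of DtdProcessing.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Hypothetical helper, not part of the original post.
public static class GrammarProbe
{
    // Returns true if a DOCTYPE appears before the root element and
    // reports its PUBLIC identifier (null if there is none).
    // Assumption: the stream is seekable, so it can be rewound afterwards.
    public static bool HasDocType (Stream stream, out string publicId)
    {
        publicId = null;
        long start = stream.Position;
        bool found = false;

        XmlReaderSettings probe = new XmlReaderSettings ();
        probe.ProhibitDtd = false;                   // allow the DOCTYPE to be read
        probe.XmlResolver = null;                    // do not fetch the external subset
        probe.ValidationType = ValidationType.None;  // peek only, no validation

        using (XmlReader reader = XmlReader.Create (stream, probe))
        {
            while (reader.Read ())
            {
                if (reader.NodeType == XmlNodeType.DocumentType)
                {
                    found = true;
                    publicId = reader.GetAttribute ("PUBLIC");
                    break;
                }
                if (reader.NodeType == XmlNodeType.Element)
                    break;  // reached the root element without seeing a DOCTYPE
            }
        }

        stream.Position = start;  // rewind for the real, validating pass
        return found;
    }

    // Builds a single validating reader whose mode depends on the probe result.
    public static XmlReader CreateValidatingReader (Stream stream,
        XmlResolver resolver, XmlSchemaSet schemas,
        ValidationEventHandler eventHandler)
    {
        string publicId;
        bool isDtd = HasDocType (stream, out publicId);

        XmlReaderSettings settings = new XmlReaderSettings ();
        settings.ProhibitDtd = false;
        settings.XmlResolver = resolver;

        if (isDtd)
        {
            settings.ValidationType = ValidationType.DTD;
        }
        else
        {
            settings.ValidationType = ValidationType.Schema;
            settings.Schemas = schemas;
        }

        if (eventHandler != null)
            settings.ValidationEventHandler += eventHandler;

        return XmlReader.Create (stream, settings);
    }
}
```

Because only one validating reader is created per document, the DOCTYPE node does not have to pass through a second, wrapping reader. Whether that is enough to keep the PUBLIC name intact after document.Load would still need to be checked against real contract documents.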


 
