reading small XML file with HUGE DTD (MathML / entities)

I have some *performance* trouble reading MathML files in my application (in
ASP.NET).

- I have small MathML files (2-3k) as input.
- Like (almost) all MathML files, these use entities. I have no way to restrict
the entities used.
- To read an XML file with entities into a document, you need to use a DTD, or
you get an exception (is there any other way?).
- The MathML DTD is HUGE (2400+ entities, ~300k of files). Loading it into a
document is a big CPU and file access hog, especially for an ASP.NET
application. As you can see, the DTD is easily a hundred times bigger than the
file to load.

I tried to pay the price only once by caching an empty XmlDocument and
reusing it as a template, but doc.Clone() is also a big CPU hog. Trying:
    doc = docIn.Implementation.CreateDocument();
    XmlNode n = doc.ImportNode( docIn.DocumentType, true );
to initialise the DTD is better but still intensive.

Any ideas on a better way to handle an XmlDocument with a large number of
entities? Pointers? Suggestions?

Thanks in advance.

Nov 12 '05 #1
3 Replies


Unfortunately, re-using "cached" DTDs is a tough problem, because the
instance document can always override any part of the DTD, including
parameter entities, by providing an internal subset, so we have not
optimized this case.

If you really need your mathml documents to be validated, you could convert
the DTD to XSD (using the Visual Studio 2005 XML editor) and then validate
using a cached XmlSchemaSet. But alas, XSD does not have entities.
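
A minimal sketch of the cached XmlSchemaSet idea, assuming the DTD has already been converted to an XSD. The file name "mathml2.xsd" and the class and method names are hypothetical. As noted above, this only covers validation: an input that still contains MathML entity references will not load this way, because the schema declares no entities.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Schema;

static class MathMLValidation
{
    // Compiled once per application and reused for every request.
    // "mathml2.xsd" is a hypothetical path to the converted schema.
    private static readonly XmlSchemaSet CachedSchemas = BuildSchemaSet("mathml2.xsd");

    private static XmlSchemaSet BuildSchemaSet(string xsdPath)
    {
        XmlSchemaSet set = new XmlSchemaSet();
        // Passing null uses the targetNamespace declared inside the schema file.
        set.Add(null, xsdPath);
        set.Compile();
        return set;
    }

    public static XmlDocument LoadValidated(string mathml)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas = CachedSchemas;
        // No ValidationEventHandler is attached, so the first validation
        // error surfaces as an exception.

        using (XmlReader reader = XmlReader.Create(new StringReader(mathml), settings))
        {
            XmlDocument doc = new XmlDocument();
            doc.Load(reader);
            return doc;
        }
    }
}
```

The schema is parsed and compiled only once here; each request pays only for creating a reader over the small input file.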

So it sounds like we need to work on a better solution for caching DTD
entities, character entity sets specifically. I filed this as a work item
for us to consider.


Nov 12 '05 #2


"Chris Lovett" <cl*****@microsoft.com.no_spam> wrote in message
news:eA**************@TK2MSFTNGP10.phx.gbl...
> Unfortunately, re-using "cached" DTDs is a tough problem, because the
> instance document can always override any part of the DTD, including
> parameter entities, by providing an internal subset, so we have not
> optimized this case.
>
> If you really need your mathml documents to be validated, you could convert
> the DTD to XSD (using the Visual Studio 2005 XML editor) and then validate
> using a cached XmlSchemaSet. But alas, XSD does not have entities.

I do not *really* need to validate (I'm confident in the quality of the
MathML emitter), but I need to parse the MathML into an XmlDocument tree
(I need to be able to backtrack, for instance), and I have found no way (even
with ValidationType.None) to avoid parsing the entities.

> So it sounds like we need to work on a better solution for caching DTD
> entities, character entity sets specifically. I filed this as a work item
> for us to consider.


Thanks. This won't solve my problem now, but it will be needed for MathML and
maybe for XHTML.

Nov 12 '05 #3

In the meantime you could subclass XmlTextReader and turn off general entity
expansion, and expand the XmlEntityReference nodes yourself based on a
hashtable when they are returned from XmlTextReader.
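
The suggestion above could be sketched roughly as follows. This is an untested illustration, not a confirmed implementation: the class name EntityExpandingReader and the sample table entries are hypothetical, it assumes .NET 2.0 or later (where XmlTextReader exposes EntityHandling), and it assumes the reader reports unexpanded references in element content as EntityReference nodes. Entity references inside attribute values are not handled.

```csharp
using System.Collections;
using System.IO;
using System.Xml;

// Hypothetical subclass: presents known general entity references
// to the caller as plain text nodes, using a lookup table.
class EntityExpandingReader : XmlTextReader
{
    private readonly Hashtable entities; // entity name -> replacement text

    public EntityExpandingReader(TextReader input, Hashtable entities)
        : base(input)
    {
        this.entities = entities;
        // Leave general entities unexpanded so they surface as nodes.
        EntityHandling = EntityHandling.ExpandCharEntities;
        // Do not fetch the external MathML DTD named in the DOCTYPE.
        XmlResolver = null;
    }

    private bool IsKnownEntity
    {
        get
        {
            return base.NodeType == XmlNodeType.EntityReference
                && entities.ContainsKey(base.Name);
        }
    }

    public override XmlNodeType NodeType
    {
        get { return IsKnownEntity ? XmlNodeType.Text : base.NodeType; }
    }

    public override string Value
    {
        get { return IsKnownEntity ? (string)entities[base.Name] : base.Value; }
    }
}

static class MathMLLoader
{
    // Built once and shared; a real table would hold all 2400+ entries.
    private static readonly Hashtable Entities = BuildTable();

    private static Hashtable BuildTable()
    {
        Hashtable table = new Hashtable();
        table["alpha"] = "\u03B1";          // sample entries only
        table["InvisibleTimes"] = "\u2062";
        return table;
    }

    public static XmlDocument Load(string mathml)
    {
        XmlDocument doc = new XmlDocument();
        doc.Load(new EntityExpandingReader(new StringReader(mathml), Entities));
        return doc;
    }
}
```

With this approach each expanded entity arrives as its own text node, so the resulting tree may contain adjacent text nodes; calling doc.Normalize() afterwards merges them. Entities missing from the table are passed through unchanged and may still cause an exception at load time.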

"Michel de Becdelièvre" <m_*****@msn.com> wrote in message
news:OR**************@TK2MSFTNGP10.phx.gbl...
> I do not *really* need to validate (I'm confident in the quality of the
> MathML emitter), but I need to parse the MathML into an XmlDocument tree
> (I need to be able to backtrack, for instance), and I have found no way (even
> with ValidationType.None) to avoid parsing the entities.

Nov 12 '05 #4
