Bytes | Software Development & Data Engineering Community

reading small XML file with HUGE DTD (MathML / entities)

I'm having *performance* trouble reading MathML files in my ASP.NET
application.

- I have small MathML files (2-3 KB) as input.
- Like almost all MathML files, these use entities. I have no way to restrict
the entities used.
- To read an XML file that uses entities into a document, you need a DTD, or
you get an exception (is there any other way?).
- The MathML DTD is HUGE (2400+ entities, ~300 KB of files). Loading it into a
document is a big CPU and file-access hog, especially for an ASP.NET
application. As you can see, the DTD is easily a hundred times bigger than the
file being loaded.

I tried to pay the price only once by caching an empty XmlDocument and
reusing it as a template, but doc.Clone() is also a big CPU hog. Initialising
the DTD with

    doc = docIn.Implementation.CreateDocument();
    XmlNode n = doc.ImportNode( docIn.DocumentType, true );

is better, but still intensive.

Any ideas on a better way to handle an XmlDocument with a large number of
entities? Pointers? Suggestions?

Thanks in advance.

Nov 12 '05 #1
Unfortunately, re-using "cached" DTDs is a tough problem, because the
instance document can always override any part of the DTD, including
parameter entities, by providing an internal subset, so we have not
optimized this case.

If you really need your MathML documents to be validated, you could convert
the DTD to XSD (using the Visual Studio 2005 XML editor) and then validate
using a cached XmlSchemaSet. But alas, XSD does not have entities.
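A minimal C# sketch of the cached-XmlSchemaSet idea, assuming an XSD has already been generated from the MathML DTD. The names CachedSchemaLoader, Init and Load are hypothetical. Because XSD has no entities, this only helps for documents without named entity references (and without a DOCTYPE, which the default XmlReaderSettings reject).

```csharp
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Sketch only: compile the schema once, then reuse it for every document.
// CachedSchemaLoader, Init and Load are hypothetical names; the XSD is
// assumed to have been converted from the MathML DTD beforehand.
static class CachedSchemaLoader
{
    // Shared, already-compiled schema set, assigned once at start-up.
    static XmlSchemaSet schemas;

    // Call once (e.g. from Application_Start) before any request calls Load.
    public static void Init(XmlReader xsd)
    {
        XmlSchemaSet set = new XmlSchemaSet();
        set.Add(null, xsd);   // null: take the target namespace from the schema
        set.Compile();
        schemas = set;
    }

    // Parses and validates one document against the cached schemas.
    // XSD has no entities, and the default settings reject a DOCTYPE, so this
    // only works for documents without named entity references or a DTD.
    public static XmlDocument Load(TextReader input)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas = schemas;
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            XmlDocument doc = new XmlDocument();
            doc.Load(reader);   // invalid content throws XmlSchemaValidationException
            return doc;
        }
    }
}
```

The expensive part (parsing and compiling the schema) happens once per application; each request only pays for a validating read of its own small file.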

So it sounds like we need to work on a better solution for caching DTD
entities, specifically character entity sets. I filed this as a work item
for us to consider.

Nov 12 '05 #2

"Chris Lovett" <cl*****@microsoft.com.no_spam> wrote in message
news:eA**************@TK2MSFTNGP10.phx.gbl...
> Unfortunately, re-using "cached" DTDs is a tough problem, because the
> instance document can always override any part of the DTD, including
> parameter entities, by providing an internal subset, so we have not
> optimized this case.
>
> If you really need your MathML documents to be validated, you could
> convert the DTD to XSD (using the Visual Studio 2005 XML editor) and then
> validate using a cached XmlSchemaSet. But alas, XSD does not have entities.

I do not *really* need to validate (I'm confident in the quality of the
MathML emitter), but I need to parse the MathML into an XmlDocument tree
(I need to be able to backtrack, for instance), and I have found no way
(even with Validation.None) to avoid parsing the entities.

> So it sounds like we need to work on a better solution for caching DTD
> entities, specifically character entity sets. I filed this as a work item
> for us to consider.

Thanks. That won't solve my problem now, but it will be needed for MathML,
and maybe for XHTML.

Nov 12 '05 #3
In the meantime, you could subclass XmlTextReader, turn off general entity
expansion, and expand the XmlEntityReference nodes yourself from a
hashtable as XmlTextReader returns them.

Nov 12 '05 #4
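A minimal C# sketch of the hashtable approach from the last reply. It is a variation rather than a subclass: it drives a plain XmlTextReader with EntityHandling.ExpandCharEntities (so &alpha; should come back as an EntityReference node) and XmlResolver = null (so the external MathML DTD is not fetched), and builds the XmlDocument node by node. MathMLLoader and its three-entry table are hypothetical; a real table would hold the full MathML entity set, and attribute values are assumed to contain no general entities.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class MathMLLoader
{
    // Hypothetical entity table with three sample entries. A real one would be
    // built once from the MathML entity declarations and then only read.
    static readonly Dictionary<string, string> Entities = new Dictionary<string, string>
    {
        { "alpha", "\u03B1" },
        { "InvisibleTimes", "\u2062" },
        { "int", "\u222B" },
    };

    public static XmlDocument Load(TextReader input)
    {
        XmlDocument doc = new XmlDocument();
        using (XmlTextReader reader = new XmlTextReader(input))
        {
            // Expand character references and the predefined entities only;
            // every other &name; is reported as an EntityReference node.
            reader.EntityHandling = EntityHandling.ExpandCharEntities;
            // Do not resolve external resources, so no external DTD is loaded.
            reader.XmlResolver = null;

            XmlNode parent = doc;
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        bool isEmpty = reader.IsEmptyElement;
                        XmlElement element = doc.CreateElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                        // Copy attributes (assumed to be free of general entities).
                        while (reader.MoveToNextAttribute())
                        {
                            XmlAttribute attr = doc.CreateAttribute(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                            attr.Value = reader.Value;
                            element.Attributes.Append(attr);
                        }
                        reader.MoveToElement();
                        parent.AppendChild(element);
                        if (!isEmpty) parent = element;
                        break;
                    case XmlNodeType.EndElement:
                        parent = parent.ParentNode;
                        break;
                    case XmlNodeType.Text:
                        parent.AppendChild(doc.CreateTextNode(reader.Value));
                        break;
                    case XmlNodeType.CDATA:
                        parent.AppendChild(doc.CreateCDataSection(reader.Value));
                        break;
                    case XmlNodeType.Whitespace:
                    case XmlNodeType.SignificantWhitespace:
                        // Keep whitespace inside elements only.
                        if (parent != doc) parent.AppendChild(doc.CreateWhitespace(reader.Value));
                        break;
                    case XmlNodeType.Comment:
                        parent.AppendChild(doc.CreateComment(reader.Value));
                        break;
                    case XmlNodeType.ProcessingInstruction:
                        parent.AppendChild(doc.CreateProcessingInstruction(reader.Name, reader.Value));
                        break;
                    case XmlNodeType.EntityReference:
                        // Dictionary lookup replaces DTD-based entity expansion.
                        string text;
                        if (!Entities.TryGetValue(reader.Name, out text))
                            throw new XmlException("Unknown entity: &" + reader.Name + ";");
                        parent.AppendChild(doc.CreateTextNode(text));
                        break;
                    // DocumentType and XmlDeclaration are skipped, so no DTD
                    // ends up in the resulting document.
                }
            }
        }
        doc.Normalize(); // merge adjacent text nodes created around entity references
        return doc;
    }
}
```

For example, MathMLLoader.Load(new StringReader("<math><mi>&alpha;</mi></math>")) yields an <mi> element containing U+03B1. The table is never modified after start-up, so concurrent requests can read it without locking; the trade-off is that an entity redefined in a document's internal subset is ignored, which is exactly the case that makes DTD caching hard.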
