reading small XML file with HUGE DTD (MathML / entities)

I have some *performance* trouble reading MathML files in my application (in
ASP.NET).

- I have small MathML files (2-3k) as input.
- Like (almost) all MathML files, these use entities. I have no way to restrict
the entities used.
- To read an XML file that uses entities into a document, you need a DTD, or
you get an exception (is there any other way?).
- The MathML DTD is HUGE (2400+ entities, ~300k of files). Loading it into a
document is a big CPU and file access hog, especially for an ASP.NET
application. As you can see, the DTD is easily a hundred times bigger than the
file to load.

I tried to pay the price only once by caching an empty XmlDocument and
reusing it as a template, but doc.Clone() is also a big CPU hog. Trying:
doc = docIn.Implementation.CreateDocument();
XmlNode n = doc.ImportNode( docIn.DocumentType, true );
to initialise the DTD is better but still intensive.

Any ideas on a better way to handle an XmlDocument with a large number of
entities? Pointers? Suggestions?

Thanks in advance.

Nov 12 '05 #1
Unfortunately, re-using "cached" DTDs is a tough problem, because the
instance document can always override any part of the DTD, including
parameter entities, by providing an internal subset, so we have not
optimized this case.

If you really need your MathML documents to be validated, you could convert
the DTD to XSD (using the Visual Studio 2005 XML editor) and then validate
using a cached XmlSchemaSet. But alas, XSD does not have entities.
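
For what it's worth, validating against a cached XmlSchemaSet could look roughly like the sketch below. It is only an illustration: the tiny inline schema (DemoXsd) is a made-up stand-in for a real XSD converted from the MathML DTD, and the class and method names are hypothetical. The schema set is compiled once and then shared by every reader.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

static class CachedSchemaValidation
{
    // Hypothetical toy schema; a real XSD converted from the MathML DTD
    // would be loaded here instead.
    const string DemoXsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
        "<xs:element name='math'><xs:complexType><xs:sequence>" +
        "<xs:element name='mi' type='xs:string' maxOccurs='unbounded'/>" +
        "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Compiled once, then reused for every document.
    static readonly XmlSchemaSet Schemas = BuildSchemas();

    static XmlSchemaSet BuildSchemas()
    {
        var set = new XmlSchemaSet();
        // A null target namespace means "use the one declared in the schema".
        set.Add(null, XmlReader.Create(new StringReader(DemoXsd)));
        set.Compile();
        return set;
    }

    public static XmlDocument LoadValidated(string xml)
    {
        var settings = new XmlReaderSettings
        {
            ValidationType = ValidationType.Schema,
            Schemas = Schemas
        };
        var doc = new XmlDocument();
        // No ValidationEventHandler is attached, so validation errors
        // surface as exceptions derived from XmlSchemaException.
        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            doc.Load(reader);
        }
        return doc;
    }
}
```

This only takes care of validation. It does nothing for entity references, which still need a DTD or some other expansion mechanism.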

So it sounds like we need to work on a better solution for caching DTD
entities, character entity sets specifically. I filed this as a work item
for us to consider.

Nov 12 '05 #2

"Chris Lovett" <cl*****@microsoft.com.no_spam> wrote in message
news:eA**************@TK2MSFTNGP10.phx.gbl...
> If you really need your mathml documents to be validated, you could convert
> the DTD to XSD (using the Visual Studio 2005 XML editor) and then validate
> using a cached XmlSchemaSet. But alas, XSD does not have entities.

I do not *really* need to validate (I'm confident in the quality of the
MathML emitter), but I need to parse the MathML into an XmlDocument tree
(I need to be able to backtrack, for instance), and I have found no way (even
with ValidationType.None) to avoid parsing the entities.

> So sounds like we need to work on a better solution for caching DTD
> entities, character entity sets specifically. I filed this as a work item
> for us to consider.

Thanks. That won't solve my problem now, but it will be needed for MathML and
maybe for XHTML.

Nov 12 '05 #3
In the meantime you could subclass XmlTextReader and turn off general entity
expansion, and expand the XmlEntityReference nodes yourself based on a
hashtable when they are returned from XmlTextReader.
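
A rough sketch of that idea follows, with several assumptions: the class names (TableEntityReader, MathMLLoader) are made up, the entity table holds only two sample MathML entities rather than the full set, and only entity references in element content are handled, not those inside attribute values. The reader leaves general entities unexpanded, never fetches the external DTD, and reports every entity reference it finds in the table as an ordinary text node.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Reports known general entity references as plain text nodes,
// so XmlDocument never has to look them up in a DTD.
class TableEntityReader : XmlTextReader
{
    readonly IDictionary<string, string> table;

    public TableEntityReader(TextReader input, IDictionary<string, string> table)
        : base(input)
    {
        this.table = table;
        // Leave general entities unexpanded (returned as EntityReference nodes).
        EntityHandling = EntityHandling.ExpandCharEntities;
        // Never fetch the external DTD named in the DOCTYPE.
        XmlResolver = null;
    }

    bool IsKnownEntity
    {
        get
        {
            return base.NodeType == XmlNodeType.EntityReference
                && table.ContainsKey(base.Name);
        }
    }

    public override XmlNodeType NodeType
    {
        get { return IsKnownEntity ? XmlNodeType.Text : base.NodeType; }
    }

    public override string Value
    {
        get { return IsKnownEntity ? table[base.Name] : base.Value; }
    }
}

static class MathMLLoader
{
    // Sample entries only; built once and treated as read-only afterwards.
    static readonly Dictionary<string, string> Entities =
        new Dictionary<string, string>
        {
            { "alpha", "\u03B1" },
            { "InvisibleTimes", "\u2062" },
        };

    public static XmlDocument Load(string xml)
    {
        var doc = new XmlDocument();
        doc.XmlResolver = null; // the document should not load the DTD either
        using (var reader = new TableEntityReader(new StringReader(xml), Entities))
        {
            doc.Load(reader);
        }
        return doc;
    }
}
```

Names that are missing from the table fall through unchanged as entity references, so the table would need to cover every entity the MathML emitter can produce.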

"Michel de Becdelièvre" <m_*****@msn.com> wrote in message
news:OR**************@TK2MSFTNGP10.phx.gbl...
> I do not *really* need to validate (I'm confident in the quality of the
> MathML emitter), but I need to parse the MathML into an XmlDocument tree
> (I need to be able to backtrack, for instance), and I have found no way (even
> with ValidationType.None) to avoid parsing the entities.

Nov 12 '05 #4
