Volker Hetzer wrote:
Quote:
I'm trying to parse an xhtml document like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>
</title></head>
<body>
<dl>
<dt>NAME</dt><dd>VARIANTATTRIBUTE1</dd>
<dt>TYPE</dt><dd>STRING</dd>
<dt>VALUE</dt><dd>HEY</dd>
<dt>LOCKVALUE</dt><dd>31314D32444F2F485A4E6C414A47474B62734B53524 56F676A2F673D</dd>
<dt>HOME</dt><dd><a href="/v1/boards/DNUMBOARD05/wgs/01/variants/B1/attributes/VARIANTATTRIBUTE1">VARIANTATTRIBUTE1</a></dd>
<dt>VARIANT</dt><dd><a href="/v1/boards/DNUMBOARD05/wgs/01/variants/B1">B1</a></dd>
</dl>
</body>
</html>
The problem is that XmlDocument.Load("<!..."); tries to access the DTD,
and this takes ages. The browser, on the other hand, displays this
page quickly.
Is it possible at all to read this xhtml document offline?
It depends on whether the document references entities defined in the
DTD. If it does not, then
XmlDocument doc = new XmlDocument();
doc.XmlResolver = null; // no resolver, so the external DTD is never fetched
doc.Load("doc.xhtml");
should be enough to keep the XML parser from fetching the DTD.
If, however, the document references entities such as &auml;, you
will get an error about a reference to an undefined entity when the
parser has not fetched the DTD.
In that case, if you want to speed up parsing of your XHTML documents,
store local copies of the DTD and all the files it references, and
then write your own XmlResolver (for instance by subclassing
XmlUrlResolver) that makes sure the local copies are used when
identifiers like "-//W3C//DTD XHTML 1.0 Transitional//EN" are resolved.
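A minimal sketch of such a resolver, assuming the transitional DTD and the three entity files it pulls in (xhtml-lat1.ent, xhtml-symbol.ent, xhtml-special.ent) have been downloaded into one local folder. The names LocalDtdResolver, OfflineXhtml and LoadOffline are made up for illustration. It keys on the absolute system-identifier URLs rather than the public identifier, assuming the parser resolves the DTD's relative references to the .ent files against the DTD's W3C URL; any URI not in the map falls back to normal XmlUrlResolver behaviour.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical resolver: answers requests for known DTD/entity URLs with
// local files and falls back to normal URL resolution for anything else.
class LocalDtdResolver : XmlUrlResolver
{
    // Absolute URI requested by the parser -> path of the local copy.
    private readonly Dictionary<string, string> localCopies;

    public LocalDtdResolver(Dictionary<string, string> localCopies)
    {
        this.localCopies = localCopies;
    }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        string path;
        if (localCopies.TryGetValue(absoluteUri.AbsoluteUri, out path))
        {
            // Serve the local copy instead of going to the network.
            return File.OpenRead(path);
        }
        return base.GetEntity(absoluteUri, role, ofObjectToReturn);
    }
}

static class OfflineXhtml
{
    // Hypothetical helper: dtdFolder holds local copies of the W3C files.
    public static XmlDocument LoadOffline(string file, string dtdFolder)
    {
        const string w3c = "http://www.w3.org/TR/xhtml1/DTD/";
        var map = new Dictionary<string, string>();
        foreach (string name in new[] { "xhtml1-transitional.dtd", "xhtml-lat1.ent",
                                        "xhtml-symbol.ent", "xhtml-special.ent" })
        {
            map[w3c + name] = Path.Combine(dtdFolder, name);
        }

        XmlDocument doc = new XmlDocument();
        doc.XmlResolver = new LocalDtdResolver(map);
        doc.Load(file);
        return doc;
    }
}
```

Usage would be something like OfflineXhtml.LoadOffline("doc.xhtml", @"C:\dtd\xhtml1"), with the folder adjusted to wherever the copies live. Falling back to base.GetEntity keeps other external resources working, at the price of still going to the network for anything not in the map.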
Quote:
Can I somehow tell the Document object the contents of the DTD? Then I
could store it as a string in the assembly.
I think that has been done; google for ".NET XmlResolver assembly".
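Along those lines, a rough sketch of a resolver that hands the parser DTD text held in memory, for example text read from an embedded resource in the assembly. InMemoryDtdResolver and ReadEmbeddedText are hypothetical names, and so is any resource name you pass in: the actual names depend on the project's default namespace and folder layout. The map is keyed on absolute system-identifier URLs, as in a file-based resolver.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;
using System.Text;
using System.Xml;

// Hypothetical resolver that serves DTD text from strings instead of
// fetching it over HTTP.
class InMemoryDtdResolver : XmlUrlResolver
{
    // Absolute URI requested by the parser -> DTD or entity file text.
    private readonly Dictionary<string, string> dtdTexts;

    public InMemoryDtdResolver(Dictionary<string, string> dtdTexts)
    {
        this.dtdTexts = dtdTexts;
    }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        string text;
        if (dtdTexts.TryGetValue(absoluteUri.AbsoluteUri, out text))
        {
            // Hand the parser a stream over the UTF-8 bytes of the stored text.
            return new MemoryStream(Encoding.UTF8.GetBytes(text));
        }
        return base.GetEntity(absoluteUri, role, ofObjectToReturn);
    }

    // Reads a file compiled into this assembly as an embedded resource.
    public static string ReadEmbeddedText(string resourceName)
    {
        Assembly asm = typeof(InMemoryDtdResolver).Assembly;
        using (Stream s = asm.GetManifestResourceStream(resourceName))
        {
            if (s == null)
                throw new FileNotFoundException("No embedded resource named " + resourceName);
            using (var reader = new StreamReader(s))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

You would fill the dictionary with one entry per W3C URL (the DTD plus its .ent files) using ReadEmbeddedText, and then set doc.XmlResolver to an InMemoryDtdResolver before calling Load. Reading the resources once and reusing the resolver avoids re-reading them for every document.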
--
Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/