
reading xhtml documents offline

Volker Hetzer
#1: Oct 7 '08
Hi!
I'm trying to parse an xhtml document like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>

</title></head>
<body>
<dl>
<dt>NAME</dt><dd>VARIANTATTRIBUTE1</dd><dt>TYPE</dt><dd>STRING</dd><dt>VALUE</dt><dd>HEY</dd><dt>LOCKVALUE</dt><dd>31314D32444F2F485A4E6C414A47474B62734B53524 56F676A2F673D</dd><dt>HOME</dt><dd><a href="/v1/boards/DNUMBOARD05/wgs/01/variants/B1/attributes/VARIANTATTRIBUTE1">VARIANTATTRIBUTE1</a></dd><dt>VARIANT</dt><dd><a href="/v1/boards/DNUMBOARD05/wgs/01/variants/B1">B1</a></dd>
</dl>
</body>
</html>

The problem is that XmlDocument.Load("<!..."); tries to access the DTD and this takes ages. On the other hand, the browser can display this page quickly.
Is it possible at all to read this xhtml document offline?
Can I somehow tell the Document object the contents of the DTD? Then I could store it as a string in the assembly.

Lots of Greetings!
Volker
--
For email replies, please substitute the obvious.

Volker Hetzer
#2: Oct 7 '08



Volker Hetzer wrote:
> Hi!
> I'm trying to parse an xhtml document like this:
> [...]
> The problem is that XmlDocument.Load("<!..."); tries to access the DTD
> and this takes ages. On the other hand, the browser can display this
> page quickly.
> Is it possible at all to read this xhtml document offline?
> Can I somehow tell the Document object the contents of the DTD? Then I
> could store it as a string in the assembly.
Found it.
There is a nice explanation with a downloadable project here:
http://blogs.pingpoet.com/overflow/a...7/20/6607.aspx .
I just had to modify it so that it does not create its own DTD.

Lots of Greetings!
Volker
Martin Honnen
#3: Oct 7 '08



Volker Hetzer wrote:
> I'm trying to parse an xhtml document like this:
> [...]
> The problem is that XmlDocument.Load("<!..."); tries to access the DTD
> and this takes ages. On the other hand, the browser can display this
> page quickly.
> Is it possible at all to read this xhtml document offline?
It depends on whether the document references entities defined in the
DTD. If it does not, then
    XmlDocument doc = new XmlDocument();
    doc.XmlResolver = null; // no resolver, so external resources such as the DTD are not fetched
    doc.Load("doc.xhtml");
should be enough to stop the XML parser from fetching the DTD.
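A minimal, self-contained sketch of that approach, assuming the markup is already in a string (as the original post's Load("<!...") call suggests), so it uses LoadXml; Load expects a file path, URL or stream. The class OfflineXhtml and its methods are hypothetical names, and reading the <dt>/<dd> pairs is just one illustrative way to get at the data in the sample. Note that the XHTML elements are in the http://www.w3.org/1999/xhtml namespace, so XPath queries need a prefix bound to it.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper; names are illustrative only.
public static class OfflineXhtml
{
    // Parses XHTML without letting the parser fetch the external DTD.
    // Only safe when the markup uses no entities declared in that DTD (such as &nbsp;).
    public static XmlDocument LoadWithoutDtd(string xhtml)
    {
        var doc = new XmlDocument();
        doc.XmlResolver = null; // no resolver, so the external DTD subset is never requested
        doc.LoadXml(xhtml);     // LoadXml parses a string; Load takes a path, URL or stream
        return doc;
    }

    // Maps the text of each <dt> to the text of the <dd> that follows it.
    public static Dictionary<string, string> ReadDefinitions(XmlDocument doc)
    {
        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("x", "http://www.w3.org/1999/xhtml"); // XHTML elements live in this namespace
        var result = new Dictionary<string, string>();
        foreach (XmlNode dt in doc.SelectNodes("//x:dl/x:dt", ns))
        {
            XmlNode dd = dt.SelectSingleNode("following-sibling::x:dd[1]", ns);
            result[dt.InnerText] = dd == null ? "" : dd.InnerText;
        }
        return result;
    }
}
```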

If, however, the document references entities such as &auml;, you will
get an error about a reference to an undeclared entity when the parser
has not fetched the DTD.
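A small sketch of that failure mode (UndeclaredEntityDemo and TryLoadWithoutDtd are hypothetical names). It loads the markup with XmlResolver = null, once containing &auml; and once containing the numeric character reference &#228; for the same character. Character references need no DTD, so only the first case should fail, with an XmlException naming the undeclared entity; the exact message text may differ between .NET versions.

```csharp
using System.Xml;

// Hypothetical helper illustrating the undeclared-entity error.
public static class UndeclaredEntityDemo
{
    // Returns the parser's error message, or null if the markup loaded successfully.
    public static string TryLoadWithoutDtd(string xhtml)
    {
        var doc = new XmlDocument();
        doc.XmlResolver = null; // the external DTD, which is where &auml; is declared, is never read
        try
        {
            doc.LoadXml(xhtml);
            return null;
        }
        catch (XmlException ex)
        {
            return ex.Message; // an "undeclared entity" error that names the entity
        }
    }
}
```

If you control the documents, emitting &#228; instead of &auml; is one way to keep them parseable without any DTD.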

In that case, if you want to speed up parsing your XHTML documents, you
should store local copies of the DTD and all the files it references and
then write your own XmlResolver (for instance by subclassing
XmlUrlResolver) that makes sure the local copies are used when
identifiers like "-//W3C//DTD XHTML 1.0 Transitional//EN" are resolved.
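A minimal sketch of such a resolver, assuming the local copies sit on disk and you supply the mapping from identifier to file path yourself; LocalCopyResolver and the dictionary layout are hypothetical. It redirects in ResolveUri rather than in GetEntity, so the DTD's base URI becomes the local file and files the DTD references by relative path (the .ent entity files, in the real XHTML DTDs) are looked up next to it. As far as I can tell the .NET parser tries the public identifier first and quietly falls back to the system identifier, so keys may be either; treat that as an assumption and map the system identifier to be safe.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical resolver: redirects known DTD identifiers to local copies on disk.
public class LocalCopyResolver : XmlUrlResolver
{
    // Keys: public identifiers or absolute system URIs. Values: local file paths.
    private readonly Dictionary<string, string> _localCopies;

    public LocalCopyResolver(Dictionary<string, string> localCopies)
    {
        _localCopies = localCopies;
    }

    public override Uri ResolveUri(Uri baseUri, string relativeUri)
    {
        string path;

        // 1. The identifier exactly as written in the DOCTYPE or entity declaration.
        if (relativeUri != null && _localCopies.TryGetValue(relativeUri, out path))
            return new Uri(Path.GetFullPath(path));

        // 2. The absolute URI after normal resolution against the base URI.
        Uri resolved = base.ResolveUri(baseUri, relativeUri);
        if (resolved.IsAbsoluteUri && _localCopies.TryGetValue(resolved.AbsoluteUri, out path))
            return new Uri(Path.GetFullPath(path));

        // 3. Anything else resolves as usual; relative references inside a
        //    local DTD therefore end up next to that local file.
        return resolved;
    }
}
```

Usage would be doc.XmlResolver = new LocalCopyResolver(map); followed by doc.Load("doc.xhtml");, where map contains an entry such as "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" pointing at your local copy.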
> Can I somehow tell the Document object the contents of the DTD? Then I
> could store it as a string in the assembly.
I think that has been done; google for ".NET XmlResolver assembly".
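Along those lines, a sketch of a resolver that serves DTD text held in memory, for example strings compiled into the assembly; an embedded resource read with Assembly.GetManifestResourceStream could be returned the same way. InMemoryDtdResolver is a hypothetical name. Keys are absolute system URIs, so every file the DTD pulls in needs its own entry (for XHTML 1.0 Transitional that includes xhtml-lat1.ent, xhtml-symbol.ent and xhtml-special.ent, resolved relative to the DTD's URI). Unmapped URIs fall through to the normal XmlUrlResolver behaviour, which may go to the network.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical resolver: serves DTD text kept in memory instead of downloading it.
public class InMemoryDtdResolver : XmlUrlResolver
{
    // Absolute system URI -> DTD (or .ent) text.
    private readonly Dictionary<string, string> _dtdTexts;

    public InMemoryDtdResolver(Dictionary<string, string> dtdTexts)
    {
        _dtdTexts = dtdTexts;
    }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        string text;
        if (_dtdTexts.TryGetValue(absoluteUri.AbsoluteUri, out text))
            return new MemoryStream(Encoding.UTF8.GetBytes(text)); // read as if it had been downloaded

        // Not one of ours: default behaviour (file system or network).
        return base.GetEntity(absoluteUri, role, ofObjectToReturn);
    }
}
```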



--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/