On Sun, 6 Jul 2008 13:51:29 -0700 (PDT), "je**********@gmail.com"
<je**********@gmail.com> wrote:
>Hello:
>Currently, I have a system that will use Regex to find tags in a
>string of HTML. Recently my company needs me to read the HTML
>dynamically from a stream, so as to avoid long waits on large pages or
>slow servers.
You cannot be sure that you have seen all that there is on the page
until it has all loaded. Reading from the input stream will not make
a slow external server run any faster. Unless your processing is
taking a long time, I suspect your bosses will be disappointed.
Finding tags is a matter of looking for "<" and parsing the subsequent
characters. Do you need all tags or just a subset of them?
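Here is a minimal sketch of that idea. It is written in Python for illustration only, since the thread is about .NET's Regex and XmlReader. It assumes every "<" starts a tag and that no ">" appears inside quoted attribute values, comments or scripts, none of which real pages guarantee. The names `scan_tags` and `tag_name` and the sample chunks are hypothetical. The sketch shows that a tag split across two reads can be carried over until its closing ">" arrives, so the stream can be processed chunk by chunk.

```python
from typing import Iterable, Iterator, List


def scan_tags(chunks: Iterable[str]) -> Iterator[str]:
    """Yield each raw tag ("<...>") found in a stream of text chunks.

    Assumption: every '<' opens a tag and the next '>' closes it
    (no '>' inside attribute values, comments or scripts).
    A tag split across chunk boundaries is buffered until it completes.
    """
    buf: List[str] = []   # characters of the tag currently being read
    in_tag = False        # True between '<' and the matching '>'
    for chunk in chunks:
        for ch in chunk:
            if not in_tag:
                if ch == "<":
                    in_tag = True
                    buf = ["<"]
                continue  # text outside tags is ignored
            buf.append(ch)
            if ch == ">":
                yield "".join(buf)
                in_tag = False
                buf = []


def tag_name(raw: str) -> str:
    """Return the lower-cased element name of a raw tag, e.g. '<A href=x>' -> 'a'."""
    body = raw[1:-1].lstrip("/").strip()  # drop '<', '>' and a leading '/'
    name = []
    for ch in body:
        if ch.isspace() or ch == "/":
            break
        name.append(ch)
    return "".join(name).lower()


if __name__ == "__main__":
    # Chunks as they might arrive from a slow stream; tags are split mid-name.
    chunks = ["<html><bo", "dy><A HREF='x'>link</a", "><br/></body>"]
    for raw in scan_tags(chunks):
        print(tag_name(raw), raw)
```

If only a subset of tags is wanted, the stream can be filtered as it goes, e.g. `[t for t in scan_tags(chunks) if tag_name(t) in {"a", "img"}]`.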
>
>Does anyone know of a good way to do this? There is no guarantee that
>the pages are proper HTML, since this pulls from real web sites.
>How tolerant are the XmlReaders when it comes to bad HTML?
Not at all. Better to run the page through an HTML to XHTML
translator first, that way the XML parser will not throw a wobbly.
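The following sketch illustrates the "Not at all": a strict XML parser rejects ordinary HTML such as an unclosed <br>. It uses Python's standard-library xml.etree.ElementTree as a stand-in for a strict parser like XmlReader. It then shows a different route from the XHTML conversion suggested above: Python's forgiving html.parser.HTMLParser, which accepts the same markup and can be fed one chunk at a time. This is not .NET code. `TagCollector`, `collect_tags` and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import Iterable, List, Tuple

# Typical real-world markup: <br> is never closed, so it is not well-formed XML.
SAMPLE = "<p>one<br>two</p>"


def strict_xml_accepts(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Record start and end tag names in the order the HTML parser reports them."""

    def __init__(self) -> None:
        super().__init__()
        self.events: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def collect_tags(chunks: Iterable[str]) -> List[Tuple[str, str]]:
    """Feed chunks one at a time, as they would arrive from a stream.

    HTMLParser buffers an incomplete tag until the next feed() completes it.
    """
    parser = TagCollector()
    for chunk in chunks:
        parser.feed(chunk)
    parser.close()
    return parser.events


if __name__ == "__main__":
    print(strict_xml_accepts(SAMPLE))
    print(collect_tags(["<p>on", "e<b", "r>two</p>"]))
```

The HTML parser does not build a tree or repair nesting. It only reports tags as it sees them, so it fits the "find tags in a stream" job rather than full document processing.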
rossum
>
>Thanks,
>Travis