
Finding HTML tags in streaming HTML

Hello:

Currently, I have a system that will use Regex to find tags in a
string of HTML. Recently my company needs me to read the HTML
dynamically from a stream, so as to avoid long waits on large pages or
slow servers.

Does anyone know of a good way to do this? There is no guarantee that
the pages are proper HTML, since this pulls from real web sites.

How tolerant are the XmlReaders when it comes to bad HTML?

Thanks,
Travis
Jul 6 '08 #1
1 Reply


On Sun, 6 Jul 2008 13:51:29 -0700 (PDT), "je**********@gmail.com"
<je**********@gmail.com> wrote:
> Hello:
>
> Currently, I have a system that will use Regex to find tags in a
> string of HTML. Recently my company needs me to read the HTML
> dynamically from a stream, so as to avoid long waits on large pages or
> slow servers.

You cannot be sure that you have seen all that there is on the page
until it has all loaded. Reading from the input stream will not make
a slow external server run any faster. Unless your own processing is
taking a long time, I suspect your bosses will be disappointed.

Finding tags is a matter of looking for "<" and parsing the subsequent
characters. Do you need all tags or just a subset of them?
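
As a rough illustration of the "look for `<` and parse what follows" idea applied to a stream, here is a minimal sketch in Python (not .NET; the same buffering logic should carry over to a `TextReader`). The function name `iter_tags` and the `chunk_size` parameter are made up for this example. It assumes a tag ends at the first ">" after its "<", so it will be fooled by ">" inside attribute values, comments, or script blocks.

```python
import io

def iter_tags(stream, chunk_size=4096):
    """Yield each "<...>" tag as soon as it is complete in the stream."""
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of stream; any unfinished tag is dropped
        buffer += chunk
        pos = 0
        while True:
            start = buffer.find("<", pos)
            if start == -1:
                buffer = ""  # no tag pending, plain text can be discarded
                break
            end = buffer.find(">", start + 1)
            if end == -1:
                # Tag is split across chunks: keep it for the next read.
                buffer = buffer[start:]
                break
            yield buffer[start:end + 1]
            pos = end + 1

# Usage: a tiny chunk size forces tags to straddle chunk boundaries.
html = io.StringIO('<p>Hi <a href="x">there</a></p>')
for tag in iter_tags(html, chunk_size=5):
    print(tag)
```

The point of the buffer is that a tag may arrive in pieces, so only text that cannot be part of a pending tag is thrown away after each read.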
>
> Does anyone know of a good way to do this? There is no guarantee that
> the pages are proper HTML, since this pulls from real web sites.
>
> How tolerant are the XmlReaders when it comes to bad HTML?

Not at all. Better to run the page through an HTML to XHTML
translator first, that way the XML parser will not throw a wobbly.
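
To show what "not at all" looks like in practice, here is a small sketch using Python's standard-library XML parser as a stand-in for XmlReader (an assumption on my part that the two behave alike here, though any conforming XML parser has to treat a well-formedness error as fatal). The helper name `is_well_formed_xml` is invented for the example.

```python
import xml.etree.ElementTree as ET

def is_well_formed_xml(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# Ordinary HTML with an unclosed <br> is rejected outright.
print(is_well_formed_xml("<p>one<br>two</p>"))

# The XHTML form of the same fragment parses.
print(is_well_formed_xml("<p>one<br/>two</p>"))
```

An unclosed `<br>` is legal HTML but fatal to an XML parser, which is why the page needs converting to XHTML before it gets anywhere near one.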

rossum
>
> Thanks,
> Travis
Jul 7 '08 #2
