Finding HTML tags in streaming HTML

Hello:

Currently, I have a system that uses Regex to find tags in a
string of HTML. My company now needs me to read the HTML
incrementally from a stream, so as to avoid long waits on large pages or
slow servers.

Does anyone know of a good way to do this? There is no guarantee that
the pages are proper HTML, since this pulls from real web sites.

How tolerant are the XmlReaders when it comes to bad HTML?

Thanks,
Travis
Jul 6 '08 #1
On Sun, 6 Jul 2008 13:51:29 -0700 (PDT), "je**********@gmail.com"
<je**********@gmail.com> wrote:
>Hello:
>
>Currently, I have a system that uses Regex to find tags in a
>string of HTML. My company now needs me to read the HTML
>incrementally from a stream, so as to avoid long waits on large pages or
>slow servers.
You cannot be sure that you have seen everything on the page
until it has all loaded. Reading from the input stream will not make
a slow external server run any faster. Unless your processing is
taking a long time, I suspect your bosses will be disappointed.

Finding tags is a matter of looking for "<" and parsing the subsequent
characters. Do you need all tags or just a subset of them?
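
To make that concrete, here is a rough sketch of scanning for "<" while the
page arrives in pieces. It is written in Python only for illustration (the
thread names no language beyond the .NET classes), and the names TAG and
iter_tags are made up for this example. It assumes the stream has already
been decoded to text, and it ignores comments, script bodies and ">" inside
attribute values, so treat it as a starting point rather than a real HTML
tokenizer. The point is the buffering: a tag cut in half by a chunk boundary
is held back until the rest of it arrives.

```python
import re
from typing import Iterable, Iterator

# Matches one complete tag; group 1 is the tag name. Comments, <script>
# bodies and ">" inside attribute values are deliberately not handled.
TAG = re.compile(r"<\s*/?\s*([A-Za-z][A-Za-z0-9]*)[^<>]*>")


def iter_tags(chunks: Iterable[str]) -> Iterator[str]:
    """Yield tag names as chunks arrive, buffering any unfinished tag."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        consumed = 0
        for match in TAG.finditer(buffer):
            yield match.group(1).lower()
            consumed = match.end()
        # Keep only what might be the start of a tag cut off by the chunk
        # boundary; text before the last "<" can never become a match.
        tail = buffer[consumed:]
        cut = tail.rfind("<")
        buffer = tail[cut:] if cut != -1 else ""


if __name__ == "__main__":
    # Chunk boundaries fall in the middle of several tags on purpose.
    pieces = ["<html><bo", "dy><p>hi</", "p><br/", "></body>"]
    print(list(iter_tags(pieces)))
```

Because the function is a generator, the caller can act on each tag as soon as
it is complete instead of waiting for the whole page.
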
>
>Does anyone know of a good way to do this? There is no guarantee that
>the pages are proper HTML, since this pulls from real web sites.
>
>How tolerant are the XmlReaders when it comes to bad HTML?
Not at all. Better to run the page through an HTML to XHTML
translator first, that way the XML parser will not throw a wobbly.
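
As an illustration of the difference, here is a small sketch using only the
Python standard library as a stand-in. It is not the .NET XmlReader, so take
it as an analogy: xml.etree.ElementTree plays the strict XML parser, and
html.parser.HTMLParser plays a tolerant HTML parser that also accepts input
in pieces through feed(). The names TagCollector, strict_xml_accepts and
BAD_HTML are invented for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records start-tag names; the parser tolerates malformed markup."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def strict_xml_accepts(markup):
    """Return True only if a strict XML parser can parse the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Typical real-world sloppiness: unclosed <p> and <br>, unquoted attribute.
BAD_HTML = "<html><body><p>unclosed paragraph<br><img src=logo.png></body></html>"

collector = TagCollector()
# Feed the page in small pieces, as if it were arriving from a network stream.
for start in range(0, len(BAD_HTML), 16):
    collector.feed(BAD_HTML[start:start + 16])
collector.close()

print(collector.tags)
print(strict_xml_accepts(BAD_HTML))
```

The tolerant parser reports every start tag even though the markup is not
well-formed, while the strict XML parser rejects the same text outright, which
is why a clean-up step is needed before handing real-world pages to an XML
parser.
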

rossum
>
>Thanks,
>Travis
Jul 7 '08 #2
