Finding HTML tags in streaming HTML

jehugaleahsa

Hello:

Currently, I have a system that uses Regex to find tags in a
string of HTML. Recently my company has asked me to read the HTML
dynamically from a stream, so as to avoid long waits on large pages or
slow servers.

Does anyone know of a good way to do this? There is no guarantee that
the pages are proper HTML, since this pulls from real web sites.

How tolerant are the XmlReaders when it comes to bad HTML?

Thanks,
Travis

Jul 6 '08 #1


rossum

On Sun, 6 Jul 2008 13:51:29 -0700 (PDT), "je**********@gmail.com"
<je**********@gmail.com> wrote:

> Hello:
>
> Currently, I have a system that uses Regex to find tags in a
> string of HTML. Recently my company has asked me to read the HTML
> dynamically from a stream, so as to avoid long waits on large pages or
> slow servers.

You cannot be sure that you have seen everything on the page until it
has all loaded. Reading from the input stream will not make a slow
external server run any faster. Unless your processing is taking a
long time, I suspect your bosses will be disappointed.

Finding tags is a matter of looking for "<" and parsing the subsequent
characters. Do you need all tags or just a subset of them?
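
The scan for "<" described above can be sketched roughly as follows. This is a minimal illustration in Python, not .NET code, and the names `iter_tags` and `chunk_size` are made up for the example. It assumes every tag ends at the first ">" after its "<", so it will be fooled by ">" inside attribute values, comments, or script blocks.

```python
import io

def iter_tags(stream, chunk_size=4096):
    """Yield each '<...>' span as soon as its closing '>' has arrived.

    Hypothetical helper: assumes a text stream and naive tag boundaries.
    """
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of stream; an unterminated '<...' tail is dropped
        buffer += chunk
        while True:
            start = buffer.find("<")
            if start == -1:
                buffer = ""  # plain text only: nothing worth keeping
                break
            end = buffer.find(">", start + 1)
            if end == -1:
                buffer = buffer[start:]  # incomplete tag: wait for more data
                break
            yield buffer[start:end + 1]
            buffer = buffer[end + 1:]

# A tiny chunk size forces tags to straddle chunk boundaries.
html = io.StringIO('<html><body><p class="x">Hi<br><a href="/next">more</a></body>')
tags = list(iter_tags(html, chunk_size=7))
```

Because the function is a generator, the caller can act on each tag while the rest of the page is still downloading, which is the only gain streaming can offer here.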

> Does anyone know of a good way to do this? There is no guarantee that
> the pages are proper HTML, since this pulls from real web sites.
>
> How tolerant are the XmlReaders when it comes to bad HTML?

Not at all. Better to run the page through an HTML-to-XHTML
translator first; that way the XML parser will not throw a wobbly.
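
To illustrate how strict an XML parser is, here is a small Python sketch. It is a stand-in only: it uses Python's standard library, not .NET's XmlReader. The sample markup and the `TagCollector` class are invented for the example. The second half swaps in a different technique from the translator suggested above: a lenient HTML parser fed in chunks, which is one possible alternative and not a recommendation from this thread.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Typical real-world HTML: unclosed <br>, unquoted attribute value.
page = '<p>one<br>two<img src=a.png></p>'

# A strict XML parser rejects the page outright.
try:
    ET.fromstring(page)
    strict_ok = True
except ET.ParseError:
    strict_ok = False

class TagCollector(HTMLParser):
    """Hypothetical helper that records start tags as they are parsed."""
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

# The lenient parser accepts the same page, fed five characters at a time
# to mimic data arriving from a stream.
collector = TagCollector()
for i in range(0, len(page), 5):
    collector.feed(page[i:i + 5])
collector.close()
```

Here `strict_ok` ends up false while `collector.tags` holds the three start tags, so the malformed markup only stops the XML parser.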

rossum

> Thanks,
> Travis

Jul 7 '08 #2
