Parsing / processing a stream of HTML

Hi,

I'm using HttpWebRequest and HttpWebResponse to return a stream of HTML.
Looking for advice as to the accepted / easiest / most efficient way to
process this HTML in the background i.e. I don't want to display it all to
the user, just pull out certain pieces of it.
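For readers following along, here is a minimal sketch of getting the response into a string before any processing. The class and method names (HtmlFetcher, ReadAll, FetchHtml) are made up for illustration, and UTF-8 is an assumption; a real page may declare a different charset. Newer .NET versions mark WebRequest.Create as obsolete in favour of HttpClient, so treat this as matching the APIs named in the question rather than as a recommendation.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class HtmlFetcher
{
    // Reads an entire stream into a string using the given encoding.
    public static string ReadAll(Stream stream, Encoding encoding)
    {
        using (var reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    // Fetches a page and returns its markup.
    // Assumption: the response body is UTF-8 encoded.
    public static string FetchHtml(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var stream = response.GetResponseStream())
        {
            return ReadAll(stream, Encoding.UTF8);
        }
    }

    public static void Main()
    {
        // Exercises ReadAll on an in-memory stream, so no network is needed.
        byte[] bytes = Encoding.UTF8.GetBytes("<table><tr><td>1</td></tr></table>");
        Console.WriteLine(ReadAll(new MemoryStream(bytes), Encoding.UTF8));
    }
}
```

Once the markup is held in a string, nothing has to be shown to the user; the extraction can run entirely in the background.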

Specifically, I'm looking to evaluate the tabledefs it contains - walk
through their rows and columns etc.

Any assistance gratefully received as ever.

Best regards,

Mark Rae
Nov 15 '05 #1
Mark,
I would seriously consider using regular expressions to extract the
content you are looking for out of your html string.

http://www.regular-expressions.info/dotnet.html
http://www.ondotnet.com/pub/a/dotnet...11/regex2.html
--
Jay Douglas
Fort Collins, CO

Nov 15 '05 #2
"Jay Douglas" <RE*********************************@squarei.com > wrote in
message news:uT****************@TK2MSFTNGP10.phx.gbl...
Mark,
I would seriously consider using regular expressions to extract the
content you are looking for out of your html string.

http://www.regular-expressions.info/dotnet.html
http://www.ondotnet.com/pub/a/dotnet...11/regex2.html


Thanks for the reply. Will that, e.g. allow me to extract all the text
between "<table" and "</table>"?

Alternatively, is there a way to reference a stream of HTML and treat it as
if it were an HTML document from which I could evaluate the tabledefs
collection etc?

Mark
Nov 15 '05 #3
Mark,

With regular expressions, you can extract text from all sorts of
different patterns including text in-between table tags.
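As a rough illustration of the regular-expression approach, the sketch below pulls out the text between each "<table ...>" and the next "</table>". The class and method names are hypothetical, and it assumes the tables are not nested inside one another.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TableRegex
{
    // Lazy match from "<table ...>" to the first following "</table>".
    // Singleline lets '.' cross line breaks; IgnoreCase also accepts <TABLE>.
    private static readonly Regex TablePattern = new Regex(
        @"<table\b[^>]*>(.*?)</table>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the inner markup of every table found.
    // Assumption: tables are not nested.
    public static List<string> ExtractTableBodies(string html)
    {
        var bodies = new List<string>();
        foreach (Match m in TablePattern.Matches(html))
        {
            bodies.Add(m.Groups[1].Value);
        }
        return bodies;
    }

    public static void Main()
    {
        string html = "<p>x</p><table id=\"a\"><tr><td>1</td></tr></table>" +
                      "<TABLE>\n<tr><td>2</td></tr>\n</TABLE>";
        foreach (string body in ExtractTableBodies(html))
        {
            Console.WriteLine(body.Trim());
        }
    }
}
```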

Now, about changing attributes and elements of the HTML string... I've
seen some examples where the HTML is first transformed into an XML string,
attributes of certain elements are modified, and the result is then
converted back to an HTML string.
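A small sketch of the XML route, assuming the markup is already well-formed XML with no namespace declaration and no HTML-only entities such as &nbsp; (real-world HTML often is not, and LoadXml will then throw). The class and method names are hypothetical. It walks the rows and cells the way the original question describes.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XhtmlTables
{
    // Returns every table row in the document as a list of cell texts.
    // Assumption: the input is well-formed XML without a default namespace.
    public static List<List<string>> ReadRows(string xhtml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml); // throws XmlException if the markup is not well-formed

        var rows = new List<List<string>>();
        foreach (XmlNode tr in doc.SelectNodes("//tr"))
        {
            var cells = new List<string>();
            foreach (XmlNode cell in tr.SelectNodes("td|th"))
            {
                cells.Add(cell.InnerText.Trim());
            }
            rows.Add(cells);
        }
        return rows;
    }

    public static void Main()
    {
        string xhtml = "<html><body><table>" +
                       "<tr><th>Name</th><th>Qty</th></tr>" +
                       "<tr><td>Widget</td><td>3</td></tr>" +
                       "</table></body></html>";
        foreach (List<string> row in ReadRows(xhtml))
        {
            Console.WriteLine(string.Join(" | ", row));
        }
    }
}
```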

Here's a link to start your research with:

http://www.fawcette.com/vsm/2002_03/..._wagner_03_18/
--
Jay Douglas
Fort Collins, CO

Nov 15 '05 #4
"Jay Douglas" <RE*********************************@squarei.com > wrote in
message news:e5****************@TK2MSFTNGP11.phx.gbl...

Jay,

With regular expressions, you can extract text from all sorts of
different patterns including text in-between table tags.

Now about changing attributes and elements of the html string... I've
seen some examples where html is actually transformed into xml string and
then attributes of certain elements are then modified then returned back to an html string.

Here's a link to start your research with:

http://www.fawcette.com/vsm/2002_03/..._wagner_03_18/


Thanks for this. I looked at it, and found that it was more than I needed.

In the end, I extracted the various <tr>...</tr> lines out of the HTML
stream, and then processed them with the standard Substring() and
IndexOf() methods of the String object.
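The actual code was not posted, so the following is only a guess at what such an approach might look like. The helper name ExtractBetween is hypothetical, and the sketch assumes the tags appear exactly as given (for example "<tr>" with no attributes) and are not nested.

```csharp
using System;
using System.Collections.Generic;

public static class TagSlicer
{
    // Collects the text between each openTag and the next closeTag.
    // Assumption: tags carry no attributes and are not nested.
    public static List<string> ExtractBetween(string html, string openTag, string closeTag)
    {
        var results = new List<string>();
        int pos = 0;
        while (true)
        {
            int start = html.IndexOf(openTag, pos, StringComparison.OrdinalIgnoreCase);
            if (start < 0) break;
            start += openTag.Length; // skip past the opening tag itself

            int end = html.IndexOf(closeTag, start, StringComparison.OrdinalIgnoreCase);
            if (end < 0) break; // unterminated tag: stop rather than guess

            results.Add(html.Substring(start, end - start));
            pos = end + closeTag.Length;
        }
        return results;
    }

    public static void Main()
    {
        string html = "<tr><td>a</td><td>b</td></tr><tr><td>c</td></tr>";
        foreach (string row in ExtractBetween(html, "<tr>", "</tr>"))
        {
            // Each row is sliced again to get its cells.
            Console.WriteLine(string.Join(", ", ExtractBetween(row, "<td>", "</td>")));
        }
    }
}
```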

Job done.

Best,

Mark
Nov 15 '05 #5
Jay Douglas wrote:
Mark,

With regular expressions, you can extract text from all sorts of
different patterns including text in-between table tags.


Not really. You cannot match corresponding opening and closing tags for
example, because there's no way to express such constructs using regular
expressions (see context-free grammars).
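To make the nesting problem concrete, here is a small sketch (names are hypothetical) in which a simple lazy pattern is run over a table that contains another table. The match stops at the first "</table>" it sees, which belongs to the inner table, so the captured text is cut off partway through the outer one.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NestedTableDemo
{
    // Returns what a simple lazy pattern captures for the first table,
    // or null if there is no match.
    public static string FirstLazyMatch(string html)
    {
        Match m = Regex.Match(html, @"<table>(.*?)</table>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        string html = "<table><tr><td>" +
                      "<table><tr><td>inner</td></tr></table>" +
                      "</td></tr></table>";

        // Prints: <tr><td><table><tr><td>inner</td></tr>
        // The inner table's closing tag has been paired with the outer opening tag.
        Console.WriteLine(FirstLazyMatch(html));
    }
}
```

Flat, non-nested markup does not trigger this, which is consistent with simple extraction working on pages whose tables are not nested.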

I'd rather use a real parser such as Chris Lovett's SGML parser.

http://www.gotdotnet.com/Community/U...4-C3BD760564BC

Cheers,

--
Joerg Jooss
jo*********@gmx.net

Nov 15 '05 #6
"Joerg Jooss" <jo*********@gmx.net> wrote in message
news:eC**************@TK2MSFTNGP11.phx.gbl...
Jay Douglas wrote:
Mark,

With regular expressions, you can extract text from all sorts of
different patterns including text in-between table tags.
Not really. You cannot match corresponding opening and closing tags for
example, because there's no way to express such constructs using regular
expressions (see context-free grammars).


I'm having no problems thus far extracting strings between the following
tags:

<tr>...</tr>
<td>...</td>
<p>...</p>

I'd rather use a real parser such as Chris Lovett's SGML parser.

http://www.gotdotnet.com/Community/U...4-C3BD760564BC

Very useful!

Mark
Nov 15 '05 #7
