Extract data from web page.

I've got this type of info on a web page:

----------------------------------------------------------------------------
<tr height="25">
<td nowrap class="odd" align="center"><img src="/forums/images/icon_topic_new.gif" width=14 height=14 alt='New Topic' border=0></td>

<td nowrap class="odd" align="center">&nbsp;</td>

<td nowrap class="odd" align="center">&nbsp;</td>
<td width="85%" class="even" align="left"><font class="new-row"><a href="topic.asp?tid=106110">
Quality ebay auction</a>&nbsp;</font>
<font class="sub-row">in General&nbsp;/&nbsp;The Lounge</font><font class="sub-row"><br>Started 7/15/2005 - pages <a href="topic.asp?tid=106110">1</a> - last posted by <a href="profile.asp?action=view&id=Shandy" onmouseover="window.status='Show the authors profile'; return true;" onmouseout="window.status=''; return true;">Shandy</a></font></td>
<td width="15%" class="even" valign="middle" align="left"><font class="new-row"><a href="profile.asp?action=view&id=DiscoInferno" onmouseover="window.status='Show the authors profile'; return true;" onmouseout="window.status=''; return true;">DiscoInf<BR>erno</a></font></td>
<td nowrap class="odd" valign="middle" align="center"><font class="new-row">9</font></td>
<td nowrap class="odd" valign="middle" align="left">
<font class="new-row">7/15/2005<br>
<font class="sub-row">5:02:16 PM</font></font></td>
</tr>
----------------------------------------------------------------------------

It's a table which shows the latest posts of a forum. I'd like to pull out
the following information:
Topic: Quality ebay auction
Original poster: DiscoInferno
Started: 7/15/2005
Last Post By: Shandy
Last Post Date: 7/15/2005 5:02:16 PM

This *type* of information is repeated down the web page although the data
will change.
.....

and I want to do this with the whole page/table. Should I use RegEx to get
the data or simply do a string search when I download the page's source into
my application?
--
|
+-- Thief_
|
Nov 21 '05 #1
Hi,

Here is a start. It uses a regex to extract links.
' Matches a double-quoted attribute value, such as the URL in href="..."
Dim regLink As New System.Text.RegularExpressions.Regex("""(?<url>[^""]*)""")
' Matches the text between a tag's closing ">" and the next "<"
Dim regTitle As New System.Text.RegularExpressions.Regex(">(?<title>.*?)<")
' Matches a simple anchor of the form <a href="...">...</a>
Dim regHref As New System.Text.RegularExpressions.Regex("<a href=""(.*?)"">(.*?)</a>")

Try
    Dim strHtml As String

    ' Download the page source; Using closes the reader and the client
    Using wc As New System.Net.WebClient
        Using sr As New System.IO.StreamReader(wc.OpenRead("http://news.google.com/"))
            strHtml = sr.ReadToEnd()
        End Using
    End Using

    For Each m As System.Text.RegularExpressions.Match In regHref.Matches(strHtml)
        ' Print every quoted value inside the anchor (the URL)
        For Each mLink As System.Text.RegularExpressions.Match In regLink.Matches(m.Value)
            Trace.WriteLine(String.Format("Link {0}", mLink.Groups("url").Value))
        Next

        ' Print the text between the tags (the title)
        For Each mTitle As System.Text.RegularExpressions.Match In regTitle.Matches(m.Value)
            Trace.WriteLine(String.Format("Title {0}", mTitle.Groups("title").Value))
        Next
    Next
Catch ex As System.Net.WebException
    ' Report a failed download instead of silently swallowing it
    Trace.WriteLine(String.Format("Download failed: {0}", ex.Message))
End Try

Good resource for Regular Expression Examples.

http://www.regexlib.com/DisplayPatte...4&categoryId=8
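
For the forum row in the question, the same idea could be narrowed to one pattern with named groups for the five fields. This is only a rough sketch: the module name, the pattern and the shortened sample row are made up for illustration, and the pattern assumes every row follows the layout of the posted one (topic link, "Started" date, "last posted by" link, original poster's profile link, then the last post date and time). A row that deviates from that layout may be skipped or matched incorrectly.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ForumRowSketch
    ' Hypothetical pattern: one match per table row, one named group per field.
    ' Singleline lets ".*?" run across the line breaks inside a row.
    Private ReadOnly RowPattern As New Regex( _
        "<a\s+href=""topic\.asp\?tid=\d+"">\s*(?<topic>.*?)\s*</a>" & _
        ".*?Started\s+(?<started>\d{1,2}/\d{1,2}/\d{4})" & _
        ".*?last\s+posted\s+by\s*<a[^>]*>(?<lastby>[^<]*)</a>" & _
        ".*?profile\.asp\?action=view&(?:amp;)?id=(?<poster>[^""]+)""" & _
        ".*?(?<date>\d{1,2}/\d{1,2}/\d{4})\s*<br>\s*<font[^>]*>(?<time>[^<]+)</font>", _
        RegexOptions.Singleline Or RegexOptions.IgnoreCase)

    Sub Main()
        ' Shortened copy of the row from the question, for illustration only.
        ' In the real application this would be the downloaded page source.
        Dim nl As String = Environment.NewLine
        Dim html As String = _
            "<tr><td><font class=""new-row""><a href=""topic.asp?tid=106110"">" & nl & _
            "Quality ebay auction</a>&nbsp;</font>" & nl & _
            "<font class=""sub-row""><br>Started 7/15/2005 - pages " & _
            "<a href=""topic.asp?tid=106110"">1</a> - last posted by " & _
            "<a href=""profile.asp?action=view&id=Shandy"">Shandy</a></font></td>" & nl & _
            "<td><font class=""new-row"">" & _
            "<a href=""profile.asp?action=view&id=DiscoInferno"">DiscoInf<BR>erno</a></font></td>" & nl & _
            "<td><font class=""new-row"">7/15/2005<br>" & nl & _
            "<font class=""sub-row"">5:02:16 PM</font></font></td></tr>"

        For Each m As Match In RowPattern.Matches(html)
            Console.WriteLine("Topic: {0}", m.Groups("topic").Value)
            ' The poster is read from the id parameter of the profile link,
            ' because the displayed name contains a <BR> in the sample row.
            Console.WriteLine("Original poster: {0}", m.Groups("poster").Value)
            Console.WriteLine("Started: {0}", m.Groups("started").Value)
            Console.WriteLine("Last Post By: {0}", m.Groups("lastby").Value)
            Console.WriteLine("Last Post Date: {0} {1}", _
                m.Groups("date").Value, m.Groups("time").Value)
        Next
    End Sub
End Module
```

Because the pattern relies on the order of the fields rather than on the table structure, it will need adjusting if the forum changes its markup. Splitting the page into rows first and matching each row separately would keep one bad row from spilling into the next.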

Ken

Nov 21 '05 #2
Parsing an HTML file:

MSHTML Reference
<URL:http://msdn.microsoft.com/library/default.asp?url=/workshop/browser/mshtml/reference/reference.asp>

- or -

.NET Html Agility Pack: How to use malformed HTML just like it was well-formed XML...
<URL:http://blogs.msdn.com/smourier/archive/2003/06/04/8265.aspx>

Download:

<URL:http://www.codefluent.com/smourier/download/htmlagilitypack.zip>

- or -

SgmlReader 1.4
<URL:http://www.gotdotnet.com/Community/UserSamples/Details.aspx?SampleGuid=B90FDDCE-E60D-43F8-A5C4-C3BD760564BC>

If the file is in XHTML format (that is, well-formed XML), you can use the
classes contained in the 'System.Xml' namespace to read information from it.
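
As a rough illustration of the 'System.Xml' route, the sketch below loads a small well-formed fragment into an XmlDocument and reads two cells per row with XPath. The fragment, the module name and the column positions are assumptions made up for this example and are not the real forum markup. The real page would also need "&" written as "&amp;" and entities such as "&nbsp;" resolved before it would load as XML.

```vbnet
Imports System
Imports System.Xml

Module XhtmlRowSketch
    Sub Main()
        ' Hypothetical well-formed fragment, for illustration only.
        Dim xhtml As String = _
            "<table><tr>" & _
            "<td><a href=""topic.asp?tid=106110"">Quality ebay auction</a></td>" & _
            "<td><a href=""profile.asp?action=view&amp;id=DiscoInferno"">DiscoInferno</a></td>" & _
            "</tr></table>"

        Dim doc As New XmlDocument()
        doc.LoadXml(xhtml)

        ' Visit every table row and read the link text from its first two cells.
        For Each row As XmlNode In doc.SelectNodes("//tr")
            Dim topic As XmlNode = row.SelectSingleNode("td[1]/a")
            Dim poster As XmlNode = row.SelectSingleNode("td[2]/a")

            ' Skip rows that do not have the assumed layout.
            If topic IsNot Nothing AndAlso poster IsNot Nothing Then
                Console.WriteLine("Topic: {0}", topic.InnerText)
                Console.WriteLine("Original poster: {0}", poster.InnerText)
            End If
        Next
    End Sub
End Module
```

If the document declares the XHTML namespace on its root element, the XPath expressions would need an XmlNamespaceManager and prefixed element names, which this sketch leaves out.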

--
M S Herfried K. Wagner
M V P <URL:http://dotnet.mvps.org/>
V B <URL:http://classicvb.org/petition/>

Nov 21 '05 #3
