How to read the contents of an HTML table with .NET?

I need to read the contents of an HTML table on a remote web page into
a variable. I guess this is called screen scraping, but I'm not sure. I don't
know where to start or what the best practices are. For
instance, I have a healthcare app that needs to check a government web page for a
user's license number periodically. There is no login, and I can put the user
info in the request URL with no problem, but I'm not sure how to read the response data
in the tables. What namespace and class(es) should I be looking at?
Nothing jumped out at me under System.Web.
Thanks, Jim

Aug 29 '06 #1


You can use WebClient to make an HTTP request and get the response. You
can also use WebRequest/HttpWebRequest to do the same. All of these classes
are in the System.Net namespace.
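
For illustration, here is a minimal sketch of the WebClient approach. The class and method names (FetchPage, DownloadHtml) are made up for this example, and the URL is whatever you pass on the command line, query string included. It only fetches the raw HTML as a string and does no parsing.

```csharp
using System;
using System.Net;

public static class FetchPage
{
    // Downloads the body of the response as a string.
    public static string DownloadHtml(string url)
    {
        // WebClient is disposable, so release it when done.
        using (WebClient client = new WebClient())
        {
            return client.DownloadString(url);
        }
    }

    public static void Main(string[] args)
    {
        // Pass the full request URL, query string included, as the first argument.
        if (args.Length != 1)
        {
            Console.WriteLine("usage: FetchPage <url>");
            return;
        }
        string html = DownloadHtml(args[0]);
        Console.WriteLine("Downloaded {0} characters", html.Length);
    }
}
```

DownloadString throws a WebException if the request fails, so a periodic check would probably want a try/catch around the call.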
HTML parsing, however, is not supported in the .NET Framework itself, unless you
want to use the WebBrowser control in .NET 2.0 or COM interop with
MSHTML in .NET 1.x.
There are third-party tools to parse HTML, for instance the Html Agility Pack:
<http://www.codeplex.com/Wiki/View.aspx?ProjectName=htmlagilitypack>
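
As a rough sketch of the third-party route, the following reads the cell text of the first table in an HTML string using the Html Agility Pack. It assumes the HtmlAgilityPack assembly is referenced in the project. TableReader, ReadFirstTable and the sample markup are invented for this example; the real page's layout will differ, so the XPath expressions will likely need adjusting.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // third-party library, must be referenced separately

public static class TableReader
{
    // Returns the text of every cell in the first <table>, one list per row.
    public static List<List<string>> ReadFirstTable(string html)
    {
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        List<List<string>> rows = new List<List<string>>();

        // SelectSingleNode and SelectNodes return null when nothing matches.
        HtmlNode table = doc.DocumentNode.SelectSingleNode("//table");
        if (table == null) return rows;

        // Note: ".//tr" also picks up rows of any nested tables.
        HtmlNodeCollection rowNodes = table.SelectNodes(".//tr");
        if (rowNodes == null) return rows;

        foreach (HtmlNode row in rowNodes)
        {
            List<string> cells = new List<string>();
            HtmlNodeCollection cellNodes = row.SelectNodes("th|td");
            if (cellNodes != null)
            {
                foreach (HtmlNode cell in cellNodes)
                {
                    // Decode entities such as &amp; and trim whitespace.
                    cells.Add(HtmlEntity.DeEntitize(cell.InnerText).Trim());
                }
            }
            rows.Add(cells);
        }
        return rows;
    }

    public static void Main()
    {
        // Made-up sample markup standing in for the downloaded page.
        string html = "<table><tr><th>Name</th><th>License</th></tr>"
                    + "<tr><td>Jane Doe</td><td>12345</td></tr></table>";

        foreach (List<string> row in ReadFirstTable(html))
        {
            Console.WriteLine(string.Join(" | ", row.ToArray()));
        }
    }
}
```

In practice the html string would come from the HTTP response rather than a literal, and selecting the table by an id or class attribute (if the page has one) is usually more robust than taking the first table.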

--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
Aug 29 '06 #2

Thanks, Martin. The Html Agility Pack looks promising.
Aug 29 '06 #3
