How to read the contents of an HTML table with .NET?

I need to read the contents of an HTML table on a remote web page into
a variable. I guess this is called screen scraping, but I'm not sure. I don't
know where to start or what the best practices are. For instance, I have a
healthcare app that needs to check a government web page for a user's
license number periodically. There is no login, and I can put the user info
in the request URL without a problem, but I'm not sure how to read the
response data in the tables. Which namespace and class(es) should I be
looking at? Nothing jumped out at me under System.Web.
Thanks, Jim

Aug 29 '06 #1


You can use WebClient to make an HTTP request and get the response. You
can also use WebRequest/HttpWebRequest to do the same.
HTML parsing, however, is not supported in the .NET Framework unless you
want to use the WebBrowser control in .NET 2.0 or COM interop with
MSHTML in .NET 1.x.
There are third-party tools to parse HTML, for instance the Html Agility Pack:
<http://www.codeplex.com/Wiki/View.aspx?ProjectName=htmlagilitypack>
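
A minimal sketch of how the two pieces above might fit together in C#: WebClient downloads the page, and the Html Agility Pack (a third-party library that has to be added to the project separately) parses it. The URL, the class name TableScraper, and the helper ReadFirstTable are hypothetical, and the sketch assumes the data sits in the first <table> on the page; the real page may need a more specific XPath.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using HtmlAgilityPack; // third-party library, not part of the .NET Framework

class TableScraper
{
    // Hypothetical helper: returns the rows of the first <table> in the HTML,
    // each row as a list of cell texts.
    static List<List<string>> ReadFirstTable(string html)
    {
        var rows = new List<List<string>>();

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        // Note: //tr also picks up rows of tables nested inside the first one.
        HtmlNodeCollection rowNodes = doc.DocumentNode.SelectNodes("(//table)[1]//tr");
        if (rowNodes == null)
        {
            return rows;
        }

        foreach (HtmlNode row in rowNodes)
        {
            var cells = new List<string>();

            // Header cells (th) and data cells (td) that are direct children of the row.
            HtmlNodeCollection cellNodes = row.SelectNodes("th|td");
            if (cellNodes != null)
            {
                foreach (HtmlNode cell in cellNodes)
                {
                    // Decode entities such as &amp; and trim surrounding whitespace.
                    cells.Add(HtmlEntity.DeEntitize(cell.InnerText).Trim());
                }
            }

            rows.Add(cells);
        }

        return rows;
    }

    static void Main()
    {
        // Hypothetical URL; the user info goes in the query string as described in the question.
        string url = "http://example.com/license-lookup?id=12345";

        string html;
        using (var client = new WebClient())
        {
            html = client.DownloadString(url);
        }

        foreach (List<string> row in ReadFirstTable(html))
        {
            Console.WriteLine(string.Join(" | ", row.ToArray()));
        }
    }
}
```

Keeping the parsing in its own method means it can be tried against a saved copy of the page before pointing it at the live site.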

--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
Aug 29 '06 #2
Thanks Martin. The Html Agility Pack looks promising.
Aug 29 '06 #3
