How to read the contents of an HTML table with .NET?

I need to read the contents of an HTML table on a remote web page into a
variable. I think this is called screen scraping, but I'm not sure, and I
don't know where to start or what the best practices are. For instance, I
have a healthcare app that needs to check a government web page for a
user's license number periodically. There is no login, and I can put the
user info in the request URL without a problem, but I'm not sure how to
read the data in the tables of the response. Which namespaces and classes
should I be looking at? Nothing jumped out at me under System.Web.
Thanks, Jim

Aug 29 '06 #1


You can use WebClient, or the lower-level WebRequest/HttpWebRequest
classes (all in System.Net), to make an HTTP request and get the response.
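The fetch step is simple: put the user's data in the query string, issue a GET, and read the body as text. The sketch below shows that pattern with Python's standard library (urllib) purely for illustration; it is not the .NET WebClient API. The base URL and the `license` query parameter are hypothetical placeholders; the real government page will define its own.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical lookup page; replace with the real site's URL.
BASE_URL = "https://licensing.example.gov/lookup"


def build_lookup_url(base_url: str, license_no: str) -> str:
    """Append the user's license number as a URL-encoded query parameter.

    The parameter name 'license' is an assumption for illustration.
    """
    return f"{base_url}?{urlencode({'license': license_no})}"


def fetch_page(url: str, timeout: float = 30.0) -> str:
    """GET the page and decode the body using the charset the server reports."""
    req = Request(url, headers={"User-Agent": "license-check/1.0"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


if __name__ == "__main__":
    url = build_lookup_url(BASE_URL, "RN 12345")
    print(url)
    # html = fetch_page(url)  # uncomment to perform the real network request
```

Encoding the parameters (rather than concatenating strings) matters here because license numbers may contain spaces or punctuation.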
HTML parsing, however, is not supported by the .NET Framework itself,
unless you want to use the WebBrowser control in .NET 2.0 or COM interop
with MSHTML in .NET 1.x.
There are third-party tools that parse HTML, for instance the Html Agility
Pack: <http://www.codeplex.com/Wiki/View.aspx?ProjectName=htmlagilitypack>
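Once the HTML is in hand, a parser turns the table into rows and cells you can search. As an illustration of that idea (not the Html Agility Pack API), here is a minimal sketch using Python's built-in `html.parser` module. `TableExtractor` and `extract_tables` are hypothetical names; the sketch tolerates missing `</td>`/`</tr>` tags but assumes tables are not nested and ignores colspan/rowspan.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableExtractor(HTMLParser):
    """Collect cell text as tables -> rows -> cells.

    Assumptions: no nested tables; colspan/rowspan are ignored.
    """

    def __init__(self) -> None:
        super().__init__()
        self.tables: List[List[List[str]]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def _close_cell(self) -> None:
        if self._cell is not None and self._row is not None:
            # Collapse runs of whitespace and newlines inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _close_row(self) -> None:
        self._close_cell()
        if self._row is not None and self.tables:
            self.tables[-1].append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._close_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_tables(html: str) -> List[List[List[str]]]:
    """Return every table in the document as a list of rows of cell strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


if __name__ == "__main__":
    sample = "<table><tr><th>License</th><th>Status</th></tr>" \
             "<tr><td>RN 12345</td><td>Active</td></tr></table>"
    print(extract_tables(sample))
```

For the license check, you would then scan the rows for the user's license number and read the status cell; which column holds what depends entirely on the real page's layout, so expect to adjust this when the site changes.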

--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
Aug 29 '06 #2
Thanks, Martin. The Html Agility Pack looks promising.

Aug 29 '06 #3
