How to read the contents of an HTML table with .NET?

I need to read the contents of an HTML table on a remote web page into
a variable. I guess this is called screen scraping, but I'm not sure. I'm not
sure where to start or what the best practices are. For
instance, I have a healthcare app that needs to check a government web page for a
user's license number periodically. There is no login, and I can put the user
info in the request URL with no problem, but I'm not sure how to read the response data
in the tables. What namespace and class(es) should I be looking at?
Nothing jumped out at me under System.Web.
Thanks, Jim

Aug 29 '06 #1


You can use WebClient to make an HTTP request and get the response. You
can also use WebRequest/HttpWebRequest to do the same.
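
As a rough sketch of the WebClient route: the class lives in the System.Net namespace, and its DownloadString method returns the response body as a string. The host, the licenseNo query parameter, and the helper names BuildUrl and Fetch are made up for illustration; substitute whatever the real page expects.

```csharp
using System;
using System.Net;

public static class FetchPage
{
    // Builds the lookup URL. The host and the "licenseNo" parameter
    // are hypothetical placeholders, not taken from a real site.
    public static string BuildUrl(string licenseNo)
    {
        return "http://www.example.gov/lookup?licenseNo="
            + Uri.EscapeDataString(licenseNo);
    }

    // Downloads the raw HTML of a page as a string.
    public static string Fetch(string url)
    {
        using (WebClient client = new WebClient())
        {
            return client.DownloadString(url);
        }
    }

    public static void Main()
    {
        string html = Fetch(BuildUrl("12345"));
        Console.WriteLine(html.Length + " characters downloaded");
    }
}
```

On current .NET versions WebClient still works but is flagged as obsolete in favor of HttpClient; the overall shape (request a URL, get the markup back as a string) stays the same.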
HTML parsing, however, is not supported in the .NET Framework unless you
want to use the WebBrowser control in .NET 2.0 or COM interop with
MSHTML in .NET 1.x.
There are third-party tools to parse HTML, for instance the Html Agility Pack:
<http://www.codeplex.com/Wiki/View.aspx?ProjectName=htmlagilitypack>
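
For the parsing step, here is a minimal sketch using the Html Agility Pack, assuming the library is referenced in the project and that the page markup is already available as a string. The TableReader class, the ReadFirstTable helper, and the sample markup are invented for illustration; a real page will likely need a more specific XPath expression to pick out the right table.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // third-party library, not part of the .NET Framework

public static class TableReader
{
    // Returns the rows of the first <table> in the markup; each row is
    // the trimmed, entity-decoded text of its <th>/<td> cells.
    public static List<string[]> ReadFirstTable(string html)
    {
        var rows = new List<string[]>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection trNodes = doc.DocumentNode.SelectNodes("(//table)[1]//tr");
        if (trNodes == null) return rows;

        foreach (HtmlNode tr in trNodes)
        {
            HtmlNodeCollection cells = tr.SelectNodes("th|td");
            if (cells == null) continue;

            var values = new string[cells.Count];
            for (int i = 0; i < cells.Count; i++)
            {
                values[i] = HtmlEntity.DeEntitize(cells[i].InnerText).Trim();
            }
            rows.Add(values);
        }
        return rows;
    }

    public static void Main()
    {
        // Made-up sample markup standing in for the downloaded page.
        string html = "<html><body><table>"
            + "<tr><th>Name</th><th>License</th></tr>"
            + "<tr><td>Jane Doe</td><td>RN-0001</td></tr>"
            + "</table></body></html>";

        foreach (string[] row in ReadFirstTable(html))
        {
            Console.WriteLine(string.Join(" | ", row));
        }
    }
}
```

Returning plain string arrays keeps the scraping code separate from whatever the app does with the values, so a change in the page layout should only touch the XPath expressions.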

--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
Aug 29 '06 #2
Thanks Martin. The Html Agility Pack looks promising.
Aug 29 '06 #3
