How to download a web page into a DOM-tree?

What is the simplest way of getting a web page downloaded,
and parsed into a dom-tree in C#?

There are a couple of web pages I need to "screen-scrape",
and I want to do all my new development in C# (coming from classic VB).

TIA...

--
Dag.
Aug 21 '08 #1
"Dag Sunde" <me@dagsunde.com> wrote in message
news:%2****************@TK2MSFTNGP02.phx.gbl...
> What is the simplest way of getting a web page downloaded,
> and parsed into a dom-tree in C#?

One way to do it (if your program is a WinForm) is to use a WebBrowser
control (which you don't need to make visible). Navigate to the page that
you desire, and then access the DOM by means of the Document property of the
control.
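
A minimal sketch of that approach, assuming a project that references System.Windows.Forms and runs on an STA thread. The URL is a placeholder, and listing the page's links is only an illustration of walking the Document; adapt it to whatever nodes you need.

```csharp
using System;
using System.Windows.Forms;

static class Program
{
    [STAThread] // the WebBrowser control requires a single-threaded apartment
    static void Main()
    {
        using (var browser = new WebBrowser())
        {
            browser.ScriptErrorsSuppressed = true; // no script error dialogs

            browser.DocumentCompleted += (sender, e) =>
            {
                // DocumentCompleted can fire once per frame; wait until
                // the whole page reports that it has finished loading.
                if (browser.ReadyState != WebBrowserReadyState.Complete)
                    return;

                HtmlDocument doc = browser.Document;
                foreach (HtmlElement link in doc.Links)
                    Console.WriteLine(link.GetAttribute("href"));

                Application.ExitThread(); // stop the message loop started below
            };

            browser.Navigate("http://myurl.somewhere.no"); // placeholder URL
            Application.Run(); // the control needs a message loop to load the page
        }
    }
}
```

The control is never added to a form, so nothing is shown on screen. The page is still loaded by the embedded Internet Explorer engine, so scripts on the page may run before the DOM is read.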

Aug 21 '08 #2
"Alberto Poblacion" <ea******************************@poblacion.org> wrote
in message news:ez**************@TK2MSFTNGP02.phx.gbl...
> "Dag Sunde" <me@dagsunde.com> wrote in message
> news:%2****************@TK2MSFTNGP02.phx.gbl...
>> What is the simplest way of getting a web page downloaded,
>> and parsed into a dom-tree in C#?
>
> One way to do it (if your program is a WinForm) is to use a WebBrowser
> control (which you don't need to make visible). Navigate to the page that
> you desire, and then access the DOM by means of the Document property of
> the control.

Thanks!

Is there a way to do this in a Console App?

Or is there a simple HTML DOM parser available in
the .NET framework?

I have this code now, so I already have the HTML in a string
or byte array:

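// Requires: using System; using System.Net; using System.Text;
using (WebClient clnt = new WebClient())   // dispose the client when done
{
    byte[] resHTML = clnt.DownloadData("http://myurl.somewhere.no");
    // Assumes the page is UTF-8 encoded
    Console.WriteLine(Encoding.UTF8.GetString(resHTML));
}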

What would be nice now would be to have some kind of HTMLDOMDocument
class that I could feed my text to, and later just traverse the nodes.

--
Dag.
Aug 21 '08 #3
> What would be nice now, would be to have some kind of HTMLDOMDocument
> class, that i could feed my text, and later just traverse the nodes.

That'll be the Html Agility Pack, then:
http://www.codeplex.com/htmlagilitypack
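
A rough sketch of how that might look, assuming the HtmlAgilityPack assembly is referenced. The LinkExtractor class, the ExtractLinks helper and the sample markup are made up for illustration; in practice you would pass in the string you already downloaded with WebClient.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class LinkExtractor
{
    // Hypothetical helper: parses an HTML string and returns the href
    // value of every <a> element that has one.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // builds the DOM tree from the text

        var links = new List<string>();

        // SelectNodes takes an XPath expression and returns null
        // (not an empty collection) when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
            return links;

        foreach (HtmlNode anchor in anchors)
            links.Add(anchor.GetAttributeValue("href", string.Empty));

        return links;
    }

    static void Main()
    {
        // Sample markup standing in for the downloaded page.
        string html = "<html><body><p><a href=\"/one\">One</a> " +
                      "<a href=\"/two\">Two</a></p></body></html>";

        foreach (string href in ExtractLinks(html))
            Console.WriteLine(href);
    }
}
```

Since it only needs a string, this works fine in a Console App, with no WebBrowser control or message loop involved.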

Marc
Aug 21 '08 #4
"Marc Gravell" <ma**********@gmail.com> wrote in message
news:es*************@TK2MSFTNGP06.phx.gbl...
>> What would be nice now, would be to have some kind of HTMLDOMDocument
>> class, that i could feed my text, and later just traverse the nodes.
>
> That'll be the Html Agility Pack, then:
> http://www.codeplex.com/htmlagilitypack

Fantastic!!!

That worked like a charm...

Thank you..

--
Dag.
Aug 21 '08 #5
