How to download a web page into a DOM-tree?

What is the simplest way of getting a web page downloaded
and parsed into a DOM tree in C#?

There are a couple of web pages I need to "screen-scrape",
and I want to do all my new development in C# (coming from VB.Classic).

TIA...

--
Dag.
Aug 21 '08 #1
"Dag Sunde" <me@dagsunde.com> wrote in message
news:%2****************@TK2MSFTNGP02.phx.gbl...
> What is the simplest way of getting a web page downloaded
> and parsed into a DOM tree in C#?
One way to do it (if your program is a WinForms application) is to use a
WebBrowser control (which you don't need to make visible). Navigate to the
page that you want, and then access the DOM through the Document property
of the control.
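A minimal sketch of the WebBrowser approach, assuming a Windows machine and a reference to System.Windows.Forms (on newer .NET, a Windows target framework with `<UseWindowsForms>true</UseWindowsForms>`). The BrowserDom class and GetLinkHrefs method are hypothetical names. Because the control wraps an ActiveX component, the sketch runs it on its own STA thread with Application.Run pumping messages, which is also what lets it work from a console program.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Windows.Forms;

// Windows only: hosts an invisible WebBrowser control outside of any form.
public static class BrowserDom
{
    // Loads HTML into a hidden WebBrowser and returns the href of each <a> element.
    // To scrape a live page, call browser.Navigate(url) instead of setting DocumentText.
    public static List<string> GetLinkHrefs(string html)
    {
        var hrefs = new List<string>();

        // WebBrowser needs an STA thread with a running message loop;
        // a console Main normally runs on an MTA thread with no loop.
        var thread = new Thread(() =>
        {
            using (var browser = new WebBrowser())
            {
                browser.ScriptErrorsSuppressed = true;
                browser.DocumentCompleted += (sender, e) =>
                {
                    // browser.Document is a System.Windows.Forms.HtmlDocument (the DOM)
                    foreach (HtmlElement a in browser.Document.GetElementsByTagName("a"))
                        hrefs.Add(a.GetAttribute("href"));
                    Application.ExitThread(); // ends the Application.Run loop below
                };
                browser.DocumentText = html;
                Application.Run(); // pump messages until ExitThread is called
            }
        });
        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
        thread.Join();
        return hrefs;
    }
}
```

DocumentCompleted may fire more than once for pages that contain frames, so a real scraper would usually check that the whole page has finished loading (for example by comparing `e.Url` with `browser.Url`) before reading the DOM.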

Aug 21 '08 #2
"Alberto Poblacion" <ea******************************@poblacion.org> wrote
in message news:ez**************@TK2MSFTNGP02.phx.gbl...
> One way to do it (if your program is a WinForms application) is to use a
> WebBrowser control (which you don't need to make visible). Navigate to the
> page that you want, and then access the DOM through the Document property
> of the control.
Thanks!

Is there a way to do this in a Console App?

Or is there a simple HTML DOM parser available in
the .NET Framework?

I have this code now, so I already have the HTML in a string
or byte array:

using System;
using System.Net;
using System.Text;

WebClient clnt = new WebClient();
byte[] resHTML = clnt.DownloadData("http://myurl.somewhere.no");
// Decode the raw bytes, assuming the page is UTF-8 encoded
Console.WriteLine(Encoding.UTF8.GetString(resHTML));
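The download snippet decodes every page as UTF-8. If a server sends, say, ISO-8859-1, characters such as æ, ø and å come out garbled. A minimal sketch that honours the charset parameter of the Content-Type response header instead and falls back to UTF-8; HtmlDecoding.Decode is a hypothetical helper, and a charset declared only in a `<meta>` tag is not detected by it.

```csharp
using System;
using System.Text;

public static class HtmlDecoding
{
    // Decodes a response body using the charset from a Content-Type value such as
    // "text/html; charset=ISO-8859-1". Falls back to UTF-8 when the header is
    // missing, has no charset parameter, or names an encoding .NET does not know.
    public static string Decode(byte[] body, string contentType)
    {
        Encoding encoding = Encoding.UTF8;
        if (!string.IsNullOrEmpty(contentType))
        {
            foreach (string part in contentType.Split(';'))
            {
                string p = part.Trim();
                if (p.StartsWith("charset=", StringComparison.OrdinalIgnoreCase))
                {
                    // Strip optional quotes, e.g. charset="utf-8"
                    string name = p.Substring("charset=".Length).Trim('"', '\'', ' ');
                    try { encoding = Encoding.GetEncoding(name); }
                    catch (ArgumentException) { /* unknown charset: keep UTF-8 */ }
                }
            }
        }
        return encoding.GetString(body);
    }
}
```

With the WebClient from the snippet, the header is available after the download, so the last line could become `Console.WriteLine(HtmlDecoding.Decode(resHTML, clnt.ResponseHeaders[HttpResponseHeader.ContentType]));`.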

What would be nice now is some kind of HTMLDOMDocument
class that I could feed my text to and later just traverse the nodes.
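On the question of a parser in the framework itself: System.Xml's XmlDocument will build a DOM you can walk, but only from well-formed XML. A minimal sketch, assuming the page is strict XHTML; XhtmlDom, GetLinkHrefs and IsWellFormed are hypothetical names. Ordinary HTML with unclosed tags makes LoadXml throw an XmlException, which is why a forgiving HTML parser is usually needed for screen-scraping.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class XhtmlDom
{
    // Loads well-formed markup into an XmlDocument and returns the href of each <a>.
    public static List<string> GetLinkHrefs(string xhtml)
    {
        var doc = new XmlDocument();
        doc.XmlResolver = null; // never try to fetch a DTD over the network
        doc.LoadXml(xhtml);     // throws XmlException if the markup is not well-formed

        var hrefs = new List<string>();
        // Matches on the element's qualified name, so unprefixed elements in the
        // default XHTML namespace are found as "a".
        foreach (XmlElement a in doc.GetElementsByTagName("a"))
            hrefs.Add(a.GetAttribute("href"));
        return hrefs;
    }

    // True if XmlDocument accepts the markup, false if it is not well-formed XML.
    public static bool IsWellFormed(string markup)
    {
        try { new XmlDocument().LoadXml(markup); return true; }
        catch (XmlException) { return false; }
    }
}
```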

--
Dag.
Aug 21 '08 #3
> What would be nice now is some kind of HTMLDOMDocument
> class that I could feed my text to and later just traverse the nodes.
That'll be the Html Agility Pack, then:
http://www.codeplex.com/htmlagilitypack
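A minimal sketch of the Html Agility Pack route, assuming a reference to the library (today published on NuGet as the HtmlAgilityPack package). LoadHtml accepts malformed markup, SelectNodes takes an XPath expression, and ChildNodes lets you walk the tree yourself; the AgilityDemo class and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumed reference to the HtmlAgilityPack library

public static class AgilityDemo
{
    // Parses an HTML string (it need not be well-formed) and returns the href
    // of every <a> element that has one, in document order.
    public static List<string> GetLinkHrefs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var hrefs = new List<string>();
        // SelectNodes can return null instead of an empty collection when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors != null)
            foreach (HtmlNode a in anchors)
                hrefs.Add(a.GetAttributeValue("href", ""));
        return hrefs;
    }

    // Walks the tree depth-first and prints each element's tag name,
    // indented by its nesting depth.
    public static void PrintTree(HtmlNode node, int depth = 0)
    {
        if (node.NodeType == HtmlNodeType.Element)
            Console.WriteLine(new string(' ', depth * 2) + node.Name);
        foreach (HtmlNode child in node.ChildNodes)
            PrintTree(child, depth + 1);
    }
}
```

To download and parse in one step, the library's HtmlWeb class can be used, e.g. `HtmlDocument doc = new HtmlWeb().Load("http://myurl.somewhere.no");` followed by `AgilityDemo.PrintTree(doc.DocumentNode);`.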

Marc
Aug 21 '08 #4
"Marc Gravell" <ma**********@gmail.com> wrote in message
news:es*************@TK2MSFTNGP06.phx.gbl...
> That'll be the Html Agility Pack, then:
> http://www.codeplex.com/htmlagilitypack
Fantastic!!!

That worked like a charm...

Thank you.

--
Dag.
Aug 21 '08 #5


Similar topics

by: Jim Bayers
Currently, users click on a button to download data. They click and wait patiently for a minute for the SQL Server to send them their data. Problem is, some users keep clicking on the button over...

by: Matt
Apparently Microsoft gives away the Visual C++ compiler for free download, but after looking at their website I haven't been able to find it. What is the URL for the VC++ download? TIA

by: Brian Paul
When a user clicks on a LinkButton on a page, I would like to render a printer-friendly version of the ASP.NET page and download it as an HTML attachment to the browser. The code below works great,...

by: Alfred Salton
Can anyone confirm that ASP.NET provides no method for manipulating/modifying the outgoing page using the document object model prior to sending it to the client? If I am wrong about this, can...

by: Wictor Wilén
Heya, I need help creating a download page that should be used to download files from a server, and the files on the server have filenames that contain non-US characters such as the Swedish...

by: Jeff Jarrell
I want to set up a downloads page on my site. Most of the time they are zip files but they are also MSI files. Things work OK if I simply put an <a> element referencing the file to download but...

by: Brett Kelly
Ok, I know this sounds odd. Let me explain further. I have an ASP.NET page (w/ C# code behind) that, when given a session variable containing the path to a local file, will attempt to start the...

by: Eddiekx
Dear All, I have a problem with file downloading using ASP. I need to write a page to get some documents (mainly doc and jpg) using ASP. A "Save As" dialog will be prompted to save the...

by: haderach
Hi everyone, I have an HTML page and I'm replacing a div with some HTML code using Ajax. The problem is that I cannot access the newly added HTML tags as their IDs are not part of the original...

by: taekani
hi~ If the user opens the file, then ↓↓↓ (filename) I want to see the file name in Korean. Response.AppendHeader("Content-Disposition", "attachment;filename=\"" + filename);

by: veera ravala
ServiceNow is a powerful cloud-based platform that offers a wide range of services to help organizations manage their workflows, operations, and IT services more efficiently. At its core, ServiceNow...

by: VivesProcSPL
Obviously, one of the original purposes of SQL is to make data query processing easy. The language uses many English-like terms and syntax in an effort to make it easy to learn, particularly for...

by: abbasky
### Vandf component communication method one: data sharing Vandf components can achieve data exchange through data sharing, state sharing, events, and other methods. Vandf's data exchange method...

by: isladogs
The next Access Europe meeting will be on Wednesday 7 Feb 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:30 (7.30PM). In this month's session, the creator of the excellent VBE...

by: fareedcanada
Hello, I am trying to split a number based on its count. Suppose I have 121314151617 (12cnt); then the number should be split like 12,13,14,15,16,17, and if 11314151617 (11cnt) then it should be split like...

by: stefan129
Hey forum members, I'm exploring options for SSL certificates for multiple domains. Has anyone had experience with multi-domain SSL certificates? Any recommendations on reliable providers or specific...

by: egorbl4
I downloaded git and wanted to start configuring it, but then this popped up. What is it? What should I do with it? ...

by: davi5007
Hi, basically, I am trying to automate a field named TraceabilityNo into a web page from an Access form. I've got the serial held in the variable strSearchString. How can I get this into the...

by: MeoLessi9
I have VirtualBox installed on Windows 11 and now I would like to install Kali on a virtual machine. However, on the official website, I see two options: "Installer images" and "Virtual machines"....
