
Extracting html source from a web page...

I am trying to get at the source of a web page. Looking at the innerHTML property is only part of the story. In IE, right-clicking on different parts of the page gives me different results when I choose View Source.

The source I need is contained inside IFRAME tags (which contain references to jsp pages)... The html content isn't available when I look at the innerHTML of the document returned in the DocumentComplete event of the WebBrowser control. My question is basically, how do I get the html generated by the jsp page in the IFRAME? Better yet, how do I get the complete html as it is rendered by IE?

A snippet of VB.Net code would be much appreciated, if possible.

Many thanks in advance

-=NaJ=-
--
forum member
http://www.visual-basic-data-mining.net/forum
Nov 18 '05 #1
3 Replies


If you want to see the HTML source code 'manually', I'd recommend attaching to the IE process using VS.NET.

As for getting the HTML source via code, I think you should look at the IObjectWithSite interface and related ones. Have a look at http://weblogs.asp.net/stevencohn/articles/60948.aspx for more information.

HTH

Konrad
Nov 18 '05 #2

ne**@visual-basic-data-mining.net wrote:
> I am trying to get at the source of a web page. Looking at the
> innerHTML element is only part of the story. In IE, right-clicking
> on various different parts of the page gives me different results
> when I click on View Source.

Because you're looking at multiple sources...

> The source I need is contained inside IFRAME tags (which contain
> references to jsp pages)... The html content isn't available when I
> look at the innerHTML of the document returned in the
> DocumentComplete event of the WebBrowser control. My question is
> basically, how do I get the html generated by the jsp page in the
> IFRAME?

Simply download the contents referenced by the IFRAME's SRC attribute using
System.Net.WebClient or System.Net.WebRequest.

> Better yet, how do I get the complete html as it is rendered
> by IE?

There's no such thing. You're basically looking at two distinct HTML
documents at the same time.

Cheers,

--
Joerg Jooss
jo*********@gmx.net
Nov 18 '05 #3
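
The suggestion in reply #3, downloading whatever each IFRAME's SRC attribute points at, could be sketched roughly as follows. This is only an illustration under stated assumptions: the class and method names (IFrameSource, ExtractIFrameSources, DownloadFrames) are made up for this example, the regular expression is a deliberately simple pattern that only handles quoted SRC values rather than a full HTML parser, and the code assumes .NET 2.0 or later for WebClient.DownloadString and generic collections.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class IFrameSource
{
    // Simplified pattern: captures a quoted src value inside an <iframe ...> tag.
    // Unquoted attribute values and unusual markup are not handled.
    private static readonly Regex IFrameSrc = new Regex(
        @"<iframe\b[^>]*?\bsrc\s*=\s*[""']([^""']*)[""']",
        RegexOptions.IgnoreCase);

    // Returns the absolute URL of every IFRAME found in the outer page's HTML.
    public static List<Uri> ExtractIFrameSources(string outerHtml, Uri pageUri)
    {
        List<Uri> result = new List<Uri>();
        foreach (Match m in IFrameSrc.Matches(outerHtml))
        {
            // Relative SRC values are resolved against the outer page's URL.
            result.Add(new Uri(pageUri, m.Groups[1].Value));
        }
        return result;
    }

    // Downloads each framed document as a separate HTTP request.
    public static Dictionary<Uri, string> DownloadFrames(string outerHtml, Uri pageUri)
    {
        Dictionary<Uri, string> pages = new Dictionary<Uri, string>();
        using (WebClient client = new WebClient())
        {
            foreach (Uri frameUri in ExtractIFrameSources(outerHtml, pageUri))
            {
                pages[frameUri] = client.DownloadString(frameUri);
            }
        }
        return pages;
    }
}
```

The outer HTML would come from the document you already have in DocumentComplete, and pageUri from the URL that was navigated to. One caveat: a separate download does not share IE's session or cookies, so a JSP page that depends on a login may return different HTML than what the browser shows.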

I did that with one of my pages, like this:

[code]
using System;
using System.IO;
using System.Net;
using System.Text;

// We check the extension of the file to see whether it is HTML or not.
// Let's say we have this string containing the file name.
string html = "http://localhost/Project/test.htm";

// Remove white space (done before the extension check so the check sees the trimmed value)
html = html.Trim();

if (html.EndsWith(".htm") || html.EndsWith(".html"))
{
    // Construct the string builder object that collects the page
    StringBuilder sBuilder = new StringBuilder();
    string temp;

    try
    {
        // Request
        HttpWebRequest webrequest = (HttpWebRequest)WebRequest.Create(html);

        // Get the response; the using blocks close the response and the reader
        using (HttpWebResponse webresponse = (HttpWebResponse)webrequest.GetResponse())
        using (StreamReader webstream = new StreamReader(webresponse.GetResponseStream(), Encoding.Default))
        {
            // Loop until end of file
            while ((temp = webstream.ReadLine()) != null)
            {
                sBuilder.Append(temp + "\r\n");
            }
        }

        // Save the content in a temporary variable
        string HtmlContent = sBuilder.ToString();
    }
    catch (WebException ex)
    {
        // The original post is cut off before its catch block;
        // this handler is an assumed placeholder.
        Console.WriteLine(ex.Message);
    }
}
[/code]

Hope that's what you need.

Regards
Nov 18 '05 #4
