Extracting html source from a web page...

I am trying to get at the source of a web page. Looking at the document's innerHTML property only tells part of the story. In IE, right-clicking on different parts of the page and choosing View Source gives me different results.

The source I need is contained inside IFRAME tags (which contain references to JSP pages). The HTML content isn't available when I look at the innerHTML of the document returned in the DocumentComplete event of the WebBrowser control. My question is basically: how do I get the HTML generated by the JSP page in the IFRAME? Better yet, how do I get the complete HTML as it is rendered by IE?

A snippet of VB.Net code would be much appreciated, if possible.

Many thanks in advance

-=NaJ=-
--
forum member
http://www.visual-basic-data-mining.net/forum
Nov 18 '05 #1
If you want to inspect the HTML source code manually, I'd recommend attaching to the IE process using VS.NET.

As for getting the HTML source via code, I think you should look at the IObjectWithSite interface and related ones. Have a look at http://weblogs.asp.net/stevencohn/articles/60948.aspx for more information.

HTH

Konrad
Nov 18 '05 #2
ne**@visual-basic-data-mining.net wrote:
> I am trying to get at the source of a web page. Looking at the
> innerHTML element is only part of the story. In IE, right-clicking
> on various different parts of the page gives me different results
> when I click on view_source.

Because you're looking at multiple sources...

> The source I need is contained inside IFRAME tags (which contain
> references to jsp pages)... The html content isn't available when I
> look at the innerHTML of the document returned in the
> DocumentComplete event of the WebBrowser control. My question is
> basically, how do I get the html generated by the jsp page in the
> IFRAME?

Simply download the contents referenced by the IFRAME's SRC attribute using
System.Net.WebClient or System.Net.WebRequest.

> Better yet, how do I get the complete html as it is rendered
> by IE?

There's no such thing. You're basically looking at two distinct HTML
documents at the same time.
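
A minimal sketch of that approach in C#, not a definitive implementation: fetch the parent page, pick out each IFRAME's SRC, resolve it against the parent URL, and download every framed document separately. The class and method names (IframeSource, FindFrameUrls, DownloadFrames) are made up for illustration, and the regular expression is only a rough heuristic, not a real HTML parser (it does not decode entities such as &amp;amp; in the SRC value). It also assumes the JSP pages can be requested directly; if they depend on session cookies held by IE, a plain WebClient request will probably not return the same content.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

static class IframeSource
{
    // Matches src="..." or src='...' inside an <iframe ...> tag, ignoring case.
    // Assumption: a regex is good enough for simple pages; it is not an HTML parser.
    static readonly Regex IframeSrc = new Regex(
        @"<iframe\b[^>]*?\bsrc\s*=\s*[""']([^""']*)[""']",
        RegexOptions.IgnoreCase);

    // Returns the absolute URL of every IFRAME found in parentHtml,
    // resolving relative SRC values against the parent page's URL.
    public static List<Uri> FindFrameUrls(string parentHtml, Uri parentUrl)
    {
        var urls = new List<Uri>();
        foreach (Match m in IframeSrc.Matches(parentHtml))
        {
            urls.Add(new Uri(parentUrl, m.Groups[1].Value));
        }
        return urls;
    }

    // Downloads the parent page, then each framed document, as separate strings.
    public static Dictionary<Uri, string> DownloadFrames(Uri parentUrl)
    {
        var result = new Dictionary<Uri, string>();
        using (var client = new WebClient())
        {
            string parentHtml = client.DownloadString(parentUrl);
            foreach (Uri frameUrl in FindFrameUrls(parentHtml, parentUrl))
            {
                result[frameUrl] = client.DownloadString(frameUrl);
            }
        }
        return result;
    }
}
```

Calling IframeSource.DownloadFrames(new Uri("http://localhost/Project/test.htm")) would return one entry per IFRAME, keyed by its absolute URL. Each value is a separate HTML document, which fits the point above that there is no single combined source to retrieve.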

Cheers,

--
Joerg Jooss
jo*********@gmx.net
Nov 18 '05 #3
I did that with one of my pages, like this:

[code]
// Requires: using System; using System.IO; using System.Net; using System.Text;

// Let's say this string contains the URL of the file.
string html = "http://localhost/Project/test.htm";

// Remove surrounding white space before checking the extension.
html = html.Trim();

// Check the extension to see whether the file is HTML or not.
if (html.EndsWith(".htm") || html.EndsWith(".html"))
{
    // Construct the StringBuilder that collects the page content.
    StringBuilder sBuilder = new StringBuilder();
    string temp;

    try
    {
        // Request
        HttpWebRequest webrequest = (HttpWebRequest)WebRequest.Create(html);

        // Get the response and read the content of the HTML file from it.
        using (HttpWebResponse webresponse = (HttpWebResponse)webrequest.GetResponse())
        using (StreamReader webstream = new StreamReader(webresponse.GetResponseStream(), Encoding.Default))
        {
            // Loop until end of stream.
            while ((temp = webstream.ReadLine()) != null)
            {
                sBuilder.Append(temp + "\r\n");
            }
        }

        // Save the content in a temporary variable.
        string HtmlContent = sBuilder.ToString();
    }
    catch (WebException ex)
    {
        // The request failed (network error or HTTP error status); report it.
        Console.Error.WriteLine("Download failed: " + ex.Message);
    }
}
[/code]

Hope that's what you need.

Regards
Nov 18 '05 #4
