Bytes | Software Development & Data Engineering Community

WebBrowser and Returning the raw HTML

OK, I'm making a fairly simple application. It contains two
web browsers; the top one is used to view a
website (i.e. one you have created). Every time you load
a page, the HTML that was received is sent to the
http://validator.w3.org website to validate your HTML /
XHTML.

So far I've got everything to work, even the part where
the HTML is posted to the w3.org website.

But all of the following commands (Browser1 is the main
WebBrowser control) produce a form of HTML for the
document, except that all the tags get converted to uppercase and
parts of the document, such as the DOCTYPE, go missing:

Browser1.Document.ToString()
Browser1.Document.documentElement.outerHTML
Browser1.Document.documentElement.innerHTML
Browser1.Document.Body.outerHTML
Browser1.Document.Body.innerHTML
Browser1.Document.All(0).outerHTML
Browser1.Document.All(0).innerHTML
Browser1.Document.All(1).outerHTML
Browser1.Document.All(1).innerHTML
Browser1.Document.All(2).outerHTML
Browser1.Document.All(2).innerHTML

NOTE: The HTML sent to the w3.org website must be exactly
the same as what the server sends otherwise what's the
point in validating it?

Finally, because it will be used on interactive websites
(with a user login), you can't use controls such as
Inet to fetch the HTML separately: the user (main browser)
will make a request to the server (which may delete a
record), then Inet or Winsock (etc.) will make a second
request, which will return a different page
(saying you can't delete the record).
Nov 20 '05 #1
Hi Craig

Because the DOCTYPE declaration is outside the main document element, it is not
included when you retrieve the inner and outer HTML. To get the entire file you
will need to use the IPersistStreamInit interface, e.g.

<interface>
Imports System.Runtime.InteropServices

' IPersistStreamInit interface
<ComVisible(True), ComImport(), _
Guid("7FD52380-4E07-101B-AE2D-08002B2EC713"), _
InterfaceTypeAttribute(ComInterfaceType.InterfaceIsIUnknown)> _
Public Interface IPersistStreamInit
    Sub GetClassID(ByRef pClassID As Guid)

    <PreserveSig()> Function IsDirty() As Integer
    <PreserveSig()> Function Load(ByVal pstm As UCOMIStream) As Integer
    <PreserveSig()> Function Save(ByVal pstm As UCOMIStream, _
        ByVal fClearDirty As Boolean) As Integer
    <PreserveSig()> Function GetSizeMax(<InAttribute(), Out(), _
        MarshalAs(UnmanagedType.U8)> ByRef pcbSize As Long) As Integer
    <PreserveSig()> Function InitNew() As Integer
End Interface
</interface>

<code>
Dim ips As IPersistStreamInit

ips = DirectCast(Browser1.Document, IPersistStreamInit)

ips.Save(strm, False)
</code>

This will save the complete HTML to a stream, which you can turn into a
string.

Regarding the conversion to uppercase, is this actually a problem? The
change of case should not affect the validity of the parsing.

There are also two newsgroups that may give further help:

microsoft.public.inetsdk.programming.mshtml_hosting
microsoft.public.inetsdk.programming.webbrowser_ctl

HTH

Charles

Nov 20 '05 #2
Thank you for your quick reply.

But is that VB code? I've been using VB5/6 for several
years and that looks slightly C-like. This project is
being written in VB.NET, but I've only just upgraded and am
finding some of these new methods a little strange.

Also, re the tags being changed to uppercase: the reason
I mentioned it was that it shows the HTML
document is being changed, probably into a form that the
browser can easily understand (and is probably strict XML
even if the input wasn't XML based).

Anyway, thanks for giving me something else to try.

Craig
Nov 20 '05 #3
Got it, you put all the

<interface></interface>

before the "Public Class Form1" bit (i.e. at the very top
of the form's code), then the

<code></code>

in the function which returns the HTML code. That
method doesn't bring up any errors apart from what "strm"
should be declared as - I've never used a stream before.

But thanks again - this is the most progress I've made in
the past 2 days!

Nov 20 '05 #4
Hi Craig

Yes, sorry about that. It's just a habit I have got into to show where code
and stuff begins and ends. Add the following for the stream handling:

<code>
<DllImport("OLE32.DLL")> _
Public Shared Sub CreateStreamOnHGlobal(ByVal hGlobal As IntPtr, _
    ByVal fDelete As Boolean, ByRef stm As UCOMIStream)
    ' LEAVE THIS BLANK - PLACEHOLDER
End Sub

<DllImport("OLE32.DLL")> _
Public Shared Sub GetHGlobalFromStream(ByVal stm As UCOMIStream, _
    ByRef hGlobal As IntPtr)
    ' LEAVE THIS BLANK - PLACEHOLDER
End Sub

Private Function GetStream(ByVal size As Integer) As UCOMIStream

    Dim iptr As IntPtr
    Dim strm As UCOMIStream

    iptr = Marshal.AllocHGlobal(size)

    ' AllocHGlobal does not clear the memory, so zero it; otherwise the
    ' string read back later may pick up garbage after the HTML
    For i As Integer = 0 To size - 1
        Marshal.WriteByte(iptr, i, 0)
    Next i

    CreateStreamOnHGlobal(iptr, True, strm)

    Return strm

End Function

Private Function StreamToString(ByVal strm As UCOMIStream) As String

    Dim iptr As IntPtr
    Dim s As String

    GetHGlobalFromStream(strm, iptr)
    ' Reads up to the first zero byte, so the buffer must be larger than the HTML
    s = Marshal.PtrToStringAnsi(iptr)

    Return s

End Function
</code>

<code>
Dim strm As UCOMIStream
Dim s As String

' Allocate a reasonably high value!
strm = GetStream(2048)

' Save HTML and convert to a string
ips.Save(strm, False)
s = StreamToString(strm)
</code>

The code above should allow you to get the full HTML. The only issue
with this is the allocation of the stream. IPersistStreamInit.GetSizeMax()
should return a value indicating the size of the stream required, but it
always returns zero. The best way, therefore, is to read the stream a bit at
a time until there is nothing left to read, but for simplicity I have just
allocated a stream that should be big enough to take it all in one go. You
can make it bigger, of course, if you need to.
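The bit-at-a-time approach mentioned above might look something like the sketch below. It is untested against a live WebBrowser and assumes the .NET Framework (where UCOMIStream exists) and the same OLE32 CreateStreamOnHGlobal function, redeclared here so the block stands alone. Passing IntPtr.Zero as the hGlobal lets OLE allocate the memory and grow it as Save writes, so no size guess is needed; the stream is rewound and read in 4 KB chunks until Read reports zero bytes. Decoding with Encoding.Default (the ANSI code page) mirrors the PtrToStringAnsi call above and is an assumption; a page saved in another charset would need that encoding instead. The module and function names are hypothetical.

```vbnet
Imports System
Imports System.IO
Imports System.Runtime.InteropServices
Imports System.Text

' Hypothetical helper module: reads an entire UCOMIStream in chunks
' instead of relying on a fixed-size, pre-allocated buffer.
Public Module ChunkedStreamReader

    <DllImport("OLE32.DLL")> _
    Public Sub CreateStreamOnHGlobal(ByVal hGlobal As IntPtr, _
        ByVal fDeleteOnRelease As Boolean, ByRef stm As UCOMIStream)
    End Sub

    ' Creates an empty stream; IntPtr.Zero makes OLE allocate the memory
    ' itself and enlarge it as data is written.
    Public Function NewGrowableStream() As UCOMIStream
        Dim strm As UCOMIStream = Nothing
        CreateStreamOnHGlobal(IntPtr.Zero, True, strm)
        Return strm
    End Function

    ' Rewinds the stream and copies it 4 KB at a time until Read
    ' reports that no more bytes were returned.
    Public Function ReadWholeStream(ByVal strm As UCOMIStream) As String
        Const STREAM_SEEK_SET As Integer = 0
        Dim buffer(4095) As Byte
        Dim pcbRead As IntPtr = Marshal.AllocHGlobal(4) ' receives the byte count
        Try
            strm.Seek(0, STREAM_SEEK_SET, IntPtr.Zero)
            Using collected As New MemoryStream()
                Do
                    strm.Read(buffer, buffer.Length, pcbRead)
                    Dim n As Integer = Marshal.ReadInt32(pcbRead)
                    If n = 0 Then Exit Do
                    collected.Write(buffer, 0, n)
                Loop
                ' Assumption: the saved bytes are in the ANSI code page,
                ' matching PtrToStringAnsi in the original code
                Return Encoding.Default.GetString(collected.ToArray())
            End Using
        Finally
            Marshal.FreeHGlobal(pcbRead)
        End Try
    End Function

End Module
```

In place of GetStream/StreamToString you would create the stream with NewGrowableStream(), call ips.Save(strm, False), then pass strm to ReadWholeStream.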

HTH

Charles
Nov 20 '05 #5
You are a f________ god!

Thank you, it works perfectly!!!
Nov 20 '05 #6
Hello,

I don't really understand why you use the WebBrowser control to download the
web page. Why not use, for example, the 'WebRequest' class?
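For pages that can safely be fetched a second time, the WebRequest route might look like the minimal sketch below. Because the body is never parsed into a DOM, the DOCTYPE and the original tag case come through untouched. The function name FetchRawHtml is hypothetical, and decoding as UTF-8 unless a byte order mark says otherwise is an assumption; a page served in another charset would need a different Encoding. As the original question points out, this issues a separate request outside the browser's session, so it does not suit pages whose result depends on the login or changes per request.

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Text

Public Module RawHtmlFetcher

    ' Hypothetical helper: downloads a URL with WebRequest and returns the
    ' body as text, without any DOM round-trip.
    Public Function FetchRawHtml(ByVal url As String) As String
        Dim request As WebRequest = WebRequest.Create(url)
        Using response As WebResponse = request.GetResponse()
            ' Assumption: UTF-8 unless the body starts with a byte order mark
            Using reader As New StreamReader(response.GetResponseStream(), _
                                             Encoding.UTF8, True)
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function

End Module
```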

--
Herfried K. Wagner
MVP · VB Classic, VB.NET
http://www.mvps.org/dotnet
Nov 20 '05 #7
>I don't really understand why you use the WebBrowser control to
>download the web page. Why not use, for example, the 'WebRequest' class?

Because I'm fairly new to VB.NET and wanted a simple
application to create - well, what I thought might be
simple.

Also I've used the WebBrowser control before and it was a
simple way to add a browser to the application where the
user could navigate in exactly the same way as in IE.
Nov 20 '05 #8
Charles,
Just a question: I have seen that you always use mshtml.IHTMLDocument2,
whereas I use mshtml.HTMLDocument.
My impression is that with the latter I can access all <tags>, including
the src, innerText and innerHTML etc., per frame page.

What am I missing?
Cor
Nov 20 '05 #9
Hi Cor

Long time no speak.

The simple answer is speed. Try the following on an initialised WebBrowser
control and you may be surprised:

<code>
Dim doc As mshtml.HTMLDocument
Dim doc2 As mshtml.IHTMLDocument2
Dim elem As mshtml.IHTMLElement

Dim dt As Date

MsgBox("Start")

dt = Now

For i As Integer = 1 To 1000
    doc = DirectCast(AxWebBrowser1.Document, mshtml.HTMLDocument)
    elem = doc.createElement("INPUT")
Next i

MsgBox(Now.Subtract(dt).ToString)

dt = Now

For i As Integer = 1 To 1000
    doc2 = DirectCast(AxWebBrowser1.Document, mshtml.IHTMLDocument2)
    elem = doc2.createElement("INPUT")
Next i

MsgBox(Now.Subtract(dt).ToString)
</code>

I used mshtml.HTMLDocument once in the earlier post because New doesn't work
on interfaces, of course. But otherwise I use the interfaces. It means a bit
more code to cast to the correct one all the time [long live Option Strict
On], but it's worth it in performance.
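Now is a wall-clock value that can advance in coarse steps of several milliseconds, so for a comparison like the one above System.Diagnostics.Stopwatch tends to give steadier numbers. A minimal sketch of a reusable timing helper follows; the names Timing and TimeIterations are hypothetical, and passing the loop body as a lambda assumes VB 2010 or later.

```vbnet
Imports System
Imports System.Diagnostics

Public Module Timing

    ' Hypothetical helper: runs body the given number of times and
    ' returns the elapsed time measured with a high-resolution Stopwatch.
    Public Function TimeIterations(ByVal iterations As Integer, _
                                   ByVal body As Action) As TimeSpan
        Dim sw As Stopwatch = Stopwatch.StartNew()
        For i As Integer = 1 To iterations
            body()
        Next i
        sw.Stop()
        Return sw.Elapsed
    End Function

End Module
```

For the interface case above, the call would be along the lines of TimeIterations(1000, Sub() elem = DirectCast(AxWebBrowser1.Document, mshtml.IHTMLDocument2).createElement("INPUT")), keeping the cast inside the timed body as in the original loops.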

Regards

Charles
Nov 20 '05 #10
