heavy problem with HTMLDocument

Hi, I have a problem that may be easy to resolve, but I can't
find a solution:

I want to parse HTML files, so I first want to fetch one from a
URL, and I do it like this:
Dim objMSHTML As New mshtml.HTMLDocument()
Dim objDocument As mshtml.HTMLDocument
objDocument = objMSHTML.createDocumentFromUrl("http://www.google.fr", vbNullString)

Normally this should work and I could then parse the HTML
code, but in fact I get this error:

"An unhandled exception of type
'System.NullReferenceException' occurred in mscorlib.dll
Additional information: Object reference not set to an
instance of an object."

(Translated; my VB version is French.)

Any idea?
PS: I think this code works with VB6...
Nov 20 '05 #1
Use the built-in .NET networking objects. See:

http://tinyurl.com/98ey

--
Patrick Steele
Microsoft .NET MVP
http://weblogs.asp.net/psteele
Nov 20 '05 #2
Thank you, Patrick.
I've just read the article, but it doesn't seem that it can
help me parse the HTML. Using mshtml.HTMLDocument, I thought
I could use the "links" property, which is supposed to give
access to the links in the HTML...

Nov 20 '05 #3
Cor
Pierre,
I've never seen this method, so I am curious whether it works, but that is
not something I can check quickly.

I would advise you to take a look at the WebBrowser control; with it you can
navigate to a URL.
(It uses Internet Explorer 6, don't ask me how.)

Then, with the DocumentComplete event of the WebBrowser, you can get the
documents, which conform to the DOM.

When there are frames, there is a document for every frame.
There is also a NavigateComplete event, but with that you only get the last
page downloaded.

That's why I find the method you use strange, but I did see it in the
documentation too.

I hope I have pointed you in the right direction.
It is too much to give a quick example.

And the WebBrowser is only one of the methods I think you can use, but it is
the one I use for these things at the moment.
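
A rough sketch of the WebBrowser route described above, under some loud assumptions: it uses the managed System.Windows.Forms.WebBrowser control that ships with later .NET versions (2.0 and up) rather than the ActiveX control, so the event is named DocumentCompleted there, and the form name LinkListerForm is made up for illustration.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Windows.Forms

' Hypothetical form: hosts a WebBrowser and lists the links of the loaded page.
Public Class LinkListerForm
    Inherits Form

    Private WithEvents browser As New WebBrowser()

    Public Sub New()
        browser.Dock = DockStyle.Fill
        Controls.Add(browser)
        ' Navigation is asynchronous; the DOM is only usable once the event fires.
        browser.Navigate("http://www.google.fr")
    End Sub

    Private Sub browser_DocumentCompleted(ByVal sender As Object, _
            ByVal e As WebBrowserDocumentCompletedEventArgs) _
            Handles browser.DocumentCompleted
        ' The event can fire once per frame; skip everything but the top-level page.
        If Not e.Url.Equals(browser.Url) Then Return

        For Each link As HtmlElement In browser.Document.Links
            Debug.WriteLine(link.GetAttribute("href"))
        Next
    End Sub

    <STAThread()> _
    Public Shared Sub Main()
        ' The control needs an STA thread and a running message loop.
        Application.Run(New LinkListerForm())
    End Sub
End Class
```

The frame check is one possible way to handle the "one document per frame" behaviour; depending on redirects it may need a looser comparison.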

I hope it helps you a little bit.
Cor

Nov 20 '05 #4
Sorry -- forgot about your parsing issue.

Perhaps you could get the raw HTML using the .NET WebRequest and then
feed that into the mshtml.HTMLDocument object. I've never used that
object before so I'm not sure if you can load it with your own HTML.
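
A minimal sketch of that idea, not a confirmed recipe: it downloads the page with WebRequest and then pushes the markup into an mshtml document through IHTMLDocument2.write. It assumes a reference to the mshtml interop assembly; the module name FetchThenParse is hypothetical, and whether the parsed DOM is complete immediately after close() is an assumption worth verifying.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.IO
Imports System.Net

' Hypothetical module name; requires a reference to the mshtml interop assembly.
Module FetchThenParse
    <STAThread()> _
    Sub Main()
        ' Step 1: download the raw HTML with the built-in .NET networking classes.
        Dim html As String = ""
        Dim request As WebRequest = WebRequest.Create("http://www.google.fr")
        Dim response As WebResponse = request.GetResponse()
        Try
            ' StreamReader assumes UTF-8 here; the real page encoding may differ.
            Dim reader As New StreamReader(response.GetResponseStream())
            html = reader.ReadToEnd()
            reader.Close()
        Finally
            response.Close()
        End Try

        ' Step 2: feed the markup into an MSHTML document instead of loading by URL.
        Dim doc As mshtml.IHTMLDocument2 = _
            DirectCast(New mshtml.HTMLDocument(), mshtml.IHTMLDocument2)
        doc.write(html)
        doc.close()

        ' Step 3: the DOM, including the links collection, can now be inspected.
        Debug.WriteLine(doc.title)
        Debug.WriteLine(doc.links.length)
    End Sub
End Module
```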

--
Patrick Steele
Microsoft .NET MVP
http://weblogs.asp.net/psteele
Nov 20 '05 #5
Hi Pierre

The problem is that although you create a new mshtml.HTMLDocument, it is not
being initialised.

Try the following:

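<code>
' Requires references to the mshtml interop assembly and System.Windows.Forms.
Dim objMSHTML As New mshtml.HTMLDocument
Dim objDocument As mshtml.IHTMLDocument2
Dim ips As IPersistStreamInit

' Initialise the newly created document before using it.
ips = DirectCast(objMSHTML, IPersistStreamInit)
ips.InitNew()

objDocument = objMSHTML.createDocumentFromUrl("http://www.google.fr", _
    vbNullString)

' The document loads asynchronously; pump messages until it is ready.
Do Until objDocument.readyState = "complete"
    Application.DoEvents()
Loop

Debug.WriteLine(objDocument.body.outerHTML)
</code>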

At the end of this you can access the DOM. Note that you need to define the
IPersistStreamInit interface.
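
As an illustration of accessing the DOM at that point, here is a small sketch that walks the "links" collection Pierre asked about. The helper name DumpLinks and the module name are hypothetical; it only assumes a document that has already reached the "complete" state.

```vbnet
Imports System
Imports System.Diagnostics

' Hypothetical helper module; requires a reference to the mshtml interop assembly.
Module LinkDump
    ' Writes the href of every anchor in an already loaded document.
    Public Sub DumpLinks(ByVal doc As mshtml.IHTMLDocument2)
        For Each element As Object In doc.links
            ' The collection can also hold AREA elements; only anchors are handled here.
            If TypeOf element Is mshtml.IHTMLAnchorElement Then
                Dim anchor As mshtml.IHTMLAnchorElement = _
                    DirectCast(element, mshtml.IHTMLAnchorElement)
                Debug.WriteLine(anchor.href)
            End If
        Next
    End Sub
End Module
```

It would be called as DumpLinks(objDocument) once the readyState loop has finished.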

HTH

Charles
Nov 20 '05 #6
Pierre

In case you don't have it, here is the IPersistStreamInit interface
definition

<code>
Imports System
Imports System.Runtime.InteropServices

' COM IPersistStreamInit, declared so that InitNew() can be called on the document.
<ComVisible(True), ComImport(), _
 Guid("7FD52380-4E07-101B-AE2D-08002B2EC713"), _
 InterfaceTypeAttribute(ComInterfaceType.InterfaceIsIUnknown)> _
Public Interface IPersistStreamInit
    ' IPersist interface
    Sub GetClassID(ByRef pClassID As Guid)

    ' IPersistStreamInit members; the declaration order must match the COM interface.
    <PreserveSig()> Function IsDirty() As Integer
    <PreserveSig()> Function Load(ByVal pstm As UCOMIStream) As Integer
    <PreserveSig()> Function Save(ByVal pstm As UCOMIStream, _
        ByVal fClearDirty As Boolean) As Integer
    <PreserveSig()> Function GetSizeMax(<InAttribute(), Out(), _
        MarshalAs(UnmanagedType.U8)> ByRef pcbSize As Long) As Integer
    <PreserveSig()> Function InitNew() As Integer
End Interface
</code>

HTH

Charles
Nov 20 '05 #7
Cor
Charles,
Thanks, saves me a lot of time looking this up.
Cor
Nov 20 '05 #8
Thanks a lot, it works perfectly :)
P.
Nov 20 '05 #9
