Output html file to text file

jimleon (Member)
#1: Oct 7 '09
Hi People,

I have a link in my vba code:

Application.FollowHyperlink "http://www.xxxxxx.co.uk/"

but I don't want to open a browser (or show it minimised). I just need to dump the resulting HTML page to a text file so that I can search it for a string.

I have tried:
Application.FollowHyperlink "http://www.xxxxxx.co.uk/" >c:\temp.txt

but that doesn't work.

Any ideas?

NeoPa (Administrator)
#2: Oct 7 '09

I cannot help much, other than to point you towards the Microsoft Web Browser OLE class. It needs a Reference set up to the Microsoft HTML Reference Library.

Hope this helps.

I'm pretty sure using the .FollowHyperlink won't get you what you want.
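
For anyone who wants to avoid a browser entirely: FollowHyperlink simply hands the URL to the default browser, and VBA has no ">" redirection, which is why the attempt above fails. One alternative, offered here as a rough sketch rather than something suggested in this thread, is to request the page with the late-bound MSXML2.XMLHTTP object and write the response to a file. The names FetchPageText and SavePageToFile are made up for illustration, and the URL and path are the placeholders from the question. This returns the HTML as the server sends it, so anything a page builds later with script will not be there.

```vba
Option Explicit

' Hypothetical helper (not from the thread): fetch a page without a browser window.
' Late bound, so no library reference is needed.
Public Function FetchPageText(ByVal strUrl As String) As String
    Dim objHttp As Object

    Set objHttp = CreateObject("MSXML2.XMLHTTP")
    objHttp.Open "GET", strUrl, False   ' False = wait for the response
    objHttp.send

    ' Return the body only for a normal "200 OK" reply, otherwise an empty string
    If objHttp.Status = 200 Then FetchPageText = objHttp.responseText

    Set objHttp = Nothing
End Function

' Hypothetical usage: dump the page to a text file.
Public Sub SavePageToFile()
    Dim intFile As Integer

    intFile = FreeFile                  ' next unused file number
    Open "c:\temp.txt" For Output As #intFile
    Print #intFile, FetchPageText("http://www.xxxxxx.co.uk/")
    Close #intFile
End Sub
```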

Delerna (Expert)
#3: Oct 16 '09

Here is my attempt. It's the first time I have done this, I like it, and I will use it.
It's something I have contemplated for quite a while but never got around to.
I have kept the code as simple as possible so its workings are obvious.
You will need to add error checking etc.


Step 1
add a reference to the "Microsoft Internet Controls" type library

Step 2
add the code and modify for your situation

    Private ieBrowser As InternetExplorer

    Private Sub Form_Load()
       'Create a browser object
       Set ieBrowser = CreateObject("internetexplorer.application")
       ieBrowser.Navigate "http://www.delerna.com/Index.asp"

       'Wait for the page to load
       dtStartTime = Now
       Do While ieBrowser.readyState <> READYSTATE_COMPLETE
          If DateDiff("s", dtStartTime, Now) > 240 Then Exit Sub
       Loop

       'Get the page contents
       sDocHTML = ieBrowser.Document.documentElement.innerHTML

       'And save it
       Open "c:\Test.txt" For Output As 1
       Print #1, sDocHTML
       Close #1

       'destroy the browser object
       Set ieBrowser = Nothing
    End Sub

Good luck and thanks for influencing me to actually sit down and finally do it.

NeoPa (Administrator)
#4: Oct 16 '09

Quote:

Originally Posted by Delerna

Step 1
add a reference to the "Microsoft Internet Controls" type library

I found a Microsoft HTML Object Library reference, Delerna, but nothing similar to that :S

NeoPa (Administrator)
#5: Oct 16 '09

Like you, I think I'm ready to start playing in this area. I'd be interested in following your code more closely, but I find there are items which are not declared (sDocHTML, etc). Is this because they are declared elsewhere, or do you not have Option Explicit set? I ask this not to criticise, you understand, but if you don't use it as standard, may I suggest you reconsider that approach? I have a short article on the matter here (Require Variable Declaration).

Again, I expect you may be doing this already and just omitted a couple of lines of your code. In that case please just ignore this (but I'm interested in the full code anyway ;)).
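
To make the point about Option Explicit concrete, here is a small sketch (the procedure name is invented for illustration). With the statement at the top of the module, every variable must be declared, so a misspelt name is caught when the code is compiled instead of quietly becoming a new, empty variable.

```vba
Option Explicit   ' must come before any procedure in the module

' Hypothetical example procedure, not part of the code posted in this thread.
Public Sub ShowDeclaredVariable()
    Dim strDocHTML As String

    strDocHTML = "<html></html>"

    ' If the next line used a misspelling such as strDocHTLM, Option Explicit
    ' would stop compilation with "Variable not defined". Without it, VBA would
    ' create a new empty Variant and print an empty value instead.
    Debug.Print strDocHTML
End Sub
```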

Delerna (Expert)
#6: Oct 18 '09

Hi NeoPa
Yes, I normally use Option Explicit, but the code is something that I threw together in about 15 minutes for the sole purpose of answering the posted question. So to answer your question: I did not use Option Explicit, and the posted code is complete.

I hope I can be forgiven for the bad programming practice. I could try to excuse it with the simplicity of the program, but in reality there is no excuse :{
No offence is taken, you are just....right!

As to the "Microsoft Internet Controls" library, that is definitely the one.
I also have a reference to the "Microsoft HTML Object" Library, so I tried removing the reference to "Microsoft Internet Controls" and then checked IntelliSense by typing a space immediately after "As" in

Private ieBrowser As InternetExplorer

and there was no InternetExplorer in the list.
So it's definitely "Microsoft Internet Controls".

The code is actually written from ideas presented on a few sites I found with Google. Much of the code was copy and paste, as evidenced by the variable you point out, sDocHTML. My normal practice would have it as strDocHTML.
Anyway, here is a cleaned-up version of the code:

    Option Compare Database
    Option Explicit

    Private ieBrowser As InternetExplorer

    Private Sub Form_Load()
       Dim strDocHTML As String, dteStartTime As Date
       'Create a browser object
       Set ieBrowser = CreateObject("internetexplorer.application")
       ieBrowser.Navigate "http://www.delerna.com/Index.asp"

       'Wait for the page to load. Skip the save if loading the page takes too long
       dteStartTime = Now
       Do While ieBrowser.readyState <> READYSTATE_COMPLETE
          If DateDiff("s", dteStartTime, Now) > 240 Then GoTo CleanUp
          DoEvents   'yield so Access stays responsive while waiting
       Loop

       'Get the page contents
       strDocHTML = ieBrowser.Document.documentElement.innerHTML

       'And save it
       Open "c:\Test.txt" For Output As #1
       Print #1, strDocHTML
       Close #1

    CleanUp:
       'Close the hidden browser process, then destroy the browser object
       ieBrowser.Quit
       Set ieBrowser = Nothing
    End Sub

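The original question also wanted to search the saved page for a string. A minimal sketch of that last step, assuming the file written above exists and fits comfortably in memory. FileContainsText is a hypothetical name, and the search text in the usage line is only a placeholder.

```vba
Option Explicit

' Hypothetical helper: True if strFind occurs anywhere in the file at strPath.
Public Function FileContainsText(ByVal strPath As String, _
                                 ByVal strFind As String) As Boolean
    Dim intFile As Integer
    Dim strContents As String

    intFile = FreeFile                      ' next unused file number
    Open strPath For Binary Access Read As #intFile
    strContents = Space$(LOF(intFile))      ' size the buffer to the file length
    Get #intFile, , strContents             ' read the whole file in one call
    Close #intFile

    ' vbTextCompare makes the match case-insensitive
    FileContainsText = (InStr(1, strContents, strFind, vbTextCompare) > 0)
End Function
```

From the Immediate window it could be tried with: Debug.Print FileContainsText("c:\Test.txt", "some text")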

NeoPa (Administrator)
#7: Oct 18 '09

Perfectly timed Delerna :)

I was just answering a thread where the OP wanted information about the public facing IP address they were published as (I know. Don't even ask). I tried to explain why this was less straightforward than they imagined but ...

Anyway, I wanted to post a link to this thread but I was struggling to find it again. At this point you posted. Nice one!

FYI: The other thread is Anyone out there have a clean Get_External_IP_Address function?.

Delerna (Expert)
#8: Oct 18 '09

Where to get "Microsoft Internet Controls"
I had the impression from the web sites that I visited that it was included with Access. I have Access 2003.
I rarely use Access these days; we use ASP and HTML as front ends to SQL Server. That's part of the reason I come here to answer questions...it maintains my Access skillset....Access is a great tool and it also happens to be where I learned much of what I know.

Anyway, if it's not part of Access, I also have these installed on my computer:
.NET Framework SDK v2.0
Visual Web Developer 2008 Express
Visual Studio Pro 2005

Maybe it came from one of those?

NeoPa (Administrator)
#9: Oct 18 '09

Quote:

Originally Posted by Delerna

As to the "Microsoft Internet Controls" library, that is definitely the one.
I also have a reference to the "Microsoft HTML Object" Library, so I tried removing the reference to "Microsoft Internet Controls" and then checked IntelliSense by typing a space immediately after "As" in

Private ieBrowser As InternetExplorer

and there was no InternetExplorer in the list.
So it's definitely "Microsoft Internet Controls".

I did some searching and it seems that the Internet Client SDK is required for that reference.
Quote:

Originally Posted by Microsoft Internet Controls Life Saver

Microsoft Internet Controls Life Saver
The Internet Client SDK can be downloaded from http://www.microsoft.com/ie/ie50

Thanks for your clearly explained answer. That has helped me find what I needed to proceed on this :) It should also help the OP of the linked thread. Bonus!
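
If setting the reference proves awkward, late binding is one way to sidestep it. The sketch below is an assumption-laden variant of Delerna's code, not a replacement for it: the browser is declared As Object, and READYSTATE_COMPLETE is defined locally as 4 (the same value the VBScript version later in this thread uses). GetPageHtml and its timeout parameter are hypothetical names, and Internet Explorer still has to be installed on the machine.

```vba
Option Explicit

' Defined locally because, without the reference, the library's constant is unavailable
Private Const READYSTATE_COMPLETE As Long = 4

' Hypothetical helper: returns the page HTML, or an empty string on timeout.
Public Function GetPageHtml(ByVal strUrl As String, _
                            Optional ByVal lngTimeoutSecs As Long = 240) As String
    Dim objIE As Object          ' late bound: resolved at run time
    Dim dteStartTime As Date

    Set objIE = CreateObject("InternetExplorer.Application")
    objIE.Navigate strUrl

    ' Wait until the page has loaded or the timeout has passed
    dteStartTime = Now
    Do While objIE.Busy Or objIE.readyState <> READYSTATE_COMPLETE
        If DateDiff("s", dteStartTime, Now) > lngTimeoutSecs Then Exit Do
        DoEvents                 ' keep the host application responsive
    Loop

    If objIE.readyState = READYSTATE_COMPLETE Then
        GetPageHtml = objIE.Document.documentElement.innerHTML
    End If

    objIE.Quit                   ' close the hidden browser process
    Set objIE = Nothing
End Function
```

The trade-off is that late binding gives up IntelliSense and compile-time checking on the browser object in exchange for not depending on the library being registered.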

Delerna (Expert)
#10: 5 Days Ago

I had a need to download the content of a web page on a regular basis so that I could monitor the current state of some facts and figures. So I wrote a VBScript version of the above so that I could schedule it. I don't know if anyone is interested, but I thought I would post it here.
The conversion was not very difficult, so you probably could have figured it out anyway.

    Const READYSTATE_COMPLETE = 4

    Dim ieBrowser
    'Create a browser object
    Set ieBrowser = CreateObject("internetexplorer.application")
    'Call the Sub without parentheses: VBScript raises
    '"Cannot use parentheses when calling a Sub" for a multi-argument call
    SavePageContent "c:\scripts\Test.txt", "page URL"
    ieBrowser.Quit            'close the hidden browser process
    Set ieBrowser = Nothing   'destroy the browser object
    MsgBox "Done"


    Sub SavePageContent(Pth, Pge)
       Dim dteStartTime

       ieBrowser.Navigate Pge

       'Wait for the page to load. Exit sub, doing nothing, if loading the page takes too long
       dteStartTime = Now
       Do While ieBrowser.readyState <> READYSTATE_COMPLETE
          If DateDiff("s", dteStartTime, Now) > 1000 Then Exit Sub
          WScript.Sleep 100   'pause briefly instead of spinning the CPU
       Loop

       'And save it
       Dim fso, MyFile
       Set fso = CreateObject("Scripting.FileSystemObject")
       Set MyFile = fso.CreateTextFile(Pth)
       MyFile.WriteLine ieBrowser.Document.documentElement.innerHTML
       MyFile.Close
       Set MyFile = Nothing
       Set fso = Nothing
    End Sub