How do I load an HTML page (via URL) and parse the DOM in a Console
Application?
I've successfully done all this in a Windows Application by using the
WebBrowser control, calling the Navigate method on the specified URL, and
then, within the DocumentComplete event, parsing the HTML page using
mshtml.HTMLDocument.
I'm writing it as a console app because I don't need to display the HTML,
just search for a specific tag and retrieve an href value from it.
Thanks for any help on this.
I found the following thread (note that the * at the end is part of the URL):
http://groups.google.com/groups?hl=e...languages.vb.*
but was unable to make the solution by Charles Law work on my machine (I have
defined the IPersistStreamInit interface). In my code the readyState is
always "loading" and therefore it loops indefinitely at:
Do Until objDocument.readyState = "complete"
    Application.DoEvents()
Loop
You may be interested in this article, albeit with examples in C#:
http://www.vsj.co.uk/articles/display.asp?id=389
"Charles Law" <bl***@nowhere.com> wrote in message
news:uF**************@TK2MSFTNGP10.phx.gbl...
Hi John
I have made a simple console app that demonstrates the loading of HTML
from a URL, based on the thread you found below. It works on my machine, but
gives an unrelated error about being unable to set focus. Just ignore the
error and it will continue normally.
Let me know if you have problems getting the zip file and I will mail it instead.
HTH
Charles, thanks for your reply and the sample code. Your code works fine
when run in the VS IDE; however, when run from a command window it sits in
the loop:
Do Until objDocument.readyState = "complete"
    Application.DoEvents()
Loop
because readyState is "loading", then "uninitialized", never "complete". If
I comment out Application.DoEvents(), readyState stays "loading". I don't
understand this!
Thanks.
Thanks Rowland, it looks promising, particularly the use of HttpWebRequest
and HttpWebResponse to get the web page in the first place. I'll have a
play around with the VB version.
Thanks again for responding.
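For anyone following along, here is a minimal sketch of the HttpWebRequest/HttpWebResponse route in VB.NET. It is not the code from the article or from John's version; the module name, the function names and the example.com URL are made up for illustration, and the regular expression is only a rough stand-in for real DOM parsing (it returns the href of the first anchor tag it finds).

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Text.RegularExpressions

Module FetchAndFindHref
    ' Downloads the page at the given URL and returns its HTML as a string.
    Function DownloadHtml(ByVal url As String) As String
        Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        Dim response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
        Try
            Dim reader As New StreamReader(response.GetResponseStream())
            Return reader.ReadToEnd()
        Finally
            ' Always release the connection, even if reading fails.
            response.Close()
        End Try
    End Function

    ' Returns the href value of the first <a ...> tag in the HTML,
    ' or Nothing if no quoted href attribute is found.
    Function FindHref(ByVal html As String) As String
        Dim pattern As String = "<a\s[^>]*href\s*=\s*[""']([^""']*)[""']"
        Dim m As Match = Regex.Match(html, pattern, RegexOptions.IgnoreCase)
        If m.Success Then Return m.Groups(1).Value
        Return Nothing
    End Function

    Sub Main()
        ' Hypothetical URL, replace with the page you actually want.
        Dim html As String = DownloadHtml("http://example.com/")
        Console.WriteLine(FindHref(html))
    End Sub
End Module
```

Because nothing here touches mshtml or the WebBrowser control, there is no readyState loop to wait on. The trade-off is that you get the raw HTML text rather than a parsed DOM, so any script-generated content will be missing.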
Hi John
Unfortunately I don't get the same problem. I opened a command window and
ran the executable. I have ZoneAlarm running, so it warned me that the
application was trying to access the internet. I allowed it to continue and
then I got an error about setting focus (as I mentioned). I clicked on No
and the command window filled with the HTML.
I am running XP Pro with SP2, and .NET Framework 1.1 SP1. I also have IE6
installed. What are you running with?
Charles
Just start with a Windows app, delete the code that the wizard
generates, and put in the code that you would normally get from the
console wizard. I don't think you will be saving anything
by not using a window; the .NET overhead is there whether you
create windows or not, I think.
Hi Charles,
After more investigation, my Debug version works fine from a command window.
It's my Release version which sits in the loop, which probably means
something isn't being initialised. I then found this: http://www.google.com/groups?hl=zh-c...TNGP10.phx.gbl
which says:
<quote>
I then checked the ReadyState property in a loop, and it was
returning 1 ("loading") all the time.
I tracked the problem down to my CoInitialize() call. The plain old
CoInitialize(NULL) didn't work but when I replaced it with the following,
everything started working fine:
CoInitializeEx(NULL,COINIT_MULTITHREADED);
</quote>
Do you know how to implement or call (?) CoInitializeEx in a VB .NET
program, if in fact that is what I need?
Thanks.
Hi John
Yes, I see what you mean. I have modified the application slightly so it now
works in release build outside the IDE. I have removed the DoEvents because
that requires the Windows forms assembly, and HTML documents are loaded
asynchronously (on another thread), so all we need really is to set the
apartment to multithreaded and then go to sleep in the loop while we are
waiting for the document to load.
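The zip file is not reproduced in this thread, so the following is only a sketch of the two ideas described above: marking Sub Main with the MTAThread attribute (the managed way of asking for a multithreaded apartment, rather than calling CoInitializeEx yourself) and sleeping in the loop instead of calling DoEvents. The delegate, WaitForComplete and FakeReadyState are hypothetical names; in the real program the delegate would return objDocument.readyState. The timeout is an addition so that the loop cannot spin forever.

```vbnet
Imports System
Imports System.Threading

Module WaitForDocument
    ' Hypothetical delegate: returns the document's current readyState string.
    Delegate Function ReadyStateReader() As String

    ' Polls until readyState is "complete" or the timeout expires.
    ' Returns True if the document finished loading in time.
    Function WaitForComplete(ByVal readState As ReadyStateReader, _
                             ByVal timeoutMs As Integer) As Boolean
        Const PollMs As Integer = 100
        Dim waited As Integer = 0
        Do Until readState.Invoke() = "complete"
            If waited >= timeoutMs Then Return False
            Thread.Sleep(PollMs)   ' let the loader thread get on with it
            waited += PollMs
        Loop
        Return True
    End Function

    ' Stand-in for objDocument.readyState so the sketch runs on its own:
    ' reports "loading" twice, then "complete".
    Private pollCount As Integer = 0
    Function FakeReadyState() As String
        pollCount += 1
        If pollCount < 3 Then Return "loading"
        Return "complete"
    End Function

    ' MTAThread puts the main thread in the multithreaded COM apartment.
    <MTAThread()> _
    Sub Main()
        Dim loaded As Boolean = WaitForComplete(AddressOf FakeReadyState, 5000)
        Console.WriteLine("Loaded: " & loaded.ToString())
    End Sub
End Module
```

Since Thread.Sleep lives in System.Threading, this version needs no reference to the Windows Forms assembly.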
HTH
Charles
Thank you, Charles, that works perfectly now :)
I've come up with another version which uses HttpWebRequest/HttpWebResponse,
which has the advantage of providing a Timeout property, though a timeout
would be easy to implement in your version. I'm not sure of the pros and
cons of either method but it was an interesting exercise!
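John's version is not shown in the thread, so this is just a rough sketch of how the Timeout property might be used. RespondsWithin and the example.com URL are invented for illustration. Note that Timeout covers the wait for GetResponse; reading the response stream afterwards is governed separately by ReadWriteTimeout.

```vbnet
Imports System
Imports System.Net

Module TimeoutSketch
    ' Returns True if the server answers within timeoutMs milliseconds,
    ' False if the request times out. Other failures are rethrown.
    Function RespondsWithin(ByVal url As String, ByVal timeoutMs As Integer) As Boolean
        Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        request.Timeout = timeoutMs   ' applies to GetResponse
        Try
            Dim response As WebResponse = request.GetResponse()
            response.Close()
            Return True
        Catch ex As WebException
            If ex.Status = WebExceptionStatus.Timeout Then Return False
            Throw
        End Try
    End Function

    Sub Main()
        ' Hypothetical URL and a five second limit.
        Console.WriteLine(RespondsWithin("http://example.com/", 5000))
    End Sub
End Module
```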
Thanks again for replying and helping out.