
Extracting Data from IE

Hi,

I'm slowly discovering the world of JavaScript, so I'm not sure I'm
attacking this problem in the right manner, thus if I'm in the wrong
newsgroup, my apologies.

What I'm trying to do is extract some news items from a web site. To
do this, I'm using Microsoft Word VBA and using the following bit of
script:

'// Open web site
IeApp.Navigate "http://www.radioaustralia.net.au/francais/stories/s1776501.htm"
Do: Loop Until IeApp.ReadyState = READYSTATE_COMPLETE

'// Find text to extract
txtTitle = IeApp.Document.GetElementByID("a2title").innerhtml
txt = IeApp.Document.GetElementByID("a2copy").innerhtml

When extracting the text (i.e. "txt") I seem to get more than just the
text of the body that I'm after, and the resulting junk is difficult to
remove. I've looked at the object model but I'm not really sure what I
should be looking for, so I'm wondering if anyone here can spare a bit of
time to provide a pointer. For example, is there a tag that would more
easily refer to the required text?

Many thanks in advance if you can share some advice or guidance.
Regards,
Chris Adams

Oct 30 '06 #1
2 Replies


ch***********@hotmail.com wrote:
> I'm slowly discovering the world of JavaScript, so I'm not sure I'm
> attacking this problem in the right manner, thus if I'm in the wrong
> newsgroup, my apologies.
>
> What I'm trying to do is extract some news items from a web site. To
> do this, I'm using Microsoft Word VBA and using the following bit of
> script:
>
> '// Open web site
> IeApp.Navigate "http://www.radioaustralia.net.au/francais/stories/s1776501.htm"
> Do: Loop Until IeApp.ReadyState = READYSTATE_COMPLETE
>
> '// Find text to extract
> txtTitle = IeApp.Document.GetElementByID("a2title").innerhtml
> txt = IeApp.Document.GetElementByID("a2copy").innerhtml
>
> When extracting the text (ie. "txt") I seem to get more than just the
> text of the body that I'm after, and the resulting junk is difficult to
> remove.

So you are not using JavaScript at all but you are automating Internet
Explorer with VBA. The IE object model for HTML documents is documented
here:
<http://msdn.microsoft.com/library/default.asp?url=/workshop/author/dhtml/reference/dhtml_reference_entry.asp>

You might be after the |innerText| property instead of the |innerHTML|
property of element objects. Or you might want to look at specific child
or descendant nodes of an element you have found with getElementById.
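
To make the difference between the two properties concrete, here is a minimal
VBA sketch. It is not taken from the page in question: the markup is a
hypothetical stand-in (the real structure of "a2copy" is not shown in this
thread), and it builds an in-memory document with CreateObject("htmlfile")
rather than driving IeApp, so it can be tried without loading the site.

```vba
Sub CompareInnerHtmlAndInnerText()
    ' Late-bound in-memory HTML document (assumption: used here only so the
    ' sketch runs without navigating Internet Explorer to the real page).
    Dim doc As Object
    Set doc = CreateObject("htmlfile")

    ' Hypothetical markup standing in for the real article container.
    doc.body.innerHTML = _
        "<div id=""a2copy""><p>First <b>paragraph</b>.</p>" & _
        "<p>Second paragraph.</p></div>"

    Dim el As Object
    Set el = doc.getElementById("a2copy")

    ' innerHTML returns the markup inside the element, tags included.
    Debug.Print el.innerHTML

    ' innerText returns only the text content, with the tags removed.
    Debug.Print el.innerText
End Sub
```

With the IeApp object from the question, the same property would be read as
IeApp.Document.getElementById("a2copy").innerText once the page has loaded.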

For instance
IeApp.Document.getElementById("a2copy")
gives you a div element object which then has other nodes (e.g. a table
element) as child nodes. Once you have an element node you can access
its |firstChild|, |lastChild| and |childNodes| collection, and you can call
|getElementsByTagName| on the element to find descendant elements of a
certain tag name.
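
As a rough illustration of walking those nodes from VBA, the sketch below
lists the direct children of the container and then collects the text of its
descendant p elements only. The markup is again hypothetical (it merely
assumes the unwanted material sits in a table and the article text in
paragraphs), and the in-memory "htmlfile" document and the name bodyText are
assumptions made for the example, not part of the original code.

```vba
Sub ListChildrenAndCollectParagraphs()
    ' Late-bound in-memory HTML document, used so the sketch is self-contained.
    Dim doc As Object
    Set doc = CreateObject("htmlfile")

    ' Hypothetical markup: a table of unwanted links followed by the article text.
    doc.body.innerHTML = _
        "<div id=""a2copy"">" & _
        "<table><tr><td>Related links</td></tr></table>" & _
        "<p>First paragraph.</p><p>Second paragraph.</p></div>"

    Dim container As Object
    Set container = doc.getElementById("a2copy")

    ' Print the node name of each direct child of the container.
    Dim i As Long
    For i = 0 To container.childNodes.Length - 1
        Debug.Print container.childNodes.Item(i).nodeName
    Next i

    ' Gather the text of the descendant p elements, skipping the table.
    Dim paras As Object
    Set paras = container.getElementsByTagName("p")

    Dim bodyText As String
    For i = 0 To paras.Length - 1
        bodyText = bodyText & paras.Item(i).innerText & vbCrLf
    Next i

    Debug.Print bodyText
End Sub
```

Which tag name to ask for depends on how the real page is marked up, so it is
worth printing the child node names first and choosing from what is there.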

--

Martin Honnen
http://JavaScript.FAQTs.com/
Oct 30 '06 #2

ch***********@hotmail.com said the following on 10/30/2006 11:54 AM:
> Hi,
>
> I'm slowly discovering the world of JavaScript, so I'm not sure I'm
> attacking this problem in the right manner, thus if I'm in the wrong
> newsgroup, my apologies.
>
> What I'm trying to do is extract some news items from a web site. To
> do this, I'm using Microsoft Word VBA and using the following bit of
> script:
>
> '// Open web site
> IeApp.Navigate "http://www.radioaustralia.net.au/francais/stories/s1776501.htm"
> Do: Loop Until IeApp.ReadyState = READYSTATE_COMPLETE
>
> '// Find text to extract
> txtTitle = IeApp.Document.GetElementByID("a2title").innerhtml
> txt = IeApp.Document.GetElementByID("a2copy").innerhtml
>
> When extracting the text (ie. "txt") I seem to get more than just the
> text of the body that I'm after, and the resulting junk is difficult to
> remove. I've looked at the object model but not real sure what I
> should be looking for, so wondering if anyone here can spare a bit of
> time to provide a pointer. For example, is there a tag that would more
> easily refer to the required text?

Your code is written in VB (naturally) and you are in a JavaScript
newsgroup. That aside, the question you have to answer first is: what do
a2title and a2copy refer to? And, since you are scripting IE, you can
look into the IE-only innerText to get just the text if you don't want
the HTML code that goes with it. Not sure if innerText is valid in VBA
or not though.

microsoft.public.word.vba might be a better group to ask about Word/VBA.

--
Randy
Chance Favors The Prepared Mind
comp.lang.javascript FAQ - http://jibbering.com/faq
Javascript Best Practices - http://www.JavascriptToolbox.com/bestpractices/
Oct 30 '06 #3
