Extracting Data from IE

Hi,

I'm slowly discovering the world of JavaScript, so I'm not sure I'm
attacking this problem in the right manner, thus if I'm in the wrong
newsgroup, my apologies.

What I'm trying to do is extract some news items from a web site. To
do this, I'm using Microsoft Word VBA and using the following bit of
script:

'// Open web site
IeApp.Navigate "http://www.radioaustralia.net.au/francais/stories/s1776501.htm"
Do: Loop Until IeApp.ReadyState = READYSTATE_COMPLETE

'// Find text to extract
txtTitle = IeApp.Document.GetElementByID("a2title").innerhtml
txt = IeApp.Document.GetElementByID("a2copy").innerhtml

When extracting the text (i.e. "txt") I seem to get more than just the
text of the body that I'm after, and the resulting junk is difficult to
remove. I've looked at the object model but I'm not really sure what I
should be looking for, so I'm wondering if anyone here can spare a bit of
time to provide a pointer. For example, is there a tag that would more
easily refer to the required text?

Many thanks in advance if you can share some advice or guidance.
Regards,
Chris Adams

Oct 30 '06 #1
ch***********@hotmail.com wrote:
I'm slowly discovering the world of JavaScript, so I'm not sure I'm
attacking this problem in the right manner, thus if I'm in the wrong
newsgroup, my apologies.

What I'm trying to do is extract some news items from a web site. To
do this, I'm using Microsoft Word VBA and using the following bit of
script:

'// Open web site
IeApp.Navigate "http://www.radioaustralia.net.au/francais/stories/s1776501.htm"
Do: Loop Until IeApp.ReadyState = READYSTATE_COMPLETE

'// Find text to extract
txtTitle = IeApp.Document.GetElementByID("a2title").innerhtml
txt = IeApp.Document.GetElementByID("a2copy").innerhtml

When extracting the text (i.e. "txt") I seem to get more than just the
text of the body that I'm after, and the resulting junk is difficult to
remove.

So you are not using JavaScript at all but you are automating Internet
Explorer with VBA. The IE object model for HTML documents is documented
here:
<http://msdn.microsoft.com/library/default.asp?url=/workshop/author/dhtml/reference/dhtml_reference_entry.asp>

You might be after the |innerText| property instead of the |innerHTML|
property of element objects. Or you might want to look at specific child
or descendant nodes of an element you have found with getElementById.
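
A minimal sketch of the innerText suggestion, assuming late binding through CreateObject and that the page still exposes an element with id "a2copy". The procedure name ExtractStoryText is hypothetical, and the constant is declared locally because a late-bound project has no type library to supply it:

```vb
Sub ExtractStoryText()
    ' READYSTATE_COMPLETE is 4; declared here because late binding has no type library.
    Const READYSTATE_COMPLETE As Long = 4
    Dim IeApp As Object
    Dim el As Object
    Dim txt As String

    Set IeApp = CreateObject("InternetExplorer.Application")
    IeApp.Navigate "http://www.radioaustralia.net.au/francais/stories/s1776501.htm"

    ' Wait for the page to finish loading, yielding to the host so Word stays responsive.
    Do While IeApp.Busy Or IeApp.ReadyState <> READYSTATE_COMPLETE
        DoEvents
    Loop

    Set el = IeApp.Document.getElementById("a2copy")
    If el Is Nothing Then
        Debug.Print "No element with id a2copy on this page."
    Else
        txt = el.innerText                                  ' rendered text, no markup
        Debug.Print "innerHTML length: " & Len(el.innerHTML)
        Debug.Print "innerText length: " & Len(txt)
    End If

    IeApp.Quit
End Sub
```

Comparing the two lengths in the Immediate window gives a quick idea of how much of the innerHTML string is markup rather than text.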

For instance
IeApp.Document.getElementById("a2copy")
gives you a div element object which then has other nodes (e.g. a table
element) as child nodes. Once you have an element node you can access
its |firstChild|, |lastChild| and |childNodes| collection, and you can call
|getElementsByTagName| on the element to find descendant elements of a
certain tag name.
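
As a rough illustration of walking those nodes from VBA: the sketch below assumes it is handed an element object such as the one returned by getElementById("a2copy"). The procedure name DumpChildrenAndParagraphs is hypothetical, and the "p" tag name is only a guess about how this page marks up its story text.

```vb
' container is assumed to be an element object, e.g. the result of
' IeApp.Document.getElementById("a2copy").
Sub DumpChildrenAndParagraphs(ByVal container As Object)
    Dim i As Long
    Dim paras As Object

    ' Direct children: element nodes report their tag name, text nodes "#text".
    For i = 0 To container.childNodes.Length - 1
        Debug.Print i, container.childNodes.Item(i).nodeName
    Next i

    ' Descendants with a given tag name; "p" is a guess about this page's markup.
    Set paras = container.getElementsByTagName("p")
    For i = 0 To paras.Length - 1
        Debug.Print paras.Item(i).innerText
    Next i
End Sub
```

Listing the child node names first shows which wrapper elements (the table, for example) surround the text, which in turn suggests the tag name worth passing to getElementsByTagName.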

--

Martin Honnen
http://JavaScript.FAQTs.com/
Oct 30 '06 #2
ch***********@hotmail.com said the following on 10/30/2006 11:54 AM:
Hi,

I'm slowly discovering the world of JavaScript, so I'm not sure I'm
attacking this problem in the right manner, thus if I'm in the wrong
newsgroup, my apologies.

What I'm trying to do is extract some news items from a web site. To
do this, I'm using Microsoft Word VBA and using the following bit of
script:

'// Open web site
IeApp.Navigate "http://www.radioaustralia.net.au/francais/stories/s1776501.htm"
Do: Loop Until IeApp.ReadyState = READYSTATE_COMPLETE

'// Find text to extract
txtTitle = IeApp.Document.GetElementByID("a2title").innerhtml
txt = IeApp.Document.GetElementByID("a2copy").innerhtml

When extracting the text (i.e. "txt") I seem to get more than just the
text of the body that I'm after, and the resulting junk is difficult to
remove. I've looked at the object model but I'm not really sure what I
should be looking for, so I'm wondering if anyone here can spare a bit of
time to provide a pointer. For example, is there a tag that would more
easily refer to the required text?

Your code is written in VB (naturally) and you are in a JavaScript
newsgroup. That aside, the question you have to answer first is: what do
a2title and a2copy refer to? And, since you are scripting IE, you can
look into the IE-only innerText property to get just the text if you
don't want the HTML code that goes with it. I'm not sure whether
innerText is valid in VBA, though.

microsoft.public.word.vba might be a better group to ask about Word/VBA.

--
Randy
Chance Favors The Prepared Mind
comp.lang.javascript FAQ - http://jibbering.com/faq
Javascript Best Practices - http://www.JavascriptToolbox.com/bestpractices/
Oct 30 '06 #3
