
Reading an HTML document & extracting content

Cognizance
#1: Jul 23 '05
Hi gang,

I'm an ASP developer by trade, but I've had to write client-side
scripts in JavaScript many times in the past. Simple things, like
validating form elements and such.

Now I've been assigned the task of extracting content from a given HTML
page. If anyone's familiar with the Yahoo! Store order confirmation
screen, I need to be able to grab the total amount from the table to
the right-hand side. (Sample File:
http://www.2beyourself.com/t/sample.html)

If you view the source, the amount sits in a table surrounded by ugly
HTML, and the value I want to retrieve is wrapped in b tags. Originally
I was thinking of using innerHTML or innerText to extract the value, but
I find that we cannot gain control of this piece of the Yahoo! Store to
make that work!

So after talking with peers, we thought of reading in the entire HTML
page and using a regular expression to try to extract the value.
Something along the lines of: '<b>[0-9]+\.[0-9]{2}<\/b>'

I'm not sure how to accomplish this, or whether this solution is even a
good one. Could someone please point me in the right direction? If you
have something better, I'm all ears! (eyes) If the regular expression
is a good solution, I need to find out how to read in the entire HTML
doc and then parse out that piece.

Any tips and suggestions will be greatly appreciated!!

And I hope your week is starting off right. ^^

McKirahan
#2: Jul 23 '05




A RegEx would be better, but this works:

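<html>
<head>
<title>Total.htm</title>
<script type="text/javascript">
function total() {
  var sURL = "http://www.2beyourself.com/t/sample.html";
  try {
    // Use the native object where available; fall back to ActiveX on older IE
    var oXML = window.XMLHttpRequest ? new XMLHttpRequest()
                                     : new ActiveXObject("Microsoft.XMLHTTP");
    // Synchronous GET. Browsers generally restrict cross-site requests,
    // so this page normally has to live on the same site as sURL.
    oXML.open("GET", sURL, false);
    oXML.send(null);
    var sXML = oXML.responseText;
    // Find Total's label
    var iTAG = sXML.indexOf("<b>Total:</b>");
    if (iTAG < 0) { alert("Total not found in " + sURL); return; }
    var sVAL = sXML.substr(iTAG);
    // Find Total's decimal point and keep the two digits after it
    var iDOT = sVAL.indexOf(".");
    if (iDOT < 0) { alert("Total not found in " + sURL); return; }
    sVAL = sVAL.substr(0, iDOT + 3);
    // Find Total's start: everything after the last ">" is the amount
    iTAG = sVAL.lastIndexOf(">");
    sVAL = sVAL.substr(iTAG + 1);
    // Show Total's value
    alert(sVAL);
  } catch (e) {
    // Reached when the request itself fails (bad URL, blocked, no network)
    alert(sURL + " not found!");
  }
}
</script>
</head>
<body onload="total()">
</body>
</html>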
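
For the RegEx route, here is a minimal sketch of the same extraction done with one pattern instead of the indexOf/substr steps. It assumes the markup has the "<b>Total:</b>" label followed somewhere later by the amount with two decimals, which is what the script above relies on too. The function name extractTotal and the sample string are made up for illustration; they are not taken from the real Yahoo! Store page.

```javascript
// Hypothetical helper: return the amount that follows the "<b>Total:</b>" label,
// or null when the label or the amount is missing.
function extractTotal(html) {
  // [\s\S]*? lazily skips whatever markup sits between the label and the amount;
  // the capture group takes digits (commas allowed) plus exactly two decimals.
  var match = /<b>Total:<\/b>[\s\S]*?(\d[\d,]*\.\d{2})/i.exec(html);
  return match ? match[1] : null;
}

// Made-up sample markup, for illustration only.
var sample = '<tr><td><b>Total:</b></td>' +
             '<td align="right"><b>42.95</b></td></tr>';
console.log(extractTotal(sample));
```

In the page above, you would pass oXML.responseText to extractTotal in place of the indexOf/substr lines. Unlike the "first dot after the label" approach, a stray "." in the markup between the label and the amount does not throw the pattern off.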


