Extracting text via DOM

P: n/a
Hello,

I wish to extract some text from certain elements on the page and
process them. I've done this in the past by keying on the className
but I don't have that option in this case. Below is an example of what
I have to work with. I need to extract the SKU, Product, Qty, Price &
Extended Price.

Here's one try I made:

function getValue(name){

    var elements = document.getElementsByTagName("span");
    for (var i = 0; i < elements.length; i++){
        var l = elements[i];
        if (l.className.match("default") && l.data.match("SKU")){
            alert(l.data);
        }
    }
}

As you can see, I'm missing some essential understanding of how to
access the text within the <span> tag. Any help would be much
appreciated.

Thanks.

mp

<tr valign="top">
<!-- -------------------------- Item# / SKU -------------------------- -->
<td width="4"><img src="invoicemain.jsp_files/spacer01.gif" border="0"
height="1" width="4"></td>
<td><span class="default">TP59524</span></td>
<!-- -------------------------- Product ------------------------------ -->
<td width="4"><img src="invoicemain.jsp_files/spacer01.gif" border="0"
height="1" width="4"></td>
<td><span class="default">Product One Description</span></td>
<!-- -------------------------- Status ------------------------------ -->
<td width="4"><img src="invoicemain.jsp_files/spacer01.gif" border="0"
height="1" width="4"></td>
<td><span class="default">In Stock</span></td>
<!-- -------------------------- Quantity ----------------------------- -->
<td width="4"><img src="invoicemain.jsp_files/spacer01.gif" border="0"
height="1" width="4"></td>
<!--<INPUT name="qty" value="" size=3 maxlength=3>-->
<td align="center"><span class="default">6</span></td>
<!-- -------------------------- Price -------------------------------- -->
<td width="4"><img src="invoicemain.jsp_files/spacer01.gif" border="0"
height="1" width="4"></td>
<td align="right"><div class="default">$2.99</div>
</td>
<!-- -------------------------- Extended Price ----------------------- -->
<td width="4"><img src="invoicemain.jsp_files/spacer01.gif" border="0"
height="1" width="4"></td>
<td align="right"><div class="default">$17.94</div>
</td>
<!-- -------------------------- Remove Item -------------------------- -->
<td width="4"><img src="invoicemain.jsp_files/spacer01.gif" border="0"
height="1" width="4"></td>
<td align="center"><a href="https://www.domain.com/checkout/invoice/invoicemain.jsp?remove=6929940&amp;orderID=5943646 " class="xremove">Remove</a></td>
<td><img src="invoicemain.jsp_files/spacer01.gif" border="0"
height="1" width="4"></td>
</tr>

--
Michael Powe mi*****@trollope.org Naugatuck CT USA

"I don't like America because America doesn't like the world."
- Ahmed Sultan Ghanem, 25-year-old Saudi student
Jul 23 '05 #1
5 Replies


P: n/a
Michael Powe said:
I wish to extract some text from certain elements on the page...


The nodeValue property of the Text node (nodeType == 3) is what you're looking for... or innerHTML ;-) Try this...

function getValue()
{
    if (document.getElementsByTagName) {
        var s = '', ele, elements = document.getElementsByTagName("span");
        for (var i = 0; i < elements.length; i++) {
            ele = elements[i];
            if (ele.className.match("default")) {
                if (typeof ele.firstChild != 'undefined') {
                    s += '\n' + i + ': ' + ele.firstChild.nodeValue;
                }
                else if (typeof ele.innerHTML != 'undefined') {
                    s += '\n' + i + ': ' + ele.innerHTML;
                }
            }
        }
        alert(s);
    }
}
Jul 23 '05 #2

P: n/a
DU
Michael Powe wrote:
Hello,

I wish to extract some text from certain elements on the page and
process them. I've done this in the past by keying on the className
but I don't have that option in this case.
Well, then maybe keying on the className is not advisable, in my opinion.
Below is an example of what I have to work with. I need to extract the SKU, Product, Qty, Price &
Extended Price.
I'm missing some essential understanding of how to
access the text within the <span> tag.


Then you should look into what text nodes are and how to use the
methods and properties of the DOM 2 CharacterData interface. These methods
and properties are universally and perfectly supported in modern
browsers (Mozilla family of browsers, MSIE 6, Opera 7.x, Safari 1.x,
Konqueror 3.x, etc.). So there is no need for cross-browser code here.

You need to test whether a node's nodeType is 3 (or its nodeName is
"#text") before applying a substring method or accessing a property like
data or nodeValue.
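
As a rough illustration of that test, here is a minimal sketch. The helper name firstTextOf is made up for this example and is not part of the DOM; it only assumes the standard firstChild, nodeType and data properties.

```javascript
// Hypothetical helper (name invented for illustration).
// Returns the text of an element's first child, but only when that
// child really is a Text node (nodeType 3). Otherwise returns null.
function firstTextOf(element) {
    var child = element && element.firstChild;   // null when the element is empty
    if (child && child.nodeType == 3) {
        // For a Text node, the CharacterData "data" property holds the
        // same string as nodeValue.
        return child.data;
    }
    return null;
}
```

Called on one of the spans from the posted markup, for example firstTextOf(document.getElementsByTagName("span")[0]), it should hand back the string inside the span, or null if the span is empty or starts with another element.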

DOM 2 CharacterData interface tests of properties and methods:
http://www10.brinkster.com/doctorunc...acterData.html

Whitespace in the DOM (more difficult test)
http://www.mozilla.org/docs/dom/technote/whitespace/

DU
--
The site said to use Internet Explorer 5 or better... so I switched to
Netscape 7.2 :)
Jul 23 '05 #3

P: n/a
DU
Mike Foster wrote:
Michael Powe said:
I wish to extract some text from certain elements on the page...

The nodeValue property of the Text node (nodeType == 3) is what you're
looking for... or innerHTML ;-) Try this...

function getValue()
{
if (document.getElementsByTagName) {
var s = '', ele, elements = document.getElementsByTagName("span");
for (var i = 0; i < elements.length; i++) {
ele = elements[i];
if (ele.className.match("default")) {
if (typeof ele.firstChild != 'undefined')


The code seems awkward and unoptimized. Span nodes identified by
class="default" shouldn't need a check for their first child: either you
gave those spans text content from the beginning, or you would not have
given them that class. So, personally, I would skip this if statement to
improve performance.

{
Also, your code does not quite match your earlier comment.
You probably meant to write:
if(ele.childNodes[0] && ele.childNodes[0].nodeType == 3)

There is also the DOM 3 textContent property, which is supported by Mozilla
1.5+. Opera 7.x should (will?) follow up on this as a valid replacement
for innerText.

http://www.w3.org/TR/2004/REC-DOM-Le...e3-textContent
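
For what it's worth, one way to use textContent where it exists and still cope where it doesn't might look like the sketch below. The function name textOf is invented here, and the order of the fallbacks (textContent, then innerText, then walking the child nodes by hand) is just one reasonable choice, not something the spec prescribes.

```javascript
// Hypothetical helper (name invented for illustration).
// Returns all text inside an element, preferring DOM 3 textContent.
function textOf(element) {
    if (typeof element.textContent == 'string') {
        return element.textContent;              // DOM 3 browsers
    }
    if (typeof element.innerText == 'string') {
        return element.innerText;                // proprietary (MSIE and others)
    }
    // Last resort: concatenate the Text nodes in the subtree.
    var s = '';
    for (var n = element.firstChild; n; n = n.nextSibling) {
        if (n.nodeType == 3) {
            s += n.nodeValue;                    // Text node
        } else if (n.nodeType == 1) {
            s += textOf(n);                      // Element node: recurse
        }
    }
    return s;
}
```

Unlike firstChild.nodeValue, this returns an empty string for an empty element instead of throwing, and it also picks up text nested inside child elements.
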
s += '\n' + i + ': ' + ele.firstChild.nodeValue; }
else if (typeof ele.innerHTML != 'undefined') {
s += '\n' + i + ': ' + ele.innerHTML;
}
}
}
alert(s);
}
}


All these if..else affect performance. If you can reduce the number of
them, it is better.

DU
--
The site said to use Internet Explorer 5 or better... so I switched to
Netscape 7.2 :)
Jul 23 '05 #4

P: n/a
>>>>> "Mike" == Mike Foster <mi********@mfosternospam.com> writes:

Mike> Michael Powe said:
I wish to extract some text from certain elements on the
page...


Mike> The nodeValue property of the Text node (nodeType == 3) is
Mike> what you're looking for... or innerHTML ;-) Try this...

function getValue()
{
    if (document.getElementsByTagName) {
        var s = '', ele, elements = document.getElementsByTagName("span");
        for (var i = 0; i < elements.length; i++) {
            ele = elements[i];
            if (ele.className.match("default")) {
                if (typeof ele.firstChild != 'undefined') {
                    s += '\n' + i + ': ' + ele.firstChild.nodeValue;
                }
                else if (typeof ele.innerHTML != 'undefined') {
                    s += '\n' + i + ': ' + ele.innerHTML;
                }
            }
        }
        alert(s);
    }
}

This almost works. Thanks for the clue, I've learned quite a bit from
it. But now I'm hung up on the fact that after I get part of the way
through the loop, the code crashes with the error

'ele.firstChild has no properties'

This seems to come from the attempt to access ele.firstChild.nodeValue
when it doesn't exist. I've tried everything I can think of to get
around this problem. Any thoughts on how to circumvent this problem
would be much appreciated.

Thanks.

mp

--
Michael Powe mi*****@trollope.org Naugatuck CT USA
"When a person behaves in keeping with his conscience, when he
tries to speak as a citizen even under conditions where
citizenship is degraded, it may not lead to anything, yet it might.
But what surely will not lead to anything is when a person calculates
whether it will lead to something or not." -- Vaclav Havel, 1989
Jul 23 '05 #5

P: 18
You might want to check that the element you are looking at actually HAS child nodes. A common mistake is to assume that the elements you are querying have content. Try the hasChildNodes() method before querying firstChild.
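
To make that concrete: on an empty element firstChild is null, not undefined, and typeof null is 'object', so a test like typeof ele.firstChild != 'undefined' passes and the following .nodeValue access fails. Below is a rough sketch of the loop with a hasChildNodes() guard. The function name collectDefaultText and its two parameters are invented for this example. It takes a list of tag names because, in the markup posted above, Price and Extended Price sit in div elements rather than span elements.

```javascript
// Hypothetical helper (name and parameters invented for illustration).
// Collects the leading text of every element with class "default",
// for each tag name given, skipping elements that have no children.
function collectDefaultText(doc, tagNames) {
    var values = [];
    for (var t = 0; t < tagNames.length; t++) {
        var elements = doc.getElementsByTagName(tagNames[t]);
        for (var i = 0; i < elements.length; i++) {
            var ele = elements[i];
            // Match "default" as a whole class name, not as a substring.
            if (!/(^|\s)default(\s|$)/.test(ele.className)) {
                continue;
            }
            // Guard: only touch firstChild when there is one and it is text.
            if (ele.hasChildNodes() && ele.firstChild.nodeType == 3) {
                values.push(ele.firstChild.nodeValue);
            }
        }
    }
    return values;
}
```

Something like alert(collectDefaultText(document, ["span", "div"]).join("\n")) would then show the values. Note that the results come back grouped by tag name (all spans first, then all divs), not in document order, so mapping them back to SKU, Product, Qty and so on still needs some care.
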
Oct 1 '05 #6
