Extracting text via DOM

Hello,

I wish to extract some text from certain elements on the page and
process them. I've done this in the past by keying on the className
but I don't have that option in this case. Below is an example of what
I have to work with. I need to extract the SKU, Product, Qty, Price &
Extended Price.

Here's one try I made:

function getValue(name){

    var elements = document.getElementsByTagName("span");
    for (var i = 0; i < elements.length; i++){
        var l = elements[i];
        if (l.className.match("default") && l.data.match("SKU")){
            alert(l.data);
        }
    }
}

As you can see, I'm missing some essential understanding of how to
access the text within the <span> tag. Any help would be much
appreciated.

Thanks.

mp

<tr valign="top">
<!-- -------------------------- Item# / SKU -------------------------- -->
<td width="4"><img src="invoicemain.jsp_files/spacer01.gif" border="0"
height="1" width="4"></td>
<td><span class="default">TP59524</span></td>
<!-- -------------------------- Product ------------------------------ -->
<td width="4"><img src="invoicemain.jsp_files/spacer01.gif" border="0"
height="1" width="4"></td>
<td><span class="default">Product One Description</span></td>
<!-- -------------------------- Status ------------------------------ -->
<td width="4"><img src="invoicemain.jsp_files/spacer01.gif" border="0"
height="1" width="4"></td>
<td><span class="default">In Stock</span></td>
<!-- -------------------------- Quantity ----------------------------- -->
<td width="4"><img src="invoicemain.jsp_files/spacer01.gif" border="0"
height="1" width="4"></td>
<!--<INPUT name="qty" value="" size=3 maxlength=3>-->
<td align="center"><span class="default">6</span></td>
<!-- -------------------------- Price -------------------------------- -->
<td width="4"><img src="invoicemain.jsp_files/spacer01.gif" border="0"
height="1" width="4"></td>
<td align="right"><div class="default">$2.99</div>
</td>
<!-- -------------------------- Extended Price ----------------------- -->
<td width="4"><img src="invoicemain.jsp_files/spacer01.gif" border="0"
height="1" width="4"></td>
<td align="right"><div class="default">$17.94</div>
</td>
<!-- -------------------------- Remove Item -------------------------- -->
<td width="4"><img src="invoicemain.jsp_files/spacer01.gif" border="0"
height="1" width="4"></td>
<td align="center"><a href="https://www.domain.com/checkout/invoice/invoicemain.jsp?remove=6929940&amp;orderID=5943646 " class="xremove">Remove</a></td>
<td><img src="invoicemain.jsp_files/spacer01.gif" border="0"
height="1" width="4"></td>
</tr>

--
Michael Powe mi*****@trollope.org Naugatuck CT USA

"I don't like America because America doesn't like the world."
- Ahmed Sultan Ghanem, 25-year-old Saudi student
Jul 23 '05 #1
Mike Foster
Michael Powe said:
> I wish to extract some text from certain elements on the page...


The nodeValue property of the Text node (nodeType == 3) is what you're looking for... or innerHTML ;-) Try this...

function getValue()
{
    if (document.getElementsByTagName) {
        var s = '', ele, elements = document.getElementsByTagName("span");
        for (var i = 0; i < elements.length; i++) {
            ele = elements[i];
            if (ele.className.match("default")) {
                if (typeof ele.firstChild != 'undefined') {
                    s += '\n' + i + ': ' + ele.firstChild.nodeValue;
                }
                else if (typeof ele.innerHTML != 'undefined') {
                    s += '\n' + i + ': ' + ele.innerHTML;
                }
            }
        }
        alert(s);
    }
}
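
One thing to watch in the sample row: Price and Extended Price sit in <div class="default"> elements, not <span>, so a loop over getElementsByTagName("span") never sees them. Below is a minimal sketch that walks the row and picks up every class="default" element whatever its tag. It assumes the cell order of the sample row (SKU, Product, Status, Qty, Price, Extended Price). The names collectDefaults, readRow and the field names are made up for illustration, and text(), el() and cell() are stand-ins for DOM nodes so the snippet runs outside a browser.

```javascript
// Collect the text of every class="default" element under a node,
// whatever its tag name (span or div), in document order.
function collectDefaults(node, out) {
  var kids = node.childNodes || [];
  for (var i = 0; i < kids.length; i++) {
    var kid = kids[i];
    if (kid.nodeType !== 1) continue;            // elements only; skips text and comments
    if (/(^|\s)default(\s|$)/.test(kid.className || "")) {
      var first = kid.firstChild;                // null when the element is empty
      out.push(first && first.nodeType === 3 ? first.nodeValue : "");
    } else {
      collectDefaults(kid, out);                 // look inside <td> and friends
    }
  }
  return out;
}

// Hypothetical mapping: relies on the column order of the sample row.
function readRow(tr) {
  var v = collectDefaults(tr, []);
  return { sku: v[0], product: v[1], status: v[2], qty: v[3], price: v[4], extended: v[5] };
}

// Stand-ins for DOM nodes, only so this runs outside a browser.
function text(s) { return { nodeType: 3, nodeValue: s, childNodes: [] }; }
function el(tag, cls, kids) {
  return { nodeType: 1, nodeName: tag, className: cls, childNodes: kids, firstChild: kids[0] || null };
}
function cell(tag, s) { return el("TD", "", [el(tag, "default", [text(s)])]); }

var row = el("TR", "", [
  cell("SPAN", "TP59524"), cell("SPAN", "Product One Description"),
  cell("SPAN", "In Stock"), cell("SPAN", "6"),
  cell("DIV", "$2.99"), cell("DIV", "$17.94")
]);
var item = readRow(row);
console.log(item.sku, item.qty, item.price, item.extended);
```

In a page you would pass the real <tr> element to readRow() instead of the stand-in row. If the column order ever changes, the index-to-field mapping has to change with it.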
Jul 23 '05 #2
DU
Michael Powe wrote:
> Hello,
>
> I wish to extract some text from certain elements on the page and
> process them. I've done this in the past by keying on the className
> but I don't have that option in this case.

Well, then maybe keying on the className is not advisable, in my opinion.

> Below is an example of what I have to work with. I need to extract
> the SKU, Product, Qty, Price & Extended Price.
>
> I'm missing some essential understanding of how to
> access the text within the <span> tag.


Then you should look into what text nodes are and how to use the
methods and properties of the DOM 2 CharacterData interface. These
methods and properties are universally and perfectly supported in modern
browsers (Mozilla family of browsers, MSIE 6, Opera 7.x, Safari 1.x,
Konqueror 3.x, etc.), so there is no need for cross-browser code here.

You need to test whether a node's nodeType is 3 (or its nodeName is
"#text") before applying a substring method or accessing a property
like data or nodeValue.
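
The nodeType test described above can be sketched roughly as follows. It also shows why l.data in the original attempt comes back undefined: data belongs to CharacterData nodes (text and comment nodes), not to the <span> element itself. The names directText and fakeElement are made up for illustration, and fakeElement only builds stand-in nodes so the snippet runs outside a browser.

```javascript
// Concatenate the data of an element's direct Text children (nodeType 3),
// skipping comments, nested elements and anything else.
function directText(element) {
  var s = "";
  for (var n = element.firstChild; n; n = n.nextSibling) {
    if (n.nodeType === 3) {       // 3 == Node.TEXT_NODE
      s += n.data;                // CharacterData.data, same string as nodeValue
    }
  }
  return s;
}

// Stand-in for a DOM element (hypothetical helper): links the children
// through nextSibling the way a real element's child list is linked.
function fakeElement(kids) {
  for (var i = 0; i < kids.length; i++) {
    kids[i].nextSibling = kids[i + 1] || null;
  }
  return { nodeType: 1, firstChild: kids[0] || null };
}

var span = fakeElement([
  { nodeType: 8, data: " a comment " },   // comment node: has data, but is not text
  { nodeType: 3, data: "TP59524" }        // text node
]);
var empty = fakeElement([]);

console.log(directText(span));
console.log(directText(empty) === "");
```

Because the loop starts from firstChild and stops on null, an element with no children simply yields an empty string instead of raising an error.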

DOM 2 CharacterData interface tests of properties and methods:
http://www10.brinkster.com/doctorunc...acterData.html

Whitespace in the DOM (more difficult test)
http://www.mozilla.org/docs/dom/technote/whitespace/

DU
--
The site said to use Internet Explorer 5 or better... so I switched to
Netscape 7.2 :)
Jul 23 '05 #3
DU
Mike Foster wrote:
> Michael Powe said:
>> I wish to extract some text from certain elements on the page...
>
> The nodeValue property of the Text node (nodeType == 3) is what you're
> looking for... or innerHTML ;-) Try this...
>
> function getValue()
> {
>     if (document.getElementsByTagName) {
>         var s = '', ele, elements = document.getElementsByTagName("span");
>         for (var i = 0; i < elements.length; i++) {
>             ele = elements[i];
>             if (ele.className.match("default")) {
>                 if (typeof ele.firstChild != 'undefined') {

The code seems awkward and not optimized. Span nodes "identified" as
class="default" shouldn't need to be checked for a first child: either
you gave a span class="default" because it has a text node, or you would
not have given it that class at all. So, personally, I would skip
this if statement to improve performance.

Also, your code does not quite match your previous comment.
You probably meant to say:
if(ele.childNodes[0] && ele.childNodes[0].nodeType == 3)

There is also the DOM 3 textContent attribute, which is supported by
Mozilla 1.5+. Opera 7.x should (will?) follow up on this as a valid
replacement for innerText.

http://www.w3.org/TR/2004/REC-DOM-Le...e3-textContent

>                     s += '\n' + i + ': ' + ele.firstChild.nodeValue;
>                 }
>                 else if (typeof ele.innerHTML != 'undefined') {
>                     s += '\n' + i + ': ' + ele.innerHTML;
>                 }
>             }
>         }
>         alert(s);
>     }
> }

All these if..else branches affect performance. If you can reduce the
number of them, it is better.
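
As a rough sketch of the textContent idea mentioned above: prefer textContent where the browser provides it, fall back to innerText, and only walk the child nodes by hand when neither exists. The name getText is made up for illustration, and the object literals at the bottom are stand-ins for DOM nodes so the snippet runs outside a browser.

```javascript
// Prefer DOM 3 textContent, fall back to innerText, then to walking text nodes.
function getText(node) {
  if (typeof node.textContent === "string") return node.textContent;
  if (typeof node.innerText === "string") return node.innerText;
  var s = "", kids = node.childNodes || [];
  for (var i = 0; i < kids.length; i++) {
    // Text nodes contribute their value; anything else is searched recursively.
    s += kids[i].nodeType === 3 ? kids[i].nodeValue : getText(kids[i]);
  }
  return s;
}

// Stand-ins for three kinds of browser support (hypothetical objects).
var modern = { textContent: "6" };
var legacy = { innerText: "$2.99" };
var bare = {
  nodeType: 1,
  childNodes: [
    { nodeType: 3, nodeValue: "In " },
    { nodeType: 1, childNodes: [{ nodeType: 3, nodeValue: "Stock" }] }
  ]
};

console.log(getText(modern), getText(legacy), getText(bare));
```

The feature tests run once per call rather than once per child, so the common path is a single property read.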

DU
Jul 23 '05 #4
>>>>> "Mike" == Mike Foster <mi********@mfosternospam.com> writes:

Mike> Michael Powe said:
I wish to extract some text from certain elements on the
page...


Mike> The nodeValue property of the Text node (nodeType == 3) is
Mike> what you're looking for... or innerHTML ;-) Try this...


function getValue()
{
    if (document.getElementsByTagName) {
        var s = '', ele, elements = document.getElementsByTagName("span");
        for (var i = 0; i < elements.length; i++) {
            ele = elements[i];
            if (ele.className.match("default")) {
                if (typeof ele.firstChild != 'undefined') {
                    s += '\n' + i + ': ' + ele.firstChild.nodeValue;
                }
                else if (typeof ele.innerHTML != 'undefined') {
                    s += '\n' + i + ': ' + ele.innerHTML;
                }
            }
        }
        alert(s);
    }
}

This almost works. Thanks for the clue; I've learned quite a bit from
it. But now I'm hung up on the fact that partway through the loop the
code crashes with the error

'ele.firstChild has no properties'

This seems to come from the attempt to access ele.firstChild.nodeValue
when firstChild doesn't exist. I've tried everything I can think of to
get around it. Any thoughts on how to avoid this error would be much
appreciated.

Thanks.

mp

--
Michael Powe mi*****@trollope.org Naugatuck CT USA
"When a person behaves in keeping with his conscience, when he
tries to speak as a citizen even under conditions where
citizenship is degraded, it may not lead to anything, yet it might.
But what surely will not lead to anything is when a person calculates
whether it will lead to something or not." -- Vaclav Havel, 1989
Jul 23 '05 #5
UniDyne
You might want to check to make sure the element you are looking at actually HAS nodes. A common mistake some developers make is assuming that the nodes they are querying have content. Try the hasChildNodes() function before querying the firstChild.
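
To spell that out: firstChild on an element with no children is null, not undefined, and typeof null is 'object', so a test like typeof ele.firstChild != 'undefined' passes and the next line dereferences null. Here is a sketch of the loop with the hasChildNodes() guard added. It takes the element list as a parameter (my change, so it can run outside a browser), listDefaults and fakeSpan are made-up names, and fakeSpan only builds stand-ins for <span> elements.

```javascript
// Builds the same report string as getValue(), but skips spans with no
// children and spans whose first child is not a text node.
function listDefaults(elements) {
  var s = "";
  for (var i = 0; i < elements.length; i++) {
    var ele = elements[i];
    // Whole-word class test; match("default") would also hit "defaultx".
    if (!/(^|\s)default(\s|$)/.test(ele.className)) continue;
    if (ele.hasChildNodes() && ele.firstChild.nodeType === 3) {
      s += "\n" + i + ": " + ele.firstChild.nodeValue;
    }
  }
  return s;
}

// Stand-in for a <span> element (hypothetical), so this runs outside a browser.
// Pass null as textValue to get an empty span.
function fakeSpan(cls, textValue) {
  var child = textValue === null ? null : { nodeType: 3, nodeValue: textValue };
  return {
    className: cls,
    firstChild: child,
    hasChildNodes: function () { return child !== null; }
  };
}

var spans = [
  fakeSpan("default", "TP59524"),
  fakeSpan("default", null),      // empty span: the case that crashed
  fakeSpan("other", "x")
];
console.log(typeof spans[1].firstChild);  // "object": why the typeof test lets null through
console.log(listDefaults(spans));
```

In a page the call would be listDefaults(document.getElementsByTagName("span")). A plain truthiness test, if (ele.firstChild), guards against the same null.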
Oct 1 '05 #6
