Daz wrote:
Hi everyone.
Is there a simple way for me to get the value of the textNodes from
this piece of HTML, without iterating through the whole thing?
You can use a number of strategies based on feature detection: first
try textContent; if that is not supported, try innerText. If neither
is supported, you can either take innerHTML and strip out the tags,
or recursively iterate over all the nodes and collect just the text.
There are some functions posted here:
<URL:
http://groups.google.com/group/comp....f5c61c0ce91bfe
>
Copies are included below.
[...]
Please note the format of the text is different in each cell, and that
the code I need to obtain the textNodes from is not mine, so I cannot
change that format. I am simply using JavaScript to make a browser
extension that will do useful things with the page.
It's probably better if you say what you want the script to do; simply
getting all the text may not be what you really need.
Posted functions:
Using fallback to innerHTML and a regular expression to remove tags:
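function getText(el)
{
  // Test the type, not the value: an empty string is falsy
  if (typeof el.textContent == 'string') return el.textContent;
  if (typeof el.innerText == 'string') return el.innerText;
  // Last resort: strip anything that looks like a tag
  return el.innerHTML.replace(/<[^>]+>/g,'');
}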
A better regular expression might be:
.replace( /<[^<>]+>/g, '' )
Suggested by Mike Winter:
<URL:
http://groups.google.com.au/group/co...06dda8f672ef5f
>
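To see what the extra < in the character class changes, here is a rough
sketch run against a plain string rather than a real element. The names
stripTagsLoose and stripTagsStrict and the sample markup are made up for
illustration, and the sample assumes innerHTML hands back a bare < that
is not part of a tag, which your browser may well escape instead.

```javascript
// Hypothetical helpers, named here only for the comparison
function stripTagsLoose(html) {
  return html.replace(/<[^>]+>/g, '');
}
function stripTagsStrict(html) {
  return html.replace(/<[^<>]+>/g, '');
}

// Made-up sample containing a bare "<" that is not part of a tag
var sample = '1 < 2 <b>bold</b>';

var loose = stripTagsLoose(sample);   // '1 bold'
var strict = stripTagsStrict(sample); // '1 < 2 bold'
```

The first pattern swallows everything from the bare < up to the next >,
so " 2 " is lost; the second refuses to run across another <. Neither
one decodes entities, so &amp; in the markup stays &amp; in the result.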
To avoid issues with regular expressions, use recursion - it will be
slower but that may not matter:
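function getText(el)
{
  // Test the type, not the value: an empty string is falsy
  if (typeof el.textContent == 'string') return el.textContent;
  if (typeof el.innerText == 'string') return el.innerText;

  // If both fail, use recursion
  return getText2(el);

  // Recursive inner function (the declaration is hoisted, so it
  // is available to the return statement above)
  function getText2(el) {
    var x = el.childNodes;
    var txt = '';
    for (var i=0, len=x.length; i<len; ++i){
      if (3 == x[i].nodeType) {
        // Text node: take its character data
        txt += x[i].data;
      } else if (1 == x[i].nodeType){
        // Element node: descend into its children
        txt += getText2(x[i]);
      }
    }
    // Collapse whitespace before returning
    return txt.replace(/\s+/g,' ');
  }
}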
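
If you want to try the recursive walk without a page to hand, something
like the following may do. It is only a sketch: collectText repeats the
getText2 loop so the snippet runs on its own, and textNode, elementNode
and the sample cell are hypothetical stand-ins built from plain objects,
not real DOM nodes.

```javascript
// Same walk as getText2 above, repeated so the snippet runs on its own
function collectText(node) {
  var kids = node.childNodes;
  var txt = '';
  for (var i = 0, len = kids.length; i < len; ++i) {
    if (3 == kids[i].nodeType) {
      txt += kids[i].data;
    } else if (1 == kids[i].nodeType) {
      txt += collectText(kids[i]);
    }
  }
  // Collapse whitespace before returning
  return txt.replace(/\s+/g, ' ');
}

// Hypothetical stand-ins for DOM nodes
function textNode(s) { return { nodeType: 3, data: s }; }
function elementNode(children) { return { nodeType: 1, childNodes: children }; }

// Roughly <td>Price:  <b>10</b>(newline)  <i>USD</i></td>
var cell = elementNode([
  textNode('Price:  '),
  elementNode([textNode('10')]),
  textNode('\n  '),
  elementNode([textNode('USD')])
]);

var result = collectText(cell); // 'Price: 10 USD'
```

Note that the whitespace is collapsed only on this fallback path;
textContent and innerText return whatever the browser gives you, so the
three branches need not produce identical strings.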
--
Rob