Getting the complete text content of a node...

Arancaytar

(Note: I am a Javascript newbie. I can handle PHP and Java, but this
is unfamiliar territory.)

For a wordcount feature, I need to collect the complete text content
of a 'div' element inside a variable. Because of the issues with
paragraphs and markup, the content is split into different nodes in
the DOM.

For example:

<div>Hello, <p>this text is <span style="font-style:italic">italic</span></p></div>

This will cause (more or less) a DOM tree like this:

[div]
-Hello,
- [p]
-this text is
-[span]
-italic

Now, my function has, as a starting point, the [div] node that is the
top parent here.

function aggregateTextNode(textNode) {
...
return allText;
}

Since I don't know the depth of the nodes, I am trying to build this
as a recursive function. "depth" is a parameter that ensures I can
limit the recursion to a certain level.

function aggregateTextNode(textNode,depth) {
  // get the text value of the current node
  var text=textNode.nodeValue;
  // recursion limit reached
  if (depth==0) return text;
  // if the node has child nodes, aggregate these
  for (i=0;i<textNode.childNodes.length;i++) {
    // append aggregated text
    text+=aggregateTextNode(textNode.childNodes[i],depth-1);
  }
  return text;
}

However, no matter where I set the recursion limit, the script
invariably freezes Firefox until the timeout is reached and I can
abort it - infinite loop, apparently.

Can you see what's wrong with my code? It's very clearly the recursion
that causes it, because if the node has no child nodes at all (say
"<div>Just text</div>"), it succeeds. But if there is only a single
child node, it hangs itself.

Meanwhile, I've managed to do it with a very ugly nested loop that can
go three levels deep, but I'd really rather use the recursive approach
if at all possible.

Feb 1 '07 #1


Christoph Burschka

Arancaytar wrote:

However, no matter where I set the recursion limit, the script
invariably freezes Firefox until the timeout is reached and I can
abort it - infinite loop, apparently.

I forgot to add: The Error Console shows no warnings or notices - I
don't have any indication that the script is crashing apart from the
freeze itself.

Also, a more readable version of the function with shorter, non-wrapped
lines:

function aggregateTextNode(node,depth) {
var text=node.nodeValue;
if (depth==0) return text;
for (i=0;i<node.childNodes.length;i++) {
text+=aggregateTextNode(node.childNodes[i],depth-1);
}
return text;
}

--
CB

Feb 1 '07 #2
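A possible explanation for the freeze, offered as a sketch rather than a confirmed diagnosis: the loop counter i is never declared with var, so it is a single global shared by every level of the recursion. A call on a childless node resets it to 0, and the caller's loop can then revisit the same children over and over. Separately, nodeValue is null for element nodes, so the string "null" would end up concatenated into the result. The version below declares the counter locally and checks nodeType. The textNode and elementNode helpers are hypothetical stand-ins for real DOM nodes, used only so the example runs outside a browser.

```javascript
// Hypothetical mock nodes that mimic the three DOM properties used here.
function textNode(value) {
  return { nodeType: 3, nodeValue: value, childNodes: [] };
}
function elementNode(children) {
  return { nodeType: 1, nodeValue: null, childNodes: children };
}

// Same shape as the posted function, with two changes:
// the counter is local (var i), and only text nodes contribute nodeValue.
function aggregateText(node, depth) {
  var text = node.nodeType === 3 ? node.nodeValue : '';
  if (depth === 0) return text; // recursion limit reached
  for (var i = 0; i < node.childNodes.length; i++) {
    text += aggregateText(node.childNodes[i], depth - 1);
  }
  return text;
}

// Mirrors <div>Hello, <p>this text is <span>italic</span></p></div>
var div = elementNode([
  textNode('Hello, '),
  elementNode([
    textNode('this text is '),
    elementNode([textNode('italic')])
  ])
]);

var aggregated = aggregateText(div, 10); // 'Hello, this text is italic'
```

In a browser the same function should accept a real element in place of the mock div.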

p.lepin

On Feb 1, 3:01 pm, "Arancaytar"
<arancaytar.ilya...@gmail.com> wrote:

For a wordcount feature, I need to collect the complete
text content of a 'div' element inside a variable.
Because of the issues with paragraphs and markup, the
content is split into different nodes in the DOM.

Since I don't know the depth of the nodes, I am trying to
build this as a recursive function. "depth" is a
parameter that ensures I can limit the recursion to a
certain level.

[code snipped]

Meanwhile, I've managed to do it with a very ugly nested
loop that can go three levels deep, but I'd really rather
use the recursive approach if at all possible.

I'm not sure what the problem with your code is, but I was
a bit surprised you don't check the nodeType. Anyway, the
following works for me in Firefox 1.5.0.7 and Konqueror
3.5.4. Cannot check it with other browsers at the moment.

<!DOCTYPE HTML PUBLIC
"-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title></title>
<script type="text/javascript">
function grabText ( node , maxDepth )
{
if ( 3 == node . nodeType )
{
return node . nodeValue ;
}
else if
(
( 1 == node . nodeType ) && ( 0 < maxDepth )
)
{
var result = '' ;
for
(
var i = 0 ;
i < node . childNodes . length ;
i ++
)
{
result +=
grabText
(
node . childNodes [ i ] , maxDepth - 1
) ;
}
return result ;
}
return '' ;
}
</script>
</head>
<body>
<div onclick=" alert ( grabText ( this , 3 ) ) ; ">
Some <i>stupid</i> text with
<b><span>fa<i>n</i>cy</span> formatting</b>!
</div>
</body>
</html>

--
Pavel Lepin

Feb 1 '07 #3
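One thing to watch with a fixed depth limit: text nested deeper than the limit is dropped silently. In the markup above, with a limit of 3, the innermost <i> element appears to be reached with maxDepth 0 and so contributes nothing. A minimal sketch of an alternative that needs no depth parameter is an iterative walk with an explicit stack. The collectText name and the mock node helpers are hypothetical, chosen so the example runs without a browser.

```javascript
// Hypothetical mock nodes standing in for DOM text and element nodes.
function mockText(value) {
  return { nodeType: 3, nodeValue: value, childNodes: [] };
}
function mockElement(children) {
  return { nodeType: 1, nodeValue: null, childNodes: children };
}

// Collect all descendant text in document order without recursion.
function collectText(root) {
  var parts = [];
  var stack = [root];
  while (stack.length > 0) {
    var node = stack.pop();
    if (node.nodeType === 3) {
      parts.push(node.nodeValue);
    } else if (node.nodeType === 1) {
      // Push children in reverse so the first child is popped first.
      for (var i = node.childNodes.length - 1; i >= 0; i--) {
        stack.push(node.childNodes[i]);
      }
    }
  }
  return parts.join('');
}

// Roughly the structure of the test page's div, whitespace simplified.
var fancy = mockElement([
  mockText('Some '),
  mockElement([mockText('stupid')]),
  mockText(' text with '),
  mockElement([
    mockElement([mockText('fa'), mockElement([mockText('n')]), mockText('cy')]),
    mockText(' formatting')
  ]),
  mockText('!')
]);

var collected = collectText(fancy);
```

Other node types (comments, for example) are skipped here, as in the recursive version.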

RobG

On Feb 1, 11:01 pm, "Arancaytar" <arancaytar.ilya...@gmail.com> wrote:

(Note: I am a Javascript newbie. I can handle PHP and Java, but this
is unfamiliar territory.)

For a wordcount feature, I need to collect the complete text content
of a 'div' element inside a variable. Because of the issues with
paragraphs and markup, the content is split into different nodes in
the DOM.

Here's one I prepared earlier...

It tries the W3C-compliant textContent first; if that isn't
supported, it tries IE's innerText. Finally, it tries innerHTML with
a regular expression to strip HTML tags. The final method should only
be used in a very small number of browsers, and may fail in those in a
few cases.

If you really want a recursive function, that is included further
down.

// Using textContent || innerText || innerHTML with a regex
function getText (el) {
  if (el.textContent) {return el.textContent;}
  if (el.innerText) {return el.innerText;}
  if (typeof el.innerHTML == 'string') {
    return el.innerHTML.replace(/<[^<>]+>/g,'');
  }
}

// Alternative: using textContent || innerText || recursion
function getText(el)
{
  if (el.textContent) return el.textContent;
  if (el.innerText) return el.innerText;
  return getText2(el);

  // Concatenate the data of all descendant text nodes
  function getText2(el) {
    var x = el.childNodes;
    var txt = '';
    for (var i=0, len=x.length; i<len; ++i){
      if (3 == x[i].nodeType) {
        txt += x[i].data;
      } else if (1 == x[i].nodeType){
        txt += getText2(x[i]);
      }
    }
    // Collapse runs of whitespace into single spaces
    return txt.replace(/\s+/g,' ');
  }
}

--
Rob

Feb 1 '07 #4

Christoph Burschka

RobG wrote:

If you really want a recursive function, that is included further
down.

I don't want recursion at any cost - I just assumed it's necessary
because of the way the DOM tree stores text. If it's possible to get the
"flat" text content of the node in another way, that would be just great.

I haven't tried out your code yet, but from what I see, I guess
"textContent" and "innerText" can do just that without a need for a
messy recursion.

So by trying "textContent", "innerText" and stripped "innerHTML" in that
order, I can support almost all browsers that matter?

--
CB

Feb 1 '07 #5

Elegie

RobG wrote:

Hi Rob,

<snip>

Finally, it tries innerHTML with
a regular expression to strip HTML tags. The final method should only
be used in a very small number of browsers, and may fail in those in a
few cases.

I think so, too: innerHTML returns some text in which HTML entities
should logically not be expanded (as it normally represents a valid HTML
fragment). Therefore, if the code were to include some, those entities
would appear "as is" in the returned text.

For that very reason, while I'd admit the innerHTML approach is
definitely appealing, I think I'd prefer to stick to the recursion model
as the third fall back technique.
Kind regards,
Elegie.

Feb 1 '07 #6
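To illustrate the entity concern, here is a small sketch that applies the same tag-stripping regular expression to a plain string. The stripTags and decodeBasicEntities names are hypothetical, and the decoder covers only four named entities as an assumption for the example; it is not a complete HTML entity decoder.

```javascript
// Same pattern as the innerHTML fallback: remove anything that looks like a tag.
function stripTags(html) {
  return html.replace(/<[^<>]+>/g, '');
}

// Hypothetical, deliberately tiny entity table (assumption: only these four matter).
var ENTITIES = { amp: '&', lt: '<', gt: '>', quot: '"' };

// Decode in a single pass so '&amp;lt;' becomes '&lt;' and not '<'.
function decodeBasicEntities(text) {
  return text.replace(/&(amp|lt|gt|quot);/g, function (match, name) {
    return ENTITIES[name];
  });
}

var markup = 'Fish &amp; <b>chips</b>';
var stripped = stripTags(markup);             // 'Fish &amp; chips'
var decoded = decodeBasicEntities(stripped);  // 'Fish & chips'
```

For a word count the leftover entities may not matter much, but they would show up if the text were displayed.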

RobG

Christoph Burschka wrote:

RobG wrote:

If you really want a recursive function, that is included further
down.

I don't want recursion at any cost - I just assumed it's necessary
because of the way the DOM tree stores text. If it's possible to get the
"flat" text content of the node in another way, that would be just great.

I haven't tried out your code yet, but from what I see, I guess
"textContent" and "innerText" can do just that without a need for a
messy recursion.

So by trying "textContent", "innerText" and stripped "innerHTML" in that
order, I can support almost all browsers that matter?

Yes.

I don't know of any recent browser that doesn't support either
textContent or innerHTML, maybe there are some mobile browsers in that
category. If you keep the tag content simple (no '<' or '>' characters
in attribute values) then the fall-back to innerHTML should be pretty
solid too.
--
Rob

Feb 1 '07 #7
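The caveat about '<' or '>' in attribute values can be seen by running the same regular expression over two sample strings. This is only a sketch on plain strings; stripTags is a hypothetical helper name and the markup is made up for the example.

```javascript
// Same pattern as the innerHTML fallback.
function stripTags(html) {
  return html.replace(/<[^<>]+>/g, '');
}

// Simple attribute value: both tags are removed cleanly.
var simple = stripTags('<a title="plain">link</a>');   // 'link'

// A '>' inside the attribute value ends the match early,
// so part of the attribute leaks into the text.
var leaky = stripTags('<a title="a > b">link</a>');    // ' b">link'
```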

Christoph Burschka

RobG wrote:

Christoph Burschka wrote:

RobG wrote:

If you really want a recursive function, that is included further
down.

I don't want recursion at any cost - I just assumed it's necessary
because of the way the DOM tree stores text. If it's possible to get
the "flat" text content of the node in another way, that would be just
great.

I haven't tried out your code yet, but from what I see, I guess
"textContent" and "innerText" can do just that without a need for a
messy recursion.

So by trying "textContent", "innerText" and stripped "innerHTML" in
that order, I can support almost all browsers that matter?

Yes.

I don't know of any recent browser that doesn't support either
textContent or innerHTML, maybe there are some mobile browsers in that
category. If you keep the tag content simple (no '<' or '>' characters
in attribute values) then the fall-back to innerHTML should be pretty
solid too.

Well, since the wordcount is a cosmetic feature, it won't break the page
if by some chance the browser doesn't support it.

Anyway, I've replaced my current nested loop with this function, and it
works perfectly. Thanks a lot!

--
CB

Feb 1 '07 #8
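For completeness, a minimal sketch of the word count step itself, assuming the flat text has already been obtained by one of the methods above. The countWords name is hypothetical, and "word" is taken to mean any run of non-whitespace characters, which is an assumption rather than something specified in the thread.

```javascript
// Count whitespace-separated words in a string.
function countWords(text) {
  // Trim leading and trailing whitespace so split() yields no empty entries.
  var trimmed = text.replace(/^\s+|\s+$/g, '');
  if (trimmed === '') return 0;
  return trimmed.split(/\s+/).length;
}

var wordTotal = countWords('Hello, this text is italic'); // 5
```

Text from adjacent elements with no whitespace between them would be counted as a single word under this definition.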
