getElementsByTagName 'n such

norfleet

hi folks,

OK, so let's say, for example, I have a bit of HTML that looks like
this:

<td class="regular1b" valign="top">
<a href="notfound.html">Lecture
V</a>
</td>

And I want to save all the text ("all" meaning the tags and
everything) between the <td> and </td>. Using JavaScript, I was able
to isolate the <td></td> by doing:

var w = myTable.getElementsByTagName("TD");

So then I have an IF statement within a FOR loop that looks like:

if (w.item(i).className == "regular1b")
alert(w[i].childNodes[0].nodeValue);

The ALERT() is just a place holder to make sure things are working.
The thing is, nodeValue returns NULL because there's no actual text
within the <td></td> tags; the only thing there is more HTML code, and
the text between the <a></a> tags apparently isn't considered part of
the <td></td> tags.

I guess I'm wondering if there's another way to go about getting the
text from in between the <td></td> tags short of just doing a
brute-force text search on the whole darn page. Any help would be
much appreciated...

Fleet

Jul 23 '05 #1

Ron

Heya Fleet,
Unless the document is normalized, childNodes[0] may be a whitespace
text node. You might want to normalize your TD before reading from it.
In addition, nodeValue is supposed to return null for any element node
(see <http://www.w3.org/TR/2004/REC-DOM-Le...#ID-1950641247>).
Unfortunately, the best (possibly only) current way to get what you
want is to use the non-standard innerHTML property of your TD object. It
is implemented in the latest versions of IE and Gecko-based browsers.
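A minimal sketch of the whitespace point above: instead of trusting childNodes[0], skip text nodes that contain only whitespace. The helper name firstSignificantChild is hypothetical, and the code assumes only the standard Node properties nodeType, nodeValue and childNodes.

```javascript
// Node type constant from DOM Level 1 Core (TEXT_NODE == 3).
var TEXT_NODE = 3;

// Hypothetical helper: return the first child of `node` that is not a
// whitespace-only text node, or null if there is no such child.
function firstSignificantChild(node) {
  var kids = node.childNodes;
  for (var i = 0; i < kids.length; i++) {
    var kid = kids[i];
    // Whitespace between tags (e.g. "\n\t") may appear as its own text node.
    if (kid.nodeType === TEXT_NODE && !/\S/.test(kid.nodeValue)) {
      continue;
    }
    return kid;
  }
  return null;
}
```

With the markup from the original post, firstSignificantChild(w[i]) should be the A element. Its nodeValue is still null, as noted above, so the link text has to be read one level further down, e.g. from firstSignificantChild(w[i]).firstChild.nodeValue.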

Jul 23 '05 #2

Thomas 'PointedEars' Lahn

norfleet wrote:

> OK, so let's say, for example, I have a bit of HTML that looks like
> this:
>
> <td class="regular1b" valign="top">
> <a href="notfound.html">Lecture
> V</a>
> </td>
>
> And I want to save all the text ("all" meaning the tags and
> everything) between the <td> and </td>. Using JavaScript, I was able
> to isolate the <td></td> by doing:
>
> var w = myTable.getElementsByTagName("TD");
>
> So then I have an IF statement within a FOR loop that looks like:
>
> if (w.item(i).className == "regular1b")

As you have seen, there is no need to call the item() method explicitly
when accessing the DOM with an ECMAScript implementation. With the square
bracket property accessor syntax, that method or the namedItem() method
is called implicitly, depending on the type of the operand.

<http://www.w3.org/TR/DOM-Level-2-HTML/ecma-script-binding.html>
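For reference, here is a sketch of the loop from the original post written with the bracket accessor only. findCellsByClass is a hypothetical name, and matching className as a space-separated list of classes (rather than the original == comparison) is an assumption, in case a cell ever carries more than one class.

```javascript
// Hypothetical helper: collect the TD elements below `root` whose class
// attribute contains the token `className`. cells[i] is equivalent to
// cells.item(i) in the ECMAScript binding of the DOM.
function findCellsByClass(root, className) {
  var cells = root.getElementsByTagName("td");
  var result = [];
  // Pad with spaces so that "regular1b" does not match "regular1bold".
  var needle = " " + className + " ";
  for (var i = 0; i < cells.length; i++) {
    // Collapse runs of whitespace, then pad both ends the same way.
    var classes = " " + cells[i].className.replace(/\s+/g, " ") + " ";
    if (classes.indexOf(needle) !== -1) {
      result.push(cells[i]);
    }
  }
  return result;
}
```

Usage would be along the lines of var cells = findCellsByClass(myTable, "regular1b"); followed by a plain for loop over cells.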
> alert(w[i].childNodes[0].nodeValue);
>
> The ALERT() is just a place holder to make sure things are working.
> The thing is, nodeValue returns NULL because there's no actual text
> within the <td></td> tags;

It returns `null' (ECMAScript is case-sensitive) because the first child
node is an element node. This is documented, standards-compliant
behavior. Think of the contents of the "td" element as a subtree in which
nested content is a child node. Provided that the whitespace after the
start tag of the "td" element and before the start tag of the "a" element
is not considered a text node (proprietary behavior!), this subtree looks like

  .
  .
  .
  '- TD class="regular1b" valign="top"
  |   |
  |   '- A href="notfound.html"
  |       |
  |       '- SPAN class="list5"
  |           |
  |           '- B
  |               |
  |               '- TEXT "Lecture V"
  |
  |- ...
  .
  .
  .

(The "Show parse tree" feature of the W3C Validator
<http://validator.w3.org/> provides a similar presentation.)

You see that childNodes[0] or firstChild refers to an element node.

Standards-compliant parsing would result in

  .
  .
  .
  '- TD class="regular1b" valign="top"
  |   |
  |   |- TEXT "\n\t"
  |   |
  |   '- A href="notfound.html"
  |       |
  |       '- SPAN class="list5"
  |           |
  |           '- B
  |               |
  |               '- TEXT "Lecture V"
  |
  |- ...
  .
  .
  .

so in Mozilla/5.0 (Mozilla, Netscape 6+, Firefox, Camino,
...) you get "\n\t" for childNodes[0].nodeValue.
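To check which of the two trees a given browser actually builds, a small recursive dump can help. This is a hypothetical sketch, not the W3C Validator's output format: it prints only node names and text (no attributes), and it relies on JSON.stringify, which older browsers lack, to make whitespace such as "\n\t" visible.

```javascript
// Hypothetical helper: return an indented outline of the subtree rooted at
// `node`, one line per node. Text nodes are shown with their quoted value;
// all other nodes are shown by nodeName (e.g. "TD", "A", "#comment").
function dumpTree(node, indent) {
  indent = indent || "";
  var line;
  if (node.nodeType === 3) {             // TEXT_NODE
    line = indent + "TEXT " + JSON.stringify(node.nodeValue);
  } else {
    line = indent + node.nodeName;
  }
  var lines = [line];
  var kids = node.childNodes || [];
  for (var i = 0; i < kids.length; i++) {
    // Each nesting level is indented by two more spaces.
    lines.push(dumpTree(kids[i], indent + "  "));
  }
  return lines.join("\n");
}
```

In a browser, alert(dumpTree(w[i])) would show whether a TEXT "\n\t" line appears directly below the TD.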

That is why it was suggested to normalize the document, for example by
removing the whitespace between the tags:

<td class="regular1b" valign="top"><a
href="notfound.html">Lecture V</a></td>

> the only thing there is more HTML code, and the text between the
> <a></a> tags apparently isn't considered part of the <td></td> tags.

That misconception is the main cause of your problem.

> I guess I'm wondering if there's another way to go about getting the
> text from in between the <td></td> tags short of just doing a
> brute-force text search on the whole darn page. Any help would be
> much appreciated...

There is. The "innerHTML" property has been suggested, but since it is
proprietary, and you are using the standards-compliant DOM, you should
rather serialize the subtree by traversing it. Depending on the UA's DOM,
there are predefined serializer objects, such as XMLSerializer in the
Gecko DOM, but you can code your own serializer as well.

PointedEars
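A minimal sketch of such a hand-written serializer, assuming only the standard Node and Attr properties (nodeType, nodeName, nodeValue, attributes, childNodes, name, value, specified). The function names are hypothetical; only element and text nodes are handled, and void elements such as <br> are not special-cased, so the output approximates innerHTML rather than reproducing it exactly.

```javascript
// Hypothetical serializer: rebuilds markup for a DOM subtree by walking it,
// roughly what innerHTML returns. Only element (1) and text (3) nodes are
// handled; other node types, e.g. comments, are skipped.

function escapeText(s) {
  // Escape "&" first so the entities added afterwards are not double-escaped.
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function escapeAttr(s) {
  return s.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

function serializeNode(node) {
  if (node.nodeType === 3) {            // TEXT_NODE
    return escapeText(node.nodeValue);
  }
  if (node.nodeType !== 1) {            // not an ELEMENT_NODE: skip it
    return "";
  }
  var tag = node.nodeName.toLowerCase();
  var out = "<" + tag;
  var attrs = node.attributes || [];
  for (var i = 0; i < attrs.length; i++) {
    // Older IE lists unspecified (default) attributes too; leave those out.
    if (attrs[i].specified === false) {
      continue;
    }
    out += " " + attrs[i].name + '="' + escapeAttr(attrs[i].value) + '"';
  }
  // No special case for void elements such as <br>: they get a closing tag.
  return out + ">" + serializeChildren(node) + "</" + tag + ">";
}

// Serializes only the children of `node`, i.e. the content between its tags.
function serializeChildren(node) {
  var out = "";
  var kids = node.childNodes || [];
  for (var i = 0; i < kids.length; i++) {
    out += serializeNode(kids[i]);
  }
  return out;
}
```

serializeChildren(w[i]) would then return the markup between <td> and </td>, including the whitespace text nodes, which is what the original question asked for.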

Jul 23 '05 #3

norfleet wrote:

> I guess I'm wondering if there's another way to go about getting the
> text from in between the <td></td> tags

There is. The textContent attribute of the Node interface (DOM 3 Core)
is supported by Mozilla 1.5+. I tried it with your specific markup code
(with all the white-space, line feeds, etc.) and it worked without a
problem. I also tried it with a more complex subtree and it worked as
expected.

Bug 210451: Implement Node.textContent
http://bugzilla.mozilla.org/show_bug.cgi?id=210451

http://www.w3.org/TR/2004/REC-DOM-Le...e3-textContent

For browsers that do not support the DOM 3 Node interface, you can write
a function that traverses the subtree and collects the text, or use the
non-standard innerHTML property.

DU
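For browsers without textContent, the traversal mentioned above might look like the following sketch (the name getText is hypothetical). Like textContent, it returns only the text, not the tags that the original question also asked about.

```javascript
// Hypothetical helper: returns the text below `node`, preferring the DOM 3
// textContent attribute and falling back to a manual subtree traversal.
function getText(node) {
  if (typeof node.textContent === "string") {
    return node.textContent;
  }
  // TEXT_NODE (3) or CDATA_SECTION_NODE (4): the node itself carries the text.
  if (node.nodeType === 3 || node.nodeType === 4) {
    return node.nodeValue;
  }
  var out = "";
  var kids = node.childNodes || [];
  for (var i = 0; i < kids.length; i++) {
    var kid = kids[i];
    // Collect text from text/CDATA children and recurse into element
    // children; comments and other node types contribute nothing.
    if (kid.nodeType === 1 || kid.nodeType === 3 || kid.nodeType === 4) {
      out += getText(kid);
    }
  }
  return out;
}
```

For the markup in the original post, getText(w[i]) should include the surrounding line breaks as well as "Lecture" and "V", since whitespace text nodes are part of the subtree.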

Jul 23 '05 #4

norfleet wrote:

> The ALERT() is just a place holder to make sure things are working.
> The thing is, nodeValue returns NULL because there's no actual text
> within the <td></td> tags; the only thing there is more HTML code, and
> the text between the <a></a> tags apparently isn't considered part of
> the <td></td> tags.

I suggest you play around with Mozilla's DOM Inspector and get
accustomed to using it. You can install it on Netscape 7.1 and Firefox
0.8 as well. This is how I personally noticed that white-space between
nodes is treated as anonymous text nodes. What you say above is not
true (your misconception is a common one); it is explained in

Whitespace in the DOM
http://www.mozilla.org/docs/dom/technote/whitespace/

DU

Jul 23 '05 #5

Similar topics

by: 2obvious

This is a pipe dream, I realize, but I'm trying to emulate the functionality of the W3C DOM-supported document.getElementsByTagName method under the very nightmarish Netscape 4. Through some...

Javascript

getElementsByTagName

by: Michel Bany

I am trying to parse responseXML from an HTTP request. var doc = request.responseXML; var elements = doc.getElementsByTagName("*"); the last statement returns an empty collection when running from...

Javascript

How to retrieve XML sublevel, using GetElementsByTagName

by: Andy

Hello, I have the following example XML: <data> <package> <packageid>123</packageid> <package_article> <articleid>article1</articleid> </package_article> </package>

.NET Framework

Calls to GetElementsByTagName affect performance of XML DOM

by: Dima

Call to XmlNode.GetElementsByTagName returns XmlNodeList that stays in sync with XmlDocument thanks to events fired by XmlDocument. Once this list is created there is no way to remove its event...

.NET Framework

getElementsByTagName + documentElement

by: Max

Hello everyone! i would want to know if the getElementsByTagName() function starts to find the elements from documentElement comprising the same documentElement. XML example: <?xml...

.NET Framework

getElementsByTagName returning empty set

by: Ben

I have a web service that returns the following xml: <?xml version="1.0" encoding="utf-8" ?> <NewDataSet> <Addresses> <XML_F52E2B61-18A1-11d1-B105-00805F49916B> <CustomerAddressBase...

Javascript

var anchors = document.getElementsByTagName("A");

by: windandwaves

does it matter if I write var anchors = document.getElementsByTagName("A"); or var anchors = document.getElementsByTagName("a"); Or is there a better way to catch both <a hrefs and <A...

Javascript

Replace value of node using getElementsByTagName

by: Ouray Viney

Xml <ib>8.4.27.5</ib> python from xml.dom import minidom xmldoc = minidom.parse('C:\TestProfile.xml') xmldoc

Python
