Alternative to documentElement.innerHTML?

Kyle

I am presently making use of documentElement.innerHTML to retrieve
page contents for manipulation, but I've noticed that the string value
returned is not identical to the actual page source. Specifically,
attribute assignments that look like:

height=100 width=100

in the real source, look like:

height="100" width="100"

in the returned value from documentElement.innerHTML.

Further complicating things, forms that begin inside a table in this
manner:

<table><form ...><tr><td...><input...></form></td>...

Are returned as:

<table><form ...></form><tr><td...><input...

If I modify the returned value from documentElement.innerHTML, then
write it back to documentElement.innerHTML, many of the forms are
non-functional.

I am interested in any available alternatives that will function in
recent Mozilla releases. Thank you,

-Kyle

Jul 20 '05 #1

Randy Webb

Kyle wrote:

I am presently making use of documentElement.innerHTML to retrieve
page contents for manipulation, but I've noticed that the string value
returned is not identical to the actual page source. Specifically,
attribute assignments that look like:

height=100 width=100

in the real source, look like:

height="100" width="100"

in the returned value from documentElement.innerHTML.

Further complicating things, forms that begin inside a table in this
manner:

<table><form ...><tr><td...><input...></form></td>...

Are returned as:

<table><form ...></form><tr><td...><input...

If I modify the returned value from documentElement.innerHTML, then
write it back to documentElement.innerHTML, many of the forms are
non-functional.

I am interested in any available alternatives that will function in
recent Mozilla releases. Thank you,

Validate your (X)HTML and you will solve a lot of those problems. Along
with dropping tables for layout.

Read the group FAQ, it discusses how to read a text file (2 methods),
which is what you are trying to do.

--
Randy
Chance Favors The Prepared Mind
comp.lang.javascript FAQ - http://jibbering.com/faq/

Jul 20 '05 #2

Dirk Feytons

Kyle wrote:

I am presently making use of documentElement.innerHTML to retrieve
page contents for manipulation, but I've noticed that the string value
returned is not identical to the actual page source. Specifically,
attribute assignments that look like:

[...]

Take a look at the DOM specification of W3C. Lots of methods to
manipulate your document.
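
As a rough illustration of that suggestion, the sketch below changes an attribute through DOM methods instead of editing the innerHTML string and writing it back, so the elements the browser already parsed (forms included) stay where they are. The helper name setAttributeOnAll and its parameters are made up for this example; getElementsByTagName and setAttribute are standard DOM calls.

```javascript
// Hypothetical helper: set one attribute on every element with a given
// tag name, without serializing and re-parsing the document.
// `doc` is assumed to be a DOM Document object.
function setAttributeOnAll(doc, tagName, attrName, value) {
  var nodes = doc.getElementsByTagName(tagName);
  for (var i = 0; i < nodes.length; i++) {
    nodes[i].setAttribute(attrName, value);
  }
  return nodes.length; // number of elements that were changed
}
```

Called with the page's document, for example setAttributeOnAll(doc, "td", "height", "50"), it returns the number of elements it changed. It does not recover the original source text; it only avoids the write-back step that breaks the forms.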

--
Dirk

(PGP keyID: 0x448BC5DD - http://www.gnupg.org - http://www.pgp.com)

..oO° If I say I love you, will you stay or want to wither, fade away. If
I show you the sun and the night of my past, will a smile cross your
face but just vanish too fast. °Oo.

Jul 20 '05 #3

PeEmm

Kyle skrev, On 1/25/2004 6:51 AM:

I am presently making use of documentElement.innerHTML to retrieve
page contents for manipulation, but I've noticed that the string value
returned is not identical to the actual page source. Specifically,
attribute assignments that look like:

height=100 width=100

in the real source, look like:

height="100" width="100"

in the returned value from documentElement.innerHTML.

Further complicating things, forms that begin inside a table in this
manner:

<table><form ...><tr><td...><input...></form></td>...

Are returned as:

<table><form ...></form><tr><td...><input...

If I modify the returned value from documentElement.innerHTML, then
write it back to documentElement.innerHTML, many of the forms are
non-functional.

I am interested in any available alternatives that will function in
recent Mozilla releases. Thank you,

-Kyle

The DOM naturally only functions as expected if the HTML source is as
expected, i.e. is valid according to the standards. The examples you give above
are malformed HTML, so the DOM tries to do something about the mishmash.

--
/P.M.

Jul 20 '05 #4

Kyle

Randy Webb <hi************@aol.com> wrote in message news:<4K********************@comcast.com>...

Kyle wrote:
I am presently making use of documentElement.innerHTML to retrieve
page contents for manipulation, but I've noticed that the string value
returned is not identical to the actual page source. Specifically,
attribute assignments that look like:

height=100 width=100

in the real source, look like:

height="100" width="100"

in the returned value from documentElement.innerHTML.

Further complicating things, forms that begin inside a table in this
manner:

<table><form ...><tr><td...><input...></form></td>...

Are returned as:

<table><form ...></form><tr><td...><input...

If I modify the returned value from documentElement.innerHTML, then
write it back to documentElement.innerHTML, many of the forms are
non-functional.

I am interested in any available alternatives that will function in
recent Mozilla releases. Thank you,

Validate your (X)HTML and you will solve a lot of those problems. Along
with dropping tables for layout.

This code is resident in a Mozilla extension, not a page that I've
written. It isn't my HTML that I need to parse so I have no control
over its validity.

Read the group FAQ, it discusses how to read a text file (2 methods),
which is what you are trying to do.

I don't understand what you mean here. As far as I know, the "file"
does not exist anywhere in the filesystem so this is untrue. I assume
this content is somewhere in memory because "View Source" and Sherlock
plugins make use of the real source without accessing the page a 2nd
time.

Thanks for any input.

--Kyle

Jul 20 '05 #5

Kyle

PeEmm <la*****@ebox.tninet.se> wrote in message news:<bv*********@ripley.netscape.com>...

Kyle skrev, On 1/25/2004 6:51 AM:
I am presently making use of documentElement.innerHTML to retrieve
page contents for manipulation, but I've noticed that the string value
returned is not identical to the actual page source. Specifically,
attribute assignments that look like:

height=100 width=100

in the real source, look like:

height="100" width="100"

in the returned value from documentElement.innerHTML.

Further complicating things, forms that begin inside a table in this
manner:

<table><form ...><tr><td...><input...></form></td>...

Are returned as:

<table><form ...></form><tr><td...><input...

If I modify the returned value from documentElement.innerHTML, then
write it back to documentElement.innerHTML, many of the forms are
non-functional.

I am interested in any available alternatives that will function in
recent Mozilla releases. Thank you,

-Kyle

The DOM naturally only functions as expected if the HTML source is as
expected, i.e. is valid according to the standards. The examples you give above
are malformed HTML, so the DOM tries to do something about the mishmash.

I should have been more clear. This is a Mozilla Chrome extension, so
I assume that I should have access to the same methods that Mozilla
uses to display the source with "View Source" and retrieve the source
for parsing with Sherlock plugins. Thanks,

--Kyle

Jul 20 '05 #6

Randy Webb

Kyle wrote:

Randy Webb <hi************@aol.com> wrote in message news:<4K********************@comcast.com>...
Kyle wrote:
I am interested in any available alternatives that will function in
recent Mozilla releases. Thank you,

Validate your (X)HTML and you will solve a lot of those problems. Along
with dropping tables for layout.

This code is resident in a Mozilla extension, not a page that I've
written. It isn't my HTML that I need to parse so I have no control
over its validity.

Ok.

Read the group FAQ, it discusses how to read a text file (2 methods),
which is what you are trying to do.

I don't understand what you mean here. As far as I know, the "file"
does not exist anywhere in the filesystem so this is untrue. I assume
this content is somewhere in memory because "View Source" and Sherlock
plugins make use of the real source without accessing the page a 2nd
time.

My response was based on the assumption (which I now see was
incorrect) that you were trying to read the HTML code of an HTML file,
and you wanted the original code, not the rendered code (they are
different).

If you load a page, and then do
javascript:alert(document.documentElement.innerHTML);
In the address bar, and then view the source of the page, on very very
few occasions will they be the same code.

Example:
When I open IE, it opens to about:blank. (actually, all of my browsers
are set to open to about:blank)
View>Source gives this code:
<HTML></HTML>
And that's it.
javascript:alert(document.documentElement.innerHTML);
alerts this:
<HEAD></HEAD>
<BODY></BODY>

In Mozilla, about:blank view>Source gives this code:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head><title></title></head>
<body></body>
</html>

I added line breaks for readability.

javascript:alert(document.documentElement.innerHTML);
gives this code:

<head><title></title></head><body></body>

Note the missing DTD and HTML tags.

In order to get the original, written code, of a webpage, into a
variable that the page's javascript can use, you have to read the file
from the server. And the only two ways I know of to do that are with an
XMLHttpRequest object or a Java applet, hence my suggestion to consult the FAQ.
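
A minimal sketch of the request-based route, using XMLHttpRequest (the object Mozilla exposes to script for HTTP requests). The function name fetchSource, the callback shape, and the optional third parameter are assumptions made for this example; the third parameter exists only so a stand-in constructor can be passed where XMLHttpRequest is not available.

```javascript
// Hypothetical helper: request `url` again and hand the unparsed
// response text to `onDone(error, text)`.
// XhrCtor is optional and defaults to the global XMLHttpRequest.
function fetchSource(url, onDone, XhrCtor) {
  var Ctor = XhrCtor || XMLHttpRequest;
  var req = new Ctor();
  req.open("GET", url, true); // true = asynchronous
  req.onreadystatechange = function () {
    if (req.readyState !== 4) {
      return; // 4 means the response is complete
    }
    if (req.status === 200) {
      onDone(null, req.responseText);
    } else {
      onDone(new Error("HTTP status " + req.status), null);
    }
  };
  req.send(null);
}
```

Note that this issues a second request for the page (possibly answered from cache), so it is not what "View Source" does, and a page generated from a POST or one that changes between requests may not come back identical.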

Whether any of that helps with you trying to read a Mozilla Skin plugin,
I don't know :(
--
Randy
Chance Favors The Prepared Mind
comp.lang.javascript FAQ - http://jibbering.com/faq/

Jul 20 '05 #7

Lasse Reichstein Nielsen

Randy Webb <hi************@aol.com> writes:

If you load a page, and then do
javascript:alert(document.documentElement.innerHTML);
In the address bar, and then view the source of the page, on very very
few occasions will they be the same code.

Yes, browsers build the innerHTML structure from the current structure
of the document, whereas the view-source shows the original source code.
That means that innerHTML is "unparsing" the DOM tree structure, and
it would be surprising if it gave exactly the same formatting as the
original source, even if the structure was the same.
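
A toy model may make the "unparsing" point concrete. The object layout (tag, attrs, children) and the function name serialize are invented for this sketch and are not how any browser stores its tree; the point is only that the tree keeps attribute names and values, so the quoting in the output comes from the serializer and not from the source.

```javascript
// Toy "unparser": turns a small tree of plain objects back into markup.
// A node is either a string (text) or { tag, attrs, children }.
function serialize(node) {
  if (typeof node === "string") {
    return node;
  }
  var out = "<" + node.tag;
  for (var name in node.attrs) {
    if (Object.prototype.hasOwnProperty.call(node.attrs, name)) {
      // The serializer always quotes, whatever the source looked like.
      out += " " + name + '="' + node.attrs[name] + '"';
    }
  }
  out += ">";
  for (var i = 0; i < node.children.length; i++) {
    out += serialize(node.children[i]);
  }
  return out + "</" + node.tag + ">";
}

// What a parser might keep of the source: <td height=100 width=100>x</td>
var cell = { tag: "td", attrs: { height: "100", width: "100" }, children: ["x"] };
var html = serialize(cell);
// html is now: <td height="100" width="100">x</td>
```

The unquoted form in the source is gone by the time the tree exists, which is why no amount of care on the reading side can get it back from innerHTML.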

.... In Mozilla, about:blank view>Source gives this code:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html> ....
javascript:alert(document.documentElement.innerHTML);
gives this code:

<head><title></title></head><body></body>

Note the missing DTD and HTML tags.

Not surprising since you ask for the *inner*HTML of the HTML element.
If Mozilla supported the "outerHTML" property, you could also show
the HTML tag. The document type node is even harder to find. It
is the first child of the document node (where the HTML element
is the second).
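
For what it is worth, the DOM also exposes the document type as document.doctype, with name, publicId and systemId properties, so the declaration can be rebuilt by hand. A minimal sketch (the function name doctypeToString is made up here, and the result is a reconstruction from those three properties, not the original characters of the source):

```javascript
// Hypothetical helper: rebuild a <!DOCTYPE ...> declaration from a
// DocumentType node (or anything with name, publicId and systemId).
// Returns "" when the document has no doctype.
function doctypeToString(doctype) {
  if (!doctype) {
    return "";
  }
  var s = "<!DOCTYPE " + doctype.name;
  if (doctype.publicId) {
    s += ' PUBLIC "' + doctype.publicId + '"';
  } else if (doctype.systemId) {
    s += " SYSTEM";
  }
  if (doctype.systemId) {
    s += ' "' + doctype.systemId + '"';
  }
  return s + ">";
}
```

In a page this would be called as doctypeToString(document.doctype). Prepending the result to a serialization of the HTML element gets closer to the full document, but it is still generated from the tree rather than read from the source.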

/L
--
Lasse Reichstein Nielsen - lr*@hotpop.com
DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
'Faith without judgement merely degrades the spirit divine.'

Jul 20 '05 #8

Randy Webb

Lasse Reichstein Nielsen wrote:

Randy Webb <hi************@aol.com> writes:

If you load a page, and then do
javascript:alert(document.documentElement.innerHTML);
In the address bar, and then view the source of the page, on very very
few occasions will they be the same code.

Yes, browsers build the innerHTML structure from the current structure
of the document, whereas the view-source shows the original source code.
That means that innerHTML is "unparsing" the DOM tree structure, and
it would be surprising if it gave exactly the same formatting as the
original source, even if the structure was the same.

Formatting aside, even then there are very very few occasions where the
browser will give you what it got. The only way to make them match
(aside from the DTD and HTML tags), is to grab it, paste it into your
editor and then use that code.
....
In Mozilla, about:blank view>Source gives this code:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>

....

javascript:alert(document.documentElement.innerHTML);
gives this code:

<head><title></title></head><body></body>

Note the missing DTD and HTML tags.

Not surprising since you ask for the *inner*HTML of the HTML element.

True. But it only serves to reinforce my statement that if you want the
complete code of the file, you *must* read it from the server, and skip
the parsing. The only two ways I know of to do that are with a Java
applet (most widely supported) or with an XMLHttpRequest object.

--
Randy
Chance Favors The Prepared Mind
comp.lang.javascript FAQ - http://jibbering.com/faq/

Jul 20 '05 #9
