Get all document contents

Is there a way to get the entire contents of the current document as a
string? I want to send the document contents to a markup validation
service.

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.
Jul 23 '05 #1
Christopher Benson-Manica wrote:
> Is there a way to get the entire contents of the current document as a
> string?

Some browsers like IE or Opera allow you to serialize an element so you
can use
    document.documentElement.outerHTML
to get the serialized markup of the <HTML> element.

> I want to send the document contents to a markup validation
> service.


Send them the URL then, that way they can fetch the contents. outerHTML
will hardly do for validation as browsers apply their own serialization
and that way while your source might be XHTML with lower case tag names
the outerHTML might contain tags in upper case letters.
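
As a rough sketch of the outerHTML approach mentioned above: the helper below (the name getSerializedMarkup is made up for illustration) takes a document object and returns the root element's outerHTML where the browser provides it, or null otherwise. The result is the browser's own serialization, not the original source, so tag case and attribute quoting may differ from what the server sent.

```javascript
// Hypothetical helper: returns the browser's serialization of the root
// element, or null where outerHTML is not supported.
function getSerializedMarkup(doc) {
  var root = doc && doc.documentElement;
  if (root && typeof root.outerHTML === "string") {
    return root.outerHTML;
  }
  return null;
}

// In a browser: var markup = getSerializedMarkup(document);
```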

Of course there are also browser-dependent methods to get the source of
the page, see
    http://jibbering.com/faq/#FAQ4_38
but XMLHttpRequest's responseText is known, for instance, not to handle
ISO-8859-x encodings properly.
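
A minimal sketch of the XMLHttpRequest approach, assuming the page can be requested again from its own URL. The names fetchPageSource, createRequest and onDone are illustrative, not taken from the FAQ. Passing the request factory in keeps the browser-specific part (new XMLHttpRequest(), or an ActiveX object in older IE) out of the sketch. The string in responseText has already been decoded by the browser, which is where the encoding caveat applies.

```javascript
// Hypothetical helper: re-requests a URL and hands the response text
// (or null on a non-200 status) to the onDone callback.
function fetchPageSource(url, createRequest, onDone) {
  var req = createRequest();
  req.open("GET", url, true);
  req.onreadystatechange = function () {
    if (req.readyState !== 4) return; // 4 = request complete
    onDone(req.status === 200 ? req.responseText : null);
  };
  req.send(null);
}

// In a browser:
// fetchPageSource(location.href,
//   function () { return new XMLHttpRequest(); },
//   function (source) { /* pass source on to the validator */ });
```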
Martin Honnen
http://JavaScript.FAQTs.com/
Jul 23 '05 #2
Martin Honnen <ma*******@yahoo.de> spoke thus:
> Send them the URL then, that way they can fetch the contents.

Obviously that would be the easy solution, but the pages I'd like to
do this with aren't accessible to the validator (users must be logged
in to view these pages).

> outerHTML
> will hardly do for validation as browsers apply their own serialization
> and that way while your source might be XHTML with lower case tag names
> the outerHTML might contain tags in upper case letters.

Hm, I see the problem. For the purposes of validation, though, it
should be possible to clean up the string without too much trouble
(convert all characters to lowercase to take care of the tags)
although it seems that attributes lose their enclosing double quotes
as well, which is unfortunate.
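
One caution on the cleanup idea: lowercasing every character would also change text content and attribute values. A narrower sketch lowercases only the tag names. The function name lowercaseTagNames is illustrative, and the regular expression is naive: it does not skip comments, scripts or CDATA sections, and it leaves attribute names and missing quotes untouched.

```javascript
// Hypothetical helper: lowercases element names in start and end tags,
// leaving text content and attributes as they are.
function lowercaseTagNames(markup) {
  return markup.replace(/<\/?[A-Za-z][A-Za-z0-9]*/g, function (tag) {
    return tag.toLowerCase();
  });
}
```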

> Of course there are also browser-dependent methods to get the source of
> the page, see
> http://jibbering.com/faq/#FAQ4_38
> but XMLHttpRequest's responseText is known, for instance, not to handle
> ISO-8859-x encodings properly.


In what way does it fail to handle such encodings? I'll look into
something like this and see if I can make it work. Thanks.

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.
Jul 23 '05 #3
Christopher Benson-Manica wrote:
> Hm, I see the problem. For the purposes of validation, though, it
> should be possible to clean up the string without too much trouble
> (convert all characters to lowercase to take care of the tags)
> although it seems that attributes lose their enclosing double quotes
> as well, which is unfortunate.

In addition to that, browsers will add tags and content where there are none
in the source.
For example, they add <tbody> tags to tables even if those tags are not in
your source.

Examining the browser's internal representation of your source is inadequate
for validation.

--
Matt Kruse
http://www.JavascriptToolbox.com
Jul 23 '05 #4
Christopher Benson-Manica wrote:
> Martin Honnen <ma*******@yahoo.de> spoke thus:
>
>> Send them the URL then, that way they can fetch the contents.
>
> Obviously that would be the easy solution, but the pages I'd like to
> do this with aren't accessible to the validator (users must be logged
> in to view these pages).


You can install the W3C validator locally.

Allowing a browser to parse the HTML first and then send it to the
validator will effectively invalidate your validation. AFAIK (but I
may well be wrong), you can't get the doctype declaration which is
fundamental to validating the page.

--
Rob
Jul 23 '05 #5
RobG wrote:
> <snip>
> ... . AFAIK (but I may well be wrong),
> you can't get the doctype declaration which is
> fundamental to validating the page.


On Mozilla and Opera (recent versions):-

document.doctype (object)
document.doctype.publicId (string)
document.doctype.systemId (string)

- could be used to re-produce it.
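
A sketch of how those properties might be turned back into a declaration string. The function name serializeDoctype is made up for illustration, and the code also reads doctype.name, a standard DocumentType property that is not mentioned above. It takes the doctype object as a parameter rather than reading document.doctype directly.

```javascript
// Hypothetical helper: rebuilds a <!DOCTYPE ...> declaration from a
// DocumentType-like object (name, publicId, systemId).
function serializeDoctype(doctype) {
  if (!doctype) return ""; // browser does not expose document.doctype
  var out = "<!DOCTYPE " + doctype.name;
  if (doctype.publicId) {
    out += ' PUBLIC "' + doctype.publicId + '"';
    if (doctype.systemId) out += ' "' + doctype.systemId + '"';
  } else if (doctype.systemId) {
    out += ' SYSTEM "' + doctype.systemId + '"';
  }
  return out + ">";
}

// In a browser: var declaration = serializeDoctype(document.doctype);
```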

The other issues raised about the likely validity of a serialised DOM
make doing so pointless in this context, but where a serialised DOM has
other uses, the doctype can be employed to make the results more complete
(along with maybe iterating the attributes collection of the
documentElement in order to supplement innerHTML with accurate HTML tags).
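
The attribute iteration could be sketched roughly as follows. The names serializeRootElement and escapeAttribute are illustrative. Lowercasing the tag name and always double-quoting attribute values are assumptions made here, not something the browser guarantees matches the source. Attributes whose specified flag is false are skipped, since some older browsers list attributes that were never set.

```javascript
// Escape characters that cannot appear raw in a double-quoted attribute.
function escapeAttribute(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: wraps root.innerHTML in start and end tags rebuilt
// from the element's tagName and attributes collection.
function serializeRootElement(root) {
  var tag = root.tagName.toLowerCase();
  var out = "<" + tag;
  var attrs = root.attributes || [];
  for (var i = 0; i < attrs.length; i++) {
    if (attrs[i].specified === false) continue; // never set in the source
    out += " " + attrs[i].name + '="' + escapeAttribute(attrs[i].value) + '"';
  }
  return out + ">" + root.innerHTML + "</" + tag + ">";
}

// In a browser: var markup = serializeRootElement(document.documentElement);
```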

Richard.
Jul 23 '05 #6
