Problem loading html containing scripts using Dom LoadHTML

This code just reads the HTML and prints it; eventually I want to
modify the HTML. However, the original HTML contains JavaScript, and
the output HTML contains tags that are not in the original.

$url = "http://www.something.com";
$doc = new DOMDocument();
$doc->loadHTMLFile($url);
print $doc->saveHTML();

Original html snippet:
function exampleFunction() {
var doc = '<html><head>';
doc += '<title>Title</title>';
doc += '</head>';
doc += '<body onload="self.focus();">';
doc += '</body></html>';
}

Html after saveHTML:
function exampleFunction() {
('about:blank','imagemanagerpopup',settings);
var doc = '<html><head>';
doc += '<title>Title</title>';
doc += '</script>
</head>
<body>
<p>';
doc += '</body>
</html><html><body>
<p>';
}

Extra tags that end the script and head and begin a new body are being
added before the </body> tag and after the <body onload="self.focus();">
tag in the JS variable. Is there a way for the DOM to leave the
JavaScript as is without trying to 'fix' the HTML? The changes being
made are causing a JavaScript error.
Thanks

May 14 '07 #1
On May 14, 6:08 pm, loretta <lorb...@optonline.net> wrote:
> Is there a way for the DOM to leave the
> JavaScript as is without trying to 'fix' the HTML? The changes being
> made are causing a JavaScript error.

Start off with XHTML, so that it can be loaded with no errors. Search
Google for how to add JavaScript in a way that is compliant with XML
standards.

May 14 '07 #2
On May 14, 2:16 pm, shimmyshack <matt.fa...@gmail.com> wrote:
> Start off with XHTML, so that it can be loaded with no errors. Search
> Google for how to add JavaScript in a way that is compliant with XML
> standards.

The HTML I am retrieving has an XHTML doctype. I also have no control
over the original webpage. The original webpage loads with no errors
in both IE and FF.

May 14 '07 #3
loretta wrote:
> The HTML I am retrieving has an XHTML doctype. I also have no control
> over the original webpage. The original webpage loads with no errors
> in both IE and FF.

But does it validate (http://validator.w3.org)? Pages can load in
browsers without error and still not validate. The browsers are very
forgiving, and make a "best guess" as to what the page creator wanted.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
May 14 '07 #4
On May 14, 7:47 pm, loretta <lorb...@optonline.net> wrote:
> The HTML I am retrieving has an XHTML doctype. I also have no control
> over the original webpage. The original webpage loads with no errors
> in both IE and FF.

This is what I found on Google:
http://developer.mozilla.org/en/docs...HTML_Documents
Use <![CDATA[ ... ]]> sections, or the "XHTML" document is no such
thing. By the way, it should not just claim to be XHTML but should
properly validate as such, including being served with the
content type text/xml+xhtml (as .xhtml).
Once you have obtained the webpage and parsed it, adding the right
instructions for the XML parser, all should work, if indeed the rest
of the document is valid XML.
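
A minimal sketch of the XML route described above, assuming the page really is well-formed XHTML with its script wrapped in a CDATA section. The markup in `$xhtml` is a made-up stand-in for the real page, and the error branch is only illustrative; `DOMDocument::loadXML()`, `libxml_use_internal_errors()` and `libxml_get_errors()` are standard PHP functions.

```php
<?php
// Hypothetical stand-in for the fetched page: well-formed XHTML whose
// script body is wrapped in a CDATA section (hidden behind JS comments).
$xhtml = <<<'EOT'
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Title</title>
<script type="text/javascript">
//<![CDATA[
function exampleFunction() {
    var doc = '<html><head>';
    doc += '</head>';
    doc += '<body onload="self.focus();">';
    doc += '</body></html>';
}
//]]>
</script>
</head>
<body><p>Hello</p></body>
</html>
EOT;

// Collect parser errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$loaded = $doc->loadXML($xhtml);   // XML parser: no HTML "repair" is attempted
$errors = libxml_get_errors();
libxml_clear_errors();

if ($loaded) {
    // The CDATA section is kept as-is, so the JS strings are untouched.
    $output = $doc->saveXML();
    print $output;
} else {
    // Not well-formed XML: report why, so the input can be inspected.
    foreach ($errors as $error) {
        print trim($error->message) . " (line {$error->line})\n";
    }
}
```

A page that is not well-formed XML (unclosed tags, undeclared entities and so on) would take the error branch instead, so this only helps when the "rest of the document is valid XML" condition actually holds.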

May 14 '07 #5
On May 14, 9:58 pm, shimmyshack <matt.fa...@gmail.com> wrote:
> including being served with the
> content type text/xml+xhtml (as .xhtml).

Oops, application/xhtml+xml of course.

May 14 '07 #6
Jerry Stuckle wrote:
> But does it validate (http://validator.w3.org)? Pages can load in
> browsers without error and still not validate. The browsers are very
> forgiving, and make a "best guess" as to what the page creator wanted.

From the excerpts posted, no. Javascript blocks in XHTML must be entity
encoded -- that is:

'&' => '&amp;'
'<' => '&lt;'

at a minimum. If not, then the document is not valid.

If a document is not valid, then DOMDocument might not be able to load it
correctly. Or rather, "correctly" is not defined, so DOMDocument is free
to interpret it however it likes!
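
As a small illustration of the encoding described above, PHP's `htmlspecialchars()` produces these entities. The JavaScript line in `$js` is an invented example, and the snippet only shows what the encoded form looks like; it is not a claim about how the page in question was generated.

```php
<?php
// Hypothetical line of inline JavaScript containing '<' and '&'.
$js = "if (a < b && c) { run(); }";

// ENT_NOQUOTES leaves quotes alone; '&', '<' and '>' are still encoded.
$encoded = htmlspecialchars($js, ENT_NOQUOTES, 'UTF-8');

print $encoded . "\n";   // if (a &lt; b &amp;&amp; c) { run(); }
```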

--
Toby A Inkster BSc (Hons) ARCS
http://tobyinkster.co.uk/
Geek of ~ HTML/SQL/Perl/PHP/Python/Apache/Linux
May 15 '07 #7
On May 15, 9:50 am, Toby A Inkster <usenet200...@tobyinkster.co.uk> wrote:
> From the excerpts posted, no. Javascript blocks in XHTML must be entity
> encoded -- that is:
>
> '&' => '&amp;'
> '<' => '&lt;'
>
> at a minimum. If not, then the document is not valid.

Using a CDATA block means that the parser won't be tripped up by < and
so forth.

May 15 '07 #8
On May 15, 7:32 am, shimmyshack <matt.fa...@gmail.com> wrote:
> Using a CDATA block means that the parser won't be tripped up by < and
> so forth.

The webpage does not validate; however, the errors are nowhere near the
extra tags being inserted into the JavaScript at the head tag. There is
an unordered list somewhere in the HTML that is closed twice, and an
incorrect checkbox attribute. The page validates in Tidy, with
warnings only. There is this CDATA block around all the JavaScript
functions, in a comment:
//<![CDATA[
//]]>
It seems to me that the parser is seeing the '</head>' tag in the
JavaScript variable and putting in the end script tag and body tags.
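
If that reading is right, one possible workaround is to hide the `</` sequences inside script blocks from the parser before loading. The sketch below is an illustration under assumptions, not a confirmed fix from this thread: `protectScripts()` is a hypothetical helper, the sample markup in `$html` is invented, the regular expression assumes simple, non-nested `<script>` elements, and it relies on `<\/` and `</` meaning the same thing inside a JavaScript string literal.

```php
<?php
/**
 * Hypothetical helper: rewrite "</" as "<\/" inside every <script> body so
 * that an HTML parser does not mistake tags in JS strings for real markup.
 */
function protectScripts(string $html): string
{
    return preg_replace_callback(
        '#(<script\b[^>]*>)(.*?)(</script>)#is',
        function (array $m): string {
            // $m[2] is the script body; the real closing tag in $m[3] is untouched.
            return $m[1] . str_replace('</', '<\/', $m[2]) . $m[3];
        },
        $html
    ) ?? $html;   // fall back to the input if the regex engine fails
}

// Made-up sample page modelled on the snippet in the original post.
$html = <<<'EOT'
<html><head><script type="text/javascript">
function exampleFunction() {
    var doc = '<html><head>';
    doc += '</head>';
    doc += '</body></html>';
}
</script></head><body><p>Hello</p></body></html>
EOT;

$safe = protectScripts($html);

libxml_use_internal_errors(true);   // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($safe);
libxml_clear_errors();
print $doc->saveHTML();
```

Because the rewrite happens on the raw string before `loadHTML()` sees it, the JavaScript that comes back out contains `<\/head>` rather than `</head>`, which a browser treats as the same string.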

May 16 '07 #9
loretta wrote:
> It seems to me that the parser is seeing the '</head>' tag in the
> JavaScript variable and putting in the end script tag and body tags.

Since you haven't told us the page you're trying to load, we can't see
what the problem is.

And BTW - instead of using "something.com", which is a valid domain, you
should use "example.com" - which is reserved just for such use.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
May 16 '07 #10
