
encoding of scripts

Andy Fish
#1: Jun 27 '08
Hi,

using HTML 4.01 (not xhtml), I have recently discovered that this:

<script>var x='</script>';</script>

is not valid HTML - the fact that there is an end script tag in quotes
causes the parser to stop recognising the script. initially my reaction was
that this is not a surprise because I had failed to HTML encode the script
contents, so my second attempt was this:

<script>var x='&lt;/script&gt;';</script>

however this DOES NOT WORK - the variable ends up containing the text
"&lt;/script&gt;"

can someone point me at part of the w3c specification that states how script
tags are parsed differently to other tags in HTML.

interestingly i have also discovered that this:

<script>if (3<5);</script>

IS valid html (and seems even to be valid XHTML) even though it is not valid
XML

Andy



Erwin Moller
#2: Jun 27 '08

re: encoding of scripts


Andy Fish schreef:
Quote:
Hi,
>
using HTML 4.01 (not xhtml), I have recently discovered that this:
>
<script>var x='</script>';</script>
>
is not valid HTML - the fact that there is an end script tag in quotes
causes the parser to stop recognising the script. initially my reaction was
that this is not a surprise because I had failed to HTML encode the script
contents, so my second attempt was this:
>
<script>var x='&lt;/script&gt;';</script>
>
however this DOES NOT WORK - the variable ends up containing the text
"&lt;/script&gt;"
>
can someone point me at part of the w3c specification that states how script
tags are parsed differently to other tags in HTML.
>
interestingly i have also discovered that this:
>
<script>if (3<5);</script>
>
IS valid html (and seems even to be valid XHTML) even though it is not valid
XML
>
Andy
>
>
What about:

<script>var x='<\/script>';</script>
?
Mind the added \

Regards,
Erwin Moller
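
A minimal sketch of why the added backslash helps, assuming a standard JavaScript engine: inside a string literal, \/ is simply an escaped slash, so the value of the string is unchanged while the page source no longer contains an end-tag opener. The variable names are only for illustration.

```javascript
// An escaped slash inside a string literal is an ordinary "/" at run time.
const escaped = '<\/script>';

// The same value, built by concatenation so the two characters are never adjacent in the source.
const plain = '<' + '/script>';

console.log(escaped === plain); // true
console.log(escaped.length);    // 9
```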
viza
#3: Jun 27 '08

re: encoding of scripts


On Jun 2, 12:41 pm, "Andy Fish" <ajf...@blueyonder.co.uk> wrote:
Quote:
can someone point me at part of the w3c specification that states how script
tags are parsed differently to other tags in HTML.
http://www.w3.org/TR/html4/sgml/dtd.html#Script :

<!ENTITY % Script "CDATA" -- script expression -->

http://www.w3.org/TR/html4/sgml/dtd.html#head.content

<!ELEMENT SCRIPT - - %Script; -- script statements -->
Quote:
interestingly i have also discovered that this:
>
<script>if (3<5);</script>
>
IS valid html
Apart from the missing required "type" attribute, yes. The content
type of the script element in HTML4 is CDATA, which means everything
up to the first occurrence of </ followed by a letter is read as-is.
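
The CDATA rule can be sketched as a small scanner. This is only an illustration of the rule as worded in the HTML 4.01 spec (content ends at the first "</" followed by a letter), not how any particular browser is implemented. The function name findCdataEnd is hypothetical.

```javascript
// Return the index at which CDATA content would end under the HTML 4.01 rule,
// or -1 if the text contains no end-tag opener followed by a letter.
function findCdataEnd(text) {
  const match = /<\/[A-Za-z]/.exec(text);
  return match ? match.index : -1;
}

// The original example: the content is cut off inside the string literal.
console.log(findCdataEnd("var x='</script>';"));   // 7

// A bare "<" is plain data, so the comparison is harmless.
console.log(findCdataEnd("if (3<5);"));            // -1

// With the backslash, "<" is no longer directly followed by "/".
console.log(findCdataEnd("var x='<\\/script>';")); // -1
```

In practice browsers tend to look specifically for the closing script tag rather than any end tag, so treat the scanner as a reading of the spec text only.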
Quote:
(and seems even to be valid XHTML) even though it is not valid XML
This is not possible since XHTML is XML.

The content type of the script element in XHTML1 is PCDATA, which means
that your original idea of using
var x='&lt;foo&gt;'

means the same as
var x='<foo>'

in a raw javascript file. Note that this doesn't actually work "in
the wild", because most users have broken browsers (eg: IE).

The best thing to do is to never ever have anything in your script
elements and only include scripts in separate files.

HTH
viza
Andreas Prilop
#4: Jun 27 '08

re: encoding of scripts


On Mon, 2 Jun 2008, Andy Fish wrote:
Quote:
Newsgroups: comp.infosystems.www.authoring.html
In how many newsgroups did you multipost?
Jukka K. Korpela
#5: Jun 27 '08

re: encoding of scripts


Scripsit Andy Fish:
Quote:
using HTML 4.01 (not xhtml), I have recently discovered that this:
>
<script>var x='</script>';</script>
>
is not valid HTML - the fact that there is an end script tag in quotes
causes the parser to stop recognising the script.
The fact that there is an end tag causes that. Quotes do not matter.
They are just data characters in this context.
Quote:
<script>var x='&lt;/script&gt;';</script>
>
however this DOES NOT WORK - the variable ends up containing the
text "&lt;/script&gt;"
By HTML 4.01 rules, yes. There the content model is CDATA, which means
that entity references are not recognized, and "&" is just a data
character.
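
As a rough illustration of the difference between the two content models, the sketch below shows what text a script engine would receive for the same source. It assumes only the three entities lt, gt and amp matter. The function name textSeenByEngine is hypothetical and this is not a real parser.

```javascript
// Model what the script engine receives for a given element content.
// CDATA (HTML 4.01): entity references are not recognized, so the text is passed through.
// PCDATA (XHTML 1.0): entity references are expanded before the script sees them.
function textSeenByEngine(source, contentModel) {
  if (contentModel === 'CDATA') {
    return source;
  }
  const entities = { lt: '<', gt: '>', amp: '&' };
  return source.replace(/&(lt|gt|amp);/g, (whole, name) => entities[name]);
}

const source = "var x='&lt;foo&gt;';";
console.log(textSeenByEngine(source, 'CDATA'));  // var x='&lt;foo&gt;';
console.log(textSeenByEngine(source, 'PCDATA')); // var x='<foo>';
```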
Quote:
can someone point me at part of the w3c specification that states how
script tags are parsed differently to other tags in HTML.
They aren't. The _content_ of the <script> _element_ is special. This
can be found in the HTML 4.01 specs simply by looking at the description
of that element; it points to
http://www.w3.org/TR/html401/types.html#type-script
which refers to an appendix that explains ways to overcome the "</"
problem, such as prefixing "/" with "\" in JavaScript. In JavaScript,
you could also write
var x='<'+'/script>';
but that looks a bit more hackish.
Quote:
interestingly i have also discovered that this:
>
<script>if (3<5);</script>
>
IS valid html
No it isn't, but that's due to the lack of the type="..." attribute. If
you fix that, then it is valid. That's because the digit "5" isn't a
name start character.
Quote:
(and seems even to be valid XHTML)
It isn't valid in XHTML, since by XHTML rules, "<" must not appear in
any context as such except as the starting character of a tag.

In XHTML, the content model of <script> is #PCDATA, so _there_ you could
use &lt; to stand for "<". But it's not wise to use XHTML as the
delivery format of a web page, because IE does not support XHTML.
Quote:
even though it is not valid XML
It would be impossible for a document to be non-valid XML if it is valid
XHTML. This immediately follows from the _definition_ of validity.

There is a simple way to get rid of such complexities: write your script
into an external file and refer to it via <script type="text/javascript"
src="foo.js"></script>.

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

Andy Fish
#6: Jun 27 '08

re: encoding of scripts


thanks for all the replies - i understand it all now

unfortunately i can't write all my scripts in separate js files because this
is all javascript that i'm generating on the fly on the server, but i have
amended my quoting/encoding functions to detect '</' and split it into 2
concatenated strings
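
A quoting function along those lines might look like the sketch below. It is a guess at the approach described, not the actual code. The name toInlineScriptLiteral is hypothetical, and it only handles backslashes, single quotes, CR and LF before splitting every "</".

```javascript
// Turn an arbitrary string into a single-quoted JavaScript literal that is safe
// to emit inside an inline script element: the output never contains "</".
function toInlineScriptLiteral(value) {
  const escaped = value
    .replace(/\\/g, '\\\\') // backslashes first, so later escapes are not doubled
    .replace(/'/g, "\\'")   // single quotes would otherwise end the literal
    .replace(/\r/g, '\\r')
    .replace(/\n/g, '\\n');
  // Split each "</" into two concatenated strings: '<' + '/...'
  return "'" + escaped.replace(/<\//g, "<'+'/") + "'";
}

console.log(toInlineScriptLiteral('</script>')); // '<'+'/script>'
```

The server would then emit something like var x= followed by the result and a semicolon. Evaluating the emitted expression gives back the original string.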

:-)


