encoding of scripts

Hi,

using HTML 4.01 (not xhtml), I have recently discovered that this:

<script>var x='</script>';</script>

is not valid HTML - the fact that there is an end script tag in quotes
causes the parser to stop recognising the script. initially my reaction was
that this is not a surprise because I had failed to HTML encode the script
contents, so my second attempt was this:

<script>var x='&lt;/script&gt;';</script>

however this DOES NOT WORK - the variable ends up containing the text
"&lt;/script&gt;"

can someone point me at part of the w3c specification that states how script
tags are parsed differently to other tags in HTML.

interestingly i have also discovered that this:

<script>if (3<5);</script>

IS valid html (and seems even to be valid XHTML) even though it is not valid
XML

Andy
Jun 27 '08 #1
Andy Fish wrote:
> Hi,
>
> using HTML 4.01 (not xhtml), I have recently discovered that this:
>
> <script>var x='</script>';</script>
>
> is not valid HTML - the fact that there is an end script tag in quotes
> causes the parser to stop recognising the script. initially my reaction was
> that this is not a surprise because I had failed to HTML encode the script
> contents, so my second attempt was this:
>
> <script>var x='&lt;/script&gt;';</script>
>
> however this DOES NOT WORK - the variable ends up containing the text
> "&lt;/script&gt;"
>
> can someone point me at part of the w3c specification that states how script
> tags are parsed differently to other tags in HTML.
>
> interestingly i have also discovered that this:
>
> <script>if (3<5);</script>
>
> IS valid html (and seems even to be valid XHTML) even though it is not valid
> XML
>
> Andy

What about:

<script>var x='<\/script>';</script>
?
Mind the added \

Regards,
Erwin Moller
Jun 27 '08 #2
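
A minimal sketch of why the added backslash helps: in a JavaScript string literal "\/" is simply "/", so the value of the string does not change, but the source text no longer contains the character sequence "</". The variable names are only for illustration.

```javascript
// "\/" is an ordinary escape for "/", so this is the same nine-character string,
// written without a literal "<" immediately followed by "/".
var escaped = '<\/script>';

// The same value built by concatenation, for comparison.
var concatenated = '<' + '/script>';

console.log(escaped === concatenated); // true
console.log(escaped.length);           // 9
```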
On Jun 2, 12:41 pm, "Andy Fish" <ajf...@blueyonder.co.uk> wrote:
> can someone point me at part of the w3c specification that states how script
> tags are parsed differently to other tags in HTML.

http://www.w3.org/TR/html4/sgml/dtd.html#Script :

<!ENTITY % Script "CDATA" -- script expression -->

http://www.w3.org/TR/html4/sgml/dtd.html#head.content

<!ELEMENT SCRIPT - - %Script; -- script statements -->

> interestingly i have also discovered that this:
>
> <script>if (3<5);</script>
>
> IS valid html

Apart from the missing required "type" attribute, yes. The content
type of the script element in HTML4 is CDATA, which means everything
up to the first occurrence of </ is read as-is.

> (and seems even to be valid XHTML) even though it is not valid XML

This is not possible since XHTML is XML.

The content type of the script element in XHTML1 is PCDATA, which means
that your original idea of using
var x = '&lt;foo&gt;'

means the same as
var x = '<foo>'

in a raw javascript file. Note that this doesn't actually work "in
the wild", because most users have broken browsers (eg: IE).

The best thing to do is to never ever have anything in your script
elements and only include scripts in separate files.

HTH
viza
Jun 27 '08 #3
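
To make the CDATA versus PCDATA difference concrete, here is a rough sketch of what the script engine ends up receiving in each case. decodeBasicEntities is a hypothetical helper that handles only the three entity references relevant here; it is an illustration, not what any real parser runs.

```javascript
// Hypothetical helper: resolve &lt; &gt; and &amp;, roughly what an XML parser
// does to #PCDATA content before handing it to the script engine.
function decodeBasicEntities(text) {
  return text
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&'); // done last so "&amp;lt;" becomes "&lt;", not "<"
}

var source = "var x='&lt;foo&gt;';";

// HTML 4.01 (CDATA): entity references are not recognised, text passes through as-is.
var asCdata = source;

// XHTML parsed as XML (#PCDATA): entity references are resolved first.
var asPcdata = decodeBasicEntities(source);

console.log(asCdata);  // var x='&lt;foo&gt;';
console.log(asPcdata); // var x='<foo>';
```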
On Mon, 2 Jun 2008, Andy Fish wrote:
> Newsgroups: comp.infosystems.www.authoring.html

In how many newsgroups did you multipost?
Jun 27 '08 #4
Andy Fish wrote:
> using HTML 4.01 (not xhtml), I have recently discovered that this:
>
> <script>var x='</script>';</script>
>
> is not valid HTML - the fact that there is an end script tag in quotes
> causes the parser to stop recognising the script.

The fact that there is an end tag causes that. Quotes do not matter.
They are just data characters in this context.

> <script>var x='&lt;/script&gt;';</script>
>
> however this DOES NOT WORK - the variable ends up containing the
> text "&lt;/script&gt;"

By HTML 4.01 rules, yes. There the content model is CDATA, which means
that entity references are not recognized, and "&" is just a data
character.

> can someone point me at part of the w3c specification that states how
> script tags are parsed differently to other tags in HTML.

They aren't. The _content_ of the <script> _element_ is special. This
can be found in the HTML 4.01 specs simply by looking at the description
of that element; it points to
http://www.w3.org/TR/html401/types.html#type-script
which refers to an appendix that explains ways to overcome the "</"
problem, such as prefixing "/" with "\" in JavaScript. In JavaScript,
you could also write
var x='<'+'/script>';
but that looks a bit more hackish.

> interestingly i have also discovered that this:
>
> <script>if (3<5);</script>
>
> IS valid html

No it isn't, but that's due to the lack of the type="..." attribute. If
you fix that, then it is valid. That's because the digit "5" isn't a
name start character.

> (and seems even to be valid XHTML)

It isn't valid in XHTML, since by XHTML rules, "<" must not appear in
any context as such except as the starting character of a tag.

In XHTML, the content model of <script> is #PCDATA, so _there_ you could
use &lt; to stand for "<". But it's not wise to use XHTML as the
delivery format of a web page, because IE does not support XHTML.

> even though it is not valid XML

It would be impossible for a document to be non-valid XML if it is valid
XHTML. This immediately follows from the _definition_ of validity.

There is a simple way to get rid of such complexities: write your script
into an external file and refer to it via <script type="text/javascript"
src="foo.js"></script>.

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

Jun 27 '08 #5
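
The "</" problem discussed in this thread can be sketched as a one-line check. It assumes the simple rule quoted earlier, that script content is read as-is up to the first occurrence of "</"; scriptContentEnd is a hypothetical name, and real parsers may apply a narrower test.

```javascript
// Returns the index at which the parser would stop reading script content,
// or -1 if the content contains no "</" and is therefore never cut short.
function scriptContentEnd(content) {
  return content.indexOf('<\/'); // written with "\/" so this source has no "</" either
}

console.log(scriptContentEnd("if (3<5);"));               // -1
console.log(scriptContentEnd("var x='<\\/script>';"));    // -1 (backslash workaround)
console.log(scriptContentEnd("var x='<'+'/script>';"));   // -1 (concatenation workaround)
console.log(scriptContentEnd("var x='<" + "/script>';")); // 7  (the original problem)
```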
thanks for all the replies - i understand it all now

unfortunately i can't write all my scripts in separate js files because this
is all javascript that i'm generating on the fly on the server, but i have
amended my quoting/encoding functions to detect '</' and split it into 2
concatenated strings

:-)
Jun 27 '08 #6
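
A quoting function along the lines described in the last post might look roughly like this. It is only a sketch: the original function was not posted, toInlineScriptLiteral is a hypothetical name, and it covers just backslashes, single quotes, CR/LF and "</" (other characters, for example U+2028, may need handling too).

```javascript
// Hypothetical helper: turn an arbitrary string into JavaScript source for a
// single-quoted string expression whose text never contains "</".
function toInlineScriptLiteral(value) {
  var body = value
    .replace(/\\/g, '\\\\')     // escape backslashes first
    .replace(/'/g, "\\'")       // escape the delimiter quote
    .replace(/\r/g, '\\r')      // keep the literal on one line
    .replace(/\n/g, '\\n')
    .replace(/<\//g, "<'+'/");  // split "</" into two concatenated strings
  return "'" + body + "'";
}

var literal = toInlineScriptLiteral('a <' + '/script> b');
console.log(literal); // 'a <'+'/script> b'
```

The "</" replacement runs last on purpose, so the quotes it inserts are not themselves escaped by the earlier steps.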
