htmlentities & charencoding

Hi all,

I was hoping to get some clarification on a couple of questions I have:

1) When should htmlspecialchars be used? As a general rule, should
it be used for any text that may contain special characters and is going
to be rendered in the browser (i.e. text that isn't in tags)? I've got a
javascript onclick handler whose code includes an ampersand, and the
HTML validator complains. I don't know if I should escape the
ampersand, or even if it's possible (seeing that the text is inside an
HTML attribute).

Why would you ever use htmlentities as opposed to htmlspecialchars? The
only reason I can think of is if your page's charset doesn't support
the special character you're trying to render (for example, the euro
using Latin1), but then why wouldn't you just change the page's charset
to UTF-8 (unless your editor can't save in UTF-8, which might
indicate it's time to get another editor)? The comment on the PHP manual
entry for htmlentities, 'Please, don't use htmlentities to avoid XSS!
Htmlspecialchars is enough!', seems to suggest that the uses for
htmlentities are limited, since it needn't be used to avoid XSS.

2) A comment in the PHP manual entry for htmlentities states that their
function can be used to 'replace any characters in a string that could
be 'dangerous' to put in an HTML/XML file with their numeric entities
(e.g. &#233 for [e acute])'. Why would it be dangerous!?

3) What are some typical uses of specifying HTTP input/output character
encoding? If it is used to convert output, why wouldn't you just change
the output page's char encoding? If it's used to convert input from, say,
UTF-8 to Latin1, couldn't you just use a function to do this?

That's about it!

Thanks in advance

Taras

Jul 10 '06 #1

Taras_96 wrote:
> 1) When should htmlspecialchars be used? As a general rule, should
> it be used for any text that may contain special characters and is going
> to be rendered in the browser (i.e. text that isn't in tags)? I've got a
> javascript onclick handler whose code includes an ampersand, and the
> HTML validator complains. I don't know if I should escape the
> ampersand, or even if it's possible (seeing that the text is inside an
> HTML attribute).

Well, basically you're either saying "show this character to the user"
(a literal copyright symbol) OR giving an instruction to the browser to
display a copyright symbol. I think the "dangerous" comment comes from the
fact that MS systems will often simply show a blank for characters that
display correctly on *nix, and that when an undefined notation is used in a
page, it is not known what the effect will be on some platforms or how it
will be displayed.

Flamer.
Jul 11 '06 #2
Message-ID: <11**********************@35g2000cwc.googlegroups.com> from
Taras_96 contained the following:
> 1) When should htmlspecialchars be used? As a general rule, should
> it be used for any text that may contain special characters and is going
> to be rendered in the browser (i.e. text that isn't in tags)? I've got a
> javascript onclick handler whose code includes an ampersand, and the
> HTML validator complains.

The people without javascript will complain too, when they can't
navigate your site.

Just change the ampersand for &amp;
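
To illustrate the suggestion above, here is a minimal sketch. It is written in Python rather than PHP purely so it is self-contained (that choice is an assumption, not something from this thread); html.escape() plays roughly the role of PHP's htmlspecialchars($js, ENT_QUOTES). The handler text and the onclick_attr() helper are hypothetical.

```python
import html


def onclick_attr(js_code: str) -> str:
    """Return an onclick="..." attribute with the JavaScript HTML-escaped."""
    # quote=True also escapes " and ', so the value cannot end the attribute early.
    return 'onclick="{}"'.format(html.escape(js_code, quote=True))


# Hypothetical handler containing bare ampersands.
handler = "if (a && b) { go('page.php?x=1&y=2'); }"
attr = onclick_attr(handler)
print(attr)

# An HTML parser decodes the attribute value before the script runs,
# so the script engine sees the original code again.
assert html.unescape(html.escape(handler, quote=True)) == handler
```

So the ampersand is escaped in the page source only; the code that actually executes is unchanged.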
--
Geoff Berrow (put thecat out to email)
It's only Usenet, no one dies.
My opinions, not the committee's, mine.
Simple RFDs http://www.ckdog.co.uk/rfdmaker/
Jul 11 '06 #3
Taras_96 wrote:
> Hi all,
>
> I was hoping to get some clarification on a couple of questions I have:
>
> 1) When should htmlspecialchars be used? As a general rule, should
> it be used for any text that may contain special characters and is going
> to be rendered in the browser (i.e. text that isn't in tags)? I've got a
> javascript onclick handler whose code includes an ampersand, and the
> HTML validator complains. I don't know if I should escape the
> ampersand, or even if it's possible (seeing that the text is inside an
> HTML attribute).

Well, I haven't looked at the code, but I suspect htmlspecialchars()
would be faster, since it converts fewer characters and has fewer
options.

The HTML validator on w3.org is decent, but it doesn't handle javascript
very well. I just ignore the errors in javascript; for instance,
something like:

j=4&i;

The "&i" is not a valid html entity - but it's valid javascript code.
And this javascript wouldn't work:

j = 4&amp;i;

> Why would you ever use htmlentities as opposed to htmlspecialchars? The
> only reason I can think of is if your page's charset doesn't support
> the special character you're trying to render (for example, the euro
> using Latin1), but then why wouldn't you just change the page's charset
> to UTF-8 (unless your editor can't save in UTF-8, which might
> indicate it's time to get another editor)? The comment on the PHP manual
> entry for htmlentities, 'Please, don't use htmlentities to avoid XSS!
> Htmlspecialchars is enough!', seems to suggest that the uses for
> htmlentities are limited, since it needn't be used to avoid XSS.

Just changing the page charset doesn't change what PHP uses. You can
pass a charset to either function, but if you need more than the five
chars handled by htmlspecialchars() (&, <, >, " and, with ENT_QUOTES, ')
you need to use htmlentities().
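
A rough sketch of that difference, in Python rather than PHP (an assumption made only to keep the snippet self-contained). escape_special() stands in for htmlspecialchars() and escape_all_named() for htmlentities(); both helper names are hypothetical, and the real PHP functions differ in details such as their quote-handling flags.

```python
import html
from html.entities import codepoint2name


def escape_special(text: str) -> str:
    """Escape only the markup-significant characters (&, <, >, ", ')."""
    return html.escape(text, quote=True)


def escape_all_named(text: str) -> str:
    """Replace every character that has a named HTML 4 entity."""
    out = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        out.append("&{};".format(name) if name else ch)
    return "".join(out)


sample = 'café <b>5 €</b> & "more"'
print(escape_special(sample))    # é and € are left alone
print(escape_all_named(sample))  # é and € become named entities too

# When the target charset cannot represent a character at all,
# a numeric reference is one way to keep it in the output.
euro_latin1 = "5 €".encode("latin-1", errors="xmlcharrefreplace")
print(euro_latin1)
```

If the page really is served as UTF-8, the first form is normally all that is needed; the second only matters when the output charset cannot carry the character.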

And the notes are comments - from users, not the PHP developers. I give
them some credence, but not as much as the "official" word from the PHP
developers. And if you look through them enough, you'll find errors and
other people who get in and correct the errors. Not that much different
than what you find here on usenet.

> 2) A comment in the PHP manual entry for htmlentities states that their
> function can be used to 'replace any characters in a string that could
> be 'dangerous' to put in an HTML/XML file with their numeric entities
> (e.g. &#233 for [e acute])'. Why would it be dangerous!?

Don't know here, but I suspect browsers may act differently in different
languages. But I have enough trouble with my native language, so I
really haven't worried about it. But again that's a user comment.

> 3) What are some typical uses of specifying HTTP input/output character
> encoding? If it is used to convert output, why wouldn't you just change
> the output page's char encoding? If it's used to convert input from, say,
> UTF-8 to Latin1, couldn't you just use a function to do this?

I use it anytime I'm displaying data input by the user, read from a
database, etc. You never know when the data might contain a '<', a '"',
etc.

Changing the char encoding for the page doesn't convert any characters.
All it does is tell the browser how to handle the characters. It's up
to you, the programmer, to ensure the character encoding you use matches
that of the page.
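
A small sketch of that last point, in Python (an assumption; nothing here is PHP-specific). The declared charset only changes how the same bytes are interpreted, whereas a real conversion produces different bytes. The sample string is hypothetical.

```python
text = "café"

# What the script actually sends over the wire.
utf8_bytes = text.encode("utf-8")

# Declaring a different charset changes only how the receiver decodes
# the same bytes; the bytes themselves are untouched.
as_utf8 = utf8_bytes.decode("utf-8")
as_latin1 = utf8_bytes.decode("latin-1")

print(as_utf8)    # the intended text
print(as_latin1)  # garbled: same bytes, wrong interpretation

# An actual conversion yields a different byte sequence.
latin1_bytes = text.encode("latin-1")
assert latin1_bytes != utf8_bytes
```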


--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Jul 11 '06 #4
Mel
On 2006-07-11 21:52:53 +1000, Jerry Stuckle <js*******@attglobal.net> said:
> The HTML validator on w3.org is decent, but it doesn't handle
> javascript very well. I just ignore the errors in javascript; for
> instance, something like:
>
> j=4&i;
>
> The "&i" is not a valid html entity - but it's valid javascript code.
> And this javascript wouldn't work:
>
> j = 4&amp;i;

No, it wouldn't, but valid XHTML _requires_ you to enclose the
embedded JavaScript in the appropriate CDATA marker. The character
'&' is reserved by the markup just like '>' and '<'. Not adhering to
the outlined standards simply encourages bad markup and makes
cross-browser compatibility more difficult. It's a big stretch to
equate cross-browser issues with unencoded ampersands, but it's not
that difficult to deal with. Javascript has some functional string
methods for encoding HTML entities.
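
A quick way to see the XHTML side of this is to feed the markup to a strict XML parser. The sketch below uses Python's xml.etree.ElementTree as a stand-in for an XHTML-aware parser (an assumption; browsers and validators are not implemented this way), and script_text() is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def script_text(markup: str):
    """Parse markup as XML; return the element text, or None if not well-formed."""
    try:
        return ET.fromstring(markup).text
    except ET.ParseError:
        return None


plain = "<script>j=4&i;</script>"                  # bare & in XML content
wrapped = "<script><![CDATA[j=4&i;]]></script>"    # same code in a CDATA section
escaped = "<script>j=4&amp;i;</script>"            # same code with & escaped

print(script_text(plain))    # rejected by the XML parser
print(script_text(wrapped))  # the script text survives intact
print(script_text(escaped))  # the XML parser decodes &amp; back to &
```

Note that this applies to documents parsed as XML; markup parsed as plain HTML follows different rules for script content.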

Jul 11 '06 #5
Mel wrote:
> No, it wouldn't, but valid XHTML _requires_ you to enclose the embedded
> JavaScript in the appropriate CDATA marker. The character '&' is
> reserved by the markup just like '>' and '<'. Not adhering to the
> outlined standards simply encourages bad markup and makes cross-browser
> compatibility more difficult. It's a big stretch to equate cross-browser
> issues with unencoded ampersands, but it's not that difficult to deal
> with. Javascript has some functional string methods for encoding HTML
> entities.

Who said anything about XHTML? This is straight html.

And the point is - this is valid javascript, but the validator on w3.org
doesn't recognize it as such. Therefore it spits out errors where there
are none.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Jul 11 '06 #6
On Tue, 11 Jul 2006 17:36:20 -0400, Jerry Stuckle <js*******@attglobal.net>
wrote:
> Who said anything about XHTML? This is straight html.
>
> And the point is - this is valid javascript, but the validator on w3.org
> doesn't recognize it as such. Therefore it spits out errors where there
> are none.

Yes, this seems to be backed up by HTML 4.01 appendix B.3.2, which even has an
example of the contents of a <script> element in VBScript using & as a string
concatenation operator.

http://www.w3.org/TR/html4/appendix/...pecifying-data

It discusses how to avoid accidentally closing the <script> element, but seems
to indicate that & doesn't start a character reference inside <script>, as
that's automatically CDATA. So validators producing errors in this case would
appear to be wrong.
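
The same behaviour can be observed with an HTML parser. The sketch below uses Python's html.parser as a stand-in for a browser or validator (an assumption; real browsers are more elaborate), and the TextByTag class and text_by_tag() helper are hypothetical names.

```python
from html.parser import HTMLParser


class TextByTag(HTMLParser):
    """Collect the text found directly inside each element, keyed by tag name."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []
        self.text = {}

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack:
            key = self.stack[-1]
            self.text[key] = self.text.get(key, "") + data


def text_by_tag(markup: str) -> dict:
    parser = TextByTag()
    parser.feed(markup)
    parser.close()
    return parser.text


result = text_by_tag("<p>4&amp;i</p><script>j=4&amp;i;</script>")
print(result["p"])       # the reference is decoded in ordinary content
print(result["script"])  # script content is passed through untouched
```

That is also why writing &amp; inside a <script> element breaks the code: in HTML the script engine receives the five characters "&amp;" literally.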

However, validator.w3.org currently handles the example given without error. I
uploaded the following:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Strict//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15">
<title>Page</title>
</head>
<body>

<script type="text/javascript">
j=4&i;
</script>

</body>
</html>

It responded:

This Page Is Valid -//W3C//DTD HTML 4.01 Strict//EN!

(it also validates as Transitional, unsurprisingly). Has its behaviour changed
recently? Did it used to produce errors in this case?

The "HTML Tidy" validator as used in the HTML Validator Firefox extension also
accepts & within <script> without complaint, and correctly complains about "</"
appearing in the script source.

--
Andy Hassall :: an**@andyh.co.uk :: http://www.andyh.co.uk
http://www.andyhsoftware.co.uk/space :: disk and FTP usage analysis tool
Jul 11 '06 #7
Andy,

They might have fixed it. I hope so. I've had problems with it before.
I just ignore any errors within <script> elements.
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Jul 11 '06 #8
