Bytes IT Community

&amp; and &

How can one stop a browser from converting

&amp;

to

& ?

We have a textarea in our system where a user can type in some html code
and have it saved to the database. When the data is retrieved and
redisplayed it is displayed as simply &.

HTML snippet:

<TEXTAREA NAME="p_html" ROWS=6 COLS=70 ALIGN="VIRTUAL" WRAP="YES">
fred
&
&amp;
</TEXTAREA>

When displayed, the user predictably sees

fred
&
&

What workarounds are there for this - I am sure it's a problem for
others - is there a way of "escaping" the value before display?
--

jeremy

Mar 3 '06 #1
11 Replies



Jeremy wrote:
We have a textarea in our system where a user can type in some html code
and have it saved to the database.


Users can't type "HTML code" into a <textarea>. What they type _is_
plain text, which they might _intend_ to have interpreted later as if
it were HTML. To help them do this you must first convert their plain
text as entered into HTML - part of this process would be to encode
their plaintext "&" into the HTML "&amp;", probably just before storing
it.

Good variable naming in your server code will help too - try prefixing
variables with "strUserJunk", or "htmlUserJunk" as appropriate. Whenever
you see code that assigns variables with mis-matched names, be
suspicious that there's an encoding / decoding process missing.
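
The naming convention above can be pushed one step further by giving the two kinds of string different types, so a tool can flag the mismatch. This is only a sketch, written in Python for illustration (the system under discussion is not Python); `PlainText`, `Html`, `encode_html` and `render_paragraph` are hypothetical names, not part of any library.

```python
import html
from typing import NewType

# Hypothetical marker types: both are ordinary str objects at runtime,
# but a static type checker such as mypy treats them as distinct.
PlainText = NewType("PlainText", str)
Html = NewType("Html", str)


def encode_html(text: PlainText) -> Html:
    """Convert user-entered plain text into HTML-safe text."""
    return Html(html.escape(text, quote=True))


def render_paragraph(body: Html) -> str:
    """Accept only already-encoded text; a type checker flags a missed encoding step."""
    return "<p>" + body + "</p>"


str_user_junk = PlainText("fish & chips <today>")
html_user_junk = encode_html(str_user_junk)
page = render_paragraph(html_user_junk)
```

Passing `str_user_junk` straight to `render_paragraph` still runs, but a type checker would report it, which is the "mis-matched names" warning made mechanical.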

This stuff isn't hard to do, but it does require clarity of thought and
attention to detail. It's also very important to get right (there are
some interesting attacks you can make on blogs etc. if you let users
post arbitrary chunks of HTML).

Mar 3 '06 #2

Jeremy <je********@gmail.com> wrote:
How can one stop a browser from converting

&amp;

to

& ?
You can't if the document is served as text/html.

If you want a browser to display &amp; literally in a document served as
text/html use &amp;amp;

Depending on what you need it may be possible to serve it as text/plain
in which case you can use the literal. This can also be embedded into a
document served as text/html.
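
As a rough illustration of the text/plain option, here is a minimal WSGI sketch in Python (an assumption for illustration; the original system is not Python). The names `STORED_TEXT`, `plain_text_app` and `call_app` are hypothetical. Because the response is labelled text/plain, a browser shows the characters as stored and does not decode `&amp;`.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical stored value: exactly what the user typed.
STORED_TEXT = "fred\n&\n&amp;\n"


def plain_text_app(environ, start_response):
    """Serve the stored text unchanged, labelled as text/plain."""
    body = STORED_TEXT.encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(app):
    """Invoke a WSGI app in-process and return (status, headers, body)."""
    environ = {}
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


status, headers, body = call_app(plain_text_app)
```

This suits a read-only view of the data; it does not help when the value has to sit inside an editable textarea in a text/html page.
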
We have a textarea in our system where a user can type in some html code
and have it saved to the database. When the data is retrieved and
redisplayed it is displayed as simply &.

HTML snippet:

<TEXTAREA NAME="p_html" ROWS=6 COLS=70 ALIGN="VIRTUAL" WRAP="YES">
fred
&
&amp;
</TEXTAREA>

When displayed, the user predictably sees

fred
&
&


You haven't made it clear what this is used for, but maybe your server
side data processing needs to convert character references (not just
&amp;) that the user enters to &amp;char_ref

--
Spartanicus
Mar 3 '06 #3

In article <11*********************@p10g2000cwp.googlegroups.com>, Andy
Dingley says...
Users can't type "HTML code" into a <textarea> What they type _is_
plain text, which they might _intend_ to have interpreted later as if
it were HTML. To help them do this you must first convert their plain
text as entered into HTML - part of this process would be to encode
their plaintext "&" into the HTML "&amp;", probably just before storing
it.

Good variable naming in your server code will help too - try prefixing
variables with "strUserJunk", or "htmlUserJunk" as appropriate. Whenever
you see code that assigns variables with mis-matched names, be
suspicious that there's an encoding / decoding process missing.

This stuff isn't hard to do, but it does require clarity of thought and
attention to detail. It's also very important to get right (there are
some interesting attacks you can make on blogs etc. if you let users
post arbitrary chunks of HTML).


Yep I understand all of that. The user types &amp; into a field and
submits the form. The &amp; is stored in the database as typed by the
user. When the data is redisplayed for editing, the browser changes the
&amp; to simply &

So it really has nothing to do with variable naming and so on - the
question is how can we present back to the user the data that they
entered into the field?

--

jeremy

Mar 3 '06 #4

On Fri, 3 Mar 2006, Andy Dingley wrote:
Jeremy wrote:
We have a textarea in our system where a user can type in some
html code and have it saved to the database.
Users can't type "HTML code" into a <textarea>


I don't see for a moment why not. In fact I've been doing it for
ages. (Of course I would term it "markup", not "code").

And see http://www.htmlhelp.com/tools/validator/direct.html.en
for a practical use of such a thing.
What they type _is_ plain text,
What they type is text. Whether it's plain or otherwise is determined
by what the server-side process is going to use it for. There's no
way to control this: whatever they type-in, be it plain text, HTML
markup, C++ code, raw PostScript, or Linear B, gets submitted to the
server-side in accordance with the rules for forms submission. HTML
markup plays no special role in this part of the action - but it's not
for a moment excluded.

It's all about what you *do* with it when it reaches the server side.
which they might _intend_ to have interpreted later as if
it were HTML. To help them do this you must first convert their plain
text as entered into HTML - part of this process would be to encode
their plaintext "&" into the HTML "&amp;", probably just before storing
it.
*That* would certainly not be helpful if they were supplying HTML
markup.
This stuff isn't hard to do, but it does require clarity of thought
That's very true.
It's also very important to get right (there are some interesting
attacks you can make on blogs etc. if you let users post arbitrary
chunks of HTML).


Indeed; so block the raw-HTML options to untrusted contributors. But
that doesn't mean there's anything wrong in principle with the
existence of a raw-HTML option.
Mar 3 '06 #5

Jeremy wrote:
How can one stop a browser from converting

&amp;

to

& ?

We have a textarea in our system where a user can type in some html code
and have it saved to the database. When the data is retrieved and
redisplayed it is displayed as simply &.

HTML snippet:

<TEXTAREA NAME="p_html" ROWS=6 COLS=70 ALIGN="VIRTUAL" WRAP="YES">
fred
&
&amp;
</TEXTAREA>

When displayed, the user predictably sees

fred
&
&


Easy--convert all the & to &amp; before displaying them. "&" will become
&amp; and will display as "&", and "&amp;" will become "&amp;amp;" and
will display as "&amp;".
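
A minimal sketch of that replacement, in Python for illustration only (the poster's own environment is not Python). `escape_ampersands` is a hypothetical helper, and `html.unescape` is used here merely as a stand-in for the browser decoding character references when it parses the page.

```python
import html


def escape_ampersands(stored: str) -> str:
    """Replace every '&' with '&amp;' just before writing the value into the page."""
    return stored.replace("&", "&amp;")


stored = "fred\n&\n&amp;\n"          # value as saved in the database
served = escape_ampersands(stored)   # what goes between <textarea> and </textarea>
displayed = html.unescape(served)    # stand-in for what the browser then shows

print(served)
print(displayed == stored)
```

The replacement is applied once, at output time; the value in the database stays exactly as the user typed it.
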
Mar 3 '06 #6

In article <46************@individual.net>, Harlan Messinger says...


Easy--convert all the & to &amp; before displaying them. "&" will become
&amp; and will display as "&", and "&amp;" will become "&amp;amp;" and
will display as "&amp;".


Brilliant - obvious but brilliant - thanks that is all I needed.

--

jeremy
Mar 3 '06 #7

hug
Jeremy <je********@gmail.com> wrote:
In article <46************@individual.net>, Harlan Messinger says...


Easy--convert all the & to &amp; before displaying them. "&" will become
&amp; and will display as "&", and "&amp;" will become "&amp;amp;" and
will display as "&amp;".


Brilliant - obvious but brilliant - thanks that is all I needed.


If you're working in PHP, see htmlentities(); it'll get them all in one
swell foop and it has the speed advantage of being a builtin.

--
http://www.ren-prod-inc.com/hug_soft...action=contact
Mar 3 '06 #8

In article <s8********************************@4ax.com>, hug says...
Brilliant - obvious but brilliant - thanks that is all I needed.


If you're working in PHP, see htmlentities(); it'll get them all in one
swell foop and it has the speed advantage of being a builtin.


Thanks - actually working in Oracle pl/sql - 'tis a simple replace()
call.

--

jeremy
Mar 3 '06 #9

Jeremy wrote:
In article <s8********************************@4ax.com>, hug says...
Brilliant - obvious but brilliant - thanks that is all I needed.


If you're working in PHP, see htmlentities(); it'll get them all in one
swell foop and it has the speed advantage of being a builtin.

Thanks - actually working in Oracle pl/sql - 'tis a simple replace()
call.


Hug makes a good general point--you might need to make other conversions
too, like < and > and possibly quotation marks. Server-side applications
generally have access in some way to an HTMLEncode function that handles
all of that.
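
For illustration, Python's standard-library counterpart to such an HTMLEncode function is `html.escape`. The sketch below uses it for an attribute value, where quotation marks matter; `input_field` and the field name `p_title` are hypothetical.

```python
import html


def input_field(name: str, value: str) -> str:
    """Build an <input> whose value attribute is safe for arbitrary user text."""
    return '<input type="text" name="{}" value="{}">'.format(
        html.escape(name, quote=True),   # quote=True also encodes " and '
        html.escape(value, quote=True),
    )


field = input_field("p_title", 'Tom & "Jerry" <draft>')
print(field)
```

Without the quote handling, the first `"` in the user's text would end the attribute early.
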
Mar 3 '06 #10

Jeremy wrote:
In article <11*********************@p10g2000cwp.googlegroups.com>, Andy
Dingley says...
Good variable naming in your server code will help too - try prefixing
variables with "strUserJunk", or "htmlUserJunk" as appropriate. Whenever
you see code that assigns variables with mis-matched names, be
suspicious that there's an encoding / decoding process missing.


So it really has nothing to do with variable naming and so on


See this article from Joel on Software to get a better idea of what Andy
was talking about.
http://www.joelonsoftware.com/articles/Wrong.html

In short, you're receiving unsafe content from the user, expecting it to
be plain text, failing to make it safe by encoding it in HTML syntax,
and then outputting it directly. I suspect this will
probably be another bug if the user happens to enter this:

Hello World!</textarea>
<script>//do something evil</script>

When you include that fragment within your document and have not
processed it, the markup received by the browser would look something
like this:

<TEXTAREA NAME="p_html" ROWS=6 COLS=70 ALIGN="VIRTUAL" WRAP="YES">
Hello World!</textarea>
<script>//do something evil</script>
</TEXTAREA>
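
A sketch of the processed version, in Python for illustration only: `render_textarea` is a hypothetical helper that encodes the stored value before placing it inside the element, so the user's `</textarea>` arrives as text rather than as a closing tag.

```python
import html


def render_textarea(name: str, stored: str) -> str:
    """Emit a textarea whose content is encoded, so it cannot close the element."""
    return '<textarea name="{}" rows="6" cols="70">{}</textarea>'.format(
        html.escape(name, quote=True),
        html.escape(stored, quote=False),  # encodes &, < and >
    )


payload = "Hello World!</textarea>\n<script>//do something evil</script>"
markup = render_textarea("p_html", payload)
print(markup)
```

The only real `</textarea>` left in the output is the one the template wrote, and the user still sees exactly what they typed when the field is redisplayed.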

Also, I'm not sure what that align="virtual" attribute in your markup is
supposed to do, I've never heard of it before. Neither align nor wrap
are valid attributes of the textarea element.

--
Lachlan Hunt
http://lachy.id.au/
http://GetFirefox.com/ Rediscover the Web
http://GetThunderbird.com/ Reclaim your Inbox
Mar 7 '06 #11

In article <3s***************@news-server.bigpond.net.au>, Lachlan Hunt
says...
Jeremy wrote:
In article <11*********************@p10g2000cwp.googlegroups.com>, Andy
Dingley says...
Good variable naming in your server code will help too - try prefixing
variables with "strUserJunk", or "htmlUserJunk" as appropriate. Whenever
you see code that assigns variables with mis-matched names, be
suspicious that there's an encoding / decoding process missing.


So it really has nothing to do with variable naming and so on


See this article from Joel on Software to get a better idea of what Andy
was talking about.
http://www.joelonsoftware.com/articles/Wrong.html

In short, you're receiving unsafe content from the user, expecting it to
be plain text, failing to make it safe by encoding it in HTML syntax,
and then outputting it directly. I suspect this will
probably be another bug if the user happens to enter this:

Hello World!</textarea>
<script>//do something evil</script>

When you include that fragment within your document and have not
processed it, the markup received by the browser would look something
like this:

<TEXTAREA NAME="p_html" ROWS=6 COLS=70 ALIGN="VIRTUAL" WRAP="YES">
Hello World!</textarea>
<script>//do something evil</script>
</TEXTAREA>

Also, I'm not sure what that align="virtual" attribute in your markup is
supposed to do, I've never heard of it before. Neither align nor wrap
are valid attributes of the textarea element.


Thanks for all your feedback on this. Andy was addressing another issue
- I guess something related to what I was asking about. I see and
understand the point about potentially unsafe content. This is part of
an administrative toolset used by experienced and responsible site
administrators.

--

jeremy
Mar 8 '06 #12
