What chars are considered safe?

I was just writing a sanitisation routine for a bit of user input. The data is
an English text description of a product, and will go into a DB, then back
out to other users' browsers.

As per normal practice, I was working on the basis of leaving in all
characters that I considered safe and stripping out everything else. This
led me to think of what characters are actually safe, given that the user
will want to be able to use at least basic punctuation, currency symbols
and so on. Avoiding < and > seemed obvious, but most other things have a
use I think.

My current line looks like this:

$data = preg_replace( '/[^\s\w\d@"\'()[]{}:#~!$%&*_-+.,]/', "", $data );

(Note that's a list of chars that are *not* to be replaced.) Are any of
these dangerous? Or have I left out some that are harmless and should be in
there?

--
The email address used to post is a spam pit. Contact me at
http://www.derekfountain.org (Derek Fountain)
Jul 17 '05 #1
Derek Fountain wrote:
> I was just writing a sanitisation routine for a bit of user input. The data is
> an English text description of a product, and will go into a DB, then back
> out to other users' browsers.
>
> As per normal practice, I was working on the basis of leaving in all
> characters that I considered safe and stripping out everything else. This
> led me to think of what characters are actually safe, given that the user
> will want to be able to use at least basic punctuation, currency symbols
> and so on. Avoiding < and > seemed obvious, but most other things have a
> use I think.
>
> My current line looks like this:
>
> $data = preg_replace( '/[^\s\w\d@"\'()[]{}:#~!$%&*_-+.,]/', "", $data );

Two remarks on your pattern syntax. First, the second of
the two square brackets that appear to be in the character
class is in fact not part of it. It is the class's closing
bracket, and every non-metacharacter after it must be in the
subject string for the replacement to occur. Second, the hyphen,
if it were inside the character class, would cause a warning,
because it would be read as a range operator.
Either escape it or put it where it is not interpreted as a
metacharacter; that is, at the beginning or at the end.
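
The two remarks above can be sketched as follows. This is only an illustration, assuming PHP 7 or later; the function name keep_whitelisted() is hypothetical, and the character list is simply the one from the question with the closing bracket escaped and the hyphen moved to the end, not a recommended whitelist.

```php
<?php
// Hypothetical helper: strips every character that is not in the list.
// "[" and "]" are escaped so they stay inside the class, and "-" is
// placed last so it cannot be read as a range operator.
function keep_whitelisted(string $data): string
{
    return preg_replace('/[^\s\w@"\'()\[\]{}:#~!$%&*+.,-]/', '', $data);
}

echo keep_whitelisted('x-y [z] <b>'), "\n";
```

Since \w already covers digits and the underscore, \d and _ are left out of this version.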
> (Note that's a list of chars that are *not* to be replaced.) Are any of
> these dangerous? Or have I left out some that are harmless and should be in
> there?


How about semicolons and pound signs ('£')?

--
Jock
Jul 17 '05 #2
.oO(Derek Fountain)
> I was just writing a sanitisation routine for a bit of user input. The data is
> an English text description of a product, and will go into a DB, then back
> out to other users' browsers.
>
> As per normal practice, I was working on the basis of leaving in all
> characters that I considered safe and stripping out everything else. This
> led me to think of what characters are actually safe

Safe for what? In normal text every character can be safe if handled
properly.

> given that the user
> will want to be able to use at least basic punctuation, currency symbols
> and so on. Avoiding < and > seemed obvious, but most other things have a
> use I think.


What's wrong with < and >?

Micha
Jul 17 '05 #3
"Derek Fountain" <no****@example.com> wrote in message
news:42**********************@per-qv1-newsreader-01.iinet.net.au...
> I was just writing a sanitisation routine for a bit of user input. The data is
> an English text description of a product, and will go into a DB, then back
> out to other users' browsers.
>
> As per normal practice, I was working on the basis of leaving in all
> characters that I considered safe and stripping out everything else. This
> led me to think of what characters are actually safe, given that the user
> will want to be able to use at least basic punctuation, currency symbols
> and so on. Avoiding < and > seemed obvious, but most other things have a
> use I think.
>
> My current line looks like this:
>
> $data = preg_replace( '/[^\s\w\d@"\'()[]{}:#~!$%&*_-+.,]/', "", $data );
>
> (Note that's a list of chars that are *not* to be replaced.) Are any of
> these dangerous? Or have I left out some that are harmless and should be in
> there?


What encoding are you using? None of the characters above (maybe except 255)
is special, so I think they can be safely included. People like to have their
curly quotes and em dashes.
Jul 17 '05 #4
NC
Derek Fountain wrote:

> I was just writing a sanitisation routine for a bit of user input.
> The data is an English text description of a product, and will
> go into a DB, then back out to other users' browsers.


I think you should clarify your definition of "safe". Safe against
what? There are at least three issues that need to be worked on
here: use of potentially improper HTML formatting by users,
malicious JavaScript, and SQL injection...

Also, do you plan to store user inputs as HTML or plain text?

Cheers,
NC

Jul 17 '05 #5
>> $data = preg_replace( '/[^\s\w\d@"\'()[]{}:#~!$%&*_-+.,]/', "", $data );

> Two remarks on your pattern syntax: First, the second of
> the two square brackets that appear to be in the character
> class is in fact not. It is the closing square bracket, and
> every non-metacharacter afterwards must be in the subject
> string for the replacement to occur. Second, the hyphen, if
> it were inside a character class, would cause a warning.


Ahem, yeah, I spotted those not long after posting... :o} Thanks for the
pointer though!

Jul 17 '05 #6
Michael Fesser wrote:
>> will want to be able to use at least basic punctuation, currency symbols
>> and so on. Avoiding < and > seemed obvious, but most other things have a
>> use I think.
>
> What's wrong with < and >?

When returned to an innocent user's browser they allow cross-site scripting.

Jul 17 '05 #7
.oO(Derek Fountain)
> Michael Fesser wrote:
>>> will want to be able to use at least basic punctuation, currency symbols
>>> and so on. Avoiding < and > seemed obvious, but most other things have a
>>> use I think.
>>
>> What's wrong with < and >?
>
> When returned to an innocent user's browser they allow cross-site scripting.

Yep, I know. But that's not a problem with the chars themselves, but rather
a result of improper I/O handling. Use htmlspecialchars() whenever you
print text out to an HTML page and there's no longer any reason to forbid
such "special" chars.
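
As a minimal sketch of that advice (the sample string and variable names are made up for illustration, and UTF-8 is assumed as the page encoding), escaping at output time might look like this:

```php
<?php
// Hypothetical user-supplied description containing markup.
$description = '<script>alert("hi")</script> Tom & Jerry';

// Escape at output time; ENT_QUOTES also converts single quotes.
$safe = htmlspecialchars($description, ENT_QUOTES, 'UTF-8');

// The browser now shows the markup as text instead of running it.
echo '<p>' . $safe . '</p>', "\n";
```

The stored text keeps its < and > characters; only the copy written into the page is converted.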

Micha
Jul 17 '05 #8
NC wrote:
> Derek Fountain wrote:
>
>> I was just writing a sanitisation routine for a bit of user input.
>> The data is an English text description of a product, and will
>> go into a DB, then back out to other users' browsers.
>
> I think you should clarify your definition of "safe". Safe against
> what?

Safe for sending through a database API into the DB, then sending back out
to another user's browser...

> There are at least three issues that need to be worked on
> here: use of potentially improper HTML formatting by users,
> malicious JavaScript, and SQL injection...

...and not facilitating exploits based on those or any other attack vectors.

The fact that several people asked me to define "safe" puzzled me somewhat. How
many interpretations are there of the word in the context of sending data
strings into a DB and back out to a user's browser? I've a feeling I'm
missing something! :o)

> Also, do you plan to store user inputs as HTML or plain text?


As typed into a browser textarea input field. I'm expecting plain text, but
want the user to be able to provide as much punctuation, slang, smilies,
etc. as they like, as long as nothing goes in that might compromise the
system or another user's browser/session.

I know to leave < and > out to prevent XSS. I'm not sure about the entity
constructors & and ;. I'm tempted to leave the single quote out because it
is too useful in SQL injection. I'm suspicious of backticks because of
command injection - that's from my Perl background, although I'm not sure
if that's justified in PHP. I have a natural urge to suppress anything that
might be used in scripting, like #, !, and slashes, but again, not for a
reason I actually claim to understand.

Perhaps I'm going about this the wrong way? What do other PHP coders do with
a text value before considering it "safe" to store and relay back to other
users' browsers?
Jul 17 '05 #9
.oO(Derek Fountain)
> Perhaps I'm going about this the wrong way? What do other PHP coders do with
> a text value before considering it "safe" to store and relay back to other
> users' browsers?

Every input/output operation on (string) data may need some special
escaping/encoding applied beforehand to be safe, depending on the target
medium. In your case I would simply do it this way:

1) All user-submitted string data is run through stripslashes() if
magic quotes are enabled (check with get_magic_quotes_gpc()). This way
you get the raw data, don't have to rely on some obscure configuration
setting and can apply a proper escaping yourself whenever necessary.

2) Before storing the data into the DB all the strings are run through
mysql_real_escape_string() to escape certain chars (like single quotes)
which might cause trouble for the DB.

3) When printing the data out again to an HTML page htmlspecialchars()
is called on all strings, which converts some chars that have a special
meaning in HTML (<, >, &, ") to named entity references.

That's all. There's no need to take special care of particular
chars; the functions mentioned above take care of problems like SQL
injection and XSS.
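
For readers on current PHP, steps 2 and 3 can be sketched with a swapped-in technique: a PDO prepared statement takes the place of mysql_real_escape_string(), because the mysql_* functions were removed in PHP 7.0. Step 1 no longer applies, since magic quotes were removed in PHP 5.4. The in-memory SQLite database, the table name and the sample input are all assumptions made so the sketch runs on its own.

```php
<?php
// Assumption: an in-memory SQLite database stands in for the real DB.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE products (id INTEGER PRIMARY KEY, description TEXT)');

// Hypothetical raw input, stored exactly as the user typed it.
$raw = "It's <b>great</b> & cheap'; DROP TABLE products; --";

// Step 2: a bound parameter keeps the value out of the SQL text.
$insert = $pdo->prepare('INSERT INTO products (description) VALUES (:d)');
$insert->execute([':d' => $raw]);

// Step 3: escape only at output time, for the HTML context.
$stored = $pdo->query('SELECT description FROM products')->fetchColumn();
$html = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');

echo $html, "\n";
```

The database holds the unmodified text, so it can be escaped differently later for another output format.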

Micha
Jul 17 '05 #10
"Derek Fountain" <no****@example.com> wrote in message
news:42***********************@per-qv1-newsreader-01.iinet.net.au...

: I know to leave < and > out to prevent XSS. I'm not sure about the entity
: constructors & and ;. I'm tempted to leave the single quote out because it
: is too useful in SQL injection. I'm suspicious of backticks because of
: command injection - that's from my Perl background, although I'm not sure
: if that's justified in PHP. I have a natural urge to suppress anything that
: might be used in scripting, like #, !, and slashes, but again, not for a
: reason I actually claim to understand.
:
: Perhaps I'm going about this the wrong way? What do other PHP coders do with
: a text value before considering it "safe" to store and relay back to other
: users' browsers?

Well, for starters *anything* that I insert into a web page using PHP ALWAYS
goes through htmlentities() first, which entity-escapes characters such as
", <, > and & (single quotes too, if ENT_QUOTES is passed) - otherwise
you'll be making invalid HTML.
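
A short comparison may help here, as htmlentities() and htmlspecialchars() both come up in this thread. The sample text is made up, and the sketch assumes the source file and the page are both UTF-8.

```php
<?php
// Hypothetical product text with a pound sign and some markup.
$text = '£5 <b>off</b>';

// htmlspecialchars() converts only the characters special to HTML.
$special = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// htmlentities() also converts every character that has a named entity.
$entities = htmlentities($text, ENT_QUOTES, 'UTF-8');

echo $special, "\n", $entities, "\n";
```

Both calls neutralise the angle brackets; they differ only in whether characters like the pound sign are left as they are or written as entities.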

You might want to clean up stuff that people input too - there's a good
chance that someone putting
<script language='javascript'...
or
<?php
isn't necessarily entering something you'd like to display...

Matt
Jul 17 '05 #11
Michael Fesser wrote:
> Every input/output operation on (string) data may need some special
> escaping/encoding applied beforehand to be safe, depending on the target
> medium. In your case I would simply do it this way:
>
> 1) All user-submitted string data is run through stripslashes() if
> magic quotes are enabled (check with get_magic_quotes_gpc()). This way
> you get the raw data, don't have to rely on some obscure configuration
> setting and can apply a proper escaping yourself whenever necessary.
>
> 2) Before storing the data into the DB all the strings are run through
> mysql_real_escape_string() to escape certain chars (like single quotes)
> which might cause trouble for the DB.
>
> 3) When printing the data out again to an HTML page htmlspecialchars()
> is called on all strings, which converts some chars that have a special
> meaning in HTML (<, >, &, ") to named entity references.
>
> That's all. There's no need to take special care of particular
> chars; the functions mentioned above take care of problems like SQL
> injection and XSS.


This makes a lot of sense, thanks. One more question though. Assume I take
what the user has given me and do the DB escaping before I put it in the
database. I then ensure that whenever this string is returned to a browser
I pass it through htmlentities() first. But... what about when I want to
return the string to the user so they can edit it? The user will want to
see the quotes and angle brackets, etc., not the escaped versions of those
characters. Is it safe to hand the string back in a <textarea> or similar
without escaping it?

I experimented with Mozilla, and it seems to do the right thing. If I put
<script>alert('hello');</script> into my DB then send it out into a textarea
the browser doesn't run the script, but just offers it for editing. Is that
dependable? It would appear to be a browser feature, and not something PHP
has any control over.
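
One point worth adding to the experiment above: a browser treats markup inside a textarea as plain text only until it meets a closing </textarea> tag, so a stored value containing that tag could end the field early. A minimal sketch of escaping the value for the edit form follows; the field name and the sample value are hypothetical, and UTF-8 is assumed.

```php
<?php
// Hypothetical stored value that tries to close the textarea early.
$stored = "Nice</textarea><script>alert('x')</script>";

// Escape for the HTML context. The browser decodes the entities again
// when it fills the edit box, so the user sees the raw characters.
$field = '<textarea name="description">'
       . htmlspecialchars($stored, ENT_QUOTES, 'UTF-8')
       . '</textarea>';

echo $field, "\n";
```

The user still sees and edits the original quotes and angle brackets, and the form submits them back unescaped.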

Jul 17 '05 #12
