strip html but keep ' & "" 
July 9th, 2009, 01:58 PM
Newbie | Join Date: Jul 2009 | Location: Missouri | Posts: 3
Hello all!
I am using a form that a user can fill out, but for security reasons I want HTML stripped out. If the user inputs HTML, I want the form to kick back with a message saying that HTML is not allowed. What I have works just fine with one exception: I want people to be able to use apostrophes and quotes.
<?php
$RemarksPure = Trim(stripslashes($_POST['remarks']));
$Remarks = addslashes(preg_replace('#</?\w[^>]*>#', '', $RemarksPure));
$RemarksValidationOK = true;
$ValidationOK = true;
if ($RemarksPure !== $Remarks) {
    // breaks validation for the form thus returning user to page to re-edit content
    $RemarksValidationOK = false;
    $ValidationOK = false; // Whole Form Validation
}
?>

<?php if (!$RemarksValidationOK) { echo "No HTML please!"; } ?>
(This is shortened code)
While this works fine for stripping HTML, it also kicks back on apostrophes and quotes. I would like the user to be able to use them, but I'm not completely sure how to do that. I want to maintain security so that people can't put errant code in the input box, but at least it shouldn't kick back on an apostrophe. Quotes would be nice, but are not necessary if I'm better off leaving it as is.
I think I have a basic grasp of these PHP functions to have gotten it working thus far, but I'm betting I can use them better:
- preg_replace - I'm a little flaky on the syntax and how it's used, but I understand how it works.
- stripslashes & addslashes - I'm not sure I understand these functions well enough to use them properly for what I need.
Thanks in advance!
Dan
July 9th, 2009, 08:58 PM
Expert | Join Date: Dec 2007 | Location: Moon, Dark Side | Posts: 1,075
re: strip html but keep ' & ""
There are other functions in the manual for removing only the HTML code from a string. (I can also give you a regexp.)
addslashes() is good for escaping all types of quotation marks before a database insertion (to help prevent SQL injection), but you don't need it otherwise.
If you are putting the text in a database, the slashes added before the quotes are not themselves stored in the DB.
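A small sketch of what the two functions do may help here. The sample string is made up for illustration. The point is that addslashes() changes any text containing quotes, so comparing an escaped copy against the raw input reports a difference even when no HTML was present.

```php
<?php
// Hypothetical input, as if it had been typed into the form.
$raw = 'Dan\'s "test"';

// addslashes() puts a backslash before ', ", \ and NUL bytes.
$escaped = addslashes($raw);

// stripslashes() removes those backslashes again.
$restored = stripslashes($escaped);

var_dump($escaped === $raw);   // bool(false): escaping changed the string
var_dump($restored === $raw);  // bool(true): the round trip restores the input
```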
If you want to accept just plain text from the user, you should:
1. use preg_replace() to remove all HTML tags
2. use mysql_real_escape_string() if you're using MySQL as your DBMS, or addslashes() otherwise.
3. If you want to display the text again to the user, without having to recall it from the Database, then just put the version after step one into a variable so that you have a non-db-safe copy without any slashes.
Remove HTML: preg_replace("/</?[a-z][a-z0-9]*[^<>]*>/i","",$input);
// above removes all opening and closing (case insensitive) html tags.
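The three steps could be sketched roughly as follows. This is only an illustration: prepare_remarks() is a hypothetical helper name, the sample input is made up, and `#` is assumed as the pattern delimiter so that the `/` inside the pattern needs no escaping. addslashes() stands in for the database step, as in the "otherwise" case of step 2.

```php
<?php
// Hypothetical helper; it follows the three steps listed above.
function prepare_remarks(string $input): array
{
    // Step 1: remove opening and closing tags. '#' is the delimiter,
    // so the '/' in the pattern does not need escaping.
    $plain = preg_replace('#</?[a-z][a-z0-9]*[^<>]*>#i', '', trim($input));

    // Step 2: escaped copy for storage. addslashes() is the fallback case;
    // a MySQL connection would use its own escaping function instead.
    $forDb = addslashes($plain);

    // Step 3: keep the unescaped copy for showing back to the user.
    return ['display' => $plain, 'db' => $forDb];
}

$result = prepare_remarks("<b>It's</b> fine");
echo $result['display'], "\n"; // It's fine
echo $result['db'], "\n";      // It\'s fine
```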
Hope that helps,
Dan
July 10th, 2009, 12:49 AM
Newbie | Join Date: Jul 2009 | Location: Missouri | Posts: 3
re: strip html but keep ' & ""
I am just emailing the content of the input box; nothing really needs to be stored. I tried integrating your preg_replace and I got an error saying that ? was an unknown variable... I don't really know how to troubleshoot it, though.
You did, however, finally make stripslashes and addslashes make sense: you add slashes for certain types of data, whereas in others you just want the raw data. So thanks!
This is the line that was giving me an error:
$RemarksSlashed = addslashes(preg_replace("/</?[a-z][a-z0-9]*[^<>]*>/i","",$RemarksPure));
July 10th, 2009, 01:07 AM
Expert | Join Date: Dec 2007 | Location: Moon, Dark Side | Posts: 1,075
re: strip html but keep ' & ""
You probably don't need the addslashes() then.
Check your variable name for misspellings.
DAN
July 10th, 2009, 02:22 AM
Newbie | Join Date: Jul 2009 | Location: Missouri | Posts: 3
re: strip html but keep ' & ""
Thanks, I'll give it a shot!
July 10th, 2009, 06:30 AM
Moderator | Join Date: Aug 2008 | Location: Leipzig, Germany | Posts: 3,485 | Provided Answers: 9
re: strip html but keep ' & ""
preg_replace("/</?[a-z][a-z0-9]*[^<>]*>/i","",$input);
should probably be
preg_replace("§</?[a-z][a-z0-9]*[^<>]*>§i","",$input);
or even
preg_replace("@</?\w*[^<>]*>@i","",$input);
Maybe htmlspecialchars() is also worth looking at.
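Putting the thread's suggestions together for the email-only case, a minimal sketch might look like the one below. It assumes the field is still called remarks and uses a hypothetical helper name, contains_html(). It drops addslashes() so that apostrophes and quotes no longer trip the comparison, uses `#` as a plain ASCII delimiter (which also sidesteps any encoding question around a character like §), and applies htmlspecialchars() only when echoing the text back into the page.

```php
<?php
// Hypothetical helper: true when stripping tags changes the text.
function contains_html(string $text): bool
{
    // Same pattern as in the original post; '#' delimits it, so the '/'
    // in '</?' is read as part of the pattern.
    $stripped = preg_replace('#</?\w[^>]*>#', '', $text);
    return $stripped !== $text;
}

// In the real form this would come from $_POST['remarks'].
$remarksPure = trim("Dan's \"quoted\" remark");

// Quotes and apostrophes alone no longer fail validation.
$remarksValidationOK = !contains_html($remarksPure);
var_dump($remarksValidationOK);               // bool(true)
var_dump(contains_html('<b>bold</b> text'));  // bool(true)

// Escape only at output time, so quotes display literally in the page.
echo htmlspecialchars($remarksPure, ENT_QUOTES, 'UTF-8'), "\n";
```

The comparison is now made between two unescaped strings, so only actual tag removal makes them differ.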