
strip html but keep ' & ""

Newbie
 
Join Date: Jul 2009
Location: Missouri
Posts: 3
#1: Jul 9 '09
hello all!

I am using a form that users fill out, and for security reasons I want HTML stripped from it. If the user enters HTML, I want the form to kick back with a message saying that the HTML was removed. What I have works fine with one exception: I want people to be able to use apostrophes and quotes.

  <?php
  $RemarksPure = trim(stripslashes($_POST['remarks']));
  $Remarks = addslashes(preg_replace('#</?\w[^>]*>#', '', $RemarksPure));
  $RemarksValidationOK = true;
  $ValidationOK = true;
  if ($RemarksPure !== $Remarks) {
      // breaks validation for the form thus returning user to page to re-edit content
      $RemarksValidationOK = false;
      $ValidationOK = false;  // Whole Form Validation
  }
  ?>

  <?php if (!$RemarksValidationOK) { echo "No HTML please!"; } ?>

(This is shortened code)
While this works fine for stripping HTML, it also kicks back on apostrophes and quotes. I would like the user to be able to use them, but I'm not completely sure how to do that. I want to maintain security so that people can't put errant code in the input box, but at least it shouldn't kick back on an apostrophe. Quotes would be nice, but not necessary if I'm better off leaving it as is.

I think I have a basic grasp of these PHP functions to have gotten it working this far, but I'm betting I can use them better:
  • preg_replace - I'm a little flaky on the syntax and how it's used, but I understand how it works
  • stripslashes & addslashes - I'm not sure I understand these functions well enough to use them properly for what I need.


Thanks in advance!

Dan

dlite922
Expert
 
Join Date: Dec 2007
Location: Denver, CO
Posts: 1,144
#2: Jul 9 '09

re: strip html but keep ' & ""


The manual has other functions for removing only the HTML from a string, strip_tags() for example. (I can also give you a regexp.)

addslashes() is good to sanitize all types of quotations for a database insertion (to prevent SQL injection), but you don't need that otherwise.

If you are putting it in the database, the slashes before the quotes are not stored; they only escape the quotes inside the query.
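
To make the effect of addslashes() concrete, here is a minimal sketch; the sample string is made up for illustration. It shows why a comparison between the raw input and the slashed copy fails as soon as the text contains a quote, and that stripslashes() reverses the change.

```php
<?php
// Hypothetical sample input containing an apostrophe and double quotes.
$raw = 'It\'s a "test"';

// addslashes() puts a backslash before ', " and \ (and NUL bytes).
$slashed = addslashes($raw);

echo $slashed, "\n";                        // It\'s a \"test\"
var_dump($raw === $slashed);                // bool(false): the slashes make the strings differ
var_dump($raw === stripslashes($slashed));  // bool(true): stripslashes() undoes it
```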

If you want to accept just plain text from the user, you should:
1. use preg_replace() to remove all HTML tags
2. use mysql_real_escape_string() if you're using MySQL, or addslashes() otherwise.
3. If you want to display the text again to the user without having to recall it from the database, just put the version after step one into a variable so that you have a non-db-safe copy without any slashes.

Remove HTML: preg_replace("/</?[a-z][a-z0-9]*[^<>]*>/i","",$input);

// above removes all opening and closing (case insensitive) html tags.
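
The steps above could be sketched roughly as follows. This is only an illustration: the helper name remove_html_tags() and the sample input are made up, "#" is used as the pattern delimiter so that the "/" inside the pattern needs no escaping, and addslashes() stands in for whatever escaping your database layer provides.

```php
<?php
// Hypothetical helper, not part of the original code.
// '#' is the pattern delimiter, so the '/' in '</?' is taken literally.
function remove_html_tags($input)
{
    return preg_replace('#</?[a-z][a-z0-9]*[^<>]*>#i', '', $input);
}

$input = 'Bob\'s <b>"great"</b> idea';   // made-up form input

// Step 1: strip the tags; this copy is the one to show to the user.
$display = remove_html_tags($input);

// The input contained HTML if stripping changed it. Both strings are
// unescaped here, so quotes alone never trip the check.
$hadHtml = ($display !== $input);

// Step 2: escape a separate copy only if it is headed for a database.
$forDb = addslashes($display);

echo $display, "\n";   // Bob's "great" idea
var_dump($hadHtml);    // bool(true)
```

Comparing the stripped text against the raw text before any slashes are added is what lets apostrophes and quotes through while tags are still detected.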


Hope that helps,

Dan
Newbie
 
Join Date: Jul 2009
Location: Missouri
Posts: 3
#3: Jul 10 '09

re: strip html but keep ' & ""


I am just emailing the content of the input box; nothing really needs to be stored. I tried integrating your preg_replace and I got an error saying that ? was an unknown variable... I don't really know how to troubleshoot it though.

You did, however, finally make stripslashes and addslashes make sense: slashes get added for certain destinations, whereas for others you just want the raw data. So thanks!

This is the line that was giving me an error...
  $RemarksSlashed = addslashes(preg_replace("/</?[a-z][a-z0-9]*[^<>]*>/i","",$RemarksPure));

dlite922
Expert
 
Join Date: Dec 2007
Location: Denver, CO
Posts: 1,144
#4: Jul 10 '09

re: strip html but keep ' & ""


You probably don't need the addslashes() then.

Check your variable name for misspellings.

DAN
Newbie
 
Join Date: Jul 2009
Location: Missouri
Posts: 3
#5: Jul 10 '09

re: strip html but keep ' & ""


Thanks, I'll give it a shot!
Dormilich
Moderator
 
Join Date: Aug 2008
Location: Leipzig, Germany
Posts: 4,295
#6: Jul 10 '09

re: strip html but keep ' & ""


  preg_replace("/</?[a-z][a-z0-9]*[^<>]*>/i","",$input);
should probably be
  preg_replace("§</?[a-z][a-z0-9]*[^<>]*>§i","",$input);
or even
  preg_replace("@</?\w*[^<>]*>@i","",$input);
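
A short sketch of why changing the delimiter helps, assuming the error came from the pattern itself: PHP treats the first character of the pattern as its delimiter, so with "/" the pattern ends at the slash in "</" and the "?" after it is read as a modifier. A delimiter that does not occur in the pattern, or a backslash in front of the inner slash, avoids that. The sample string and the choice of "#" are made up for illustration.

```php
<?php
$input = 'a <i>small</i> test';   // made-up sample

// '#' does not occur in the pattern, so the '/' inside it is taken literally.
$viaHash = preg_replace('#</?[a-z][a-z0-9]*[^<>]*>#i', '', $input);

// Keeping '/' as the delimiter also works if the inner slash is escaped.
$viaEscape = preg_replace('/<\/?[a-z][a-z0-9]*[^<>]*>/i', '', $input);

echo $viaHash, "\n";                 // a small test
var_dump($viaHash === $viaEscape);   // bool(true)
```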
Maybe htmlspecialchars() is also worth looking at.
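
As a rough illustration of the htmlspecialchars() route: instead of removing tags it converts the special characters to entities, so any markup is displayed as text rather than interpreted. The ENT_QUOTES flag, the UTF-8 charset and the sample input are assumptions made for this sketch.

```php
<?php
$input = 'Dan\'s <b>"note"</b>';   // made-up sample

// ENT_QUOTES converts single and double quotes in addition to <, > and &.
$safe = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $safe, "\n";   // Dan&#039;s &lt;b&gt;&quot;note&quot;&lt;/b&gt;
```

Nothing is lost this way, which may suit a form whose content is only emailed; whether the entities are acceptable in the message body depends on how the mail is read.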

Tags
apostrophe, html, preg_replace, quotes, strip