Preg_replace whole word only

empiresolutions
162 100+
I'm trying to make a naughty word filter. It removes bad words fine, but words that merely contain a bad word, like "assist" and "asses", get caught in the filter as well. Strangely, though, if the sentence is "My asses to assist me." the clean version reads "My asses to ***ist me." It seems to let the first occurrence inside another word through, but then blocks the rest. Any ideas? My script is below. Thanks.

    function cleanWords($value) {

        /*   strip naughty words   */
        $bad_word_file = 'standards/badwords.txt';
        $strtofile = fopen($bad_word_file, "r");
        $badwords = explode("\n", fread($strtofile, filesize($bad_word_file)));
        fclose($strtofile);

        for ($i = 0; $i < count($badwords); $i++) {
            $wordlist .= str_replace(chr(13),'',$badwords[$i]).'|';
        }
        $wordlist = substr($wordlist,0,-1);

        $value = preg_replace("/\b($wordlist)\b/ie", 'preg_replace("/./","*","\\1")', $value);
        return $value;

    }

Mar 12 '10 #1
Atli
5,058 Expert 4TB
Hey.

If you print the $wordlist, does it look right?
I tested this by just creating the $wordlist manually and it seemed to work fine.
Mar 12 '10 #2
empiresolutions
162 100+
Yes, $wordlist is correct. If it helps, the word list is just over 1,000 words.
Mar 12 '10 #3
Use whitespace characters with OR conditions.

(\s|^)(badword1|badword2)(\s|$)

That checks for either a whitespace character before the word or the start of the string, then for either a whitespace character or the end of the string after the word.
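
For comparison, here is a minimal sketch of how the whitespace-delimited pattern and the `\b` pattern from the original post behave on the same input. The word list and the sample sentence are made up for illustration. Note that the whitespace version has to put the captured delimiters back (`$1` and `$3`), that it skips a word followed by punctuation, and that it misses the second of two adjacent words because the shared space was already consumed by the first match.

```php
<?php
// Hypothetical word list, for illustration only.
$words = 'darn|heck';

// Whitespace-delimited version, as suggested above.
$spacePattern = '/(\s|^)(' . $words . ')(\s|$)/i';
// Word-boundary version, as in the original post.
$boundaryPattern = '/\b(' . $words . ')\b/i';

$text = 'Well, darn. heck heck no!';

// Groups 1 and 3 hold the surrounding whitespace, so they are re-inserted.
$viaSpaces = preg_replace($spacePattern, '$1****$3', $text);
// \b is zero-width, so nothing around the word needs to be restored.
$viaBoundaries = preg_replace($boundaryPattern, '****', $text);

echo $viaSpaces, "\n";      // Well, darn. **** heck no!
echo $viaBoundaries, "\n";  // Well, ****. **** **** no!
```

Since `\b` does not consume any characters, it is usually the simpler choice for whole-word matching.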
Mar 12 '10 #4
empiresolutions
162 100+
I ended up finding that the word "a.s.s." was in my list. I think the dots were messing up the expression. For those interested, this is my new code. Thanks for the suggestions that got it where it is.

    $_SESSION[wordlist] = join("|", array_map('trim', file('standards/badwords.txt')));

    function cleanWords($value) {

        global $_SESSION;

        $value = preg_replace("/\b($_SESSION[wordlist])\b/ie", 'str_repeat("*", strlen("\\1")) ', $value);
        return $value;

    }

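
A list entry such as "a.s.s." causes trouble because each unescaped `.` in a regular expression matches any character, so the entry also matches "assist". One way to keep such entries from acting as wildcards is to run every word through `preg_quote()` before joining the list. The following is only a sketch with a made-up two-word list, not the code used in this thread.

```php
<?php
// Hypothetical list entries; "a.s.s." stands in for the entry with dots.
$badWords = array('a.s.s.', 'darn');

// Unescaped: each "." is a wildcard that matches any character.
$unsafe = '/\b(' . implode('|', $badWords) . ')\b/i';

// Escaped: preg_quote() turns "." into "\."; the second argument makes it
// escape the "/" delimiter as well.
$quoted = array();
foreach ($badWords as $word) {
    $quoted[] = preg_quote($word, '/');
}
$safe = '/\b(' . implode('|', $quoted) . ')\b/i';

$unsafeHit = preg_match($unsafe, 'Please assist me'); // 1: "assist" fits a.s.s.
$safeHit   = preg_match($safe, 'Please assist me');   // 0: the dots are literal now

echo $unsafeHit, ' ', $safeHit, "\n";
```

One caveat: an entry that ends in punctuation, like "a.s.s.", will rarely match once escaped, because the trailing `\b` needs a word character on one side of that position. Such entries may be better handled separately or removed from the list.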
Mar 13 '10 #5
Atli
5,058 Expert 4TB
Hey.
Glad you got it working.

However, I would consider using a different method. Putting the whole thing into the session is very inefficient. The list remains constant for every user, and rarely changes (if ever), right? If so, then compiling it for every user like that and storing it in a separate session for each one just does two things: it eats up resources and clutters the sessions with duplicate data.

You would be far better off compiling the regular expression into a common file, shared between all users. This is how I would do this. (Wouldn't usually make a ready-to-use code example, but since you already solved this on your own...)
    <?php
    define("BADWORDS_RAW_FILE", "/path/to/badwords.txt");
    define("BADWORDS_EXP_FILE", "/path/to/badwords_expression.txt");

    /**
     * Returns a regular expression that can be used to check
     * for "bad" words. Returns an expression in the format:
     *  - /\b(list|of|bad|words)\b/i
     */
    function getBadWordsRegexp()
    {
        $regexp = "";

        // Try to fetch an existing expression.
        if(!file_exists(BADWORDS_EXP_FILE) ||
           filesize(BADWORDS_EXP_FILE) <= 0 ||
           ($regexp = file_get_contents(BADWORDS_EXP_FILE)) === false)
        {
            // Make sure the raw word list exists
            if(!file_exists(BADWORDS_RAW_FILE)) {
                trigger_error("The bad words file does not exist.", E_USER_ERROR);
                return false;
            }

            // Compile the regular expression
            $regexp = '/\b(' . join("|", array_map('trim', file(BADWORDS_RAW_FILE))) . ')\b/i';

            // Try to save it. file_put_contents() creates the file if it is
            // missing and returns false on failure, so no separate
            // is_writable() check is needed (that check would fail for a
            // file that does not exist yet).
            if(file_put_contents(BADWORDS_EXP_FILE, $regexp) === false)
            {
                trigger_error("Could not save badwords expression. Check file permissions.", E_USER_WARNING);
            }
        }

        // Return it
        return $regexp;
    }
    ?>
Then you could use it like:
    <?php
    function cleanWords($value) {
        $regexp = getBadWordsRegexp();
        return preg_replace($regexp . 'e', 'str_repeat("*", strlen("\\1")) ', $value);
    }
    ?>
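
A note for anyone running this on a newer PHP: the `e` modifier used above was deprecated in PHP 5.5 and removed in PHP 7, so the same replacement is normally written with `preg_replace_callback()`. The following is a minimal sketch. `getBadWordsRegexpSketch()` is a hypothetical stand-in that returns a hard-coded expression so the example runs without the word-list files.

```php
<?php
// Hypothetical stand-in for getBadWordsRegexp(); the real function reads
// the expression from a file.
function getBadWordsRegexpSketch() {
    return '/\b(darn|heck)\b/i';
}

function cleanWordsCallback($value) {
    return preg_replace_callback(
        getBadWordsRegexpSketch(),
        function ($match) {
            // $match[0] is the whole matched word; mask it with one "*"
            // per byte.
            return str_repeat('*', strlen($match[0]));
        },
        $value
    );
}

$cleaned = cleanWordsCallback('Well darn, what the HECK is a heckler?');
echo $cleaned, "\n"; // Well ****, what the **** is a heckler?
```

`strlen()` counts bytes, so a word containing multibyte characters would get more asterisks than it has letters. If that matters, `mb_strlen()` is an option, assuming the mbstring extension is available.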
P.S.
I have a couple of notes on your code, though.
  • You don't need to import $_SESSION into functions using the global keyword. $_SESSION is a "super-global", which makes it available to you wherever you are in your code.
  • All strings need to be quoted. That includes array keys. Which means that:

        // This
        $_SESSION[wordlist];

        // Should be
        $_SESSION['wordlist'];

    If you leave the quotes out, PHP assumes it is a constant. Failing to find a constant, it prints a warning and uses it as a string (which is why it works, even though it is technically an error). For future compatibility and performance reasons (minor as they may be), it is best to just remember the quotes.
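
To illustrate the quoting rule, here is a small sketch. A plain array named `$store` stands in for `$_SESSION` (an assumption made so the example runs without starting a session). One detail worth knowing: inside a double-quoted string the unquoted form `$store[wordlist]` is valid interpolation syntax, so only the bare use outside of strings is the problem. On PHP 8 and later, that bare form throws an Error for the undefined constant instead of falling back to a string.

```php
<?php
// A plain array stands in for $_SESSION so the sketch runs without a session.
$store = array();
$store['wordlist'] = 'darn|heck';   // quoted key: always treated as a string

// Inside double quotes, the simple syntax takes an unquoted key...
$simple = "/\b($store[wordlist])\b/i";
// ...while the curly-brace syntax takes the quoted form.
$complex = "/\b({$store['wordlist']})\b/i";

echo $simple, "\n";
echo $complex, "\n";
```

Both strings come out identical, so either interpolation style works. The curly-brace form has the advantage of looking the same as the quoted access used everywhere else in the code.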
Mar 14 '10 #6
empiresolutions
162 100+
Thanks, Atli! Your suggestions are much appreciated.
Mar 14 '10 #7
