Removing Bad Words

Looking for suggestions on how to handle bad words that might
get passed in through $_GET['item'] variables.

My first thoughts included using str_replace() to strip out such
content, but then one ends up looking for characters that wrap
around the stripped characters and it ends up as a recursive
ordeal that fails to identify a poorly constructed $_GET['item']
variable (when someone hand-types the item into the line and
makes a simple typing error).

So the next thoughts involved employing a list of good words
and if any word in the $_GET['item'] list doesn't fall into the
list of good words, then an empty string gets returned.

Any suggestions on how to handle this?

Thanks,

Jim Carlock

Feb 22 '06 #1
Jim Carlock wrote:
Any suggestions on how to handle this?


You will have to implement "fuzzy logic" that will be able to filter not
only "badword" but also "b a d w o r d", "b@d word", "b*dword", etc.

Although you should be able to catch some of those, the best filter is still
the human moderator...
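
A minimal sketch of the kind of normalization this implies, assuming a small look-alike table and a lowercase block list. The function names normalize_for_matching() and looks_suspicious() and the substitution table are made up for illustration. It catches "b a d w o r d" and "b@d word" but not "b*dword", and because it matches substrings it will also flag innocent words, so it is better suited to flagging entries for a moderator than to rejecting them.

```php
<?php
// Hypothetical helper: collapse common obfuscations before comparing
// against a block list. The substitution table is illustrative only.
function normalize_for_matching($text)
{
    $text = strtolower($text);
    // Map a few look-alike characters to letters (assumed mapping).
    $text = strtr($text, array('@' => 'a', '0' => 'o', '1' => 'i', '3' => 'e', '$' => 's'));
    // Drop everything that is not a lowercase letter, so "b a d w o r d"
    // and "b-a-d-word" both collapse to "badword".
    return preg_replace('/[^a-z]/', '', $text);
}

// Hypothetical helper: true if any blocked word (lowercase) appears
// anywhere in the normalized text.
function looks_suspicious($text, $blocked)
{
    $normalized = normalize_for_matching($text);
    foreach ($blocked as $word) {
        if (strpos($normalized, $word) !== false) {
            return true;
        }
    }
    return false;
}
```
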
JW
Feb 22 '06 #2
On Wed, 22 Feb 2006 19:36:41 GMT, "Jim Carlock" <an*******@127.0.0.1>
wrote:
Looking for suggestions on how to handle bad words that might
get passed in through $_GET['item'] variables.

My first thoughts included using str_replace() to strip out such
content, but then one ends up looking for characters that wrap
around the stripped characters and it ends up as a recursive
ordeal that fails to identify a poorly constructed $_GET['item']
variable (when someone hand-types the item into the line and
makes a simple typing error).

So the next thoughts involved employing a list of good words
and if any word in the $_GET['item'] list doesn't fall into the
list of good words, then an empty string gets returned.

Any suggestions on how to handle this?


Automatic removal is just about impossible to do reliably. (People
living in places such as Sussex and Scunthorpe have complained that
their addresses get rejected by some sites.) If at all possible use a
matching routine to detect doubtful entries and place them on one side
for subsequent manual review.
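
One way to sketch that "detect and set aside" idea, assuming a caller-supplied block list. The function name classify_entry() and the return values 'ok' and 'review' are invented for this example. Matching on word boundaries avoids flagging a blocked word that only appears inside a longer word, which is the Sussex case, and nothing is rejected outright.

```php
<?php
// Hypothetical helper: returns 'ok' or 'review'; it never rejects outright.
function classify_entry($text, $blocked)
{
    foreach ($blocked as $word) {
        // \b anchors the match to word boundaries, so a blocked word
        // embedded in a longer word (a place name, say) is not flagged.
        $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
        if (preg_match($pattern, $text) === 1) {
            return 'review';
        }
    }
    return 'ok';
}
```

Entries that come back as 'review' would then go to whatever queue a human moderator checks.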

--
Stephen Poley

http://www.xs4all.nl/~sbpoley/webmatters/
Feb 22 '06 #3
Jim Carlock wrote:
Looking for suggestions on how to handle bad words that might
get passed in through $_GET['item'] variables.

My first thoughts included using str_replace() to strip out such
content, but then one ends up looking for characters that wrap
around the stripped characters and it ends up as a recursive
ordeal that fails to identify a poorly constructed $_GET['item']
variable (when someone hand-types the item into the line and
makes a simple typing error).

So the next thoughts involved employing a list of good words
and if any word in the $_GET['item'] list doesn't fall into the
list of good words, then an empty string gets returned.

Any suggestions on how to handle this?

Thanks,

Jim Carlock


Jim, Not knowing your requirements or what the website will be used for makes it
a little difficult to give you a solution. Would a drop-down list of acceptable
words be better than expecting the user to type them correctly?

That being said, if you type as badly as I do, you have probably made all of teh
tpying errors most commonly seen. Including a str_replace() for all of those
examples would not be that difficult - better yet include it into a javascript
and let the client-side handle the word-corrections (onclick or onsubmit).

I have worked with several products (OS and database) that will auto-correct
some commands, like eixt = EXIT or comit = COMMIT. The Digital TOPS10/20 OS
that ran on the KL10/20 systems (36-bit, circa mid-'70s to early '80s) would
prompt you for a yes/no:
"did you mean [whatever the correct spelling of the command is]". Pretty cool
for its day...
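
A "did you mean" prompt along those lines could be sketched in PHP with the built-in levenshtein() function. The function name suggest_command() and the default threshold of 2 edits are assumptions made for this example, not anything TOPS did.

```php
<?php
// Hypothetical helper: returns the closest known command, or null when
// nothing is within $maxDistance edits. The threshold is an assumption.
function suggest_command($input, $commands, $maxDistance = 2)
{
    $best = null;
    $bestDistance = $maxDistance + 1;
    foreach ($commands as $command) {
        // levenshtein() counts single-character insertions, deletions
        // and substitutions; a swapped pair of letters costs 2.
        $distance = levenshtein(strtolower($input), strtolower($command));
        if ($distance < $bestDistance) {
            $best = $command;
            $bestDistance = $distance;
        }
    }
    return $best;
}
```

With the two commands mentioned above, suggest_command('eixt', array('EXIT', 'COMMIT')) picks EXIT and 'comit' picks COMMIT.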

--
Michael Austin.
DBA Consultant
Donations welcomed. Http://www.firstdbasource.com/donations.html
:)
Feb 22 '06 #4
Jim Carlock wrote:
So the next thoughts involved employing a list of good words
and if any word in the $_GET['item'] list doesn't fall into the
list of good words, then an empty string gets returned.

Any suggestions on how to handle this?
"Michael Austin" replied:
Jim, Not knowing your requirements or what the website will be
used for makes it a little difficult to give you a solution. Would
a drop-down list of acceptable words be better than expecting
the user to type them correctly?
Well, a drop-down list will work for some things, but anyone
can edit the line of text in the address bar. So instead of
filtering for bad words, I'm looking for suggestions on how to
parse through a list of good words (stored inside an array); if
any of the words in the address bar fail to match the words in
the array, the individual gets routed to a bad-word page (the
website homepage). I see a database as a very useful option but
I'm working with PHP arrays at the moment. The database will be
the future, but for the moment, I think an array of 200 possible
words might work very well.

Just need an effective way to compare a word to a list of words
inside an array and return true if it matches, false if it fails the
match.

My thoughts include:

function IsValidWord($sCheckThis) {
    global $aWords;
    foreach($aWords as $sWord) {
        if ($sWord === $sCheckThis) {
            return(TRUE);
        }
    }
    return(FALSE);
}

So I'm looking for any other suggestions.
That being said, if you type as badly as I do, you have probably
made all of teh tpying errors most commonly seen. Including a
str_replace() for all of those examples would not be that difficult
- better yet include it into a javascript and let the client-side
handle the word-corrections (onclick or onsubmit).


The list of words is to remain on the server, so JavaScript in this
case, seems to be an invalid option. Any mistyped words are to
route the client to the homepage, or perhaps present the page in
question with no selections selected. Either/or seems appropriate
in this case.
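
A rough sketch of that routing rule, assuming the allow list holds lowercase words and the item arrives as space-separated words. The function name all_words_allowed() and the "/" homepage URL in the usage comment are placeholders.

```php
<?php
// Hypothetical helper: true only if every space-separated word in $item
// is present in the allow list ($allowed holds lowercase words).
function all_words_allowed($item, $allowed)
{
    // $_GET values can be arrays (item[]=x), so refuse anything else.
    if (!is_string($item)) {
        return false;
    }
    $words = preg_split('/\s+/', strtolower(trim($item)), -1, PREG_SPLIT_NO_EMPTY);
    if (count($words) === 0) {
        return false;
    }
    foreach ($words as $word) {
        // Third argument true = strict comparison, as in IsValidWord().
        if (!in_array($word, $allowed, true)) {
            return false;
        }
    }
    return true;
}

// Usage sketch ("/" stands in for the homepage):
// $item = isset($_GET['item']) ? $_GET['item'] : '';
// if (!all_words_allowed($item, $aWords)) {
//     header('Location: /');
//     exit;
// }
```
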

<snip>...</snip>

Jim Carlock
Post replies to the group.
Feb 22 '06 #5
The function you need is in_array() although an associative array would
be more efficient. E.g.

$good_hash = array(
    'good' => true,
    'better' => true,
    'best' => true,
    ...
);

if(!array_key_exists(strtolower($word), $good_hash)) {
    ...
}
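
If the words already live in a plain list, the hash can be built from it rather than typed out by hand. This is only a sketch: $good_list and is_good_word() are made-up names, and array_fill_keys() needs PHP 5.2 or later.

```php
<?php
$good_list = array('good', 'better', 'best');   // example words only

// array_fill_keys() produces array('good' => true, 'better' => true, ...)
$good_hash = array_fill_keys($good_list, true);

// Hypothetical helper: case-insensitive membership test against the hash.
function is_good_word($word, $good_hash)
{
    // A key lookup does not scan the array the way in_array() does.
    return array_key_exists(strtolower($word), $good_hash);
}
```
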

Feb 23 '06 #6
On 23 Feb 2006 00:29:48 GMT,
"Chung Leong" <ch***********@hotmail.com> posted:
The function you need is in_array() although an associative array
would be more efficient. E.g.


$good_hash = array(
    'good' => true,
    'better' => true,
    'best' => true,
    ...
);

if(!array_key_exists(strtolower($word), $good_hash)) {
    ...
}

Thanks, Chung. It seems like it's best to store everything inside the
array as lowercase and then fill in some appropriate variables for.

I initially started out with mixed-case arrays. For example:

// array of states
function Create_USA_States_Array() {
    $aStates = array(
        // http://www.usps.com/ncsc/lookups/usp...eviations.html
        array("Alabama", "AL"),
        array("Alaska", "AK"),
        array("Arizona", "AZ"),
        array("Arkansas", "AR"),
        array("California", "CA"),
        array("Colorado", "CO"),
        array("Connecticut", "CT"),
        array("Delaware", "DE"),
        array("Florida", "FL"),
        array("Georgia", "GA"),
        array("Hawaii", "HI"),
        array("Idaho", "ID"),
        array("Illinois", "IL"),
        array("Indiana", "IN"),
        array("Iowa", "IA"),
        array("Kansas", "KS"),
        array("Kentucky", "KY"),
        array("Louisiana", "LA"),
        array("Maine", "ME"),
        array("Maryland", "MD"),
        array("Massachusetts", "MA"),
        array("Michigan", "MI"),
        array("Minnesota", "MN"),
        array("Mississippi", "MS"),
        array("Missouri", "MO"),
        array("Montana", "MT"),
        array("Nebraska", "NE"),
        array("Nevada", "NV"),
        array("New Hampshire", "NH"),
        array("New Jersey", "NJ"),
        array("New Mexico", "NM"),
        array("New York", "NY"),
        array("North Carolina", "NC"),
        array("North Dakota", "ND"),
        array("Ohio", "OH"),
        array("Oklahoma", "OK"),
        array("Oregon", "OR"),
        array("Pennsylvania", "PA"),
        array("Rhode Island", "RI"),
        array("South Carolina", "SC"),
        array("South Dakota", "SD"),
        array("Tennessee", "TN"),
        array("Texas", "TX"),
        array("Utah", "UT"),
        array("Vermont", "VT"),
        array("Virginia", "VA"),
        array("Washington", "WA"),
        array("Washington, D.C.", "DC"),
        array("West Virginia", "WV"),
        array("Wisconsin", "WI"),
        array("Wyoming", "WY"));
    return($aStates);
}

The function established to return a state name works as follows:

// this function is incomplete
// PURPOSE: RETURN statename from parameter passed in
// INPUT: City-State String, OPTIONAL default string
// RETURNS: empty string if invalid parameter requested
// $sDS represents default state name to return
// $sCS = $_GET['citystate'];
// "Charlotte NC" or "Charlotte North Carolina" or "Charlotte" or
// "usertyped garbage"
function GetStateNameFromCityState($sCS, $sDS = "") {
    $sStateAbbr = trim($sCS);
    $iLen = strlen($sStateAbbr);
    // first check to see if the string is too short to hold a state
    if ($iLen < 2) { return($sDS); }
    if (GetStateFromAbbr($sStateAbbr)) {
        // a valid abbreviation was passed in
        return(GetStateFromAbbr($sStateAbbr));
    }
    $aStates = Create_USA_States_Array();
    // possible state name in parameter so check for a state name,
    // before checking against abbreviations
    // NOTE: this is a substring match in array order, so
    // "Charleston West Virginia" matches "Virginia" before it
    // reaches "West Virginia"
    foreach ($aStates as $aState) {
        // state name: $aState[0]
        if (stristr($sStateAbbr, $aState[0]) !== FALSE) {
            // return state name
            return($aState[0]);
        }
    }
    // no valid statename found, so start abbreviation checks
    // first determine if there's an abbreviation present
    // explode(separator, string to separate)
    $aWords = explode(" ", $sStateAbbr);
    $yAbbrFound = FALSE;
    // check for abbreviations
    foreach ($aWords as $sWord) {
        if (strlen($sWord) == 2) {
            // assume a 2-letter word represents a state abbreviation
            $sStateAbbr = $sWord;
            $yAbbrFound = TRUE;
            break;
        }
    }
    if (!$yAbbrFound) {
        // no abbreviation to check, so return the default state
        return($sDS);
    }
    // now validate abbreviation found
    // COULD this fail? NEEDS MORE TESTING.
    foreach ($aStates as $aState) {
        // now check against abbreviations: $aState[1]
        if (stristr($sStateAbbr, $aState[1]) !== FALSE) {
            // return state name in proper formatting
            return($aState[0]);
        }
    }
    // return default state (empty string) when it all fails
    return($sDS);
}

Haven't fully tested the user-typed garbage being passed in, but
my question specifically involves configuring the state array, and
alternative suggestions for this.

Note, that the above function actually returns what's found inside
the predefined array, rather than what's found in the address-bar.
This in effect, should get me words proper for HTML presentation,
where I don't have to mess with capitalizing ALL state abbrev's,
or capitalizing the first word of anything.

I still need to test the code above some more, so if anyone happens
to catch a flaw please point it out.

And again back to the question in the topic... "Lowercase Versus
Mixed-case" words inside the array that holds the states and state
abbreviations. Anyone here that knows of a better way to do this?
Another array might get created, as the list of targeted cities is over
100 right at the moment. To possibly identify each city to a proper
state.

I plan on getting something going whereby a new array appears as
follows:

"city name", iStateNumber

"state number" represents an integer 0 to 50 (51 entries: the 50
states plus Washington, D.C.).
Duplicate "city name"'s could exist, so the database, combines
the "city name" and the "state number" into an index. The "state
number" ends up being a pointer to the StateID in the State
database. So continuing along the lines of the indexed arrays,
as presented by Chung Leong, how would I go about indexing
such an array as above and would indexing be appropriate for
such?

Thanks, Chung Leong. I did put the indexed array into play in
another function where the number of items is greater. I didn't
know how to work it into this particular array (or an array with
multiple fields with duplicate records).
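
One possible shape for such an index, sketched under the assumption that each city record is a (name, state number) pair as described above. The function names build_city_index() and build_city_to_states() and the "|" separator are invented for this example. The first index uses a composite key so duplicate city names stay distinct; the second maps a city name to every state number it appears under.

```php
<?php
// Hypothetical helper: index keyed on "lowercase city|state number",
// with the properly cased city name as the value.
function build_city_index($aCities)
{
    $index = array();
    foreach ($aCities as $aCity) {
        list($sCity, $iState) = $aCity;
        $index[strtolower($sCity) . '|' . $iState] = $sCity;
    }
    return $index;
}

// Hypothetical helper: lowercase city name => list of state numbers,
// so a duplicated city name yields more than one candidate state.
function build_city_to_states($aCities)
{
    $map = array();
    foreach ($aCities as $aCity) {
        list($sCity, $iState) = $aCity;
        $map[strtolower($sCity)][] = $iState;
    }
    return $map;
}

// Illustrative records; the numbers are positions in the states array.
$aCities = array(
    array('Charlotte', 32),     // North Carolina
    array('Springfield', 12),   // Illinois
    array('Springfield', 24),   // Missouri
);
```
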

Jim Carlock
Post replies to the group.
Feb 24 '06 #7
Jim Carlock wrote:
And again back to the question in the topic... "Lowercase Versus
Mixed-case" words inside the array that holds the states and state
abbreviations. Anyone here that knows of a better way to do this?
Another array might get created, as the list of targeted cities is over
100 right at the moment. To possibly identify each city to a proper
state.


Just have the static array be in mixed case, then generate the other
one(s) programmatically:

$states = array(
"AL" => "Alabama",
...
"WY" => "Wyoming"
);

$state_hash = array_flip(array_map('strtolower', $states));
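
A short sketch of how the two arrays might then be used together, with a cut-down $states list for illustration. The function name proper_state_name() is made up. It returns the mixed-case name from the static array whether the visitor typed a full name or an abbreviation, and an empty string otherwise.

```php
<?php
$states = array(
    'AL' => 'Alabama',
    'NC' => 'North Carolina',
    'WY' => 'Wyoming',
);  // shortened list for illustration

// array_map() keeps the keys, array_flip() swaps them with the values:
// array('alabama' => 'AL', 'north carolina' => 'NC', 'wyoming' => 'WY')
$state_hash = array_flip(array_map('strtolower', $states));

// Hypothetical helper: accepts a state name or abbreviation in any case.
function proper_state_name($input, $states, $state_hash)
{
    $key = strtolower(trim($input));
    if (isset($state_hash[$key])) {
        return $states[$state_hash[$key]];   // matched a full name
    }
    $abbr = strtoupper($key);
    if (isset($states[$abbr])) {
        return $states[$abbr];               // matched an abbreviation
    }
    return '';
}
```
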

Feb 25 '06 #8