Removing Bad Words

Looking for suggestions on how to handle bad words that might
get passed in through $_GET['item'] variables.

My first thoughts included using str_replace() to strip out such
content, but then one ends up looking for characters that wrap
around the stripped characters and it ends up as a recursive
ordeal that fails to identify a poorly constructed $_GET['item']
variable (when someone hand-types the item into the line and
makes a simple typing error).

So the next thoughts involved employing a list of good words
and if any word in the $_GET['item'] list doesn't fall into the
list of good words, then an empty string gets returned.

Any suggestions on how to handle this?

Thanks,

Jim Carlock

Feb 22 '06 #1
Jim Carlock wrote:
Any suggestions on how to handle this?


You will have to implement "fuzzy logic" that is able to filter not
only "badword" but also "b a d w o r d", "b@d word", "b*dword", et cetera.

Although you should be able to catch some of those, the best filter is still
the human moderator...
JW
Feb 22 '06 #2
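
A minimal sketch of the kind of fuzzy matching JW describes, assuming PHP 7 or later. The function names and the look-alike table are hypothetical and only illustrate the idea; a real filter would need a much larger table and would still miss cases:

```php
<?php
// Build a case-insensitive regex for one blocked word that tolerates
// separators, look-alike characters, and "*" wildcards.
function build_fuzzy_pattern(string $word): string
{
    // Hypothetical look-alike table; extend it to suit the site.
    $lookalikes = ['a' => '@4', 'e' => '3', 'i' => '1!', 'o' => '0', 's' => '$5'];
    $parts = [];
    foreach (str_split(strtolower($word)) as $char) {
        // Accept the letter itself, its look-alikes, or a "*" wildcard.
        $class = $char . ($lookalikes[$char] ?? '') . '*';
        $parts[] = '[' . preg_quote($class, '/') . ']';
    }
    // Allow any run of non-alphanumeric characters between letters.
    return '/' . implode('[\W_]*', $parts) . '/i';
}

// Return true if any blocked word appears in $text in a disguised form.
function contains_blocked_word(string $text, array $blocked): bool
{
    foreach ($blocked as $word) {
        if (preg_match(build_fuzzy_pattern($word), $text) === 1) {
            return true;
        }
    }
    return false;
}
```

Because the pattern ignores spaces between letters, it will also match across word boundaries, so it trades missed words for more false positives.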
On Wed, 22 Feb 2006 19:36:41 GMT, "Jim Carlock" <an*******@127.0.0.1>
wrote:
Looking for suggestions on how to handle bad words that might
get passed in through $_GET['item'] variables.

My first thoughts included using str_replace() to strip out such
content, but then one ends up looking for characters that wrap
around the stripped characters and it ends up as a recursive
ordeal that fails to identify a poorly constructed $_GET['item']
variable (when someone hand-types the item into the line and
makes a simple typing error).

So the next thoughts involved employing a list of good words
and if any word in the $_GET['item'] list doesn't fall into the
list of good words, then an empty string gets returned.

Any suggestions on how to handle this?


Automatic removal is just about impossible to do reliably. (People
living in places such as Sussex and Scunthorpe have complained that
their addresses get rejected by some sites.) If at all possible use a
matching routine to detect doubtful entries and place them on one side
for subsequent manual review.

--
Stephen Poley

http://www.xs4all.nl/~sbpoley/webmatters/
Feb 22 '06 #3
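
One way to sketch the "detect and set aside for review" approach Stephen suggests, assuming PHP 7 or later. The function name, the status strings, and the sample entries are hypothetical. Whole-word matching with \b is used here so that a place name such as Sussex is not flagged:

```php
<?php
// Classify an entry as 'review' (needs a human) or 'accept'.
function review_status(string $entry, array $blocked): string
{
    foreach ($blocked as $word) {
        // \b limits the match to whole words, so "Sussex" does not trip on "sex".
        if (preg_match('/\b' . preg_quote($word, '/') . '\b/i', $entry) === 1) {
            return 'review';
        }
    }
    return 'accept';
}

// Entries set aside for a human moderator instead of being rejected outright.
$queue = [];
foreach (['12 High Street, Sussex', 'sex for sale'] as $entry) {
    if (review_status($entry, ['sex']) === 'review') {
        $queue[] = $entry;
    }
}
```

Whole-word matching avoids the Sussex and Scunthorpe false positives but does nothing about disguised spellings, which is one more reason to keep a person in the loop.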
Jim Carlock wrote:
Looking for suggestions on how to handle bad words that might
get passed in through $_GET['item'] variables.

My first thoughts included using str_replace() to strip out such
content, but then one ends up looking for characters that wrap
around the stripped characters and it ends up as a recursive
ordeal that fails to identify a poorly constructed $_GET['item']
variable (when someone hand-types the item into the line and
makes a simple typing error).

So the next thoughts involved employing a list of good words
and if any word in the $_GET['item'] list doesn't fall into the
list of good words, then an empty string gets returned.

Any suggestions on how to handle this?

Thanks,

Jim Carlock


Jim, not knowing your requirements or what the website will be used for makes it
a little difficult to give you a solution. Would a drop-down list of acceptable
words be better than expecting the user to type them correctly?

That being said, if you type as badly as I do, you have probably made all of teh
tpying errors most commonly seen. Including a str_replace() for all of those
examples would not be that difficult - better yet include it into a javascript
and let the client-side handle the word-corrections (onclick or onsubmit).

I have worked with several products (OS and database) that will auto-correct
some commands like: eixt = EXIT or comit=COMMIT etc... Digital TOPS10/20 OS
that ran on the KL10/20 systems (36bit - circa mid 70's early 80's) would prompt
you for a yes/no to:
did you mean [whatever the correct spelling of the command is] Pretty cool for
its day...

--
Michael Austin.
DBA Consultant
Donations welcomed. Http://www.firstdbasource.com/donations.html
:)
Feb 22 '06 #4
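
The "did you mean" behaviour Michael describes can be approximated with PHP's built-in levenshtein() function. This is only a sketch, assuming PHP 7.1 or later; suggest_word() and its $maxDistance parameter are hypothetical names, not anything from TOPS-10/20:

```php
<?php
// Return the known word closest to $input, or null if nothing is
// within $maxDistance single-character edits.
function suggest_word(string $input, array $known, int $maxDistance = 2): ?string
{
    $best = null;
    $bestDistance = $maxDistance + 1;
    foreach ($known as $candidate) {
        // Compare case-insensitively; keep the candidate's original casing.
        $distance = levenshtein(strtolower($input), strtolower($candidate));
        if ($distance < $bestDistance) {
            $best = $candidate;
            $bestDistance = $distance;
        }
    }
    return $best;
}
```

A caller could show the suggestion back to the user for a yes/no confirmation rather than silently substituting it.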
Jim Carlock wrote:
So the next thoughts involved employing a list of good words
and if any word in the $_GET['item'] list doesn't fall into the
list of good words, then an empty string gets returned.

Any suggestions on how to handle this?
"Michael Austin" replied:
Jim, not knowing your requirements or what the website will be
used for makes it a little difficult to give you a solution. Would
a drop-down list of acceptable words be better than expecting
the user to type them correctly?
Well a drop down list will go into the making for some things, but
anyone can edit the line of text in the address-bar. And so instead
of filtering for bad words, I'm looking for suggestions on how to
parse through a list of good words (stored inside an array) and if
any of the words in the address bar fail to match any of the
words in the array, the individual gets routed to a
bad-word page (the website homepage). I see a database as a
very useful option but I'm working with PHP arrays at the
moment. The database will be the future, but for the moment, I
think an array of 200 possible words might work very well.

Just need an effective way to compare a word to a list of words
inside an array and return true if it matches, false if it fails the
match.

My thoughts include:

function IsValidWord($sCheckThis) {
    global $aWords;
    foreach ($aWords as $sWord) {
        if ($sWord === $sCheckThis) {
            return TRUE;
        }
    }
    return FALSE;
}

So I'm looking for any other suggestions.
That being said, if you type as badly as I do, you have probably
made all of teh tpying errors most commonly seen. Including a
str_replace() for all of those examples would not be that difficult
- better yet include it into a javascript and let the client-side
handle the word-corrections (onclick or onsubmit).


The list of words is to remain on the server, so JavaScript in this
case, seems to be an invalid option. Any mistyped words are to
route the client to the homepage, or perhaps present the page in
question with no selections selected. Either/or seems appropriate
in this case.

<snip>...</snip>

Jim Carlock
Post replies to the group.
Feb 22 '06 #5
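
A rough sketch of the whole-request check Jim describes, assuming PHP 7 or later and a lowercase list of good words. The function name all_words_valid() is hypothetical, and the redirect is shown only as a comment so the sketch has no side effects:

```php
<?php
// Return true only if every whitespace-separated word in $item
// appears in the list of good words (stored in lowercase).
function all_words_valid(string $item, array $goodWords): bool
{
    $words = preg_split('/\s+/', trim($item), -1, PREG_SPLIT_NO_EMPTY);
    if (count($words) === 0) {
        return false; // an empty request is treated as invalid
    }
    foreach ($words as $word) {
        // Strict comparison avoids PHP's loose type juggling.
        if (!in_array(strtolower($word), $goodWords, true)) {
            return false;
        }
    }
    return true;
}

// Possible use in a page script:
// if (!all_words_valid($_GET['item'] ?? '', $aWords)) {
//     header('Location: /');   // route to the homepage
//     exit;
// }
```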
The function you need is in_array() although an associative array would
be more efficient. E.g.

$good_hash = array(
    'good' => true,
    'better' => true,
    'best' => true,
    ...
);

if (!array_key_exists(strtolower($word), $good_hash)) {
    ...
}

Feb 23 '06 #6
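
To illustrate the associative-array suggestion, here is a small sketch, assuming PHP 7 or later, that builds the hash from a plain word list with array_fill_keys() and looks words up by key. The names $goodWords and is_good_word() are hypothetical:

```php
<?php
// A plain list of good words, stored in lowercase.
$goodWords = ['good', 'better', 'best'];

// Turn the list into a hash: each word becomes a key mapped to true.
$goodHash = array_fill_keys($goodWords, true);

// Key lookup does not scan the array the way in_array() does.
function is_good_word(string $word, array $goodHash): bool
{
    return isset($goodHash[strtolower($word)]);
}
```

For a list of around 200 words the difference is unlikely to be noticeable, so either form should work; the hash mainly pays off as the list grows.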
On 23 Feb 2006 00:29:48 GMT,
"Chung Leong" <ch***********@hotmail.com> posted:
The function you need is in_array() although an associative array
would be more efficient. E.g.


$good_hash = array(
    'good' => true,
    'better' => true,
    'best' => true,
    ...
);

if (!array_key_exists(strtolower($word), $good_hash)) {
    ...
}

Thanks, Chung. It seems like it's best to store everything inside the
array as lowercase and then fill in some appropriate variables for display.

I initially started out with mixed-case arrays. For example:

// array of states
function Create_USA_States_Array() {
    $aStates = array(
        // http://www.usps.com/ncsc/lookups/usp...eviations.html
        array("Alabama", "AL"),
        array("Alaska", "AK"),
        array("Arizona", "AZ"),
        array("Arkansas", "AR"),
        array("California", "CA"),
        array("Colorado", "CO"),
        array("Connecticut", "CT"),
        array("Delaware", "DE"),
        array("Florida", "FL"),
        array("Georgia", "GA"),
        array("Hawaii", "HI"),
        array("Idaho", "ID"),
        array("Illinois", "IL"),
        array("Indiana", "IN"),
        array("Iowa", "IA"),
        array("Kansas", "KS"),
        array("Kentucky", "KY"),
        array("Louisiana", "LA"),
        array("Maine", "ME"),
        array("Maryland", "MD"),
        array("Massachusetts", "MA"),
        array("Michigan", "MI"),
        array("Minnesota", "MN"),
        array("Mississippi", "MS"),
        array("Missouri", "MO"),
        array("Montana", "MT"),
        array("Nebraska", "NE"),
        array("Nevada", "NV"),
        array("New Hampshire", "NH"),
        array("New Jersey", "NJ"),
        array("New Mexico", "NM"),
        array("New York", "NY"),
        array("North Carolina", "NC"),
        array("North Dakota", "ND"),
        array("Ohio", "OH"),
        array("Oklahoma", "OK"),
        array("Oregon", "OR"),
        array("Pennsylvania", "PA"),
        array("Rhode Island", "RI"),
        array("South Carolina", "SC"),
        array("South Dakota", "SD"),
        array("Tennessee", "TN"),
        array("Texas", "TX"),
        array("Utah", "UT"),
        array("Vermont", "VT"),
        array("Virginia", "VA"),
        array("Washington", "WA"),
        array("Washington, D.C.", "DC"),
        array("West Virginia", "WV"),
        array("Wisconsin", "WI"),
        array("Wyoming", "WY"));
    return $aStates;
}

The function established to return a state name works as follows:

// this function is incomplete
// PURPOSE: RETURN statename from parameter passed in
// INPUT: City-State String, OPTIONAL default string
// RETURNS: empty string if invalid parameter requested
// $sDS represents default state name to return
// $sCS = $_GET['citystate'];
// "Charlotte NC" or "Charlotte North Carolina" or "Charlotte" or
// "usertyped garbage"
function GetStateNameFromCityState($sCS, $sDS = "") {
    $sStateAbbr = trim($sCS);
    $iLen = strlen($sStateAbbr);
    // first check to see if empty string
    if ($iLen < 2) { return $sDS; }
    if (GetStateFromAbbr($sStateAbbr)) {
        // a valid abbreviation was passed in
        return GetStateFromAbbr($sStateAbbr);
    }
    $aStates = Create_USA_States_Array();
    // possible state name in parameter so check for a state name,
    // before checking against abbreviations
    foreach ($aStates as $aState) {
        // state name: $aState[0]
        if (stristr($sStateAbbr, $aState[0]) !== FALSE) {
            // return state name
            return $aState[0];
        }
    }
    // no valid statename found, so start abbreviation checks
    // first determine if there's an abbreviation present
    // explode(separator, string to separate)
    $aWords = explode(" ", $sStateAbbr);
    $yAbbrFound = FALSE;
    // check for abbreviations
    foreach ($aWords as $sWord) {
        if (strlen($sWord) == 2) {
            // assume a 2-letter word represents a state abbreviation
            $sStateAbbr = $sWord;
            $yAbbrFound = TRUE;
            break;
        }
    }
    if (!$yAbbrFound) {
        // no abbreviation to check, so return empty string
        return $sDS;
    }
    // now validate abbreviation found
    // COULD this fail? NEEDS MORE TESTING.
    foreach ($aStates as $aState) {
        // now check against abbreviations: $aState[1]
        if (stristr($sStateAbbr, $aState[1]) !== FALSE) {
            // return state name in proper formatting
            return $aState[0];
        }
    }
    // return empty string when it all fails (default state)
    return $sDS;
}

Haven't fully tested the user-typed garbage being passed in, but
my question specifically involves configuring the state array, and
alternative suggestions for this.

Note, that the above function actually returns what's found inside
the predefined array, rather than what's found in the address-bar.
This in effect, should get me words proper for HTML presentation,
where I don't have to mess with capitalizing ALL state abbrev's,
or capitalizing the first word of anything.

I still need to test the code above some more, so if anyone happens
to catch a flaw please point it out.

And again back to the question in the topic... "Lowercase Versus
Mixed-case" words inside the array that holds the states and state
abbreviations. Anyone here that knows of a better way to do this?
Another array might get created, as the list of targeted cities is over
100 at the moment, to identify each city with its proper
state.

I plan on getting something going whereby a new array appears as
follows:

"city name", iStateNumber

"state number" represents an integer 0 to 50 (51 entries, counting
Washington, D.C.). Duplicate city names could exist, so the database combines
the "city name" and the "state number" into an index. The "state
number" ends up being a pointer to the StateID in the State
database. So continuing along the lines of the indexed arrays,
as presented by Chung Leong, how would I go about indexing
such an array as above and would indexing be appropriate for
such?

Thanks, Chung Leong. I did put the indexed array into play in
another function where the number of items is greater. I didn't
know how to work it into this particular array (or an array with
multiple fields with duplicate records).

Jim Carlock
Post replies to the group.
Feb 24 '06 #7
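
One possible shape for the city/state index Jim asks about, sketched under the assumption of PHP 7.1 or later. The sample cities, the state numbers, and the function name build_city_index() are hypothetical; the idea is a nested array keyed first by lowercase city name and then by state number, so duplicate city names in different states do not collide:

```php
<?php
// Hypothetical sample data: each row is "city name", iStateNumber.
$cities = [
    ['Charlotte', 0],
    ['Columbia', 1],
    ['Columbia', 0],
];

// Build lowercase city name => [state number => display name].
function build_city_index(array $cities): array
{
    $index = [];
    foreach ($cities as [$name, $stateId]) {
        $index[strtolower($name)][$stateId] = $name;
    }
    return $index;
}

$index = build_city_index($cities);
// isset($index['columbia'][1]) checks one city/state pair by key,
// and $index['columbia'] lists every state that has a Columbia.
```

The display name is kept as the value so the properly capitalised form can be printed without re-capitalising user input.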
Jim Carlock wrote:
And again back to the question in the topic... "Lowercase Versus
Mixed-case" words inside the array that holds the states and state
abbreviations. Anyone here that knows of a better way to do this?
Another array might get created, as the list of targeted cities is over
100 at the moment, to identify each city with its proper
state.


Just have the static array be in mixed case, then generate the other
one(s) programmatically:

$states = array(
    "AL" => "Alabama",
    ...
    "WY" => "Wyoming"
);

$state_hash = array_flip(array_map('strtolower', $states));

Feb 25 '06 #8
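
A short worked sketch of how the generated array could be used, assuming PHP 7 or later. Only three states are listed to keep it brief, and lookup_state_name() is a hypothetical helper, not part of the reply above:

```php
<?php
// Mixed-case master array: abbreviation => display name.
$states = ['AL' => 'Alabama', 'NC' => 'North Carolina', 'WY' => 'Wyoming'];

// Generated lookup: lowercase name => abbreviation.
$state_hash = array_flip(array_map('strtolower', $states));

// Accept either a state name or an abbreviation in any case and
// return the properly capitalised name, or '' if nothing matches.
function lookup_state_name(string $input, array $states, array $state_hash): string
{
    $key = strtolower(trim($input));
    if (isset($state_hash[$key])) {
        return $states[$state_hash[$key]];
    }
    return $states[strtoupper($key)] ?? '';
}
```

Since the result always comes from the master array, the output is correctly capitalised no matter how the user typed it.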
