Finding Banned Words On A Page And Not Within Other Words!

I am trying to add a banned-words filter to a web proxy.
I want to find banned words anywhere in a loaded page (meta tags, content), but only as whole words, not when they appear inside other words.

And so, if I am looking for the word "cock", then the word "cockerel" should not trigger the filter.

I just tested the code below and, as expected, it works, but it burns a lot of CPU cycling through every word. One moment the page loads; the next it goes grey and shows signs of taking too long to load. And all this on localhost. I can imagine what my web host would do!
So we will have to come up with a better solution. Any ideas?
How about not checking the loaded page for every banned word? How about halting the script as soon as one banned word is found, after echoing which banned word was found and where on the page (meta tags, body content, etc.)?
Any code suggestions?
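One possible shape for the "halt at the first hit and say where" idea is sketched below. It is only a sketch under assumptions: the function names `find_first_banned` and `page_sections` are made up for illustration, the page is split into sections with PHP's DOMDocument extension, and the sample HTML is invented rather than fetched.

```php
<?php
declare(strict_types=1);

/**
 * Scan named sections of a page and stop at the first banned whole word.
 * Returns ['word' => ..., 'section' => ..., 'offset' => ...] or null.
 * Function and section names are illustrative assumptions.
 */
function find_first_banned(array $sections, array $bannedWords): ?array
{
    // One alternation, so each section is scanned once rather than once per word.
    $quoted  = array_map(fn(string $w): string => preg_quote($w, '/'), $bannedWords);
    $pattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';

    foreach ($sections as $name => $text) {
        if (preg_match($pattern, $text, $m, PREG_OFFSET_CAPTURE) === 1) {
            // Return immediately: later sections are never scanned.
            return ['word' => $m[0][0], 'section' => $name, 'offset' => $m[0][1]];
        }
    }
    return null;
}

/** Split an HTML string into meta-tag text and body text using DOMDocument. */
function page_sections(string $html): array
{
    $doc = new DOMDocument();
    // Suppress warnings from imperfect real-world markup.
    @$doc->loadHTML($html);

    $meta = [];
    foreach ($doc->getElementsByTagName('meta') as $tag) {
        $meta[] = $tag->getAttribute('content');
    }
    $body = $doc->getElementsByTagName('body')->item(0);

    return [
        'meta tags'    => implode(' ', $meta),
        'body content' => $body !== null ? $body->textContent : '',
    ];
}

// Invented sample page: "cockerel" in the meta tag must not trigger the filter.
$html = '<html><head><meta name="description" content="All about the cockerel">'
      . '</head><body><p>What a prick he was.</p></body></html>';

$hit = find_first_banned(page_sections($html), ['cock', 'prick']);
if ($hit !== null) {
    echo "Banned word '{$hit['word']}' found in {$hit['section']}\n";
}
```

With the sample page above this should report the hit in the body content and ignore "cockerel" in the meta tag. In the proxy, the string returned by `curl_exec()` would take the place of `$html`.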

Here is what I got so far:

    <?php

    /*
    ERROR HANDLING
    */

    // 1). $curl holds the cURL handle.
    $curl = curl_init();

    // 2). Set cURL options.
    curl_setopt($curl, CURLOPT_URL, 'https://www.buzzfeed.com/mjs538/the-68-words-you-cant-say-on-tv?utm_term=.xlN0R1Go89#.pbdl8dYm3X');
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

    // 3). Run cURL (execute HTTP request).
    $result = curl_exec($curl);
    $response = curl_getinfo($curl);

    if ($response['http_code'] == '200') {
        // Set banned words.
        $banned_words = array("Prick", "Dick", "***");

        // Split the fetched page into words on single spaces.
        $word = explode(" ", $result);

        //var_dump($word);

        // Use < rather than <= so the loop does not read past the last index.
        for ($i = 0; $i < count($word); $i++) {
            foreach ($banned_words as $ban) {
                // Case-insensitive comparison of each page word with each banned word.
                if (strtolower($word[$i]) == strtolower($ban)) {
                    echo "word: $word[$i]<br />";
                    echo "Match: $ban<br>";
                } else {
                    echo "word: $word[$i]<br />";
                    echo "No Match: $ban<br>";
                }
            }
        }
    }

    // 4). Close cURL resource.
    curl_close($curl);

I am told to do it like this:

**Load the page into a string.
Use preg_match with "word boundaries" on the loaded string and loop through your banned words.**
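
A minimal sketch of that suggestion, assuming a hypothetical helper named `first_banned_word` and two made-up test sentences (neither is from the advice I was given):

```php
<?php
declare(strict_types=1);

/**
 * Return the first banned word that occurs as a whole word in $text,
 * or null if none does. Hypothetical helper, not from the original advice.
 */
function first_banned_word(string $text, array $bannedWords): ?string
{
    foreach ($bannedWords as $word) {
        // preg_quote() escapes regex metacharacters in the word;
        // \b anchors the match at word boundaries; i = case-insensitive.
        $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
        if (preg_match($pattern, $text) === 1) {
            return $word; // stop at the first hit
        }
    }
    return null;
}

var_dump(first_banned_word('The cockerel crowed at dawn.', ['cock'])); // NULL
var_dump(first_banned_word('A cock crowed at dawn.', ['cock']));       // string(4) "cock"
```

Note that `\b` only marks a boundary next to a letter, digit or underscore, so an entry made purely of punctuation would not match as a standalone token this way.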

I did as suggested but now see a completely blank white page:

    <?php

    /*
    ERROR HANDLING
    */
    declare(strict_types=1);
    ini_set('display_errors', '1');
    ini_set('display_startup_errors', '1');
    error_reporting(E_ALL);
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);


    // 1). Set banned words.
    $banned_words = array("Prick", "Dick", "***");

    // 2). $curl holds the cURL handle.
    $curl = curl_init();

    // 3). Set cURL options.
    curl_setopt($curl, CURLOPT_URL, 'https://www.buzzfeed.com/mjs538/the-68-words-you-cant-say-on-tv?utm_term=.xlN0R1Go89#.pbdl8dYm3X');
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

    // 4). Run cURL (execute HTTP request).
    $result = curl_exec($curl);
    $response = curl_getinfo($curl);

    if ($response['http_code'] == '200') {
        // Escape each banned word so characters such as * are matched
        // literally instead of being read as regex quantifiers.
        $quoted = array_map(function ($w) { return preg_quote($w, '/'); }, $banned_words);

        // \b(?:word1|word2|...)\b matches whole words only; the final i makes it case-insensitive.
        $regex = '/\b(?:' . implode('|', $quoted) . ')\b/i';
        $substitute = '****';
        $cleanresult = preg_replace($regex, $substitute, $result);
        echo $cleanresult;
    }

    curl_close($curl);

    ?>

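I do not know yet which step leaves the page blank. One way to narrow it down might be to check each step in turn: whether `curl_exec()` returned `false`, whether the status was something other than 200 (for example a redirect, since `CURLOPT_FOLLOWLOCATION` is not set), and whether `preg_replace()` returned `null`. The helper name `diagnose_fetch` and the sample inputs below are assumptions for illustration only, and the `string|false` parameter type needs PHP 8.0 or later.

```php
<?php
declare(strict_types=1);

/**
 * Explain why the filter step might print nothing.
 * $body is what curl_exec() returned, $httpCode comes from curl_getinfo(),
 * $curlError from curl_error(). Hypothetical helper for illustration.
 */
function diagnose_fetch(string|false $body, int $httpCode, string $curlError, string $regex): string
{
    if ($body === false) {
        return "cURL failed: $curlError";
    }
    if ($httpCode !== 200) {
        // e.g. 301/302 when CURLOPT_FOLLOWLOCATION is not enabled
        return "Unexpected HTTP status: $httpCode";
    }
    // @ hides the warning; the null return value is checked instead.
    $clean = @preg_replace($regex, '****', $body);
    if ($clean === null) {
        return 'Regex error (preg_last_error code ' . preg_last_error() . ')';
    }
    return 'OK';
}

// Made-up inputs: a redirect status, then an unescaped *** in the pattern.
echo diagnose_fetch('', 301, '', '/\bfoo\b/i'), "\n";
echo diagnose_fetch('<p>hi</p>', 200, '', '/\b***\b/i'), "\n";
```

In the script above it could be called as `diagnose_fetch($result, (int) $response['http_code'], curl_error($curl), $regex)`, before `curl_close($curl)`.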
Oct 4 '17 #1