Bytes | Software Development & Data Engineering Community

exclude keywords from RSS search

I have a script which parses the BBC RSS feed once an hour and drops
any stories with certain keywords into a database. I would like to ban
certain strings: at the moment the script picks up the word "train",
but also "training", which gives me a lot of false positives. Can
anyone assist? I tried a couple of things, but nothing worked.

Script is as follows:

<?php

$keywords = array("keyword1", "keyword2", "keyword3");
$bannedwords = array("bannedword1", "bannedword2", "bannedword3");
$feedsource = "http://news.bbc.co.uk/rss/newsonline_uk_edition/uk/rss091.xml";

$db = mysql_connect("localhost", "username", "password") or die(mysql_error());
mysql_select_db("database", $db) or die(mysql_error());

// Parser state shared by the handler functions below
$insideitem = FALSE;
$tag = "";
$title = "";
$description = "";
$textdump = "";
$link = "";
$itemcount = FALSE;
$body1 = "";

// Opening tag: note when we enter an <ITEM> and remember which child tag is open
function startElement($parser, $name, $attrs) {
    global $insideitem, $tag;
    if ($insideitem) {
        $tag = $name;
    } elseif ($name == "ITEM") {
        $insideitem = TRUE;
    }
}

// Closing tag: at </ITEM>, store the story if it matches a keyword
// and its link is not already in the database
function endElement($parser, $name) {
    global $insideitem, $tag, $title, $description, $link, $keywords, $itemcount;
    $numkeywords = count($keywords);

    if ($name == "ITEM") {
        for ($counter1 = 0; $counter1 < $numkeywords; $counter1++) {
            // Case-insensitive substring test on both title and description
            if (stristr($title, $keywords[$counter1]) ||
                stristr($description, $keywords[$counter1])) {
                $safelink = mysql_real_escape_string(trim($link));
                // Look up just this link instead of scanning the whole table
                $sql = "SELECT 1 FROM tblRSSfeed WHERE txtLink = '$safelink'";
                $result = mysql_query($sql) or die(mysql_error());
                if (mysql_num_rows($result) == 0) {
                    $itemcount = TRUE;
                    $datetime = date("Y-m-d H:i:s");
                    // mysql_real_escape_string() also escapes backslashes and
                    // other special characters, unlike str_replace("'", "\'", ...)
                    $safetitle = mysql_real_escape_string(trim($title));
                    $safedescription = mysql_real_escape_string(trim($description));
                    $sql = "INSERT INTO tblRSSfeed VALUES(NULL, '$safetitle', "
                         . "'$safedescription', '$safelink', '$datetime')";
                    mysql_query($sql) or die(mysql_error());
                }
                // One matching keyword is enough; don't re-check this item
                break;
            }
        }
        // Reset state for the next <ITEM>
        $title = "";
        $description = "";
        $link = "";
        $tag = "";
        $insideitem = FALSE;
    }
}

// Text content: append it to whichever item field is currently open
function characterData($parser, $data) {
    global $insideitem, $tag, $title, $description, $link;
    if ($insideitem) {
        switch ($tag) {
            case "TITLE":
                $title .= $data;
                break;
            case "DESCRIPTION":
                $description .= $data;
                break;
            case "LINK":
                $link .= $data;
                break;
        }
    }
}

// xml_parser_create() upper-cases tag names by default, hence "ITEM", "TITLE", etc.
$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");
$fp = fopen($feedsource, "r") or die("Error reading RSS data.");
while ($data = fread($fp, 4096)) {
    xml_parse($xml_parser, $data, feof($fp)) or die(sprintf("XML error: %s at line %d",
        xml_error_string(xml_get_error_code($xml_parser)),
        xml_get_current_line_number($xml_parser)));
}
fclose($fp);
xml_parser_free($xml_parser);

?>

Jul 23 '05 #1

Have you tried a regular expression? I had a quick glance at your code, but
couldn't see any.

PHP Manual: http://au3.php.net/regex
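A minimal sketch of that idea, assuming PHP's PCRE functions (preg_match() and preg_quote()) are available: wrapping the keyword in \b word boundaries makes "train" match only as a whole word, so "training" no longer counts. The helper name contains_word() is hypothetical, not something from the original script.

```php
<?php
// Hypothetical helper (not part of the original script): returns TRUE
// if $word occurs in $text as a whole word, ignoring case.
function contains_word($text, $word)
{
    // preg_quote() escapes regex metacharacters in the keyword; passing
    // '/' also escapes the pattern delimiter if it appears in $word.
    // \b is a word boundary; the "i" modifier ignores case like stristr().
    $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
    return preg_match($pattern, $text) === 1;
}

var_dump(contains_word("Train delayed at Crewe", "train"));   // bool(true)
var_dump(contains_word("Police training exercise", "train")); // bool(false)
```

Note that \b only works this way when the keyword starts and ends with a letter, digit or underscore; a keyword such as "C++" would need a different boundary check.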

Michael
Jul 24 '05 #2
No, there is none at the moment. I must admit that, apart from an email
address checking script which I got from a tutorial site, I have
avoided regex because of its infamous complexity, but I guess an
expression for this wouldn't be that hard. I will give it a go.
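One possible way to combine both lists (a sketch under stated assumptions, not the solution the thread settled on): build a single alternation pattern from the keyword array, and keep a story only if it contains at least one keyword as a whole word and none of the banned strings. The function names build_word_pattern() and should_store() are hypothetical, and the example keyword values are made up. In the script above, the keyword loop inside endElement() could be replaced by a call such as should_store($title . " " . $description, $keywords, $bannedwords), once $bannedwords is added to that function's global line.

```php
<?php
// Hypothetical helper: builds one case-insensitive pattern that matches
// any of the given words as a whole word, e.g. /\b(?:train|rail)\b/i
function build_word_pattern(array $words)
{
    $quoted = array();
    foreach ($words as $word) {
        $word = trim($word);
        if ($word !== '') {
            $quoted[] = preg_quote($word, '/');
        }
    }
    if (count($quoted) === 0) {
        return null; // no keywords, so nothing can match
    }
    return '/\b(?:' . implode('|', $quoted) . ')\b/i';
}

// Hypothetical filter: TRUE if $text contains at least one keyword as a
// whole word and none of the banned strings. Banned strings are checked
// as plain case-insensitive substrings with stripos().
function should_store($text, array $keywords, array $bannedwords)
{
    foreach ($bannedwords as $banned) {
        $banned = trim($banned);
        if ($banned !== '' && stripos($text, $banned) !== false) {
            return false;
        }
    }
    $pattern = build_word_pattern($keywords);
    return $pattern !== null && preg_match($pattern, $text) === 1;
}

// Example values only
$keywords = array("train", "rail");
$bannedwords = array("train of thought");

var_dump(should_store("Rail fares rise again", $keywords, $bannedwords));         // bool(true)
var_dump(should_store("Strained relations", $keywords, $bannedwords));            // bool(false)
var_dump(should_store("A train of thought on budgets", $keywords, $bannedwords)); // bool(false)
```

Checking the banned list first means a story is rejected as soon as any banned string appears, before the keyword pattern is even run.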

Jul 24 '05 #3