exclude keywords from RSS search

I have a script which parses the BBC RSS feed once an hour and drops
any stories containing certain keywords into a database. I would like to
ban certain strings: at the moment the script picks up the word "train"
but also "training", which gives me a lot of false positives. Can
anyone help? I have tried a couple of things, but nothing has worked.

Script is as follows:

<?php

// Words to look for in each story. $bannedwords is declared but not yet used.
$keywords = array("keyword1", "keyword2", "keyword3");
$bannedwords = array("bannedword1", "bannedword2", "bannedword3");
$feedsource =
"http://news.bbc.co.uk/rss/newsonline_uk_edition/uk/rss091.xml";

// The mysql_* functions were removed in PHP 7; mysqli is the procedural
// replacement and takes the database name as its fourth argument.
$db = mysqli_connect("localhost", "username", "password", "database")
    or die(mysqli_connect_error());

// Parser state shared by the handler functions below.
$insideitem = FALSE;   // TRUE while between <item> and </item>
$tag = "";             // name of the current child element of <item>
$title = "";
$description = "";
$link = "";
$itemcount = FALSE;    // TRUE once at least one new story has been stored

// Called for each opening tag (the parser upper-cases names by default).
function startElement($parser, $name, $attrs) {
    global $insideitem, $tag;
    if ($insideitem) {
        $tag = $name;
    } elseif ($name == "ITEM") {
        $insideitem = TRUE;
    }
}

// Called for each closing tag; at </item> the collected story is checked.
function endElement($parser, $name) {
    global $db, $insideitem, $tag, $title, $description, $link,
           $keywords, $itemcount;

    if ($name == "ITEM") {
        // Character data includes surrounding whitespace, so trim once here.
        $title = trim($title);
        $description = trim($description);
        $link = trim($link);

        foreach ($keywords as $keyword) {
            // Case-insensitive substring test on both title and description.
            if (stristr($title, $keyword) || stristr($description, $keyword)) {
                // Look up this link directly instead of reading the whole table.
                $stmt = mysqli_prepare($db,
                    "SELECT 1 FROM tblRSSfeed WHERE txtLink = ?");
                mysqli_stmt_bind_param($stmt, "s", $link);
                mysqli_stmt_execute($stmt);
                mysqli_stmt_store_result($stmt);
                $duplicate = mysqli_stmt_num_rows($stmt) > 0;
                mysqli_stmt_close($stmt);

                if (!$duplicate) {
                    $itemcount = TRUE;
                    $datetime = date("Y-m-d H:i:s");
                    // Bound parameters replace the manual quote escaping.
                    $stmt = mysqli_prepare($db,
                        "INSERT INTO tblRSSfeed VALUES (NULL, ?, ?, ?, ?)");
                    mysqli_stmt_bind_param($stmt, "ssss",
                        $title, $description, $link, $datetime);
                    mysqli_stmt_execute($stmt) or die(mysqli_error($db));
                    mysqli_stmt_close($stmt);
                }
                break; // one matching keyword is enough
            }
        }

        // Reset state ready for the next <item>.
        $title = "";
        $description = "";
        $link = "";
        $tag = "";
        $insideitem = FALSE;
    }
}

// Called for text content; appends it to the field for the current tag.
function characterData($parser, $data) {
    global $insideitem, $tag, $title, $description, $link;
    if ($insideitem) {
        switch ($tag) {
            case "TITLE":
                $title .= $data;
                break;
            case "DESCRIPTION":
                $description .= $data;
                break;
            case "LINK":
                $link .= $data;
                break;
        }
    }
}

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");

// Opening a URL with fopen() requires allow_url_fopen to be enabled.
$fp = fopen($feedsource, "r") or die("Error reading RSS data.");
while ($data = fread($fp, 4096)) {
    xml_parse($xml_parser, $data, feof($fp)) or die(sprintf(
        "XML error: %s at line %d",
        xml_error_string(xml_get_error_code($xml_parser)),
        xml_get_current_line_number($xml_parser)));
}
fclose($fp);
xml_parser_free($xml_parser);

?>

Jul 23 '05 #1
> I have a script which parses the BBC RSS feed once an hour, and drops
> any stories with certain keywords into a database. I would like to ban
> certain strings; at the moment the script will pick up the word "train"
> but also "training"; this is giving me a lot of false positives. Can
> anyone assist? I tried a couple of things but nothing that works.


Have you tried a regular expression? I had a quick glance at your code, but
couldn't see any.

PHP Manual: http://au3.php.net/regex
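As a minimal sketch of that suggestion (the helper name `containsKeyword` is hypothetical), `preg_match()` with `\b` word boundaries matches a keyword only as a whole word, whereas `stristr()` matches any substring. It assumes each keyword begins and ends with a letter, digit or underscore, because `\b` only marks a boundary next to such a character.

```php
<?php

// Hypothetical helper: TRUE if $text contains any of $keywords as a whole word.
// \b marks a word boundary, so "train" matches "Train," but not "training";
// the i modifier makes the match case-insensitive.
// Assumes keywords start and end with a word character (letter, digit, _).
function containsKeyword($text, array $keywords) {
    foreach ($keywords as $keyword) {
        // preg_quote() escapes regex metacharacters and the / delimiter,
        // so each keyword is matched literally.
        $pattern = '/\b' . preg_quote($keyword, '/') . '\b/i';
        if (preg_match($pattern, $text)) {
            return TRUE;
        }
    }
    return FALSE;
}

$keywords = array("train");

var_dump(stristr("Staff training day", "train") !== FALSE); // bool(true): substring match
var_dump(containsKeyword("Staff training day", $keywords));  // bool(false)
var_dump(containsKeyword("Train delays in Kent", $keywords)); // bool(true)
```

In the script above, the `stristr()` calls in `endElement()` could be swapped for calls to a helper like this one.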

Michael
Jul 24 '05 #2
No, there is none at the moment. I must admit that, apart from an email
address checking script I got from a tutorial site, I have avoided regex
because of its infamous complexity, but I guess an expression for this
wouldn't be that hard. I will give it a go.

Jul 24 '05 #3
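The same regex approach could also put the unused `$bannedwords` array from the original script to work. The sketch below (the function names `wordPattern` and `isWanted` are hypothetical) joins each word list into a single alternation such as `/\b(?:train|rail)\b/i` and keeps a story only if a keyword matches and no banned word does. It assumes banned words should also be matched as whole words; banning them as plain substrings would need a `stristr()` check instead.

```php
<?php

// Hypothetical helper: turns a word list into one case-insensitive,
// whole-word pattern, e.g. array("train", "rail") gives /\b(?:train|rail)\b/i
function wordPattern(array $words) {
    $quoted = array();
    foreach ($words as $word) {
        $quoted[] = preg_quote($word, '/'); // match each word literally
    }
    return '/\b(?:' . implode('|', $quoted) . ')\b/i';
}

// Hypothetical filter: TRUE if a keyword appears as a whole word in the
// title or description and no banned word does.
function isWanted($title, $description, array $keywords, array $bannedwords) {
    $text = $title . ' ' . $description;

    // An empty list would produce /\b(?:)\b/i, which matches at any word
    // boundary, so empty lists are handled explicitly.
    if (!$keywords || !preg_match(wordPattern($keywords), $text)) {
        return FALSE;
    }
    if ($bannedwords && preg_match(wordPattern($bannedwords), $text)) {
        return FALSE;
    }
    return TRUE;
}

$keywords = array("train", "rail");
$bannedwords = array("training");

var_dump(isWanted("Rail fares to rise", "", $keywords, $bannedwords));    // bool(true)
var_dump(isWanted("Teacher training cuts", "", $keywords, $bannedwords)); // bool(false)
var_dump(isWanted("Train crash", "Staff training blamed",
    $keywords, $bannedwords));                                              // bool(false)
```

Building one pattern per list means a single `preg_match()` call per check, rather than one call per word.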
