I have a script which parses the BBC RSS feed once an hour, and drops
any stories with certain keywords into a database. I would like to ban
certain strings; at the moment the script will pick up the word "train"
but also "training"; this is giving me a lot of false positives. Can
anyone assist? I tried a couple of things but nothing that works.
Script is as follows:
<?php
$keywords = array("keyword1", "keyword2", "keyword3");
$bannedwords = array("bannedword1", "bannedword2", "bannedword3");
$feedsource = "http://news.bbc.co.uk/rss/newsonline_uk_edition/uk/rss091.xml";
$db = mysql_connect("localhost", "username", "password") or die(mysql_error());
mysql_select_db("database");
$insideitem = FALSE;
$tag = "";
$title = "";
$description = "";
$textdump = "";
$link = "";
$itemcount = FALSE;
$body1 = "";

// Called at each opening tag: remember the current tag while inside an <item>.
function startElement($parser, $name, $attrs) {
    global $insideitem, $tag, $title, $description, $link;
    if ($insideitem) {
        $tag = $name;
    } elseif ($name == "ITEM") {
        $insideitem = TRUE;
    }
}

// Called at each closing tag: at the end of an <item>, store it if it matches a keyword.
function endElement($parser, $name) {
    global $insideitem, $tag, $title, $description, $link, $keywords, $itemcount;
    $numkeywords = count($keywords);
    $duplicate = FALSE;
    if ($name == "ITEM") {
        for ($counter1 = 0; $counter1 < $numkeywords; $counter1++) {
            // Substring match: this is why "train" also matches "training".
            if (stristr($title, $keywords[$counter1]) || stristr($description, $keywords[$counter1])) {
                // Skip stories whose link is already in the table.
                $sql = "select * from tblRSSfeed";
                $result = mysql_query($sql) or die(mysql_error());
                while ($row = mysql_fetch_array($result)) {
                    if ($row['txtLink'] == trim($link)) {
                        $duplicate = TRUE;
                    }
                }
                if ($duplicate == FALSE) {
                    $itemcount = TRUE;
                    $datetime = date("Y-m-d H:i:s");
                    $title = trim(str_replace("'", "\'", $title));
                    $description = trim(str_replace("'", "\'", $description));
                    $link = trim($link);
                    $sql = "INSERT INTO tblRSSfeed VALUES(NULL, '$title', '$description', '$link', '$datetime')";
                    mysql_query($sql) or die(mysql_error());
                } else {
                    $duplicate = FALSE;
                }
            }
        }
        // Reset the per-item state for the next story.
        $title = "";
        $description = "";
        $link = "";
        $insideitem = FALSE;
    }
}

// Called for text content: append it to the field for the current tag.
function characterData($parser, $data) {
    global $insideitem, $tag, $title, $description, $link;
    if ($insideitem) {
        switch ($tag) {
            case "TITLE":
                $title .= $data;
                break;
            case "DESCRIPTION":
                $description .= $data;
                break;
            case "LINK":
                $link .= $data;
                break;
        }
    }
}

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");
$fp = fopen($feedsource, "r") or die("Error reading RSS data.");
while ($data = fread($fp, 4096)) {
    xml_parse($xml_parser, $data, feof($fp)) or die(sprintf("XML error: %s at line %d",
        xml_error_string(xml_get_error_code($xml_parser)),
        xml_get_current_line_number($xml_parser)));
}
fclose($fp);
xml_parser_free($xml_parser);
?>
>I have a script which parses the BBC RSS feed once an hour, and drops any stories with certain keywords into a database. I would like to ban certain strings; at the moment the script will pick up the word "train" but also "training"; this is giving me a lot of false positives. Can anyone assist? I tried a couple of things but nothing that works.
Have you tried a regular expression? I had a quick glance at your code, but
couldn't see any.
PHP Manual: http://au3.php.net/regex
Michael
No, there is none at the moment. I must admit that, apart from an email
address checking script which I got from a tutorial site, I have
avoided regex because of its infamous complexity, but I guess an
expression for this wouldn't be that hard. I will give it a go.
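For anyone landing on this thread later, here is a minimal sketch of the regex approach suggested above. It assumes whole-word matching is what is wanted, so "train" matches but "training" does not. The function names containsWholeWord and isWanted are made up for illustration and are not part of the original script; the idea would be to call something like isWanted on the title and description in place of the stristr() test.

```php
<?php
// Hypothetical helper: TRUE when $text contains $word as a whole word.
function containsWholeWord($text, $word) {
    // \b is a word boundary, so "train" will not match inside "training".
    // preg_quote() escapes any regex metacharacters in the word;
    // the trailing "i" makes the match case-insensitive.
    $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
    return preg_match($pattern, $text) === 1;
}

// Hypothetical helper: a story is wanted when it contains no banned word
// and at least one keyword, both tested as whole words.
function isWanted($text, $keywords, $bannedwords) {
    foreach ($bannedwords as $banned) {
        if (containsWholeWord($text, $banned)) {
            return FALSE;
        }
    }
    foreach ($keywords as $keyword) {
        if (containsWholeWord($text, $keyword)) {
            return TRUE;
        }
    }
    return FALSE;
}
```

Note that whole-word matching also rejects plurals, so "trains" would need its own entry in $keywords. The \b boundary only behaves as expected when the word starts and ends with a letter, digit or underscore.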