Bytes | Software Development & Data Engineering Community

KEYWORDS from a string

Nel
Hi all,

Before I re-invent the wheel here, is anyone willing to share a basic
script to extract META keywords from a string? I have a string, let's say
$pageText, that contains the dynamic contents of the page.

Ideally, I don't just want to explode the string and remove "and", "or" and
"the" etc., because some of the repeated keywords may be more than one word
long.

Also, it would be good to be able to rank the keywords according to the
frequency.
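Both requirements above (keywords longer than one word, and ranking by frequency) can be met by counting every single word and every two-word phrase and then sorting by count. A minimal sketch in PHP, assuming ASCII text; the rankKeywords() name, the two-word maximum and the rule of skipping any phrase that contains a stop word are illustrative choices, not something from this thread:

```php
<?php
// Sketch: rank single words and two-word phrases by frequency.
// Assumptions (not from the thread): ASCII text; a word is a run of letters,
// digits, hyphens or apostrophes; matching ignores case; any phrase that
// contains a stop word is skipped.
function rankKeywords(string $text, array $stopWords, int $maxWords = 2): array
{
    // Lower-case stop words, used as array keys for fast isset() lookups.
    $stop = array_flip(array_map('strtolower', $stopWords));

    // Split the text into lower-case words, discarding punctuation.
    preg_match_all("/[a-z0-9'-]+/", strtolower($text), $m);
    $words = $m[0];
    $n = count($words);

    $counts = [];
    for ($i = 0; $i < $n; $i++) {
        // Phrases of 1..$maxWords words starting at word $i.
        for ($len = 1; $len <= $maxWords && $i + $len <= $n; $len++) {
            $phrase = array_slice($words, $i, $len);
            // A stop word here is also in every longer phrase, so stop extending.
            foreach ($phrase as $w) {
                if (isset($stop[$w])) {
                    break 2;
                }
            }
            $key = implode(' ', $phrase);
            $counts[$key] = ($counts[$key] ?? 0) + 1;
        }
    }

    arsort($counts); // most frequent first
    return $counts;
}

$text = "Peter Mandelson was nominated. Mr Mandelson said Peter Mandelson was delighted.";
$ranked = rankKeywords($text, ['was', 'said', 'mr']);
print_r($ranked);
```

The keys of the returned array, in order, are the ranked keywords; raising $maxWords to 3 would also count three-word phrases.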

I have searched Google and HotScripts etc., but can only find web sites that
create METAs to copy & paste.

Thanx in advance.

Nel
Jul 17 '05 #1
Chung Leong
See the documentation for get_meta_tags().
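For reference, get_meta_tags() reads the existing <meta name="..." content="..."> tags from a file or URL; it does not take a string and does not generate tags. A minimal sketch, writing a hypothetical sample page to a temporary file first:

```php
<?php
// Sketch: read the name/content pairs of <meta> tags from an HTML file.
// get_meta_tags() takes a file name or URL, so the sample page below
// (hypothetical content) is written to a temporary file first.
$html = <<<HTML
<html><head>
<meta name="keywords" content="php, meta, keywords">
<meta name="description" content="Example page">
</head><body>Hello</body></html>
HTML;

$file = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($file, $html);

// Keys are the lower-cased name attributes; values are the content attributes.
$tags = get_meta_tags($file);
unlink($file);

print_r($tags);
```

The keyword list of the sample page ends up in $tags['keywords'].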
Jul 17 '05 #2
The reply above answers your question if you are looking for the strict
definition of meta tags in HTML.

If by meta you mean keywords that are important and fairly distinctive in
the body of the text, then you need a definition of common words, which
you remove to arrive at the "meta" keywords. The way I do it is to start
with the MySQL full-text stopword list (search the web for it). Then add
words that are common in your domain (e.g. "html" may be a common word on
the web). Now remove all of these words from the string using regular
expressions; what remains is mostly the distinctive words.
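The regular-expression step described above might look like the minimal sketch below. The removeStopWords() name and the tiny stop-word list are hypothetical; a real list would be the MySQL stopword list plus your own domain words.

```php
<?php
// Minimal sketch: strip stop words from a string with one regular expression.
// The stop-word list passed in below is a tiny hypothetical sample; in practice
// it would be the MySQL stopword list plus domain-specific words.
function removeStopWords(string $text, array $stopWords): string
{
    if ($stopWords !== []) {
        // Escape each word for regex use and join them into one alternation.
        $quoted = array_map(fn($w) => preg_quote($w, '/'), $stopWords);
        // \b anchors each match to word boundaries, so "the" does not match
        // inside "theme"; the i flag makes matching case-insensitive.
        $pattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';
        $text = preg_replace($pattern, '', $text);
    }
    // Collapse the runs of whitespace left behind by the removed words.
    return trim(preg_replace('/\s+/', ' ', $text));
}

$stopWords = ['a', 'and', 'or', 'the', 'of', 'html']; // "html" as a domain word
echo removeStopWords('The basics of HTML and the META tag', $stopWords);
// prints "basics META tag"
```

Building one alternation is usually simpler than calling preg_replace() once per stop word, though a very long list makes for a very long pattern.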

--
http://www.dbForumz.com/ This article was posted by author's request
Articles individually checked for conformance to usenet standards
Topic URL: http://www.dbForumz.com/PHP-KEYWORDS...ict132415.html
Visit Topic URL to contact author (reg. req'd). Report abuse: http://www.dbForumz.com/eform.php?p=442256
Jul 17 '05 #3
Nel
Here is the final script I put together thanks to your help and suggestions.
It works through a string, removes duplicates, newlines, punctuation and the
stop words listed in stopwords.txt, then lists the remaining keywords in a META tag.

If anyone can offer any improvements I am open to suggestions.

Nel.
______________________________________________

<?php // metatags.inc.php
// Create keyword META tags from dynamic page content

// test string from BBC News
echo metatags("Tony Blair has nominated long-time ally Peter Mandelson as
Britain's next European commissioner.
The announcement was made after Mr Blair spoke to new European Commission
President Jose Manuel Durao Barroso on Friday morning.

The appointment represents a remarkable comeback for Mr Mandelson, who has
twice resigned from the Cabinet in controversial circumstances.

It will also trigger a Westminster by-election in his Hartlepool seat.

'Positive response'

In a statement, Mr Mandelson said he was \"delighted\" to have been
nominated for the post by the prime minister, but confirmed that
he had \"agonised\" over whether the job was right for him.");

function metatags($pagetext)
{
    // Define variables for this web site
    $websitename = "Example's Web Site";
    $metadescription = "This web site's description";
    $metakeywords = cleankeywords($pagetext);

    // Build up META TAGS
    $metatags  = " <meta name=\"Name\" content=\"$websitename\">\n";
    $metatags .= " <meta name=\"Rating\" content=\"General\">\n";
    $metatags .= " <meta name=\"Robots\" content=\"Index\">\n";
    $metatags .= " <meta name=\"Revisit-After\" content=\"14 days\">\n";
    $metatags .= " <meta name=\"DESCRIPTION\" content=\"$metadescription\">\n";
    $metatags .= " <meta name=\"KEYWORDS\" content=\"$websitename,$metakeywords\">\n";

    return $metatags;
}

function cleankeywords($term)
{
    // Text file containing stop words (one on each line)
    $stopwords_file = "stopwords.txt";

    // Turn new lines into spaces so words on adjacent lines stay separate
    $term = preg_replace("/[\r\n]+/", " ", $term);
    // Remove punctuation: full stops, commas, double and single quotes
    $term = preg_replace("/[.,\"']/", "", $term);

    // Load list of common words; use an empty list if the file cannot be read
    $common = file($stopwords_file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($common === false)
    {
        $common = array();
    }
    $total = count($common);
    for ($x = 0; $x < $total; $x++)
    {
        $common[$x] = trim(strtolower($common[$x]));
    }

    // Make array of search terms, skipping empty strings from repeated spaces
    $_terms = preg_split("/\s+/", $term, -1, PREG_SPLIT_NO_EMPTY);

    $cleanterm = array();
    foreach ($_terms as $line)
    {
        if (!in_array(strtolower($line), $common))
        {
            // Using the word as the key drops repeated words
            $cleanterm[$line] = $line;
        }
    }
    $cleanwords = implode(", ", $cleanterm);
    return $cleanwords;
}
?>
Jul 17 '05 #5
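One improvement the final script leaves open is the frequency ranking asked for in the first post: cleankeywords() lists each word once, in order of first appearance, and treats "Mr" and "mr" as different words. A sketch of an alternative, assuming the stop words are passed in as an array and the text is ASCII; the rankedkeywords() name and the default limit of 10 are hypothetical choices:

```php
<?php
// Sketch of an alternative to cleankeywords(): words are counted
// case-insensitively, ordered by frequency, cut to the top $limit and
// HTML-escaped for use inside content="...".
// Assumptions: stop words arrive as an array (not read from stopwords.txt),
// the text is ASCII, and the name and default limit are hypothetical.
function rankedkeywords(string $text, array $stopwords, int $limit = 10): string
{
    // Normalised stop words as array keys for isset() lookups.
    $stop = array_flip(array_map(fn($w) => strtolower(trim($w)), $stopwords));

    // Lower-case words made of letters, digits, hyphens and apostrophes.
    preg_match_all("/[a-z0-9'-]+/", strtolower($text), $m);
    $words = array_filter($m[0], fn($w) => !isset($stop[$w]));

    // Map each distinct word to its number of occurrences, most frequent first.
    $counts = array_count_values($words);
    arsort($counts);

    $top = array_slice(array_keys($counts), 0, $limit);
    return htmlspecialchars(implode(', ', $top), ENT_QUOTES);
}

$text = "Mandelson said Mandelson was delighted. The Mandelson appointment was a comeback.";
echo rankedkeywords($text, ['said', 'was', 'the', 'a'], 3);
```

Inside metatags() it could stand in for the cleankeywords() call, e.g. rankedkeywords($pagetext, file("stopwords.txt", FILE_IGNORE_NEW_LINES)), provided stopwords.txt exists.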
