Bytes IT Community

Complex replace in php 4

I searched and tried to develop (with no luck) a function to do the
following:
I have a string that may be:

"Le'ts go to <a href="my.htm">my car</a>. Tomorrow I'll have to buy a
new car. My new car is <em>red</em>! Please don't think to be in Nascar!!"
What I have to do is replace occurrences of "car" with <a
href="/...">car</a>, except in these cases:

- if there is already a wrapped link
- if car is part of another word
Also, I'm using PHP 4, so I can't use str_ireplace() for case-insensitive
replacement.

Can you help me?

Regards.

--
Fabri
Tag Wii: 8680 1598 2246 2466
http://www.consolerecords.it/forum/viewtopic.php?t=217
Apr 30 '07 #1


On Apr 30, 6:13 pm, Fabri wrote:
> What I have to do is replace occurrences of "car" with <a
> href="/...">car</a>, except in these cases:
>
> - if there is already a wrapped link
> - if car is part of another word

Regular expressions. preg_replace() uses Perl-compatible regular
expressions; that's just my preference. You can use PHP's other
regex functions, too (ereg_replace()).

$string = preg_replace( '#([\s\(])my car([\s\)\.])#i', '$1<a href="...">my car</a>$2', $string );

The character groups on either side of "my car" allow "my car" to be
next to newlines, spaces, tabs, parentheses, and periods. You can
remove the character groups, along with the $1 and $2 in the second
argument, to let it be replaced in any context, even in the middle of
some text with no spaces. If you want to learn more about regular
expressions, there might be a newsgroup for that too, and of course
your favorite search engine will be a big help. There are entire books
written on regex.
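To see what the character groups do in practice, the sketch below applies the same pattern to the single word "car" (the OP's target, rather than "my car") on a shortened version of Fabri's sample; "/cars" is a placeholder URL. The last pattern is a swapped-in variant, not Mike's code: lookbehind/lookahead assertions test the neighbouring characters without consuming them, which matters when two occurrences are separated by a single space.

```php
<?php
// Character groups on either side of the word act as hand-made word boundaries.
// "/cars" is a placeholder link target.
$text = '<a href="my.htm">my car</a>. I will buy a new car. My new car is red! Not Nascar!!';

$linked = preg_replace('#([\s\(])car([\s\)\.])#i', '$1<a href="/cars">car</a>$2', $text);
echo $linked, "\n";
// "my car</a>" is skipped because "<" is not in the second group,
// and "Nascar" is skipped because "s" is not in the first group.

// Caveat: the groups consume the characters around each match, so the single
// space between two adjacent occurrences can only be used by one of them.
$grouped = preg_replace('#([\s\(])car([\s\)\.])#i', '$1[car]$2', ' car car car ');
echo $grouped, "\n";    // " [car] car [car] "

// Variant (not from the thread): assertions check the neighbours without
// consuming them, so every occurrence matches.
$lookaround = preg_replace('#(?<=[\s(])car(?=[\s).])#i', '[car]', ' car car car ');
echo $lookaround, "\n"; // " [car] [car] [car] "
```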

preg_replace():
http://php.net/manual/en/function.preg-replace

Apr 30 '07 #2

On May 1, 12:13 am, Fabri wrote:
> What I have to do is replace occurrences of "car" with <a
> href="/...">car</a>, except in these cases:
>
> - if there is already a wrapped link
> - if car is part of another word

Ah, you also need to avoid doing the replacement when the word appears
in an HTML attribute. For example:

<a href="http://www.google.pl/search?q=car">... </a>

A simple search and replace, even with a regular expression, won't
always work. You will need to parse the HTML to some degree. Where
is this text coming from? HTML that can contain JavaScript will
require a full-fledged parser.
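A short demonstration of the attribute problem: a whole-word regex replacement on its own has no idea where tags start and end, so it also rewrites the "car" in the query string. The URL comes from the example above; "/cars" is a placeholder link target.

```php
<?php
// A plain whole-word replacement knows nothing about tags, so it also rewrites
// "car" inside the href attribute. "/cars" is a placeholder link target.
$html  = '<a href="http://www.google.pl/search?q=car">search</a> for a car';
$naive = preg_replace('/\bcar\b/i', '<a href="/cars">$0</a>', $html);
echo $naive, "\n";
// The first replacement lands inside href="...", producing broken markup:
// <a href="http://www.google.pl/search?q=<a href="/cars">car</a>">search</a> for a <a href="/cars">car</a>
```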

Apr 30 '07 #3

I assumed that if it were wrapped in an anchor tag there would be no
whitespace on the inside of the anchor tag. It won't replace the
following:

<a href="...">my car</a>

unless he takes out the character groups.

-Mike PII

Apr 30 '07 #4


"Fabri" wrote in message
news:46**********************@reader4.news.tin.it...
> What I have to do is replace occurrences of "car" with <a
> href="/...">car</a>, except in these cases:
>
> - if there is already a wrapped link
> - if car is part of another word

You want a basic recursive parser, but it's probably overly complicated
for what you need.

It's better if you can add some structural information to the markup
that HTML will ignore. This will help you search for car more
efficiently.

You could do something like
"Le'ts go to <a href="my.htm">my car</a>. Tomorrow I'll have to buy a new
car. My new <car>car</car> is <em>red</em>! Please don't think to be in
Nascar!!"
and then just search for <car>car</car> and replace it with the link.

You could also do something like

<span class="MyNewCar">car</span>

and essentially do the same with the added bonus that you can modify the
style using css.

i.e., search for <span class="MyNewCar">car</span> and replace it with

<a href="..."><span class="MyNewCar">car</span></a>
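A minimal sketch of the marker approach, assuming the author has already tagged the words to be linked with Jon's example class "MyNewCar"; "/cars" is a placeholder link target. Because the marker identifies the word exactly, a plain str_replace() is enough and unmarked occurrences are never touched.

```php
<?php
// Jon's marker idea: words to be linked carry a known class, so plain string
// replacement is enough. "MyNewCar" is Jon's example class name;
// "/cars" is a placeholder link target.
$marker = '<span class="MyNewCar">car</span>';
$html   = 'Tomorrow I will buy a new ' . $marker . '. Not Nascar, and not this car.';

$out = str_replace($marker, '<a href="/cars">' . $marker . '</a>', $html);
echo $out, "\n";
// Only the marked occurrence is linked; the unmarked "car" and "Nascar" are untouched.
```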
Now, if you can do the processing offline, you want to write a simple
recursive parser. What you do here is search for all instances of car and
then search backwards to make sure they are not contained in any <a href>
tags. The issue here is that, in theory, this could take a very long
time.
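One way the "search backwards" check could look, sketched under stated assumptions: links are written as `<a ` followed by attributes (a bare `<a>` or a line break right after `a` would be missed), and `insideAnchor()` and `linkOutsideAnchors()` are hypothetical helper names with "/cars" as a placeholder URL. Each hit rescans the text in front of it, so the cost grows with text length times the number of hits, which is the slowdown Jon mentions. As a side effect, a "car" inside an `<a href="...">` attribute also counts as "inside a link", but attributes of other tags are not protected. It needs PHP 5 or later, since PHP 4's strrpos() only accepts a single-character needle.

```php
<?php
// Is byte position $pos inside an <a>...</a> element? Look backwards for the
// most recent opening and closing anchor tags and compare their positions.
function insideAnchor($html, $pos)
{
    $before = strtolower(substr($html, 0, $pos));
    $open   = strrpos($before, '<a ');
    $close  = strrpos($before, '</a>');
    return $open !== false && ($close === false || $close < $open);
}

// Wrap every whole-word, case-insensitive match of $word that is not already inside a link.
function linkOutsideAnchors($html, $word, $url)
{
    $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
    preg_match_all($pattern, $html, $m, PREG_OFFSET_CAPTURE);

    $out  = '';
    $last = 0;                                   // end of the previous hit
    foreach ($m[0] as $hit) {
        list($text, $pos) = $hit;                // matched text and its byte offset
        $out .= substr($html, $last, $pos - $last);
        $out .= insideAnchor($html, $pos) ? $text : '<a href="' . $url . '">' . $text . '</a>';
        $last = $pos + strlen($text);
    }
    return $out . substr($html, $last);          // text after the last hit
}

$result = linkOutsideAnchors(
    '<a href="my.htm">my car</a>. I will buy a new car. My new car is <em>red</em>! Not Nascar!!',
    'car',
    '/cars'
);
echo $result, "\n";
```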

Since you are making car something special, I would imagine you could just
add some structural information to it to make it special. If you are worried
about applying the same thing twice, so that you get something like

<a href="..."><a href="..."><span class="MyNewCar">car</span></a></a>

then it's pretty easy to check for that and prevent it.
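A sketch of that double-wrapping check, assuming the marker from Jon's example and a placeholder URL "/cars"; `CAR_MARKER`, `wrapMarkedCarCallback()` and `wrapMarkedCars()` are hypothetical names. It only adds a link when no `<a ...>` tag sits directly in front of the marker, so running it a second time changes nothing. It does not stop a marker that sits further inside an unrelated link from being wrapped.

```php
<?php
// Guard against wrapping the same marked word twice: only add a link when the
// marker is not already directly preceded by an opening <a ...> tag.
// The class name follows Jon's example; "/cars" is a placeholder link target.
define('CAR_MARKER', '<span class="MyNewCar">car</span>');

function wrapMarkedCarCallback($m)
{
    // $m[1] is only present when an <a ...> tag sits right in front of the marker.
    if (isset($m[1]) && $m[1] !== '') {
        return $m[0];                               // already linked: leave as is
    }
    return '<a href="/cars">' . CAR_MARKER . '</a>';
}

function wrapMarkedCars($html)
{
    $pattern = '#(<a href="[^"]*">)?' . preg_quote(CAR_MARKER, '#') . '#';
    return preg_replace_callback($pattern, 'wrapMarkedCarCallback', $html);
}

$once  = wrapMarkedCars('My new ' . CAR_MARKER . ' is red!');
$twice = wrapMarkedCars($once);                     // running it again changes nothing
echo $once, "\n";
var_dump($once === $twice);                         // bool(true)
```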

I would suggest you play around with it using simple examples and see what
you come up with. It's essentially just searching, and I don't think you'll
need more than that (I doubt you'll need regular expressions).

Jon

May 1 '07 #5

Fabri wrote:
> "Le'ts go to <a href="my.htm">my car</a>. Tomorrow I'll have to buy a
> new car. My new car is <em>red</em>! Please don't think to be in Nascar!!"

Try the following... [Credits due: Brad Choate, John Gruber, Matthew
McGlynn and Alex Rosenberg for the _tokenize() function.]

<?php

// how many times do you want to allow the tokenizer to loop?
// The higher the value, the longer your system could churn
// given an infinite-loop bug (or really really really long text string).
define('MAX_TOKENIZER_LOOPS', 2000);

// print error on tokenizer loop problem?
define('ADVISE_TOKENIZER_FAILURE', FALSE);

// keys for $tokens hash
define('TOKENS_TYPE_TEXT', 'text');
define('TOKENS_TYPE_TAG', 'tag');

function _tokenize(&$str, &$tokens) {
    #
    # Parameter:  Pointer to string containing HTML markup,
    #             pointer to array to store results.
    #
    # Output array contains tokens comprising the input
    # string. Each token is either a tag (possibly with nested,
    # tags contained therein, such as <a href="<MTFoo>">, or a
    # run of text between tags. Each element of the array is a
    # two-element array; the first is either 'tag' or 'text';
    # the second is the actual value.
    #
    # Based on the _tokenize() subroutine from Brad Choate's MTRegex plugin.
    #   <http://www.bradchoate.com/past/mtregex.php>
    $len = strlen($str);

    $depth = 6;
    $nested_tags = str_repeat('(?:<(?:[^<>]|', $depth);
    $nested_tags = substr($nested_tags, 0, -1);
    $nested_tags .= str_repeat(')*>)', $depth);

    $match = "/(?s: <! ( -- .*? -- \s* )+ > ) |
               (?s: <\? .*? \?> ) |
               $nested_tags/x";

    $last_tag_end = -1;
    $loops = $offset = 0;

    //433 PHP 4.3.3 is required for this
    //433 while (preg_match($match, $str, $hits, PREG_OFFSET_CAPTURE, $offset)) {
    while (preg_match($match, substr($str, $offset), $hits, PREG_OFFSET_CAPTURE)) {

        $extracted_tag = $hits[0][0];                    // contains the full HTML tag
        //433 $tag_start = (int)$hits[0][1];             // position of captured in string
        $tag_start = $offset + (int)$hits[0][1];         // position of captured in string
        $offset = $tag_start + strlen($extracted_tag);   // resume after this tag on the next iteration

        // if this tag isn't next to the previous one, store the interstitial text
        if ($tag_start > $last_tag_end) {
            $tokens[] = array('type' => TOKENS_TYPE_TEXT,
                              'body' => substr($str, $last_tag_end+1, $tag_start-$last_tag_end-1));
        }

        $tokens[] = array('type' => TOKENS_TYPE_TAG,
                          'body' => $extracted_tag);

        $last_tag_end = $tag_start + strlen($extracted_tag) - 1;

        if ($loops++ > MAX_TOKENIZER_LOOPS) {

            if (ADVISE_TOKENIZER_FAILURE) {
                print "SmartyPants _tokenize failure.";
            }
            return;
        }
    }
    // if text remains after the close of the last tag, grab it
    if ($offset < $len) {
        $tokens[] = array('type' => TOKENS_TYPE_TEXT,
                          'body' => substr($str, $last_tag_end + 1));
    }

    return;

}

/**
 * Make a particular word in an HTML string into a link.
 *
 * @copyright Copyright (C) 2007 Toby A Inkster
 * @param string  $haystack        HTML string to search through.
 * @param string  $needle          Word or phrase to find.
 * @param string  $link            Link to add to this word. Opt; default Wikipedia.
 * @param boolean $case_sensitive  Matching sensitivity. Opt; FALSE.
 */
function linkity ($haystack, $needle, $link='', $case_sensitive=FALSE)
{
    if ($link=='')
        $link = 'http://en.wikipedia.org/wiki/'.ucfirst($needle);

    // escape the needle so characters such as "." or "+" are matched literally
    $regexp = '#\b('.preg_quote($needle, '#').')\b#'.($case_sensitive?'':'i');
    $inlink = FALSE;
    $out = '';

    $tokens = array();
    _tokenize($haystack, $tokens);

    foreach ($tokens as $t)
    {
        if ($t['type']==TOKENS_TYPE_TAG)
        {
            // \b keeps tags such as <abbr> or <area> from counting as links
            if (preg_match('#^<a\b#i', $t['body']))
                $inlink = TRUE;
            elseif (preg_match('#^</a\b#i', $t['body']))
                $inlink = FALSE;
            $out .= $t['body'];
        }
        else
        {
            if ($inlink)
                $out .= $t['body'];
            else
                $out .= preg_replace($regexp,
                                     "<a href=\"{$link}\">$1</a>",
                                     $t['body']);
        }
    }
    return $out;
}

# Test -- should only link the second and third occurrences of the word 'car'.
$str = 'Le\'ts go to <a href="my.htm">my car</a>. Tomorrow I\'ll have to buy
a new car. My new car is <em>red</em>! Please don\'t think to be in Nascar!!';
print linkity($str, 'car')."\n";

?>
--
Toby A Inkster BSc (Hons) ARCS
http://tobyinkster.co.uk/
Geek of ~ HTML/SQL/Perl/PHP/Python/Apache/Linux
May 1 '07 #6

Mike P2 wrote:
> I assumed that if it were wrapped in an anchor tag there would be no
> whitespace on the inside of the anchor tag. It won't replace the
> following:
>
> <a href="...">my car</a>
>
> unless he takes out the character groups

However, yours will replace:

<a href="...">my car is very fuel-efficient</a>

--
Toby A Inkster BSc (Hons) ARCS
http://tobyinkster.co.uk/
Geek of ~ HTML/SQL/Perl/PHP/Python/Apache/Linux
May 1 '07 #7

On May 1, 6:54 am, Toby A Inkster wrote:
> Mike P2 wrote:
>> I assumed that if it were wrapped in an anchor tag there would be no
>> whitespace on the inside of the anchor tag. It won't replace the
>> following:
>> <a href="...">my car</a>
>> unless he takes out the character groups
>
> However, yours will replace:
>
> <a href="...">my car is very fuel-efficient</a>

Actually, it will not. '>' is not an accepted character in either
character group.

BTW, as I mentioned before, my idea assumes there will be no
whitespace on the ends of the content of the link if there is one
already. That can be fixed like this:

<?php
$search = 'my car';
$link = '...';
$string = 'my car is very fuel-efficient';

$string = str_ireplace( $search, " $search ", $string );
$string = preg_replace( '#(<a[^>]+>)(\s+)#i', '$2$1', " $string" );
$string = preg_replace( '#(\s+)</a>#i', '</a>$1', $string );
$string = preg_replace( "#([\s\(])$search([\s\)\.])#i", "$1<a href='$link'>$search</a>$2", $string );

echo $string;
?>

What I had in mind is adding those two extra preg_replace() calls before
the main one; they move whitespace at the edges of a link's text from
inside to outside the anchor tags. The middle preg_replace() may be
optional, since the last one will not match if the words are butted up
against the opening tag anyway. Finally, I added the str_ireplace() so it
can replace the keywords even when they are next to, or inside, some
other tag. If you think this is too slow, consider dropping case
insensitivity or the middle preg_replace() (or maybe the first one).
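To make the effect of the two helper patterns concrete, the sketch below runs them on a made-up link whose text has spaces at both edges, then applies the main "my car" pattern, with "[LINK]" standing in for the real anchor markup. After the whitespace is moved outside the tags, the keyword is preceded by ">" and followed by "<", neither of which is in the character groups, so the existing link is left alone.

```php
<?php
// Mike's two helper patterns on their own: whitespace at the inner edges of a
// link is moved outside the <a>...</a> pair.
$html  = 'See <a href="/x"> my car </a> today';
$step1 = preg_replace('#(<a[^>]+>)(\s+)#i', '$2$1', $html);    // opening-tag edge
$step2 = preg_replace('#(\s+)</a>#i', '</a>$1', $step1);       // closing-tag edge
echo $step2, "\n";   // See  <a href="/x">my car</a>  today

// The main replacement now leaves the existing link alone.
$main = preg_replace('#([\s\(])my car([\s\)\.])#i', '$1[LINK]$2', $step2);
var_dump($main === $step2);   // bool(true)

// Without the helpers, the keyword inside the link would have matched.
var_dump(preg_replace('#([\s\(])my car([\s\)\.])#i', '$1[LINK]$2', $html) !== $html); // bool(true)
```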

-Mike PII

May 1 '07 #8

Oh yeah, and with the example I just posted, if you are going to
replace multiple keywords, you only need to run the first two
preg_replace() calls once (or just one of them, if you only use one).
If you plan to make a function out of this, take those two
preg_replace() calls out and run them once, separately, before calling
the function.
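A sketch of that split, assuming hypothetical function names `normalizeLinkWhitespace()` and `linkKeyword()` and placeholder URLs. The whitespace normalisation runs once, and the per-keyword function is called in a loop. Unlike Mike's snippet it leaves out the str_ireplace() step (which, as Fabri noted, PHP 4 lacks) and runs each keyword through preg_quote() so it is matched as literal text.

```php
<?php
// Run once: move whitespace at the inner edges of links outside the tags.
function normalizeLinkWhitespace($html)
{
    $html = preg_replace('#(<a[^>]+>)(\s+)#i', '$2$1', $html);  // " <a>" instead of "<a> "
    return preg_replace('#(\s+)</a>#i', '</a>$1', $html);       // "</a> " instead of " </a>"
}

// Run per keyword: Mike's main pattern, with the keyword escaped as literal text.
function linkKeyword($html, $search, $link)
{
    $quoted = preg_quote($search, '#');
    return preg_replace("#([\s\(])$quoted([\s\)\.])#i",
                        '$1<a href="' . $link . '">' . $search . '</a>$2', $html);
}

$html = normalizeLinkWhitespace('I like my car. It is red.');
foreach (array('my car' => '/car', 'red' => '/red') as $keyword => $url) {
    $html = linkKeyword($html, $keyword, $url);
}
echo $html, "\n";
```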

-Mike PII

May 1 '07 #9

On 01.05.2007 00:13 Fabri wrote:
> What I have to do is replace occurrences of "car" with <a
> href="/...">car</a>, except in these cases:
>
> - if there is already a wrapped link
> - if car is part of another word

Well, over 30 hours and still no correct answer... weird ;)

How about this:
$text = <<<EE
"Le'ts go to <a href="my.htm">my car</a>.
Tomorrow I'll have to buy a
new car. My new car is <em>red</em>!
Please don't think to be in Nascar!!"
EE;

echo preg_replace(
'~\bcar\b(?![^<>]*</a>)~i',
"<a href='zzz'>$0</a>",
$text);

If you need comments, feel free to ask.
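For readers who want those comments, here is the same pattern written out with the x (extended) modifier so each part can be annotated; the sample text is a shortened version of Fabri's and "zzz" is gosha's placeholder URL. The second call shows a limitation that follows from how the lookahead works: `[^<>]*` cannot step over a tag, so a word wrapped in other markup inside a link is still linked.

```php
<?php
// gosha's pattern, written out with the x modifier so it can carry comments.
$pattern = '~
    \b car \b        # "car" as a whole word, so "Nascar" is skipped
    (?!              # ...unless what follows is
        [^<>]*       #   a run of text with no tag delimiters
        </a>         #   ended by </a>, i.e. we are inside link text
    )
~ix';

$text   = '<a href="my.htm">my car</a>. I will buy a new car. My new car is <em>red</em>! Not Nascar!!';
$linked = preg_replace($pattern, "<a href='zzz'>$0</a>", $text);
echo $linked, "\n";

// Limitation: a tag between the word and </a> hides the closing tag from the
// lookahead, so the word is linked even though it is already inside a link.
$nested = preg_replace($pattern, "<a href='zzz'>$0</a>", '<a href="x">my <em>car</em></a>');
echo $nested, "\n";
```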

--
gosha bine

extended php parser ~ http://code.google.com/p/pihipi
blok ~ http://www.tagarga.com/blok
May 2 '07 #10

Mike P2 wrote:
> Toby A Inkster wrote:
>> However, yours will replace:
>> <a href="...">my car is very fuel-efficient</a>
>
> Actually, it will not. '>' is not an accepted character in either
> character group.

Sorry -- hadn't noticed that you'd made "my car" the link target instead
of "car", which was what the OP had requested. OK then, yours will screw
up when it sees this as input:

<a href="...">and my car is very fuel-efficient</a>
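Running Toby's counter-example through Mike's main pattern confirms the problem: the keyword has spaces on both sides inside the existing link, so it gets wrapped a second time. Moving whitespace off the edges of the link text does not help here, because these spaces are in the middle of it.

```php
<?php
// Toby's counter-example run through Mike's main pattern: "my car" has
// spaces on both sides inside the existing link, so it is wrapped again.
$html = '<a href="...">and my car is very fuel-efficient</a>';
$out  = preg_replace('#([\s\(])my car([\s\)\.])#i', '$1<a href="...">my car</a>$2', $html);
echo $out, "\n";
// <a href="...">and <a href="...">my car</a> is very fuel-efficient</a>
```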
--
Toby A Inkster BSc (Hons) ARCS
http://tobyinkster.co.uk/
Geek of ~ HTML/SQL/Perl/PHP/Python/Apache/Linux
May 2 '07 #11

gosha bine wrote:
> Well, over 30 hours and still no correct answer... weird ;)

8-O (That's an "emoticon", not the Brazil/Andorra football results.)

http://message-id.net/rb************@ophelia.g5n.co.uk
--
Toby A Inkster BSc (Hons) ARCS
http://tobyinkster.co.uk/
Geek of ~ HTML/SQL/Perl/PHP/Python/Apache/Linux
May 2 '07 #12
