Complex replace in php 4

I searched for, and tried to write (with no luck), a function to do the
following.

I have a string that may be:

"Le'ts go to <a href="my.htm">my car</a>. Tomorrow I'll have to buy a
new car. My new car is <em>red</em>! Please don't think to be in Nascar!!"

What I have to do is replace occurrences of "car" with <a
href="/...">car</a>, BUT not in these cases:

- if it is already wrapped in a link
- if "car" is part of another word

Also, I'm using PHP 4, so I can't use str_ireplace() for a case-insensitive
replace.

Can you help me?

Regards.

--
Fabri
Tag Wii: 8680 1598 2246 2466
http://www.consolerecords.it/forum/viewtopic.php?t=217
Apr 30 '07 #1
On Apr 30, 6:13 pm, Fabri <farsi.i.cazzi.pro...@mai.eh> wrote:
> I searched and tried to develop (with no luck) a function to do the
> following:
>
> I have a string that may be:
>
> "Le'ts go to <a href="my.htm">my car</a>. Tomorrow I'll have to buy a
> new car. My new car is <em>red</em>! Please don't think to be in Nascar!!"
>
> What I have to do is replace occurrences of "car" with <a
> href="/...">car</a> BUT not in these cases:
>
> - if there is already a wrapped link
> - if car is part of another word

Regular expressions. preg_replace() uses Perl-compatible regular
expressions; that's just my preference. You can use the POSIX-style PHP
regex functions, too (ereg_replace()).

$string = preg_replace( '#([\s\(])my car([\s\)\.])#i',
    '$1<a href="...">my car</a>$2', $string );

The character groups (character classes) on either side of "my car" allow
"my car" to be next to newlines, spaces, tabs, parentheses, and periods.
You can take away the character groups, and the $1 and $2 in the second
argument, to let it be replaced in any context, even in the middle of some
text with no spaces. If you want to learn more about regular expressions,
there might be a newsgroup for that too, and of course your favorite
search engine will be a big help. There are entire books written on
regex.

preg_replace():
http://php.net/manual/en/function.preg-replace
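
As a rough check of how those character groups behave, here is a small sketch that runs the same idea against the sample string from the question. It searches for "car" rather than "my car", and the /cars.htm target is made up for illustration; treat it as an experiment, not a finished solution.

```php
<?php
// Sketch only: the '/cars.htm' link target is an illustrative assumption.
$text = 'Le\'ts go to <a href="my.htm">my car</a>. Tomorrow I\'ll have to buy a '
      . 'new car. My new car is <em>red</em>! Please don\'t think to be in Nascar!!';

// Same character groups as in the post above, but searching for "car".
$pattern = '#([\s\(])car([\s\)\.])#i';

// Count the matches, then do the replacement.
$count  = preg_match_all($pattern, $text, $matches);
$linked = preg_replace($pattern, '$1<a href="/cars.htm">car</a>$2', $text);

echo $count, "\n";   // 2: "new car." and "new car is"
echo $linked, "\n";
```

The "car" in "my car</a>" is skipped only because it is followed by "<", and "Nascar" is skipped because "car" is preceded by a letter. A link whose text continues after the word (for example "my car is nice" inside an anchor) would still be matched.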

Apr 30 '07 #2
On May 1, 12:13 am, Fabri <farsi.i.cazzi.pro...@mai.eh> wrote:
> I searched and tried to develop (with no luck) a function to do the
> following:
>
> I have a string that may be:
>
> "Le'ts go to <a href="my.htm">my car</a>. Tomorrow I'll have to buy a
> new car. My new car is <em>red</em>! Please don't think to be in Nascar!!"
>
> What I have to do is replace occurrences of "car" with <a
> href="/...">car</a> BUT not in these cases:
>
> - if there is already a wrapped link
> - if car is part of another word
>
> Also, I'm using php4 so I can't use str_ireplace for case insensitive
> replace.
>
> Can you help me?
>
> Regards.
>
> --
> Fabri
> Tag Wii: 8680 1598 2246 2466
> http://www.consolerecords.it/forum/viewtopic.php?t=217

Ah, you also need to avoid doing the replacement when the word appears
in an HTML attribute. For example:

<a href="http://www.google.pl/search?q=car"> ... </a>

A simple search and replace, even with regular expressions, isn't always
going to work. You will need to parse the HTML to some degree. Where
is this text coming from? HTML that can contain JavaScript will
require a full-fledged parser.
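
To make the attribute problem concrete, here is a minimal sketch of what a plain whole-word replace does to such a link. The input string and the /cars.htm target are illustrative assumptions.

```php
<?php
// Illustrative input; the '/cars.htm' link target is an assumption.
$html = '<a href="http://www.google.pl/search?q=car">search</a> for a car';

// Naive approach: replace every whole-word "car", wherever it appears.
$naive = preg_replace('#\bcar\b#i', '<a href="/cars.htm">$0</a>', $html);

// The "car" inside the href attribute is rewritten too, which breaks the tag.
echo $naive, "\n";
```

The word boundary \b sees "=car" followed by a quote as a whole word, so the markup inside the attribute is corrupted along with the intended replacement in the text.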

Apr 30 '07 #3
I assumed that if it were wrapped in an anchor tag there would be no
whitespace on the inside of the anchor tag. It won't replace the
following:

<a href="...">my car</a>

unless he takes out the character groups

-Mike PII

Apr 30 '07 #4

"Fabri" <fa******************@mai.eh> wrote in message
news:46**********************@reader4.news.tin.it...
> I searched and tried to develop (with no luck) a function to do the
> following:
> I have a string that may be:
>
> "Le'ts go to <a href="my.htm">my car</a>. Tomorrow I'll have to buy a new
> car. My new car is <em>red</em>! Please don't think to be in Nascar!!"
> What I have to do is replace occurrences of "car" with <a
> href="/...">car</a> BUT not in these cases:
>
> - if there is already a wrapped link
> - if car is part of another word
> Also, I'm using php4 so I can't use str_ireplace for case insensitive
> replace.
>
> Can you help me?

You want a basic recursive parser, but it's probably overly complicated for
what you need.

It's better if you can add some structural information to the tag that will
be ignored by HTML. This will help you search for "car" more efficiently.

You could do something like

"Le'ts go to <a href="my.htm">my car</a>. Tomorrow I'll have to buy a new
car. My new <car>car</car> is <em>red</em>! Please don't think to be in
Nascar!!"

and then just search for <car>car</car> and replace it with the link.

You could also do something like

<span class="MyNewCar">car</span>

and essentially do the same, with the added bonus that you can modify the
style using CSS.

I.e., search for the <span class="MyNewCar"> element and replace it with

<a href="..."><span class="MyNewCar">car</span></a>

Now, if you can do the processing offline, you may want to write a simple
recursive parser. What you do here is search for all instances of "car" and
then search backwards to make sure they are not contained in any <a href>
tags. The issue here is that, theoretically, it could take a very long time
to do this.

Since you are making "car" something special, I would imagine you could just
add some structural information to it to mark it as special. If you are
worried about applying the same thing twice, so that you get something like

<a href="..."><a href="..."><span class="MyNewCar">car</span></a></a>

then it's pretty easy to check for that and prevent it.

I would suggest you play around with it using simple examples and see what
you come up with. It's essentially just searching, and I don't think you'll
need more than that (and I doubt you'll need regular expressions).
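
A minimal sketch of the marker idea, assuming the text has already been tagged with the MyNewCar span from above. The function name link_marked() and the /cars.htm target are hypothetical; only str_replace() is used, so it should also run on PHP 4.

```php
<?php
// Hypothetical helper: link_marked() and '/cars.htm' are made up for this sketch.
function link_marked($html, $href)
{
    $marker  = '<span class="MyNewCar">car</span>';
    $wrapped = '<a href="' . $href . '">' . $marker . '</a>';

    // Undo any earlier wrapping first, so a second pass cannot nest links.
    $html = str_replace($wrapped, $marker, $html);

    // Now wrap every marker exactly once.
    return str_replace($marker, $wrapped, $html);
}

$text  = 'My new <span class="MyNewCar">car</span> is <em>red</em>!';
$once  = link_marked($text, '/cars.htm');
$twice = link_marked($once, '/cars.htm');

echo $once, "\n";
```

Running the function twice gives the same result as running it once, which is one simple way to do the "check to prevent that" mentioned above.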

Jon

May 1 '07 #5
Fabri wrote:
> "Le'ts go to <a href="my.htm">my car</a>. Tomorrow I'll have to buy a
> new car. My new car is <em>red</em>! Please don't think to be in Nascar!!"

Try the following... [Credits due: Brad Choate, John Gruber, Matthew
McGlynn and Alex Rosenberg for the _tokenize() function.]

<?php

// How many times do you want to allow the tokenizer to loop?
// The higher the value, the longer your system could churn
// given an infinite-loop bug (or a really, really long text string).
define('MAX_TOKENIZER_LOOPS', 2000);

// Print an error on a tokenizer loop problem?
define('ADVISE_TOKENIZER_FAILURE', FALSE);

// Values for the 'type' key of each token
define('TOKENS_TYPE_TEXT', 'text');
define('TOKENS_TYPE_TAG', 'tag');

function _tokenize(&$str, &$tokens) {
#
#   Parameters: Reference to string containing HTML markup,
#               reference to array to store results.
#
#   Output array contains tokens comprising the input
#   string. Each token is either a tag (possibly with nested
#   tags contained therein, such as <a href="<MTFoo>">), or a
#   run of text between tags. Each element of the array is a
#   two-element array: 'type' is either 'tag' or 'text';
#   'body' is the actual value.
#
#   Based on the _tokenize() subroutine from Brad Choate's MTRegex plugin.
#   <http://www.bradchoate.com/past/mtregex.php>
    $len = strlen($str);

    // Build a pattern that matches tags nested up to $depth levels deep.
    $depth = 6;
    $nested_tags = str_repeat('(?:<(?:[^<>]|', $depth);
    $nested_tags = substr($nested_tags, 0, -1);
    $nested_tags .= str_repeat(')*>)', $depth);

    // Comments, processing instructions, or (possibly nested) tags.
    $match = "/(?s: <! ( -- .*? -- \s* )+ > ) |
               (?s: <\? .*? \?> ) |
               $nested_tags/x";

    $last_tag_end = -1;
    $loops = $offset = 0;

    //433 PHP 4.3.3 is required for this
    //433 while (preg_match($match, $str, $hits, PREG_OFFSET_CAPTURE, $offset)) {
    while (preg_match($match, substr($str, $offset), $hits, PREG_OFFSET_CAPTURE)) {

        $extracted_tag = $hits[0][0]; // contains the full HTML tag
        //433 $tag_start = (int)$hits[0][1]; // position of the tag in the string
        $tag_start = $offset + (int)$hits[0][1]; // position of the tag in the string
        // Resume after this tag, so a tag nested inside it is not matched again.
        $offset = $tag_start + strlen($extracted_tag);

        // If this tag isn't next to the previous one, store the interstitial text.
        if ($tag_start > $last_tag_end + 1) {
            $tokens[] = array('type' => TOKENS_TYPE_TEXT,
                              'body' => substr($str, $last_tag_end + 1, $tag_start - $last_tag_end - 1));
        }

        $tokens[] = array('type' => TOKENS_TYPE_TAG,
                          'body' => $extracted_tag);

        $last_tag_end = $tag_start + strlen($extracted_tag) - 1;

        if ($loops++ > MAX_TOKENIZER_LOOPS) {

            if (ADVISE_TOKENIZER_FAILURE) {
                print "SmartyPants _tokenize failure.";
            }
            return;
        }
    }
    // If text remains after the close of the last tag, grab it.
    if ($offset < $len) {
        $tokens[] = array('type' => TOKENS_TYPE_TEXT,
                          'body' => substr($str, $last_tag_end + 1));
    }

    return;

}

/**
 * Make a particular word in an HTML string into a link.
 *
 * @copyright Copyright (C) 2007 Toby A Inkster
 * @param string $haystack HTML string to search through.
 * @param string $needle Word or phrase to find.
 * @param string $link Link to add to this word. Opt; default Wikipedia.
 * @param boolean $case_sensitive Matching sensitivity. Opt; FALSE.
 */
function linkity ($haystack, $needle, $link='', $case_sensitive=FALSE)
{
    if ($link=='')
        $link = 'http://en.wikipedia.org/wiki/'.ucfirst($needle);

    // preg_quote() keeps regex metacharacters in the needle from being interpreted.
    $regexp = '#\b('.preg_quote($needle, '#').')\b#'.($case_sensitive?'':'i');
    $inlink = FALSE;
    $out = '';

    $tokens = array();
    _tokenize($haystack, $tokens);

    foreach ($tokens as $t)
    {
        if ($t['type']==TOKENS_TYPE_TAG)
        {
            // The \b stops <abbr>, <address>, etc. from being mistaken for <a>.
            if (preg_match('#^<a\b#i', $t['body']))
                $inlink = TRUE;
            elseif (preg_match('#^</a\b#i', $t['body']))
                $inlink = FALSE;
            $out .= $t['body'];
        }
        else
        {
            // Text inside an existing link is copied through untouched.
            if ($inlink)
                $out .= $t['body'];
            else
                $out .= preg_replace($regexp,
                    "<a href=\"{$link}\">$1</a>",
                    $t['body']);
        }
    }
    return $out;
}

# Test -- should only link the second and third occurrences of the word 'car'.
$str = 'Le\'ts go to <a href="my.htm">my car</a>. Tomorrow I\'ll have to buy
a new car. My new car is <em>red</em>! Please don\'t think to be in Nascar!!';
print linkity($str, 'car')."\n";

?>
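
For comparison, here is a shorter sketch of the same tag/text idea that splits the string with a single preg_split() call instead of _tokenize(). It swaps in a simpler technique: it assumes tags never contain a nested "<" or ">", so it is less thorough than the tokenizer above. The names link_word() and /cars.htm are hypothetical.

```php
<?php
// Sketch under stated assumptions: no nested tags, no "<" or ">" inside attributes.
// link_word() and '/cars.htm' are hypothetical names used for illustration.
function link_word($html, $word, $href)
{
    // With PREG_SPLIT_DELIM_CAPTURE, even indexes hold text and odd indexes hold tags.
    $parts  = preg_split('#(<[^<>]*>)#', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $regexp = '#\b' . preg_quote($word, '#') . '\b#i';
    $inlink = false;
    $out    = '';

    foreach ($parts as $i => $part) {
        if ($i % 2 == 1) {
            // A tag: track whether we are inside <a>...</a>, then copy it through.
            if (preg_match('#^<a\b#i', $part)) {
                $inlink = true;
            } elseif (preg_match('#^</a\b#i', $part)) {
                $inlink = false;
            }
            $out .= $part;
        } elseif ($inlink) {
            // Text inside an existing link stays as it is.
            $out .= $part;
        } else {
            // Text outside any link: wrap whole-word matches.
            $out .= preg_replace($regexp, '<a href="' . $href . '">$0</a>', $part);
        }
    }
    return $out;
}

echo link_word('<a href="/search?q=car">search</a> car', 'car', '/cars.htm'), "\n";
```

Because tags are copied through untouched, a "car" inside an href attribute is left alone, and only text outside existing links is rewritten.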
--
Toby A Inkster BSc (Hons) ARCS
http://tobyinkster.co.uk/
Geek of ~ HTML/SQL/Perl/PHP/Python/Apache/Linux
May 1 '07 #6
Mike P2 wrote:
> I assumed that if it were wrapped in an anchor tag there would be no
> whitespace on the inside of the anchor tag. It won't replace the
> following:
>
> <a href="...">my car</a>
>
> unless he takes out the character groups

However, yours will replace:

<a href="...">my car is very fuel-efficient</a>

--
Toby A Inkster BSc (Hons) ARCS
http://tobyinkster.co.uk/
Geek of ~ HTML/SQL/Perl/PHP/Python/Apache/Linux
May 1 '07 #7
On May 1, 6:54 am, Toby A Inkster <usenet200...@tobyinkster.co.uk>
wrote:
> Mike P2 wrote:
> > I assumed that if it were wrapped in an anchor tag there would be no
> > whitespace on the inside of the anchor tag. It won't replace the
> > following:
> > <a href="...">my car</a>
> > unless he takes out the character groups
>
> However, yours will replace:
>
> <a href="...">my car is very fuel-efficient</a>

Actually, it will not. '>' is not an accepted character in either
character group.

BTW, as I mentioned before, my idea assumes there will be no
whitespace on the ends of the content of the link if there is one
already. That can be fixed like this:

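<?php
$search = 'my car';
$link = '...';
$string = 'my car is very fuel-efficient';

// Pad every occurrence with spaces so the main pattern can match it.
// Note: str_ireplace() needs PHP 5; on PHP 4 use
// preg_replace( "#$search#i", ' $0 ', $string ) instead.
$string = str_ireplace( $search, " $search ", $string );
// Move whitespace just inside <a ...> and </a> to the outside of the tags.
$string = preg_replace( '#(<a[^>]+>)(\s+)#i', '$2$1', " $string" );
$string = preg_replace( '#(\s+)</a>#i', '</a>$1', $string );
// Link the occurrences that are now surrounded by the allowed characters.
$string = preg_replace( "#([\s\(])$search([\s\)\.])#i",
    "$1<a href='$link'>$search</a>$2", $string );

echo $string;
?>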

The idea is to add those two extra preg_replace()s before the main one;
they move whitespace on the edges from inside to outside of the
anchor tags. The middle preg_replace() may be optional, since the last
one will not work if the words are butted up against the open tag
anyway. Finally, I added that str_ireplace() so it can even
replace the keywords when they are next to or inside of some other tag.
If you think this is too slow, consider taking out case insensitivity or
the middle preg_replace() (or maybe the first one).
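
To see what the two whitespace-moving steps do on their own, here is a small sketch. The input string and the href value "x" are placeholders chosen for illustration.

```php
<?php
// Illustrative input only; the href value "x" is a placeholder.
$string = 'see <a href="x"> my car </a> now';

// Leading whitespace inside the link moves in front of the opening tag...
$string = preg_replace('#(<a[^>]+>)(\s+)#i', '$2$1', $string);
// ...and trailing whitespace inside the link moves behind the closing tag.
$string = preg_replace('#(\s+)</a>#i', '</a>$1', $string);

echo $string, "\n";
```

After both steps the link text is butted directly against its tags, so the main pattern, which needs whitespace, a parenthesis, or a period next to the keyword, no longer matches inside the link.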

-Mike PII

May 1 '07 #8
Oh yeah, and with that example I just posted, if you are going to
replace multiple keywords, you only need to run the first two
preg_replace()s (or one of them, if you are only going to use one of
them) once. If you plan to make a function out of this, take out the first
two preg_replace()s and run them once separately before calling the
function.

-Mike PII

May 1 '07 #9
On 01.05.2007 00:13 Fabri wrote:
> I searched and tried to develop (with no luck) a function to do the
> following:
> I have a string that may be:
>
> "Le'ts go to <a href="my.htm">my car</a>. Tomorrow I'll have to buy a
> new car. My new car is <em>red</em>! Please don't think to be in Nascar!!"
> What I have to do is replace occurrences of "car" with <a
> href="/...">car</a> BUT not in these cases:
>
> - if there is already a wrapped link
> - if car is part of another word
> Also, I'm using php4 so I can't use str_ireplace for case insensitive
> replace.
>
> Can you help me?
>
> Regards.

Well, over 30 hours and still no correct answer... weird ;)

How about this:
$text = <<<EE
"Le'ts go to <a href="my.htm">my car</a>.
Tomorrow I'll have to buy a
new car. My new car is <em>red</em>!
Please don't think to be in Nascar!!"
EE;

echo preg_replace(
    '~\bcar\b(?![^<>]*</a>)~i',
    "<a href='zzz'>$0</a>",
    $text);

If you need comments, feel free to ask.
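
For readers who want the comments anyway, here is a hedged sketch of how that pattern behaves. The negative lookahead (?![^<>]*</a>) rejects a "car" when the text after it reaches a closing </a> without crossing any other tag. The two input strings are illustrative assumptions, not taken from the thread.

```php
<?php
// The pattern is the one from the post above; the inputs are illustrative.
$pattern = '~\bcar\b(?![^<>]*</a>)~i';
$replace = '<a href="zzz">$0</a>';

// "car" directly inside a link: "</a>" follows with no tag in between, so it is skipped.
// The second "car" has no "</a>" ahead of it, so it is linked.
$a = preg_replace($pattern, $replace, '<a href="my.htm">my car</a> and a new car.');

// "car" inside an attribute: what follows is '">', not "</a>", so it is still replaced.
$b = preg_replace($pattern, $replace, '<a href="/search?q=car">search</a>');

echo $a, "\n";
echo $b, "\n";
```

So the one-liner handles the sample string from the question, but it shares the attribute problem raised earlier in the thread, and it would also link a "car" that sits inside another tag nested within a link.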

--
gosha bine

extended php parser ~ http://code.google.com/p/pihipi
blok ~ http://www.tagarga.com/blok
May 2 '07 #10
Mike P2 wrote:
> Toby A Inkster wrote:
> > However, yours will replace:
> > <a href="...">my car is very fuel-efficient</a>
>
> Actually, it will not. '>' is not an accepted character in either
> character group.

Sorry -- hadn't noticed that you'd made "my car" the link target instead
of "car", which was what the OP had requested. OK then, yours will screw
up when it sees this as input:

<a href="...">and my car is very fuel-efficient</a>
--
Toby A Inkster BSc (Hons) ARCS
http://tobyinkster.co.uk/
Geek of ~ HTML/SQL/Perl/PHP/Python/Apache/Linux
May 2 '07 #11
gosha bine wrote:
> Well, over 30 hours and still no correct answer... weird ;)

8-O (That's an "emoticon", not the Brazil/Andorra football results.)

http://message-id.net/rb************@ophelia.g5n.co.uk
--
Toby A Inkster BSc (Hons) ARCS
http://tobyinkster.co.uk/
Geek of ~ HTML/SQL/Perl/PHP/Python/Apache/Linux
May 2 '07 #12
