"Piotr" <pi**@gaztea.pl> wrote in message
news:1e*******************************@40tude.net...
Is there any way to split all merged words except web and e-mail addresses?
I have this regexp:
preg_replace("/(\.)([[:alpha:]])/", "\\1 \\2", "www.google.com any,merged.words my****@domain.com")
It gives me an incorrect result:
www. google. com any, merged. words mymail@domain. com
I need this result:
www.google.com any, merged. words my****@domain.com
In my case, all web addresses have www. or http:// at the beginning of the string,
and e-mail addresses of course have @ inside the string.
Is it possible to write a regexp like this?
No. You would normally use a lookbehind assertion in cases like this, but in
PCRE a lookbehind has to match a fixed-length string. Since a domain name can
be any number of characters long, you can't do it that way.
What you can do is first search for domain names and email addresses,
replacing them with some placeholders, fix the merged words, then replace
the placeholders again. Example:
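// Replace a match with a placeholder that contains no dots or commas.
function encode($m) { return "###" . base64_encode($m[0]) . "###"; }
// Turn a placeholder back into the original text.
function decode($m) { return base64_decode($m[1]); }
$s = "www.google.com any,merged.words my****@domain.com";
// Hide web addresses first, then e-mail addresses.
$s = preg_replace_callback('/\bwww\.[\w\.]+/', 'encode', $s);
$s = preg_replace_callback('/\b[\w\.]+@[\w\.]+/', 'encode', $s);
// Put a space after each comma or dot that is followed by a word character.
$s = preg_replace('/([,.])(\w)/', '\1 \2', $s);
// Restore the hidden addresses.
$s = preg_replace_callback('/###(.*?)###/', 'decode', $s);
echo $s;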
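
Since you mentioned that some of your addresses start with http:// rather than www., here is a rough sketch of the same placeholder trick wrapped in a function that protects both forms. The function name split_merged_words, the URL pattern, and the example.com addresses are my own assumptions for illustration, so adjust them to your data. It also assumes your text never contains "###" on its own and that you are on PHP 5.3 or later (for the closures).

```php
<?php
// Hypothetical helper: the name and the URL pattern are assumptions, not part of the original post.
function split_merged_words($s)
{
    // Swap a protected token for a placeholder; base64 output has no dots or commas.
    $encode = function ($m) { return '###' . base64_encode($m[0]) . '###'; };
    // Turn a placeholder back into the original token.
    $decode = function ($m) { return base64_decode($m[1]); };

    // Protect web addresses that start with http://, https:// or www.
    $s = preg_replace_callback('~\b(?:https?://|www\.)[\w./-]+~', $encode, $s);
    // Protect e-mail addresses (same simple pattern as above).
    $s = preg_replace_callback('/\b[\w.]+@[\w.]+/', $encode, $s);

    // Insert a space after a comma or dot that is directly followed by a word character.
    $s = preg_replace('/([,.])(\w)/', '$1 $2', $s);

    // Restore the protected tokens.
    return preg_replace_callback('/###(.*?)###/', $decode, $s);
}

echo split_merged_words("http://example.com/a.b www.google.com any,merged.words user@example.com");
// prints: http://example.com/a.b www.google.com any, merged. words user@example.com
```

One thing to watch for: an address at the very end of a sentence will swallow the trailing dot, because the character class allows dots. If that matters, you would need to tighten the patterns.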