grep puzzle

Hi,

This should test you gurus. I want a function that accepts text as an
argument and converts all & into &amp; except where it is already an HTML
character entity such as &nbsp;, &quot;, and of course &amp;.

If there is already a PHP function for this I would like to know, but
if not, what is the GREP equivalent?

Thanks
Jul 17 '05 #1
6 Replies


*** KhanyBoy wrote (10 Jun 2004 14:22:24 -0700):
This should test you gurus. I want a function that accepts text as an
argument and converts all & into &amp; except where it is already an HTML
character entity such as &nbsp;, &quot;, and of course &amp;.


I can't figure out how you manage to get such garbled input. Anyway, I
guess a combination of html_entity_decode() and htmlentities() should
help.
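
A minimal sketch of that suggestion, assuming UTF-8 input and PHP 7 or later for the type declarations. The helper names normalize_entities() and escape_once() are made up for illustration and do not come from the thread. The first function decodes and re-encodes; the second relies on the double_encode argument that htmlspecialchars() has accepted since PHP 5.2.3, which did not exist when this thread was written.

```php
<?php
// Hypothetical helper names; neither appears in the thread.

// Decode every entity PHP recognises, then encode the result again.
// Side effects: <, > and quotes are escaped too, and non-ASCII
// characters come back as named entities.
function normalize_entities(string $text): string
{
    $flags = ENT_QUOTES | ENT_HTML401;
    $decoded = html_entity_decode($text, $flags, 'UTF-8');
    return htmlentities($decoded, $flags, 'UTF-8');
}

// Alternative: the fourth argument (double_encode = false) tells
// htmlspecialchars() to leave existing entities alone.
function escape_once(string $text): string
{
    return htmlspecialchars($text, ENT_NOQUOTES, 'UTF-8', false);
}

$input = 'Tom & Jerry &amp; friends&nbsp;here';
echo normalize_entities($input), "\n";
echo escape_once($input), "\n";
```

Both calls also escape < and >, so neither is a drop-in answer if the text is allowed to contain markup; in that case a regex that touches only the ampersands is the safer route.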
--
Álvaro G. Vicario - Burgos, Spain
Jul 17 '05 #2

KhanyBoy wrote:
This should test you gurus. I want a function that accepts text as an
argument and converts all & into &amp; except where it is already an HTML
character entity such as &nbsp;, &quot;, and of course &amp;.

If there is already a PHP function for this I would like to know, but
if not, what is the GREP equivalent?


OK, this does that, but it may not be a very elegant solution. I recently
needed the same functionality for a project involving osCommerce.

function ampersandFix($x){
    // First escape every ampersand...
    $x=str_replace('&','&amp;',$x);
    // ...then undo the escaping wherever it hit an existing entity.
    $pattern='`&amp;(#[0-9]{2,3}|aacute|acirc|acute|aelig|agrave|amp'.
        '|aring|atilde|auml|brvbar|brkbar|ccedil|cedil|cent'.
        '|copy|curren|deg|divide|eacute|ecirc|egrave|eth|euml'.
        '|frac12|frac14|frac34|gt|iacute|icirc|iexcl|igrave'.
        '|iquest|iuml|laquo|lt|macr|hibar|micro|middot|nbsp|not'.
        '|ntilde|oacute|ocirc|ograve|ordf|ordm|oslash|otilde'.
        '|ouml|para|plusmn|pound|quot|raquo|reg|sect|shy|sup1|sup2'.
        '|sup3|szlig|thorn|times|uacute|ucirc|ugrave|uml'.
        '|die|uuml|yacute|yen|yuml);`i';
    $replace='&$1;';
    return preg_replace($pattern,$replace,$x);
}
--
Justin Koivisto - sp**@koivi.com
PHP POSTERS: Please use comp.lang.php for PHP related questions,
alt.php* groups are not recommended.
Jul 17 '05 #3

Justin Koivisto wrote:
KhanyBoy wrote:
If there is already a PHP function for this I would like to know, but
if not, what is the GREP equivalent?


OK, this does that, but it may not be a very elegant solution. I recently
needed the same functionality for a project involving osCommerce.

function ampersandFix($x){
$x=str_replace('&','&amp;',$x);
$pattern='`&amp;(#[0-9]{2,3}|aacute|acirc|acute|aelig|agrave|amp'.
'|aring|atilde|auml|brvbar|brkbar|ccedil|cedil|cent'.
'|copy|curren|deg|divide|eacute|ecirc|egrave|eth|euml'.
'|frac12|frac14|frac34|gt|iacute|icirc|iexcl|igrave'.
'|iquest|iuml|laquo|lt|macr|hibar|micro|middot|nbsp|not'.
'|ntilde|oacute|ocirc|ograve|ordf|ordm|oslash|otilde'.
'|ouml|para|plusmn|pound|quot|raquo|reg|sect|shy|sup1|sup2'.
'|sup3|szlig|thorn|times|uacute|ucirc|ugrave|uml'.
'|die|uuml|yacute|yen|yuml);`i';
$replace='&$1;';
return preg_replace($pattern,$replace,$x);
}


A lot faster, but not as accurate:
$text = preg_replace('!&(?![#a-z0-9]{1,7};)!i','&amp;',$text);

You could also use this method (it's a lookahead assertion) with
Justin's function, which will still be a lot faster than his hack ;)
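
To make the trade-off concrete, here is a small sketch of the lookahead approach wrapped in a function. The name escape_bare_ampersands() is hypothetical, and the type declarations assume PHP 7 or later. The negative lookahead (?!...) only checks that the ampersand is NOT followed by something shaped like an entity; it does not check that the entity exists.

```php
<?php
// Hypothetical wrapper around the one-line lookahead pattern.
function escape_bare_ampersands(string $text): string
{
    // Replace "&" unless it is followed by 1-7 characters from
    // [#a-z0-9] and a semicolon, i.e. something entity-shaped.
    return preg_replace('/&(?![#a-z0-9]{1,7};)/i', '&amp;', $text);
}

echo escape_bare_ampersands('a & b &nbsp; &bogus; AT&T &thetasym;'), "\n";
```

This is where "not as accurate" shows up: &bogus; is left alone even though no such entity exists, while &thetasym;, a real HTML 4 entity whose name is eight characters long, gets its ampersand escaped because it exceeds the seven-character limit.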

Greetings Christian.
Jul 17 '05 #4

KhanyBoy wrote:
This should test you gurus. I want a function that accepts text as an
argument and converts all & into &amp; except where it is already an HTML
character entity such as &nbsp;, &quot;, and of course &amp;.


Why are entity references recognised in text?

--
Jock
Jul 17 '05 #5

Regarding this well-known quote, often attributed to KhanyBoy's famous "10
Jun 2004 14:22:24 -0700" speech:
Hi,

This should test you gurus. I want a function that accepts text as an
argument and converts all & into &amp; except where it is already an HTML
character entity such as &nbsp;, &quot;, and of course &amp;.

If there is already a PHP function for this I would like to know, but
if not, what is the GREP equivalent?

Thanks


It might be a different direction, but here are some functions to determine
whether something is an HTML entity, using the built-in PHP entity
functions. It's sort of a recycled reply to an earlier question:

A simple (rough) test of the concept is online at:
http://php.pixelsaredead.com/htmlentities.php

<?php
/*
How to determine whether a given string decodes to an HTML entity
*/

function contains_entities($raw)
{
    // $raw can be a string of any length
    $raw = trim($raw);
    return (strlen(htmlentities($raw)) > strlen($raw));
}

function is_entity_reference($raw)
{
    // $raw should be a string with only the entity ref in it,
    // in the form "&...;"

    return (preg_match('/&.+;/', $raw)) &&
        (strlen(html_entity_decode(trim($raw))) == 1);
}
?>
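
One caveat with the strlen(...) == 1 test: strlen() counts bytes, and on current PHP versions html_entity_decode() defaults to UTF-8, where a character such as the non-breaking space decodes to two bytes. A possible variant, sketched here under the assumption of UTF-8 and PHP 7 or later, relies instead on the fact that html_entity_decode() leaves references it does not recognise unchanged. The name is_known_entity() is hypothetical.

```php
<?php
// Hypothetical variant of is_entity_reference() that does not depend
// on the byte length of the decoded character.
function is_known_entity(string $raw): bool
{
    $raw = trim($raw);
    // Must be exactly one reference of the form "&name;" or "&#123;".
    if (!preg_match('/^&[#a-z0-9]+;$/i', $raw)) {
        return false;
    }
    // Unknown references come back unchanged from html_entity_decode().
    $decoded = html_entity_decode($raw, ENT_QUOTES | ENT_HTML401, 'UTF-8');
    return $decoded !== $raw;
}

var_dump(is_known_entity('&nbsp;'));
var_dump(is_known_entity('&bogus;'));
```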
--
-- Rudy Fleminger
-- sp@mmers.and.evil.ones.will.bow-down-to.us
(put "Hey!" in the Subject line for priority processing!)
-- http://www.pixelsaredead.com
Jul 17 '05 #6

Christian Fersch wrote:
Justin Koivisto wrote:
KhanyBoy wrote:
If there is already a PHP function for this I would like to know, but
if not, what is the GREP equivalent?

OK, this does that, but it may not be a very elegant solution. I
recently needed the same functionality for a project involving
osCommerce.

function ampersandFix($x){
$x=str_replace('&','&amp;',$x);
$pattern='`&amp;(#[0-9]{2,3}|aacute|acirc|acute|aelig|agrave|amp'.
'|aring|atilde|auml|brvbar|brkbar|ccedil|cedil|cent'.
'|copy|curren|deg|divide|eacute|ecirc|egrave|eth|euml'.
'|frac12|frac14|frac34|gt|iacute|icirc|iexcl|igrave'.
'|iquest|iuml|laquo|lt|macr|hibar|micro|middot|nbsp|not'.
'|ntilde|oacute|ocirc|ograve|ordf|ordm|oslash|otilde'.
'|ouml|para|plusmn|pound|quot|raquo|reg|sect|shy|sup1|sup2'.
'|sup3|szlig|thorn|times|uacute|ucirc|ugrave|uml'.
'|die|uuml|yacute|yen|yuml);`i';
$replace='&$1;';
return preg_replace($pattern,$replace,$x);
}

A lot faster, but not as accurate:
$text = preg_replace('!&(?![#a-z0-9]{1,7};)!i','&amp;',$text);

You could also use this method (it's a lookahead assertion) with
Justin's function, which will still be a lot faster than his hack ;)


Let's fix the hack then, shall we?

function ampersandFix($x){
    $pattern='`&(?!(#[0-9]{2,3}|aacute|acirc|acute|aelig|agrave|amp'.
        '|aring|atilde|auml|brvbar|brkbar|ccedil|cedil|cent'.
        '|copy|curren|deg|divide|eacute|ecirc|egrave|eth|euml'.
        '|frac12|frac14|frac34|gt|iacute|icirc|iexcl|igrave'.
        '|iquest|iuml|laquo|lt|macr|hibar|micro|middot|nbsp|not'.
        '|ntilde|oacute|ocirc|ograve|ordf|ordm|oslash|otilde'.
        '|ouml|para|plusmn|pound|quot|raquo|reg|sect|shy|sup1|sup2'.
        '|sup3|szlig|thorn|times|uacute|ucirc|ugrave|uml'.
        '|die|uuml|yacute|yen|yuml);)`i';
    return preg_replace($pattern,'&amp;',$x);
}

Now it's faster AND accurate. ;)
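
The hand-typed list covers only part of the HTML 4 entity set, so one possible refinement is to build the alternation from PHP's own table. The sketch below assumes PHP 7 or later and UTF-8; the function name ampersand_fix_from_table() is hypothetical, and accepting numeric references of any length (decimal or hex) is a choice made here, not something taken from the thread.

```php
<?php
// Hypothetical variant of ampersandFix() that takes its entity names
// from get_html_translation_table() instead of a hard-coded list.
function ampersand_fix_from_table(string $text): string
{
    // Values look like "&nbsp;"; strip the "&" and ";" to get the name.
    $table = get_html_translation_table(
        HTML_ENTITIES, ENT_COMPAT | ENT_HTML401, 'UTF-8'
    );
    $names = [];
    foreach ($table as $entity) {
        $names[] = preg_quote(substr($entity, 1, -1), '/');
    }

    // Same negative lookahead as above: escape "&" unless a known
    // named entity or a numeric reference follows.
    $pattern = '/&(?!(?:#[0-9]+|#x[0-9a-f]+|'
        . implode('|', $names) . ');)/i';

    // preg_replace() returns null on error; fall back to the input.
    return preg_replace($pattern, '&amp;', $text) ?? $text;
}

echo ampersand_fix_from_table('a & b &nbsp; &bogus; &#160; &quot;'), "\n";
```

Because the pattern keeps the case-insensitive flag used in the thread, it accepts any capitalisation of a known name; dropping the flag for the named part would be stricter.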

--
Justin Koivisto - sp**@koivi.com
PHP POSTERS: Please use comp.lang.php for PHP related questions,
alt.php* groups are not recommended.
Jul 17 '05 #7
