grep puzzle

Hi,

This should test you gurus. I want a function that accepts text as an
argument and converts every & into &amp;, except where it is already part
of an HTML character entity such as &nbsp;, &quot;, and of course &amp;.

If there is already a PHP function for this I would like to know, but
if not, what is the grep equivalent?

Thanks
Jul 17 '05 #1
*** KhanyBoy wrote (10 Jun 2004 14:22:24 -0700):
This should test you gurus. I want a function that accepts text as an
argument and converts every & into &amp;, except where it is already part
of an HTML character entity such as &nbsp;, &quot;, and of course &amp;.


I can't figure out how you managed to get such garbled input. Anyway, I
guess a combination of html_entity_decode() and htmlentities() should
help.
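
One way to read the suggestion above is as a round trip: decode every entity PHP knows, then encode the result again. The sketch below shows that, plus the double_encode parameter of htmlspecialchars(), which skips existing entities directly. The function names reencodeEntities and escapeOnce are made up for illustration, and the type hints assume PHP 7 or later.

```php
<?php
// Round trip: decode every known entity, then encode the plain text again.
function reencodeEntities(string $text): string
{
    $decoded = html_entity_decode($text, ENT_QUOTES | ENT_HTML401, 'UTF-8');
    return htmlentities($decoded, ENT_QUOTES | ENT_HTML401, 'UTF-8');
}

// Built-in shortcut: with double_encode = false, existing entities are left alone.
function escapeOnce(string $text): string
{
    return htmlspecialchars($text, ENT_NOQUOTES, 'UTF-8', false);
}

$input = 'Fish & chips &amp; peas';
echo reencodeEntities($input), "\n";
echo escapeOnce($input), "\n";
```

Both calls also escape < and >, so neither is a drop-in answer if the text contains HTML markup that has to survive.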
--
Álvaro G. Vicario - Burgos, Spain
Jul 17 '05 #2
KhanyBoy wrote:
This should test you gurus. I want a function that accepts text as an
argument and converts every & into &amp;, except where it is already part
of an HTML character entity such as &nbsp;, &quot;, and of course &amp;.

If there is already a PHP function for this I would like to know, but
if not, what is the grep equivalent?


OK, this does that, but it may not be a very elegant solution. I recently
needed the same functionality for a project involving osCommerce.

function ampersandFix($x){
    // Step 1: escape every ampersand, including those that start an entity.
    $x=str_replace('&','&amp;',$x);
    // Step 2: undo the escaping wherever a known entity name follows.
    $pattern='`&amp;(#[0-9]{2,3}|aacute|acirc|acute|aelig|agrave|amp'.
        '|aring|atilde|auml|brvbar|brkbar|ccedil|cedil|cent'.
        '|copy|curren|deg|divide|eacute|ecirc|egrave|eth|euml'.
        '|frac12|frac14|frac34|gt|iacute|icirc|iexcl|igrave'.
        '|iquest|iuml|laquo|lt|macr|hibar|micro|middot|nbsp|not'.
        '|ntilde|oacute|ocirc|ograve|ordf|ordm|oslash|otilde'.
        '|ouml|para|plusmn|pound|quot|raquo|reg|sect|shy|sup1|sup2'.
        '|sup3|szlig|thorn|times|uacute|ucirc|ugrave|uml'.
        '|die|uuml|yacute|yen|yuml);`i';
    $replace='&$1;';
    return preg_replace($pattern,$replace,$x);
}
--
Justin Koivisto - sp**@koivi.com
PHP POSTERS: Please use comp.lang.php for PHP related questions,
alt.php* groups are not recommended.
Jul 17 '05 #3
Justin Koivisto wrote:
KhanyBoy wrote:
If there is already a PHP function for this I would like to know, but
if not, what is the grep equivalent?


OK, this does that, but it may not be a very elegant solution. I recently
needed the same functionality for a project involving osCommerce.

function ampersandFix($x){
    // Step 1: escape every ampersand, including those that start an entity.
    $x=str_replace('&','&amp;',$x);
    // Step 2: undo the escaping wherever a known entity name follows.
    $pattern='`&amp;(#[0-9]{2,3}|aacute|acirc|acute|aelig|agrave|amp'.
        '|aring|atilde|auml|brvbar|brkbar|ccedil|cedil|cent'.
        '|copy|curren|deg|divide|eacute|ecirc|egrave|eth|euml'.
        '|frac12|frac14|frac34|gt|iacute|icirc|iexcl|igrave'.
        '|iquest|iuml|laquo|lt|macr|hibar|micro|middot|nbsp|not'.
        '|ntilde|oacute|ocirc|ograve|ordf|ordm|oslash|otilde'.
        '|ouml|para|plusmn|pound|quot|raquo|reg|sect|shy|sup1|sup2'.
        '|sup3|szlig|thorn|times|uacute|ucirc|ugrave|uml'.
        '|die|uuml|yacute|yen|yuml);`i';
    $replace='&$1;';
    return preg_replace($pattern,$replace,$x);
}


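A lot faster, but not as accurate, because it accepts anything that merely
looks like an entity (up to seven characters between & and ;):
$text = preg_replace('!&(?![#a-z0-9]{1,7};)!i','&amp;',$text);

You could also use this method (it's a negative lookahead assertion) with
the entity list from Justin's function, which will still be a lot faster
than his hack ;)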
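
To make the "not as accurate" part concrete, here is a small sketch that wraps the one-liner in a function and runs it on a few inputs. The wrapper name quickAmpersandFix and the sample strings are invented for illustration, and the type hints assume PHP 7 or later.

```php
<?php
// Hypothetical wrapper around the one-liner above.
// The negative lookahead (?!...) matches "&" only when it is NOT followed
// by 1 to 7 characters from [#a-z0-9] and a semicolon.
function quickAmpersandFix(string $text): string
{
    return preg_replace('!&(?![#a-z0-9]{1,7};)!i', '&amp;', $text);
}

// Bare ampersands are escaped, existing references are kept.
echo quickAmpersandFix('Tom & Jerry &amp; co. &nbsp; &#169;'), "\n";

// Anything entity-shaped is kept as well, even if no such entity exists.
echo quickAmpersandFix('R&D; &bogus;'), "\n";

// A real entity with a name longer than 7 characters is escaped by mistake.
echo quickAmpersandFix('&thetasym;'), "\n";
```

The last two calls show both directions of the inaccuracy: false entities slip through, and long names such as &thetasym; get double-escaped.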

Greetings Christian.
Jul 17 '05 #4
KhanyBoy wrote:
This should test you gurus. I want a function that accepts text as an
argument and converts every & into &amp;, except where it is already part
of an HTML character entity such as &nbsp;, &quot;, and of course &amp;.


Why are entity references recognised in text?

--
Jock
Jul 17 '05 #5
Regarding this well-known quote, often attributed to KhanyBoy's famous "10
Jun 2004 14:22:24 -0700" speech:
Hi,

This should test you gurus. I want a function that accepts text as an
argument and converts every & into &amp;, except where it is already part
of an HTML character entity such as &nbsp;, &quot;, and of course &amp;.

If there is already a PHP function for this I would like to know, but
if not, what is the grep equivalent?

Thanks


It might be a different direction, but here are some functions to determine
whether something is an HTML entity, using the built-in PHP entity
functions. It's sort of a recycled reply to an earlier question:

A simple (rough) test of the concept is online at:
http://php.pixelsaredead.com/htmlentities.php

<?php
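/*
How to determine whether a given string decodes to an HTML entity
*/

function contains_entities($raw)
{
    // $raw can be a string of any length.
    // True when htmlentities() lengthens the string, i.e. when it contains
    // at least one character that htmlentities() converts to an entity.
    $raw = trim($raw);
    return (strlen(htmlentities($raw)) > strlen($raw));
}

function is_entity_reference($raw)
{
    // $raw should be a string with only the entity ref in it,
    // in the form "&...;"
    // Note: strlen() counts bytes, so this check assumes the reference
    // decodes to a single-byte character.
    return (preg_match('/&.+;/', $raw)) &&
        (strlen(html_entity_decode(trim($raw))) == 1);
}
?>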
--
-- Rudy Fleminger
-- sp@mmers.and.evil.ones.will.bow-down-to.us
(put "Hey!" in the Subject line for priority processing!)
-- http://www.pixelsaredead.com
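
The same idea can be sketched without relying on the byte length of the decoded result. html_entity_decode() leaves references it does not recognise unchanged, so comparing the input with the output is enough. The name isKnownEntityReference is hypothetical, the flags and the UTF-8 charset are assumptions, and the type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper, not part of the post above.
function isKnownEntityReference(string $raw): bool
{
    $raw = trim($raw);

    // Must be exactly one reference: "&name;", "&#123;" or "&#x7B;".
    if (!preg_match('/^&(?:[a-z][a-z0-9]*|#[0-9]+|#x[0-9a-f]+);$/i', $raw)) {
        return false;
    }

    // Unknown references come back unchanged, so any change means
    // the reference was recognised and decoded.
    $decoded = html_entity_decode($raw, ENT_QUOTES | ENT_HTML401, 'UTF-8');
    return $decoded !== $raw;
}

var_dump(isKnownEntityReference('&eacute;')); // a named HTML 4.01 entity
var_dump(isKnownEntityReference('&bogus;'));  // entity-shaped, but unknown
```

Because the comparison does not depend on strlen(), it behaves the same whether the decoded character takes one byte or several.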
Jul 17 '05 #6
Christian Fersch wrote:
Justin Koivisto wrote:
KhanyBoy wrote:
If there is already a PHP function for this I would like to know, but
if not, what is the grep equivalent?

OK, this does that, but it may not be a very elegant solution. I
recently needed the same functionality for a project involving
osCommerce.

function ampersandFix($x){
    // Step 1: escape every ampersand, including those that start an entity.
    $x=str_replace('&','&amp;',$x);
    // Step 2: undo the escaping wherever a known entity name follows.
    $pattern='`&amp;(#[0-9]{2,3}|aacute|acirc|acute|aelig|agrave|amp'.
        '|aring|atilde|auml|brvbar|brkbar|ccedil|cedil|cent'.
        '|copy|curren|deg|divide|eacute|ecirc|egrave|eth|euml'.
        '|frac12|frac14|frac34|gt|iacute|icirc|iexcl|igrave'.
        '|iquest|iuml|laquo|lt|macr|hibar|micro|middot|nbsp|not'.
        '|ntilde|oacute|ocirc|ograve|ordf|ordm|oslash|otilde'.
        '|ouml|para|plusmn|pound|quot|raquo|reg|sect|shy|sup1|sup2'.
        '|sup3|szlig|thorn|times|uacute|ucirc|ugrave|uml'.
        '|die|uuml|yacute|yen|yuml);`i';
    $replace='&$1;';
    return preg_replace($pattern,$replace,$x);
}

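A lot faster, but not as accurate, because it accepts anything that merely
looks like an entity (up to seven characters between & and ;):
$text = preg_replace('!&(?![#a-z0-9]{1,7};)!i','&amp;',$text);

You could also use this method (it's a negative lookahead assertion) with
the entity list from Justin's function, which will still be a lot faster
than his hack ;)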


Let's fix the hack then, shall we?

function ampersandFix($x){
    $pattern='`&(?!(#[0-9]{2,3}|aacute|acirc|acute|aelig|agrave|amp'.
        '|aring|atilde|auml|brvbar|brkbar|ccedil|cedil|cent'.
        '|copy|curren|deg|divide|eacute|ecirc|egrave|eth|euml'.
        '|frac12|frac14|frac34|gt|iacute|icirc|iexcl|igrave'.
        '|iquest|iuml|laquo|lt|macr|hibar|micro|middot|nbsp|not'.
        '|ntilde|oacute|ocirc|ograve|ordf|ordm|oslash|otilde'.
        '|ouml|para|plusmn|pound|quot|raquo|reg|sect|shy|sup1|sup2'.
        '|sup3|szlig|thorn|times|uacute|ucirc|ugrave|uml'.
        '|die|uuml|yacute|yen|yuml);)`i';
    return preg_replace($pattern,'&amp;',$x);
}

Now it's faster AND accurate. ;)
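
The hand-typed name list could also be taken from PHP itself. The sketch below builds the same kind of lookahead pattern from get_html_translation_table(). The function name ampersandFixFromTable is made up, the choice of the HTML 4.01 table and UTF-8 is an assumption, and the numeric part is deliberately wider than the {2,3} digits used above so that references such as &#8364; and hex forms are kept.

```php
<?php
// Hypothetical variant: the entity names come from PHP's own HTML 4.01 table.
function ampersandFixFromTable(string $text): string
{
    $table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'UTF-8');

    $names = [];
    foreach ($table as $entity) {
        // "&eacute;" becomes "eacute"; quote it for use inside the pattern.
        $names[] = preg_quote(substr($entity, 1, -1), '`');
    }

    // Negative lookahead: match "&" only when no known reference follows.
    $pattern = '`&(?!(?:#[0-9]+|#x[0-9a-f]+|' . implode('|', $names) . ');)`i';

    return preg_replace($pattern, '&amp;', $text);
}

echo ampersandFixFromTable('A & B &amp; C&nbsp;D &#8364; &bogus;'), "\n";
```

As in the functions above, the i modifier makes the match case-insensitive, so the sketch is slightly more permissive than the real, case-sensitive entity names.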

--
Justin Koivisto - sp**@koivi.com
PHP POSTERS: Please use comp.lang.php for PHP related questions,
alt.php* groups are not recommended.
Jul 17 '05 #7
