Bytes | Software Development & Data Engineering Community

grep puzzle

Hi,

This should test you gurus. I want a function that accepts text as an
argument and converts all & into &amp; except where it is already an
HTML entity such as &nbsp;, &quot;, and of course &amp;.

If there is already a PHP function for this I would like to know, but
if not, what is the GREP equivalent?

Thanks
Jul 17 '05 #1
*** KhanyBoy wrote/escribió (10 Jun 2004 14:22:24 -0700):
This should test you gurus. I want a function that accepts text as an
argument and converts all & into &amp; except where it is already an
HTML entity such as &nbsp;, &quot;, and of course &amp;.


I can't figure out how you manage to get such garbled input. Anyway, I
guess a combination of html_entity_decode() and htmlentities() should
help.
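A minimal sketch of two built-in routes, assuming PHP 5.2.3 or later and UTF-8 text; the function names are hypothetical. The first follows the suggestion above: decode every reference, then re-encode everything. The second uses the double_encode parameter of htmlspecialchars(), which leaves existing entity references untouched. Both also escape <, > and quotes, which may or may not be wanted.

```php
<?php
// Route 1 (decode, then re-encode): every reference is decoded to its
// character, then htmlentities() encodes everything again. Side effect:
// all non-ASCII characters come back as named entities ("é" -> "&eacute;").
function reencode_entities($text)
{
    $decoded = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    return htmlentities($decoded, ENT_QUOTES, 'UTF-8');
}

// Route 2 (PHP 5.2.3+): with double_encode = false, htmlspecialchars()
// escapes bare "&", "<", ">", '"' and "'" but keeps existing references.
function escape_without_double_encoding($text)
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8', false);
}

$input = 'Tom & Jerry &amp; friends &nbsp; &quot;hi&quot;';
echo reencode_entities($input), "\n";              // Tom &amp; Jerry &amp; friends &nbsp; &quot;hi&quot;
echo escape_without_double_encoding($input), "\n"; // Tom &amp; Jerry &amp; friends &nbsp; &quot;hi&quot;
```

Route 2 is the closer match to the original question, since it does not rewrite characters that were never entities in the first place.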
--
--
-- Álvaro G. Vicario - Burgos, Spain
--
Jul 17 '05 #2
KhanyBoy wrote:
This should test you gurus. I want a function that accepts text as an
argument and converts all & into &amp; except where it is already an
HTML entity such as &nbsp;, &quot;, and of course &amp;.

If there is already a PHP function for this I would like to know, but
if not, what is the GREP equivalent?


OK, this does that, but it may not be a very elegant solution. I recently
needed the same functionality for a project involving osCommerce.

function ampersandFix($x){
    // Pass 1: escape every ampersand.
    $x = str_replace('&', '&amp;', $x);
    // Pass 2: restore the ones that already began a known reference
    // (two- or three-digit decimal references, or one of these names).
    $pattern = '`&amp;(#[0-9]{2,3}|aacute|acirc|acute|aelig|agrave|amp'.
        '|aring|atilde|auml|brvbar|brkbar|ccedil|cedil|cent'.
        '|copy|curren|deg|divide|eacute|ecirc|egrave|eth|euml'.
        '|frac12|frac14|frac34|gt|iacute|icirc|iexcl|igrave'.
        '|iquest|iuml|laquo|lt|macr|hibar|micro|middot|nbsp|not'.
        '|ntilde|oacute|ocirc|ograve|ordf|ordm|oslash|otilde'.
        '|ouml|para|plusmn|pound|quot|raquo|reg|sect|shy|sup1|sup2'.
        '|sup3|szlig|thorn|times|uacute|ucirc|ugrave|uml'.
        '|die|uuml|yacute|yen|yuml);`i';
    $replace = '&$1;';
    return preg_replace($pattern, $replace, $x);
}
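The two-pass idea above (escape every &, then restore the ones that began a known reference) can be sketched with a wider numeric branch. The #[0-9]{2,3} alternative only accepts decimal references of two or three digits, so &#8217; or a hexadecimal &#x2019; would stay double-encoded. In this sketch the function name and the short name list are hypothetical placeholders; in practice the full list from the post would go there.

```php
<?php
// Two-pass fix: escape every "&", then undo the escaping where the "&"
// started a known reference.
// ASSUMPTION: the name list is deliberately abbreviated for illustration.
function ampersand_fix_two_pass($text)
{
    $names = 'amp|lt|gt|quot|nbsp|copy|eacute';

    // Pass 1: escape every ampersand.
    $text = str_replace('&', '&amp;', $text);

    // Pass 2: restore known names and numeric references of any length,
    // decimal (&#8217;) or hexadecimal (&#x2019;).
    $pattern = '/&amp;(#[0-9]+|#x[0-9a-f]+|' . $names . ');/i';
    return preg_replace($pattern, '&$1;', $text);
}

echo ampersand_fix_two_pass('A & B &amp; C &#8217; &bogus;'), "\n";
// A &amp; B &amp; C &#8217; &amp;bogus;
```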
--
Justin Koivisto - sp**@koivi.com
PHP POSTERS: Please use comp.lang.php for PHP related questions,
alt.php* groups are not recommended.
Jul 17 '05 #3
Justin Koivisto wrote:


A lot faster, but not as accurate:
$text = preg_replace('!&(?![#a-z0-9]{1,7};)!i', '&amp;', $text);

You could also use this method (it's a lookahead assertion) with
Justin's function, which will still be a lot faster than his hack ;)

Greetings Christian.
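To see the trade-off described above, here is the one-line pattern wrapped in a function (the wrapper name is hypothetical) and run on a few inputs. The lookahead only checks the shape of a reference (one to seven characters from [#a-z0-9] followed by ;), so an unknown name such as &bogus; is left alone, while a real but longer name such as &thetasym; (eight letters) gets escaped.

```php
<?php
// One-pass pattern: escape "&" unless it is followed by 1-7 characters
// from [#a-z0-9] and a ";". No list of real entity names is consulted.
function ampersand_fix_generic($text)
{
    return preg_replace('!&(?![#a-z0-9]{1,7};)!i', '&amp;', $text);
}

echo ampersand_fix_generic('Fish & Chips &amp; &#8217;'), "\n"; // Fish &amp; Chips &amp; &#8217;
echo ampersand_fix_generic('&bogus;'), "\n";    // &bogus; (not a real entity, left alone)
echo ampersand_fix_generic('&thetasym;'), "\n"; // &amp;thetasym; (real entity, name too long)
```

Because already-escaped output still matches the lookahead, running the function twice gives the same result as running it once.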
Jul 17 '05 #4
KhanyBoy wrote:
This should test you gurus. I want a function that accepts text as an
argument and converts all & into &amp; except where it is already an
HTML entity such as &nbsp;, &quot;, and of course &amp;.


Why are entity references recognised in text?

--
Jock
Jul 17 '05 #5
Regarding this well-known quote, often attributed to KhanyBoy's famous "10
Jun 2004 14:22:24 -0700" speech:
Hi,

This should test you gurus. I want a function that accepts text as an
argument and converts all & into &amp; except where it is already an
HTML entity such as &nbsp;, &quot;, and of course &amp;.

If there is already a PHP function for this I would like to know, but
if not, what is the GREP equivalent?

Thanks


It might be a different direction, but here are some functions to determine
whether something is an HTML entity, using the built-in PHP entity
functions. It's sort of a recycled reply to an earlier question:

A simple (rough) test of the concept is online at:
http://php.pixelsaredead.com/htmlentities.php

<?php
/*
How to determine whether a given string decodes to an HTML entity
*/

function contains_entities($raw)
{
    // $raw can be a string of any length; returns true if htmlentities()
    // would lengthen it, i.e. it contains at least one encodable character
    $raw = trim($raw);
    return (strlen(htmlentities($raw)) > strlen($raw));
}

function is_entity_reference($raw)
{
    // $raw should be a string with only the entity ref in it,
    // in the form "&...;"

    return (preg_match('/&.+;/', $raw)) &&
        (strlen(html_entity_decode(trim($raw))) == 1);
}
?>
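One caveat with is_entity_reference(): since PHP 5.4, html_entity_decode() decodes to UTF-8 by default, so a reference like &eacute; becomes two bytes and the strlen() == 1 test fails. A sketch of a stricter variant, assuming UTF-8 text; the function name is hypothetical. It anchors the pattern to the whole string, passes ENT_QUOTES explicitly so &#039; is decoded too, and counts characters rather than bytes.

```php
<?php
// Stricter variant (hypothetical name), assuming UTF-8:
//  - the whole string must be exactly one reference (anchored pattern),
//  - ENT_QUOTES is explicit so &#039; is decoded as well,
//  - "exactly one character" is checked in UTF-8 characters, not bytes.
function is_single_entity_reference($raw)
{
    $raw = trim($raw);
    if (!preg_match('/\A&(?:[a-z][a-z0-9]*|#[0-9]+|#x[0-9a-f]+);\z/i', $raw)) {
        return false;
    }
    $decoded = html_entity_decode($raw, ENT_QUOTES, 'UTF-8');
    // An unknown name such as &bogus; comes back unchanged; a known one
    // decodes to a single character (one or more bytes in UTF-8).
    return $decoded !== $raw && preg_match('/\A.\z/su', $decoded) === 1;
}
```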
--
-- Rudy Fleminger
-- sp@mmers.and.evil.ones.will.bow-down-to.us
(put "Hey!" in the Subject line for priority processing!)
-- http://www.pixelsaredead.com
Jul 17 '05 #6
Christian Fersch wrote:

A lot faster, but not as accurate:
$text = preg_replace('!&(?![#a-z0-9]{1,7};)!i', '&amp;', $text);

You could also use this method (it's a lookahead assertion) with
Justin's function, which will still be a lot faster than his hack ;)


Let's fix the hack then, shall we?

function ampersandFix($x){
    // One pass: the negative lookahead (?!...) escapes "&" only when it
    // does NOT start one of the known references below.
    $pattern = '`&(?!(#[0-9]{2,3}|aacute|acirc|acute|aelig|agrave|amp'.
        '|aring|atilde|auml|brvbar|brkbar|ccedil|cedil|cent'.
        '|copy|curren|deg|divide|eacute|ecirc|egrave|eth|euml'.
        '|frac12|frac14|frac34|gt|iacute|icirc|iexcl|igrave'.
        '|iquest|iuml|laquo|lt|macr|hibar|micro|middot|nbsp|not'.
        '|ntilde|oacute|ocirc|ograve|ordf|ordm|oslash|otilde'.
        '|ouml|para|plusmn|pound|quot|raquo|reg|sect|shy|sup1|sup2'.
        '|sup3|szlig|thorn|times|uacute|ucirc|ugrave|uml'.
        '|die|uuml|yacute|yen|yuml);)`i';
    return preg_replace($pattern, '&amp;', $x);
}

Now it's faster AND accurate. ;)
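The single-pass version still depends on a hand-typed list of Latin-1 names, so it would escape references such as &hellip; or &euro;. A possible refinement, sketched under the assumption of PHP 5.4 or later (for ENT_HTML401 and the encoding argument), builds the name list from get_html_translation_table() and accepts decimal or hexadecimal numeric references of any length. The function name is hypothetical.

```php
<?php
// Single-pass lookahead with the name list generated from PHP's own
// HTML 4.01 entity table instead of typed by hand.
// ASSUMPTION: PHP 5.4+ (ENT_HTML401 constant, encoding argument).
function ampersand_fix_generated($text)
{
    static $pattern = null;
    if ($pattern === null) {
        $names = array();
        $table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'UTF-8');
        foreach ($table as $entity) {
            // "&eacute;" -> "eacute"; numeric entries like "&#039;" are skipped
            // because the numeric branch below already covers them.
            $name = substr($entity, 1, -1);
            if ($name[0] !== '#') {
                $names[] = preg_quote($name, '/');
            }
        }
        $pattern = '/&(?!(?:#[0-9]+|#x[0-9a-f]+|' . implode('|', $names) . ');)/i';
    }
    return preg_replace($pattern, '&amp;', $text);
}

echo ampersand_fix_generated('Fish & Chips &hellip; &bogus;'), "\n";
// Fish &amp; Chips &hellip; &amp;bogus;
```

Caching the compiled pattern in a static variable keeps the table lookup to once per request.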

--
Justin Koivisto - sp**@koivi.com
Jul 17 '05 #7
