Hi,
this should test you gurus. I want a function that accepts text as an
argument and converts all & into &amp; except where it is an HTML
character entity already, such as &nbsp;, &quot;, and of course &amp;.
If there is already a PHP function for this I would like to know, but
if not, what is the GREP equivalent?
Thanks
*** KhanyBoy wrote (10 Jun 2004 14:22:24 -0700):
this should test you gurus. I want a function that accepts text as an argument and converts all & into &amp; except where it is an HTML character entity already, such as &nbsp;, &quot;, and of course &amp;.
I can't figure out how you manage to get such garbled input. Anyway, I
guess a combination of html_entity_decode() and htmlentities() should
help.
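A rough sketch of that combination, not necessarily what was meant here: decode whatever entities are already present, then encode everything exactly once. The function name normalize_entities is made up for illustration, and the UTF-8 charset and flags are assumptions. Note that htmlentities() also escapes < and >, so this only suits plain text, not text that already contains markup.

```php
<?php
// Hypothetical helper: the name, flags and charset are assumptions.
function normalize_entities(string $text): string
{
    // Step 1: turn existing references such as &amp; or &quot; back into characters.
    $decoded = html_entity_decode($text, ENT_QUOTES | ENT_HTML401, 'UTF-8');
    // Step 2: encode everything once, so no ampersand ends up double-escaped.
    return htmlentities($decoded, ENT_QUOTES | ENT_HTML401, 'UTF-8');
}

echo normalize_entities('Tom & Jerry &amp; friends'), "\n";
```

Both the bare & and the existing &amp; come out as a single &amp;.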
--
Álvaro G. Vicario - Burgos, Spain
KhanyBoy wrote:
this should test you gurus. I want a function that accepts text as an argument and converts all & into &amp; except where it is an HTML character entity already, such as &nbsp;, &quot;, and of course &amp;.
If there is already a PHP function for this I would like to know, but if not, what is the GREP equivalent?
OK, this does that, but it may not be a very elegant solution. I recently
needed the same functionality for a project involving osCommerce.
function ampersandFix($x){
    // First escape every ampersand...
    $x=str_replace('&','&amp;',$x);
    // ...then undo the escaping wherever it broke a known entity.
    // Note: #[0-9]{2,3} only covers 2-3 digit decimal references.
    $pattern='`&amp;(#[0-9]{2,3}|aacute|acirc|acute|aelig|agrave|amp'.
        '|aring|atilde|auml|brvbar|brkbar|ccedil|cedil|cent'.
        '|copy|curren|deg|divide|eacute|ecirc|egrave|eth|euml'.
        '|frac12|frac14|frac34|gt|iacute|icirc|iexcl|igrave'.
        '|iquest|iuml|laquo|lt|macr|hibar|micro|middot|nbsp|not'.
        '|ntilde|oacute|ocirc|ograve|ordf|ordm|oslash|otilde'.
        '|ouml|para|plusmn|pound|quot|raquo|reg|sect|shy|sup1|sup2'.
        '|sup3|szlig|thorn|times|uacute|ucirc|ugrave|uml'.
        '|die|uuml|yacute|yen|yuml);`i';
    $replace='&$1;';
    return preg_replace($pattern,$replace,$x);
}
--
Justin Koivisto - sp**@koivi.com
PHP POSTERS: Please use comp.lang.php for PHP related questions,
alt.php* groups are not recommended.
Justin Koivisto wrote:
[quoted ampersandFix() function snipped]
A lot faster, but not as accurate:
$text = preg_replace('!&(?![#a-z0-9]{1,7};)!i','&amp;',$text);
You could also use this method (it's a lookahead assertion) with
Justin's function, which will still be a lot faster than his hack ;)
Greetings Christian.
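To make the lookahead idea concrete, here is a small sketch wrapping the one-liner in a function. The name escape_bare_ampersands is hypothetical; the pattern is the one quoted above, only with / as the delimiter.

```php
<?php
// Hypothetical wrapper around the lookahead one-liner.
function escape_bare_ampersands(string $text): string
{
    // "&" is replaced only when it is NOT followed by 1-7 characters
    // from [#a-z0-9] and a semicolon, i.e. by something entity-shaped.
    return preg_replace('/&(?![#a-z0-9]{1,7};)/i', '&amp;', $text);
}

echo escape_bare_ampersands('a & b &amp; c &nbsp; &#169;'), "\n";
// Only the shape is checked, so a non-entity like "&T;" slips through:
echo escape_bare_ampersands('AT&T; rocks'), "\n";
```

The first call prints "a &amp; b &amp; c &nbsp; &#169;". The second leaves "AT&T; rocks" untouched, which is the "not as accurate" part: anything that merely looks like an entity is skipped.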
KhanyBoy wrote:
this should test you gurus. I want a function that accepts text as an argument and converts all & into &amp; except where it is an HTML character entity already, such as &nbsp;, &quot;, and of course &amp;.
Why are entity references recognised in text?
--
Jock
Regarding this well-known quote, often attributed to KhanyBoy's famous "10
Jun 2004 14:22:24 -0700" speech: Hi,
this should test you gurus. I want a function that accepts text as an argument and converts all & into &amp; except where it is an HTML character entity already, such as &nbsp;, &quot;, and of course &amp;.
If there is already a PHP function for this I would like to know, but if not, what is the GREP equivalent?
Thanks
It might be a different direction, but here are some functions to determine
whether something is an HTML entity, using the built-in PHP entity
functions. It's sort of a recycled reply to an earlier question:
A simple (rough) test of the concept is online at: http://php.pixelsaredead.com/htmlentities.php
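<?php
/*
How to determine whether a given string decodes to an HTML entity
*/
function contains_entities($raw)
{
    // $raw can be a string of any length.
    // True when htmlentities() would lengthen the string, i.e. when it
    // holds characters that have an entity form.
    $raw = trim($raw);
    return (strlen(htmlentities($raw)) > strlen($raw));
}
function is_entity_reference($raw)
{
    // $raw should be a string with only the entity ref in it,
    // in the form "&...;"
    $raw = trim($raw);
    // Decode to ISO-8859-1 so that one entity becomes exactly one byte;
    // with the UTF-8 default of newer PHP versions a character such as
    // the copyright sign would decode to two bytes and fail the test.
    return preg_match('/^&.+;$/', $raw) &&
        (strlen(html_entity_decode($raw, ENT_QUOTES, 'ISO-8859-1')) == 1);
}
?>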
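Going one step further in the same direction, the built-in decoder can decide, match by match, whether an ampersand starts a real entity. This is only a sketch: the function name escape_non_entity_ampersands, the pattern, and the UTF-8/HTML 4.01 flags are assumptions, not part of the functions above.

```php
<?php
// Hypothetical helper: escapes "&" unless PHP itself can decode what follows.
function escape_non_entity_ampersands(string $text): string
{
    return preg_replace_callback(
        '/&(#?[a-z0-9]+;)?/i',
        function (array $m): string {
            $candidate = $m[0];
            $tail = $m[1] ?? '';   // empty when "&" is not followed by "name;"
            // A real entity changes when decoded; an unknown one comes back as-is.
            if ($tail !== '' &&
                html_entity_decode($candidate, ENT_QUOTES | ENT_HTML401, 'UTF-8') !== $candidate) {
                return $candidate;
            }
            return '&amp;' . $tail;
        },
        $text
    );
}

echo escape_non_entity_ampersands('AT&T; &amp; &copy; &bogus; & x'), "\n";
```

Here &amp; and &copy; are kept, while the ampersands in "AT&T;", "&bogus;" and the lone "&" are escaped. The trade-off is one decoder call per candidate, so it is slower than a plain pattern.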
--
-- Rudy Fleminger
-- sp@mmers.and.evil.ones.will.bow-down-to.us
(put "Hey!" in the Subject line for priority processing!)
-- http://www.pixelsaredead.com
Christian Fersch wrote:
A lot faster, but not as accurate:
$text = preg_replace('!&(?![#a-z0-9]{1,7};)!i','&amp;',$text);
You could also use this method (it's a lookahead assertion) with Justin's function, which will still be a lot faster than his hack ;)
Let's fix the hack then, shall we?
function ampersandFix($x){
    // Negative lookahead: match "&" only when it is NOT followed by a
    // known entity name (or a 2-3 digit decimal reference) and ";".
    $pattern='`&(?!(#[0-9]{2,3}|aacute|acirc|acute|aelig|agrave|amp'.
        '|aring|atilde|auml|brvbar|brkbar|ccedil|cedil|cent'.
        '|copy|curren|deg|divide|eacute|ecirc|egrave|eth|euml'.
        '|frac12|frac14|frac34|gt|iacute|icirc|iexcl|igrave'.
        '|iquest|iuml|laquo|lt|macr|hibar|micro|middot|nbsp|not'.
        '|ntilde|oacute|ocirc|ograve|ordf|ordm|oslash|otilde'.
        '|ouml|para|plusmn|pound|quot|raquo|reg|sect|shy|sup1|sup2'.
        '|sup3|szlig|thorn|times|uacute|ucirc|ugrave|uml'.
        '|die|uuml|yacute|yen|yuml);)`i';
    return preg_replace($pattern,'&amp;',$x);
}
Now it's faster AND accurate. ;)
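On the original question of whether a built-in exists: PHP releases from 5.2.3 on (newer than this thread) give htmlspecialchars() a fourth double_encode argument, and passing false leaves existing entities alone. A minimal sketch, assuming UTF-8 text; the wrapper name escape_once is made up. Unlike the pattern above, it also escapes < and >, and PHP decides what counts as an existing entity rather than the list in ampersandFix().

```php
<?php
// Hypothetical wrapper; htmlspecialchars() and its double_encode
// parameter are built in (PHP 5.2.3 and later).
function escape_once(string $text): string
{
    // ENT_NOQUOTES: leave quote characters alone.
    // false: do not re-encode entities that are already present.
    return htmlspecialchars($text, ENT_NOQUOTES, 'UTF-8', false);
}

echo escape_once('Tom & Jerry &amp; co'), "\n";
```

The bare & becomes &amp; and the existing &amp; is not doubled.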
--
Justin Koivisto - sp**@koivi.com