In Netscape bookmark files, there are lots of lines like this:
<DT><A HREF="http://www.commondreams.org/" ADD_DATE="1091500674"
LAST_CHARSET="ISO-8859-1" ID="rdf:#$uiYyb3">Common Dreams</A>
I want to eliminate the excess attributes and values to get this:
<DT><A HREF="http://www.commondreams.org/">Common Dreams</A>
I almost succeed with this:
$lines[]=preg_replace("{(<A HREF=\".*\")( ADD.*)(>.*</A>)}","\\1\\3",
$line);
The only problem is the explicit "ADD". The code only works if there is
an ADD_DATE attribute immediately after the URL. I tried replacing (
ADD.*) with ( .*), which I thought would match everything up to the ">":
$lines[]=preg_replace("{(<A HREF=\".*\")( .*)(>.*</A>)}","\\1\\3", $line);
For some reason, this does not find a match. Since " ADD" is also matched
by " .*", I don't understand why I need the explicit " ADD".
How do I match without the explicit " ADD"?
I could not follow the code, but this should remove ADD_DATE="1091500674":
$changedline = preg_replace("/ADD_DATE=\"\d+\"/", '', $originalline);
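The pattern above targets only ADD_DATE and leaves behind the space that preceded it. A minimal sketch of the same idea extended to several attributes follows. It assumes the attribute names from the sample line (ADD_DATE, LAST_CHARSET, ID), assumes every value is double-quoted, and needs PHP 7.4 or later for the arrow function; `strip_attributes` is a hypothetical helper name, not something from the thread.

```php
<?php
// Hypothetical helper: remove the named attributes (and the whitespace
// in front of each) from one line of a bookmark file.
// Assumes every attribute value is wrapped in double quotes.
function strip_attributes(string $line, array $names): string
{
    // Escape each name and join them into an alternation,
    // e.g. ADD_DATE|LAST_CHARSET|ID
    $alternatives = implode('|', array_map(
        fn (string $name): string => preg_quote($name, '/'),
        $names
    ));
    // \s+ consumes the leading whitespace; [^"]* stops at the closing quote.
    return preg_replace('/\s+(?:' . $alternatives . ')="[^"]*"/i', '', $line);
}

$line = '<DT><A HREF="http://www.commondreams.org/" ADD_DATE="1091500674" '
      . 'LAST_CHARSET="ISO-8859-1" ID="rdf:#$uiYyb3">Common Dreams</A>';

echo strip_attributes($line, ['ADD_DATE', 'LAST_CHARSET', 'ID']), "\n";
```

Listing the attributes explicitly keeps anything not named (HREF, for instance) untouched, at the cost of having to know the names in advance.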
It's because of the default greediness of the quantifiers. The .* after
the HREF=\" in your second pattern is quite hungry and eats up
everything until the last " in the tag, including the ADD_DATE and
everything else. You can change this behaviour with the U-modifier, e.g.
$pattern = '#(<a href=".*").*(>.*</a>)#iU';
$replace = '$1$2';
$lines[] = preg_replace($pattern, $replace, $line);
Pattern Modifiers
<http://www.php.net/manual/en/pcre.pattern.modifiers.php>
HTH
Micha
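To see the difference concretely, here is a small sketch that runs the greedy second pattern from the question and the ungreedy pattern from the reply against the sample line. The third pattern is not from the thread: it is one common alternative that uses negated character classes instead of the U modifier, assuming the URL contains no double quote and no attribute value contains ">".

```php
<?php
$line = '<DT><A HREF="http://www.commondreams.org/" ADD_DATE="1091500674" '
      . 'LAST_CHARSET="ISO-8859-1" ID="rdf:#$uiYyb3">Common Dreams</A>';

// Greedy pattern from the question: the first .* runs as far as it can,
// so on this line group 1 swallows ADD_DATE and LAST_CHARSET and only
// the trailing ID attribute ends up in group 2.
$greedy = preg_replace('{(<A HREF=".*")( .*)(>.*</A>)}', '$1$3', $line);

// Ungreedy pattern from the reply: U makes every quantifier lazy,
// so the first .* stops at the closing quote of the URL.
$ungreedy = preg_replace('#(<a href=".*").*(>.*</a>)#iU', '$1$2', $line);

// Alternative sketch (an assumption, not from the thread):
// [^"]* cannot run past the closing quote of the URL and
// [^>]* cannot run past the end of the tag, so no modifier is needed.
$negated = preg_replace('#(<a href="[^"]*")[^>]*(>)#i', '$1$2', $line);

echo $greedy, "\n", $ungreedy, "\n", $negated, "\n";
```

On this sample the greedy version does match, but it drops only the ID attribute, while the other two reduce the tag to the HREF alone.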
What a handy modifier, thanks.
red