regex mystery

Red
In Netscape bookmark files, there are lots of lines like this:
<DT><A HREF="http://www.commondreams.org/" ADD_DATE="1091500674"
LAST_CHARSET="ISO-8859-1" ID="rdf:#$uiYyb3">Common Dreams</A>

I want to eliminate the excess attributes and values to get this:
<DT><A HREF="http://www.commondreams.org/">Common Dreams</A>

I almost succeed with this:
$lines[]=preg_replace("{(<A HREF=\".*\")( ADD.*)(>.*</A>)}","\\1\\3",
$line);

The only problem is the explicit "ADD". The code only works if there is
an ADD_DATE attribute immediately after the URL. I tried replacing (
ADD.*) with ( .*), which I thought would match everything up to the ">":
$lines[]=preg_replace("{(<A HREF=\".*\")( .*)(>.*</A>)}","\\1\\3", $line);

For some reason, this does not find a match. Since " ADD" is also matched
by " .*", I don't understand why I need the explicit " ADD".

How do I match without the explicit " ADD"?
Jul 17 '05 #1
I could not follow the code, but this should work to remove
ADD_DATE="1091500674":

$changedline = preg_replace('/ADD_DATE="\d+"/', '', $originalline);
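
For illustration, here is a minimal sketch of that idea applied to the sample line from the question. It assumes the bookmark entry sits on a single line (the break in the post looks like mail wrapping), and it adds a leading space to the pattern so no double space is left behind; the variable names are made up. Note that this approach only strips ADD_DATE, so the other attributes stay in place.

```php
<?php
// Sample bookmark line from the question, joined onto one line (assumption).
$line = '<DT><A HREF="http://www.commondreams.org/" ADD_DATE="1091500674" '
      . 'LAST_CHARSET="ISO-8859-1" ID="rdf:#$uiYyb3">Common Dreams</A>';

// Remove only the ADD_DATE attribute, together with the space in front of it.
$stripped = preg_replace('/ ADD_DATE="\d+"/', '', $line);

// LAST_CHARSET and ID are still present after this replacement.
echo $stripped, "\n";
```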

Jul 17 '05 #2
It's because of the default greediness of the quantifiers. The .* after
the HREF=\" in your second pattern is quite hungry and eats up
everything until the last " in the tag, including the ADD_DATE and
everything else. You can change this behaviour with the U-modifier, e.g.

$pattern = '#(<a href=".*").*(>.*</a>)#iU';
$replace = '$1$2';
$lines[] = preg_replace($pattern, $replace, $line);
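
To see the difference on the sample line, here is a minimal sketch. It assumes the bookmark entry sits on a single line (the break in the post looks like mail wrapping), and the variable names are made up for illustration. On that input the greedy pattern does appear to match, but it backtracks only as far as the last attribute, so only ID is removed. The ungreedy pattern stops at the first closing quote instead. The third pattern, which uses negated character classes, is an alternative that is not from this thread; it avoids the greediness question altogether.

```php
<?php
// Sample bookmark line from the question, joined onto one line (assumption).
$line = '<DT><A HREF="http://www.commondreams.org/" ADD_DATE="1091500674" '
      . 'LAST_CHARSET="ISO-8859-1" ID="rdf:#$uiYyb3">Common Dreams</A>';

// Greedy: the first .* runs to the last quote that is followed by a space,
// so group 1 swallows ADD_DATE and LAST_CHARSET and only ID is dropped.
$greedyResult = preg_replace('{(<A HREF=".*")( .*)(>.*</A>)}', '$1$3', $line);

// Ungreedy (U modifier): every quantifier matches as little as possible,
// so group 1 ends at the quote that closes the URL.
$lazyResult = preg_replace('#(<a href=".*").*(>.*</a>)#iU', '$1$2', $line);

// Alternative sketch (not from the thread): negated character classes
// cannot run past a quote or a ">", whatever the greediness setting.
$classResult = preg_replace('#(<a href="[^"]*")[^>]*(>)#i', '$1$2', $line);

echo $greedyResult, "\n";
echo $lazyResult, "\n";
echo $classResult, "\n";
```

The second and third results should both be the cleaned line from the question, while the first still carries ADD_DATE and LAST_CHARSET.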

Pattern Modifiers
<http://www.php.net/manual/en/pcre.pattern.modifiers.php>

HTH
Micha
Jul 17 '05 #3
Red
Michael Fesser wrote:
It's because of the default greediness of the quantifiers. The .* after
the HREF=\" in your second pattern is quite hungry and eats up
everything until the last " in the tag, including the ADD_DATE and
everything else. You can change this behaviour with the U-modifier, e.g.

$pattern = '#(<a href=".*").*(>.*</a>)#iU';
$replace = '$1$2';
$lines[] = preg_replace($pattern, $replace, $line);

Pattern Modifiers
<http://www.php.net/manual/en/pcre.pattern.modifiers.php>

HTH
Micha

What a handy modifier, thanks.

red
Jul 17 '05 #4
