regex mystery

Red
In Netscape bookmark files, there are lots of lines like this:
<DT><A HREF="http://www.commondreams.org/" ADD_DATE="1091500674"
LAST_CHARSET="ISO-8859-1" ID="rdf:#$uiYyb3">Common Dreams</A>

I want to eliminate the excess attributes and values to get this:
<DT><A HREF="http://www.commondreams.org/">Common Dreams</A>

I almost succeed with this:
$lines[]=preg_replace("{(<A HREF=\".*\")( ADD.*)(>.*</A>)}","\\1\\3",
$line);

The only problem is the explicit "ADD". The code only works if there is
an ADD_DATE attribute immediately after the URL. I tried replacing
( ADD.*) with ( .*), which I thought would match everything up to the ">":
$lines[]=preg_replace("{(<A HREF=\".*\")( .*)(>.*</A>)}","\\1\\3", $line);

For some reason, this does not find a match. Since " ADD" is matched by
" .*", I don't understand why I need the explicit " ADD".

How do I match without the explicit " ADD"?
Jul 17 '05 #1
"Red" wrote:
> How do I match without the explicit " ADD"

I could not follow the code, but this should work for
ADD_DATE="1091500674":

$changedlined = preg_replace("/ADD_DATE\=\"\d+\"/", '',
$originalline);
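
A minimal sketch of this approach, using the sample line from the question. The names $withoutDate and $stripped and the list of attribute names in the second pattern are assumptions for illustration, not part of the original post. Removing only ADD_DATE leaves the other attributes in place, so every unwanted attribute has to be named to reach the desired output:

```php
<?php
// Sample line from the question. Single quotes keep PHP from
// interpolating "$uiYyb3" as a variable.
$originalline = '<DT><A HREF="http://www.commondreams.org/" ADD_DATE="1091500674" '
    . 'LAST_CHARSET="ISO-8859-1" ID="rdf:#$uiYyb3">Common Dreams</A>';

// Remove just the ADD_DATE attribute, including its leading space.
$withoutDate = preg_replace('/ ADD_DATE="\d+"/', '', $originalline);

// Assumed extension: remove every attribute named in the alternation.
// [^"]* matches the value without running past its closing quote.
$stripped = preg_replace('/ (?:ADD_DATE|LAST_CHARSET|ID)="[^"]*"/', '', $originalline);

echo $withoutDate, "\n";
echo $stripped, "\n";
```

The first result still carries LAST_CHARSET and ID; the second is reduced to the HREF attribute only. The drawback is that the attribute names must be known in advance.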

Jul 17 '05 #2
.oO(Red)
> For some reason, this does not find a match.

It's because of the default greediness of the quantifiers. The .* after
the HREF=\" in your second pattern is quite hungry and eats up everything
until the last " in the tag, including the ADD_DATE and everything
else. You can change this behaviour with the U-modifier, e.g.

$pattern = '#(<a href=".*").*(>.*</a>)#iU';
$replace = '$1$2';
$lines[] = preg_replace($pattern, $replace, $line);

Pattern Modifiers
<http://www.php.net/manual/en/pcre.pattern.modifiers.php>
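
The difference can be checked with a short sketch, assuming the bookmark entry sits on a single line as in the question (the variable names are illustrative only). On that input the greedy pattern does match, but its first .* runs to the last quote that is still followed by a space, so only the final attribute is dropped. The U modifier, or a ? after each quantifier, makes the match stop at the first closing quote instead:

```php
<?php
// Sample line from the question, in single quotes so that "$uiYyb3"
// is not treated as a PHP variable.
$line = '<DT><A HREF="http://www.commondreams.org/" ADD_DATE="1091500674" '
    . 'LAST_CHARSET="ISO-8859-1" ID="rdf:#$uiYyb3">Common Dreams</A>';

// Greedy (the second pattern from the question): group 1 swallows
// ADD_DATE and LAST_CHARSET, so only the ID attribute is removed.
$greedy = preg_replace('{(<A HREF=".*")( .*)(>.*</A>)}', '$1$3', $line);

// U modifier: every quantifier becomes ungreedy, so group 1 ends at
// the first closing quote after the URL.
$ungreedy = preg_replace('#(<a href=".*").*(>.*</a>)#iU', '$1$2', $line);

// Same effect without U: mark each quantifier lazy with a trailing ?.
$lazy = preg_replace('#(<a href=".*?").*?(>.*?</a>)#i', '$1$2', $line);

echo $greedy, "\n";
echo $ungreedy, "\n";
echo $lazy, "\n";
```

The second and third lines printed are the desired `<DT><A HREF="http://www.commondreams.org/">Common Dreams</A>`; the first still contains ADD_DATE and LAST_CHARSET.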

HTH
Micha
Jul 17 '05 #3
Red
Michael Fesser wrote:
> You can change this behaviour with the U-modifier

What a handy modifier, thanks.

red
Jul 17 '05 #4
