Need help with a regular expression

Hi!
I use this to find all links in a page:

[:space:]*(href)[:space:]*=[:space:]*([^ >]+)

But recently I found out that it only works if links are set up this way: <a href = "...

I would like to know if someone can help me find the proper expression, one
that would work with <a href = "..., <a href= "..., <a href="... and <a href ="...
It's only a matter of spaces, but I can't figure out how to make it work.

Thank you
Sep 14 '05 #1
EXPRESSION: $exp_reg ="[ ]*href[ ]*=[ ]*([^>]+)";
------------------------------------------------------
EXAMPLES:
------------------------------------------------------
CODE:
$exp_reg ="[ ]*href[ ]*=[ ]*([^>]+)";
unset ($reg);
ereg($exp_reg, '<a href = "./emp/coucou.html">', $reg);
print_r($reg);
echo "<br>";
unset ($reg);
ereg($exp_reg, '<a href= "./emp/coucou.html">', $reg);
print_r($reg);
echo "<br>";
unset ($reg);
ereg($exp_reg, '<a href ="./emp/coucou.html">', $reg);
print_r($reg);
echo "<br>";
unset ($reg);
ereg($exp_reg, '<a href="./emp/coucou.html">', $reg);
print_r($reg);
echo "<br>";
unset ($reg);
------------------------------------------------------
RESULT:
Array
(
[0] => href = "./emp/coucou.html"
[1] => "./emp/coucou.html"
)
<br>Array
(
[0] => href= "./emp/coucou.html"
[1] => "./emp/coucou.html"
)
<br>Array
(
[0] => href ="./emp/coucou.html"
[1] => "./emp/coucou.html"
)
<br>Array
(
[0] => href="./emp/coucou.html"
[1] => "./emp/coucou.html"
)
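A side note for anyone reading this on a current PHP: the ereg() family was deprecated in PHP 5.3 and removed in PHP 7.0. Below is a minimal sketch of the same four tests using preg_match(), assuming PHP 7 or later. The only change to the expression itself is the pair of '/' delimiters PCRE requires; the $samples and $captured names are illustrative.

```php
<?php
// Same expression as above, wrapped in '/' delimiters for the preg_* functions.
$exp_reg = '/[ ]*href[ ]*=[ ]*([^>]+)/';

$samples = [
    '<a href = "./emp/coucou.html">',
    '<a href= "./emp/coucou.html">',
    '<a href ="./emp/coucou.html">',
    '<a href="./emp/coucou.html">',
];

$captured = [];
foreach ($samples as $sample) {
    // preg_match() returns 1 on a match and fills $reg much like ereg() did.
    if (preg_match($exp_reg, $sample, $reg) === 1) {
        $captured[] = $reg[1]; // the attribute value, quotes included
    }
}
print_r($captured);
```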


------------------------------------------------------
CODE:

$html = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<TITLE>Index of /~romain/compack</TITLE>
</HEAD>
<BODY>
<H1>Index of /~romain/compack</H1>
<IMG SRC="/icons/folder.gif" ALT="[DIR]"> <A HREF = "EMP2/">EMP2/</A> 12-Sep-2005 11:41 -
<IMG SRC="/icons/unknown.gif" ALT="[ ]"> <A HREF ="EMP2/cargar_datos.php">cargar_datos.php</A> 13-Sep-2005 11:15 3k
<IMG SRC="/icons/unknown.gif" ALT="[ ]"> <A HREF= "EMP1/concepto.php">concepto.php</A> 13-Sep-2005 10:46 12k
<IMG SRC="/icons/unknown.gif" ALT="[ ]"> <A HREF = "conceptos.php">conceptos.php</A> 08-Sep-2005 10:44 3k';

// The outer ( ) pair serves as the pattern delimiters, so group 1 is ([^>]+).
preg_match_all("([ ]*href[ ]*=[ ]*([^>]+))", strtolower($html), $regs, PREG_SET_ORDER);


print_r($regs);

------------------------------------------------------
RESULT:

Array
(
[0] => Array
(
[0] => href = "emp2/"
[1] => "emp2/"
)

[1] => Array
(
[0] => href ="emp2/cargar_datos.php"
[1] => "emp2/cargar_datos.php"
)

[2] => Array
(
[0] => href= "emp1/concepto.php"
[1] => "emp1/concepto.php"
)

[3] => Array
(
[0] => href = "conceptos.php"
[1] => "conceptos.php"
)

)
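One side effect worth noting: strtolower() is applied to the whole page, so the captured paths come back lowercased (EMP2/ became emp2/), which matters on a case-sensitive server. A sketch using PCRE's i modifier instead, which makes only the matching case-insensitive and leaves the subject untouched. The shortened $html sample is illustrative.

```php
<?php
// Shortened sample in the style of the listing above (illustrative only).
$html = '<A HREF = "EMP2/">EMP2/</A> <A HREF="EMP1/concepto.php">concepto.php</A>';

// The "i" modifier lets "href" match HREF, Href, etc. without changing
// the subject, so the captured URLs keep their original case.
preg_match_all('/[ ]*href[ ]*=[ ]*([^>]+)/i', $html, $regs, PREG_SET_ORDER);

foreach ($regs as $set) {
    echo $set[1], "\n"; // "EMP2/" then "EMP1/concepto.php", quotes included
}
```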

*******************************************************
hope it can help.
JR.
*******************************************************


Sep 14 '05 #2
ro**********@gmail.com wrote:
> Yannick Benoit wrote:
> > [:space:]*(href)[:space:]*=[:space:]*([^ >]+)

A named character class, itself inside a character class, has to be
wrapped in '[:' and ':]'. So here you would have /[[:space:]]/. The
Manual does not document this for PCREs.
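A quick check of the bracketed form with PHP's preg_* functions (PCRE does accept a named class inside a bracket expression). This keeps the original ([^ >]+) capture; the $pattern and $tests names are illustrative.

```php
<?php
// [[:space:]] is a bracket expression containing the named class :space:.
$pattern = '/[[:space:]]*href[[:space:]]*=[[:space:]]*([^ >]+)/';

$tests = ['<a href = "x.html">', '<a href= "x.html">', '<a href ="x.html">', '<a href="x.html">'];
foreach ($tests as $t) {
    var_dump(preg_match($pattern, $t)); // int(1) for each of the four variants
}
```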

What you have at the moment, in a POSIX regular expression, are
ordinary character classes: [:space:] matches any one of the
characters ':', 's', 'p', 'a', 'c' and 'e'. (You'd get an error if you
tried that in a PCRE.) Since none of those is a US-ASCII space, the
only way the pattern could match is if the subject string doesn't have
spaces at those points.
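To make that concrete: because PCRE rejects the bare [:space:] spelling, this sketch emulates what the POSIX engine saw with the equivalent class [aceps:] (same six characters, reordered so PCRE does not read it as a named class). That substitution is an assumption made purely for illustration.

```php
<?php
// [aceps:] holds the same characters the POSIX engine saw in a bare [:space:].
$posix_like = '/[aceps:]*(href)[aceps:]*=[aceps:]*([^ >]+)/';

var_dump(preg_match($posix_like, '<a href="x.html">'));   // int(1): no spaces around "="
var_dump(preg_match($posix_like, '<a href = "x.html">')); // int(0): a space is not in the set
```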
> [ ]*href[ ]*=[ ]*([^>]+)

A character class consisting of a single character means the same if
you remove its square brackets. The classes are superfluous.
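A small check that dropping the brackets changes nothing, using JR's first test string; the variable names are illustrative.

```php
<?php
$with_classes    = '/[ ]*href[ ]*=[ ]*([^>]+)/';
$without_classes = '/ *href *= *([^>]+)/'; // brackets dropped: same meaning

$subject = '<a href = "./emp/coucou.html">';
preg_match($with_classes, $subject, $a);
preg_match($without_classes, $subject, $b);
var_dump($a === $b); // bool(true): identical full match and capture
```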

/ */ and /[[:space:]]*/ don't necessarily have the same meaning.
The latter matches any whitespace character (in the C locale: space,
tab, newline, vertical tab, form feed and carriage return); the former
matches only the US-ASCII space.
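Where the difference shows up in practice: HTML can split an attribute across lines. A sketch with a hypothetical anchor that has a newline between HREF and "=", tested against both forms (the i modifier stands in for strtolower()).

```php
<?php
// Hypothetical anchor whose attribute is split across two lines.
$subject = "<A HREF\n=\"EMP2/cargar_datos.php\">";

$spaces_only = '/ *href *= *([^>]+)/i';                              // US-ASCII space only
$any_space   = '/[[:space:]]*href[[:space:]]*=[[:space:]]*([^>]+)/i'; // space, tab, newline, ...

var_dump(preg_match($spaces_only, $subject)); // int(0): the newline is not a space
var_dump(preg_match($any_space, $subject));   // int(1)
```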

--
Jock
Sep 18 '05 #3
