Regular Expression to extract all .JPG and .PNG URL's

Hello,
I'm bad at regular expressions. Would somebody help me:
I need to extract all URL to .jpg and .png pictures from a string
containing an HTML file (DOM wouldn't work well in what I need).

I've tried:

preg_match_all("/.jpg$|.png$/", $htmlfile, $Matches);
foreach ($Matches as $match)
{
echo $match."<br>";
}

without much success. Anybody can correct this to make it work?

Oct 12 '07 #1

<ko*******@gmail.com> wrote in message
news:11**********************@i13g2000prf.googlegroups.com...
> Hello,
> I'm bad at regular expressions. Would somebody help me:
> I need to extract all URL to .jpg and .png pictures from a string
> containing an HTML file (DOM wouldn't work well in what I need).
>
> I've tried:
>
> preg_match_all("/.jpg$|.png$/", $htmlfile, $Matches);
> foreach ($Matches as $match)
> {
> echo $match."<br>";
> }

\.(jpe?g|png)\b+$?

that should cure what ails you. this is freehand, but after eyeballing
regex for years, i'm pretty sure it will work 'outta the box'.

cheers
Oct 12 '07 #2
On 12 Oct, 16:12, kosano...@gmail.com wrote:
> Hello,
> I'm bad at regular expressions. Would somebody help me:
> I need to extract all URL to .jpg and .png pictures from a string
> containing an HTML file (DOM wouldn't work well in what I need).
>
> I've tried:
>
> preg_match_all("/.jpg$|.png$/", $htmlfile, $Matches);
> foreach ($Matches as $match)
> {
> echo $match."<br>";
> }
>
> without much success. Anybody can correct this to make it work?

Try:
preg_match_all("/.*(jpg|png)$/", $htmlfile, $Matches);

Oct 12 '07 #3

"Steve" <no****@example.comwrote in message
news:Zn*************@newsfe06.lga...
>
> <ko*******@gmail.com> wrote in message
> news:11**********************@i13g2000prf.googlegroups.com...
>> Hello,
>> I'm bad at regular expressions. Would somebody help me:
>> I need to extract all URL to .jpg and .png pictures from a string
>> containing an HTML file (DOM wouldn't work well in what I need).
>>
>> I've tried:
>>
>> preg_match_all("/.jpg$|.png$/", $htmlfile, $Matches);
>> foreach ($Matches as $match)
>> {
>> echo $match."<br>";
>> }
>
> \.(jpe?g|png)\b+$?

sorry...if you're screen scraping, then the image may be in a src tag
attribute and will no doubt be quoted or tic'd. the above will only handle
unquoted/tic'd where the src value (this image link) is not ended with the
closing tag. all that said, see if you can tell how the changes below meet
the constraints/problems not covered by the former...

=(["'])?(([^\.]*\.)*(jpe?|pn)g)\1[^>]*?>

again, i haven't tested it...but the raw image (including the path, i.e.
http://www.example.com/images/image.png) should be the second match captured
by preg_match_all. to make sure of that, do this:

preg_match_all($pattern, $search, $matches);
echo '<pre>' . print_r($matches, true) . '</pre>';

hth,

me
Oct 12 '07 #4

"Captain Paralytic" <pa**********@yahoo.comwrote in message
news:11**********************@k35g2000prh.googlegroups.com...
> On 12 Oct, 16:12, kosano...@gmail.com wrote:
>> Hello,
>> I'm bad at regular expressions. Would somebody help me:
>> I need to extract all URL to .jpg and .png pictures from a string
>> containing an HTML file (DOM wouldn't work well in what I need).
>>
>> I've tried:
>>
>> preg_match_all("/.jpg$|.png$/", $htmlfile, $Matches);
>> foreach ($Matches as $match)
>> {
>> echo $match."<br>";
>> }
>>
>> without much success. Anybody can correct this to make it work?
>
> Try:
> preg_match_all("/.*(jpg|png)$/", $htmlfile, $Matches);

nah...that would mean:

ghijpgklmno

or

mnopngqrst

or any text having those letters within it would be captured. however,
neither his nor yours work unless the string ends at the end of the line
(eof or \r or \n). both don't escape the dot...so that means 'any character
followed by jpg or png'. more marginal success would be had just doing,
"/\.jpg|\.png/", but even that doesn't cut it...since either jpg or png can
be followed by anything and still be captured. not to mention we've left out
jpeg's.

make sense?
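
The points about the unescaped dot and the missing jpeg can be checked with a short script. This is only a minimal sketch: the sample string and the variable names are made up for illustration and do not come from the thread.

```php
<?php
// Hypothetical sample text: three image names plus one decoy word.
$text = 'photo.jpg ghijpgklmno photo.jpeg logo.png';

// Unescaped dot: "." means "any character", so "ijpg" inside the decoy matches too.
preg_match_all('/.jpg/', $text, $loose);

// Escaped dot: only a literal ".jpg" matches, but ".jpeg" and ".png" are missed.
preg_match_all('/\.jpg/', $text, $strict);

// Escaped dot, optional "e", and a word boundary so the extension must end there.
preg_match_all('/\.(?:jpe?g|png)\b/', $text, $full);

print_r($loose[0]);   // ".jpg" and the accidental "ijpg"
print_r($strict[0]);  // ".jpg" only
print_r($full[0]);    // ".jpg", ".jpeg" and ".png"
```

None of these patterns captures the URL in front of the extension yet; they only show which extensions each variant recognises.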
Oct 12 '07 #5
Everybody thank you very much for your help. However I couldn't make
any of the examples work. The output I get all the time is "Array" and
that's it.

Steve, image paths are not always in the src value but they are always
quoted.
So I thought to make it find .jpg or .png and grab the string on the
left from the " sign. How would that be in reg exp?

Oct 12 '07 #6
In our last episode,
<11**********************@v29g2000prd.googlegroups.com>,
the lovely and talented ko*******@gmail.com
broadcast on comp.lang.php:
> Everybody thank you very much for your help. However I couldn't make
> any of the examples work. The output I get all the time is "Array" and
> that's it.

Did you even read the manual to find out what preg_match_all actually
does?

> Steve, image paths are not always in the src value but they are always
> quoted.
> So I thought to make it find .jpg or .png and grab the string on the
> left from the " sign. How would that be in reg exp?

--
Lars Eighner <http://larseighner.com/> <http://myspace.com/larseighner>
Countdown: 465 days to go.
What do you do when you're debranded?
Oct 12 '07 #7
On 12 Oct, 17:02, kosano...@gmail.com wrote:
> Everybody thank you very much for your help. However I couldn't make
> any of the examples work. The output I get all the time is "Array" and
> that's it.
>
> Steve, image paths are not always in the src value but they are always
> quoted.
> So I thought to make it find .jpg or .png and grab the string on the
> left from the " sign. How would that be in reg exp?

Try print_r($Matches);
instead of the foreach

Oct 12 '07 #8

<ko*******@gmail.com> wrote in message
news:11**********************@v29g2000prd.googlegroups.com...
> Everybody thank you very much for your help. However I couldn't make
> any of the examples work. The output I get all the time is "Array" and
> that's it.
>
> Steve, image paths are not always in the src value but they are always
> quoted.
> So I thought to make it find .jpg or .png and grab the string on the
> left from the " sign. How would that be in reg exp?

yes...just take the equal sign out of the pattern i gave AND the trailing [^>]*?> part.
now, your pattern looks like this:

(["'])?(([^\.]*\.)*(jpe?|pn)g)\1

that will catch a jpg, jpeg, or png where it appears in between a set of tics
(') or quotes (").
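
For comparison, here is a rough sketch of the same "between matching quotes" idea with a somewhat tighter pattern. It is not the pattern from this post: the character class, the "~" delimiter, the "i" modifier and the sample HTML are all assumptions made for illustration, and it assumes the URLs contain no whitespace or quote characters.

```php
<?php
// Hypothetical input: two quoted image URLs, one quoted non-image, one unquoted image.
$html = '<img src="http://example.com/file1.jpg"> <a href=\'/img/pic2.PNG\'>x</a> '
      . '"notes.txt" bare.jpg';

// (["'])            opening quote, remembered as group 1
// ([^"'\s]+ ...)    the URL itself (no quotes or whitespace), group 2
// \.(?:jpe?g|png)   the URL must end in .jpg, .jpeg or .png
// \1                the same quote character that opened the value
// "~" is the delimiter, so the slashes in URLs need no escaping.
$pattern = '~(["\'])([^"\'\s]+\.(?:jpe?g|png))\1~i';

preg_match_all($pattern, $html, $matches);
$urls = $matches[2];  // group 2 holds the URLs without the quotes
print_r($urls);
```

With this input only the two quoted image URLs end up in `$urls`; the unquoted `bare.jpg` and the quoted `notes.txt` are skipped.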

REMEMBER !!! to debug this and find out where your match will appear within
$matches, do this:

echo '<pre>' . print_r($matches, true) . '</pre>';

please post your results here, so we can either continue to help or, see
that it worked for you.

thx,

me
Oct 12 '07 #9
<?php

$htmlc=" \"http://example.com/file1.jpg\" kjkjskfj \"http://blabla.com/image2.png\" dsgdg";
preg_match_all("/.*(jpg|png)$/", $htmlc, $matches);
echo '<pre>' . print_r($matches, true) . '</pre>';

?>

outputs:
Array
(
    [0] => Array
        (
        )

    [1] => Array
        (
        )

)
I need to extract:
http://example.com/file1.jpg
http://blabla.com/image2.png

Oct 12 '07 #10

<ko*******@gmail.com> wrote in message
news:11**********************@q3g2000prf.googlegroups.com...
> <?php
>
> $htmlc=" \"http://example.com/file1.jpg\" kjkjskfj \"http://blabla.com/image2.png\" dsgdg";
> preg_match_all("/.*(jpg|png)$/", $htmlc, $matches);
> echo '<pre>' . print_r($matches, true) . '</pre>';
>
> ?>
>
> outputs:
> Array
> (
>     [0] => Array
>         (
>         )
>
>     [1] => Array
>         (
>         )
>
> )

look up preg_match_all in the docs or at php.net. regardless of whether
anything matches, it will always fill $matches with an array. typically,
$matches[0] will have your full matches (as another array).
$matches[1 - n] contain the sub-matches, one array per capture group. so,

foreach ($matches[0] as $match)
{
echo '<pre>' . $match . '</pre>';
}

will most likely be what you need. make sense?

this is all aside from the fact that the pattern used above doesn't resemble
anything you've described as your goal...and i'm not even preg. ;^)
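
The layout of `$matches` described above can be seen with a small script. This is a sketch only; the input string and pattern are invented for illustration. `PREG_SET_ORDER` is a standard flag of `preg_match_all` that regroups the same data per match instead of per capture group.

```php
<?php
// Hypothetical input with two quoted image names.
$html = '"a.jpg" and "b.png"';
$pattern = '/"([^"]+\.(jpg|png))"/';

// Default layout (PREG_PATTERN_ORDER): one sub-array per capture group.
// The return value is the number of full matches; the data is in $matches:
//   $matches[0]  full matches, including the quotes
//   $matches[1]  group 1, the file name
//   $matches[2]  group 2, the extension
$count = preg_match_all($pattern, $html, $matches);

// PREG_SET_ORDER: one sub-array per match, each holding full match + groups.
preg_match_all($pattern, $html, $sets, PREG_SET_ORDER);

foreach ($matches[1] as $name) {
    echo $name, "\n";
}
```

Echoing `$matches` itself, or any of its sub-arrays, prints the word "Array", which is what the original loop in this thread was doing.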
Oct 12 '07 #11
Steve, the arrays are empty. I don't care if it's arrays or not, I
need data.

Oct 12 '07 #12

<ko*******@gmail.com> wrote in message
news:11**********************@e34g2000pro.googlegroups.com...
> Steve, the arrays are empty. I don't care if it's arrays or not, I
> need data.

pardon my frustration, but, no shit! however, if you don't understand the
output of preg_match_all, how then, will you get the data?

let me turn you into less than a thinker since you apparently need
spoon-feeding. copy and paste the following...then quit wasting everyone's
time.

$html = '"http://www.example.com/file1.jpg" kjkjskfj ';
$html .= '"http://www.example.com/fil1.jpeg" kjkjskfj';
$html .= '"http://www.example.com/fil1.png" kjkjskfj';
$pattern = '/(["\'])?(([^\.]*\.)*?(jpe?|pn)g)\1/';
preg_match_all($pattern, $html, $matches);
$images = $matches[2]; // well holy mother of christ! right where i guessed!!!
foreach ($images as $image)
{
echo '<pre>' . $image . '</pre>';
}

now, either go read the manual and do it all yourself (preferable), be
polite when consuming someone else's time...especially when they're trying to
help you, or foad!

either way, you may run along now!

Oct 12 '07 #13
Steve thank you very much for your time.

Oct 12 '07 #14
Steve
for this: (["'])?(([^\.]*\.)*(jpe?|pn)g)\1
I get: Unknown modifier

Oct 12 '07 #15
Steve
for this: (["'])?(([^\.]*\.)*(jpe?|pn)g)\1
I get: Unknown modifier

Oct 12 '07 #16

<ko*******@gmail.com> wrote in message
news:11**********************@v29g2000prd.googlegroups.com...
> Steve
> for this: (["'])?(([^\.]*\.)*(jpe?|pn)g)\1
> I get: Unknown modifier

because that is only the core of the pattern. you have to encase it in
delimiters:

'/<pattern>/'

make sense? just implement the copy/paste version i sent in the other post.
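
A small sketch of what the delimiters change, using the core pattern from the copy/paste version. The sample string, the variable names and the choice of "~" as delimiter are assumptions for illustration; the failing call is silenced with `@` so the script keeps running.

```php
<?php
// Hypothetical input with two quoted image URLs.
$html = '"http://www.example.com/file1.jpg" text "http://www.example.com/fil1.png"';

// The core pattern, with no delimiters around it.
$core = '(["\'])?(([^\.]*\.)*?(jpe?|pn)g)\1';

// Without delimiters PHP treats the leading "(" as the opening delimiter,
// so the call fails and returns false (the warning is suppressed with @).
$bad = @preg_match_all($core, $html, $ignored);

// Wrapped in delimiters it compiles. "~" avoids clashing with "/" in URLs.
$good = preg_match_all('~' . $core . '~', $html, $matches);

var_dump($bad);        // bool(false)
print_r($matches[2]);  // the two URLs, without their quotes
```

Any delimiter works as long as it does not appear unescaped inside the pattern; with "/" as delimiter, slashes inside the pattern would have to be escaped.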
Oct 12 '07 #17
.oO(Steve)

> <ko*******@gmail.com> wrote in message
> news:11**********************@e34g2000pro.googlegroups.com...
>> Steve, the arrays are empty. I don't care if it's arrays or not, I
>> need data.
>
> pardon my frustration, but, no shit! however, if you don't understand the
> output of preg_match_all, how then, will you get the data?

Calm down. In his example he used a pattern with a '$' at the end, which
can't work in this case. That's why he got an empty result array.
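
The effect of the '$' can be reproduced with the test string from the thread. This is a minimal sketch; the second pattern is an assumption added for contrast, not something posted here.

```php
<?php
// The test string from the thread: it ends in " dsgdg", not in an image name.
$htmlc = ' "http://example.com/file1.jpg" kjkjskfj "http://blabla.com/image2.png" dsgdg';

// Without the "m" modifier, "$" only matches at the very end of the subject
// (or just before a final newline), so nothing can match here.
$anchored = preg_match_all('/.*(jpg|png)$/', $htmlc, $m1);

// Dropping the anchor and stopping at the closing quote finds both URLs.
$unanchored = preg_match_all('/"([^"]*\.(?:jpg|png))"/', $htmlc, $m2);

var_dump($anchored);  // int(0), and $m1 holds two empty arrays
print_r($m2[1]);      // both URLs
```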

Micha
Oct 12 '07 #18

<ko*******@gmail.com> wrote in message
news:11**********************@i13g2000prf.googlegroups.com...
> Steve
> for this: (["'])?(([^\.]*\.)*(jpe?|pn)g)\1
> I get: Unknown modifier

what is up with your news reader!

you've posted this twice now. i gave you working code three posts ago, to
which you said thank you for your time. i cannot fathom that you're still
trying to work out the half-coded stuff of earlier threads when i gave you
the working code already! that's why i have to close my eyes and ignore the
fact it isn't latency in usenet message arrival (based on the time stamp),
and blame your news reader.

you have a very poor news reader. lol (noticing it's google).
Oct 12 '07 #19

"Michael Fesser" <ne*****@gmx.dewrote in message
news:op********************************@4ax.com...
> .oO(Steve)
>
>> <ko*******@gmail.com> wrote in message
>> news:11**********************@e34g2000pro.googlegroups.com...
>>> Steve, the arrays are empty. I don't care if it's arrays or not, I
>>> need data.
>>
>> pardon my frustration, but, no shit! however, if you don't understand the
>> output of preg_match_all, how then, will you get the data?
>
> Calm down. In his example he used a pattern with a '$' at the end, which
> can't work in this case. That's why he got an empty result array.

i know that, and in that same thread, i told him that. the problem is that
he posts outside of threads and it becomes very difficult to see which
examples, of the 3 sets, he is referring to. this is yet another reason to
avoid using google groups as an interface into usenet.

btw, even in that post, i was calm. nothing that goes on in usenet is worth
getting riled about. ;^)
Oct 12 '07 #20
Steve wrote:
>> preg_match_all("/.*(jpg|png)$/", $htmlfile, $Matches);
>
> nah...that would mean:
>
> ghijpgklmno
>
> or
>
> mnopngqrst
>
> or any text having those letters within it would be captured. however,
> neither his nor yours work unless the string ends at the end of the
> line (eof or \r or \n). both don't escape the dot...so that means
> 'any character followed by jpg or png'. more marginal success would
> be had just doing, "/\.jpg|\.png/", but even that doesn't cut
> it...since either jpg or png can be followed by anything and still be
> captured. not to mention we've left out jpeg's.
>
> make sense?

Well it seems to be a bit of a contradiction. My one was intended to only
find png or jpg at the end of the line as that was what the OP had intimated
was required.

Now you can hardly say that it would match ghijpgklmno or mnopngqrst at the
same time as saying that it'll only match with png or jpg at the end of the
line!

Oct 13 '07 #21
