Yang Li Ke wrote:
Hey guys!
I want to get all pics (.jpg) from a url and I use this code, but with some
urls it doesn't work :(
Can anyone check it out?
$text = implode("", file($url));
$text = eregi_replace("<!--([^-]|-[^-]|--[^>])*-->","", $text);
while (eregi("[:space:]*(img)[:space:]*=[:space:]*([^ >]+)", $text, $regs))
{
$regs[2] = ereg_replace("\"", "", $regs[2]);
$regs[2] = ereg_replace("'", "", $regs[2]);
$regs[2] = preg_replace("/(\s.+)/" , "" , $regs[2]);
if(eregi(".jpg|.jpeg|.jpe",$regs[2])){
echo $regs[2]."<br>";
}
$text = substr($text, strpos($text, $regs[1]) + strlen($regs[1]));
}
I'd use preg_match_all with something like this (untested):
"/<[ ]*img[^>]*src[ ]*=[ ]*(\"[^\"]+\.jp[e]?g\"|\'[^\']+\.jp[e]?g\'|[^ ]+\.jp[e]?g)[^>]*>/i"
I can't guarantee it because it's untested and long, but it should get all .jpg
or .jpeg filenames in image tags, regardless of case, position of the src
attribute, and whether the value is double-quoted, single-quoted or unquoted.
It won't match if they've used tabs or newlines instead of spaces where the
pattern only allows spaces (between the < and "img", or around the "="). Also,
it'll include directory names, if they exist, in the filename:
images/hello.jpg
thumbnails/world.jpeg
Oh, and it'll return the quotes, if they exist, so you might want to strip them
off before using the match.
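Here's a rough sketch of how that could look with preg_match_all, including stripping the quotes. It's only an illustration: the function name extract_jpeg_sources and the sample HTML are made up for this example, and it assumes PHP 7 or later because of the type hints.

```php
<?php
// Hypothetical helper (not from the original post): returns the .jpg/.jpeg
// src values found in <img> tags, with any surrounding quotes removed.
function extract_jpeg_sources(string $html): array
{
    // The pattern suggested above, written on a single line.
    $pattern = "/<[ ]*img[^>]*src[ ]*=[ ]*(\"[^\"]+\.jp[e]?g\"|\'[^\']+\.jp[e]?g\'|[^ ]+\.jp[e]?g)[^>]*>/i";

    preg_match_all($pattern, $html, $matches);

    // $matches[1] holds the captured src values, quotes included.
    $sources = [];
    foreach ($matches[1] as $src) {
        // Strip leading/trailing double or single quotes.
        $sources[] = trim($src, "\"'");
    }
    return $sources;
}

// Made-up sample input: two JPEG images and one PNG that should be skipped.
$html = '<p><IMG SRC="images/hello.jpg"> <img alt="x" src=\'thumbnails/world.jpeg\'> <img src="logo.png"></p>';

foreach (extract_jpeg_sources($html) as $src) {
    echo $src . "<br>\n";
}
```

On the sample string this should list images/hello.jpg and thumbnails/world.jpeg without their quotes and skip logo.png. Real pages are messier than this, so treat it as a starting point rather than a finished parser.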
And it won't pick up images that are only referenced from JavaScript (such as
mouseovers).
I don't know what you'll do with the names, but if you're going to try to fetch
the images you should be wary of <base> tags.
For example:
<HTML>
<HEAD>
<BASE HREF="http://www.yoursite.com/">
</HEAD>
<BODY>
<IMG SRC="images/foo.jpg">
</BODY>
</HTML>
If the above file is located at http://www.yoursite.com/subdirectory/ then you
might assume the url of the image is
http://www.yoursite.com/subdirectory/images/foo.jpg when it is actually at
http://www.yoursite.com/images/foo.jpg, because relative urls are resolved
against the BASE HREF rather than the page's own location.
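If you do end up fetching the images, something along these lines could handle the <base> case. Again, it's just a sketch under assumptions: find_base_url and resolve_image_url are names invented for this example, it assumes PHP 7 or later, and it only covers simple relative paths (no "../", "./" or protocol-relative urls).

```php
<?php
// Hypothetical helper: picks the url that relative image paths are resolved
// against. Uses the first <base href="..."> if present, else the page url.
function find_base_url(string $html, string $pageUrl): string
{
    if (preg_match('/<base[^>]*href\s*=\s*["\']?([^"\'\s>]+)/i', $html, $m)) {
        return $m[1];
    }
    return $pageUrl;
}

// Hypothetical helper: resolves a simple relative path against a base url.
// Assumption: no "../" or "./" segments and no protocol-relative urls.
function resolve_image_url(string $base, string $src): string
{
    // Already absolute: leave it alone.
    if (preg_match('#^https?://#i', $src)) {
        return $src;
    }

    $parts = parse_url($base);
    $root  = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $root .= ':' . $parts['port'];
    }

    // Root-relative path such as "/images/foo.jpg".
    if (substr($src, 0, 1) === '/') {
        return $root . $src;
    }

    // Otherwise append to the directory part of the base path.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $root . $dir . $src;
}

// The example page from above, fetched from the subdirectory.
$html = '<HTML><HEAD><BASE HREF="http://www.yoursite.com/"></HEAD>'
      . '<BODY><IMG SRC="images/foo.jpg"></BODY></HTML>';

$base = find_base_url($html, 'http://www.yoursite.com/subdirectory/');
echo resolve_image_url($base, 'images/foo.jpg') . "\n";
```

With the <base> tag present this should print http://www.yoursite.com/images/foo.jpg. Without it, the page's own url is used and the same path resolves under /subdirectory/ instead.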
Regards,
Shawn
--
Shawn Wilson
sh***@glassgiant.com http://www.glassgiant.com