Speeding up a function

I'm trying to parse a web page, extract all link and image URLs from
it, and insert the new ones into MySQL tables. My code is below. I've
tried to optimise it as much as I can, but it still takes too long to
execute (it times out on a server which I cannot control). Is there
some way to compile the code, or can anyone spot something which could
be better written?
Thanks in advance,
Martin

<?php
$start=microtime_float();
$uid=$_GET['uid'];
$restriction= "dontstayin";
include("loadstuff.php"); //just contains a function which echoes a tick or cross
include("userpass.php"); //contains database access details.
mysql_connect($host,$user,$password);
@mysql_select_db("$database") or die(cross());
//escape the user-supplied uid before it goes into the query
$result = mysql_query("SELECT * FROM mark_toparse WHERE uid='".mysql_real_escape_string($uid)."'");
//mysql_close();
$contents = mysql_result($result,0,"fcontents");
$fullorig = mysql_result($result,0,"originalpath");
//directory part of the original URL, used to resolve relative links
$origpath = substr($fullorig,0,strrpos($fullorig,"/")+1);
$lines = explode(">",$contents);
for($i=0;$i<count($lines);$i++){
    $imgsrc = stristr($lines[$i],"<img");
    if($imgsrc!=false){
        $imgsrc = str_replace("'","",$imgsrc);
        $f = strpos($imgsrc,"\"",strpos($imgsrc,"src"));
        $l = strpos($imgsrc,"\"",$f+1);
        $url = substr($imgsrc,$f+1,$l-$f-1);
        if(strncasecmp($url,"http:",5)!=0){
            $url = $origpath . $url;
            $url= str_replace(":/","://",str_replace("//","/",$url));
        }
        $resone = mysql_query("SELECT count(*) FROM mark_images WHERE url='$url'");
        $a = (mysql_fetch_array($resone,MYSQL_NUM));
        if($a[0]==0){
            mysql_query("INSERT INTO mark_images VALUES('','$url')");
        }
    }else{
        $link = stristr($lines[$i],"href=");
        if($link!=false){
            $link = str_replace("'","",$link);
            $f = strpos($link,"\"");
            $l = strpos($link,"\"",$f+1);
            $url= substr($link,$f+1,$l-$f-1);
            if(strncasecmp($url,"http:",5)!=0){
                if(strncasecmp($url,"mailto:",7)==0||strncasecmp($url,"ftp:",4)==0||strncasecmp($url,"msnim:",6)==0){
                    //ignore ftp, mailto and msn links
                }else{
                    $url = $origpath . $url;
                    $url= str_replace(":/","://",str_replace("//","/",$url));
                    if(eregi("\.jp[eg2]{1,2}$",$url)){
                        $resone = mysql_query("SELECT count(*) FROM mark_images WHERE url='$url'");
                        $a = (mysql_fetch_array($resone,MYSQL_NUM));
                        if($a[0]==0){
                            mysql_query("INSERT INTO mark_images VALUES('','$url')");
                        }
                    }else{
                        $resone = mysql_query("SELECT count(*) FROM mark_linktodl WHERE url='$url'");
                        $a = (mysql_fetch_array($resone,MYSQL_NUM));
                        if($a[0]==0){
                            mysql_query("INSERT INTO mark_linktodl VALUES('','$url')");
                        }
                    }
                }
            }elseif(eregi($restriction,$url)){
                if(eregi("\.jp[eg2]{1,2}$",$url)){
                    $resone = mysql_query("SELECT count(*) FROM mark_images WHERE url='$url'");
                    $a = (mysql_fetch_array($resone,MYSQL_NUM));
                    if($a[0]==0){
                        mysql_query("INSERT INTO mark_images VALUES('','$url')");
                    }
                }else{
                    $resone = mysql_query("SELECT count(*) FROM mark_linktodl WHERE url='$url'");
                    $a = (mysql_fetch_array($resone,MYSQL_NUM));
                    if($a[0]==0){
                        mysql_query("INSERT INTO mark_linktodl VALUES('','$url')");
                    }
                }
            }
        }
    }

}
//mysql_query("DELETE FROM mark_toparse WHERE originalpath='$fullorig'");
mysql_query("INSERT INTO mark_parsed VALUES('','$fullorig','".md5($contents)."')");
mysql_close();
$finish=microtime_float();

if(isset($_GET['debug']) && strcmp($_GET['debug'],"t")==0){
    $tim=$finish-$start;
    include("error_image.php");
    //imagepng() sends the image itself and returns a bool, so it is not echoed
    imagepng(errorimage("Analysis: $tim s"));
}else{
    echo tick();
}

function microtime_float()
{
    list($usec, $sec) = explode(" ", microtime());
    return ((float)$usec + (float)$sec);
}

?>

Oct 27 '06 #1
Rik
Cl*******@hotmail.com wrote:
> I'm trying to parse a web page, extract all link and image URLs from
> it, and insert the new ones into MySQL tables. My code is below. I've
> tried to optimise it as much as I can, but it still takes too long to
> execute (it times out on a server which I cannot control). Is there
> some way to compile the code, or can anyone spot something which could
> be better written?
> Thanks in advance,
> Martin

Well, it would be simpler for us if you could describe exactly what
you're trying to do, instead of leaving us to decipher it.

1. You can use a WHERE ... REGEXP clause so that only a small result has
to be checked from the database.
2. In this case, I really advise preg_replace() to change the img src
instead of the exploding, looping, strpos, str_replace etc. This can be a
one-liner.
3. MySQL has a handy REPLACE INTO: as long as you have a correct key, no
checking for already existing rows is required.
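
As a rough illustration of point 2, here is a minimal sketch that pulls the attribute values out with preg_match_all() (swapped in for preg_replace(), since the goal is to extract URLs rather than rewrite them). The function name extract_urls() and the sample HTML are made up for the example, and a regular expression like this is only an approximation of real HTML parsing.

```php
<?php
// Hypothetical helper (not from the original post): collect the values of
// src attributes on <img> tags and href attributes on <a> tags.
function extract_urls($html)
{
    $found = array('images' => array(), 'links' => array());

    // Match src="..." or src='...' inside an <img ...> tag, case-insensitively.
    if (preg_match_all('/<img\b[^>]*?\bsrc\s*=\s*(["\'])(.*?)\1/is', $html, $m)) {
        $found['images'] = array_values(array_unique($m[2]));
    }

    // Same idea for href="..." inside an <a ...> tag.
    if (preg_match_all('/<a\b[^>]*?\bhref\s*=\s*(["\'])(.*?)\1/is', $html, $m)) {
        $found['links'] = array_values(array_unique($m[2]));
    }

    return $found;
}

// Made-up sample input.
$html = '<p><a href="page.html">x</a><img src="a.jpg">'
      . '<IMG SRC=\'b.png\' alt=""><a href="page.html">again</a></p>';
$urls = extract_urls($html);
print_r($urls);
```

Relative URLs would still need to be resolved against the original path afterwards, as the posted code does.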
--
Rik Wasmus
Oct 27 '06 #2
Rik wrote:
> Well, it would be simpler for us if you could describe exactly what
> you're trying to do, instead of leaving us to decipher it.
>
> 1. You can use a WHERE ... REGEXP clause so that only a small result has
> to be checked from the database.
> 2. In this case, I really advise preg_replace() to change the img src
> instead of the exploding, looping, strpos, str_replace etc. This can be a
> one-liner.
> 3. MySQL has a handy REPLACE INTO: as long as you have a correct key, no
> checking for already existing rows is required.
> --
> Rik Wasmus

Combining all the insert/replace operations into one statement would
help as well. Instead of running a query immediately, store the links in
separate arrays like this:

$images[$url] = true;

Using the URL as the array key collapses duplicate links in the page
being parsed. Then it's just a matter of looping through the arrays to
build SQL statements that create all the records in one fell swoop.
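
A minimal sketch of that idea follows, under some loudly stated assumptions: the helper name build_bulk_insert() is hypothetical, the table is assumed to have a UNIQUE key on its url column, and INSERT IGNORE is used in place of the REPLACE INTO mentioned earlier so that existing rows are left alone. The escaping function is passed in; addslashes() is only a stand-in here, and with a live connection mysql_real_escape_string() would be the one to use.

```php
<?php
// Hypothetical helper: build one multi-row INSERT for a set of URLs.
// $urlSet uses the URLs as array keys, so duplicates are already collapsed.
// $escape is the name of a function that escapes a string for SQL.
function build_bulk_insert($table, array $urlSet, $escape)
{
    if (count($urlSet) === 0) {
        return null; // nothing to insert, so no query is needed
    }
    $rows = array();
    foreach (array_keys($urlSet) as $url) {
        $rows[] = "('" . call_user_func($escape, $url) . "')";
    }
    // Assumes a UNIQUE key on url, so already-known URLs are skipped.
    return "INSERT IGNORE INTO $table (url) VALUES " . implode(',', $rows);
}

// Collect URLs while parsing; the repeated URL ends up stored only once.
$images = array();
$seen = array(
    'http://example.com/a.jpg',
    'http://example.com/b.jpg',
    'http://example.com/a.jpg',
);
foreach ($seen as $url) {
    $images[$url] = true;
}

$sql = build_bulk_insert('mark_images', $images, 'addslashes');
echo $sql, "\n";
```

With this shape the page costs one query per table instead of a SELECT plus an INSERT per URL.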

Oct 27 '06 #3
