Bytes | Software Development & Data Engineering Community

Speeding up a function

I want to parse a web page, extract all the link and image URLs from it, and insert the new ones into MySQL tables. My code to do this is below. I've tried to optimise it as much as I can, but it still takes too long to execute: the script hits server timeouts, and I can't control the server's settings. Is there some way to compile the code, or can anyone spot something that could be written better?
Thanks in advance,
Martin

<?php
$start = microtime_float();
$restriction = "dontstayin";
include("loadstuff.php"); // just contains functions which echo a tick or a cross
include("userpass.php");  // contains database access details
mysql_connect($host, $user, $password);
@mysql_select_db($database) or die(cross());
// escape the GET parameter before it goes into SQL (needs an open connection)
$uid = mysql_real_escape_string($_GET['uid']);
$result = mysql_query("SELECT * FROM mark_toparse WHERE uid='$uid'");
//mysql_close();
$contents = mysql_result($result, 0, "fcontents");
$fullorig = mysql_result($result, 0, "originalpath");
// directory part of the page's path (this used to read the undefined $origpath)
$origpath = substr($fullorig, 0, strrpos($fullorig, "/") + 1);
$lines = explode(">", $contents);
foreach ($lines as $line) {
    $imgsrc = stristr($line, "<img");
    if ($imgsrc !== false) {
        $imgsrc = str_replace("'", "", $imgsrc);
        $f = strpos($imgsrc, "\"", strpos($imgsrc, "src"));
        $l = strpos($imgsrc, "\"", $f + 1);
        $url = substr($imgsrc, $f + 1, $l - $f - 1);
        if (strncasecmp($url, "http:", 5) != 0) {
            $url = $origpath . $url;
            $url = str_replace(":/", "://", str_replace("//", "/", $url));
        }
        $resone = mysql_query("SELECT count(*) FROM mark_images WHERE url='$url'");
        $a = mysql_fetch_array($resone, MYSQL_NUM);
        if ($a[0] == 0) {
            mysql_query("INSERT INTO mark_images VALUES('','$url')");
        }
    } else {
        $link = stristr($line, "href=");
        if ($link !== false) {
            $link = str_replace("'", "", $link);
            $f = strpos($link, "\"");
            $l = strpos($link, "\"", $f + 1);
            $url = substr($link, $f + 1, $l - $f - 1);
            if (strncasecmp($url, "http:", 5) != 0) {
                if (strncasecmp($url, "mailto:", 7) == 0 || strncasecmp($url, "ftp:", 4) == 0 || strncasecmp($url, "msnim:", 6) == 0) {
                    // ignore ftp, mailto and msn links
                } else {
                    $url = $origpath . $url;
                    $url = str_replace(":/", "://", str_replace("//", "/", $url));
                    // preg_match() replaces eregi(), which was removed in PHP 7
                    if (preg_match('/\.jp[eg2]{1,2}$/i', $url)) {
                        $resone = mysql_query("SELECT count(*) FROM mark_images WHERE url='$url'");
                        $a = mysql_fetch_array($resone, MYSQL_NUM);
                        if ($a[0] == 0) {
                            mysql_query("INSERT INTO mark_images VALUES('','$url')");
                        }
                    } else {
                        $resone = mysql_query("SELECT count(*) FROM mark_linktodl WHERE url='$url'");
                        $a = mysql_fetch_array($resone, MYSQL_NUM);
                        if ($a[0] == 0) {
                            mysql_query("INSERT INTO mark_linktodl VALUES('','$url')");
                        }
                    }
                }
            } elseif (stristr($url, $restriction) !== false) { // case-insensitive substring test
                if (preg_match('/\.jp[eg2]{1,2}$/i', $url)) {
                    $resone = mysql_query("SELECT count(*) FROM mark_images WHERE url='$url'");
                    $a = mysql_fetch_array($resone, MYSQL_NUM);
                    if ($a[0] == 0) {
                        mysql_query("INSERT INTO mark_images VALUES('','$url')");
                    }
                } else {
                    $resone = mysql_query("SELECT count(*) FROM mark_linktodl WHERE url='$url'");
                    $a = mysql_fetch_array($resone, MYSQL_NUM);
                    if ($a[0] == 0) {
                        mysql_query("INSERT INTO mark_linktodl VALUES('','$url')");
                    }
                }
            }
        }
    }
}
//mysql_query("DELETE FROM mark_toparse WHERE originalpath='$fullorig'");
mysql_query("INSERT INTO mark_parsed VALUES('','$fullorig','" . md5($contents) . "')");
mysql_close();
$finish = microtime_float();

if (isset($_GET['debug']) && $_GET['debug'] == "t") {
    $tim = $finish - $start;
    include("error_image.php");
    imagepng(errorimage("Analysis: $tim s")); // imagepng() writes the image itself
} else {
    echo tick();
}

function microtime_float()
{
    list($usec, $sec) = explode(" ", microtime());
    return ((float)$usec + (float)$sec);
}

?>

Oct 27 '06 #1
Rik
Cl*******@hotmail.com wrote:
> I want to parse a web page, extract all the link and image URLs from it, and insert the new ones into MySQL tables. My code to do this is below. I've tried to optimise it as much as I can, but it still takes too long to execute: the script hits server timeouts, and I can't control the server's settings. Is there some way to compile the code, or can anyone spot something that could be written better?
> Thanks in advance,
> Martin

Well, it would be simpler for us if you described exactly what you're trying to do, instead of leaving us to decipher it.

1. You can use a WHERE ... REGEXP clause so the database returns a smaller result set to check against.
2. In this case, I really advise using preg_replace() to change the img src instead of all the exploding, looping, strpos(), str_replace() etc. This can be a one-liner.
3. MySQL has a handy REPLACE INTO: as long as you have a suitable unique key, there is no need to check for already existing rows.
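Point 2 can be sketched as follows. Rik names preg_replace(); for collecting the values rather than rewriting them, the matching PCRE function is preg_match_all(). This is a minimal sketch, not a full HTML parser: it assumes attribute values are quoted, and extract_urls() is a hypothetical helper name. Like the original loop, it picks up href on any tag, not just <a>.

```php
<?php
// Sketch: collect <img src> and href values with two regex passes instead
// of explode(">") plus strpos()/substr() on every fragment.
// Assumption: attribute values are quoted with " or '; unquoted values and
// other HTML edge cases are not handled.
function extract_urls($html)
{
    // capture group 1 = the src value of each <img> tag
    preg_match_all('/<img\b[^>]*?\bsrc\s*=\s*["\']([^"\']+)["\']/i', $html, $img);
    // capture group 1 = every href value, on any tag (as in the original code)
    preg_match_all('/\bhref\s*=\s*["\']([^"\']+)["\']/i', $html, $href);
    return array('images' => $img[1], 'links' => $href[1]);
}

// usage
$html = '<p><img src="/pics/a.jpg"><a href="page2.html">next</a>'
      . "<a href='mailto:x@example.com'>mail</a></p>";
$urls = extract_urls($html);
print_r($urls);
```

Relative URLs still need the page's directory prefixed, and mailto:/ftp: links still need filtering, as in the original loop; the regexes only replace the explode()/strpos() scanning.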
--
Rik Wasmus
Oct 27 '06 #2
Rik wrote:
> Well, it would be simpler for us if you described exactly what you're trying to do, instead of leaving us to decipher it.
>
> 1. You can use a WHERE ... REGEXP clause so the database returns a smaller result set to check against.
> 2. In this case, I really advise using preg_replace() to change the img src instead of all the exploding, looping, strpos(), str_replace() etc. This can be a one-liner.
> 3. MySQL has a handy REPLACE INTO: as long as you have a suitable unique key, there is no need to check for already existing rows.
> --
> Rik Wasmus
Combining all the insert/replace operations into one statement would help as well. Instead of running each query immediately, store the links in separate arrays like this:

$images[$url] = true;

That collapses duplicate links in the page being parsed. Then it's just a matter of looping through the arrays to build SQL statements that create the records in one fell swoop.
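The batching idea, combined with Rik's point 3, can be sketched like this. It is a minimal sketch under stated assumptions: a UNIQUE index exists on url (for example ALTER TABLE mark_images ADD UNIQUE (url)), and the table's other column (the id) fills itself, e.g. AUTO_INCREMENT, which is why only (url) is listed. build_insert() is a hypothetical helper, and $escape should be your driver's escaping function (mysql_real_escape_string() with the mysql_* calls used above). It uses INSERT IGNORE, which skips URLs already in the table; REPLACE INTO would also work but deletes and re-inserts the existing row.

```php
<?php
// Sketch: URLs gathered as array keys ($images[$url] = true) are unique per
// page; build_insert() turns one such array into a single multi-row INSERT.
// Assumptions: a UNIQUE index exists on `url`, and the table's other column
// (the id) is filled automatically, e.g. AUTO_INCREMENT.
function build_insert($table, array $set, $escape)
{
    if (count($set) == 0) {
        return null; // nothing collected, so no query to run
    }
    $rows = array();
    foreach (array_keys($set) as $url) {
        $rows[] = "('" . $escape($url) . "')";
    }
    // INSERT IGNORE silently skips rows whose url already exists
    return "INSERT IGNORE INTO $table (url) VALUES " . implode(',', $rows);
}

// usage: the same URL seen twice collapses to one key
$images = array();
foreach (array('/a.jpg', '/b.jpg', '/a.jpg') as $u) {
    $images[$u] = true;
}
// addslashes stands in for mysql_real_escape_string so this runs without a DB
$sql = build_insert('mark_images', $images, 'addslashes');
echo $sql, "\n"; // INSERT IGNORE INTO mark_images (url) VALUES ('/a.jpg'),('/b.jpg')
```

Build one statement per array after the parsing loop and pass it to mysql_query() when it is not null. For very large pages, note that MySQL's max_allowed_packet setting caps the size of a single statement.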

Oct 27 '06 #3
