Speeding up a function

I'm trying to parse a web page, extract all link and image URLs from
it, and insert the new ones into MySQL tables. My code is below. I've
tried to optimise it as much as I can, but it still takes too long to
execute (it hits timeouts on a server which I cannot control). Is there
some way to compile the code, or can anyone spot something which could
be better written?
Thanks in advance,
Martin

<?
$start = microtime_float();
$uid = $_GET['uid'];
$restriction = "dontstayin";
include("loadstuff.php"); // just contains a function which echoes a tick or cross
include("userpass.php"); // contains database access details
mysql_connect($host, $user, $password);
@mysql_select_db("$database") or die(cross());
$result = mysql_query("SELECT * FROM mark_toparse WHERE uid='$uid'");
//mysql_close();
$contents = mysql_result($result, 0, "fcontents");
$fullorig = mysql_result($result, 0, "originalpath");
// directory part of the original path, up to and including the last "/"
$origpath = substr($fullorig, 0, strrpos($fullorig, "/") + 1);
$lines = explode(">", $contents);
for ($i = 0; $i < count($lines); $i++) {
    $imgsrc = stristr($lines[$i], "<img");
    if ($imgsrc != false) {
        $imgsrc = str_replace("'", "", $imgsrc);
        $f = strpos($imgsrc, "\"", strpos($imgsrc, "src"));
        $l = strpos($imgsrc, "\"", $f + 1);
        $url = substr($imgsrc, $f + 1, $l - $f - 1);
        if (strncasecmp($url, "http:", 5) != 0) {
            $url = $origpath . $url;
            $url = str_replace(":/", "://", str_replace("//", "/", $url));
        }
        $resone = mysql_query("SELECT count(*) FROM mark_images WHERE url='$url'");
        $a = (mysql_fetch_array($resone, MYSQL_NUM));
        if ($a[0] == 0) {
            mysql_query("INSERT INTO mark_images VALUES('','$url')");
        }
    } else {
        $link = stristr($lines[$i], "href=");
        if ($link != false) {
            $link = str_replace("'", "", $link);
            $f = strpos($link, "\"");
            $l = strpos($link, "\"", $f + 1);
            $url = substr($link, $f + 1, $l - $f - 1);
            if (strncasecmp($url, "http:", 5) != 0) {
                if (strncasecmp($url, "mailto:", 7) == 0 || strncasecmp($url, "ftp:", 4) == 0 || strncasecmp($url, "msnim:", 6) == 0) {
                    // ignore ftp, mailto and msn links
                } else {
                    $url = $origpath . $url;
                    $url = str_replace(":/", "://", str_replace("//", "/", $url));
                    if (eregi("\.jp[eg2]{1,2}$", $url)) {
                        $resone = mysql_query("SELECT count(*) FROM mark_images WHERE url='$url'");
                        $a = (mysql_fetch_array($resone, MYSQL_NUM));
                        if ($a[0] == 0) {
                            mysql_query("INSERT INTO mark_images VALUES('','$url')");
                        }
                    } else {
                        $resone = mysql_query("SELECT count(*) FROM mark_linktodl WHERE url='$url'");
                        $a = (mysql_fetch_array($resone, MYSQL_NUM));
                        if ($a[0] == 0) {
                            mysql_query("INSERT INTO mark_linktodl VALUES('','$url')");
                        }
                    }
                }
            } elseif (eregi($restriction, $url)) {
                if (eregi("\.jp[eg2]{1,2}$", $url)) {
                    $resone = mysql_query("SELECT count(*) FROM mark_images WHERE url='$url'");
                    $a = (mysql_fetch_array($resone, MYSQL_NUM));
                    if ($a[0] == 0) {
                        mysql_query("INSERT INTO mark_images VALUES('','$url')");
                    }
                } else {
                    $resone = mysql_query("SELECT count(*) FROM mark_linktodl WHERE url='$url'");
                    $a = (mysql_fetch_array($resone, MYSQL_NUM));
                    if ($a[0] == 0) {
                        mysql_query("INSERT INTO mark_linktodl VALUES('','$url')");
                    }
                }
            }
        }
    }
}
//mysql_query("DELETE FROM mark_toparse WHERE originalpath='$fullorig'");
mysql_query("INSERT INTO mark_parsed VALUES('','$fullorig','" . md5($contents) . "')");
mysql_close();
$finish = microtime_float();

if (strcmp($_GET['debug'], "t") == 0) {
    $tim = $finish - $start;
    include("error_image.php");
    echo imagepng(errorimage("Analysis: $tim s"));
} else {
    echo tick();
}

function microtime_float()
{
    list($usec, $sec) = explode(" ", microtime());
    return ((float)$usec + (float)$sec);
}

?>

Oct 27 '06 #1
Rik
Cl*******@hotmail.com wrote:
I'm trying to parse a web page, extract all link and image URLs from
it, and insert the new ones into MySQL tables. My code is below. I've
tried to optimise it as much as I can, but it still takes too long to
execute (it hits timeouts on a server which I cannot control). Is there
some way to compile the code, or can anyone spot something which could
be better written?
Thanks in advance,
Martin

Well, it would be simpler for us if you described exactly what you're
trying to do, instead of leaving us to decipher it.

1. You can use a WHERE ... REGEXP to get a small result set to check
from the database.
2. In this case, I really advise preg_replace() to change the img src
instead of the exploding, looping, strpos, str_replace etc. This can be
a one-liner.
3. MySQL has a handy REPLACE INTO: as long as you have a correct key,
no checking for already existing rows is required.
--
Rik Wasmus
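
As a rough illustration of point 2 (this is not from Rik's post): the explode/strpos loop could be swapped for two regular-expression passes over the whole page. The sketch below uses preg_match_all() rather than the preg_replace() mentioned above, since the goal here is to collect URLs rather than rewrite them. extract_urls() and the sample HTML are made up for the example, and the patterns assume attribute values are wrapped in single or double quotes.

```php
<?php
// Hypothetical helper, not from the thread: pull image and link URLs
// out of an HTML string with one regex pass per tag type.
function extract_urls($html)
{
    $images = array();
    $links  = array();

    // src="..." or src='...' inside <img ...> tags, case-insensitive
    if (preg_match_all('/<img\b[^>]*?\bsrc\s*=\s*(["\'])(.*?)\1/is', $html, $m)) {
        foreach ($m[2] as $url) {
            $images[$url] = true; // array keys collapse duplicates
        }
    }

    // href="..." or href='...' inside <a ...> tags
    if (preg_match_all('/<a\b[^>]*?\bhref\s*=\s*(["\'])(.*?)\1/is', $html, $m)) {
        foreach ($m[2] as $url) {
            $links[$url] = true;
        }
    }

    return array(array_keys($images), array_keys($links));
}

// Sample input, invented for the example
$html = '<p><a href="page.html">x</a> <IMG alt="" SRC=\'pic.jpg\'> '
      . '<a href="page.html">again</a></p>';
list($images, $links) = extract_urls($html);
print_r($images);
print_r($links);
```

Relative URLs would still need to be resolved against the page's base path afterwards, as the original code does with $origpath.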
Oct 27 '06 #2
Rik wrote:
Well, it would be simpler for us if you described exactly what you're
trying to do, instead of leaving us to decipher it.

1. You can use a WHERE ... REGEXP to get a small result set to check
from the database.
2. In this case, I really advise preg_replace() to change the img src
instead of the exploding, looping, strpos, str_replace etc. This can be
a one-liner.
3. MySQL has a handy REPLACE INTO: as long as you have a correct key,
no checking for already existing rows is required.
--
Rik Wasmus
Combining all the insert/replace operations into one statement would
help as well. Instead of running a query immediately, store the links
in separate arrays like this:

$images[$url] = true;

That collapses duplicate links in the page being parsed. Then it's just
a matter of looping through the arrays to build SQL statements that
create the records in one fell swoop.
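
A minimal sketch of that idea, under some loud assumptions: the table has a UNIQUE index on its url column (the thread does not show the schema), and the statement is run through a prepared-statement API such as PDO rather than the mysql_* calls in the original code. It uses INSERT IGNORE instead of the REPLACE INTO suggested earlier, because REPLACE deletes and re-inserts a matching row, whereas INSERT IGNORE leaves existing rows untouched. build_bulk_insert() and the example URLs are hypothetical.

```php
<?php
// Hypothetical helper, not from the thread: build one multi-row INSERT
// for a set of URLs. Assumes `url` has a UNIQUE index, so MySQL's
// INSERT IGNORE skips rows that already exist.
function build_bulk_insert($table, array $urls)
{
    if (count($urls) === 0) {
        return null; // nothing to insert
    }
    // One "(?)" placeholder per URL: (?), (?), (?)
    $placeholders = implode(', ', array_fill(0, count($urls), '(?)'));
    $sql = "INSERT IGNORE INTO `$table` (url) VALUES $placeholders";
    return array($sql, array_values($urls));
}

// Collect URLs as array keys while parsing; duplicates collapse for free.
$images = array();
$found = array(
    'http://example.com/a.jpg',
    'http://example.com/b.jpg',
    'http://example.com/a.jpg', // duplicate on the same page
);
foreach ($found as $url) {
    $images[$url] = true;
}

list($sql, $params) = build_bulk_insert('mark_images', array_keys($images));
echo $sql, "\n";
// With a PDO connection in $pdo (not shown), this would run as:
//   $pdo->prepare($sql)->execute($params);
```

This replaces one SELECT plus one INSERT per URL with a single round trip per table, which is where most of the time in the original loop is likely to go.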

Oct 27 '06 #3