I was hoping to parse a webpage, extract all link and image URLs from
it, and enter the new ones into MySQL tables. Below is my code to do
it. I've tried to optimise it as much as I can, but it still takes too
long to execute (it times out on a server which I cannot control). I
was wondering if there is some way to compile the code, or if anyone
can spot something which could be better written.
Thanks in advance,
Martin
<?php
$start=microtime_float();
$uid=$_GET['uid'];
$restriction= "dontstayin";
include("loadstuff.php"); //just contains a function which echos a tick or cross
include("userpass.php"); //contains database access details.
mysql_connect($host,$user,$password);
@mysql_select_db("$database") or die(cross());
// escape the request value before it goes into the query
$result = mysql_query("SELECT * FROM mark_toparse WHERE uid='" . mysql_real_escape_string($uid) . "'");
//mysql_close();
$contents = mysql_result($result,0,"fcontents");
$fullorig = mysql_result($result,0,"originalpath");
// base path is everything up to and including the last "/" of the original URL
$origpath = substr($fullorig,0,strrpos($fullorig,"/")+1);
$lines = explode(">",$contents);
for($i=0;$i<count($lines);$i++){
$imgsrc = stristr($lines[$i],"<img");
if($imgsrc!=false){
$imgsrc = str_replace("'","",$imgsrc);
$f = strpos($imgsrc,"\"",strpos($imgsrc,"src"));
$l = strpos($imgsrc,"\"",$f+1);
$url = substr($imgsrc,$f+1,$l-$f-1);
if(strncasecmp($url,"http:",5)!=0){
$url = $origpath . $url;
$url= str_replace(":/","://",str_replace("//","/",$url));
}
$resone = mysql_query("SELECT count(*) FROM mark_images WHERE
url='$url'");
$a = (mysql_fetch_array($resone,MYSQL_NUM));
if($a[0]==0){
mysql_query("INSERT INTO mark_images VALUES('','$url')");
}
}else{
$link = stristr($lines[$i],"href=");
if($link!=false){
$link = str_replace("'","",$link);
$f = strpos($link,"\"");
$l = strpos($link,"\"",$f+1);
$url= substr($link,$f+1,$l-$f-1);
if(strncasecmp($url,"http:",5)!=0){
if(strncasecmp($url,"mailto:",7)==0||strncasecmp($url,"ftp:",4)==0||strncasecmp($url,"msnim:",6)==0) {
//ignore ftp, mailto and msn links
}else{
$url = $origpath . $url;
$url= str_replace(":/","://",str_replace("//","/",$url));
if(eregi("\.jp[eg2]{1,2}$",$url)){
$resone = mysql_query("SELECT count(*) FROM mark_images WHERE
url='$url'");
$a = (mysql_fetch_array($resone,MYSQL_NUM));
if($a[0]==0){
mysql_query("INSERT INTO mark_images VALUES('','$url')");
}
}else{
$resone = mysql_query("SELECT count(*) FROM mark_linktodl WHERE
url='$url'");
$a = (mysql_fetch_array($resone,MYSQL_NUM));
if($a[0]==0){
mysql_query("INSERT INTO mark_linktodl VALUES('','$url')");
}
}
}
}elseif(eregi($restriction,$url)){
if(eregi("\.jp[eg2]{1,2}$",$url)){
$resone = mysql_query("SELECT count(*) FROM mark_images WHERE
url='$url'");
$a = (mysql_fetch_array($resone,MYSQL_NUM));
if($a[0]==0){
mysql_query("INSERT INTO mark_images VALUES('','$url')");
}
}else{
$resone = mysql_query("SELECT count(*) FROM mark_linktodl WHERE
url='$url'");
$a = (mysql_fetch_array($resone,MYSQL_NUM));
if($a[0]==0){
mysql_query("INSERT INTO mark_linktodl VALUES('','$url')");
}
}
}
}
}
}
//mysql_query("DELETE FROM mark_toparse WHERE originalpath='$fullorig'");
mysql_query("INSERT INTO mark_parsed
VALUES('','$fullorig','".md5($contents)."')");
mysql_close();
$finish=microtime_float();
if(strcmp($_GET['debug'],"t")==0){
$tim=$finish-$start;
include("error_image.php");
echo imagepng(errorimage("Analysis: $tim s"));
}else{
echo tick();
}
function microtime_float()
{
list($usec, $sec) = explode(" ", microtime());
return ((float)$usec + (float)$sec);
}
?>
Well, it would be simpler for us if you could describe exactly what
you're trying to do, instead of letting us decipher it.
1. You can use a WHERE REGEXP to get a small result set to check from
the database.
2. In this case, I really advise preg_replace() to change the img src
instead of the exploding, looping, strpos, str_replace etc. This can be
a one-liner.
3. MySQL has a handy REPLACE INTO: as long as you have a correct key,
no checking for already existing rows is required.
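A rough sketch of the regex approach from point 2, for illustration only. It uses preg_match_all() rather than preg_replace(), since the goal here is to collect the URLs rather than rewrite them. The function name extract_attr_urls() and the sample HTML are made up for this example, and the pattern only handles quoted attribute values, so unquoted ones like src=a.jpg would be missed.

```php
<?php
// Hypothetical helper: collect the values of one attribute on one tag,
// e.g. every src on <img> or every href on <a>.
function extract_attr_urls($html, $tag, $attr)
{
    // <tag ... attr="value"> or attr='value', case-insensitive,
    // allowing the tag to span several lines.
    $pattern = '/<' . preg_quote($tag, '/') . '\b[^>]*?\b'
             . preg_quote($attr, '/') . '\s*=\s*(["\'])(.*?)\1/is';
    preg_match_all($pattern, $html, $matches);
    // $matches[2] holds the captured URLs; drop repeats within the page.
    return array_values(array_unique($matches[2]));
}

// Made-up sample page content
$html = '<p><img src="a.jpg" alt="x"><a href=\'page.html\'>x</a>'
      . '<IMG SRC="a.jpg"><a class="c" href="http://example.com/">y</a></p>';

$imageUrls = extract_attr_urls($html, 'img', 'src');
$linkUrls  = extract_attr_urls($html, 'a', 'href');
print_r($imageUrls);
print_r($linkUrls);
```

This replaces the explode(">") loop and the strpos/substr arithmetic with two passes over the page. Relative URLs would still need to be joined to the base path afterwards.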
--
Rik Wasmus
Combining all the insert/replace operations into one statement would
help as well. Instead of running a query immediately for each URL,
store the links in separate arrays like this:
$images[$url] = true;
That collapses duplicate links in the page being parsed. Then it's just
a matter of looping through the arrays to build SQL statements that
create all the records in one fell swoop.
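As a hedged sketch of that idea: the helper below turns one of those arrays into a single multi-row REPLACE INTO statement with placeholders. The function name build_bulk_replace() is invented for this example, the column name url is taken from the queries in the original post, and it assumes the table has a unique key on url so that REPLACE INTO can stand in for the SELECT count(*) checks.

```php
<?php
// Hypothetical helper: build one multi-row statement from a URL set.
// $table must be a trusted, hard-coded name; the URLs are bound as
// parameters instead of being pasted into the SQL string.
function build_bulk_replace($table, array $urlSet)
{
    $urls = array_keys($urlSet);   // the keys are the unique URLs
    if (count($urls) === 0) {
        return null;               // nothing to write
    }
    // One "(?)" group per URL
    $placeholders = implode(', ', array_fill(0, count($urls), '(?)'));
    $sql = "REPLACE INTO $table (url) VALUES $placeholders";
    return array($sql, $urls);
}

// Made-up URLs; the repeated one collapses into a single array key.
$images = array();
$found = array('http://example.com/a.jpg', 'http://example.com/b.jpg', 'http://example.com/a.jpg');
foreach ($found as $url) {
    $images[$url] = true;
}

list($sql, $params) = build_bulk_replace('mark_images', $images);
echo $sql, "\n";   // REPLACE INTO mark_images (url) VALUES (?), (?)
```

With a PDO connection (assumed here, the original post uses the older mysql_* functions), the result could be run as $pdo->prepare($sql)->execute($params), which is one round trip per table instead of two queries per URL.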