
preg_match_all Maximum execution time of 60 seconds error

I am getting the error below. Can it be fixed by setting max_execution_time
to more than 60 in the php.ini file?

Fatal error: Maximum execution time of 60 seconds exceeded in
categorycrawler.php on line 19

On that line I have a regular expression:

preg_match_all(,,)
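
(For reference, the php.ini directive in question would look something like this -- the 300 here is only an example value, not a recommendation:)

; php.ini -- raises the limit for every script
max_execution_time = 300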
Aug 17 '06 #1
9 Replies


Rik
Shuan wrote:
> I am getting the error below. Can it be fixed by setting max_execution_time
> to more than 60 in the php.ini file?
>
> Fatal error: Maximum execution time of 60 seconds exceeded in
> categorycrawler.php on line 19
>
> On that line I have a regular expression:
>
> preg_match_all(,,)
Possibly. It's quite impossible to know without seeing your actual code. 60
seconds is really quite long. What are you trying to do, exactly?

Grtz,
--
Rik Wasmus
Aug 17 '06 #2

I am trying to grab sites like craigslist, parse them with a regular
expression, and put some of the content into a database.

$request->fetch( $region_link );

if( !$request->error ){
    $pageContent = $request->results;

    // capture 1: the category href, capture 2: the button image's alt text
    $regionpattern = "/<a[^>]*href=\"(\/s\/SL\/sg_maY.*)\".*>.*<img.*alt=\"(.*)\".*id=\"btn.*\">/siU";

    if( preg_match_all( $regionpattern, $pageContent, $categorylinks ) )
    {
        for( $y = 0; $y < count( $categorylinks[ 1 ] ); $y++ ){

            $category_link = "http://www.mysite.com" . $categorylinks[ 1 ][ $y ];

            include( "pagecrawler.php" );
        }
    }
}
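
(The database step isn't shown above. Purely as an illustration, a minimal PDO sketch of how the captured links could be stored -- the DSN, credentials, table and column names are all invented, and $categorylinks is assumed to hold the matches from the pattern above:)

$db = new PDO( 'mysql:host=localhost;dbname=crawler', 'user', 'password' );
$db->setAttribute( PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION );

// capture 1 holds the hrefs, capture 2 the alt texts
$stmt = $db->prepare( 'INSERT INTO category_links (url, label) VALUES (?, ?)' );
foreach( $categorylinks[1] as $i => $href ){
    $stmt->execute( array( 'http://www.mysite.com' . $href, $categorylinks[2][$i] ) );
}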
Aug 17 '06 #3

*** Shuan wrote (Thu, 17 Aug 2006 21:22:04 GMT):
> I am getting the error below. Can it be fixed by setting max_execution_time
> to more than 60 in the php.ini file?

It's okay if you're doing a job that really needs a long time to execute,
such as retrieving a 50 MB file from the Internet, spidering a web site or
synchronising two databases.

Is that the case?

--
-+ http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
++ My site about web programming: http://bits.demogracia.com
+- My humour site with UVA rays: http://www.demogracia.com
--
Aug 17 '06 #4

*** Shuan wrote (Thu, 17 Aug 2006 21:35:19 GMT):
> I am trying to grab sites like craigslist, parse them with a regular
> expression, and put some of the content into a database.
Try something like this:

ini_set('max_execution_time', 3600); // 1 hour
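
(If it helps, a minimal sketch of where such a call could sit -- the file name is the one from the error message, the rest is illustrative:)

// top of categorycrawler.php
ini_set('max_execution_time', 3600); // allow up to an hour for the whole crawl
// set_time_limit(3600);             // an equivalent alternative

// ... fetching, preg_match_all() and the include of pagecrawler.php follow ...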
--
-+ http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
++ My site about web programming: http://bits.demogracia.com
+- My humour site with UVA rays: http://www.demogracia.com
--
Aug 17 '06 #5

Rik
Shuan wrote:
> I am trying to grab sites like craigslist, parse them with a regular
> expression, and put some of the content into a database.
>
> $request->fetch( $region_link );
>
> if( !$request->error ){
>     $pageContent = $request->results;
>
>     $regionpattern = "/<a[^>]*href=\"(\/s\/SL\/sg_maY.*)\".*>.*<img.*alt=\"(.*)\".*id=\"btn.*\">/siU";
>
>     if( preg_match_all( $regionpattern, $pageContent, $categorylinks ) )
I was almost tempted to say it was a greediness issue, before I spotted the /U.
Dodged a bullet there :-).

If I interpret your regex correctly, try this rewrite (I tend to use dots very
sparingly; I'm more a fan of negative character classes, with which proper
greediness is more useful). I'm not really sure it will gain much in resource
consumption, but we can try:

'|<a[^>]*?href="(/s/SL/sg_maY[^"]*)"[^>]*>.*?<img[^>]*?alt="([^"]*)"[^>]*?id="btn[^"]*"[^>]*>|si'

I'd also suggest a foreach loop instead of your for loop:

foreach( $categorylinks[1] as $link ){
    $category_link = "http://www.mysite.com" . $link;
    include( "pagecrawler.php" ); // I'm still curious what this does....
}

Or, if you do use capture 2:

if( preg_match_all( $regionpattern, $pageContent, $categorylinks, PREG_SET_ORDER ) ){
    foreach( $categorylinks as $link ){
        $category_link = "http://www.mysite.com" . $link[1];
        include( "pagecrawler.php" ); // I'm still curious what this does....
    }
}

If you still have issues I'd like to see/know the actual site you're leeching
right now :-). (If you're fetching whole pages at once, be sure to unset()
unused/past variables.) I don't know what your actual pagecrawler.php does, but
if it doesn't use capture 2 you might as well not capture it.
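
(A tiny illustration of that unset() advice, assuming the loop from your snippet -- the variable names are the ones you posted:)

// once the links for this page have been processed
unset( $pageContent, $categorylinks ); // free the raw HTML and the match arrays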

Grtz,
--
Rik Wasmus
Aug 17 '06 #6

> It's okay if you're doing a job that really needs a long time to execute,
> such as retrieving a 50 MB file from the Internet, spidering a web site or
> synchronising two databases.
>
> Is that the case?

I don't have to get a big file, but I need to crawl whole websites (about
1000 pages), and that takes time.


Aug 17 '06 #7

Rik
Shuan wrote:
>> It's okay if you're doing a job that really needs a long time to execute,
>> such as retrieving a 50 MB file from the Internet, spidering a web site or
>> synchronising two databases.
>>
>> Is that the case?
>
> I don't have to get a big file, but I need to crawl whole websites (about
> 1000 pages), and that takes time.
Well yeah, about 1000 pages will almost certainly need more execution time;
forget what I said about the regex. As the preg_match_all() is likely the most
time-consuming part of your script (be it relatively fast), chances are quite
high that on its nth execution the limit is passed.

Do you know how many pages you parse in those 60 seconds, BTW?
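
(One rough way to find out -- an untested sketch; wrap something like this around the existing loop, the counter and log format are made up:)

$start = microtime( true );
$pages = 0;

// ... after each page has been fetched and parsed:
$pages++;
error_log( sprintf( '%d pages parsed in %.1f seconds', $pages, microtime( true ) - $start ) );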

Grtz,
--
Rik Wasmus
Aug 17 '06 #8

Shuan wrote:
> I am trying to grab sites like craigslist, parse them with a regular
> expression, and put some of the content into a database.
>
> $request->fetch( $region_link );
>
> if( !$request->error ){
>     $pageContent = $request->results;
>
>     $regionpattern = "/<a[^>]*href=\"(\/s\/SL\/sg_maY.*)\".*>.*<img.*alt=\"(.*)\".*id=\"btn.*\">/siU";
There is a lot of back-tracking in your pattern, even though you've
specified ungreedy behavior. If there are many instances matching the
<a[^>]*href=\"(\/s\/SL\/sg_maY part of the pattern but not the rest,
then the .* that follows would make the regexp engine continually scan
to the end of the file.

My suggestion is to do /<a\s+href=\"(\/s\/SL\/sg_maY.*)\">(.*)<\/a>/siU
first, then loop through the results and regexp for the img tag.
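
(Roughly what that two-pass approach could look like -- an untested sketch reusing the variable names from the earlier snippet; the second pattern is only one way to pick out the img tag:)

// pass 1: grab each matching <a>...</a> block
if( preg_match_all( '/<a\s+href="(\/s\/SL\/sg_maY[^"]*)"[^>]*>(.*?)<\/a>/si',
                    $pageContent, $anchors, PREG_SET_ORDER ) ){
    foreach( $anchors as $a ){
        // pass 2: look for the img alt text only inside this anchor's contents
        if( preg_match( '/<img[^>]*\balt="([^"]*)"[^>]*\bid="btn/si', $a[2], $img ) ){
            $category_link = "http://www.mysite.com" . $a[1];
            $category_label = $img[1]; // e.g. handed on to pagecrawler.php
        }
    }
}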

Aug 17 '06 #9

Hi,

> ini_set('max_execution_time', 3600); // 1 hour

This solution worked, but I still need to check the regular expression
to see if I am doing it efficiently.

Thanks, all, for your support.


Aug 18 '06 #10
