By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
431,795 Members | 1,216 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 431,795 IT Pros & Developers. It's quick & easy.

PHP Script - Efficient use of execution time

P: n/a
Hi,

I am working on a project that requires me to parse 52,005,167 rows in
a MySQL table. I am using a series of PHP scripts for each test I
would like to run on the data. For example I have one test that is to
look for fields with consecutive values. If a row is found to have
more than 4 fields containing consecutive values I would like to
remove the row from the table.

I understand that this will take a good deal of time to process, which
is not a problem. What I am wondering is, should I disable the
maximum script execution time?

Currently I am running these tests in blocks of 10,000 rows at a time.
I would like to just run a test and let it run from start to finish.

Any comments?

TIA to for any help.
Adam
Jul 17 '05 #1
Share this Question
Share on Google+
7 Replies


P: n/a
do*******@hotmail.com (Adam) wrote in
news:f1**************************@posting.google.c om:
Hi,

I am working on a project that requires me to parse 52,005,167 rows in
a MySQL table. I am using a series of PHP scripts for each test I
would like to run on the data. For example I have one test that is to
look for fields with consecutive values. If a row is found to have
more than 4 fields containing consecutive values I would like to
remove the row from the table.

I understand that this will take a good deal of time to process, which
is not a problem. What I am wondering is, should I disable the
maximum script execution time?

Currently I am running these tests in blocks of 10,000 rows at a time.
I would like to just run a test and let it run from start to finish.

Any comments?


If I was doing it I'd make sure the rows have a unique key,
and I'd have my 'processing script' accept query string
parameters such as "start" and "count". For example:
http://www.example.com/myscript.php?start=1&count=10000
Then I could run multiple instances in multiple browser
windows, perhaps at staggered times, and be able to
have partial results while processing continues on
other parts of the table.

--
Dave Patton
Canadian Coordinator, Degree Confluence Project
http://www.confluence.org/
My website: http://members.shaw.ca/davepatton/
Jul 17 '05 #2

P: n/a
I hope this will help you...

This function increses the time set for execution..
set_time_limit()

max_execution_time integer
This sets the maximum time in seconds a script is allowed to run before
it is terminated by the parser. This helps prevent poorly written
scripts from tying up the server. The default setting is 30.

The maximum execution time is not affected by system calls, stream
operations etc. Please see the set_time_limit() function for more
details.

You can not change this setting with ini_set() when running in safe
mode. The only workaround is to turn off safe mode or by changing the
time limit in the php.ini.

Jul 17 '05 #3

P: n/a
I'd use set_time_limit(N) -- it's never a good idea to simply disable
execution time, and because you're looping through these rows, you can
use this function to increase the current time limit by N seconds,
where N is a reasonably safe estimate of the time needed for each
iteration of the loop. This will only work if you're not running in
safe mode, so hopefully this will be of some help.

Jul 17 '05 #4

P: n/a
"Adam" <do*******@hotmail.com> wrote in message
news:f1**************************@posting.google.c om...
Hi,

I am working on a project that requires me to parse 52,005,167 rows in
a MySQL table. I am using a series of PHP scripts for each test I
would like to run on the data. For example I have one test that is to
look for fields with consecutive values. If a row is found to have
more than 4 fields containing consecutive values I would like to
remove the row from the table.

I understand that this will take a good deal of time to process, which
is not a problem. What I am wondering is, should I disable the
maximum script execution time?

Currently I am running these tests in blocks of 10,000 rows at a time.
I would like to just run a test and let it run from start to finish.

Any comments?


You might want to consider upgrading to PHP 5. PHP 4's garbage collection is
weak. It might not be able to get through that much data processing before
it runs out of memory.

And yes, you should disable time limit.
Jul 17 '05 #5

P: n/a
Following on from Adam's message. . .
Hi,

I am working on a project that requires me to parse 52,005,167 rows in
a MySQL table. I am using a series of PHP scripts for each test I
would like to run on the data. For example I have one test that is to
look for fields with consecutive values. If a row is found to have
more than 4 fields containing consecutive values I would like to
remove the row from the table.

I understand that this will take a good deal of time to process, which
is not a problem. What I am wondering is, should I disable the
maximum script execution time?

Currently I am running these tests in blocks of 10,000 rows at a time.
I would like to just run a test and let it run from start to finish.

Any comments?

TIA to for any help.
Adam


* It /might/ not be a good idea to DELETE in the middle of processing.
You may want to mark the record ready for deletion or make a separate
note and then use MySQL to delete in bulk.

* you /might/ find it worthwhile to export to a text file only possible
candidates (for example you might be able to put useful filtering in SQL
to weed out definite OKs or select certain likely candidates) and then
process the text file sequentially to get your record IDs for deleting.

* What alternative to disabling the max execution time did you have in
mind?

* You might also consider pre-processing your data in other ways
SELECT IF(COL2-COL1=1,1,0),IF(COL4-COL3=1,1,0),IF(COL6-COL5=1,1,0)
.....

Having processed lots of data with steam computers it really is worth
while optimising your method 'cos you can get order of magnitude
improvements.

--
PETER FOX Not the same since the bottom fell out of the bucket business
pe******@eminent.demon.co.uk.not.this.bit.no.html
2 Tees Close, Witham, Essex.
Gravity beer in Essex <http://www.eminent.demon.co.uk>
Jul 17 '05 #6

P: n/a
"Peter Fox" <pe******@eminent.demon.co.uk.not.this.bit.no.html > wrote in
message news:ND**************@eminent.demon.co.uk...
Following on from Adam's message. . .
Hi,

I am working on a project that requires me to parse 52,005,167 rows in
a MySQL table. I am using a series of PHP scripts for each test I
would like to run on the data. For example I have one test that is to
look for fields with consecutive values. If a row is found to have
more than 4 fields containing consecutive values I would like to
remove the row from the table.

I understand that this will take a good deal of time to process, which
is not a problem. What I am wondering is, should I disable the
maximum script execution time?

Currently I am running these tests in blocks of 10,000 rows at a time.
I would like to just run a test and let it run from start to finish.

Any comments?


You could have the script that runs the 10,000 rows call itself again,
passing an index pointer so that it restarts at the new pointer.

If you want, I have a script that does this - you could look over the code
(if you want, email me "tony" at "naturesflavors" dot "com"). My solution
uses JavaScript to call the PHP script again, passing the pointers in the
URL.
Jul 17 '05 #7

P: n/a
Good morning all,

mysql_unbuffered_query is a freaking amazing tool. As originally
written this script was estimated to take 7 days 19 hours to complete;
mysql_unbuffered_query and turning off script execution limits knocked
that down to 1 hour and 11 minutes.

Many thanks to all of you. You helped to point me in the right
direction and look outside of what I was used to using. I'll admit
this is my first time using mysql_unbuffered_query and it has
literally saved this project. Thanks again.

Adam
Jul 17 '05 #8

This discussion thread is closed

Replies have been disabled for this discussion.