Bytes IT Community

Can anyone explain PHP slowing down please ?


I'm using PHP 5 from the Windows 98 command line (i.e. no web server involved).

I'm processing a large csv file and when I loop through it I can process
around 275 records per second.

However at around 6,000 records this suddenly drops off to around 40
records per second.

This is a big problem, as the "live" list is over 4 million records long. I'd
break it up, but this is to be a regular test, so that would be messy to say
the least.

Each record has 8 fields and the total length tends to be below 200
characters. The CSV uses comma delimiters and double-quote enclosures.

I was wondering if anyone with strong PHP knowledge has heard of this or could
help explain it, please. (As you probably know, I'm very new to PHP.)

I've trimmed the startup code to pseudocode to make it easier to read.
Otherwise my code is as below:

Sorry if the line wrap is wrong - that would be my newsreader, not the code.

As you can see, the code grabs a field from the database, spawns a Windows
(MS-DOS command line) .exe file to test it, and writes the field out to either
a good or a bad result file.

I don't do any file seeking or opening and closing of files during the loop.

Tony

------------------------------ CODE START ------------------

<?php

//+++++++++++++++++++++++++++ Pseudocode start
// open all new files for appending here (fopen($fout, 'a');)
// open the database file read-only here
// initialise all variables to 0 here
//
// START;
//   get start time
//   loop()
//   get end time
//   write statistics
//   close all files here
//   exit;
//+++++++++++++++++++++++++++ Pseudocode end

function loop() {
    global $fin, $fout, $fgood, $records, $fields, $good, $bad,
           $total, $dif_fcount, $nodata;

    while (($data = fgetcsv($fin, 1024, ",", "\"")) !== FALSE) {
        // fgetcsv() returns array(null) for a blank line, never ''
        if ($data === array(null)) { continue; }
        $records++;
        if (count($data) != $fields) {
            $fields = count($data);
            $dif_fcount++;
        }
        if ($data[2] == '') { $data[2] = 'NO DATA'; $nodata++; }
        $raw  = $data[7];
        $star = "\"" . $data[2] . "\"";
        $star = $raw; // note: overwrites the quoted value above (acknowledged later in the thread)
        if (checkit($star) == false) {
            fwrite($fout, $records . "," . $raw . "\r");
            $bad += 1;
        } else {
            fwrite($fgood, $star . "\r");
            $good += 1;
        }
        $total += 1;
        echo("Total checked: " . $total . "\r");
    } //while
}

function checkit($star) {
    exec("declination.exe " . $star, $aout, $returnval);
    if ($aout[0][0] === "Y") {
        return true;
    } else {
        return false;
    }
}
?>
Jun 10 '06 #1



PS - yes, I know about the $star/$raw data confusion etc.; that will be
debugged later.

tony
Jun 10 '06 #2

to**@tony.com wrote:
> [original question and code quoted in full - snipped; see post #1]


A slowdown of this magnitude typically points to a system-related bottleneck
rather than an algorithmic one. Have you checked the process's virtual memory
use? I would suspect that you are starting to swap around the 6,000th record.

If not, I would start by placing finer-grained timing around the major I/O
points (fgetcsv, fwrite) to see whether they are causing the slowdown.
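A minimal sketch of that kind of instrumentation (using a throwaway temp file rather than your real data; microtime(true) returns a high-resolution float timestamp):

```php
<?php
// Sketch: accumulate the time spent inside fgetcsv() separately from the
// rest of the loop, so a growing per-call cost shows up directly.
// The generated temp file stands in for the real CSV.
$csv = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($csv, "a,b,c\nd,e,f\ng,h,i\n");

$fin      = fopen($csv, 'r');
$readTime = 0.0;
$records  = 0;
while (true) {
    $t0   = microtime(true);                 // high-resolution wall clock
    $data = fgetcsv($fin, 1024, ',', '"');
    $readTime += microtime(true) - $t0;
    if ($data === false) {
        break;
    }
    $records++;
}
fclose($fin);
unlink($csv);
printf("%d records, fgetcsv total: %.6f s\n", $records, $readTime);
```

On the real file, report the accumulated figure every few thousand records; if the jump at record 6,000 lives inside fgetcsv, this isolates it immediately.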

On a stylistic note, why do you use $x++ in some places and $x += 1 in
others? Also, the checkit function could simply return the result of the
comparison:

function checkit($star) {
    exec('declination.exe ' . $star, $aout, $returnval);
    return ($aout[0][0] === 'Y');
}

and 'if (checkit($star) == false) {' could become 'if (!checkit($star)) {'

However, I don't think any of these would contribute to your slowdown issue.

-david-

Jun 10 '06 #3

In article <xP****************@fe24.usenetserver.com>,
da***********@sympatico.ca says...
> [david's reply quoted in full - snipped; see post #3]


Thanks for the comments, David. I've now run this on both Windows and Linux;
on the Linux system I ran it through Apache and get the same results there too
(very similar but not identical). (The Linux box is entirely different
hardware.) I'll post timing differences later - they're outrageous!

I've also run the Windows version with Apache now and still get the slowdown.

I've tried removing my call to the executable, replacing it with a simple
return, and it still slows down.

Looking at what you suggest - could it be that the fgetcsv command is
searching from the top of the file on every iteration? I don't cause it to,
but is that how it works?
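One quick way to test that suspicion (a sketch against a throwaway temp file, not the real list): ftell() reports the read handle's byte offset, which should only ever move forward if fgetcsv() reads sequentially.

```php
<?php
// Sketch: verify that fgetcsv() advances a per-handle file pointer
// rather than rescanning from the top. ftell() gives the byte offset.
$csv = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($csv, "a,b\nc,d\ne,f\n");

$fin       = fopen($csv, 'r');
$last      = ftell($fin);                        // starts at offset 0
$monotonic = true;
while (fgetcsv($fin, 1024, ',', '"') !== false) {
    $pos       = ftell($fin);
    $monotonic = $monotonic && ($pos > $last);   // must strictly advance
    $last      = $pos;
}
fclose($fin);
unlink($csv);
echo $monotonic ? "pointer only moved forward\n" : "pointer moved backwards\n";
```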

I have discovered a Windows loss of 2MB of system memory on every run -
that's either a Windows or a PHP problem, I don't know which - probably
Windows - and I don't think it's connected to the slowdown.

If this isn't obvious to anyone, I guess the only thing I can do is start
taking things out one by one, starting with your suggestions.

On the memory side - no, it all (surprisingly) seems to happen in RAM; even
the test database file of 1 million records seems to go straight to RAM.
There's no thrashing or anything. (The Win-98 system has 1GB RAM, the Linux
system just 128MB.)

On the style front - Dave, you wouldn't believe my working methods! In my
time I've used maybe 7 languages in anger, and currently I use 3 or 4, so I'm
in and out of them all the time and things get mixed up in my head. I always
do a "comment" sweep when I'm done and tidy things up for this very reason,
but sometimes I do use ++x and += 1 because it helps me remember what's going
on and what I need to keep an eye on. (I have a bad short-term memory.) My
"if" statements and other stuff follow the same routine for the same reasons -
there is method in my madness; you just have to be mad too to see it...
This particular code has been messed about with quite badly too as I try to
find the problem.

I care more about function than fancy - most PHP style I've seen is abysmal
anyway, and the standard recommended style is appalling. I'm not a fan of the
way most people code, nor of the way many languages work, for that matter.
Anything other than Forth is bad form in my book ;-)

As for the ternary operator - it's a terrible construct for anyone who
doesn't use it regularly, or for anyone reading someone else's code, so I
avoid it.

tony

Jun 11 '06 #4

I wouldn't be surprised at all if PHP were loading the whole file into memory
or scanning it from start to finish on every fgetcsv() call. Checking the C
source code would give better insight.

There are also a couple of functions that will tell you PHP's current memory
usage. Calling those on each iteration might give some insight into whether
it's running out of memory.
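For example (a sketch; memory_get_usage() is the relevant built-in, and the retained array here just simulates something accumulating per record):

```php
<?php
// Sketch: sample PHP's own heap usage as work accumulates. A figure that
// climbs steadily toward the memory_limit set in php.ini would explain a
// sudden slowdown at a fixed record count.
$before = memory_get_usage();

$rows = array();
for ($i = 0; $i < 1000; $i++) {
    $rows[] = str_repeat('x', 100);        // simulate retained per-record data
    if ($i % 500 === 0) {
        printf("iteration %d: %d bytes in use\n", $i, memory_get_usage());
    }
}

$after = memory_get_usage();
printf("grew by %d bytes over 1000 iterations\n", $after - $before);
```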

Maybe it's the data confusing the function, too? An unescaped quote, a
missing newline, too long a line, etc. - something like that might confuse
fgetcsv().

to**@tony.com wrote:
> [david's reply and tony's response quoted in full - snipped; see posts #3 and #4]

Jun 11 '06 #5

In article <11**********************@m38g2000cwc.googlegroups .com>,
ri********@gmail.com says...

> [richard's reply quoted in full - snipped; see post #5]

Thanks Richard - I looked at Windows memory usage but never thought about PHP
itself. It does have a limit in the .INI file, I think... I'll take a look.

It isn't connected to the data (the CSV file itself is decoded perfectly).

I've now run the process with just the drop to shell and an immediate return,
and it still slows down on both Windows and Linux, so it is looking like the
fget function... or the loop... (stack overflow bug?). For a while I thought
maybe it was Windows, but it doesn't seem to be. I changed it to fgets and
get the same result, so that would seem to rule out the CSV decoding part
too.

I'm going to try just an empty file-read loop, but the trouble with that is
it probably isn't going to show up, as it won't be doing any work. ;-(
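A bare read loop over a generated file can still show something useful, though (a sketch with made-up contents): if the per-line cost of fgets() grew with position in the file, the second half would take measurably longer than the first.

```php
<?php
// Sketch: time the first and second halves of a bare fgets() loop over a
// generated file. Roughly equal halves rule out any "rescan from the top
// of the file" behaviour in the read call itself.
$path = tempnam(sys_get_temp_dir(), 'bench');
$fh   = fopen($path, 'w');
for ($i = 0; $i < 20000; $i++) {
    fwrite($fh, "field1,field2,field3,some padding text for realism\n");
}
fclose($fh);

$fh        = fopen($path, 'r');
$n         = 0;
$firstHalf = 0.0;
$t0        = microtime(true);
while (fgets($fh, 1024) !== false) {
    $n++;
    if ($n === 10000) {
        $firstHalf = microtime(true) - $t0;   // time for the first 10k lines
    }
}
$total = microtime(true) - $t0;
fclose($fh);
unlink($path);
printf("first 10k lines: %.4f s, all 20k: %.4f s\n", $firstHalf, $total);
```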

There also seems to be a massive "flush" of something about 20 seconds after
the PHP code ends, which slows Windows down to a crawl for about 3 seconds -
again, I don't know what that is yet. It could be Windows.

The other thing I've just discovered is that PHP is taking 94% of processor
time, and I can't seem to change its priority in Windows with the tool I
normally use for that. The number changes, but PHP doesn't respond.

Can PHP be controlled for that? I don't remember reading anything in the
docs. 94% is way over the top - my real-time TV capture device only uses 25%.

PHP seems to have a default priority of 8, which is normal, so this is
somewhat confusing too, as it refuses to release CPU to other apps.

This one is another application-killer if I can't sort it out...

tony
Jun 12 '06 #6

This discussion thread is closed; replies have been disabled.