Optimizing a string manipulation script.

I'm not really accustomed to string manipulation and so I was wondering
if any of you could be of any help in speeding up this script intended to
change the format of some saved log information into a CSV file while
removing duplicate records.
The main problem is that the script currently takes about 20 seconds to
execute, and were it to take much longer it would time out.

Below is the script itself, and then some example lines from the log
file it processes:

<?
$data ="";

$fp = fopen("logs.txt", "r");

while(!feof($fp)){
$data .= fread($fp, 4096);
}
fclose($fp);

$fullArray = explode("\n", $data);
$ArrayofArrays[0] = array("","");
$myArray[0]=$fullArray[0];

(int)$flg;
for($i=0;$i<count($fullArray);$i++){
$flg=0;
for($j=0;$j<count($myArray);$j++){
if($myArray[$j]==$fullArray[$i]){$flg++;}
}
if($flg==0){
$myArray[count($myArray)]=$fullArray[$i];
}
}

for($maincount=0;$maincount<count($myArray);$maincount++){
$newArray = explode("\"",$myArray[$maincount]);
$newArray[0] = str_replace(array("[","]","+"),"", $newArray[0]);
$outArray = explode(" ", $newArray[0]);
$tmpArray = explode(" ", $newArray[1]);
$j=count($outArray);
for($i=$j;$i<$j+count($tmpArray);$i++){
$outArray[$i] = $tmpArray[$i-$j];
}

$tmpArray = explode(" ", $newArray[2]);
$j=count($outArray);
for($i=$j;$i<$j+count($tmpArray);$i++){
$outArray[$i] = $tmpArray[$i-$j];
}
$outArray[count($outArray)] = $newArray[3];
$outArray[count($outArray)] = $newArray[5];

trim_array($outArray, " \n\t:;,");
$ArrayofArrays[$maincount]=$outArray;
}

$out = fopen("output.csv", "a");
for($i=0;$i<count($ArrayofArrays);$i++){
for($j=0;$j<count($ArrayofArrays[$i]);$j++){
if($ArrayofArrays[$i][$j]!=""){
fwrite($out,$ArrayofArrays[$i][$j]);
fwrite($out,",");
}
}
fwrite($out,"\n");
}
fclose($out);

//printout($FinalAofAs);

function printout($a){
echo "<br><br>";
for($i=0;$i<count($a);$i++){
if(count($a[$i])!=1 && count($a[$i])!=0){printout($a[$i]);}
else{
if($a[$i]!="" && $a[$i]!="-"){
echo $a[$i];
echo "<br>";
}
}
}
}
function trim_array($a /*array to be trimmed*/,$b /*string of chars to
be removed*/){
for($i=0;$i<count($a);$i++){
$a[$i]=trim($a[$i],$b);
}
}

?>
jpgme.co.uk: [25/May/2006:13:04:47 +0100] 81.157.187.150 - - "GET /
HTTP/1.1" 200 5736 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT
5.1; SV1; .NET CLR 1.1.4322)"
jpgme.co.uk: [25/May/2006:13:04:47 +0100] 81.157.187.150 - - "GET /
HTTP/1.1" 200 5736 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT
5.1; SV1; .NET CLR 1.1.4322)"
jpgme.co.uk: [25/May/2006:13:04:47 +0100] 81.157.187.150 - - "GET
/images/title.gif HTTP/1.1" 200 5237 "http://www.martinsphotos.co.uk/"
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR
1.1.4322)"
jpgme.co.uk: [25/May/2006:13:04:48 +0100] 81.157.187.150 - - "GET
/images/but1.gif HTTP/1.1" 200 696 "http://www.martinsphotos.co.uk/"
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR
1.1.4322)"
jpgme.co.uk: [25/May/2006:13:04:48 +0100] 81.157.187.150 - - "GET
/images/vline.gif HTTP/1.1" 200 85 "http://www.martinsphotos.co.uk/"
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR
1.1.4322)"
jpgme.co.uk: [25/May/2006:13:04:48 +0100] 81.157.187.150 - - "GET
/images/but2.gif HTTP/1.1" 200 742 "http://www.martinsphotos.co.uk/"
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR
1.1.4322)"
jpgme.co.uk: [25/May/2006:13:04:48 +0100] 81.157.187.150 - - "GET
/images/but3.gif HTTP/1.1" 200 742 "http://www.martinsphotos.co.uk/"
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR
1.1.4322)"
jpgme.co.uk: [25/May/2006:13:04:48 +0100] 81.157.187.150 - - "GET
/images/but4.gif HTTP/1.1" 200 506 "http://www.martinsphotos.co.uk/"
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR
1.1.4322)"
jpgme.co.uk: [25/May/2006:13:04:48 +0100] 81.157.187.150 - - "GET
/images/but5.gif HTTP/1.1" 200 711 "http://www.martinsphotos.co.uk/"
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR
1.1.4322)"
jpgme.co.uk: [25/May/2006:13:04:48 +0100] 81.157.187.150 - - "GET
/images/but6.gif HTTP/1.1" 200 600 "http://www.martinsphotos.co.uk/"
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR
1.1.4322)"
jpgme.co.uk: [25/May/2006:13:04:48 +0100] 81.157.187.150 - - "GET
/images/ind_th2.jpg HTTP/1.1" 200 29533
"http://www.martinsphotos.co.uk/" "Mozilla/4.0 (compatible; MSIE 6.0;
Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
jpgme.co.uk: [25/May/2006:13:04:48 +0100] 81.157.187.150 - - "GET
/images/ind_th1.jpg HTTP/1.1" 200 18673
"http://www.martinsphotos.co.uk/" "Mozilla/4.0 (compatible; MSIE 6.0;
Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
jpgme.co.uk: [25/May/2006:13:04:49 +0100] 81.157.187.150 - - "GET
/images/ind_th3.jpg HTTP/1.1" 200 9298
"http://www.martinsphotos.co.uk/" "Mozilla/4.0 (compatible; MSIE 6.0;
Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
jpgme.co.uk: [25/May/2006:13:04:54 +0100] 81.157.187.150 - - "GET
/gallery.php HTTP/1.1" 200 5787 "http://www.martinsphotos.co.uk/"
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR
1.1.4322)"

Jun 6 '06 #1
5 Replies


Carved in mystic runes upon the very living rock, the last words of
<Cl*******@hotmail.com> of comp.lang.php make plain:
I'm not really accustomed to string manipulation and so I was wondering
if any of you could be of any help in speeding up this script intended to
change the format of some saved log information into a CSV file while
removing duplicate records.
The main problem is that the script currently takes about 20 seconds to
execute, and were it to take much longer it would time out.

Below is the script itself, and then some example lines from the log
file it processes:

<?
[snip]
?>


Whew!

How about an example of the output you're trying to achieve? That might
be easier.

--
Alan Little
Phorm PHP Form Processor
http://www.phorm.com/
Jun 7 '06 #2

Alan Little wrote:
Whew!

How about an example of the output you're trying to achieve? That might
be easier.



Here we go then:

jpgme.co.uk:,25/May/2006:13:04:47,0100,81.157.187.150,-,-,GET,/,HTTP/1.1,200,5736,-,Mozilla/4.0
(compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322),
jpgme.co.uk:,25/May/2006:13:04:47,0100,81.157.187.150,-,-,GET,/images/title.gif,HTTP/1.1,200,5237,http://www.martinsphotos.co.uk/,Mozilla/4.0
(compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322),
jpgme.co.uk:,25/May/2006:13:04:48,0100,81.157.187.150,-,-,GET,/images/but1.gif,HTTP/1.1,200,696,http://www.martinsphotos.co.uk/,Mozilla/4.0
(compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322),
jpgme.co.uk:,25/May/2006:13:04:48,0100,81.157.187.150,-,-,GET,/images/vline.gif,HTTP/1.1,200,85,http://www.martinsphotos.co.uk/,Mozilla/4.0
(compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322),

Jun 7 '06 #3

Carved in mystic runes upon the very living rock, the last words of
<Cl*******@hotmail.com> of comp.lang.php make plain:

Here we go then:


Try this:

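<?php
// One log line: host, timestamp, zone offset, IP, two "-" fields,
// method, path, protocol, status, size, referrer, user agent.
$patt =
'!([^:]+:) \[([^:]+:\d\d:\d\d:\d\d) [+-](\d{4})\] '.
'(\d+\.\d+\.\d+\.\d+) (-) (-) "(\w+) (/[^ ]*) '.
'(HTTP/\d\.\d)" (\d+) (\d+) "([^"]+)" "([^"]+)"'.
"\n?".'!';

$log = fopen('log.csv', 'a');

$logfile = file_get_contents('logs.txt');
// Normalize \r\n and bare \r line endings to \n.
$logfile = preg_replace("/\r\n?/", "\n", $logfile);

// Collect one set of captured fields per matching log line.
preg_match_all($patt, $logfile, $matches, PREG_SET_ORDER);

foreach($matches as $match) {
unset($match[0]); // drop the full match, keep only the captured fields
$logline = implode(',', $match);
fputs($log, $logline."\n");
}

fclose($log);
?>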

I don't know what those two blank log elements are after the IP, so this
pattern will only work when they're blank.
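
For the duplicate removal the original question asked for (the regex script above doesn't attempt it), one option is to record each line already seen as an array key. Each line then costs a single lookup, rather than a scan over every line kept so far as in the nested loops of the original script. A minimal sketch, assuming PHP 7 or later; unique_lines() is a hypothetical helper name, not something from the thread:

```php
<?php
// Hypothetical helper: return the distinct non-empty lines of $text,
// keeping the first occurrence of each and preserving their order.
function unique_lines(string $text): array
{
    $seen = [];    // line => true, used as a hash set
    $unique = [];  // distinct lines in first-seen order
    foreach (explode("\n", $text) as $line) {
        $line = rtrim($line, "\r");  // tolerate \r\n line endings
        if ($line === '' || isset($seen[$line])) {
            continue;                // skip blank lines and repeats
        }
        $seen[$line] = true;
        $unique[] = $line;
    }
    return $unique;
}

// Made-up input, only to show the behaviour.
$sample = "a 1\nb 2\na 1\n\nc 3\nb 2\n";
print_r(unique_lines($sample));
```

The returned lines could be joined again with "\n" and handed to preg_match_all() as before.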

--
Alan Little
Phorm PHP Form Processor
http://www.phorm.com/
Jun 8 '06 #4

In message <11**********************@c74g2000cwc.googlegroups.com>,
Cl*******@hotmail.com writes
I'm not really accustomed to string manipulation and so I was wondering
if any of you could be of any help in speeding up this script intended to
change the format of some saved log information into a CSV file while
removing duplicate records.
The main problem is that the script currently takes about 20 seconds to
execute, and were it to take much longer it would time out.


You may also want to look at PHP Performance Validator. This is a code
profiler for PHP. No requirement to modify your code. Works with PHP 4
and PHP 5. It's in beta at the moment. Windows only.

http://www.softwareverify.com/phpPer...tor/index.html

Stephen
--
Stephen Kellett
Object Media Limited http://www.objmedia.demon.co.uk/software.html
Computer Consultancy, Software Development
Windows C++, Java, Assembler, Performance Analysis, Troubleshooting
Jun 8 '06 #5


Alan Little wrote:
I don't know what those two blank log elements are after the IP, so this
pattern will only work when they're blank.



They do seem to stay blank for the entire log, which means it should be
fine. I noticed, though, that your version produced smaller files than
mine, and on closer inspection I noticed some log lines sent it a
little insane. It seems to have problems when there's no file size
sent, for example on lines with errors. The following log lines are the
ones causing problems:

jpgme.co.uk: [26/May/2006:10:12:38 +0100] 130.88.199.23 - - "GET
/addthumbs.php HTTP/1.1" 200 37 "-" "Mozilla/5.0 (X11; U; Linux i686;
en-US; rv:1.7.12) Gecko/20060210 Fedora/1.7.12-1.3.3.legacy"
jpgme.co.uk: [26/May/2006:10:14:05 +0100] 130.88.199.23 - - "GET
/bulk.php HTTP/1.1" 200 42792 "-" "Mozilla/5.0 (X11; U; Linux i686;
en-US; rv:1.7.12) Gecko/20060210 Fedora/1.7.12-1.3.3.legacy"
jpgme.co.uk: [26/May/2006:10:14:23 +0100] 130.88.199.23 - - "GET
/add.php?folder=./All_work/Flowers HTTP/1.1" 200 21384
"http://www.martinsphotos.co.uk/bulk.php" "Mozilla/5.0 (X11; U; Linux
i686; en-US; rv:1.7.12) Gecko/20060210 Fedora/1.7.12-1.3.3.legacy"
jpgme.co.uk: [26/May/2006:10:16:54 +0100] 130.88.199.23 - - "GET
/add.php?folder=./All_work/Flowers HTTP/1.1" 200 18309
"http://www.martinsphotos.co.uk/bulk.php" "Mozilla/5.0 (X11; U; Linux
i686; en-US; rv:1.7.12) Gecko/20060210 Fedora/1.7.12-1.3.3.legacy"
jpgme.co.uk: [26/May/2006:10:16:54 +0100] 130.88.199.23 - - "GET
/images/folder2.png HTTP/1.1" 304 -
"http://www.martinsphotos.co.uk/add.php?folder=./All_work/Flowers"
"Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20060210
Fedora/1.7.12-1.3.3.legacy"
jpgme.co.uk: [26/May/2006:10:16:59 +0100] 130.88.199.23 - - "POST
/do_add.php HTTP/1.1" 200 12746
"http://www.martinsphotos.co.uk/add.php?folder=./All_work/Flowers"
"Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20060210
Fedora/1.7.12-1.3.3.legacy"
jpgme.co.uk: [26/May/2006:10:16:59 +0100] 130.88.199.23 - - "GET
/addone.php?alb=61&folder=./All_work/Flowers&file=DSC04827.jpg
HTTP/1.1" 200 23231 "http://www.martinsphotos.co.uk/do_add.php"
"Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20060210
Fedora/1.7.12-1.3.3.legacy"
jpgme.co.uk: [26/May/2006:10:16:59 +0100] 130.88.199.23 - - "GET
/addone.php?alb=61&folder=./All_work/Flowers&file=DSC04822_1.JPG
HTTP/1.1" 200 23231 "http://www.martinsphotos.co.uk/do_add.php"
"Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20060210
Fedora/1.7.12-1.3.3.legacy"
jpgme.co.uk: [26/May/2006:10:22:52 +0100] 130.88.199.23 - - "GET
/addone.php?alb=61&folder=./All_work/Flowers&file=normal_DSC04962.jpg
HTTP/1.1" 200 23231 "http://www.martinsphotos.co.uk/do_add.php"
"Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20060210
Fedora/1.7.12-1.3.3.legacy"
jpgme.co.uk: [26/May/2006:10:22:52 +0100] 130.88.199.23 - - "GET
/addone.php?alb=61&folder=./All_work/Flowers&file=normal_DSC04980.jpg
HTTP/1.1" 200 23231 "http://www.martinsphotos.co.uk/do_add.php"
"Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20060210
Fedora/1.7.12-1.3.3.legacy"
jpgme.co.uk: [26/May/2006:10:22:52 +0100] 130.88.199.23 - - "GET
/addone.php?alb=61&folder=./All_work/Flowers&file=normal_DSC05000.jpg
HTTP/1.1" 200 20060 "http://www.martinsphotos.co.uk/do_add.php"
"Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20060210
Fedora/1.7.12-1.3.3.legacy"
jpgme.co.uk: [26/May/2006:10:23:00 +0100] 130.88.199.23 - - "GET
/add.php?folder=./All_work/Flowers HTTP/1.1" 200 22952
"http://www.martinsphotos.co.uk/bulk.php" "Mozilla/5.0 (X11; U; Linux
i686; en-US; rv:1.7.12) Gecko/20060210 Fedora/1.7.12-1.3.3.legacy"
jpgme.co.uk: [26/May/2006:10:23:00 +0100] 130.88.199.23 - - "GET
/images/folder2.png HTTP/1.1" 304 -
"http://www.martinsphotos.co.uk/add.php?folder=./All_work/Flowers"
"Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20060210
Fedora/1.7.12-1.3.3.legacy"
jpgme.co.uk: [26/May/2006:10:25:50 +0100] 130.88.199.23 - - "GET
/add.php?folder=./All_work/Flowers HTTP/1.1" 200 18309
"http://www.martinsphotos.co.uk/bulk.php" "Mozilla/5.0 (X11; U; Linux
i686; en-US; rv:1.7.12) Gecko/20060210 Fedora/1.7.12-1.3.3.legacy"
jpgme.co.uk: [26/May/2006:10:25:50 +0100] 130.88.199.23 - - "GET
/images/folder2.png HTTP/1.1" 304 -
"http://www.martinsphotos.co.uk/add.php?folder=./All_work/Flowers"
"Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20060210
Fedora/1.7.12-1.3.3.legacy"
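
The 304 lines above have "-" where the size would be, which (\d+) cannot match. And because [^:]+ also matches newlines, the next match attempt can start partway through the failed line and run on into the following one, which would explain mangled rows rather than simply missing ones. A hedged sketch of a variant pattern, assuming each record sits on one line in the actual file and line endings are already "\n"; it has only been checked against the two shortened sample lines shown, not the full log:

```php
<?php
// Variant of the pattern posted earlier in the thread (an assumption, not
// the original author's fix):
//  - ^ and $ with the m modifier tie each match to one line,
//  - \n is excluded from the open-ended character classes,
//  - (\d+|-) accepts "-" where no size was logged (e.g. 304 responses).
$patt =
    '!^([^:\n]+:) \[([^:\n]+:\d\d:\d\d:\d\d) [+-](\d{4})\] ' .
    '(\d+\.\d+\.\d+\.\d+) (-) (-) "(\w+) (/[^ \n]*) ' .
    '(HTTP/\d\.\d)" (\d+) (\d+|-) "([^"\n]+)" "([^"\n]+)"$!m';

// Two sample lines in the thread's format, user agent shortened for brevity.
$logfile =
    'jpgme.co.uk: [26/May/2006:10:14:05 +0100] 130.88.199.23 - - ' .
    '"GET /bulk.php HTTP/1.1" 200 42792 "-" "Mozilla/5.0 (X11)"' . "\n" .
    'jpgme.co.uk: [26/May/2006:10:16:54 +0100] 130.88.199.23 - - ' .
    '"GET /images/folder2.png HTTP/1.1" 304 - ' .
    '"http://www.martinsphotos.co.uk/bulk.php" "Mozilla/5.0 (X11)"' . "\n";

preg_match_all($patt, $logfile, $matches, PREG_SET_ORDER);

foreach ($matches as $match) {
    unset($match[0]);                 // drop the full-line match
    echo implode(',', $match), "\n";  // one CSV-style row per log line
}
```

Lines that still fail to match are skipped as a whole instead of bleeding into the next row, so comparing the number of matches with the number of input lines gives a quick check for anything the pattern misses.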

Jun 12 '06 #6
