
Optimizing a string manipulation script.

I'm not really accustomed to string manipulation, so I was wondering
if any of you could help me speed up this script, which is intended to
change the format of some saved log information into a CSV file while
removing duplicate records.
The main problem is that the script currently takes about 20 seconds to
execute, and were it to take much longer it would time out.

Below is the script itself, and then some example lines from the log
file it processes:

<?php
$data ="";

$fp = fopen("logs.txt", "r");

while(!feof($fp)){
$data .= fread($fp, 4096);
}
fclose($fp);

$fullArray = explode("\n", $data);
$ArrayofArrays[0] = array("","");
$myArray[0]=$fullArray[0];

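// Remove duplicate lines. Each line is compared with every line kept so
// far, so the running time grows with the square of the number of lines.
for($i=0;$i<count($fullArray);$i++){
$flg=0;
for($j=0;$j<count($myArray);$j++){
if($myArray[$j]==$fullArray[$i]){$flg++;}
}
if($flg==0){
$myArray[count($myArray)]=$fullArray[$i];
}
}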

for($maincount=0;$maincount<count($myArray);$maincount++){
$newArray = explode("\"",$myArray[$maincount]);
$newArray[0] = str_replace(array("[","]","+"),"", $newArray[0]);
$outArray = explode(" ", $newArray[0]);
$tmpArray = explode(" ", $newArray[1]);
$j=count($outArray);
for($i=$j;$i<$j+count($tmpArray);$i++){
$outArray[$i] = $tmpArray[$i-$j];
}

$tmpArray = explode(" ", $newArray[2]);
$j=count($outArray);
for($i=$j;$i<$j+count($tmpArray);$i++){
$outArray[$i] = $tmpArray[$i-$j];
}
$outArray[count($outArray)] = $newArray[3];
$outArray[count($outArray)] = $newArray[5];

trim_array($outArray, " \n\t:;,");
$ArrayofArrays[$maincount]=$outArray;
}

$out = fopen("output.csv", "a");
for($i=0;$i<count($ArrayofArrays);$i++){
for($j=0;$j<count($ArrayofArrays[$i]);$j++){
if($ArrayofArrays[$i][$j]!=""){
fwrite($out,$ArrayofArrays[$i][$j]);
fwrite($out,",");
}
}
fwrite($out,"\n");
}
fclose($out);

//printout($FinalAofAs);

function printout($a){
echo "<br><br>";
for($i=0;$i<count($a);$i++){
if(count($a[$i])!=1 && count($a[$i])!=0){printout($a[$i]);}
else{
if($a[$i]!="" && $a[$i]!="-"){
echo $a[$i];
echo "<br>";
}
}
}
}
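// Note: $a is passed by value, so this trims a local copy and leaves the
// caller's array unchanged. Declaring the parameter as &$a would make the
// trim take effect (and would also strip the ':' after the host name).
function trim_array($a /*array to be trimmed*/,$b /*string of chars to be removed*/){
for($i=0;$i<count($a);$i++){
$a[$i]=trim($a[$i],$b);
}
}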

?>
jpgme.co.uk: [25/May/2006:13:04:47 +0100] 81.157.187.150 - - "GET / HTTP/1.1" 200 5736 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
jpgme.co.uk: [25/May/2006:13:04:47 +0100] 81.157.187.150 - - "GET / HTTP/1.1" 200 5736 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
jpgme.co.uk: [25/May/2006:13:04:47 +0100] 81.157.187.150 - - "GET /images/title.gif HTTP/1.1" 200 5237 "http://www.martinsphotos.co.uk/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
jpgme.co.uk: [25/May/2006:13:04:48 +0100] 81.157.187.150 - - "GET /images/but1.gif HTTP/1.1" 200 696 "http://www.martinsphotos.co.uk/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
jpgme.co.uk: [25/May/2006:13:04:48 +0100] 81.157.187.150 - - "GET /images/vline.gif HTTP/1.1" 200 85 "http://www.martinsphotos.co.uk/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
jpgme.co.uk: [25/May/2006:13:04:48 +0100] 81.157.187.150 - - "GET /images/but2.gif HTTP/1.1" 200 742 "http://www.martinsphotos.co.uk/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
jpgme.co.uk: [25/May/2006:13:04:48 +0100] 81.157.187.150 - - "GET /images/but3.gif HTTP/1.1" 200 742 "http://www.martinsphotos.co.uk/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
jpgme.co.uk: [25/May/2006:13:04:48 +0100] 81.157.187.150 - - "GET /images/but4.gif HTTP/1.1" 200 506 "http://www.martinsphotos.co.uk/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
jpgme.co.uk: [25/May/2006:13:04:48 +0100] 81.157.187.150 - - "GET /images/but5.gif HTTP/1.1" 200 711 "http://www.martinsphotos.co.uk/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
jpgme.co.uk: [25/May/2006:13:04:48 +0100] 81.157.187.150 - - "GET /images/but6.gif HTTP/1.1" 200 600 "http://www.martinsphotos.co.uk/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
jpgme.co.uk: [25/May/2006:13:04:48 +0100] 81.157.187.150 - - "GET /images/ind_th2.jpg HTTP/1.1" 200 29533 "http://www.martinsphotos.co.uk/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
jpgme.co.uk: [25/May/2006:13:04:48 +0100] 81.157.187.150 - - "GET /images/ind_th1.jpg HTTP/1.1" 200 18673 "http://www.martinsphotos.co.uk/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
jpgme.co.uk: [25/May/2006:13:04:49 +0100] 81.157.187.150 - - "GET /images/ind_th3.jpg HTTP/1.1" 200 9298 "http://www.martinsphotos.co.uk/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
jpgme.co.uk: [25/May/2006:13:04:54 +0100] 81.157.187.150 - - "GET /gallery.php HTTP/1.1" 200 5787 "http://www.martinsphotos.co.uk/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
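The slow part is most likely the duplicate check: for every line, the nested loop rescans every line kept so far, so the work grows roughly with the square of the number of lines. A minimal sketch of a linear alternative, assuming duplicates are exact whole-line repeats and that blank lines can be dropped, uses each line as an array key so "seen before?" becomes a hash lookup. `unique_lines` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: return the distinct, non-empty lines of $data in
// the order they first appear. Each line is used as an array key, so
// checking "seen before?" is a single hash lookup instead of a rescan
// of every line kept so far.
function unique_lines($data)
{
    $seen = array();
    $result = array();
    foreach (explode("\n", $data) as $line) {
        if ($line === '' || isset($seen[$line])) {
            continue; // skip blank lines and repeats
        }
        $seen[$line] = true;
        $result[] = $line;
    }
    return $result;
}
```

`array_values(array_unique(explode("\n", $data)))` is a shorter built-in alternative that also keeps the first occurrence of each line (but keeps one empty line if there is any).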

Jun 6 '06 #1
Carved in mystic runes upon the very living rock, the last words of
<Cl*******@hotmail.com> of comp.lang.php make plain:
[snip]


Whew!

How about an example of the output you're trying to achieve? That might
be easier.

--
Alan Little
Phorm PHP Form Processor
http://www.phorm.com/
Jun 7 '06 #2

Alan Little wrote:
> How about an example of the output you're trying to achieve? That might
> be easier.


Here we go then:

jpgme.co.uk:,25/May/2006:13:04:47,0100,81.157.187.150,-,-,GET,/,HTTP/1.1,200,5736,-,Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322),
jpgme.co.uk:,25/May/2006:13:04:47,0100,81.157.187.150,-,-,GET,/images/title.gif,HTTP/1.1,200,5237,http://www.martinsphotos.co.uk/,Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322),
jpgme.co.uk:,25/May/2006:13:04:48,0100,81.157.187.150,-,-,GET,/images/but1.gif,HTTP/1.1,200,696,http://www.martinsphotos.co.uk/,Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322),
jpgme.co.uk:,25/May/2006:13:04:48,0100,81.157.187.150,-,-,GET,/images/vline.gif,HTTP/1.1,200,85,http://www.martinsphotos.co.uk/,Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322),

Jun 7 '06 #3
Carved in mystic runes upon the very living rock, the last words of
<Cl*******@hotmail.com> of comp.lang.php make plain:
> Here we go then:
> [snip]

Try this:

<?php
$patt =
'!([^:]+:) \[([^:]+:\d\d:\d\d:\d\d) [+-](\d{4})\] '.
'(\d+\.\d+\.\d+\.\d+) (-) (-) "(\w+) (/[^ ]*) '.
'(HTTP/\d\.\d)" (\d+) (\d+) "([^"]+)" "([^"]+)"'.
"\n?".'!';

$log = fopen('log.csv', 'a');

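$logfile = file_get_contents('logs.txt');
// Normalise \r\n and \r line endings to \n (ereg_replace() was removed in PHP 7).
$logfile = str_replace(array("\r\n", "\r"), "\n", $logfile);

preg_match_all($patt, $logfile, $matches, PREG_SET_ORDER);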

foreach($matches as $match) {
unset($match[0]);
$logline = implode(',', $match);
fputs($log, $logline."\n");
}

fclose($log);
?>

I don't know what those two blank log elements are after the IP, so this
pattern will only work when they're blank.
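Note that this version does not remove duplicate records, which the original script was meant to do. A minimal sketch of one way to add that, assuming duplicates are exact repeats of a whole line: drop repeated lines with `array_unique()` first, then match each line individually. It also swaps `implode()` for `fputcsv()`, which quotes fields that need it. `log_to_rows` and `write_rows` are hypothetical helper names, and the empty escape argument to `fputcsv()` needs PHP 7.4 or later:

```php
<?php
// Alan's pattern from the post above, unchanged.
$patt =
    '!([^:]+:) \[([^:]+:\d\d:\d\d:\d\d) [+-](\d{4})\] ' .
    '(\d+\.\d+\.\d+\.\d+) (-) (-) "(\w+) (/[^ ]*) ' .
    '(HTTP/\d\.\d)" (\d+) (\d+) "([^"]+)" "([^"]+)"' .
    "\n?" . '!';

// Hypothetical helper: normalise line endings, drop repeated lines
// (array_unique keeps the first occurrence), then match each remaining
// line and return the captured fields of every line that matched.
function log_to_rows($logfile, $patt)
{
    $logfile = str_replace(array("\r\n", "\r"), "\n", $logfile);
    $lines = array_unique(explode("\n", $logfile));

    $rows = array();
    foreach ($lines as $line) {
        if (preg_match($patt, $line, $m)) {
            array_shift($m); // drop the full-match element
            $rows[] = $m;
        }
    }
    return $rows;
}

// Hypothetical helper: write each row with fputcsv(), which encloses
// fields containing characters such as commas, quotes or spaces in
// double quotes. The empty escape argument requires PHP 7.4+.
function write_rows($handle, array $rows)
{
    foreach ($rows as $row) {
        fputcsv($handle, $row, ',', '"', '');
    }
}
```

With the file names from the post, usage would look something like `$out = fopen('log.csv', 'a'); write_rows($out, log_to_rows(file_get_contents('logs.txt'), $patt)); fclose($out);`.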

--
Alan Little
Phorm PHP Form Processor
http://www.phorm.com/
Jun 8 '06 #4
In message <11**********************@c74g2000cwc.googlegroups.com>,
Cl*******@hotmail.com writes
> The main problem is that the script currently takes about 20 seconds to
> execute, and were it to take much longer it would time out.


You may also want to look at PHP Performance Validator. This is a code
profiler for PHP; there is no requirement to modify your code. It works
with PHP 4 and PHP 5. It's in beta at the moment. Windows only.

http://www.softwareverify.com/phpPer...tor/index.html
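Without a profiler, a rough first step is to time each phase of the script (reading, de-duplicating, writing) with `microtime(true)`, which returns the current Unix timestamp in seconds as a float. A minimal sketch; `timed` is a hypothetical helper name, and wall-clock timing like this is only a coarse guide:

```php
<?php
// Hypothetical helper: call $fn, store the elapsed wall-clock time in
// seconds in $seconds, and return whatever $fn returned.
function timed(callable $fn, &$seconds)
{
    $start = microtime(true);
    $result = $fn();
    $seconds = microtime(true) - $start;
    return $result;
}
```

For example, `$lines = timed(function () use ($data) { return explode("\n", $data); }, $secs); echo "explode: {$secs}s\n";` times just the split step.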

Stephen
--
Stephen Kellett
Object Media Limited http://www.objmedia.demon.co.uk/software.html
Computer Consultancy, Software Development
Windows C++, Java, Assembler, Performance Analysis, Troubleshooting
Jun 8 '06 #5

Alan Little wrote:
> I don't know what those two blank log elements are after the IP, so this
> pattern will only work when they're blank.


They do seem to stay blank for the entire log, so that should be
fine. I noticed, though, that your version produced smaller files than
mine, and on closer inspection some log lines sent it a little
insane. It seems to have problems when no file size is sent, as on the
304 (Not Modified) lines below, where the size field is just "-". The
following log lines are the ones causing problems:

jpgme.co.uk: [26/May/2006:10:12:38 +0100] 130.88.199.23 - - "GET /addthumbs.php HTTP/1.1" 200 37 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20060210 Fedora/1.7.12-1.3.3.legacy"
jpgme.co.uk: [26/May/2006:10:14:05 +0100] 130.88.199.23 - - "GET /bulk.php HTTP/1.1" 200 42792 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20060210 Fedora/1.7.12-1.3.3.legacy"
jpgme.co.uk: [26/May/2006:10:14:23 +0100] 130.88.199.23 - - "GET /add.php?folder=./All_work/Flowers HTTP/1.1" 200 21384 "http://www.martinsphotos.co.uk/bulk.php" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20060210 Fedora/1.7.12-1.3.3.legacy"
jpgme.co.uk: [26/May/2006:10:16:54 +0100] 130.88.199.23 - - "GET /add.php?folder=./All_work/Flowers HTTP/1.1" 200 18309 "http://www.martinsphotos.co.uk/bulk.php" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20060210 Fedora/1.7.12-1.3.3.legacy"
jpgme.co.uk: [26/May/2006:10:16:54 +0100] 130.88.199.23 - - "GET /images/folder2.png HTTP/1.1" 304 - "http://www.martinsphotos.co.uk/add.php?folder=./All_work/Flowers" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20060210 Fedora/1.7.12-1.3.3.legacy"
jpgme.co.uk: [26/May/2006:10:16:59 +0100] 130.88.199.23 - - "POST /do_add.php HTTP/1.1" 200 12746 "http://www.martinsphotos.co.uk/add.php?folder=./All_work/Flowers" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20060210 Fedora/1.7.12-1.3.3.legacy"
jpgme.co.uk: [26/May/2006:10:16:59 +0100] 130.88.199.23 - - "GET /addone.php?alb=61&folder=./All_work/Flowers&file=DSC04827.jpg HTTP/1.1" 200 23231 "http://www.martinsphotos.co.uk/do_add.php" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20060210 Fedora/1.7.12-1.3.3.legacy"
jpgme.co.uk: [26/May/2006:10:16:59 +0100] 130.88.199.23 - - "GET /addone.php?alb=61&folder=./All_work/Flowers&file=DSC04822_1.JPG HTTP/1.1" 200 23231 "http://www.martinsphotos.co.uk/do_add.php" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20060210 Fedora/1.7.12-1.3.3.legacy"
jpgme.co.uk: [26/May/2006:10:22:52 +0100] 130.88.199.23 - - "GET /addone.php?alb=61&folder=./All_work/Flowers&file=normal_DSC04962.jpg HTTP/1.1" 200 23231 "http://www.martinsphotos.co.uk/do_add.php" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20060210 Fedora/1.7.12-1.3.3.legacy"
jpgme.co.uk: [26/May/2006:10:22:52 +0100] 130.88.199.23 - - "GET /addone.php?alb=61&folder=./All_work/Flowers&file=normal_DSC04980.jpg HTTP/1.1" 200 23231 "http://www.martinsphotos.co.uk/do_add.php" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20060210 Fedora/1.7.12-1.3.3.legacy"
jpgme.co.uk: [26/May/2006:10:22:52 +0100] 130.88.199.23 - - "GET /addone.php?alb=61&folder=./All_work/Flowers&file=normal_DSC05000.jpg HTTP/1.1" 200 20060 "http://www.martinsphotos.co.uk/do_add.php" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20060210 Fedora/1.7.12-1.3.3.legacy"
jpgme.co.uk: [26/May/2006:10:23:00 +0100] 130.88.199.23 - - "GET /add.php?folder=./All_work/Flowers HTTP/1.1" 200 22952 "http://www.martinsphotos.co.uk/bulk.php" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20060210 Fedora/1.7.12-1.3.3.legacy"
jpgme.co.uk: [26/May/2006:10:23:00 +0100] 130.88.199.23 - - "GET /images/folder2.png HTTP/1.1" 304 - "http://www.martinsphotos.co.uk/add.php?folder=./All_work/Flowers" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20060210 Fedora/1.7.12-1.3.3.legacy"
jpgme.co.uk: [26/May/2006:10:25:50 +0100] 130.88.199.23 - - "GET /add.php?folder=./All_work/Flowers HTTP/1.1" 200 18309 "http://www.martinsphotos.co.uk/bulk.php" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20060210 Fedora/1.7.12-1.3.3.legacy"
jpgme.co.uk: [26/May/2006:10:25:50 +0100] 130.88.199.23 - - "GET /images/folder2.png HTTP/1.1" 304 - "http://www.martinsphotos.co.uk/add.php?folder=./All_work/Flowers" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20060210 Fedora/1.7.12-1.3.3.legacy"
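The lines that drop out are the ones whose size field is `-`. In Apache-style access logs, which this one appears to be, the response size is written as `-` rather than a number when no body is sent (as with these 304 responses), and the `(\d+)` group for the size cannot match that. A minimal sketch of the adjusted pattern, with only that group changed to `(\d+|-)`; `parse_log_line` is a hypothetical helper name:

```php
<?php
// Alan's pattern with a single change: the response-size group is
// (\d+|-) instead of (\d+), so a "-" size no longer stops the match.
$patt =
    '!([^:]+:) \[([^:]+:\d\d:\d\d:\d\d) [+-](\d{4})\] ' .
    '(\d+\.\d+\.\d+\.\d+) (-) (-) "(\w+) (/[^ ]*) ' .
    '(HTTP/\d\.\d)" (\d+) (\d+|-) "([^"]+)" "([^"]+)"' .
    "\n?" . '!';

// Hypothetical helper: return the 13 captured fields of one log line,
// or null if the line does not match the pattern.
function parse_log_line($line, $patt)
{
    if (!preg_match($patt, $line, $m)) {
        return null;
    }
    array_shift($m); // drop the full-match element
    return $m;
}
```

The `-` is passed through unchanged, the same way the other `-` fields already are; it could be mapped to `0` afterwards if a numeric column is needed.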

Jun 12 '06 #6
