counting matched lines in extremely large files.

First off I'll say - I am a bad perl programmer.

I want to be better and with your help I'll get there and then be able
to contribute more here.

That being said, I have a simple problem compounded by file size.

I have a PIX that logs to my syslog server for a ton of items - my
log sizes get extremely large, ~13 GB daily, and the logs are rotated
daily.

I'm trying to set up some intrusion detection, but with files that big,
just counting incidents to start building a baseline gets time-, CPU- and
memory-intensive using shell commands like grep. So I wanted to do
something in perl, but I don't know whether the file size and memory
limitations will let me.

Here's the shell-command-based perl script I run to get a basic count
of a certain kind of incident.

#!/usr/bin/perl
$LOG = "$ARGV[1]";
$VARIABLE = "$ARGV[0]";
$GREP = `zgrep -c $VARIABLE $LOG`;
print "$GREP\n";

I print out the number and another program uses that output to put the
number into a database.

How would I accomplish this simply in perl?

More complicated would be to match multiple variables against the same
log at one time. I would just pull the log into memory if it were a
manageable size but it is not...

Anyway - your help is appreciated.

The Mikester
Jul 19 '05 #1
su*********@yahoo.com (mikester) wrote in message news:<69**************************@posting.google.com>...
[snip]

#!/usr/bin/perl
$LOG = "$ARGV[1]";
$VARIABLE = "$ARGV[0]";
$GREP = `zgrep -c $VARIABLE $LOG`;
print "$GREP\n";

Sorry, typo - it is actually:
#!/usr/bin/perl
$LOG = "$ARGV[1]";
$VARIABLE = "$ARGV[0]";
$GREP = `grep -c $VARIABLE $LOG`; <----
print "$GREP\n";


Thanks
Jul 19 '05 #2
In article <69**************************@posting.google.com>, mikester
<su*********@yahoo.com> wrote:

[snip]

How would I accomplish this simply in perl?
Here is a simple perl program that will do that:

#!/usr/bin/perl

use strict;
use warnings;

my $log = $ARGV[1];
my $count = 0;

open(LOG,$log) or die("Can't open $log: $!");
while (<LOG>) {
    $count++ if /$ARGV[0]/;
}
print "count of '$ARGV[0]' in $log is $count\n";


More complicated would be to match multiple variables against the same
log at one time. I would just pull the log into memory if it were a
manageable size but it is not...
Scanning one line at a time is better. You can make the regular
expression (/$ARGV[0]/ above) as complicated as you want it.
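
If you want counts for several patterns in a single pass over the log, one
way to do it (a minimal sketch - the script name and example patterns are
hypothetical) is to keep a hash of counts and test each pattern against
every line:

#!/usr/bin/perl

use strict;
use warnings;

# Log file first, then any number of patterns, e.g.:
#   perl multicount.pl /var/log/pix.log 'Deny tcp' 'Deny udp'
my ($log, @patterns) = @ARGV;
my %count = map { $_ => 0 } @patterns;

open(LOG, $log) or die("Can't open $log: $!");
while (my $line = <LOG>) {
    # test every pattern against the current line only
    for my $pat (@patterns) {
        $count{$pat}++ if $line =~ /$pat/;
    }
}
close(LOG);

print "count of '$_' in $log is $count{$_}\n" for @patterns;

On a 13 GB file you would probably also want to precompile the patterns
with qr// so Perl does not recompile them on every line.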


Jul 19 '05 #3
Jim Gibson <jg*****@mail.arc.nasa.gov> wrote in message news:<191220031038058768%jg*****@mail.arc.nasa.gov>...
[snip]

I'll give that a shot tomorrow. Thanks - I'll let you know how it goes.
Jul 19 '05 #4
su*********@yahoo.com (mikester) wrote in message news:<69**************************@posting.google.com>...
[snip]

It works great - but not with the large files. The files are in the
13 GB size range and I just don't have the memory to load that up.
Jul 19 '05 #6
In article <69*************************@posting.google.com>, mikester
<su*********@yahoo.com> wrote:
[snip]

It works great - but not with the large files. The files are in the
13 GB size range and I just don't have the memory to load that up.


It shouldn't take much more memory to run that program on a 13 GB file
than it does on a small one. The program only reads in one line at a
time. What doesn't "work great" with the large file? What happens?
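
To be concrete, the memory problem would only appear if the whole file
were read into memory at once, for example with something like

my @lines = <LOG>;   # slurps the entire file into an array

The while (<LOG>) loop above never does that; it only ever holds the
current line.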
Jul 19 '05 #7
Jim Gibson <jg*****@mail.arc.nasa.gov> wrote in message news:<231220031527288072%jg*****@mail.arc.nasa.gov>...
[snip]

It shouldn't take much more memory to run that program on a 13 GB file
than it does on a small one. The program only reads in one line at a
time. What doesn't "work great" with the large file? What happens?


I'll post the output after the holiday.
Jul 19 '05 #8
