Find the Total Lines in a log file?

Hi,

I was wondering the best and fastest way to determine how many lines are
in a log file.

At the moment I am simply doing a StreamReader.ReadLine and incrementing
a counter until I reach the end. Is there a better way??

Cheers,
Craig
Nov 16 '05 #1
This all depends on how big the log file is. A common method for larger log
files is to take a statistical sample to get an average line length, then use
the file size to compute the number of lines. This is never 100% precise, but
it generally works nicely. If every log line is always the same length, then
you are in real luck, since the operation becomes extremely easy.

As for using StreamReader.ReadLine, that isn't exactly fast. You are creating a
string object for each line read. You can check for the line termination
characters yourself by reading the raw bytes (for example with
BinaryReader.ReadBytes) and cycling through the data (not that hard), but
you'll have to take into account files with different line termination
standards (CRLF vs CR vs LF), since various systems all have their own
conventions. If this is your own log file, then you can just search for
whatever terminator you've been writing out.
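
A minimal sketch of the byte-scanning idea above, assuming you want CRLF, a
lone CR, and a lone LF each to count as one terminator. The class and method
names (NewlineCounter, CountTerminators) are made up for illustration and do
not come from this thread. The one subtle part is a CRLF pair that straddles
two buffer reads, which is why the "previous byte was CR" flag lives outside
the read loop.

```csharp
using System;
using System.IO;

public static class NewlineCounter
{
    // Counts line terminators in a stream. CRLF, a lone CR and a lone LF
    // are each treated as a single terminator.
    public static long CountTerminators(Stream stream)
    {
        byte[] buffer = new byte[4096];
        long count = 0;
        bool previousWasCr = false; // survives across buffer boundaries
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                byte b = buffer[i];
                if (b == 0x0D)
                {
                    count++;
                    previousWasCr = true;
                }
                else
                {
                    // An LF directly after a CR is the second half of CRLF,
                    // so it was already counted.
                    if (b == 0x0A && !previousWasCr)
                    {
                        count++;
                    }
                    previousWasCr = false;
                }
            }
        }
        return count;
    }
}
```

To turn the terminator count into a line count, add one if the file does not
end with a terminator.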

--
Justin Rogers
DigiTec Web Consultants, LLC.
Blog: http://weblogs.asp.net/justin_rogers
Nov 16 '05 #2
If you define a line as a string of characters ended with
Environment.NewLine, you can load the file into a buffer and count the
occurrences of Environment.NewLine (+1 if the file doesn't end with
Environment.NewLine).
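
As a rough sketch of that definition, assuming the text is already in memory
as a string: the helper below counts occurrences of a newline sequence and
adds one for an unterminated last line. The names NewLineLineCounter and
CountLines are hypothetical, and treating an empty file as zero lines is my
own assumption.

```csharp
using System;

public static class NewLineLineCounter
{
    // Counts lines in text, where each line ends with the given newline
    // sequence (for example Environment.NewLine).
    public static int CountLines(string text, string newline)
    {
        if (string.IsNullOrEmpty(newline))
        {
            throw new ArgumentException("newline must not be empty", "newline");
        }
        if (text.Length == 0)
        {
            return 0; // assumption: an empty file has no lines
        }

        int count = 0;
        int index = 0;
        while ((index = text.IndexOf(newline, index, StringComparison.Ordinal)) >= 0)
        {
            count++;
            index += newline.Length; // continue searching after this terminator
        }

        // +1 when the last line is not terminated.
        if (!text.EndsWith(newline, StringComparison.Ordinal))
        {
            count++;
        }
        return count;
    }
}
```

A call would look like NewLineLineCounter.CountLines(File.ReadAllText(path),
Environment.NewLine). Loading the whole file this way only makes sense when
it fits comfortably in memory.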

FB

--
Get LLBLGen Pro, the new O/R mapper for .NET: http://www.llblgen.com
My .NET Blog: http://weblogs.asp.net/fbouma
Microsoft C# MVP
Nov 16 '05 #3
Yeah, I was going to say something kind of similar...

StreamReader sr = new StreamReader(file);
// Splits on CR (char 13), so this assumes CR or CRLF line endings.
int lineCount = sr.ReadToEnd().Split((char)13).GetUpperBound(0) + 1;
sr.Close();

Something like that should work.

Nov 16 '05 #4
Wow, I definitely would not use that method. It looks pretty, but it creates
a very large amount of extra baggage. ReadToEnd() creates one huge string,
and Split then repackages that data into a separate string for every single
line, which is a big waste of memory. Also, since Split returns a string
array that already includes the last line, a Length call is all you need.

Here is a more performant version for large files that uses a sharing
FileStream. I've also included an updated version of the ReadToEnd method.
You can easily add some timing code and create a rather large file to
demonstrate that the first method is faster and more memory efficient.

using System;
using System.IO;

public class LineCount {
    private static byte[] lineBuffer = new byte[4096]; // 4K

    private static void Main(string[] args) {
        // Method 1: scan raw bytes and count CR (0x0D) terminators.
        // For LF-only files, test for 0x0A instead.
        int lines = 0;
        using (FileStream fs = new FileStream(args[0], FileMode.Open,
                FileAccess.Read, FileShare.Read, lineBuffer.Length)) {
            int bufferRead = 0;
            while ((bufferRead = fs.Read(lineBuffer, 0, lineBuffer.Length)) > 0) {
                for (int i = 0; i < bufferRead; i++) {
                    if (lineBuffer[i] == 0xD) {
                        lines++;
                    }
                }
            }
        }   // the using block closes the stream
        lines++;   // count the last line (assumes the file does not end with a terminator)
        Console.WriteLine(lines);

        // Method 2: ReadToEnd plus Split, for comparison.
        using (StreamReader sr = new StreamReader(args[0])) {
            lines = sr.ReadToEnd().Split((char)13).Length;
        }
        Console.WriteLine(lines);
    }
}

--
Justin Rogers
DigiTec Web Consultants, LLC.
Blog: http://weblogs.asp.net/justin_rogers
Nov 16 '05 #5
Thanks everybody for the advice,

The files that I have been reading are about 1 to 3 GB in size, so as you
can imagine, ReadLine takes some time to complete. I wanted the number of
lines in a file so that I could use it to calculate the progress bar.

Cheers,
Craig
Nov 16 '05 #6
Craig, since you are only displaying a progress bar, an approximate count is
all you need. I would highly recommend using the byte-counting method from my
previous post, with some form of cut-off. For example, read 16K worth of data
(4 times through the loop with a 4K buffer), then:

const int SampleBytes = 16 * 1024;  // 16K sample
float fudge = 1.05f;                // pad the estimate by 5%
// averagedLines = lines counted in the sample; fileLength = file size in bytes
int totalLines = (int)(averagedLines * ((double)fileLength / SampleBytes) * fudge);

If you count the total number of lines first, then you are already
processing the entire file once. There are fast ways to do this (as I've
shown) and slow ways, but you need a way that doesn't force you to read the
entire file and instead estimates the total number of lines.
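
A self-contained sketch of that sampling estimate, under a few assumptions of
my own: the sample is taken from the start of the file, lines are counted by
LF (0x0A) rather than CR, and the names LineEstimator and EstimateLines are
hypothetical. It scales by the number of bytes actually read, so a file
smaller than the sample still gives a sensible answer.

```csharp
using System;
using System.IO;

public static class LineEstimator
{
    // Estimates the total line count from the first sampleBytes of a file.
    // Counts LF (0x0A) bytes, which assumes LF or CRLF line endings.
    public static long EstimateLines(string path, int sampleBytes, double fudge)
    {
        if (sampleBytes <= 0)
        {
            throw new ArgumentOutOfRangeException("sampleBytes");
        }

        long fileLength = new FileInfo(path).Length;
        if (fileLength == 0)
        {
            return 0;
        }

        byte[] buffer = new byte[4096];
        long sampled = 0;      // bytes actually read so far
        long sampledLines = 0; // terminators seen in the sample
        using (FileStream fs = new FileStream(path, FileMode.Open,
                FileAccess.Read, FileShare.Read))
        {
            int read;
            while (sampled < sampleBytes &&
                   (read = fs.Read(buffer, 0,
                        (int)Math.Min(buffer.Length, sampleBytes - sampled))) > 0)
            {
                for (int i = 0; i < read; i++)
                {
                    if (buffer[i] == 0x0A) sampledLines++;
                }
                sampled += read;
            }
        }

        // Scale the line density of the sample up to the whole file.
        return (long)(sampledLines * ((double)fileLength / sampled) * fudge);
    }
}
```

For the case discussed here, EstimateLines(path, 16 * 1024, 1.05) would match
the 16K sample and 5% fudge factor. A sample taken only from the start of the
file can be skewed if early lines are not typical of the rest.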
--
Justin Rogers
DigiTec Web Consultants, LLC.
Blog: http://weblogs.asp.net/justin_rogers
Nov 16 '05 #7
In the interest of providing a complete example for this:

http://weblogs.asp.net/justin_rogers...17/115346.aspx
and
http://weblogs.asp.net/justin_rogers...es/115345.aspx

The first link is the introduction to the article and the second link is an
article detailing the various concepts behind statistical line counting along
with full source code at the end.
--
Justin Rogers
DigiTec Web Consultants, LLC.
Blog: http://weblogs.asp.net/justin_rogers
Nov 16 '05 #8
Justin,

Thanks for that bit of code.

It took 2 min 8 sec to read the 3 GB file with 12,656,376 lines.
The ReadLine technique had only gotten up to line 1,905,686 after 10 min.

My machine:
2 x AMD Athlon MP 2400
1 GB RAM
80 GB IDE HD

I was wondering if I should run the count in a separate thread so that I
can start processing the file in the meantime?

Cheers,
Craig
Nov 16 '05 #9
Definitely try some of the code I posted on my blog. What I would recommend
for a 3 GB file is processing approximately 1 MB of it. That is only 256
4K blocks. Check the resulting estimate against your actual line count and
see if it is relatively close. It should be, and it will take less than a
second to process.

By setting the access mode to read and the share mode to read, you could
count the lines in a separate thread while the main processing starts.
However, you will incur double the disk access, which is why I think you
should use the statistical method and bring your parsing time down.

I'm very interested in helping solve this particular problem in a performant
way, so feel free to contact me through my blog if you run into any issues.
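
For the separate-thread option, here is a hedged sketch assuming a .NET
version that has System.Threading.Tasks (a plain Thread would work the same
way). The names BackgroundLineCount, CountLines and CountLinesAsync are
hypothetical, and lines are counted by LF (0x0A). Opening the file with
FileShare.Read is what lets a second reader process the same file while the
count runs.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class BackgroundLineCount
{
    // Counts LF bytes in the file. FileShare.Read allows other readers
    // to open the same file at the same time.
    public static long CountLines(string path)
    {
        byte[] buffer = new byte[4096];
        long lines = 0;
        using (FileStream fs = new FileStream(path, FileMode.Open,
                FileAccess.Read, FileShare.Read, buffer.Length))
        {
            int read;
            while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < read; i++)
                {
                    if (buffer[i] == 0x0A) lines++;
                }
            }
        }
        return lines;
    }

    // Starts the count on a thread-pool thread and returns immediately,
    // so the caller can begin processing the file in the meantime.
    public static Task<long> CountLinesAsync(string path)
    {
        return Task.Run(() => CountLines(path));
    }
}
```

The caller can check the task's IsCompleted property and switch the progress
bar from an estimate to the exact total once Result is available. As noted
above, the file is still read twice from disk.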

--
Justin Rogers
DigiTec Web Consultants, LLC.
Blog: http://weblogs.asp.net/justin_rogers
Nov 16 '05 #10
Otis,

Thanks for your advice.

Justin suggested a similar method.

I think that estimating the number of lines in the file is the best and
quickest way.

I am reading Unix server syslogs into a database. Sometimes the log
files are 3 GB in size before they reach me.

Cheers,
Craig
Otis Mukinfus wrote (replying to Craig's original post of Sat, 17 Apr 2004
19:40:22 +1000):

Craig,

Here is a simple way to do this. Not highly technical, but:

1. Find the size of the file.
2. Read the first 1000 lines of the file to determine the average
length of a line, including the line terminator.
3. Divide the file size by the average line length, and add one if the
last line was not terminated. To determine this, just seek to the end of
the file and look at the last character in the file.

If the line length is fixed you will have your exact answer. If not
then you will have a pretty close estimate.
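
The three steps above can be sketched roughly as follows, assuming
single-byte text with LF terminators. The names AverageLineEstimator and
Estimate are hypothetical, and rounding the quotient to the nearest whole
number is my own choice.

```csharp
using System;
using System.IO;

public static class AverageLineEstimator
{
    // Estimates the line count from the average byte length of the first
    // sampleLines lines, terminator included.
    public static long Estimate(string path, int sampleLines)
    {
        if (sampleLines <= 0)
        {
            throw new ArgumentOutOfRangeException("sampleLines");
        }

        using (FileStream fs = new FileStream(path, FileMode.Open,
                FileAccess.Read, FileShare.Read))
        {
            // Step 1: find the size of the file.
            long fileSize = fs.Length;
            if (fileSize == 0)
            {
                return 0;
            }

            // Step 2: read the first sampleLines lines, counting bytes.
            long bytesSeen = 0;
            long linesSeen = 0;
            int b;
            while (linesSeen < sampleLines && (b = fs.ReadByte()) >= 0)
            {
                bytesSeen++;
                if (b == 0x0A) linesSeen++;
            }
            if (linesSeen == 0)
            {
                return 1; // no terminator anywhere: a single line
            }

            // Step 3: divide file size by average line length.
            double averageLength = (double)bytesSeen / linesSeen;
            long estimate = (long)Math.Round(fileSize / averageLength);

            // Add one if the last line was not terminated.
            fs.Seek(-1, SeekOrigin.End);
            if (fs.ReadByte() != 0x0A)
            {
                estimate++;
            }
            return estimate;
        }
    }
}
```

With fixed-length lines this gives the exact count; with variable-length
lines the accuracy depends on how representative the first lines are.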

Question: What is the purpose of knowing the number of lines in the
log?

Otis Mukinfus
http://www.otismukinfus.com

Nov 16 '05 #11
