What is the fastest way to count lines in a text file?

I want to very quickly count the number of lines in text files without having
to read each line and increment a counter. I am working in VB.NET and C#.
Does anyone have a very fast example on how to do this?

Thanks,

Matt
Dec 26 '05 #1
14 Replies


Mesterak,

Various tests in these newsgroups have shown that simply looping through
the file contents as a Char array (comparing Char values rather than
strings) and testing for the line-break character is usually the fastest
method.

I hope this helps,

Cor
Dec 26 '05 #2

Can you provide a code example, or else point me to the relevant posts?
Dec 26 '05 #3

Maybe a regular expression could be a fast solution (for large text
files).
You would count the matches for \r\n or \n.
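
A minimal C# sketch of that idea, assuming the whole text is already in memory as a string (Regex works on strings, so the file would have to be read in full first). The class and method names are made up for illustration, and nothing here says this is the fastest option.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexCountSketch
{
    // Treats "\r\n" and a bare "\n" each as a single line break.
    static readonly Regex LineBreak = new Regex(@"\r?\n");

    // Hypothetical helper: returns the number of line breaks in the text.
    public static int CountLineBreaks(string text)
    {
        return LineBreak.Matches(text).Count;
    }
}
```

Note that this counts line breaks, not lines: a text whose last line has no trailing break has one more line than the value returned.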

--
Vadym Stetsyak aka Vadmyst
http://vadmyst.blogspot.com
Dec 26 '05 #4

Here is one message thread

http://groups.google.com/group/micro...5c33cc87237dbf

Be aware that in this case the character-based samples provided by Jay are
the fastest, and not the VB Find, which is the fastest when it is about strings.

I hope this helps,

Cor
Dec 26 '05 #5

I tried the following which did not seem to work:

strContents = Regex.Replace(strContents, "\r{0,}\n+", vbCrLf)
myArrayList.AddRange(strContents.Split(CType(vbCrLf, Char)))
Dec 26 '05 #6

Vadym Stetsyak <va*****@ukr.net> wrote:
> Maybe a regular expression could be a fast solution (for large text
> files).

That's very unlikely, IMO.

> You would count the matches for \r\n or \n.

And how will you provide the text for the regular expression to match?
As far as I'm aware, you can't provide regular expressions with
TextReaders - you have to provide them with strings.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Dec 26 '05 #7

Mesterak,

In the messages I showed you, using Split and Regex is by far the
slowest method to count lines.

Cor
Dec 26 '05 #8

So how can I count the lines of the file without loading the whole file into
memory as a string and counting lines?
Dec 26 '05 #9

mesterak <me******@discussions.microsoft.com> wrote:
> So how can I count the lines of the file without loading the whole file into
> memory as a string and counting lines?

By reading chunks at a time (using StreamReader) and counting '\n'
occurrences.

Here's some sample code:

using System;
using System.IO;

class Test
{
    static int CountLines(TextReader reader)
    {
        char[] buffer = new char[32 * 1024]; // Read 32K chars at a time

        int total = 1; // All files have at least one line!

        int read;
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                if (buffer[i] == '\n')
                {
                    total++;
                }
            }
        }
        return total;
    }

    static void Main(string[] args)
    {
        foreach (string file in args)
        {
            using (StreamReader reader = new StreamReader(file))
            {
                Console.WriteLine("{0}: {1} lines", file, CountLines(reader));
            }
        }
    }
}

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Dec 26 '05 #10

Thanks, that works perfectly!!!

I wrote the following which apparently works but does require that the
entire file be read into memory (your code is better):

Public Function GetLineCount(ByVal FileName As String) As Integer

    If File.Exists(FileName) Then
        Dim LogReader As StreamReader
        LogReader = New StreamReader(New FileStream(FileName, _
            FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        ' Reads the entire file into memory as one string
        Dim strContents As String = LogReader.ReadToEnd
        LogReader.Close()
        LogReader = Nothing
        ' Count the LF characters
        Dim r As New Regex(Chr(10))
        Dim LineCount As Integer = r.Matches(strContents).Count
        r = Nothing
        Return LineCount
    End If

    Return 0 ' File does not exist
End Function
Dec 26 '05 #11

Ok, I used your baseline code to rewrite my VB.NET function. It is very fast
and efficient. The only thing I needed to add was a check to see if the
last character was a LF and increment the total if not; I get the correct
number of lines every time! Processing ~200MB of log files (209 files)
occurs extremely fast (only added 2 seconds overall to the date/time indexing
functions I was already performing.)

Thanks a million!!!

Here's my new VB.NET function to benefit anyone else needing to count lines
in a file in VB.NET:

Public Function GetLineCount(ByVal FileName As String) As Integer
    Dim total As Integer = 0

    If File.Exists(FileName) Then
        Dim buffer(32 * 1024 - 1) As Char ' VB takes the upper bound: 32K chars
        Dim i As Integer = 0
        Dim read As Integer

        Dim reader As TextReader = File.OpenText(FileName)
        read = reader.Read(buffer, 0, buffer.Length)

        While (read > 0)
            i = 0
            While i < read

                If buffer(i) = Chr(10) Then
                    total += 1
                End If

                i += 1
            End While

            read = reader.Read(buffer, 0, buffer.Length)
        End While

        reader.Close()
        reader = Nothing

        ' Count a final line that has no trailing LF (i stays 0 for an empty file)
        If i > 0 AndAlso buffer(i - 1) <> Chr(10) Then
            total += 1
        End If

    End If

    Return total
End Function

Dec 26 '05 #12

Jon,

I have seen you talk about multithreading many times in the past; in my
opinion this is a perfect situation for multithreading.

An IO operation always has (IO) waits in it and is therefore perfect to
run in parallel with the counting thread.

Just my opinion.

Cor
Dec 27 '05 #13

Cor Ligthert [MVP] <no************@planet.nl> wrote:
> I have seen you talk about multithreading many times in the past; in my
> opinion this is a perfect situation for multithreading.
>
> An IO operation always has (IO) waits in it and is therefore perfect to
> run in parallel with the counting thread.

It's certainly *possible* that it would speed things up. I wouldn't
suggest that it's worth doing unless the performance of doing it in a
single thread is a problem though. Assuming the IO performance
dominates the time taken, you'd only be able to shave off the time
taken for the scanning, which I suspect would be absolutely minute.
Compare this with the development cost/risk of turning a simple bit of
single-threaded code into multi-threaded code, and I'd certainly need
to see concrete figures before taking that risk.
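
One rough way to get such figures is to time the reads and the scanning separately in a single pass. This is only a sketch: the class and method names are hypothetical, the "read" time also includes character decoding, and OS file caching can distort repeated runs.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class TimingSketch
{
    // Hypothetical helper: counts '\n' characters and reports how long the
    // reads took compared with the scanning of each chunk.
    public static int CountAndTime(string path, out TimeSpan readTime, out TimeSpan scanTime)
    {
        char[] buffer = new char[32 * 1024];
        Stopwatch readWatch = new Stopwatch();
        Stopwatch scanWatch = new Stopwatch();
        int newlines = 0;

        using (StreamReader reader = new StreamReader(path))
        {
            while (true)
            {
                // Time spent waiting for (and decoding) the next chunk
                readWatch.Start();
                int read = reader.Read(buffer, 0, buffer.Length);
                readWatch.Stop();
                if (read <= 0)
                {
                    break;
                }

                // Time spent scanning the chunk for line feeds
                scanWatch.Start();
                for (int i = 0; i < read; i++)
                {
                    if (buffer[i] == '\n')
                    {
                        newlines++;
                    }
                }
                scanWatch.Stop();
            }
        }

        readTime = readWatch.Elapsed;
        scanTime = scanWatch.Elapsed;
        return newlines;
    }
}
```

If scanTime turns out to be a small fraction of readTime, a second thread has little left to save.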

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Dec 27 '05 #14

The 2 VB.NET functions I created based on your code example are pretty darn
fast. I counted a total of several million lines across about 200+ files in
a matter of a few seconds. If someone has issues with this speed to require
multi-threading, then something's just wrong!

However, one of my new line counting functions is used in a separate thread
after my app initially counts the lines and partially indexes the files'
entries by date/time (to get a time reference per file so I only parse parts
of files applicable to the date/time window of interest.) The line counter
that runs in a separate thread goes back over all of the files and determines
the actual byte position per chr(10) detected. This enables the user of my
log viewer to quickly jump to a particular line and also speeds content
paging (for viewability performance.) So to answer Cor, yes it is good to
use in a separate thread when there are extended purposes at play which you
may not want your app (or user) to wait on to complete.
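
For anyone curious, here is a rough C# sketch of that kind of byte-offset index. It is not my actual code: the names are made up, and it assumes an encoding such as ASCII or UTF-8 in which the byte 10 only ever appears as a line feed (it would not work for UTF-16).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class LineIndexSketch
{
    // Returns the byte offset at which each line starts (line 0 starts at 0).
    // If the file ends with a line feed, the last entry equals the file length.
    public static List<long> BuildLineStarts(string path)
    {
        List<long> starts = new List<long>();
        starts.Add(0);
        byte[] buffer = new byte[32 * 1024];
        long position = 0; // Byte offset of buffer[0] within the file

        using (FileStream stream = new FileStream(path, FileMode.Open,
            FileAccess.Read, FileShare.ReadWrite))
        {
            int read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < read; i++)
                {
                    if (buffer[i] == (byte)'\n')
                    {
                        starts.Add(position + i + 1); // Next line begins after the LF
                    }
                }
                position += read;
            }
        }
        return starts;
    }

    // Jumps straight to a line by seeking to its recorded offset.
    // Returns null if the offset is at the end of the file.
    public static string ReadLineAt(string path, List<long> starts, int lineIndex)
    {
        using (FileStream stream = new FileStream(path, FileMode.Open,
            FileAccess.Read, FileShare.ReadWrite))
        {
            stream.Seek(starts[lineIndex], SeekOrigin.Begin);
            using (StreamReader reader = new StreamReader(stream))
            {
                return reader.ReadLine();
            }
        }
    }
}
```

The index costs one long per line, which is the trade-off for being able to seek instead of rescanning the file.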

-Matt
Dec 29 '05 #15
