Reading a large text file line by line backwards

Is there a way in .NET to read a very large text file (400MB+)
backwards, line by line? In System.IO, the FileStream class has a Seek
method, but its Read method requires you to know how many bytes to read
in.

My problem is that the line length of this log file is not constant, so there
is no easy way to read one line in. The only thing that is constant is that
each line is terminated by a carriage return.

If my only option is to use the FileStream class, I suppose I could read the data
back in chunks and parse it to find where the lines begin and end...

Is there a better way?
Amy.
Jul 21 '05 #1
2 Replies


About all you can do is seek to, say, 1024 bytes before the end; read 1024
bytes; work through them byte by byte to chop up into lines; seek to 1024
bytes before that; read 1024 bytes; and keep going.
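
As a rough sketch of the chunked backward read described above (not code from this thread): the module name BackwardReader, the function ReadLastLines, and its parameters are made up for illustration. It assumes a single-byte encoding such as ASCII, lines ending in LF or CR+LF, and a default chunk size of 1024 bytes.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text
Imports Microsoft.VisualBasic

Module BackwardReader
    ' Hypothetical helper (not from the thread): returns up to maxLines lines,
    ' last line first. Assumes a single-byte encoding such as ASCII and lines
    ' ending in LF or CR+LF.
    Public Function ReadLastLines(ByVal filePath As String, ByVal maxLines As Integer, Optional ByVal chunkSize As Integer = 1024) As List(Of String)
        Dim lines As New List(Of String)
        Dim carry As String = ""        ' tail of a line whose start lies in an earlier chunk
        Dim atFileEnd As Boolean = True ' True only while handling the final chunk of the file

        Using fs As New FileStream(filePath, FileMode.Open, FileAccess.Read)
            Dim fileLength As Long = fs.Length
            Dim pos As Long = fileLength
            While pos > 0 AndAlso lines.Count < maxLines
                Dim size As Integer = CInt(Math.Min(CLng(chunkSize), pos))
                pos -= size
                fs.Seek(pos, SeekOrigin.Begin)

                ' Stream.Read may return fewer bytes than requested, so loop until full.
                Dim bytes(size - 1) As Byte
                Dim got As Integer = 0
                While got < size
                    Dim n As Integer = fs.Read(bytes, got, size - got)
                    If n = 0 Then Exit While
                    got += n
                End While

                ' Re-attach the carried tail, then split on LF.
                Dim parts() As String = (Encoding.ASCII.GetString(bytes, 0, got) & carry).Split(ControlChars.Lf)
                carry = parts(0) ' may be incomplete until the start of the file is reached
                For i As Integer = parts.Length - 1 To 1 Step -1
                    ' Skip the empty piece that follows a newline at the very end of the file.
                    Dim isTrailingEmpty As Boolean = atFileEnd AndAlso i = parts.Length - 1 AndAlso parts(i).Length = 0
                    If Not isTrailingEmpty AndAlso lines.Count < maxLines Then
                        lines.Add(parts(i).TrimEnd(ControlChars.Cr))
                    End If
                Next
                atFileEnd = False
            End While
            ' At the start of the file the carried text is a complete line.
            If pos = 0 AndAlso fileLength > 0 AndAlso lines.Count < maxLines Then
                lines.Add(carry.TrimEnd(ControlChars.Cr))
            End If
        End Using
        Return lines
    End Function
End Module
```

Calling ReadLastLines on a log file with maxLines set to 10 would give the last ten lines, newest first, without touching the rest of the file. Splitting only after the carried text has been re-attached is what keeps a line break that straddles two chunks from being missed.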

The problem is that, as you know, "lines" mean nothing to the low-level disk
functions in the operating system. This isn't like an IBM mainframe where
files have a "record length" and are divided into "card images".

I suggested 1024 because the cluster size (allocation unit) is probably a
multiple of 1024. If the chunk offsets are also multiples of 1024 from the
start of the file, the reads stay synchronized with the cluster boundaries;
that should be mildly advantageous as regards speed.
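
A sketch of keeping chunk starts on multiples of the block size, counted from the start of the file (which is where cluster boundaries are measured from). The module ChunkAlignment and the function AlignedChunkStart are hypothetical names, and whether alignment gives a measurable speedup is untested here.

```vbnet
Imports System

Module ChunkAlignment
    ' Hypothetical helper: start offset of the chunk that ends at endPos,
    ' chosen so that every chunk start is a multiple of blockSize.
    Public Function AlignedChunkStart(ByVal endPos As Long, ByVal blockSize As Integer) As Long
        If endPos <= 0 Then Return 0
        ' Integer division rounds down to the last block boundary before endPos.
        Return ((endPos - 1) \ blockSize) * blockSize
    End Function
End Module
```

In a backward loop, endPos starts at the file length; read from AlignedChunkStart(endPos, 1024) up to endPos, then set endPos to that start and repeat. Only the first chunk read (the one at the end of the file) can be shorter than the block size.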
Jul 21 '05 #2

Old thread, but Google grabs it so I'll add to it.

Here's how I did it in VB.NET. We have text files of around 4 GB that we usually only want the bottom half of. This worked faster than just calling ReadLine over and over (although on a local drive, I didn't see any difference at all). Just the basic framework:

' These two Imports belong at the top of the file:
'Imports System.IO
'Imports System.Text.Encoding
Public Sub ReadTextFileBackwards(ByVal sFilePath As String, ByVal sSearchString As String)
    Dim i As Integer
    Dim stringArray() As String
    Dim sBuffer As String = ""   ' partial line carried over from the previously read (later) block

    ' The ideal block size is something of a holy grail, I guess. After some
    ' testing of sizes from 1K to 75K, it looked like speed really dropped off
    ' after the mid 60Ks or so. When testing for the minimum time to finish a task,
    ' the numbers 41K and 42K kept coming up, with an occasional 34K thrown in for fun.
    ' There may be an ideal size for each file - I'm not really sure. What I do know
    ' is that over a network drive, this was 7X faster than just doing ReadLine until
    ' the cows come home. On a local drive, it doesn't make as much of a difference
    ' because the file access calls are similar in load to the overhead necessary
    ' when pulling large blocks. But when it is a network drive, the file access gets
    ' slower, so we're much better off minimizing the number of times we access
    ' the file.
    Dim iBlockSize As Integer = 41000
    Dim iFirstElement As Integer = 1

    ' Using closes the file even if an exception is thrown.
    Using streamTextFile As Stream = File.OpenRead(sFilePath)
        streamTextFile.Seek(0, SeekOrigin.End)

        While streamTextFile.Position > 0
            If streamTextFile.Position <= iBlockSize Then
                ' Final block (start of file): element 0 is a complete line, so process it too.
                iBlockSize = CInt(streamTextFile.Position)
                iFirstElement = 0
            End If
            Dim byteArray(iBlockSize - 1) As Byte
            streamTextFile.Seek(-1 * iBlockSize, SeekOrigin.Current)
            ' Read can return fewer bytes than requested, so loop until the block is full.
            Dim iRead As Integer = 0
            While iRead < iBlockSize
                Dim iCount As Integer = streamTextFile.Read(byteArray, iRead, iBlockSize - iRead)
                If iCount = 0 Then Throw New EndOfStreamException("File shrank while being read.")
                iRead += iCount
            End While
            streamTextFile.Seek(-1 * iBlockSize, SeekOrigin.Current)
            ' Append the carried-over partial line before splitting, so a CR/LF pair that
            ' straddles two blocks still counts as a line break. ASCII assumes single-byte text.
            stringArray = Split(ASCII.GetString(byteArray) & sBuffer, vbCrLf)
            For i = stringArray.GetUpperBound(0) To iFirstElement Step -1
                If stringArray(i).Contains(sSearchString) Then
                    MsgBox("Found It!")
                End If
            Next
            ' Element 0 may be the tail of a line that starts in the next (earlier) block.
            sBuffer = stringArray(0)
        End While
    End Using

End Sub

I probably won't check back, but comments are always welcome. ~Ed
May 3 '06 #3
