Reading a large text file line by line backwards

Is there a way in .NET to read a very large text file (400MB+)
backwards, line by line? In System.IO, the FileStream class has a Seek
method, but its Read method requires you to know how many bytes to read.

My problem is that the lines in this log file vary in length, so there
is no easy way to read in one line. The only constant is that
each line is terminated by a carriage return.

If my only option is to use the FileStream class, I suppose I could read the data
back in chunks and parse it to find the lines...

Is there a better way?
Amy.
Jul 21 '05 #1
"Amy L." <am**@paxemail.com> wrote in message
news:ua**************@TK2MSFTNGP10.phx.gbl...

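About all you can do is seek to, say, 1024 bytes before the end; read 1024
bytes; work through them byte by byte to chop them up into lines; seek to 1024
bytes before that; read 1024 bytes; and keep going. The first line in each
chunk is usually incomplete, so hold on to it and join it to the end of the
last line of the chunk you read next.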
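A minimal sketch of that chunk-by-chunk approach in VB.NET, under assumptions that are not in the original post: lines end in LF or CR/LF, the text is ASCII or UTF-8 without a byte-order mark, and the names BackwardReader, ReadLinesBackward and MakeLine are hypothetical. It splits on LF bytes before decoding, so a CR/LF pair or a multi-byte UTF-8 character that straddles two chunks is still handled, and it carries the incomplete first line of each chunk over to the next read. The 4096-byte default block size is an arbitrary choice.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text

Public Module BackwardReader
    ' Hypothetical helper: yields the lines of a file from last to first by reading
    ' fixed-size blocks backwards from the end. Assumes LF or CR/LF line endings and
    ' ASCII/UTF-8 text without a byte-order mark.
    Public Iterator Function ReadLinesBackward(filePath As String, Optional blockSize As Integer = 4096) As IEnumerable(Of String)
        If blockSize < 1 Then Throw New ArgumentOutOfRangeException("blockSize")
        Using fs As New FileStream(filePath, FileMode.Open, FileAccess.Read)
            If fs.Length = 0 Then Return

            ' A final LF ends the last line; it does not start an empty one.
            Dim position As Long = fs.Length
            fs.Seek(-1, SeekOrigin.End)
            If fs.ReadByte() = 10 Then position -= 1

            Dim block(blockSize - 1) As Byte
            ' Bytes (in file order) of a line whose beginning has not been read yet.
            Dim pending As New List(Of Byte)()

            While position > 0
                Dim count As Integer = CInt(Math.Min(CLng(blockSize), position))
                position -= count
                fs.Seek(position, SeekOrigin.Begin)
                Dim got As Integer = 0
                While got < count   ' Read may return fewer bytes than requested
                    Dim n As Integer = fs.Read(block, got, count - got)
                    If n = 0 Then Throw New EndOfStreamException()
                    got += n
                End While

                ' Scan the block from its last byte to its first; every LF completes a line.
                Dim lineEnd As Integer = count
                For i As Integer = count - 1 To 0 Step -1
                    If block(i) = 10 Then
                        Yield MakeLine(block, i + 1, lineEnd - i - 1, pending)
                        pending.Clear()
                        lineEnd = i
                    End If
                Next

                ' Bytes before the first LF belong to a line that started in an earlier block.
                Dim head(lineEnd - 1) As Byte
                Array.Copy(block, 0, head, 0, lineEnd)
                pending.InsertRange(0, head)
            End While

            ' Whatever is left is the first line of the file.
            Yield MakeLine(block, 0, 0, pending)
        End Using
    End Function

    ' Joins block(start .. start + count - 1) with the pending bytes, drops a
    ' trailing CR, and decodes the result as UTF-8.
    Private Function MakeLine(block As Byte(), start As Integer, count As Integer, pending As List(Of Byte)) As String
        Dim bytes(count + pending.Count - 1) As Byte
        Array.Copy(block, start, bytes, 0, count)
        pending.CopyTo(bytes, count)
        Dim length As Integer = bytes.Length
        If length > 0 AndAlso bytes(length - 1) = 13 Then length -= 1
        Return Encoding.UTF8.GetString(bytes, 0, length)
    End Function
End Module
```

Because it is an iterator, a caller can Exit For once it has the lines it needs, so only the tail of a large log is ever read. The InsertRange(0, ...) call makes a single line that spans k blocks cost roughly k squared block copies, which is acceptable for ordinary log lines.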

The problem is that, as you know, "lines" mean nothing to the low-level disk
functions in the operating system. This isn't like an IBM mainframe where
files have a "record length" and are divided into "card images".

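I suggested 1024 because the cluster size (allocation unit) is probably a
multiple of 1024, so if every read starts at a file offset that is a multiple
of 1024, your reads will line up with cluster boundaries; that should be mildly
advantageous as regards speed. (That means the first read, at the end of the
file, should cover only the leftover length mod 1024 bytes.)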
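Reading "1024 bytes before the end" only lands on a 1024-byte boundary when the file length happens to be a multiple of 1024. A small sketch of planning aligned backward reads follows; AlignedBlocks and BackwardBlocks are hypothetical names, and the idea that file offsets map onto cluster boundaries assumes the file's data begins on a cluster boundary, which is the usual case for large files but is an assumption here.

```vbnet
Imports System
Imports System.Collections.Generic

Public Module AlignedBlocks
    ' Hypothetical helper: plans backward reads over a file of fileLength bytes so that
    ' every read starts at a multiple of blockSize. At most the first read (the tail
    ' of the file) is shorter than a full block.
    Public Function BackwardBlocks(fileLength As Long, blockSize As Integer) As List(Of Tuple(Of Long, Integer))
        If blockSize < 1 Then Throw New ArgumentOutOfRangeException("blockSize")
        Dim reads As New List(Of Tuple(Of Long, Integer))()
        Dim position As Long = fileLength
        While position > 0
            ' Largest multiple of blockSize that is strictly below position.
            Dim start As Long = ((position - 1) \ blockSize) * blockSize
            ' Each entry is (offset to seek to, number of bytes to read).
            reads.Add(Tuple.Create(start, CInt(position - start)))
            position = start
        End While
        Return reads
    End Function
End Module
```

For a 2,500-byte file and 1,024-byte blocks this plans reads at offset 2048 (452 bytes), then 1024 and 0 (1,024 bytes each).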
Jul 21 '05 #2
Old thread, but Google grabs it so I'll add to it.

Here's how I did it in VB.NET. We have text files of around 4GB, and we usually only want the bottom half. This worked faster than calling ReadLine over and over from the top (although on a local drive, I didn't see any difference at all). Just the basic framework:

'Imports System.IO
'Imports System.Text.Encoding
Public Sub ReadTextFileBackwards(ByVal sFilePath As String, ByVal sSearchString As String)
    ' The ideal block size is something of a holy grail, I guess. After some
    ' testing of sizes from 1K to 75K, it looked like speed really dropped off
    ' after the mid 60Ks or so. When testing for the minimum time to finish a task,
    ' the numbers 41K and 42K kept coming up, with occasional 34K thrown in for fun.
    ' There may be an ideal size for each file - I'm not really sure. What I do know
    ' is that over a network drive, this was 7X faster than just doing ReadLine until
    ' the cows come home. On a local drive, it doesn't make as much of a difference
    ' because the file access calls are similar in load to the overhead necessary
    ' when pulling large blocks. But when it is a network drive, the file access gets
    ' slower, so we're much better off minimizing the number of times we access
    ' the file.
    Dim iBlockSize As Integer = 41000
    Dim sBuffer As String = ""   ' first (partial) line of the block read previously

    Using streamTextFile As Stream = File.OpenRead(sFilePath)
        streamTextFile.Seek(0, SeekOrigin.End)

        While streamTextFile.Position > 0
            Dim iFirstElement As Integer = 1   ' element 0 is a fragment; keep it for the next pass
            If streamTextFile.Position <= iBlockSize Then
                ' Last pass: read whatever is left; element 0 is the file's first line.
                iBlockSize = CInt(streamTextFile.Position)
                iFirstElement = 0
            End If

            Dim byteArray(iBlockSize - 1) As Byte
            streamTextFile.Seek(-iBlockSize, SeekOrigin.Current)
            ' Read can return fewer bytes than requested, so loop until the block is full.
            Dim iRead As Integer = 0
            While iRead < iBlockSize
                Dim iCount As Integer = streamTextFile.Read(byteArray, iRead, iBlockSize - iRead)
                If iCount = 0 Then Throw New EndOfStreamException()
                iRead += iCount
            End While
            streamTextFile.Seek(-iBlockSize, SeekOrigin.Current)

            ' Append the carried-over fragment before splitting, so that a CR/LF pair
            ' straddling two blocks is still treated as a line break.
            ' (ASCII decoding: a multi-byte encoding could split a character between blocks.)
            Dim stringArray() As String = Split(ASCII.GetString(byteArray) & sBuffer, vbCrLf)
            For i As Integer = stringArray.GetUpperBound(0) To iFirstElement Step -1
                If stringArray(i).Contains(sSearchString) Then
                    MsgBox("Found It!")
                End If
            Next
            sBuffer = stringArray(0)
        End While
    End Using
End Sub

I probably won't check back, but comments are always welcome. ~Ed
May 3 '06 #3
