Reading a large text file line by line backwards

Is there a way in .NET to read a very large text file (400MB+)
backwards, line by line? In System.IO the FileStream class has a Seek
method, but its only Read method requires you to know how many bytes to read
in.

My problem is that the line length of this log file is not constant so there
is no easy way to read one line in. The only thing that is constant is that
each line is terminated by a carriage return.

If my only option is to use the FileStream class, I suppose I can read data
back in chunks and parse it to work out where the lines are...

Is there a better way?
Amy.
Jul 21 '05 #1
About all you can do is seek to, say, 1024 bytes before the end; read 1024
bytes; work through them byte by byte to chop up into lines; seek to 1024
bytes before that; read 1024 bytes; and keep going.
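
The seek-and-read loop described above can be sketched roughly as follows. This is only an illustration, not a tested library routine: ReadLinesBackwards and its chunkSize parameter are hypothetical names, the text is assumed to be in a single-byte encoding (ASCII here), lines are assumed to end in LF or CR LF, and Iterator functions need VB 2012 or later.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text
Imports Microsoft.VisualBasic

Module ReverseLines
    ' Hypothetical helper: yields the lines of a seekable stream, last line first.
    ' Assumes a single-byte encoding and LF or CR LF line endings.
    Public Iterator Function ReadLinesBackwards(ByVal source As Stream,
            Optional ByVal chunkSize As Integer = 1024) As IEnumerable(Of String)
        Dim position As Long = source.Length
        Dim carry As String = ""        ' tail of a line whose start is in an earlier chunk
        Dim atEnd As Boolean = True     ' True only while handling the chunk at the end of the file
        Dim bytes(chunkSize - 1) As Byte

        While position > 0
            Dim count As Integer = CInt(Math.Min(CLng(chunkSize), position))
            position -= count
            source.Seek(position, SeekOrigin.Begin)

            ' Stream.Read may return fewer bytes than requested, so loop.
            Dim filled As Integer = 0
            While filled < count
                Dim n As Integer = source.Read(bytes, filled, count - filled)
                If n = 0 Then Throw New EndOfStreamException()
                filled += n
            End While

            ' Join the chunk to the carried-over text before splitting.
            Dim chunkText As String = Encoding.ASCII.GetString(bytes, 0, count) & carry
            Dim pieces() As String = chunkText.Split(ControlChars.Lf)

            ' pieces(0) may be incomplete, so it is held back for the next pass.
            For i As Integer = pieces.Length - 1 To 1 Step -1
                ' A newline at the very end of the file does not start another line.
                If atEnd AndAlso i = pieces.Length - 1 AndAlso pieces(i).Length = 0 Then Continue For
                Yield pieces(i).TrimEnd(ControlChars.Cr)
            Next
            atEnd = False
            carry = pieces(0)
        End While

        ' Whatever is left is the first line of the file.
        If source.Length > 0 Then Yield carry.TrimEnd(ControlChars.Cr)
    End Function

    Sub Main()
        ' A tiny chunk size forces line breaks to straddle chunk boundaries.
        Dim sample As String = "one" & vbCrLf & "two" & vbCrLf & "three" & vbCrLf
        Using ms As New MemoryStream(Encoding.ASCII.GetBytes(sample))
            For Each line As String In ReadLinesBackwards(ms, 4)
                Console.WriteLine(line)   ' prints three, two, one
            Next
        End Using
    End Sub
End Module
```

For a real log file you would pass File.OpenRead(path) instead of the MemoryStream. A multi-byte encoding such as UTF-8 would need extra care, because a chunk boundary can fall in the middle of a character.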

The problem is that, as you know, "lines" mean nothing to the low-level disk
functions in the operating system. This isn't like an IBM mainframe where
files have a "record length" and are divided into "card images".

I suggested 1024 because the cluster size (allocation unit) is probably a
multiple of 1024, so you will be synchronized with the disk sector
boundaries; that should be mildly advantageous as regards speed.
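
One detail on the alignment point: seeking to 1024 bytes before the end only lands on a 1024-byte boundary when the file length is itself a multiple of 1024. A rough sketch of how the read offsets could be kept aligned is below. BackwardChunks is a hypothetical helper, and whether alignment gives a measurable speed-up is not something this sketch demonstrates.

```vbnet
Imports System
Imports System.Collections.Generic

Module AlignedChunks
    ' Hypothetical helper: lists (offset, length) pairs for reading a file of
    ' fileLength bytes backwards so that every offset is a multiple of chunkSize.
    ' Only the first pair (the tail of the file) can be shorter than chunkSize.
    Public Function BackwardChunks(ByVal fileLength As Long,
            ByVal chunkSize As Integer) As List(Of KeyValuePair(Of Long, Integer))
        Dim result As New List(Of KeyValuePair(Of Long, Integer))()
        Dim position As Long = fileLength
        While position > 0
            ' Start of the aligned block that contains the last unread byte.
            Dim start As Long = ((position - 1) \ chunkSize) * chunkSize
            result.Add(New KeyValuePair(Of Long, Integer)(start, CInt(position - start)))
            position = start
        End While
        Return result
    End Function

    Sub Main()
        ' A 2500-byte file: offsets 2048 (452 bytes), 1024 (1024 bytes), 0 (1024 bytes).
        For Each chunk As KeyValuePair(Of Long, Integer) In BackwardChunks(2500, 1024)
            Console.WriteLine("offset {0}, length {1}", chunk.Key, chunk.Value)
        Next
    End Sub
End Module
```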
Jul 21 '05 #2
Old thread, but Google grabs it so I'll add to it.

Here's how I did it in VB.NET. We have text files of around 4 GB that we usually only want the bottom half of. This worked faster than just calling ReadLine all the way through (although on a local drive, I didn't see any difference at all). Just the basic framework:

' Needs these at the top of the file:
'   Imports System.IO
'   Imports System.Text
Public Sub ReadTextFileBackwards(ByVal sFilePath As String, ByVal sSearchString As String)
    ' The ideal block size is something of a holy grail, I guess. After some
    ' testing of sizes from 1K to 75K, it looked like speed really dropped off
    ' after the mid 60Ks or so. When testing for the minimum time to finish a task,
    ' the numbers 41K and 42K kept coming up, with an occasional 34K thrown in for fun.
    ' There may be an ideal size for each file - I'm not really sure. What I do know
    ' is that over a network drive, this was 7X faster than just doing ReadLine until
    ' the cows come home. On a local drive, it doesn't make as much of a difference
    ' because the file access calls are similar in load to the overhead necessary
    ' when pulling large blocks. But when it is a network drive, the file access gets
    ' slower, so we're much better off minimizing the number of times we access
    ' the file.
    Dim iBlockSize As Integer = 41000

    ' Partial first line of the previously processed (later) block.
    Dim sBuffer As String = ""

    ' Using closes the file even if an exception is thrown.
    Using streamTextFile As Stream = File.OpenRead(sFilePath)
        streamTextFile.Seek(0, SeekOrigin.End)

        While streamTextFile.Position > 0
            ' Element 0 of each block is normally an incomplete line, so skip it.
            ' The block at the start of the file is the exception.
            Dim iFirstElement As Integer = 1
            If streamTextFile.Position <= iBlockSize Then
                iBlockSize = CInt(streamTextFile.Position)
                iFirstElement = 0
            End If

            Dim byteArray(iBlockSize - 1) As Byte
            streamTextFile.Seek(-1 * iBlockSize, SeekOrigin.Current)

            ' Read can return fewer bytes than requested, so loop until the block is full.
            Dim iRead As Integer = 0
            While iRead < iBlockSize
                Dim n As Integer = streamTextFile.Read(byteArray, iRead, iBlockSize - iRead)
                If n = 0 Then Throw New EndOfStreamException()
                iRead += n
            End While
            streamTextFile.Seek(-1 * iBlockSize, SeekOrigin.Current)

            ' Append the carried-over text before splitting, so a CR/LF pair that
            ' straddles two blocks is still treated as one line break.
            Dim stringArray() As String = Split(Encoding.ASCII.GetString(byteArray) & sBuffer, vbCrLf)

            For i As Integer = stringArray.GetUpperBound(0) To iFirstElement Step -1
                If stringArray(i).Contains(sSearchString) Then
                    MsgBox("Found It!")
                End If
            Next
            sBuffer = stringArray(0)
        End While
    End Using
End Sub

I probably won't check back, but comments are always welcome. ~Ed
May 3 '06 #3
