
Issue reading data lines multiple times from a file

P: 1
Hi,
I am trying to write a Python 2.6 script on Win32 that reads all the text files stored in a directory and prints only the lines containing actual data. A sample file:
Set : 1
Date: 10212009
12 34 56
25 67 90
End Set
********
Set: 2
Date: 10222009
34 56 89
25 67 89
End Set

In the above example file, I want to print only lines 3, 4 and 9, 10 (the actual data values). The program should do this iteratively on all txt files.
I wrote the script below and am testing it on a single txt file as I go.
My logic is to read the input files one by one and search for a start string. As soon as a match is found, I start searching for the end string. When both are found, I print the lines between the start string and the end string, then repeat on the rest of the file before opening the next file.
The problem is that it reads Set 1 of the data successfully but then goes wrong on subsequent sets in the file. For Set 2, it identifies the number of lines to read, but prints them starting at the wrong line number.
A little digging turned up the following suggestions, none of which worked for me:
1. Using seek and tell to reposition for the second iteration of the loop. This did not work because the file is read through a buffer, which throws off the value returned by tell.
2. Opening the file in binary mode. This helped someone else, but it is not working for me.
3. Opening the file with buffering set to 0. This did not work either.

My second problem is that when it prints the data from Set 1, it inserts a blank line between every two lines of data values. How can I get rid of it?

Note: Ignore all references to next_run in the code below. I was trying it out for repositioning the line read, so that subsequent searches for the start string begin from the last position of the end string.

    #!C:/Python26 python

    # Import necessary modules
    import os, glob, string, sys, fileinput, linecache
    from goto import goto, label

    # Set working path
    path = 'C:\\System_Data'


    # --------------------
    # PARSE DATA MODULE
    # --------------------

    # Define the search strings for data
    start_search = "Set :"
    end_search ="End Set"
    # For Loop to read the input txt files one by one
    for inputfile in glob.glob( os.path.join( path, '*.txt' ) ):
      inputfile_fileHandle = open ( inputfile, 'rb', 0 )
      print( "Current file being read: " +inputfile )
      # start_line initializes to first line
      start_line = 0
      # After first set of data is extracted, next_run will store the position to read the rest of the file
      # next_run = 0
      # start reading the input files, one line by one line
      for line in inputfile:
        line = inputfile_fileHandle.readline()
        start_line += 1
        # next_run+=1
        # If a line matched with the start_search string
        has_match = line.find( start_search )
        if has_match >= 0:
          print ( "Start String found at line number: %d" %( start_line ) )
          # Store the location where the search will be restarted
          # next_run = inputfile_fileHandle.tell() #inputfile_fileHandle.lineno()
          print ("Current Position: %d" % next_run)
          end_line = start_line
          print ( "Start_Line: %d" %start_line )
          print ( "End_Line: %d" %end_line )
          #print(line)
          for line in inputfile:
            line = inputfile_fileHandle.readline()
            #print (line)
            end_line += 1
            has_match = line.find(end_search)
            if has_match >= 0:
              print 'End   String found at line number: %d' % (end_line)
              # total lines to print:
              k=0
              # for loop to print all the lines from start string to end string
              for j in range(0,end_line-start_line-1):
                print linecache.getline(inputfile, start_line +1+ j )
                k+=1
              print ( "Number of lines Printed: %d " %k )
              # Using goto to get out of 2 loops at once
              goto .re_search_start_string
        label .re_search_start_string
        #inputfile_fileHandle.seek(next_run,0)

      inputfile_fileHandle.close ()
Nov 18 '09 #1
2 Replies


bvdet
Expert Mod 2.5K+
P: 2,851
I take it this is not homework.

You only need to iterate on the file once. This part of your code:
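    for line in inputfile:
      line = inputfile_fileHandle.readline()
is part of your problem. inputfile is the file name (a string), so this loop steps through the characters of the name rather than the lines of the file. Iterate on the file object itself, and then there is no need to redefine variable line.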
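
To make the difference concrete, here is a small sketch of what each kind of loop hands you. The file name is hypothetical, and io.StringIO only stands in for a real open file so the snippet runs anywhere (it should work on 2.6 and on current Python):

```python
import io

# Hypothetical file name: a plain string, like the values glob.glob() returns.
inputfile = "C:\\System_Data\\sample.txt"

# Looping over the string yields one character per pass, not one line.
first_items = [item for item in inputfile][:3]
print(first_items)

# Looping over a file object yields one line per pass.
# io.StringIO stands in for open(inputfile) here (an assumption for the demo).
handle = io.StringIO(u"Set : 1\nDate: 10212009\n12 34 56\n")
lines = [line for line in handle]
print(lines)
```

So in the original script the loop count is tied to the length of the file name, and the readline() call inside the loop is what actually advances through the file.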

In your sample file there is a space after "Set" in one place ("Set : 1") but not in the other ("Set: 2"), so "Set :" only matches the first. Change start_search to "Set".

Assuming you have a list of file names fnList, the following will compile a list of all the data from the files in fnList:
    results = []
    start_search = "Set"
    end_search = "End Set"
    for fn in fnList:
        f = open(fn)
        # inData is True while we are between a "Set" line and an "End Set" line
        inData = False
        for line in f:
            if line.startswith(start_search):
                inData = True
            elif line.startswith(end_search):
                inData = False
            elif inData and not line.startswith("Date"):
                # strip() removes the trailing newline from each data line
                results.append(line.strip())
        f.close()
The following prints out the results using string method join():
    print "\n".join(results)
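
If it helps, here is one way the pieces could fit together end to end. This is only a sketch: collect_data is a name made up for the example, and a temporary directory with a copy of the sample file stands in for C:\System_Data so it can be run as is. On the real machine fnList would come from glob.glob(os.path.join(path, '*.txt')) as in the original script.

```python
import glob
import os
import tempfile

# Copy of the sample file from the question.
SAMPLE = """Set : 1
Date: 10212009
12 34 56
25 67 90
End Set
********
Set: 2
Date: 10222009
34 56 89
25 67 89
End Set
"""


def collect_data(fnList, start_search="Set", end_search="End Set"):
    """Return the data lines found between start and end markers (hypothetical helper)."""
    results = []
    for fn in fnList:
        inData = False
        with open(fn) as f:
            for line in f:
                if line.startswith(start_search):
                    inData = True
                elif line.startswith(end_search):
                    inData = False
                elif inData and not line.startswith("Date"):
                    # Each line read from a file still ends with its newline;
                    # strip() removes it so printing adds no blank lines.
                    results.append(line.strip())
    return results


# A temporary directory stands in for C:\System_Data (assumption for the demo).
path = tempfile.mkdtemp()
with open(os.path.join(path, "sample.txt"), "w") as f:
    f.write(SAMPLE)

fnList = sorted(glob.glob(os.path.join(path, "*.txt")))
results = collect_data(fnList)
print("\n".join(results))
```

This also covers the blank-line question: a line read from a file keeps its own newline, and print adds a second one, so stripping each line before printing removes the gap.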
Nov 18 '09 #2

Expert 100+
P: 391
In the past when I've done similar things I've had a boolean variable called something like "recording". Then I set recording=False, and go through the lines as you suggest. If I find the start_search I set recording to True, and if I find the end_search I set it to False.

So in pseudocode it's something like this:
    recording=False
    Loop through the files:
        Loop through the lines:
            if startCondition: recording=True
            if endCondition: recording=False
            if recording:
                Print it, or save it to a file or whatever you want
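
A rough Python rendering of that idea, in case it is useful. The function name iter_sets and the skip_prefix parameter are invented for this sketch, and it groups the values per set instead of printing them straight away:

```python
def iter_sets(lines, start_search="Set", end_search="End Set", skip_prefix="Date"):
    """Yield one list of data lines per set (hypothetical helper)."""
    recording = False
    current = []
    for line in lines:
        text = line.strip()
        # The end marker is checked first: "End Set" contains "Set", which
        # matters if the checks are ever changed to substring searches.
        if text.startswith(end_search):
            if recording:
                yield current
            recording = False
            current = []
        elif text.startswith(start_search):
            recording = True
            current = []
        elif recording and not text.startswith(skip_prefix):
            current.append(text)


sample_lines = [
    "Set : 1\n", "Date: 10212009\n", "12 34 56\n", "25 67 90\n", "End Set\n",
    "********\n",
    "Set: 2\n", "Date: 10222009\n", "34 56 89\n", "25 67 89\n", "End Set\n",
]
sets = list(iter_sets(sample_lines))
for data in sets:
    print(data)
```

Because the function takes any iterable of lines, an open file object can be passed in directly, and the file is still read only once.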
Nov 19 '09 #3
