Issue reading data lines multiple times from a file

1 New Member
Hi,
I am trying to write a Python 2.6 script on Win32 that reads all the text files stored in a directory and prints only the lines containing actual data. A sample file:
Set : 1
Date: 10212009
12 34 56
25 67 90
End Set
********
Set: 2
Date: 10222009
34 56 89
25 67 89
End Set

In the above example file, I want to print only lines 3, 4, 9 and 10 (the actual data values). The program does this iteratively on all txt files.
I wrote the script below and am testing it on a single txt file as I go.
My logic is to read the input files one by one and search for a start string. As soon as a match is found, start searching for the end string. When both are found, print the lines from the start string to the end string. Repeat on the rest of the file before opening another file.
The problem is that it successfully reads Set 1 of the data, but then goes wrong on subsequent sets in the file. For Set 2, it identifies the number of lines to read, but prints them starting at the wrong line number.
A little digging turned up the following suggestions:
1. Using seek and tell to reposition for the second iteration of the loop. This did not work, because the file is read through a buffer and that throws off the value returned by tell.
2. Opening the file in binary mode helped someone else, but it is not working for me.
3. Opening the file with buffering set to 0, which did not work either.

The second problem is that when it prints data from Set 1, it inserts a blank line between the two lines of data values. How can I get rid of it?

Note: Ignore all references to next_run in the code below. I was trying it out for repositioning the read position, so that subsequent searches for the start string begin from the last position of the end string.

#!C:/Python26 python

# Import necessary modules
import os, glob, string, sys, fileinput, linecache
from goto import goto, label

# Set working path
path = 'C:\\System_Data'


# --------------------
# PARSE DATA MODULE
# --------------------

# Define the search strings for data
start_search = "Set :"
end_search ="End Set"
# For Loop to read the input txt files one by one
for inputfile in glob.glob( os.path.join( path, '*.txt' ) ):
  inputfile_fileHandle = open ( inputfile, 'rb', 0 )
  print( "Current file being read: " +inputfile )
  # start_line initializes to first line
  start_line = 0
  # After first set of data is extracted, next_run will store the position to read the rest of the file
  # next_run = 0
  # start reading the input files, one line by one line
  for line in inputfile:
    line = inputfile_fileHandle.readline()
    start_line += 1
    # next_run+=1
    # If a line matched with the start_search string
    has_match = line.find( start_search )
    if has_match >= 0:
      print ( "Start String found at line number: %d" %( start_line ) )
      # Store the location where the search will be restarted
      # next_run = inputfile_fileHandle.tell() #inputfile_fileHandle.lineno()
      print ("Current Position: %d" % next_run)
      end_line = start_line
      print ( "Start_Line: %d" %start_line )
      print ( "End_Line: %d" %end_line )
      #print(line)
      for line in inputfile:
        line = inputfile_fileHandle.readline()
        #print (line)
        end_line += 1
        has_match = line.find(end_search)
        if has_match >= 0:
          print 'End   String found at line number: %d' % (end_line)
          # total lines to print:
          k=0
          # for loop to print all the lines from start string to end string
          for j in range(0,end_line-start_line-1):
            print linecache.getline(inputfile, start_line +1+ j )
            k+=1
          print ( "Number of lines Printed: %d " %k )
          # Using goto to get out of 2 loops at once
          goto .re_search_start_string
    label .re_search_start_string
    #inputfile_fileHandle.seek(next_run,0)

  inputfile_fileHandle.close ()
Nov 18 '09 #1
bvdet
2,851 Recognized Expert Moderator Specialist
I take it this is not homework.

You only need to iterate on the file once. This part of your code:
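  for line in inputfile:
    line = inputfile_fileHandle.readline()
is part of your problem. inputfile is the file name (a string), so the for loop steps through its characters rather than through the file, and the readline() call then overwrites line on every pass. There is no need to redefine variable line; iterate on the file object itself.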
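
To make the difference concrete, here is a minimal sketch. The file name sample.txt and its contents are made up for the demonstration; everything else is standard library. Looping over the name yields characters, while looping over the open handle yields lines:

```python
import os
import tempfile

# Hypothetical sample file, created only for this demonstration.
tmpdir = tempfile.mkdtemp()
inputfile = os.path.join(tmpdir, "sample.txt")
f = open(inputfile, "w")
f.write("Set : 1\n12 34 56\nEnd Set\n")
f.close()

# Iterating over the file NAME walks its characters, one per pass.
name_items = [item for item in inputfile]

# Iterating over the open file HANDLE yields one line per pass.
handle = open(inputfile)
file_lines = [line for line in handle]
handle.close()

print(len(name_items) == len(inputfile))  # True: one item per character
print(file_lines)  # ['Set : 1\n', '12 34 56\n', 'End Set\n']

# Clean up the temporary file and directory.
os.remove(inputfile)
os.rmdir(tmpdir)
```

So in the original script the number of loop passes depends on the length of the path, not on the number of lines in the file.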

Your sample file has a space after "Set" in one place ("Set : 1") but no space in the other ("Set: 2"), so a search for "Set :" never finds the second set. Change start_search to "Set".
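
A quick check of that mismatch, using the two header lines from the sample file in the question (the variable names exact and loose are just for illustration):

```python
start_search = "Set :"

# Header lines as they appear in the sample file from the question.
headers = ["Set : 1", "Set: 2"]

# str.find returns -1 when the substring is absent.
exact = [h.find(start_search) for h in headers]

# A looser prefix test matches both spellings.
loose = [h.startswith("Set") for h in headers]

print(exact)  # [0, -1]
print(loose)  # [True, True]
```

Note that "End Set".find("Set") would also succeed (at index 4), which is one reason to test with startswith() rather than find() once the search string is shortened to "Set".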

Assuming you have a list of file names fnList, the following will compile a list of all the data from the files in fnList:
results = []
start_search = "Set"
end_search ="End Set"
for fn in fnList:
    f = open(fn)
    inData = False  # True while between a "Set" line and an "End Set" line
    for line in f:
        if line.startswith(start_search):
            inData = True
        elif line.startswith(end_search):
            inData = False
        elif inData and not line.startswith("Date"):
            # keep only the data values; strip() drops the trailing newline
            results.append(line.strip())
    f.close()
The following prints out the results using string method join():
print "\n".join(results)
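
Stripping each line before printing is also what removes the blank lines described in the question. Every line read from a file (or returned by linecache.getline()) still ends in a newline, and print adds one of its own. A small sketch of what reaches the screen in each case, with the print behaviour written out as plain string concatenation:

```python
# A line as returned by file iteration, readline() or linecache.getline():
# the trailing newline is still attached.
line = "12 34 56\n"

# print writes its argument followed by one newline of its own.
printed_raw = line + "\n"             # what `print line` sends to the screen
printed_clean = line.rstrip() + "\n"  # what `print line.rstrip()` sends

print(repr(printed_raw))    # '12 34 56\n\n'  -> data line plus a blank line
print(repr(printed_clean))  # '12 34 56\n'    -> data line only
```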
Nov 18 '09 #2
Glenton
391 Recognized Expert Contributor
In the past when I've done similar things, I've used a boolean variable called something like "recording". I set recording=False to begin with, and go through the lines as you suggest. If I find the start_search I set recording to True, and if I find the end_search I set it to False.

So in pseudocode it's something like this:
recording=False
Loop through the files:
    Loop through the lines:
        if startCondition: recording=True
        if endCondition: recording=False
        if recording:
            Print it, or save it to a file or whatever you want
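
One way this pseudocode might look as runnable Python is sketched below. The function name data_lines, its parameters and the "Date" filter are assumptions based on the sample file in the question, not part of the pseudocode. The marker tests are chained with elif so that the "Set" and "End Set" lines themselves are not recorded:

```python
def data_lines(lines, start="Set", end="End Set", skip="Date"):
    """Yield the stripped lines found between a start and an end marker."""
    recording = False
    for raw in lines:
        line = raw.strip()
        if line.startswith(end):
            recording = False
        elif line.startswith(start):
            recording = True
        elif recording and not line.startswith(skip):
            yield line


# Sample data copied from the question.
sample = """Set : 1
Date: 10212009
12 34 56
25 67 90
End Set
********
Set: 2
Date: 10222009
34 56 89
25 67 89
End Set
"""

result = list(data_lines(sample.splitlines()))
print(result)  # ['12 34 56', '25 67 90', '34 56 89', '25 67 89']
```

Because the function accepts any iterable of lines, an open file object can be passed in directly, one file at a time, inside a loop over glob.glob(os.path.join(path, '*.txt')).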
Nov 19 '09 #3
