Bytes | Software Development & Data Engineering Community

Issue reading data lines multiple times from a file

1 New Member
Hi,
I am trying to write a Python 2.6 script on Win32 that reads all the text files stored in a directory and prints only the lines containing actual data. A sample file:
Set : 1
Date: 10212009
12 34 56
25 67 90
End Set
********
Set: 2
Date: 10222009
34 56 89
25 67 89
End Set

In the sample file above, I want to print only lines 3, 4, 9 and 10 (the actual data values). The program should do this for every txt file in the directory.
I wrote the script below and am testing it on a single txt file as I go.
My logic is to read the input files one by one and search each for a start string. As soon as a match is found, I start searching for the end string. When both are found, I print the lines from the start string to the end string, then repeat on the rest of the file before opening the next file.
The problem is that it reads Set 1 correctly but goes wrong on subsequent sets in the file. For Set 2, it works out the right number of lines to read, but prints them starting at the wrong line number.
A little digging turned up the following suggestions, none of which worked for me:
1. Using seek and tell to reposition at the start of the next iteration of the loop. This did not work because the file is read through a buffer, which throws off the value returned by tell.
2. Opening the file in binary mode. This helped someone else, but it is not working for me.
3. Opening the file with buffering set to 0. This did not work either.

My second problem is that when it prints the data from Set 1, it inserts a blank line between every two lines of data values. How can I get rid of it?

Note: Ignore all references to next_run in the code below. I was trying it out as a way to reposition the read, so that each subsequent search for the start string begins from the last position of the end string.

#!C:/Python26 python

# Import necessary modules
import os, glob, string, sys, fileinput, linecache
from goto import goto, label

# Set working path
path = 'C:\\System_Data'


# --------------------
# PARSE DATA MODULE
# --------------------

# Define the search strings for data
start_search = "Set :"
end_search ="End Set"
# For Loop to read the input txt files one by one
for inputfile in glob.glob( os.path.join( path, '*.txt' ) ):
  inputfile_fileHandle = open ( inputfile, 'rb', 0 )
  print( "Current file being read: " +inputfile )
  # start_line initializes to first line
  start_line = 0
  # After first set of data is extracted, next_run will store the position to read the rest of the file
  # next_run = 0
  # start reading the input files, one line by one line
  for line in inputfile:
    line = inputfile_fileHandle.readline()
    start_line += 1
    # next_run+=1
    # If a line matched with the start_search string
    has_match = line.find( start_search )
    if has_match >= 0:
      print ( "Start String found at line number: %d" %( start_line ) )
      # Store the location where the search will be restarted
      # next_run = inputfile_fileHandle.tell() #inputfile_fileHandle.lineno()
      print ("Current Position: %d" % next_run)
      end_line = start_line
      print ( "Start_Line: %d" %start_line )
      print ( "End_Line: %d" %end_line )
      #print(line)
      for line in inputfile:
        line = inputfile_fileHandle.readline()
        #print (line)
        end_line += 1
        has_match = line.find(end_search)
        if has_match >= 0:
          print 'End   String found at line number: %d' % (end_line)
          # total lines to print:
          k=0
          # for loop to print all the lines from start string to end string
          for j in range(0,end_line-start_line-1):
            print linecache.getline(inputfile, start_line +1+ j )
            k+=1
          print ( "Number of lines Printed: %d " %k )
          # Using goto to get out of 2 loops at once
          goto .re_search_start_string
    label .re_search_start_string
    #inputfile_fileHandle.seek(next_run,0)

  inputfile_fileHandle.close ()
Nov 18 '09 #1
bvdet
2,851 Recognized Expert Moderator Specialist
I take it this is not homework.

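You only need to iterate over the file once. This part of your code:
  for line in inputfile:
    line = inputfile_fileHandle.readline()
is part of your problem. inputfile is the file name, a string, so the for loop steps through its characters rather than the lines of the file. Iterate over the file object itself; there is then no need to reassign the variable line.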
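A small sketch of the difference between looping over the file name and looping over the file object. It assumes Python 3, uses io.StringIO as a stand-in for an open file, and the file name and contents are made up for illustration:

```python
import io

inputfile = "data.txt"  # hypothetical file name, for illustration only

# Iterating over the name gives one character per pass, not one line
chars = [ch for ch in inputfile]
print(chars)

# Iterating over the file object gives one line per pass
# (io.StringIO stands in for open(inputfile) here)
handle = io.StringIO("Set : 1\n12 34 56\nEnd Set\n")
lines = [line for line in handle]
print(lines)
```

In the posted script both the outer and the inner loop call readline() once per pass, but start_line is only incremented by the outer loop. After the first set it lags behind the real position in the file by however many lines the inner loop consumed, which would explain why the second set is printed from the wrong line number.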

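Your sample file has a space after "Set" in one place ("Set : 1") but not in the other ("Set: 2"), so "Set :" cannot match both. Change start_search to "Set" and test it with startswith(), because find("Set") would also match "End Set".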

Assuming you have a list of file names fnList, the following will compile a list of all the data from the files in fnList:
results = []
start_search = "Set"
end_search ="End Set"
for fn in fnList:
    f = open(fn)
    inData = False
    for line in f:
        if line.startswith(start_search):
            inData = True
        elif line.startswith(end_search):
            inData = False
        elif inData and not line.startswith("Date"):
            results.append(line.strip())
    f.close()
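The strip() call in that loop is also what gets rid of the blank lines in the original output. A line read from a file keeps its trailing newline, and print adds one of its own. A minimal sketch of the effect, assuming Python 3 (printed_text is a helper made up for this example, not part of any library):

```python
import io


def printed_text(value):
    """Return exactly what print(value) would write out."""
    buffer = io.StringIO()
    print(value, file=buffer)
    return buffer.getvalue()


line = "12 34 56\n"  # a data line as it comes back from the file

# Unstripped: the line's own newline plus the one print adds
print(repr(printed_text(line)))

# Stripped: only the newline that print adds
print(repr(printed_text(line.strip())))
```

If the file has Windows line endings and is opened in binary mode, each line ends in "\r\n"; strip() removes both characters.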
The following prints out the results using string method join():
print "\n".join(results)
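For something that can be run end to end, here is a sketch that wraps the same loop in a function and builds the file list with glob. The names collect_data and list_text_files are made up for this example, the path is the one from the original post, and the single-argument print() call is used so the script runs on Python 3 as well:

```python
import glob
import os


def collect_data(fn_list, start_search="Set", end_search="End Set",
                 skip_prefix="Date"):
    """Return the stripped data lines found between the two markers."""
    results = []
    for fn in fn_list:
        in_data = False  # reset for every file
        # 'with' closes the file even if reading raises an error
        with open(fn) as f:
            for line in f:
                if line.startswith(start_search):
                    in_data = True
                elif line.startswith(end_search):
                    in_data = False
                elif in_data and not line.startswith(skip_prefix):
                    results.append(line.strip())
    return results


def list_text_files(path):
    """Return the *.txt files in path, sorted for a stable order."""
    return sorted(glob.glob(os.path.join(path, "*.txt")))


if __name__ == "__main__":
    fnList = list_text_files("C:\\System_Data")
    print("\n".join(collect_data(fnList)))
```

Because the file is read once from top to bottom, there is no need for seek, tell, linecache or goto.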
Nov 18 '09 #2
Glenton
391 Recognized Expert Contributor
In the past when I've done similar things I've used a boolean variable called something like "recording". I set recording=False and go through the lines as you suggest. If I find the start_search I set recording to True, and if I find the end_search I set it to False.

So in pseudocode it's something like this:
recording=False
Loop through the files:
    Loop through the lines:
        if startCondition: recording=True
        if endCondition: recording=False
        if recording:
            Print it, or save it to a file or whatever you want
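One way the pseudocode might look as real Python, written as a generator over any iterable of lines. This is only a sketch: recorded_lines is a name made up here, the markers are the ones from the sample file in the question, and the tests are chained with elif so that the marker lines themselves are not emitted:

```python
def recorded_lines(lines, start="Set", end="End Set"):
    """Yield the lines that fall between a start and an end marker."""
    recording = False
    for line in lines:
        if line.startswith(start):
            recording = True   # the marker line itself is not yielded
        elif line.startswith(end):
            recording = False
        elif recording:
            yield line.rstrip("\r\n")


# The sample file from the question, as a list of lines
sample = [
    "Set : 1\n", "Date: 10212009\n", "12 34 56\n", "25 67 90\n", "End Set\n",
    "********\n",
    "Set: 2\n", "Date: 10222009\n", "34 56 89\n", "25 67 89\n", "End Set\n",
]

for line in recorded_lines(sample):
    print(line)
```

The flag alone still lets the "Date" lines through, so those need one more test if only the numeric rows are wanted.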
Nov 19 '09 #3
