On 20 Dec 2005 08:06:39 -0800, "sicvic" <mo************ @gmail.com> wrote:
Not homework...not even in school (do any universities even teach
classes using Python?). Just not a programmer. Anyway, I should
probably be clearer about what I'm trying to do.
Ok, not homework.
Since I can't show the actual output file, let's say I had an output file
that looked like this:
aaaaa bbbbb Person: Jimmy
Current Location: Denver
Next Location: Chicago
----------------------------------------------
aaaaa bbbbb Person: Sarah
Current Location: San Diego
Next Location: Miami
Next Location: New York
----------------------------------------------
Now I want to put this (and every other occurrence of "Person: Jimmy")
Person: Jimmy
Current Location: Denver
Next Location: Chicago
in a file called jimmy.txt
and the same for Sarah in sarah.txt
The code I currently have looks something like this:
import re
import sys
person_jimmy = open('jimmy.txt', 'w') #creates jimmy.txt
person_sarah = open('sarah.txt', 'w') #creates sarah.txt
f = open(sys.argv[1]) #opens output file
#loop that goes through all lines and parses specified text
for line in f.readlines():
    if re.search(r'Person: Jimmy', line):
        person_jimmy.write(line)
    elif re.search(r'Person: Sarah', line):
        person_sarah.write(line)
#closes all files
person_jimmy.close()
person_sarah.close()
f.close()
However, this only produces output files that look like this:
jimmy.txt:
aaaaa bbbbb Person: Jimmy
sarah.txt:
aaaaa bbbbb Person: Sarah
My question is what else do I need to add (such as an embedded loop
where the if statements are?) so the files look like this
aaaaa bbbbb Person: Jimmy
Current Location: Denver
Next Location: Chicago
and
aaaaa bbbbb Person: Sarah
Current Location: San Diego
Next Location: Miami
Next Location: New York
Basically, I need to add statements that, after finding that line, copy
all the lines following it and stop at
'----------------------------------------------'
Any help is greatly appreciated.
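The direct answer to the question as asked can be sketched with one extra variable that remembers which output is currently being written. This is a minimal Python 3 sketch, not the original poster's code: the function name split_by_person, the outputs dictionary, and the choice to treat any line containing 30 or more dashes as the separator are all assumptions made for illustration.

```python
import io
import re


def split_by_person(lines, outputs, stop='-' * 30):
    """Copy each record to the writer registered for its person.

    lines   -- any iterable of text lines (an open file works)
    outputs -- dict mapping a person's name to an object with write()
    stop    -- a line containing this text ends the current record
    """
    current = None                      # where the current record goes
    for line in lines:
        if stop in line:                # separator: stop copying
            current = None
            continue
        match = re.search(r'Person:\s+(\w+)', line)
        if match:                       # a new record starts here
            current = outputs.get(match.group(1))
        if current is not None:
            current.write(line)


# In-memory stand-ins for the real files, so the sketch runs anywhere.
sample = io.StringIO("""\
aaaaa bbbbb Person: Jimmy
Current Location: Denver
Next Location: Chicago
----------------------------------------------
aaaaa bbbbb Person: Sarah
Current Location: San Diego
Next Location: Miami
Next Location: New York
----------------------------------------------
""")
outputs = {'Jimmy': io.StringIO(), 'Sarah': io.StringIO()}
split_by_person(sample, outputs)
print(outputs['Jimmy'].getvalue())
```

With real files, the dictionary values would be open('jimmy.txt', 'w') and open('sarah.txt', 'w'), closed after the call. A later record for the same person lands in the same file, which covers the "all recurrences" requirement.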
OK, I generalized your theme of extracting file chunks into named files,
where the starting line supplies the file name. The '.txt' extension is hardcoded.
I provided a way to direct the output to a directory (which, I guess, need not be a subdirectory).
Not tested beyond what you see. Tweak to suit.
----< extractfilesegs.py >--------------------------------------------------------
"""
Usage: [python] extractfilesegs [source [outdir [startpat [endpat]]]]
    where source is -tf for test file, a file name, or an open file
          outdir is a directory prefix that will be joined to output file names
          startpat is a regular expression with group 1 giving the extracted file name
          endpat is a regular expression whose match line is excluded and ends the segment
"""
import re, os

def extractFileSegs(linesrc, outdir='extracteddata', start=r'Person:\s+(\w+)', stop='-'*30):
    rxstart = re.compile(start)
    rxstop = re.compile(stop)
    if isinstance(linesrc, basestring): linesrc = open(linesrc)
    lineit = iter(linesrc)
    files = []
    for line in lineit:
        match = rxstart.search(line)
        if not match: continue
        name = match.group(1)
        filename = name.lower() + '.txt'
        filename = os.path.join(outdir, filename)
        #print 'opening file %r'%filename
        files.append(filename)
        fout = open(filename, 'a') # append in case repeats?
        fout.write(match.group(0)+'\n') # did you want aaa bbb stuff?
        for data_line in lineit:
            if rxstop.search(data_line):
                #print 'closing file %r'%filename
                fout.close() # don't write line with ending mark
                fout = None
                break
            else:
                fout.write(data_line)
        if fout:
            fout.close()
            print 'file %r ended with source file EOF, not stop mark'%filename
    return files

def get_testfile():
    from StringIO import StringIO
    return StringIO("""\
....irrelevant leading
stuff ...
aaaaa bbbbb Person: Jimmy
Current Location: Denver
Next Location: Chicago
----------------------------------------------
aaaaa bbbbb Person: Sarah
Current Location: San Diego
Next Location: Miami
Next Location: New York
----------------------------------------------
irrelevant
trailing stuff ...

with a blank line
""")

if __name__ == '__main__':
    import sys
    args = sys.argv[1:]
    if not args: raise SystemExit(__doc__)
    tf = args.pop(0)
    if tf=='-tf': fin = get_testfile()
    else: fin = tf
    if not args:
        files = extractFileSegs(fin)
    elif len(args)==1:
        files = extractFileSegs(fin, args[0])
    elif len(args)==2:
        files = extractFileSegs(fin, args[0], args[1], '^$') # stop on blank line?
    else:
        files = extractFileSegs(fin, args[0], '|'.join(args[1:-1]), args[-1])
    print '\nFiles created:'
    for fname in files:
        print ' "%s"'% fname
    if tf == '-tf':
        for fpath in files:
            print '====< %s >====\n%s============'%(fpath, open(fpath).read())
----------------------------------------------------------------------------------
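The script above is written for Python 2 (print statement, basestring, the StringIO module). For readers on Python 3, the core function might be ported roughly as follows. This is a sketch under assumptions, not a tested replacement: the name extract_file_segs is new, and the os.makedirs call is an addition so that the output directory does not have to be created by hand first.

```python
import io
import os
import re
import tempfile


def extract_file_segs(linesrc, outdir='extracteddata',
                      start=r'Person:\s+(\w+)', stop='-' * 30):
    """Write each start..stop segment to <outdir>/<group 1>.txt; return the paths."""
    rxstart = re.compile(start)
    rxstop = re.compile(stop)
    opened_here = isinstance(linesrc, str)   # str replaces Python 2's basestring
    if opened_here:
        linesrc = open(linesrc)
    os.makedirs(outdir, exist_ok=True)       # assumption: create outdir if missing
    files = []
    try:
        lineit = iter(linesrc)
        for line in lineit:
            match = rxstart.search(line)
            if not match:
                continue
            filename = os.path.join(outdir, match.group(1).lower() + '.txt')
            files.append(filename)
            with open(filename, 'a') as fout:    # append in case of repeats
                fout.write(match.group(0) + '\n')
                for data_line in lineit:         # inner loop shares the iterator
                    if rxstop.search(data_line):
                        break                    # the stop line is not written
                    fout.write(data_line)
                else:                            # no break: source ran out first
                    print('file %r ended with source EOF, not stop mark' % filename)
    finally:
        if opened_here:
            linesrc.close()
    return files


# Small demonstration on in-memory data, written to a temporary directory.
source = io.StringIO("""\
....irrelevant leading
stuff ...
aaaaa bbbbb Person: Jimmy
Current Location: Denver
Next Location: Chicago
----------------------------------------------
aaaaa bbbbb Person: Sarah
Current Location: San Diego
Next Location: Miami
Next Location: New York
----------------------------------------------
""")
files = extract_file_segs(source, tempfile.mkdtemp())
for path in files:
    with open(path) as f:
        print('====< %s >====\n%s============' % (os.path.basename(path), f.read()))
```

The key trick carries over unchanged: the outer and inner loops iterate over the same iterator, so the inner loop consumes the lines of one segment and the outer loop resumes right after the stop line.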
Running on your test data:
[15:19] C:\pywk\clp>md extracteddata
[15:19] C:\pywk\clp>py24 extractfilesegs.py -tf

Files created:
 "extracteddata\jimmy.txt"
 "extracteddata\sarah.txt"
====< extracteddata\jimmy.txt >====
Person: Jimmy
Current Location: Denver
Next Location: Chicago
============
====< extracteddata\sarah.txt >====
Person: Sarah
Current Location: San Diego
Next Location: Miami
Next Location: New York
============

[15:20] C:\pywk\clp>md xd
[15:20] C:\pywk\clp>py24 extractfilesegs.py -tf xd (Jimmy) ----

Files created:
 "xd\jimmy.txt"
====< xd\jimmy.txt >====
Jimmy
Current Location: Denver
Next Location: Chicago
============

[15:21] C:\pywk\clp>py24 extractfilesegs.py -tf xd "Person: (Sarah)" ----

Files created:
 "xd\sarah.txt"
====< xd\sarah.txt >====
Person: Sarah
Current Location: San Diego
Next Location: Miami
Next Location: New York
============

[15:22] C:\pywk\clp>py24 extractfilesegs.py -tf xd "^(irrelevant)"

Files created:
 "xd\irrelevant.txt"
====< xd\irrelevant.txt >====
irrelevant
trailing stuff ...
============
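The differences between these runs come down to what the start pattern matches: group 1 names the file, while the whole match (group 0) becomes the first line written. The following Python 3 snippet is only an illustration of that behaviour; the helper first_line_and_name is hypothetical and not part of the script. It also shows why '^$', the stop pattern used when exactly two extra arguments are given, ends a segment at a blank line.

```python
import re


def first_line_and_name(pattern, line):
    """Return (text written as first line, output file name), or None."""
    match = re.search(pattern, line)
    if match is None:
        return None
    return match.group(0), match.group(1).lower() + '.txt'


jimmy_line = 'aaaaa bbbbb Person: Jimmy\n'

# Default pattern: the match spans "Person: Jimmy", so that is what gets written.
default = first_line_and_name(r'Person:\s+(\w+)', jimmy_line)

# Bare "(Jimmy)": the match is only "Jimmy", hence the shorter first line.
bare = first_line_and_name(r'(Jimmy)', jimmy_line)

# '^$' matches an empty line (with or without its newline), nothing else.
blank_stop = re.compile('^$')
stops_on_blank = blank_stop.search('\n') is not None
stops_on_text = blank_stop.search('trailing stuff ...\n') is not None

print(default, bare, stops_on_blank, stops_on_text)
```

To keep the "aaaaa bbbbb" prefix in the output, the start pattern would have to match it as well, for example by beginning the pattern with '^.*'.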
HTH, NO WARRANTIES ;-)
Regards,
Bengt Richter