remove header line when reading/writing files

I'm a newbie with a large number of data files in multiple
directories. I want to uncompress, read, and copy the contents of
each file into one master data file. The code below seems to be doing
this perfectly. The problem is each of the data files has a header
row in the first line, which I do not want in the master file. How
can I skip that first line when writing to the master file? Any help
is much appreciated. Thank you.

import os
import sys
import glob
import gzip
zipdir = "G:/Research/Data/"
outfilename = "G:/Research/Data/master_data.txt"
outfile = open(outfilename,'w')
os.chdir(zipdir)
dirlist = os.listdir(os.curdir)
for item in dirlist:
    if os.path.isdir(item):
        os.chdir(item)
        filelist = glob.glob("*.gz")
        for zipfile in filelist:
            filein = gzip.GzipFile(zipfile,'r')
            filecontent = filein.read()
            filein.close()
            outfile.write(filecontent)
        os.chdir(os.pardir)
outfile.close()

Oct 11 '07 #1
> each file into one master data file. The code below seems to be doing
> this perfectly. The problem is each of the data files has a header
> row in the first line, which I do not want in the master file. How
> can I skip that first line when writing to the master file? Any help
> is much appreciated. Thank you.
> [snip]
> for zipfile in filelist:
>     filein = gzip.GzipFile(zipfile,'r')
>     filecontent = filein.read()
>     filein.close()
>     outfile.write(filecontent)

for zipfile in filelist:
    for i, line in gzip.GzipFile(zipfile,'r'):
        if i: outfile.write(line)

should do the trick for you.

If you like a little more readable code, you can change that line to

if i != 0: outfile.write(line)

or

if i == 0: continue
outfile.write(line)

whichever you like.

-tkc

Oct 11 '07 #2
Forgot the enumerate call, of all things:

for zipfile in filelist:
    for i, line in enumerate(gzip.GzipFile(zipfile,'r')):
        if i: outfile.write(line)

Some days, I'm braindead.

-tkc
Oct 11 '07 #3
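
For anyone trying the enumerate() suggestion on Python 3, here is a minimal sketch rather than a drop-in replacement. It assumes the output file is opened in binary mode, because iterating a gzip file opened with "rb" yields bytes lines and writing those to a text-mode file raises TypeError. The function name append_without_header is made up for this illustration.

```python
import gzip


def append_without_header(gz_path, out_file):
    """Copy every line except the first from a gzip file to out_file.

    out_file is assumed to be a binary file object (for example one
    opened with mode "wb" or "ab"), since the lines read here are bytes.
    """
    with gzip.open(gz_path, "rb") as in_file:
        for i, line in enumerate(in_file):
            if i == 0:
                continue  # index 0 is the header row
            out_file.write(line)
```

In the original loop this would be called once per name in filelist, with the master file opened a single time outside the loop.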
On Oct 12, 12:23 pm, Tim Chase <python.l...@tim.thechases.com> wrote:
> Forgot the enumerate call of all things
> for zipfile in filelist:
>     for i, line in enumerate(gzip.GzipFile(zipfile,'r')):
>         if i: outfile.write(line)
>
> Some days, I'm braindead.
>
> -tkc

I would move the 'if' test outside the loop :

for zipfile in filelist:
    zfiter = iter(gzip.GzipFile(zipfile,'r'))
    next(zfiter) # ignore header line
    for i, line in enumerate(zfiter):
        outfile.write(line)

I'm not sure if the iter(...) is required. This will raise a
StopIteration exception if zipfile is empty.

Cheers
Tim

Oct 12 '07 #4
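
On the StopIteration point: one way to sidestep it is to give next() a default value, so an empty file is simply skipped. This is only a sketch under the same Python 3 assumptions as elsewhere in the thread (binary output file); copy_body is a hypothetical name. As for whether iter() is required, Python 3 file objects, GzipFile included, are their own iterators, so the iter() call below is harmless but could be dropped.

```python
import gzip


def copy_body(gz_path, out_file):
    """Write all lines after the header to out_file (a binary file object)."""
    with gzip.open(gz_path, "rb") as in_file:
        lines = iter(in_file)
        # The second argument is returned instead of raising StopIteration
        # when the file has no lines at all.
        next(lines, None)
        for line in lines:
            out_file.write(line)
```

With this form the test is still done once, outside the loop, and an empty .gz file contributes nothing to the master file instead of aborting the run.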
ti******@gmail.com wrote:
...
> for zipfile in filelist:
>     zfiter = iter(gzip.GzipFile(zipfile,'r'))
>     next(zfiter) # ignore header line
>     for i, line in enumerate(zfiter):
>         outfile.write(line)

Or even:

writes = outfile.write
for zipfile in filelist:
    zfiter = iter(gzip.GzipFile(zipfile,'r'))
    next(zfiter) # ignore header line
    for line in zfiter:
        writes(line)
Oct 12 '07 #5
On Thu, 11 Oct 2007 22:52:55 +0000, RyanL wrote:
> I'm a newbie with a large number of data files in multiple
> directories. I want to uncompress, read, and copy the contents of
> each file into one master data file. The code below seems to be doing
> this perfectly. The problem is each of the data files has a header
> row in the first line, which I do not want in the master file. How
> can I skip that first line when writing to the master file? Any help
> is much appreciated. Thank you.

Untested version with `itertools.islice()`:

import glob
import gzip
import os
from itertools import islice

def main():
    zipdir = 'G:/Research/Data/'
    outfilename = 'G:/Research/Data/master_data.txt'
    out_file = open(outfilename, 'w')
    os.chdir(zipdir)  # work relative to the data directory
    for name in os.listdir(os.curdir):
        if os.path.isdir(name):
            os.chdir(name)
            for zip_name in glob.glob('*.gz'):
                in_file = gzip.GzipFile(zip_name, 'r')
                # islice(..., 1, None) yields every line except the first
                out_file.writelines(islice(in_file, 1, None))
                in_file.close()
            os.chdir(os.pardir)
    out_file.close()

if __name__ == '__main__':
    main()

Ciao,
Marc 'BlackJack' Rintsch
Oct 12 '07 #6
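
The islice() idea can also be written without any os.chdir() calls. The following is a sketch for Python 3, not a tested replacement for the script above: it assumes the same layout as the original question (one level of subdirectories holding *.gz files), writes the master file in binary mode, and sorts the names only to make the output order predictable. The name merge_without_headers and its parameters are hypothetical.

```python
import glob
import gzip
import os
from itertools import islice


def merge_without_headers(data_dir, out_path):
    """Concatenate the *.gz files found in the immediate subdirectories
    of data_dir into out_path, dropping the first line of each file."""
    with open(out_path, "wb") as out_file:
        for name in sorted(os.listdir(data_dir)):
            sub_dir = os.path.join(data_dir, name)
            if not os.path.isdir(sub_dir):
                continue  # skips plain files, including the master file itself
            for gz_path in sorted(glob.glob(os.path.join(sub_dir, "*.gz"))):
                with gzip.open(gz_path, "rb") as in_file:
                    # islice(in_file, 1, None) starts at the second line;
                    # an empty file just yields nothing.
                    out_file.writelines(islice(in_file, 1, None))
```

With the paths from the question, the call would look like merge_without_headers("G:/Research/Data/", "G:/Research/Data/master_data.txt"). Building full paths with os.path.join means the current working directory is never changed, so a failure halfway through does not leave the process in an unexpected directory.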
