remove header line when reading/writing files

RyanL
#1: Oct 11 '07
I'm a newbie with a large number of data files in multiple
directories. I want to uncompress, read, and copy the contents of
each file into one master data file. The code below seems to be doing
this perfectly. The problem is each of the data files has a header
row in the first line, which I do not want in the master file. How
can I skip that first line when writing to the master file? Any help
is much appreciated. Thank you.

import os
import sys
import glob
import gzip
zipdir = "G:/Research/Data/"
outfilename = "G:/Research/Data/master_data.txt"
outfile = open(outfilename,'w')
os.chdir(zipdir)
dirlist = os.listdir(os.curdir)
for item in dirlist:
    if os.path.isdir(item):
        os.chdir(item)
        filelist = glob.glob("*.gz")
        for zipfile in filelist:
            filein = gzip.GzipFile(zipfile,'r')
            filecontent = filein.read()
            filein.close()
            outfile.write(filecontent)
        os.chdir(os.pardir)
outfile.close()

Tim Chase
#2: Oct 12 '07

re: remove header line when reading/writing files


Quote:
each file into one master data file. The code below seems to be doing
this perfectly. The problem is each of the data files has a header
row in the first line, which I do not want in the master file. How
can I skip that first line when writing to the master file? Any help
is much appreciated. Thank you.
[snip]
Quote:
for zipfile in filelist:
    filein = gzip.GzipFile(zipfile,'r')
    filecontent = filein.read()
    filein.close()
    outfile.write(filecontent)

for zipfile in filelist:
    for i, line in gzip.GzipFile(zipfile,'r'):
        if i: outfile.write(line)

should do the trick for you.

If you like a little more readable code, you can change that line to

if i > 0: outfile.write(line)

or

if i == 0: continue
outfile.write(line)

whichever you like.

-tkc



Tim Chase
#3: Oct 12 '07

re: remove header line when reading/writing files


Forgot the enumerate call of all things
Quote:
for zipfile in filelist:
    for i, line in enumerate(gzip.GzipFile(zipfile,'r')):
        if i: outfile.write(line)

Some days, I'm braindead.

-tkc
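
For anyone trying this today, here is a minimal sketch of the enumerate approach. It assumes Python 3, whereas the code in this thread targets Python 2, so the gzip data is opened in text mode ("rt"). The helper name copy_without_header and the in-memory sample data are made up for illustration.

```python
import gzip
import io


def copy_without_header(lines, out):
    """Write every line except the first to out; return the count written."""
    written = 0
    for i, line in enumerate(lines):
        if i == 0:
            continue  # index 0 is the header row
        out.write(line)
        written += 1
    return written


# Build a small gzip payload in memory so the sketch needs no files on disk.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    gz.write(b"id,value\n1,a\n2,b\n")
buf.seek(0)

out = io.StringIO()
# gzip.open accepts an existing file object; "rt" yields str lines.
with gzip.open(buf, "rt") as lines:
    count = copy_without_header(lines, out)

result = out.getvalue()
```

With real files, the same helper should work if you pass it gzip.open(path, "rt") and an output file opened in text mode.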


timaranz@gmail.com
#4: Oct 12 '07

re: remove header line when reading/writing files


On Oct 12, 12:23 pm, Tim Chase <python.l...@tim.thechases.com> wrote:
Quote:
Forgot the enumerate call of all things
Quote:
for zipfile in filelist:
    for i, line in enumerate(gzip.GzipFile(zipfile,'r')):
        if i: outfile.write(line)
Some days, I'm braindead.

-tkc

I would move the 'if' test outside the loop:

for zipfile in filelist:
    zfiter = iter(gzip.GzipFile(zipfile,'r'))
    next(zfiter)  # ignore header line
    for i, line in enumerate(zfiter):
        outfile.write(line)

I'm not sure if the iter(...) is required. This will raise a
StopIteration exception if zipfile is empty.

Cheers
Tim
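
On the StopIteration point: one way to sidestep it, sketched here for Python 3 (an assumption, since the thread predates it), is to give next() a default value so an empty file is skipped quietly. The function name skip_header is hypothetical. File-like objects are generally their own iterators, so the iter() call is probably redundant for them, but it is harmless and lets the helper accept plain lists too.

```python
import io


def skip_header(lines):
    """Return an iterator over lines with the first one dropped.

    next() with a default returns that default instead of raising
    StopIteration, so an empty input simply yields nothing.
    """
    it = iter(lines)
    next(it, None)  # discard the header if there is one
    return it


# An empty "file" and a three-line "file", both held in memory.
empty = list(skip_header(io.StringIO("")))
data = list(skip_header(io.StringIO("header\nrow1\nrow2\n")))
```

Whether silently ignoring an empty file is the right behavior depends on the data; raising may be preferable if an empty file signals a problem upstream.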

Scott David Daniels
#5: Oct 12 '07

re: remove header line when reading/writing files


timaranz@gmail.com wrote:
Quote:
...
for zipfile in filelist:
    zfiter = iter(gzip.GzipFile(zipfile,'r'))
    next(zfiter)  # ignore header line
    for i, line in enumerate(zfiter):
        outfile.write(line)

Or even:

writes = outfile.write
for zipfile in filelist:
    zfiter = iter(gzip.GzipFile(zipfile,'r'))
    next(zfiter)  # ignore header line
    for line in zfiter:
        writes(line)

Marc 'BlackJack' Rintsch
#6: Oct 12 '07

re: remove header line when reading/writing files


On Thu, 11 Oct 2007 22:52:55 +0000, RyanL wrote:
Quote:
I'm a newbie with a large number of data files in multiple
directories. I want to uncompress, read, and copy the contents of
each file into one master data file. The code below seems to be doing
this perfectly. The problem is each of the data files has a header
row in the first line, which I do not want in the master file. How
can I skip that first line when writing to the master file? Any help
is much appreciated. Thank you.

Untested version with `itertools.islice()`:

import glob
import gzip
import os
from itertools import islice


def main():
    zipdir = 'G:/Research/Data/'
    outfilename = 'G:/Research/Data/master_data.txt'
    out_file = open(outfilename, 'w')
    os.chdir(zipdir)  # start from the data directory, as in the original script
    for name in os.listdir(os.curdir):
        if os.path.isdir(name):
            os.chdir(name)
            for zip_name in glob.glob('*.gz'):
                in_file = gzip.GzipFile(zip_name, 'r')
                # islice(in_file, 1, None) skips the first (header) line
                out_file.writelines(islice(in_file, 1, None))
                in_file.close()
            os.chdir(os.pardir)
    out_file.close()


if __name__ == '__main__':
    main()

Ciao,
Marc 'BlackJack' Rintsch
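
A rough Python 3 take on the same islice idea, for comparison. This is a sketch under assumptions, not a drop-in replacement: it assumes the data is UTF-8 text, uses pathlib instead of os.chdir, and the function name merge_without_headers plus the temporary test files are invented for the example.

```python
import gzip
import tempfile
from itertools import islice
from pathlib import Path


def merge_without_headers(root, out_path):
    """Concatenate every *.gz file one level below root, minus first lines."""
    root = Path(root)
    files_merged = 0
    with open(out_path, "w", encoding="utf-8") as out_file:
        for subdir in sorted(p for p in root.iterdir() if p.is_dir()):
            for gz_path in sorted(subdir.glob("*.gz")):
                # "rt" decodes to text so the lines can go to a text file.
                with gzip.open(gz_path, "rt", encoding="utf-8") as in_file:
                    out_file.writelines(islice(in_file, 1, None))
                files_merged += 1
    return files_merged


# Small self-check on throwaway data in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name, rows in (("a", "h\n1\n2\n"), ("b", "h\n3\n")):
        sub = Path(tmp) / name
        sub.mkdir()
        with gzip.open(sub / "data.gz", "wt", encoding="utf-8") as f:
            f.write(rows)
    master = Path(tmp) / "master_data.txt"
    merged = merge_without_headers(tmp, master)
    merged_text = master.read_text(encoding="utf-8")
```

Sorting the directories and files is a choice made here so the output order is reproducible; the original scripts take whatever order the operating system returns.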