remove header line when reading/writing files

I'm a newbie with a large number of data files in multiple
directories. I want to uncompress, read, and copy the contents of
each file into one master data file. The code below seems to be doing
this perfectly. The problem is each of the data files has a header
row in the first line, which I do not want in the master file. How
can I skip that first line when writing to the master file? Any help
is much appreciated. Thank you.

import os
import sys
import glob
import gzip
zipdir = "G:/Research/Data/"
outfilename = "G:/Research/Data/master_data.txt"
outfile = open(outfilename,'w')
os.chdir(zipdir)
dirlist = os.listdir(os.curdir)
for item in dirlist:
    if os.path.isdir(item):
        os.chdir(item)
        filelist = glob.glob("*.gz")
        for zipfile in filelist:
            filein = gzip.GzipFile(zipfile,'r')
            filecontent = filein.read()
            filein.close()
            outfile.write(filecontent)
        os.chdir(os.pardir)
outfile.close()

Oct 11 '07 #1
> each file into one master data file. The code below seems to be doing
> this perfectly. The problem is each of the data files has a header
> row in the first line, which I do not want in the master file. How
> can I skip that first line when writing to the master file? Any help
> is much appreciated. Thank you.
[snip]
> for zipfile in filelist:
>     filein = gzip.GzipFile(zipfile,'r')
>     filecontent = filein.read()
>     filein.close()
>     outfile.write(filecontent)

for zipfile in filelist:
    for i, line in gzip.GzipFile(zipfile,'r'):
        if i: outfile.write(line)

should do the trick for you.

If you like a little more readable code, you can change that line to

if i != 0: outfile.write(line)

or

if i == 0: continue
outfile.write(line)

whichever you like.

-tkc

Oct 11 '07 #2
Forgot the enumerate call, of all things:

for zipfile in filelist:
    for i, line in enumerate(gzip.GzipFile(zipfile,'r')):
        if i: outfile.write(line)

Some days, I'm braindead.

-tkc
Oct 11 '07 #3
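
The code in this thread dates from the Python 2 era. For readers on Python 3, the enumerate() approach above could be sketched roughly as follows. This is only an illustration: the function name `append_without_header` is hypothetical, and it assumes the data files are plain text, so `gzip.open(..., "rt")` is used to get str lines rather than the bytes that `gzip.GzipFile` yields on Python 3.

```python
import gzip


def append_without_header(gz_path, out_file):
    """Append every line of a gzip-compressed text file except the first.

    Hypothetical helper, not from the thread: out_file is any object with
    a write() method that accepts str, such as a file opened with 'w'.
    """
    # 'rt' makes gzip.open decode the data and yield str lines.
    with gzip.open(gz_path, "rt") as in_file:
        for i, line in enumerate(in_file):
            if i:  # i == 0 is the header row, so it is skipped
                out_file.write(line)
```

Inside the original loop this would be called once per entry in `filelist`, with `outfile` as the second argument.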
On Oct 12, 12:23 pm, Tim Chase <python.l...@tim.thechases.com> wrote:
> Forgot the enumerate call of all things
> for zipfile in filelist:
>     for i, line in enumerate(gzip.GzipFile(zipfile,'r')):
>         if i: outfile.write(line)
>
> Some days, I'm braindead.
>
> -tkc

I would move the 'if' test outside the loop:

for zipfile in filelist:
    zfiter = iter(gzip.GzipFile(zipfile,'r'))
    zfiter.next() # ignore header line
    for i, line in enumerate(zfiter):
        outfile.write(line)

I'm not sure if the iter(...) is required. The zfiter.next() call will
raise a StopIteration exception if zipfile is empty.

Cheers
Tim

Oct 12 '07 #4
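
The StopIteration concern raised in this post can be sidestepped by giving the built-in `next()` a default value, which it returns instead of raising when the iterator is exhausted. Below is a minimal Python 3 sketch under that assumption; `copy_body` is a hypothetical name, and the return value (number of data lines copied) is an addition for illustration only.

```python
import gzip


def copy_body(gz_path, out_file):
    """Copy all lines after the header; return how many lines were written.

    Hypothetical helper, not from the thread. An empty file is not an
    error here: it simply contributes zero lines.
    """
    with gzip.open(gz_path, "rt") as in_file:
        # next() with a default returns None for an empty file
        # instead of raising StopIteration.
        header = next(in_file, None)
        if header is None:
            return 0
        count = 0
        for line in in_file:
            out_file.write(line)
            count += 1
        return count
```

On Python 3 the `.next()` method no longer exists on iterators, so the built-in `next()` is needed there in any case.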
ti******@gmail.com wrote:
> ...
> for zipfile in filelist:
>     zfiter = iter(gzip.GzipFile(zipfile,'r'))
>     zfiter.next() # ignore header line
>     for i, line in enumerate(zfiter):
>         outfile.write(line)

Or even:

writes = outfile.write
for zipfile in filelist:
    zfiter = iter(gzip.GzipFile(zipfile,'r'))
    zfiter.next() # ignore header line
    for line in zfiter:
        writes(line)
Oct 12 '07 #5
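
If per-line overhead is the concern, as the local `writes` alias suggests, a different technique that nobody in the thread proposed is to discard the header with a single `readline()` call and let `shutil.copyfileobj` copy the remainder in chunks. The sketch below is for Python 3 and works in binary mode, so it assumes the master file is opened with `'wb'`; the name `copy_body_in_chunks` is hypothetical.

```python
import gzip
import shutil


def copy_body_in_chunks(gz_path, out_file):
    """Skip the first line, then copy the rest without a per-line loop.

    Hypothetical helper, not from the thread. out_file must accept
    bytes, for example a file opened with 'wb'.
    """
    with gzip.open(gz_path, "rb") as in_file:
        in_file.readline()  # read and discard the header (b'' if the file is empty)
        shutil.copyfileobj(in_file, out_file)  # copy the remaining decompressed bytes
```

Whether this is measurably faster than the line-by-line loop depends on the data; it has not been benchmarked here.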
On Thu, 11 Oct 2007 22:52:55 +0000, RyanL wrote:
> I'm a newbie with a large number of data files in multiple
> directories. I want to uncompress, read, and copy the contents of
> each file into one master data file. The code below seems to be doing
> this perfectly. The problem is each of the data files has a header
> row in the first line, which I do not want in the master file. How
> can I skip that first line when writing to the master file? Any help
> is much appreciated. Thank you.

Untested version with `itertools.islice()`:

import glob
import gzip
import os
from itertools import islice


def main():
    zipdir = 'G:/Research/Data/'
    outfilename = 'G:/Research/Data/master_data.txt'
    out_file = open(outfilename, 'w')
    os.chdir(zipdir)  # start in the data directory so listdir() sees its subdirectories
    for name in os.listdir(os.curdir):
        if os.path.isdir(name):
            os.chdir(name)
            for zip_name in glob.glob('*.gz'):
                in_file = gzip.GzipFile(zip_name, 'r')
                # islice(in_file, 1, None) skips the header line
                out_file.writelines(islice(in_file, 1, None))
                in_file.close()
            os.chdir(os.pardir)
    out_file.close()


if __name__ == '__main__':
    main()

Ciao,
Marc 'BlackJack' Rintsch
Oct 12 '07 #6
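
For reference, the islice() version above might be adapted to Python 3 along these lines. Treat it as a sketch under several assumptions: the function name `merge_without_headers` and its parameters are hypothetical, the files are assumed to be plain text, paths are joined explicitly instead of calling `os.chdir()`, and `sorted()` is added only to make the output order predictable.

```python
import glob
import gzip
import os
from itertools import islice


def merge_without_headers(data_dir, out_path):
    """Concatenate every *.gz file one level below data_dir into out_path,
    dropping the first line of each file.

    Hypothetical helper, not from the thread.
    """
    with open(out_path, "w") as out_file:
        for name in sorted(os.listdir(data_dir)):
            subdir = os.path.join(data_dir, name)
            if not os.path.isdir(subdir):
                continue  # only subdirectories are scanned, as in the original
            for gz_path in sorted(glob.glob(os.path.join(subdir, "*.gz"))):
                with gzip.open(gz_path, "rt") as in_file:
                    # islice(..., 1, None) skips the header and yields
                    # nothing at all for an empty file, so no exception.
                    out_file.writelines(islice(in_file, 1, None))
```

With the paths from the original question, the call would be `merge_without_headers("G:/Research/Data/", "G:/Research/Data/master_data.txt")`. The `with` blocks close both files even if a read fails partway through.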
