473,386 Members | 1,812 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,386 software developers and data experts.

How to Remove Header info, add date, merge files

Hi Folks,

I am working on a little project and hoping someone out there might offer me some info/insight. I have a folder containing about 50 html files where the contents look like this:

Astronomical Applications Dept.
U.S. Naval Observatory
Washington, DC 20392-5420

ST. PETER'S NOVA SCOTIA
o , o ,
W 60 52, N45 39

Altitude and Azimuth of the Sun
Jun 1, 2008
Zone: 3h West of Greenwich

Altitude Azimuth
(E of N)

h m o o
03:55 -11.6 40.5
04:00 -11.0 41.6
04:05 -10.4 42.6
04:10 -9.9 43.6
04:15 -9.2 44.6
04:20 -8.6 45.6
04:25 -8.0 46.6
04:30 -7.4 47.5
04:35 -6.7 48.5
04:40 -6.0 49.5
04:45 -5.4 50.4
04:50 -4.7 51.4
04:55 -4.0 52.3
05:00 -3.3 53.2
05:05 -2.6 54.1
05:10 -1.9 55.0
05:15 -1.2 56.0
05:20 0.1 56.9
05:25 0.7 57.8
05:30 1.4 58.6
05:35 2.1 59.5
05:40 2.8 60.4

What I need to do is remove the header info from each file, and add a date in front of each remaining line so that it looks like this:

2008-06-01 03:55 -11.6 40.5

Finally, I need to merge all the info from these individual files into one master.dat file.

If anyone can offer some advice on how I would be able to do this, that would be awesome.
Thanks Very Much!
Feb 12 '08 #1
1 1973
bvdet
2,851 Expert Mod 2GB
Something like this:
Expand|Select|Wrap|Line Numbers
  1. import re, os
  2.  
  3. dir_name = 'your_directory'
  4.  
  5. def combine_files(dir_name, fn, prefix):
  6.     fileList = []
  7.     for file in os.listdir(dir_name):
  8.         dirfile = os.path.join(dir_name, file)
  9.         if os.path.isfile(dirfile):
  10.             fileList.append(dirfile)
  11.     patt = re.compile(r'\d{2}:')
  12.     outList = []
  13.     for f in fileList:
  14.         print 'Parsing file %s' % f
  15.         for line in open(f).readlines():
  16.             if patt.match(line):
  17.                 outList.append(' '.join([prefix, line.strip()]))
  18.     ff = open(fn, 'w')
  19.     ff.write('\n'.join(outList))
  20.     ff.close()
  21.  
  22. combine_files(dir_name, 'combined.dat', '2008-06-01')
Feb 12 '08 #2

Sign in to post your reply or Sign up for a free account.

Similar topics

4
by: Andrew Ward | last post by:
Hi All, I was wondering if it is possible to use precompiled headers without having to include a <stdafx.h> or whatever in every source file. My problem is that I have a project that makes heavy...
0
by: Shrage H. Smilowitz | last post by:
Hi, I have created a setup project in vs.net, i have included several merge modules, but when i test the installation on my development machine i dont want the uninstall to remove the files that...
31
by: Extremest | last post by:
I have a loop that is set to run as long as the arraylist is > 0. at the beginning of this loop I grab the first object and then remove it. I then go into another loop that checks to see if there...
2
by: Cliff72 | last post by:
I'm creating a database that will be uploading some text files into an access table. The problem is that the text files have a header which messes up my import specs. so what i have had to do is to...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: aa123db | last post by:
Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.