Parsing of a file

I have a file with the format

Field f29227: Ra=20:23:46.54 Dec=+67:30:00.0 MJD=53370.06797690 Frames 5 Set 1
Field f31448: Ra=20:24:58.13 Dec=+79:39:43.9 MJD=53370.06811620 Frames 5 Set 2
Field f31226: Ra=20:24:45.50 Dec=+78:26:45.2 MJD=53370.06823860 Frames 5 Set 3
Field f31004: Ra=20:25:05.28 Dec=+77:13:46.9 MJD=53370.06836020 Frames 5 Set 4
Field f30782: Ra=20:25:51.94 Dec=+76:00:48.6 MJD=53370.06848210 Frames 5 Set 5
Field f30560: Ra=20:27:01.82 Dec=+74:47:50.3 MJD=53370.06860400 Frames 5 Set 6
Field f30338: Ra=20:28:32.35 Dec=+73:34:52.0 MJD=53370.06872620 Frames 5 Set 7
Field f30116: Ra=20:30:21.70 Dec=+72:21:53.6 MJD=53370.06884890 Frames 5 Set 8
Field f29894: Ra=20:32:28.54 Dec=+71:08:55.0 MJD=53370.06897070 Frames 5 Set 9
Field f29672: Ra=20:34:51.89 Dec=+69:55:56.6 MJD=53370.06909350 Frames 5 Set 10

I would like to parse this file by extracting the field id, ra, dec
and mjd from each line. It is not certain, however, that the width of
each value (field id, ra, dec or mjd) is the same in every line. Is
there a way to do this so that a line would still be parsed correctly
even if Ra=****** and MJD=******** were swapped?

Cheers
Tommy
Aug 6 '08 #1
I'm sure Python can handle this. Try the PyParsing module or learn
Python regular expression syntax.

http://pyparsing.wikispaces.com/

You could probably do it very crudely by just iterating over each line
and then using the string's find() method.

Mike
Aug 6 '08 #2
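
A rough sketch of the regular-expression route Mike mentions, written for Python 3 (the scripts later in this thread are Python 2). It assumes each record sits on one line and that every value is a run of non-whitespace characters after `key=`; the names `PAIR`, `FIELD` and `parse_line` are made up for this illustration and are not from any post.

```python
import re

# Hypothetical patterns: any key=value token, and the leading "Field <id>:" part.
PAIR = re.compile(r"(\w+)=(\S+)")
FIELD = re.compile(r"Field\s+(\S+):")

def parse_line(line):
    """Return a dict with the field id and lower-cased keys, or None if no match."""
    head = FIELD.match(line)
    if head is None:
        return None
    record = {"id": head.group(1)}
    # findall returns (key, value) tuples in whatever order they appear.
    for key, value in PAIR.findall(line):
        record[key.lower()] = value
    return record

line = "Field f31448: MJD=53370.06811620 Dec=+79:39:43.9 Ra=20:24:58.13 Frames 5 Set 2"
print(parse_line(line))
```

Because the pairs are collected into a dict, the order of Ra, Dec and MJD on the line does not matter, and the width of each value is irrelevant.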
On Aug 7, 6:02 am, Mike Driscoll <kyoso...@gmail.com> wrote:
> You could probably do it very crudely by just iterating over each line
> and then using the string's find() method.

Perhaps you and the OP could spend some time becoming familiar with
built-in functions and str methods. In particular, str.split is your
friend:

C:\junk>type tommy_grav.py
# Look, Ma, no imports!

guff = """\
Field f29227: Ra=20:23:46.54 Dec=+67:30:00.0 MJD=53370.06797690 Frames 5 Set 1
Field f31448: MJD=53370.06811620123 Dec=+79:39:43.9 Ra=20:24:58.13 Frames 5 Set 2
Field f31226: Ra=20:24:45.50 Dec=+78:26:45.2 MJD=53370.06823860 Frames 5 Set 3
Field f31004: Ra=20:25:05.28 Dec=+77:13:46.9 MJD=53370.06836020 Frames 5 Set 4
Field f30782: Ra=20:25:51.94 Dec=+76:00:48.6 MJD=53370.06848210 Frames 5 Set 5

Field f30560: Ra=20:27:01.82 Dec=+74:47:50.3 MJD=53370.06860400 Frames 5 Set 6
Field f30338: Ra=20:28:32.35 Dec=+73:34:52.0 MJD=53370.06872620 Frames 5 Set 7
Field f30116: Ra=20:30:21.70 Dec=+72:21:53.6 MJD=53370.06884890 Frames 5 Set 8
Field f29894: Ra=20:32:28.54 Dec=+71:08:55.0 MJD=53370.06897070 Frames 5 Set 9
Field f29672: Ra=20:34:51.89 Dec=+69:55:56.6 MJD=53370.06909350 Frames 5 Set 10

"""

is_angle = {
    'ra': True,
    'dec': True,
    'mjd': False,
}

def convert_angle(text):
    deg, min, sec = map(float, text.split(':'))
    return (sec / 60. + min) / 60. + deg

def parse_line(line):
    t = line.split()
    assert t[0].lower() == 'field'
    assert t[1].startswith('f')
    assert t[1].endswith(':')
    field_id = t[1].rstrip(':')
    rdict = {}
    for f in t[2:]:
        parts = f.split('=')
        if len(parts) == 2:
            key = parts[0].lower()
            value = parts[1]
            assert key not in rdict
            if is_angle[key]:
                rvalue = convert_angle(value)
            else:
                rvalue = float(value)
            rdict[key] = rvalue
    return field_id, rdict['ra'], rdict['dec'], rdict['mjd']

for line in guff.splitlines():
    line = line.strip()
    if not line:
        continue
    field_id, ra, dec, mjd = parse_line(line)
    print field_id, ra, dec, mjd

C:\junk>tommy_grav.py
f29227 20.3962611111 67.5 53370.0679769
f31448 20.4161472222 79.6621944444 53370.0681162
f31226 20.4126388889 78.4458888889 53370.0682386
f31004 20.4181333333 77.2296944444 53370.0683602
f30782 20.4310944444 76.0135 53370.0684821
f30560 20.4505055556 74.7973055556 53370.068604
f30338 20.4756527778 73.5811111111 53370.0687262
f30116 20.5060277778 72.3648888889 53370.0688489
f29894 20.5412611111 71.1486111111 53370.0689707
f29672 20.5810805556 69.9323888889 53370.0690935

Cheers,
John

Aug 6 '08 #3
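
One caveat on convert_angle above: it adds the minutes and seconds to the leading component, which is fine for the positive declinations in this sample, but a value such as -10:30:00 would come out as -9.5 instead of -10.5. The following is a minimal Python 3 sketch of a sign-aware variant; the function name is hypothetical, and it assumes the sign, if present, is the first character of the value.

```python
def sexagesimal_to_decimal(text):
    """Convert 'a:b:c' (optionally signed) to a decimal in the unit of 'a'."""
    text = text.strip()
    # Take the sign from the string itself so that '-00:30:00' stays negative.
    sign = -1.0 if text.startswith("-") else 1.0
    first, minutes, seconds = (float(p) for p in text.lstrip("+-").split(":"))
    return sign * (first + minutes / 60.0 + seconds / 3600.0)

print(sexagesimal_to_decimal("+67:30:00.0"))
print(sexagesimal_to_decimal("-00:30:00"))
```

The Ra values here look like hours:minutes:seconds, so the result for Ra is in decimal hours, matching the 20.39... figures in the output above, while Dec comes out in decimal degrees.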
Using something like PyParsing is probably better, but if you don't
want to use it you may use something like this:

raw_data = """
Field f29227: Ra=20:23:46.54 Dec=+67:30:00.0 MJD=53370.06797690 Frames 5 Set 1
Field f31448: Ra=20:24:58.13 Dec=+79:39:43.9 MJD=53370.06811620 Frames 5 Set 2
Field f31226: Ra=20:24:45.50 Dec=+78:26:45.2 MJD=53370.06823860 Frames 5 Set 3
Field f31004: Ra=20:25:05.28 Dec=+77:13:46.9 MJD=53370.06836020 Frames 5 Set 4
Field f30782: Ra=20:25:51.94 Dec=+76:00:48.6 MJD=53370.06848210 Frames 5 Set 5
Field f30560: Ra=20:27:01.82 Dec=+74:47:50.3 MJD=53370.06860400 Frames 5 Set 6
Field f30338: Ra=20:28:32.35 Dec=+73:34:52.0 MJD=53370.06872620 Frames 5 Set 7
Field f30116: Ra=20:30:21.70 Dec=+72:21:53.6 MJD=53370.06884890 Frames 5 Set 8
Field f29894: Ra=20:32:28.54 Dec=+71:08:55.0 MJD=53370.06897070 Frames 5 Set 9
Field f29672: Ra=20:34:51.89 Dec=+69:55:56.6 MJD=53370.06909350 Frames 5 Set 10"""

# from each line extract the fields: id, ra, dec, mjd
# even if they are swapped

data = []
for line in raw_data.lower().splitlines():
    if line.startswith("field"):
        parts = line.split()
        record = {"id": int(parts[1][1:-1])}
        for part in parts[2:]:
            if "=" in part:
                title, field = part.split("=")
                record[title] = field
        data.append(record)
print data

-----------------

Stefan Behnel:
>You can use named groups in a single regular expression.<
Can you show how to use them in this situation when fields can be
swapped?

Bye,
bearophile
Aug 6 '08 #4
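
One possible answer to bearophile's question, offered as a sketch and not necessarily what Stefan Behnel had in mind: named groups can be placed inside lookahead assertions, so each key is searched for independently of where it sits on the line. This is Python 3; the name `PATTERN` is made up for the example, and it assumes each key appears at most once per line.

```python
import re

# Each lookahead scans the rest of the line from the same starting point,
# so Ra, Dec and MJD may appear in any order after the field id.
PATTERN = re.compile(
    r"^Field\s+(?P<id>\S+):"
    r"(?=.*\bRa=(?P<ra>\S+))"
    r"(?=.*\bDec=(?P<dec>\S+))"
    r"(?=.*\bMJD=(?P<mjd>\S+))"
)

line = "Field f31448: MJD=53370.06811620 Dec=+79:39:43.9 Ra=20:24:58.13 Frames 5 Set 2"
m = PATTERN.match(line)
if m:
    print(m.groupdict())
```

A line that lacks any of the three keys fails to match, so the caller gets None and can decide whether to skip or report it.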
On Aug 7, 7:06 am, John Machin <sjmac...@lexicon.net> wrote:
> Perhaps you and the OP could spend some time becoming familiar with
> built-in functions and str methods. In particular, str.split is your
> friend:
>
> C:\junk>type tommy_grav.py
> [...]

Slightly less ugly:

C:\junk>diff tommy_grav.py tommy_grav_2.py
18,23d17
< is_angle = {
<     'ra': True,
<     'dec': True,
<     'mjd': False,
< }
<
27a22,27
> converter = {
>     'ra': convert_angle,
>     'dec': convert_angle,
>     'mjd': float,
> }
>
41,44c41
<             if is_angle[key]:
<                 rvalue = convert_angle(value)
<             else:
<                 rvalue = float(value)
---
>             rvalue = converter[key](value)
Aug 6 '08 #5
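
The change in the diff above replaces a table of booleans with a table of converter functions, so the if/else disappears. Here is a self-contained Python 3 sketch of that idea; `CONVERTERS` and `parse_tokens` are hypothetical names, and unlike the original script this version silently ignores unknown keys, which is an assumption of the sketch.

```python
def convert_angle(text):
    # Same arithmetic as in the script above; assumes a non-negative value.
    first, minutes, seconds = map(float, text.split(":"))
    return (seconds / 60.0 + minutes) / 60.0 + first

# Map each known key to the callable that converts its text value.
CONVERTERS = {"ra": convert_angle, "dec": convert_angle, "mjd": float}

def parse_tokens(tokens):
    """Convert every recognised key=value token; skip everything else."""
    result = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        key = key.lower()
        if sep and key in CONVERTERS:
            result[key] = CONVERTERS[key](value)
    return result

tokens = "MJD=53370.5 Dec=+67:30:00.0 Ra=20:30:00 Frames 5".split()
print(parse_tokens(tokens))
```

Adding support for a new key then means adding one entry to the table, with no change to the parsing loop.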
Did you consider changing the file format in the first place, so that
you don't have to do any contortions to parse it?

Anyway, here is a solution with regular expressions (I'm a beginner
with re's in Python, so please correct it if it is wrong and suggest
better solutions):

import re
s = """Field f29227: Ra=20:23:46.54 Dec=+67:30:00.0 MJD=53370.06797690 Frames 5 Set 1
Field f31448: Ra=20:24:58.13 Dec=+79:39:43.9 MJD=53370.06811620 Frames 5 Set 2
Field f31226: Ra=20:24:45.50 Dec=+78:26:45.2 MJD=53370.06823860 Frames 5 Set 3
Field f31004: Ra=20:25:05.28 Dec=+77:13:46.9 MJD=53370.06836020 Frames 5 Set 4
Field f30782: Ra=20:25:51.94 Dec=+76:00:48.6 MJD=53370.06848210 Frames 5 Set 5
Field f30560: Dec=+74:47:50.3 Ra=20:27:01.82 MJD=53370.06860400 Frames 5 Set 6
Field f30338: Ra=20:28:32.35 Dec=+73:34:52.0 MJD=53370.06872620 Frames 5 Set 7
Field f30116: Ra=20:30:21.70 Dec=+72:21:53.6 MJD=53370.06884890 Frames 5 Set 8
Field f29894: Ra=20:32:28.54 Dec=+71:08:55.0 MJD=53370.06897070 Frames 5 Set 9
Field f29672: Ra=20:34:51.89 Dec=+69:55:56.6 MJD=53370.06909350 Frames 5 Set 10"""

s = s.split('\n')
# Two alternatives cover Ra and Dec in either order; MJD must still come last.
r = re.compile(r'Field (\S+): (?:(?:Ra=(\S+) Dec=(\S+))|(?:Dec=(\S+) Ra=(\S+))) MJD=(\S+)')
for i in s:
    match = r.findall(i)
    field = match[0][0]
    # Only one of the two alternatives matched, so the other pair is empty.
    Ra = match[0][1] or match[0][4]
    Dec = match[0][2] or match[0][3]
    MJD = match[0][5]
    print field, Ra, Dec, MJD
Aug 6 '08 #6
Q&D:

src = open('/path/to/yourfile.ext')
parsed = []
for line in src:
    line = line.strip()
    if not line:
        continue
    # Split only on the first colon: the Ra and Dec values contain colons too.
    head, rest = line.split(':', 1)
    field_id = head.split()[1]
    data = dict(field_id=field_id)
    parts = rest.split()
    for part in parts:
        try:
            key, val = part.split('=')
        except ValueError:
            # Not a key=value token (e.g. "Frames", "5"): ignore it.
            continue
        data[key] = val
    parsed.append(data)
src.close()
Aug 7 '08 #7
On Aug 6, 4:06 pm, John Machin <sjmac...@lexicon.net> wrote:
> Perhaps you and the OP could spend some time becoming familiar with
> built-in functions and str methods. In particular, str.split is your
> friend:

I'm well aware of the split() method and built-ins, however since this
appeared to be a homework-type question and I was at work, I didn't
spend any time on the issue. The only reason I mentioned McGuire's
PyParsing module was because I had just finished reading his article
on the subject in Python Magazine and it sounded like something the OP
might find interesting.

Here's my own implementation based on what's already been done here.
I'm sure one could have some fun doing it with itertools or list
comprehensions to get really fancy.

<code>

raw_data = """
Field f29227: Ra=20:23:46.54 Dec=+67:30:00.0 MJD=53370.06797690 Frames 5 Set 1
Field f31448: Ra=20:24:58.13 Dec=+79:39:43.9 MJD=53370.06811620 Frames 5 Set 2
Field f31226: Ra=20:24:45.50 Dec=+78:26:45.2 MJD=53370.06823860 Frames 5 Set 3
Field f31004: Ra=20:25:05.28 Dec=+77:13:46.9 MJD=53370.06836020 Frames 5 Set 4
Field f30782: Ra=20:25:51.94 Dec=+76:00:48.6 MJD=53370.06848210 Frames 5 Set 5
Field f30560: Ra=20:27:01.82 Dec=+74:47:50.3 MJD=53370.06860400 Frames 5 Set 6
Field f30338: Ra=20:28:32.35 Dec=+73:34:52.0 MJD=53370.06872620 Frames 5 Set 7
Field f30116: Ra=20:30:21.70 Dec=+72:21:53.6 MJD=53370.06884890 Frames 5 Set 8
Field f29894: Ra=20:32:28.54 Dec=+71:08:55.0 MJD=53370.06897070 Frames 5 Set 9
Field f29672: Ra=20:34:51.89 Dec=+69:55:56.6 MJD=53370.06909350 Frames 5 Set 10
""".splitlines()

myList = []
for line in raw_data:
    items = line.split()
    myDict = {}
    for item in items:
        if '=' in item:
            key, value = item.split('=')
            myDict[key] = value
        elif item[:1].lower() == 'f' and item[-1:] == ':':
            myDict['id'] = item[1:-1]
    # Skip blank lines (such as the one right after the opening quotes).
    if myDict:
        myList.append(myDict)

print myList

</code>

This doesn't have any type checking or error handling, but it works
with the data provided.

Mike
Aug 7 '08 #8
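
For readers curious about the comprehension variant Mike alludes to, here is one way it might look in Python 3. It is only a sketch: `parse` is a hypothetical name, values are left as strings, and it assumes one record per line with the literal word Field first.

```python
def parse(text):
    """Return one dict per line that starts with 'Field'."""
    return [
        {
            "id": tokens[1].rstrip(":"),
            # partition() yields an empty value for tokens without '=',
            # so the 'if v' filter drops words like 'Frames' and '5'.
            **{k.lower(): v
               for k, _, v in (t.partition("=") for t in tokens[2:]) if v},
        }
        for tokens in (line.split() for line in text.splitlines())
        if len(tokens) > 1 and tokens[0] == "Field"
    ]

sample = """
Field f29227: Ra=20:23:46.54 Dec=+67:30:00.0 MJD=53370.06797690 Frames 5 Set 1
Field f31448: MJD=53370.06811620 Dec=+79:39:43.9 Ra=20:24:58.13 Frames 5 Set 2
"""
for record in parse(sample):
    print(record)
```

Whether this is clearer than the explicit loops above is a matter of taste; the loop versions are easier to extend with error handling.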

On Aug 7, 2008, at 12:52 PM, Mike Driscoll wrote:
> I'm well aware of the split() method and built-ins, however since this
> appeared to be a homework-type question and I was at work, I didn't
> spend any time on the issue. The only reason I mentioned McGuire's
> PyParsing module was because I had just finished reading his article
> on the subject in Python Magazine and it sounded like something the OP
> might find interesting.

Thanks to everyone who responded; I learned a lot about text parsing
from the responses. I just wanted to respond to Mike and let him know
that this was not a homework problem. I was given a file in this format
by a colleague for a project that I am working on (it contains a list
of fields observed by the LINEAR asteroid search project during 2005
and 2006). I could have parsed it using slices of each line, but the
unusual format of each line got me thinking about whether there was
another way to do it. I had tried a few approaches, but I had not
considered .split() and .split("="). Of course the list members quickly
came up with a simple and elegant solution. And I learned a lot in the
process :)

Cheers
Tommy Grav
+----------------------------------------------------------------+
Associate Research Scientist    Dept. of Physics and Astronomy
Johns Hopkins University        Bloomberg 243
tg***@pha.jhu.edu               3400 N. Charles St.
(410) 516-7683                  Baltimore, MD 21218
+----------------------------------------------------------------+
Aug 8 '08 #9
