472,954 Members | 1,958 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 472,954 software developers and data experts.

Which is the better way to parse this file?

Hi,

I'm interested in parsing a file containing this "structure":

"""dataset {
int catalog_number;
sequence {
string experimenter;
int32 time;
structure {
float64 latitude;
float64 longitude;
} location;
sequence {
float depth;
float temperature;
} xbt;
} casts;
} data;"""

I want to obtain a dictionary like this:
pprint.pprint(data)

{'casts': {'experimenter': None,
'location': {'latitude': None, 'longitude': None},
'time': None,
'xbt': {'depth': None, 'temperature': None}},
'catalog_number': None}

The values ('None') will be filled later. I tried to do the parsing
using regular expressions, but things became too complicated. I had
more success using SimpleParse, but I'm interested in more insights on
different ways of parsing this file.

TIA,

Roberto
Jul 18 '05 #1
2 1884

"Roberto A. F. De Almeida" <ro*****@dealmeida.net> wrote in message
news:10**************************@posting.google.c om...
I'm interested in parsing a file containing this "structure":

"""dataset {
int catalog_number;
sequence {
string experimenter;
int32 time;
structure {
float64 latitude;
float64 longitude;
} location;
sequence {
float depth;
float temperature;
} xbt;
} casts;
} data;"""
I suspect that what you actually want to do is parse structures 'like'
the above, as defined be a grammar not shown ;-)

You did not specify whether you will get such files from an
uncontrolable external source or whether you control the input format.
If the later, there is no obvious reason for separate database,
sequence, and structure productions since all three result in
dictionaries with no functional difference.
I want to obtain a dictionary like this:
pprint.pprint(data)
{'casts': {'experimenter': None,
'location': {'latitude': None, 'longitude': None},
'time': None,
'xbt': {'depth': None, 'temperature': None}},
'catalog_number': None}
The values ('None') will be filled later.


Using None as placeholders either tosses the type information or
requires that it be recorded elsewhere. Use the int and float type
objects instead. Note that standard Python cannot differentiate
between float and float64.
I tried to do the parsing
using regular expressions, but things became too complicated.
REs are great for linear repetition but not for indefinite nesting.
I had
more success using SimpleParse, but I'm interested in more insights on different ways of parsing this file.


I know nothing of SimpleParse (and therefore, of what would be
different). If the grammar is as simple as I infer from the sample --
dataset and sequences containing sequences, structures, and types -- I
would reread about recursive-descent parsing and maybe try that. The
type_entry function would return a (name, typeobject) pair and the
structure, sequence, and database functions a (name, dict) pair.

But as hinted above, I would think about simplifying the grammar
before worryinng about parsing. If you only have sequences of
sequences and type entries, parsing is trivial.

Terry J. Reedy
Jul 18 '05 #2
"Terry Reedy" <tj*****@udel.edu> wrote in message news:<au********************@comcast.com>...
I suspect that what you actually want to do is parse structures 'like'
the above, as defined be a grammar not shown ;-)
Yes, you're right. :)

The grammar is not complex, but I'm still struggling to process the
result tree.
You did not specify whether you will get such files from an
uncontrolable external source or whether you control the input format.
If the later, there is no obvious reason for separate database,
sequence, and structure productions since all three result in
dictionaries with no functional difference.
This is a Dataset Descriptor for the Data Access Protocol
(http://www.unidata.ucar.edu/packages...dap-rfc-html/), an
API to access remote datasets. DAP servers describe their datasets
using this grammar, and I'm developing a module to access DAP servers.
I want to obtain a dictionary like this:
>> pprint.pprint(data)

{'casts': {'experimenter': None,
'location': {'latitude': None, 'longitude': None},
'time': None,
'xbt': {'depth': None, 'temperature': None}},
'catalog_number': None}
The values ('None') will be filled later.


Using None as placeholders either tosses the type information or
requires that it be recorded elsewhere. Use the int and float type
objects instead. Note that standard Python cannot differentiate
between float and float64.


Ok. One of the strong points of DAP is that data is retrieved only for
your region/period of interest. I created a class and redefined
__getitem__ so that data is only retrieved from the server when the
object is sliced.
data = file("http://dods.gso.uri.edu/cgi-bin/nph-nc/data/fnoc1.nc")
print data.variables['lat'].shape (17,) print data.variables['lat'][1:4] # only this subset is retrieved

[ 47.5 45. 42.5 40. ]
I know nothing of SimpleParse (and therefore, of what would be
different). If the grammar is as simple as I infer from the sample --
dataset and sequences containing sequences, structures, and types -- I
would reread about recursive-descent parsing and maybe try that. The
type_entry function would return a (name, typeobject) pair and the
structure, sequence, and database functions a (name, dict) pair.
Yes, it's very simple. As you see, even a structure is identical to a
sequence. The declarations are basically "types" or declarations
containing "types". Do you think it can be done without 3rd party
modules?
But as hinted above, I would think about simplifying the grammar
before worryinng about parsing. If you only have sequences of
sequences and type entries, parsing is trivial.


I'll take a look in that. Thanks very much for the insights.

Regards,

Roberto
Jul 18 '05 #3

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
by: chuck amadi | last post by:
any python script which will parse an email messages into a file to poplulate a database. Im trying with UnixMailbox but I cant figure out howto abstract the all email data messages to a file . ...
4
by: Matteo | last post by:
Hy everybody. I'm not a html writer, but a sysadmin who's trying to help a user able to compile an online form with IE but not with Mozilla (Moz1.6, Ns7.1, Firefox 0.8+) due to a javascript date...
5
by: Joergen Bech | last post by:
Basically, I want to convert hex values in the range "00000000" to "FFFFFFFF" to a signed, 32-bit Integer value. In VB6, I could just write lngValue = Val(hexstring$). In VB.Net, I seem to be...
11
by: UJ | last post by:
If I've got a video/audio file, how can I tell what Codec it needs? I want to be able to let the user upload a file to a server but I want to make sure before hand that the codec is already...
18
by: Steven Borrelli | last post by:
Hello, I am using the <?php include() ?statement on my website for organizational purposes. However, one of my includes contains some PHP code. Is there any way for the server to actually...
8
by: =?Utf-8?B?eWRibg==?= | last post by:
I need to write a program validate a text file in CSV format. So I will have a class DataType and a lot of of derived class for various type, e.g. IntType, StringType, FloatType, MoneyType,...
11
by: Peter Pei | last post by:
One bad design about elementtree is that it has different ways parsing a string and a file, even worse they return different objects: 1) When you parse a file, you can simply call parse, which...
6
by: Tony | last post by:
Hello! It seems to me that both Int32.Parse(..) and Convert.ToInt32(...) static methods works in exactly the same way. Both can throw an exeption. So is it any different at all between these...
5
by: goldtech | last post by:
SAX XML Parse Python error message Hi, My first attempt at SAX, but have an error message I need help with. I cite the error message, code, and xml below. Be grateful if anyone can tell me...
0
by: lllomh | last post by:
Define the method first this.state = { buttonBackgroundColor: 'green', isBlinking: false, // A new status is added to identify whether the button is blinking or not } autoStart=()=>{
2
by: DJRhino | last post by:
Was curious if anyone else was having this same issue or not.... I was just Up/Down graded to windows 11 and now my access combo boxes are not acting right. With win 10 I could start typing...
2
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 4 Oct 2023 starting at 18:00 UK time (6PM UTC+1) and finishing at about 19:15 (7.15PM) The start time is equivalent to 19:00 (7PM) in Central...
0
by: Aliciasmith | last post by:
In an age dominated by smartphones, having a mobile app for your business is no longer an option; it's a necessity. Whether you're a startup or an established enterprise, finding the right mobile app...
2
by: giovanniandrean | last post by:
The energy model is structured as follows and uses excel sheets to give input data: 1-Utility.py contains all the functions needed to calculate the variables and other minor things (mentions...
4
NeoPa
by: NeoPa | last post by:
Hello everyone. I find myself stuck trying to find the VBA way to get Access to create a PDF of the currently-selected (and open) object (Form or Report). I know it can be done by selecting :...
3
NeoPa
by: NeoPa | last post by:
Introduction For this article I'll be using a very simple database which has Form (clsForm) & Report (clsReport) classes that simply handle making the calling Form invisible until the Form, or all...
1
by: Teri B | last post by:
Hi, I have created a sub-form Roles. In my course form the user selects the roles assigned to the course. 0ne-to-many. One course many roles. Then I created a report based on the Course form and...
0
NeoPa
by: NeoPa | last post by:
Introduction For this article I'll be focusing on the Report (clsReport) class. This simply handles making the calling Form invisible until all of the Reports opened by it have been closed, when it...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.