
Which is the better way to parse this file?

Hi,

I'm interested in parsing a file containing this "structure":

"""dataset {
    int catalog_number;
    sequence {
        string experimenter;
        int32 time;
        structure {
            float64 latitude;
            float64 longitude;
        } location;
        sequence {
            float depth;
            float temperature;
        } xbt;
    } casts;
} data;"""

I want to obtain a dictionary like this:
pprint.pprint(data)

{'casts': {'experimenter': None,
           'location': {'latitude': None, 'longitude': None},
           'time': None,
           'xbt': {'depth': None, 'temperature': None}},
 'catalog_number': None}

The values ('None') will be filled later. I tried to do the parsing
using regular expressions, but things became too complicated. I had
more success using SimpleParse, but I'm interested in more insights on
different ways of parsing this file.

TIA,

Roberto
Jul 18 '05 #1

"Roberto A. F. De Almeida" <ro*****@dealmeida.net> wrote in message
news:10**************************@posting.google.com...
> I'm interested in parsing a file containing this "structure":
>
> """dataset {
>     int catalog_number;
>     sequence {
>         string experimenter;
>         int32 time;
>         structure {
>             float64 latitude;
>             float64 longitude;
>         } location;
>         sequence {
>             float depth;
>             float temperature;
>         } xbt;
>     } casts;
> } data;"""
I suspect that what you actually want to do is parse structures 'like'
the above, as defined by a grammar not shown ;-)

You did not specify whether you will get such files from an
uncontrollable external source or whether you control the input format.
If the latter, there is no obvious reason for separate dataset,
sequence, and structure productions, since all three result in
dictionaries with no functional difference.
> I want to obtain a dictionary like this:
> pprint.pprint(data)
> {'casts': {'experimenter': None,
>            'location': {'latitude': None, 'longitude': None},
>            'time': None,
>            'xbt': {'depth': None, 'temperature': None}},
>  'catalog_number': None}
> The values ('None') will be filled later.


Using None as placeholders either tosses the type information or
requires that it be recorded elsewhere. Use the int and float type
objects instead. Note that standard Python cannot differentiate
between float and float64.
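A minimal sketch of that suggestion, assuming the nested layout Roberto asked for (outer `data` name dropped) and only the type names that appear in the sample. Each leaf holds a type object, which is later called on the raw value to convert it, so the type information survives. The `fill` helper and the shape of its raw input are hypothetical, for illustration only.

```python
# Template: same nesting as the desired dictionary, but each leaf holds a
# Python type object instead of None. float and float64 both become float.
template = {
    "catalog_number": int,
    "casts": {
        "experimenter": str,
        "time": int,
        "location": {"latitude": float, "longitude": float},
        "xbt": {"depth": float, "temperature": float},
    },
}


def fill(template, raw):
    """Return a new dict shaped like template, converting raw values by type.

    Hypothetical helper: leaves missing from raw become None.
    """
    result = {}
    for name, kind in template.items():
        if isinstance(kind, dict):
            # Nested container: recurse into the matching part of raw.
            result[name] = fill(kind, raw.get(name, {}))
        elif name in raw:
            # Leaf: the stored type object doubles as the converter.
            result[name] = kind(raw[name])
        else:
            result[name] = None
    return result


raw = {"catalog_number": "42", "casts": {"time": "7", "location": {"latitude": "41.5"}}}
values = fill(template, raw)
```

Because `fill` builds a new dictionary, the template is never mutated and can be reused for every request.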
> I tried to do the parsing
> using regular expressions, but things became too complicated.
REs are great for linear repetition but not for indefinite nesting.
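To illustrate with the sample (a small sketch; the pattern is an assumption that only covers `TYPE NAME;` declarations): one regular expression readily finds every declaration, but the result is a flat list, so which fields belong to `location` and which to `xbt` is lost.

```python
import re

DDS = """dataset {
    int catalog_number;
    sequence {
        string experimenter;
        int32 time;
        structure {
            float64 latitude;
            float64 longitude;
        } location;
        sequence {
            float depth;
            float temperature;
        } xbt;
    } casts;
} data;"""

# One pattern handles the repeated "TYPE NAME;" shape anywhere in the text...
decls = re.findall(r"(\w+)\s+(\w+)\s*;", DDS)

# ...but the matches carry no record of how deeply each one was nested.
for type_name, name in decls:
    print(type_name, name)
```

Recovering the nesting means tracking the braces, which is a parser's job; the regex remains useful as the tokenizer.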
> I had more success using SimpleParse, but I'm interested in more
> insights on different ways of parsing this file.


I know nothing of SimpleParse (and therefore, of what would be
different). If the grammar is as simple as I infer from the sample --
dataset and sequences containing sequences, structures, and types -- I
would reread about recursive-descent parsing and maybe try that. The
type_entry function would return a (name, typeobject) pair and the
structure, sequence, and dataset functions a (name, dict) pair.
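A minimal recursive-descent sketch along those lines, using only the standard library. Assumptions: whitespace does not matter, the only base types are those in the sample, and dataset, sequence and structure share one `container` method because each simply builds a dict. `BASE_TYPES`, `Parser` and `parse` are hypothetical names, not part of any existing module.

```python
import re

# Assumed mapping from the sample's base-type names to Python type objects;
# standard Python has one float type, so float and float64 both map to it.
BASE_TYPES = {"int": int, "int32": int, "float": float, "float64": float, "string": str}
# Keywords that open a nested block; each one simply produces a dict.
CONTAINERS = {"dataset", "sequence", "structure"}

TOKEN_RE = re.compile(r"\s*(?:(\w+)|([{};]))")


def tokenize(text):
    """Split text into identifiers and the punctuation characters { } ;"""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1) or m.group(2))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, value=None):
        """Consume the next token, checking it equals value if one is given."""
        tok = self.peek()
        if tok is None or (value is not None and tok != value):
            raise ValueError(f"expected {value!r}, got {tok!r}")
        self.pos += 1
        return tok

    def declaration(self):
        # declaration := container | type_entry
        if self.peek() in CONTAINERS:
            return self.container()
        return self.type_entry()

    def type_entry(self):
        # type_entry := TYPE NAME ";"  ->  (name, typeobject)
        type_name = self.expect()
        if type_name not in BASE_TYPES:
            raise ValueError(f"unknown type {type_name!r}")
        name = self.expect()
        self.expect(";")
        return name, BASE_TYPES[type_name]

    def container(self):
        # container := KEYWORD "{" declaration* "}" NAME ";"  ->  (name, dict)
        self.expect()  # dataset / sequence / structure
        self.expect("{")
        members = {}
        while self.peek() != "}":
            name, value = self.declaration()  # recursion handles the nesting
            members[name] = value
        self.expect("}")
        name = self.expect()
        self.expect(";")
        return name, members


def parse(text):
    """Parse one top-level declaration and return its (name, dict) pair."""
    parser = Parser(text)
    result = parser.container()
    if parser.peek() is not None:
        raise ValueError(f"unexpected trailing token {parser.peek()!r}")
    return result


DDS = """dataset {
    int catalog_number;
    sequence {
        string experimenter;
        int32 time;
        structure {
            float64 latitude;
            float64 longitude;
        } location;
        sequence {
            float depth;
            float temperature;
        } xbt;
    } casts;
} data;"""

name, data = parse(DDS)
```

Each grammar rule maps to one method, so a new rule becomes one more method plus a branch in `declaration`.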

But as hinted above, I would think about simplifying the grammar
before worrying about parsing. If you only have sequences of
sequences and type entries, parsing is trivial.
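With a single container production, the nesting can even be tracked without recursion. This sketch swaps in a different technique, an explicit stack of dicts: `KEYWORD {` pushes a new dict, and `} NAME;` pops it into its parent under NAME. It assumes whitespace-insensitive input and only the base types in the sample; `parse_dds` is a hypothetical name.

```python
import re

# Assumed base types (only those in the sample); float and float64 both map to float.
BASE_TYPES = {"int": int, "int32": int, "float": float, "float64": float, "string": str}
CONTAINERS = {"dataset", "sequence", "structure"}


def parse_dds(text):
    """Parse nested declarations without recursion, keeping open blocks on a stack."""
    tokens = re.findall(r"\w+|\S", text)
    stack = [{}]  # stack[-1] is the dict currently being filled
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt, after = (tokens[i + 1:i + 3] + [None, None])[:2]
        is_name = nxt is not None and re.fullmatch(r"\w+", nxt) is not None
        if tok in CONTAINERS and nxt == "{":
            stack.append({})  # KEYWORD "{" opens a new, empty dict
            i += 2
        elif tok == "}" and is_name and after == ";" and len(stack) > 1:
            block = stack.pop()  # "}" NAME ";" closes it and files it under NAME
            stack[-1][nxt] = block
            i += 3
        elif tok in BASE_TYPES and is_name and after == ";":
            stack[-1][nxt] = BASE_TYPES[tok]  # TYPE NAME ";" adds one leaf
            i += 3
        else:
            raise ValueError(f"unexpected token {tok!r} at position {i}")
    if len(stack) != 1:
        raise ValueError("unbalanced braces")
    return stack[0]


DDS = """dataset {
    int catalog_number;
    sequence {
        string experimenter;
        int32 time;
        structure {
            float64 latitude;
            float64 longitude;
        } location;
        sequence {
            float depth;
            float temperature;
        } xbt;
    } casts;
} data;"""

tree = parse_dds(DDS)
```

The outer name stays as a key, so `tree["data"]` is the requested dictionary with type objects in place of None. The trade-off is that the grammar now lives implicitly in the branch conditions, so error messages are less specific than with one function per rule.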

Terry J. Reedy
Jul 18 '05 #2
"Terry Reedy" <tj*****@udel.edu> wrote in message news:<au********************@comcast.com>...
> I suspect that what you actually want to do is parse structures 'like'
> the above, as defined by a grammar not shown ;-)
Yes, you're right. :)

The grammar is not complex, but I'm still struggling to process the
result tree.
> You did not specify whether you will get such files from an
> uncontrollable external source or whether you control the input format.
> If the latter, there is no obvious reason for separate dataset,
> sequence, and structure productions, since all three result in
> dictionaries with no functional difference.
This is a Dataset Descriptor for the Data Access Protocol
(http://www.unidata.ucar.edu/packages...dap-rfc-html/), an
API to access remote datasets. DAP servers describe their datasets
using this grammar, and I'm developing a module to access DAP servers.
>> I want to obtain a dictionary like this:
>> pprint.pprint(data)
>>
>> {'casts': {'experimenter': None,
>>            'location': {'latitude': None, 'longitude': None},
>>            'time': None,
>>            'xbt': {'depth': None, 'temperature': None}},
>>  'catalog_number': None}
>> The values ('None') will be filled later.


> Using None as placeholders either tosses the type information or
> requires that it be recorded elsewhere. Use the int and float type
> objects instead. Note that standard Python cannot differentiate
> between float and float64.


Ok. One of the strong points of DAP is that data is retrieved only for
your region/period of interest. I created a class and redefined
__getitem__ so that data is only retrieved from the server when the
object is sliced.
# file() here is the DAP module's own opener, not Python's builtin
data = file("http://dods.gso.uri.edu/cgi-bin/nph-nc/data/fnoc1.nc")
print(data.variables['lat'].shape)
# (17,)
print(data.variables['lat'][1:4])  # only this subset is retrieved
# [ 47.5 45. 42.5 40. ]
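The lazy-slicing idea can be sketched in a few lines of Python. Everything here is hypothetical: `LazyVariable` is not Roberto's class, and `fetch` stands in for whatever request the module actually sends to the DAP server. It is assumed to take a Python-style half-open range; translating to the server's own range convention would happen inside `fetch`.

```python
class LazyVariable:
    """Array-like proxy that requests data only when indexed (hypothetical)."""

    def __init__(self, name, length, fetch):
        self.name = name
        self.shape = (length,)
        self._fetch = fetch  # callable(name, start, stop, step) -> list of values

    def __getitem__(self, key):
        length = self.shape[0]
        if isinstance(key, slice):
            # slice.indices() clips the bounds and resolves negatives and None.
            start, stop, step = key.indices(length)
            return self._fetch(self.name, start, stop, step)
        if isinstance(key, int):
            index = key + length if key < 0 else key
            if not 0 <= index < length:
                raise IndexError(f"{self.name} index {key} out of range")
            return self._fetch(self.name, index, index + 1, 1)[0]
        raise TypeError(f"unsupported index {key!r}")


# Stand-in for the remote server: made-up values plus a log of every request.
FAKE_SERVER = {"lat": [float(i) for i in range(17)]}
requests = []


def fake_fetch(name, start, stop, step):
    requests.append((name, start, stop, step))
    return FAKE_SERVER[name][start:stop:step]


lat = LazyVariable("lat", 17, fake_fetch)
print(lat.shape)  # (17,); answered locally, nothing fetched
print(lat[1:4])   # [1.0, 2.0, 3.0]; only these three values are fetched
```

Keeping `shape` as plain metadata means inspecting a variable never triggers a transfer; only `__getitem__` talks to the server.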
> I know nothing of SimpleParse (and therefore, of what would be
> different). If the grammar is as simple as I infer from the sample --
> dataset and sequences containing sequences, structures, and types -- I
> would reread about recursive-descent parsing and maybe try that. The
> type_entry function would return a (name, typeobject) pair and the
> structure, sequence, and dataset functions a (name, dict) pair.
Yes, it's very simple. As you see, even a structure is identical to a
sequence. The declarations are basically "types" or declarations
containing "types". Do you think it can be done without 3rd party
modules?
> But as hinted above, I would think about simplifying the grammar
> before worrying about parsing. If you only have sequences of
> sequences and type entries, parsing is trivial.


I'll take a look at that. Thanks very much for the insights.

Regards,

Roberto
Jul 18 '05 #3
