parsing of initialization files

giulianodammando

While developing a simple numerical simulation program, I need to read
some initialization parameters from a file that looks like:

# Global Setup

species = 1;

\begin{specie}<1>
name = NITROGEN;
aindex = 7;
ionstages = 1;
\begin{ionstage}<1>
nmax = -1;
iindex = 10;
ionDB = XSTAR;
eLevsDB = XSTAR;
bbCollDB = CHIANTI;
bfCollDB = NIST, XSTAR;
bbRadDB = OPIP;
bfRadDB = XSTAR;
ffRadDB = off;
\end{ionstage}<1>
\end{specie}<1>

# Here stay transport calculation
# related options.

# spatial mesh options
\begin{GridOpt}
Grid = Uniform; # uniform spacing
GridStep = 20; # step size in (cm)
\end{GridOpt}

# set physical domain extension
\begin{DomainOpt}
DomainLength = 100; # cm
\end{DomainOpt}

# Specify Boundary Conditions on intensity
\begin{RadBoundaryCond}
RadBoundaryPolicy = ZeroOnBoth; # no radiation
\end{RadBoundaryCond}

# NEEDS_WORK
# set other stuff
\begin{OtherOpt}
edenPolicy = ConstField;
eden = 1e12; # cm^-3
adenPolicy = ConstField;
aden = 1e16; # cm^-3
etemPolicy = ConstField;
\end{OtherOpt}

I'm not a professional programmer, so I've implemented the reading
scheme as a simple token iterator (basically it skips #comments and
tokenizes the rest), which feeds a simple parser.
The parser recognizes four kinds of token:
1. \environment: starts an environment of some kind
2. variable names / comma-separated lists
3. the "=" token, meaning assignment (typically an internal variable is
assigned a numeric/string value or a list of these)
4. the ";" token, marking the end of the right-hand side of an assignment
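For reference, a token iterator like the one described above might look roughly like the following. This is a minimal sketch, not the code from the post: the function name `tokenize` is hypothetical, and it assumes that the single characters = ; , { } < > are tokens of their own, that '#' starts a comment running to the end of the line, and that everything else separated by white space (names, numbers, \begin, \end) is one word.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical token iterator: skips '#' comments and white space and returns
// the rest of the input as a flat list of tokens.
std::vector<std::string> tokenize(std::istream &in) {
    const std::string punct = "=;,{}<>";  // assumed single-character tokens
    std::vector<std::string> tokens;
    std::string word;
    auto flush = [&] {  // emit the word collected so far, if any
        if (!word.empty()) {
            tokens.push_back(word);
            word.clear();
        }
    };
    char c;
    while (in.get(c)) {
        if (c == '#') {  // comment: discard the rest of the line
            flush();
            std::string rest;
            std::getline(in, rest);
        } else if (std::isspace(static_cast<unsigned char>(c))) {
            flush();
        } else if (punct.find(c) != std::string::npos) {
            flush();
            tokens.push_back(std::string(1, c));
        } else {
            word += c;  // part of a name, number or \command
        }
    }
    flush();
    return tokens;
}
```

Numbers such as 1e12 or -1 come out as plain word tokens; converting them is left to whatever consumes the token list.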

This results in about 2000 lines of code, because of an enormous switch
statement testing for each valid option (and reporting an error when
something is wrong).
I'd like to use a more flexible approach, but I have no idea where to
start. It would also be useful to be able to read small matrices and
vectors of floats (to restart a simulation from a previously
interrupted one).
So I need something closer to a grammar-driven parsing approach, but
again I don't know where to begin.
I would appreciate any suggestions or pointers to resources (available
libraries, documents) useful for solving this design problem.
Thanks

Sep 18 '06 #1


Michael

> While developing a simple numerical simulation program, I need to read
> some initialization parameters from a file that looks like:

(bunch of stuff deleted)

> I'd like to use a more flexible approach, but I have no idea where to
> start.

Two answers, depending on your parameters:
1) If the format of the parameter file is fixed, and you're just
looking to write the grammar for it, check out Lex and YACC. Those are
standard parser generator tools. Here are links to GNU's
implementations (called Flex and Bison, respectively):
http://flex.sourceforge.net/
http://www.gnu.org/software/bison/

2) If you have control over the parameter file format, consider
changing it to some kind of XML-based thing. XML is designed to be
more parseable.
So things like
\begin{specie}<1>
name = NITROGEN;
\begin{ionstage}<1>
nmax = -1;
\end{ionstage}<1>
\end{specie}<1>

would become
<specie>
<name>NITROGEN</name>
<ionstage>
<nmax>-1</nmax>
</ionstage>
</specie>

In that case, look at xerces (http://xml.apache.org). Unless I'm
really worried about the size of these files, I tend to use a DOM-based
approach to reading them, rather than an event-based approach, because
the client code is easier.

Good luck with your project.

Michael

Sep 18 '06 #2

Jerry Coffin

In article <11**********************@h48g2000cwc.googlegroups .com>,
gi**************@libero.it says...

> While developing a simple numerical simulation program, I need to read
> some initialization parameters from a file that looks like:

[ ... example elided ]

The two obvious options for a parser are top-down and bottom-up. A
bottom-up parser is typically written with a tool like yacc. You supply
it with a description of the grammar, and it creates a table-driven
parser. The grammar description is usually something like a warped
version of BNF. For better or worse, it's a language all its own that's
probably not topical here, but (even though you're not technically
writing a compiler) almost certainly would be in comp.compilers. Given
the small size and complexity of your grammar, you might also want to
consider using Boost::Spirit, which is a parser generator written as a
set of C++ templates.

There's no theoretical reason you couldn't create top-down parsers the
same way, but most top-down parsers are written by hand, using recursive
descent. The basic idea is that you still have a grammar, but you
basically write a single function to recognize each non-terminal in your
grammar. Terminals in the grammar are normally recognized directly in
the lexer.

Glancing through your example, it looks like the grammar should come out
something like this.

file: statements

statements: statement | statement statements
statement: assignment | environment

assignment: variable '=' value ';'

environment: header statements footer
header: '\begin{' NAME '}<' NUMBER '>'
footer: '\end{' NAME '}<' NUMBER '>'

For the moment I've made this right-recursive (e.g. the definition of
statements). If you decide to use a bottom-up parser, you'll want to
make those rules left-recursive.
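As a rough illustration of the recursive-descent approach applied to this grammar, here is a minimal sketch with one member function per non-terminal. Several assumptions are not in the post: the input is an already-tokenized std::vector<std::string> in which \begin and \end arrive as single tokens; 'value' is taken to be one or more words separated by commas; the <NUMBER> part of header/footer is treated as optional, since GridOpt and the other option blocks in the example file have none; and the class name ConfigParser and its dotted-key result map are hypothetical. The right-recursive 'statements' rule becomes a plain loop.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical recursive-descent parser for the grammar above, one member
// function per non-terminal. Results are stored under dotted keys such as
// "specie<1>.name".
class ConfigParser {
public:
    explicit ConfigParser(std::vector<std::string> toks) : toks_(std::move(toks)) {}

    // file: statements
    std::map<std::string, std::vector<std::string>> parse_file() {
        statements("");
        if (pos_ < toks_.size()) throw std::runtime_error("unexpected " + peek());
        return values_;
    }

private:
    std::vector<std::string> toks_;
    std::size_t pos_ = 0;
    std::map<std::string, std::vector<std::string>> values_;

    std::string peek() const { return pos_ < toks_.size() ? toks_[pos_] : "<eof>"; }
    std::string next() { std::string t = peek(); ++pos_; return t; }
    void expect(const std::string &t) {
        if (next() != t) throw std::runtime_error("expected " + t);
    }

    // statements: statement | statement statements  (right recursion as a loop)
    void statements(const std::string &scope) {
        while (pos_ < toks_.size() && peek() != "\\end") statement(scope);
    }
    // statement: assignment | environment
    void statement(const std::string &scope) {
        if (peek() == "\\begin") environment(scope);
        else assignment(scope);
    }
    // assignment: variable '=' value ';'  (value assumed: words separated by ',')
    void assignment(const std::string &scope) {
        std::string var = next();
        expect("=");
        std::vector<std::string> &v = values_[scope + var];
        v.push_back(next());
        while (peek() == ",") { next(); v.push_back(next()); }
        expect(";");
    }
    // environment: header statements footer
    void environment(const std::string &scope) {
        std::string name = tag("\\begin");
        statements(scope + name + ".");
        if (tag("\\end") != name) throw std::runtime_error("mismatched \\end for " + name);
    }
    // header/footer: KW '{' NAME '}' [ '<' NUMBER '>' ]  (<...> assumed optional)
    std::string tag(const std::string &kw) {
        expect(kw);
        expect("{");
        std::string name = next();
        expect("}");
        if (peek() == "<") { next(); name += "<" + next() + ">"; expect(">"); }
        return name;
    }
};
```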

I also haven't defined 'variable' or 'value' for the moment, because I'd
make that part of the parser data driven. Instead of the case statements
you used, I'd create a map that contained all allowable variable names,
and with each I'd associate a mini-parser that knew how to parse the
specific type of data that can be assigned to that variable. These would
all descend from a common base class that is passed (for example) a
string containing the raw data from the stream, up to, but not including,
the semicolon.
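A minimal sketch of that table-driven idea. All names here are hypothetical: the base class value_parser, the two example mini-parsers (one for a single number, one for a keyword from a fixed set), and the assign() helper; each mini-parser is assumed to write into a variable owned by the caller.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>

// Common base class: each mini-parser gets the raw right-hand side of an
// assignment (the text after '=' up to, but not including, the ';').
struct value_parser {
    virtual ~value_parser() = default;
    virtual void parse(const std::string &raw) = 0;
};

// Hypothetical mini-parser for a single number, e.g. "eden = 1e12;".
struct double_parser : value_parser {
    double &target;
    explicit double_parser(double &t) : target(t) {}
    void parse(const std::string &raw) override {
        std::istringstream in(raw);
        double v;
        std::string extra;
        if (!(in >> v) || (in >> extra)) throw std::runtime_error("not a number: " + raw);
        target = v;
    }
};

// Hypothetical mini-parser for one keyword out of a fixed set, e.g. "Grid = Uniform;".
struct keyword_parser : value_parser {
    std::string &target;
    std::set<std::string> allowed;
    keyword_parser(std::string &t, std::set<std::string> a) : target(t), allowed(std::move(a)) {}
    void parse(const std::string &raw) override {
        std::istringstream in(raw);
        std::string word, extra;
        if (!(in >> word) || (in >> extra) || allowed.count(word) == 0)
            throw std::runtime_error("bad value: " + raw);
        target = word;
    }
};

// The table that replaces the big switch: option name -> mini-parser.
using option_table = std::map<std::string, std::unique_ptr<value_parser>>;

void assign(option_table &table, const std::string &name, const std::string &raw) {
    auto it = table.find(name);
    if (it == table.end()) throw std::runtime_error("unknown option: " + name);
    it->second->parse(raw);
}
```

Adding a new option then means adding one table entry instead of another case label, and unknown names are reported in one place.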

Some (maybe most) of those could be driven by external data as well.
For example, for:

name = NITROGEN;

you could use something like this:

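#include <algorithm>
#include <cctype>
#include <fstream>
#include <iterator>
#include <set>
#include <string>

struct parser { virtual ~parser() = default; };  // stand-in for the common base class

std::string make_lower(std::string s) {  // lower-case copy, for case-insensitive lookup
    for (char &c : s) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

class name_parser : public parser {
    std::set<std::string> elements;  // allowable names, read from a file
public:
    explicit name_parser(std::string const &file_name) {
        std::ifstream infile(file_name);
        std::istream_iterator<std::string> input(infile), end;
        std::copy(input, end, std::inserter(elements, elements.end()));
    }
    bool operator()(std::string const &name) const {
        return elements.count(make_lower(name)) != 0;
    }
};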

This would use a file of allowable element names:
hydrogen
helium
lithium
[ ... ]
unuhexium

or, if you want to arrange it like a periodic table, you can do that
[though it really needs to be wider than is convenient on Usenet]:

hydrogen helium
lithium beryllium boron carbon nitrogen oxygen fluorine neon
[ ... ]

Basically, as long as you just have names separated by white space (spaces,
tabs, new-lines) the program doesn't care at all about how the white
space is arranged. For that matter, the order doesn't matter either.
I've put them in atomic order for the moment, but the set will arrange
them in alphabetical order as it reads them -- though, oddly enough,
arranging them in alphabetical order in the file will actually reduce
efficiency.

Of course, you might only allow a subset of the elements, in which case
this file would only list those you allow. If you want to add new
elements as they're discovered, you can do that without changing the
program logic at all.
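The claim that neither layout nor order matters is easy to check if the loading step takes any std::istream rather than a file name. A minimal sketch; load_names is a hypothetical helper, not part of the code above:

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <iterator>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: read white-space-separated names from any stream.
// Spaces, tabs and new-lines are all just separators, and the set sorts
// the names itself, so neither layout nor order in the input matters.
std::set<std::string> load_names(std::istream &in) {
    std::set<std::string> names;
    std::istream_iterator<std::string> first(in), last;
    std::copy(first, last, std::inserter(names, names.end()));
    return names;
}
```

A file-based version would simply pass an std::ifstream opened on the names file.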

--
Later,
Jerry.

The universe is a figment of its own imagination.

Sep 18 '06 #3

Jerry Coffin

In article <MP************************@news.sunsite.dk>,
jc*****@taeus.com says...

[ ... ]

> name_parser(std::string file_name) {
> std::ifstream infile(file_name);
> std::istream_iterator<std::string>(input), end;

Oops. That should be:

std::istream_iterator<std::string> input(infile), end;

Sorry 'bout that.

--
Later,
Jerry.

Sep 18 '06 #4

giulianodammando

Michael wrote:

[ ... ]

> 2) If you have control over the parameter file format, consider
> changing it to some kind of XML-based thing. XML is designed to be
> more parseable.
> In that case, look at xerces (http://xml.apache.org). Unless I'm
> really worried about the size of these files, I tend to use a DOM-based
> approach to reading them, rather than an event-based approach, because
> the client code is easier.

I also think that the XML approach would be more feasible, because
there are a bunch of XML parsers freely available on the net, but I tend
to avoid it because the init file is (at the moment) meant to be edited
by hand, and XML is a bit too verbose (the data are buried under tons of
formatting tags).
Obviously it would be fantastic if one could write a simple HTML form
to fill in the init file transparently from a web browser! I'm aware
that this would be easy for a reasonably skilled programmer, but at the
moment it would be too complicated for me.

Thank you very much, Michael.

Sep 19 '06 #5

giulianodammando

Jerry Coffin wrote:

> In article <11**********************@h48g2000cwc.googlegroups .com>,
> gi**************@libero.it says...

[ ... ]

> ... the small size and complexity of your grammar, you might also want to
> consider using Boost::Spirit, which is a parser generator written as a
> set of C++ templates.

In fact, I was already considering using Spirit as a parsing
framework. Its heavy use of high-level generic programming is giving me
some trouble, but obviously it takes some time to become familiar with
such a complex tool.

> [ ... ]
>
> Glancing through your example, it looks like the grammar should come out
> something like this.
>
> file: statements
>
> statements: statement | statement statements
> statement: assignment | environment
>
> assignment: variable '=' value ';'
>
> environment: header statements footer
> header: '\begin{' NAME '}<' NUMBER '>'
> footer: '\end{' NAME '}<' NUMBER '>'

This is exactly the same grammar I wrote yesterday, taking inspiration
from the manual of my MetaPost installation! It should also allow me to
parse things like:

\begin{vector}<dim>
<number list>
\end{vector}<dim>

\begin{matrix}<dim_1, dim_2>
<n-uple list>
\end{matrix}<dim_1,dim_2>
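One way to read the body of such a block, assuming (the post doesn't say) that the n-uple list is just numbers in row-major order separated by white space and/or commas. read_matrix_body is a hypothetical helper that checks the entry count against the dimensions taken from the header; a vector is simply the 1 x n case.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical reader for the body of a \begin{matrix}<rows, cols> block.
// Assumes row-major numbers separated by white space and/or commas.
std::vector<double> read_matrix_body(const std::string &body,
                                     std::size_t rows, std::size_t cols) {
    std::string cleaned = body;
    for (char &c : cleaned)
        if (c == ',') c = ' ';  // treat commas as plain separators
    std::istringstream in(cleaned);
    std::vector<double> data;
    double x;
    while (in >> x) data.push_back(x);
    if (!in.eof()) throw std::runtime_error("non-numeric entry in matrix body");
    if (data.size() != rows * cols) throw std::runtime_error("wrong number of entries");
    return data;  // element (i, j) is data[i * cols + j]
}
```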

> name = NITROGEN;
>
> You could use something like this:
>
> class name_parser : public parser {
> [ ... ]

And these would be the semantic actions I'd attach to the parsers!
Thank you very much, Jerry; I will try to implement your ideas.

Sep 19 '06 #6

Jerry Coffin

In article <11**********************@b28g2000cwb.googlegroups .com>,
gi**************@libero.it says...

[ ... ]

>> header: '\begin{' NAME '}<' NUMBER '>'
>> footer: '\end{' NAME '}<' NUMBER '>'
>
> This is exactly the same grammar I wrote yesterday, taking inspiration
> from the manual of my MetaPost installation! It should also allow me to
> parse things like:
>
> \begin{vector}<dim>
> <number list>
> \end{vector}<dim>
>
> \begin{matrix}<dim_1, dim_2>
> <n-uple list>
> \end{matrix}<dim_1,dim_2>

With one minor change, it should, anyway: in 'header' and 'footer'
you'd need to change 'NUMBER' to something like 'list', where a list
is defined as a list of numbers:

list: NUMBER | list ',' NUMBER
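For a recursive-descent parser, the left-recursive rule is usually rewritten as a loop, NUMBER { ',' NUMBER }. A minimal sketch that works directly on the text between '<' and '>'; parse_list is a hypothetical name, and a real parser would take its tokens from the lexer instead:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// list: NUMBER | list ',' NUMBER
// Recursive descent can't use the left-recursive rule directly, so it is
// parsed as NUMBER followed by zero or more ", NUMBER" pairs.
std::vector<long> parse_list(const std::string &s) {
    std::vector<long> out;
    std::size_t i = 0;
    auto skip_ws = [&] {
        while (i < s.size() && std::isspace(static_cast<unsigned char>(s[i]))) ++i;
    };
    auto number = [&] {
        skip_ws();
        std::size_t used = 0;
        long v = std::stol(s.substr(i), &used);  // throws std::invalid_argument if no digits
        i += used;
        out.push_back(v);
    };
    number();
    skip_ws();
    while (i < s.size() && s[i] == ',') {
        ++i;
        number();
        skip_ws();
    }
    if (i != s.size()) throw std::runtime_error("unexpected text in list: " + s.substr(i));
    return out;
}
```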

> Thank you very much, Jerry; I will try to implement your ideas.

Glad to help -- I hope things work out well...

--
Later,
Jerry.

Sep 19 '06 #7
