ASCII file parser - to read between brackets ()


Hi,
My ascii file is not exactly a comma separated file. The following is
a small but complete example of such a file. (This is the ISCAS circuit
file format that I need to read in.)

----------- Example c17.bench ----------------
INPUT(1)
INPUT(2)
INPUT(3)
INPUT(6)
INPUT(7)

OUTPUT(22)
OUTPUT(23)

10 = NAND(1, 3)
11 = NAND(3, 6)
16 = NAND(2, 11)
19 = NAND(11, 7)
22 = NAND(10, 16)
23 = NAND(16, 19)
----------------End of Example ----------------

I would like to note that the numbers can also be replaced by other
symbols, so they should be treated as strings and not as numbers.
E.g. "1" can also be "N_1" or "Node_1", or any other string
representation.

Now I would like to know what should be my approach to reading in this
file, i.e. the algorithm.

Off the top of my head I think I would just have to read in each line
as a string. Then I would search the string for various keywords. On
finding a keyword I would then find the location of the two brackets ()
- and then parse the values between them.

I am wondering if this approach is the right way to go.
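
For concreteness, a minimal C++ sketch of that line-by-line idea might look like the following. The `BenchLine` struct and the `trim` and `parseLine` helpers are hypothetical names made up for illustration, and the sketch assumes one statement per line with a single pair of brackets and comma-separated names inside them.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Strip leading and trailing whitespace.
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const std::size_t first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// One parsed line of the file (hypothetical layout, for illustration only).
struct BenchLine {
    std::string target;             // name left of '=', empty for INPUT/OUTPUT
    std::string keyword;            // "INPUT", "OUTPUT", "NAND", ...
    std::vector<std::string> args;  // names found between the brackets
};

// Returns false for blank lines and for lines without a "(...)" pair.
bool parseLine(const std::string& line, BenchLine& out) {
    const std::size_t open = line.find('(');
    const std::size_t close = line.rfind(')');
    if (open == std::string::npos || close == std::string::npos || close < open)
        return false;

    out = BenchLine();
    std::string head = line.substr(0, open);
    const std::size_t eq = head.find('=');
    if (eq != std::string::npos) {
        out.target = trim(head.substr(0, eq));
        head = head.substr(eq + 1);
    }
    out.keyword = trim(head);

    // Split the text between the brackets on commas; names stay strings.
    const std::string inside = line.substr(open + 1, close - open - 1);
    std::size_t start = 0;
    while (start <= inside.size()) {
        std::size_t comma = inside.find(',', start);
        if (comma == std::string::npos) comma = inside.size();
        const std::string arg = trim(inside.substr(start, comma - start));
        if (!arg.empty()) out.args.push_back(arg);
        start = comma + 1;
    }
    return !out.keyword.empty();
}
```

Each line read with `std::getline` would be passed to `parseLine`; lines such as the "Example" and "End of Example" separators contain no brackets and are simply rejected.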

Thanks a lot guys,
O.O.

Feb 15 '06 #1
ol*******@yahoo.it wrote:
Hi,
My ascii file is not exactly a comma separated file. [...]

Now I would like to know what should be my approach to reading in this
file, i.e. the algorithm.

Off the top of my head I think I would just have to read in each line
as a string. Then I would search the string for various keywords. On
finding a keyword I would then find the location of the two brackets ()
- and then parse the values between them.

I am wondering if this approach is the right way to go.


Sounds fine. I don't see any C++ relation, however. Please don't just
say that you're "writing it in C++". The algorithm you've described can
just as easily be written in almost any other language. Did you mean to
post it to 'comp.programming'?

V
--
Please remove capital As from my address when replying by mail
Feb 15 '06 #2
TB
ol*******@yahoo.it said:
Hi,
My ascii file is not exactly a comma separated file. [...]

Off the top of my head I think I would just have to read in each line
as a string. Then I would search the string for various keywords. On
finding a keyword I would then find the location of the two brackets ()
- and then parse the values between them.


Tokenize the input before parsing.

--
TB @ SWEDEN
Feb 15 '06 #3
<ol*******@yahoo.it> wrote in message
news:11**********************@o13g2000cwo.googlegroups.com...
: Hi,
: My ascii file is not exactly a comma separated file. [...]
:
: Off the top of my head I think I would just have to read in each line
: as a string. Then I would search the string for various keywords. On
: finding a keyword I would then find the location of the two brackets ()
: - and then parse the values between them.
:
: I am wondering if this approach is the right way to go.

There are several ways in which this can be accomplished.
But because I don't know the complete 'grammar' of the file,
I am not sure which would be the most appropriate
(e.g. I assume there is not only NAND, but also XOR etc.
Can a more complex expression be used? Unary NOT?)

In any case, rather than parsing each line manually, you
could use one of the existing lexers or parser generators,
such as flex (with or without bison; a bit old-fashioned,
but it works - http://www.gnu.org/software/flex/),
or boost::spirit (http://www.boost.org/libs/spirit/index.html).

If the files are simple enough, a regular-expressions package
might be an alternative for extracting the needed identifiers from
each line (e.g. http://www.boost.org/libs/regex/doc/index.html).
These are among a number of other options...
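
As a rough illustration of the regular-expression route, here is a small sketch. It uses `std::regex` from C++11, which postdates this thread; Boost.Regex offers a similar iterator interface. The `extractNames` helper is a hypothetical name, and the pattern assumes that names consist only of letters, digits and underscores.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every identifier-like run on the line,
// in order of appearance (gate keywords such as NAND are included).
std::vector<std::string> extractNames(const std::string& line) {
    // Assumption: a name is one or more letters, digits or underscores.
    static const std::regex name("[A-Za-z0-9_]+");
    std::vector<std::string> names;
    for (std::sregex_iterator it(line.begin(), line.end(), name), end;
         it != end; ++it) {
        names.push_back(it->str());
    }
    return names;
}
```

For a gate line, the first element would be the output name, the second the gate type, and the rest the inputs; for INPUT/OUTPUT lines the keyword comes first. Telling the two apart is left to the caller.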
hth-Ivan
--
http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form
Brainbench MVP for C++ <> http://www.brainbench.com
Feb 15 '06 #4
Dear Victor,
Thanks for responding. I forgot to mention in my post that I am
dealing with C++. I know that my algorithm was general, but sometimes a
certain language has features that handle this situation
differently. E.g. I had heard of RegEx's in Perl, and I thought I
could not use them in C++. Also, I did not know what "tokenize" means;
I only learnt that after TB suggested it.
That's what I was looking for.
Thanks,
O.O.

Feb 15 '06 #5
Thanks TB. I think this is what I was looking for. I think my file is
simple enough that I don't need to use RegEx's. I have found some
code in the example at http://www.codeproject.com/cpp/stringtok.asp
that I think will be useful to me.
Regards,
O.O.

Feb 15 '06 #6
ol*******@yahoo.it wrote:
> Thanks for responding. I forgot to mention in my post that I am
> dealing with C++.

That's what I was afraid of...

> I know that my algorithm was general, but sometimes a
> certain language may have some features to handle this situation
> differently. E.g. I had heard of RegEx's in perl, and I thought I
> could not use them in C++.

They are not part of the language yet. As soon as you see the TR1
implemented, you could try using <regex> and whatever it is going to
contain. Until then, alas, no language mechanism to help you except some
very simple ones, like 'string', 'fstream', and others of which you are
probably already aware.

> Also I did not know of what Tokenize means
> which I learnt only after TB suggested it.


"Tokenize" usually means "identify and split the input stream into tokens"
and it can mean _whatever_you_make_it_to_mean_ because it depends entirely
on your definition of "a token".
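
To make that concrete, here is one possible definition of "a token" for this file format, as a sketch only. It assumes a token is either one of the punctuation characters `=`, `(`, `)`, `,` or a maximal run of other non-whitespace characters; the `tokenize` function name is made up for illustration.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Splits a line into tokens. Assumed definition of a token: a single
// punctuation character from "=(),", or a maximal run of any other
// non-whitespace characters (so "N_1" and "Node_1" stay in one piece).
std::vector<std::string> tokenize(const std::string& line) {
    const std::string punct = "=(),";
    std::vector<std::string> tokens;
    std::string current;
    for (char c : line) {
        const bool isSpace = std::isspace(static_cast<unsigned char>(c)) != 0;
        const bool isPunct = punct.find(c) != std::string::npos;
        if (isSpace || isPunct) {
            // A separator ends the name being collected, if any.
            if (!current.empty()) {
                tokens.push_back(current);
                current.clear();
            }
            // Punctuation is kept as a token of its own; whitespace is dropped.
            if (isPunct) tokens.push_back(std::string(1, c));
        } else {
            current += c;
        }
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}
```

With this definition, "16 = NAND(2, 11)" yields the eight tokens 16, =, NAND, (, 2, the comma, 11 and ). Changing the `punct` string changes what counts as a token.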

V
--
Please remove capital As from my address when replying by mail
Feb 15 '06 #7
Thanks Ivan. I have heard of RegEx's, but I have not used them
much. I think I will start with a string tokenizer, and if that becomes
too complicated I will try this method.
O.O.

Feb 15 '06 #8
ol*******@yahoo.it wrote:
Thanks Ivan. I have heard of RegEx's - but I have not used them
much. I think I would start with string tokenizer and if that becomes
too complicated I would attempt this method.
O.O.


most implementations of sscanf will handle this for you no problem.
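
a rough sketch of the sscanf route, for two-input gate lines only. the `scanGate` helper and the 63-character limit on names are assumptions made for this example, and the closing bracket is not checked.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: parses "out = GATE(a, b)" into its four names.
// Returns true only if all four fields were read.
bool scanGate(const char* line, std::string& out, std::string& gate,
              std::string& a, std::string& b) {
    char o[64], g[64], x[64], y[64];
    // %63[^...] reads up to 63 characters that are NOT in the listed set;
    // a space in the format skips any amount of whitespace in the input.
    const int n = std::sscanf(line,
                              " %63[^ \t=] = %63[^ \t(] ( %63[^ \t,] , %63[^ \t)]",
                              o, g, x, y);
    if (n != 4) return false;  // also covers EOF (-1) on an empty line
    out = o;
    gate = g;
    a = x;
    b = y;
    return true;
}
```

lines like INPUT(1) fail the match at the '=' and return false, so they would need a second format string of their own.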

if you don't like sscanf you can use regex.

if you don't like regex you can use lex (probably don't need a parser,
just the scanner should suffice).

if you don't like lex/yacc you can write your own scanner, the grammar
you have there isn't too complex.

if you don't want to write your own scanner you can...
actually that's sort of the problem people have with c++. your options
when dealing with any particular problem are quite literally endless.

perl kind of herds you into trying to approach everything with regex's
and hashes, while vb/.net will get you to buy some prebuilt item. i'm
guessing that's why you came to the c++ group, to find out what approach c++
lends itself to most easily. unfortunately, c++ lends itself to just
about every solution :p
Feb 16 '06 #9
Thanks pillbug. I did not have this insight before.
O.O.

Feb 16 '06 #10
