parsing a file..

I need to parse a file which has about 2000 lines, and I'm being told
that reading the file in ascii would be a slower way to do it and so I
need to resort to binary, reading it in large chunks. Can anyone please
explain what this is all about?
Mar 14 '08 #1
broli said:
I need to parse a file which has about 2000 lines, and I'm being told
that reading the file in ascii would be a slower way to do it and so I
need to resort to binary, reading it in large chunks. Can anyone please
explain what this is all about?
Someone's pulling your leg. 2000 lines of text is nothing. Just write the
program so that it's clear, correct, and easy to understand. Then, if and
only if it's too slow (and you should define the "fast enough"/"too slow"
boundary before you start writing the program), it's time to think about
how it might be made faster.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Mar 14 '08 #2
broli said:

<snip>
>But then I was told that "normally we don't read scientific data in
>ascii for accuracy and speed concerns", which made me wonder what was
>so wrong?

The statement!

>I could parse 2000 lines in hardly any time and there was no problem
>with ascii either.
Right. Someone's pulling your leg, or is overly concerned with efficiency
at the expense of development time and clarity. That isn't to say that
efficiency isn't important. But let's just pretend, for the sake of
argument, that you write it /both/ ways, and then you measure. You
discover that the "binary" technique takes 0.025 seconds to process the
2000 data groups, whereas the "text" version takes 0.075 seconds - three
times slower! Surely this is a triumph for binary!

Yeah, right, but who cares? You press ENTER, and then it takes you 0.1
seconds to look up at the screen, and everything's finished, no matter
which one you ran.

Write it clear, simple, and correct. Then worry about speed if and only if
you have to.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Mar 14 '08 #3
In article <4e**********************************@s19g2000prg.googlegroups.com>,
broli <Br*****@gmail.com> wrote:
>I need to parse a file which has about 2000 lines, and I'm being told
>that reading the file in ascii would be a slower way to do it and so I
>need to resort to binary, reading it in large chunks. Can anyone please
>explain what this is all about?
Reading in large chunks is unrelated to whether it's binary or
ascii. Perhaps they meant that character-at-a-time reading with
getchar() is slow, which it is on some systems. You can perfectly
well use fread() on text files.
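
For instance, a minimal sketch (the file name, the buffer size, and the
whitespace-separated-numbers layout are my assumptions, not a confirmed
format):

#include <stdio.h>
#include <stdlib.h>

/* Slurp a text file with one fread(), then convert the numbers with
   strtod(). Sketch only: assumes the whole file fits in the buffer. */
int main(void)
{
    static char buf[65536];
    char *p, *end;
    double d, total = 0.0;
    size_t n;
    FILE *fp = fopen("data.txt", "rb");

    if (fp == NULL) {
        perror("data.txt");
        return EXIT_FAILURE;
    }
    n = fread(buf, 1, sizeof buf - 1, fp);
    fclose(fp);
    buf[n] = '\0';          /* terminate so strtod can walk the buffer */

    p = buf;
    for (;;) {
        d = strtod(p, &end);
        if (end == p)
            break;          /* no more numbers */
        total += d;
        p = end;
    }
    printf("sum = %f\n", total);
    return 0;
}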

-- Richard

--
:wq
Mar 14 '08 #4
Chris Dollin said:
Richard Heathfield wrote:
<snip>
>>
Someone's pulling your leg. 2000 lines of text is nothing. Just write
the program so that it's clear, correct, and easy to understand. Then,
if and only if it's too slow (and you should define the "fast
enough"/"too slow" boundary before you start writing the program), it's
time to think about how it might be made faster.

I agree that speed is unlikely to be a factor -- but accuracy may be.
Possibly, but that comes under correctness, not performance.

<snip>
After all, if they want to read those 2000 lines 1000 times per second
...
...and that is covered by "fast enough/too slow". Again, I would emphasise
that the first priority is to make the program *clear* (because it's
easier to make a clear program correct than to make a correct program
clear). The second priority (and a sine qua non, obviously) is to make the
program *correct*. When and only when it works, it's time to worry about
speed. (This obviously does *not* mean that one should intentionally adopt
gross algorithmic inefficiencies.)

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Mar 14 '08 #5
Richard Heathfield,

There are many modules involved in my software package and this is
just one of them. My software would also involve a huge number of
calculations, searching, memory allocation, etc., but the thing is
that I have to parallelize the software code to run on different
machines anyway. Even if speed is an issue, I doubt that reading a
file in ascii or "binary" would make a huge impact overall.
Mar 14 '08 #6
broli said:

<snip>
>But when I use fgets() then wouldn't I get a string
>of characters (also many tabs, null characters, etc.)?
Yes.
>Wouldn't it be a
>difficult task to convert an array of characters into double-type
>floating-point numbers again?
I don't see that you have any choice. If what you've described is correct,
the numbers are already in text form. Converting is easy enough, though,
using strtod.
>I think using fread will make it very fast
>(considering that it allows you to read as many bytes of data at a
>time as you want) but once again I'm not very adept at file handling,
>as I'm just at the beginning stages.
It's very likely that the input stream is buffered, so it won't actually
make much, if any, difference.
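
For example, the strtod() conversion might look like this (a sketch; the
file name and the three-numbers-per-line layout are assumptions):

#include <stdio.h>
#include <stdlib.h>

/* Read lines with fgets() and convert each one's numbers with strtod(),
   which reports exactly where each conversion stopped. */
int main(void)
{
    FILE *fp = fopen("zeus.dat", "r");
    char line[256];

    if (fp == NULL) {
        perror("zeus.dat");
        return EXIT_FAILURE;
    }
    while (fgets(line, sizeof line, fp) != NULL) {
        char *p = line, *end;
        double v[3];
        int i;

        for (i = 0; i < 3; i++) {
            v[i] = strtod(p, &end);
            if (end == p)
                break;      /* fewer numbers on this line than expected */
            p = end;
        }
        if (i == 3)
            printf("%f %f %f\n", v[0], v[1], v[2]);
    }
    fclose(fp);
    return 0;
}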

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Mar 14 '08 #7
ri*****@cogsci.ed.ac.uk (Richard Tobin) writes:
In article <4e**********************************@s19g2000prg.googlegroups.com>,
broli <Br*****@gmail.com> wrote:
>>I need to parse a file which has about 2000 lines, and I'm being told
>>that reading the file in ascii would be a slower way to do it and so I
>>need to resort to binary, reading it in large chunks. Can anyone please
>>explain what this is all about?

Reading in large chunks is unrelated to whether it's binary or
ascii.
I would question that statement. Reading in binary will be a LOT faster,
if it's the same platform, for reading in the same NUMBER of readings.
Perhaps they meant that character-at-a-time reading with
getchar() is slow, which it is on some systems. You can perfectly
well use fread() on text files.
The text file will be larger. There is a need to parse the ascii text
into the destination formats.

It will be slower in the great majority of cases.
>
-- Richard
Mar 14 '08 #8
Richard wrote:
ri*****@cogsci.ed.ac.uk (Richard Tobin) writes:
>In article <4e**********************************@s19g2000prg.googlegroups.com>,
broli <Br*****@gmail.com> wrote:
>>>I need to parse a file which has about 2000 lines, and I'm being told
>>>that reading the file in ascii would be a slower way to do it and so I
>>>need to resort to binary, reading it in large chunks. Can anyone please
>>>explain what this is all about?

Reading in large chunks is unrelated to whether it's binary or
ascii.

I would question that statement. Reading in binary will be a LOT faster,
if it's the same platform, for reading in the same NUMBER of readings.
> Perhaps they meant that character-at-a-time reading with
getchar() is slow, which it is on some systems. You can perfectly
well use fread() on text files.

The text file will be larger. There is a need to parse the ascii text
into the destination formats.

It will be slower in the great majority of cases.
Quick test, one file, 2000 lines, each line with two floats (1.12345
and 7.890), about 28KB total.

One single big-enough fread:

real 0m0.002s
user 0m0.000s
sys 0m0.001s

Repeat fscanf( ... "%lf %lf" ... ) until EOF:

real 0m0.004s
user 0m0.002s
sys 0m0.002s

Yes, in this test it's twice as slow. The data file is probably
cached (it's been read several other times already as I /cough/
debugged my code). It includes program start-up time (I just did
`time ./a.out` to get the numbers) so the actual reading time will
be less.

Myself I wouldn't count that as "LOTS faster" for binary data,
but doubtless there are applications where it is so counted;
I don't think the OP's case is one of them, and it does look as
though he's reading a text file anyway.
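
The fscanf() side of the test had roughly this shape (a sketch of the
general pattern, not the exact test code):

#include <stdio.h>

/* Read pairs of doubles until fscanf() stops matching two of them. */
int main(void)
{
    FILE *fp = fopen("data.txt", "r");
    double a, b, sum = 0.0;

    if (fp == NULL)
        return 1;
    while (fscanf(fp, "%lf %lf", &a, &b) == 2)
        sum += a + b;
    fclose(fp);
    printf("%f\n", sum);
    return 0;
}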

--
"Creation began." - James Blish, /A Clash of Cymbals/

Hewlett-Packard Limited registered office: Cain Road, Bracknell,
registered no: 690597 England Berks RG12 1HN

Mar 14 '08 #9
In article <fr**********@registered.motzarella.org>,
Richard <de***@gmail.com> wrote:
>Reading in large chunks is unrelated to whether it's binary or
ascii.
>I would question that statement. Reading in binary will be a LOT faster,
>if it's the same platform, for reading in the same NUMBER of readings.
I didn't say whether it's in binary is unrelated to *speed*.

I meant: there are two separate issues; whether you read it in large
chunks, and whether it's binary. You can read each of text or binary
in small or large chunks. Each of these choices will separately affect
the speed.

-- Richard
--
:wq
Mar 14 '08 #10
ri*****@cogsci.ed.ac.uk (Richard Tobin) wrote:
In article <fr**********@registered.motzarella.org>,
Richard <de***@gmail.com> wrote:
Reading in large chunks is unrelated to whether it's binary or
ascii.
I would question that statement. Reading in binary will be a LOT faster,
if it's the same platform, for reading in the same NUMBER of readings.

I didn't say whether it's in binary is unrelated to *speed*.

I meant: there are two separate issues; whether you read it in large
chunks, and whether it's binary. You can read each of text or binary
in small or large chunks. Each of these choices will separately affect
the speed.
Besides, he _has_ a text file. Yes, it's a lot larger than a binary file
would be, and therefore slower to read. But the fact that the _file_ is
text is not the OP's doing. Reading this file as text or as binary won't
make a large difference. _Writing_ it as a binary file would have; but
that's not something the OP can do.

Richard
Mar 14 '08 #11

"Chris Dollin" <ch**********@hp.comwrote in message
news:fr**********@news-pa1.hpl.hp.com...
<snip>
Quick test, one file, 2000 lines, each line with two floats (1.12345
and 7.890), about 28KB total.

One single big-enough fread:

real 0m0.002s
user 0m0.000s
sys 0m0.001s

Repeat fscanf( ... "%lf %lf" ... ) until EOF:

real 0m0.004s
user 0m0.002s
sys 0m0.002s

Yes, in this test it's twice as slow. The data file is probably
cached (it's been read several other times already as I /cough/
My own tests:

(A) 100,000 lines of text, each with 3 doubles (2900000 bytes):

2.1 seconds to read a number at a time, using sscanf() (but I use a wrapper
or two with some extra overhead)

(B) The same data as 300,000 doubles written as binary (2400000 bytes):

0.8 seconds to read a number at a time, using fread() 8 bytes at a time

(C) Same binary data as (B)

0.004 seconds to read as a single block into memory (possibly straight into
the array or whatever data structure is used). Using fread() on 2400000
bytes.

So about 200-500 times faster in binary mode, when done properly.
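
Case (C) was essentially this (a sketch; the file name is invented, and
it assumes the file was written with fwrite() on the same platform, so
the byte representation of double matches):

#include <stdio.h>
#include <stdlib.h>

/* One fread() straight into an array of 300,000 doubles. */
int main(void)
{
    size_t n = 300000, got;
    double *a = malloc(n * sizeof *a);
    FILE *fp = fopen("data.bin", "rb");

    if (a == NULL || fp == NULL)
        return EXIT_FAILURE;
    got = fread(a, sizeof *a, n, fp);
    printf("read %lu of %lu doubles\n",
           (unsigned long)got, (unsigned long)n);
    fclose(fp);
    free(a);
    return 0;
}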

--
Bart

Mar 14 '08 #12
ri*****@cogsci.ed.ac.uk (Richard Tobin) writes:
In article <fr**********@registered.motzarella.org>,
Richard <de***@gmail.com> wrote:
>>Reading in large chunks is unrelated to whether it's binary or
ascii.
>>I would question that statement. Reading in binary will be a LOT faster,
>>if it's the same platform, for reading in the same NUMBER of readings.

I didn't say whether it's in binary is unrelated to *speed*.
I'm not sure that parses :-;
>
I meant: there are two separate issues; whether you read it in large
chunks, and whether it's binary. You can read each of text or binary
in small or large chunks. Each of these choices will separately affect
the speed.
Yes, I agree.
>
-- Richard
Mar 14 '08 #13
Bartc wrote:
) My own tests:
)
) (A) 100,000 lines of text, each with 3 doubles (2900000 bytes):
)
) 2.1 seconds to read a number at a time, using sscanf() (but I use a wrapper
) or two with some extra overhead)
)
) (B) The same data as 300,000 doubles written as binary (2400000 bytes):
)
) 0.8 seconds to read a number at a time, using fread() 8 bytes at a time
)
) (C) Same binary data as (B)
)
) 0.004 seconds to read as a single block into memory (possibly straight into
) the array or whatever datastructure is used). Using fread() on 2400000
) bytes.
)
) So about 200-500 times faster in binary mode, when done properly.

Have you tried reading the text file into memory as a single block
and then using sscanf() to parse it?
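
Something along these lines, say (a rough sketch; the %n conversion
records how many characters each sscanf() call consumed, so the scan
position can advance instead of rescanning from the start):

#include <stdio.h>

int main(void)
{
    const char buf[] = "1.12345 7.890\n2.5 3.5\n";
    const char *p = buf;
    double d;
    int used;

    while (sscanf(p, "%lf%n", &d, &used) == 1) {
        printf("got %f\n", d);
        p += used;
    }
    return 0;
}

(On a really large buffer, strtod() with an advancing pointer does the
same job without the %n bookkeeping.)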
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
Mar 14 '08 #14
Chris Dollin <ch**********@hp.com> writes:
<snip>

>Myself I wouldn't count that as "LOTS faster" for binary data,
>but doubtless there are applications where it is so counted;
>I don't think the OP's case is one of them, and it does look as
>though he's reading a text file anyway.
Then why not take the static noise out? Make the file a lot bigger and
report back.

But even these results do indicate quite a large % difference...

And we do not know how often this data sample is written or read. It
could be thousands of times an hour, leading to considerable unnecessary
overhead if using ascii over binary.
Mar 14 '08 #15
"Bartc" <bc@freeuk.comwrites:
"Chris Dollin" <ch**********@hp.comwrote in message
news:fr**********@news-pa1.hpl.hp.com...
>Richard wrote:
>>ri*****@cogsci.ed.ac.uk (Richard Tobin) writes:

In article
<4e**********************************@s19g2000p rg.googlegroups.com>,
broli <Br*****@gmail.comwrote:

>I need to parse a file which has about 2000 lines and I'm getting
>told that reading the file in ascii would be a slower way to do it and
>so i need to resort to binary by reading it in large chunks. Can any
>one please explain what is all this about ?

Reading in large chunks is unrelated to whether it's binary or
ascii.

I would question that statement. Reading in binary will be a LOT faster
,if its the same platform. for reading in the same NUMBER of
readings.
>Quick test, one file, 2000 lines, each line with two floats (1.12345
and 7.890), about 28Kb total.

One single big-enough fread:

real 0m0.002s
user 0m0.000s
sys 0m0.001s

Repeat fscanf( ... "%lf %lf" ... ) until EOF:

real 0m0.004s
user 0m0.002s
sys 0m0.002s

Yes, in this test it's twice as slow. The data file is probably
cached (it's been read several other times already as I /cough/

My own tests:

(A) 100,000 lines of text, each with 3 doubles (2900000 bytes):

2.1 seconds to read a number at a time, using sscanf() (but I use a wrapper
or two with some extra overhead)

(B) The same data as 300,000 doubles written as binary (2400000 bytes):

0.8 seconds to read a number at a time, using fread() 8 bytes at a time

(C) Same binary data as (B)

0.004 seconds to read as a single block into memory (possibly straight into
the array or whatever datastructure is used). Using fread() on 2400000
bytes.

So about 200-500 times faster in binary mode, when done properly.
I'm surprised this is even being contested.
Mar 14 '08 #16

"Willem" <wi****@stack.nlwrote in message
news:sl*******************@snail.stack.nl...
Bartc wrote:
) <snip>
) So about 200-500 times faster in binary mode, when done properly.

>Have you tried reading the text file into memory as a single block
>and then using sscanf() to parse it?
No. I would imagine it would add a second or so to the time.

However, I left out the word 'apparently' when quoting the 200+ speed-up
for the binary block. I'm sure the disk cache has a big effect here,
unless my hard drive has a 600MB/sec transfer rate.

--
Bart
Mar 14 '08 #17
In article <fr**********@registered.motzarella.org>,
Richard <de***@gmail.com> wrote:
>I didn't say whether it's in binary is unrelated to *speed*.
>I'm not sure that parses :-;
I didn't say { { whether it's in binary } is unrelated to { *speed* } }.

-- Richard
--
:wq
Mar 14 '08 #18
Richard wrote:
Chris Dollin <ch**********@hp.com> writes:
>Myself I wouldn't count that as "LOTS faster" for binary data,
but doubtless there are applications where it is so counted;
I don't think the OP's case is one of them, and it does look as
though he's reading a text file anyway.

Then why not take the static noise out? Make the file a lot bigger and
report back.
Because 2000 lines was the OP's file size, and for that file size
and context, the difference in timing is unimportant; and because,
life being finite, I'd already spent what time I had available.
But even these results do indicate quite a large % difference...

And we do not know how often this data sample is written or read. It
could be thousands of times an hour, leading to considerable unnecessary
overhead if using ascii over binary.
Yes, and it could be once a day. Or a week. And for all we know -- hey,
if you can invent facts, so can I -- his code will be run on machines
with different floating-point formats, making binary transfer a clear
road to the Pit and text transfer more of a Dragons of Bel'kwinith thing.

--
"Creation began." - James Blish, /A Clash of Cymbals/

Hewlett-Packard Limited registered office: Cain Road, Bracknell,
registered no: 690597 England Berks RG12 1HN

Mar 14 '08 #19
Richard wrote:
"Bartc" <bc@freeuk.comwrites:
>So about 200-500 times faster in binary mode, when done properly.

I'm surprised this is even being contested.
It's not being contested; it's being /quantified/, which is part of
deciding whether it's the right thing to do.

[You can drive along the M4 at 70mph or at 120mph [1]; the latter is
certainly faster.]

[1] And, Just In Case Someone Suspects A Weasel, at a whole bunch of
other speeds as well, including at times 0; I don't /think/ I've
ever had to go negative, though.

--
"It was the dawn of the third age of mankind." /Babylon 5/

Hewlett-Packard Limited registered office: Cain Road, Bracknell,
registered no: 690597 England Berks RG12 1HN

Mar 14 '08 #20
Richard Tobin wrote:
In article <fr**********@aioe.org>,
Mark Bluemel <ma**********@pobox.com> wrote:
>This is fairly clearly a text file, so I can't see why anyone
should consider processing it as binary.

Perhaps the idea is to instead store the data in binary in the file.
As the OP goes on to state "I am using a graphics package
which always produces the .zeus file strictly in the above format", I'm
inclined to doubt that that is an option.
Mar 14 '08 #21
On 14 Mar 2008 at 14:36, Richard wrote:
"Bartc" <bc@freeuk.comwrites:
>My own tests:

<snip>
>So about 200-500 times faster in binary mode, when done properly.

>I'm surprised this is even being contested.
Are you really surprised by *anything* in clc any more?

Leave common sense at the door when you enter clc.

Mar 14 '08 #22
Chris Dollin <ch**********@hp.com> writes:
Richard wrote:
>"Bartc" <bc@freeuk.comwrites:
>>So about 200-500 times faster in binary mode, when done properly.

I'm surprised this is even being contested.

It's not being contested; it's being /quantified/, which is part of
deciding whether whatever is the right thing to do.
See other post. You cannot quantify it without all the criteria.

And faster is faster.
Mar 14 '08 #23
Bartc wrote:
My own tests:

<snip>
So about 200-500 times faster in binary mode, when done properly.
OK.

But given that the OP has the data presented to him in a specific
format, which he has explained and which he has indicated is fixed, as
far as he is aware, this option is not open to him.

We do not know whether there is a need for repeated parsing of this
file - the OP has not told us. If there is, then there _may_ be an
argument for transforming the file into a binary format. Otherwise,
I don't see where binary comes into the problem - the file is a text
file and will need to be read and parsed from the text format.
Mar 14 '08 #24
Richard wrote:

<snip about text vs. binary I/O>
Take the faster way and then the unknowns are a moot point.
The fastest method to do a task isn't always the most appropriate one.
The OP might have reasons for preferring/using text which we do not
know, as he has not specified much detail.

Mar 14 '08 #25

"Bartc" <bc@freeuk.comwrote in message
news:QP*****************@text.news.virginmedia.com ...
><snip>
>So about 200-500 times faster in binary mode, when done properly.
I've done new tests which (i) use sscanf more directly and (ii) allow for
disk caching:

300,000 doubles as text in 100,000 lines read with fgets/sscanf: 1.8 seconds
300,000 doubles as binary, read individually with fread(): 0.8 seconds
300,000 doubles as binary, read as one block with fread: 0.09 seconds

So reading binary directly into a memory array is still nearly 10 times
faster than reading number by number, and twenty times faster than
reading as text.

--
Bart
Mar 14 '08 #26
broli wrote:

<snip>

Why have you posted it again?

You need to allow at least a few hours for responses, and preferably
a couple of days.

Mar 17 '08 #27
Bill Reid wrote:
and
yes, if several people here said "use a 'state machine'", that's actually
a good reason to NOT use a "state machine"...
Each to their own. Here's a quick and hopefully not too dirty state
machine solution anyway:-

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

typedef struct {
double x, y, z;
} Vector;

typedef struct {
Vector V;
} Vertex;

typedef struct {
int v0, v1, v2;
} Triangle;

typedef struct {
int nvert, ntri;
Vertex *Vert;
Triangle *Tri;
} Object;

int main(void)
{

enum { FIRST_LINE,
SECOND_LINE,
THIRD_LINE,
FOURTH_LINE,
VERTICES,
GAP,
TRIANGLES,
LAST_LINE
} state = FIRST_LINE;

Object current_object;
int vertex_count = 0;
int triangle_count = 0;
int gap_line_count = 0;

    FILE *zeus_file = fopen("zeus.dat", "r");
    if (zeus_file == NULL) {
        perror("zeus file open failed");
        exit(EXIT_FAILURE);
    }

while (state != LAST_LINE) {
char dataLine[120]; /* or some more appropriate value */
        if (fgets(dataLine, sizeof dataLine, zeus_file) == NULL) {
perror("zeus file read failed");
exit(EXIT_FAILURE);
}
switch (state) {
case (FIRST_LINE):
/* do nothing */
state = SECOND_LINE;
break;
case (SECOND_LINE):
/* do nothing */
state = THIRD_LINE;
break;
case (THIRD_LINE):
sscanf(dataLine, "%d %*d %d", &current_object.nvert,
&current_object.ntri);
            current_object.Vert =           /* calloc is (count, size) */
                calloc(current_object.nvert, sizeof(Vertex));
            current_object.Tri =
                calloc(current_object.ntri, sizeof(Triangle));
state = FOURTH_LINE;
printf("Getting %d Vertexes and %d Triangles\n",
current_object.nvert, current_object.ntri);
break;
case (FOURTH_LINE):
/* do nothing */
state = VERTICES;
break;
case (VERTICES):
sscanf(dataLine, "%lf %lf %lf",
&(current_object.Vert[vertex_count].V.x),
&(current_object.Vert[vertex_count].V.y),
&(current_object.Vert[vertex_count].V.z));
vertex_count += 1;
if (vertex_count >= current_object.nvert) {
state = GAP;
}
break;
case (GAP):
gap_line_count += 1;
if (gap_line_count >= 2) {
state = TRIANGLES;
}
break;
case (TRIANGLES):
sscanf(dataLine, "%d %d %d",
&(current_object.Tri[triangle_count].v0),
&(current_object.Tri[triangle_count].v1),
&(current_object.Tri[triangle_count].v2));
triangle_count += 1;
if (triangle_count >= current_object.ntri) {
state = LAST_LINE;
}
break;
}
}

printf("VERTEXES :-\n");
for (vertex_count = 0; vertex_count < current_object.nvert;
vertex_count++) {
printf("%d: %f %f %f\n", vertex_count,
current_object.Vert[vertex_count].V.x,
current_object.Vert[vertex_count].V.y,
current_object.Vert[vertex_count].V.z);
}
printf("Triangles :-\n");
for (triangle_count = 0; triangle_count < current_object.ntri;
triangle_count++) {
printf("%d: %d %d %d\n", triangle_count,
current_object.Tri[triangle_count].v0,
current_object.Tri[triangle_count].v1,
current_object.Tri[triangle_count].v2);
}
    free(current_object.Vert);
    free(current_object.Tri);
    fclose(zeus_file);
    return 0;
}
Mar 17 '08 #28
broli said:
So here's my attempt using STATE MACHINE (as many people have
suggested) for reading the .zeus file in its proper format. Please
tell me what the flaws are in this program (please do not point out
obvious things
I have refrained, at your request, from pointing out the obvious problems
with the code.

There are no non-obvious problems that I can see.

If you'd like to know about the obvious problems after all, just say so.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Mar 17 '08 #29
Ben Bacarisse <be********@bsb.me.uk> writes:
broli <Br*****@gmail.com> writes:
>So here's my attempt using STATE MACHINE (as many people have
suggested)

Many people? I think, in this case, it just complicates the program.
Your states are entered in sequence. Parsing the file is just a
sequence of actions, one after the other.
[...]

As one of the people who suggested a state machine, I think you're
right. If the states are purely sequential (state 1 is always
followed by state 2, which is always followed by state 3, etc.), then
an explicit state machine is probably overkill. The state can be
implicitly represented by where you are in the program.

But if you want to use an explicit state machine anyway, *please* give
your states meaningful names.

--
Keith Thompson (The_Other_Keith) <ks***@mib.org>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Mar 17 '08 #30
broli <Br*****@gmail.com> writes:
[...]
label:while(!feof(fp))
[...]

Please read section 12 of the comp.lang.c FAQ, <http://c-faq.com/>,
particularly question 12.2.
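
The gist of that FAQ entry, as a sketch: loop on the input call itself,
not on feof(). feof() only becomes true *after* a read has failed, so
while(!feof(fp)) typically processes the last record twice:

#include <stdio.h>

int main(void)
{
    FILE *fp = fopen("zeus.dat", "r");
    char line[120];

    if (fp == NULL)
        return 1;
    while (fgets(line, sizeof line, fp) != NULL) {
        /* parse the line here */
    }
    fclose(fp);
    return 0;
}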

--
Keith Thompson (The_Other_Keith) <ks***@mib.org>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Mar 17 '08 #31
Keith Thompson wrote:
Ben Bacarisse <be********@bsb.me.uk> writes:
>broli <Br*****@gmail.com> writes:
>>So here's my attempt using STATE MACHINE (as many people have
suggested)
Many people? I think, in this case, it just complicates the program.
Your states are entered in sequence. Parsing the file is just a
sequence of actions, one after the other.
[...]

As one of the people who suggested a state machine, I think you're
right.
Yes, probably.
If the states are purely sequential (state 1 is always
followed by state 2, which is always followed by state 3, etc.), then
an explicit state machine is probably overkill. The state can be
implicitly represented by where you are in the program.
But writing a state machine is fun, generalizable and has the benefit -
if you use names, not opaque numbers - of making things very explicit.
Mar 17 '08 #32
