Parsing two formatted text files

Hello,

I am trying to parse two pre-formatted text files and write them to
different files in a different format. The story behind this is that I
was hired along with about 20 other people, and it seems we are trying
to learn the whole C language in two weeks! To top it all off, I was an
English major, but I'm trying my best. OK, back to the program. We
have two files, product_catalog.txt and sales_month.txt.

The info in product_catalog.txt looks like this:

1010:CD drive external 32x :1MagiCopy:15.5:100
1020:CD drive external 40x :20th Century Fox:16.74:130
1030:CD drive external 48x :3COM:13.48:160
1040:CD drive external 52x :4XEM:15.92:190

We need to write it to another file that is going to look like this

ID Number  Description   Provider   Cost   Stock  Total
1010       CD Drive 32x  1MagiCopy  15.50  100    1550.00

Since the text file to be read from is preformatted, I thought I could
use fscanf() to parse each line and assign the fields to structure
variables, but I am having problems.

Here is my code to read the file:

int readFile (char *filename, struct productData product[], size_t arrLen)
/* Returns number of products read */
{
    FILE *fp;

    if ( ( fp = fopen( "product_catalog.txt", "rb+" ) ) == NULL ) {
        printf( "File could not be opened.\n" );
    } /* end if */

    else
    {
        int i;
        for (i=0; i<arrLen && !feof(fp); i++)
        {
            if (5 != fscanf(fp, "%d %s %s %f %d",
                            &product[i].idnumber,
                            product[i].description,
                            product[i].provider,
                            &product[i].cost,
                            &product[i].stock))
            {
                printf("Invalid file format\n");
                fclose(fp);
                return 0;
            }
        }
        fclose(fp);
        return i;
    }
}

The problem seems to be that each field I want to parse is
separated by a colon (:). Is there any way to tell fscanf() to parse up
until it reaches a colon, then stop and start scanning again, or
should I give up this approach and try to tokenize the input stream?
Any help is much appreciated.

Brett

Mar 31 '06 #1


bf******@gmail.com wrote On 03/31/06 18:06,:
We need to write it to another file that is going to look like this

ID Number  Description   Provider   Cost   Stock  Total
1010       CD Drive 32x  1MagiCopy  15.50  100    1550.00

That's not just reformatting. There's a little bit
of computation (deriving the 1550.00), which isn't hard.
Harder -- potentially very hard -- is the translation
that seems to be occurring: How did "drive" become "Drive,"
and where did "external" disappear to, and what rules
govern such transformations?

The problem seems to be that each field I want to parse is
separated by a colon (:). Is there any way to tell fscanf() to parse up
until it reaches a colon, then stop and start scanning again, or
should I give up this approach and try to tokenize the input stream?
Any help is much appreciated.


"%s" will skip leading white space, grab a string,
and stop when it hits white space again. Hence, it's
no good for your input format, where white spaces can
occur as part of a data field.

You could use "%[^:]" to look for colon-delimited
fields, but the resulting program would be rather fragile.
One lousy line with an extra colon or a missing colon,
and you'll be out of step for the rest of the journey,
or until you trip and fall, whichever comes first.
(fscanf() is no respecter of line boundaries, and will
happily cross them in search of more input.)

Recommended approach: Use fgets() (but not gets()!!!)
to read each line into a big char[] array, and then pick
the line apart with other tools. sscanf() may be a choice
you'd find familiar -- and since sscanf() cannot run off
the end of its input array (and thus inadvertently bypass
line boundaries), some of the infelicities of fscanf()
disappear.
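
A minimal sketch of the fgets()-then-sscanf() approach described above. The member names come from the readFile code in the question; the member types, the 64-byte field sizes, the 256-byte line buffer, and the function names parseLine and readProducts are assumptions made for illustration.

```c
#include <stdio.h>

/* Assumed layout: only the member names appear in the original post. */
struct productData {
    int   idnumber;
    char  description[64];
    char  provider[64];
    float cost;
    int   stock;
};

/* Parse one "id:description:provider:cost:stock" line.
   Returns 1 on success, 0 if the line does not match. */
int parseLine(const char *line, struct productData *p)
{
    char extra;   /* only filled if something follows the stock field */
    int n = sscanf(line, "%d:%63[^:]:%63[^:]:%f:%d %c",
                   &p->idnumber, p->description, p->provider,
                   &p->cost, &p->stock, &extra);
    return n == 5;   /* 6 would mean trailing junk, fewer means a short line */
}

/* Read up to arrLen records from fp; malformed lines are reported and skipped. */
size_t readProducts(FILE *fp, struct productData product[], size_t arrLen)
{
    char line[256];            /* assumed longer than any input line */
    size_t count = 0;
    unsigned long lineno = 0;

    while (count < arrLen && fgets(line, sizeof line, fp) != NULL) {
        lineno++;
        if (parseLine(line, &product[count]))
            count++;
        else
            fprintf(stderr, "line %lu: invalid format, skipped\n", lineno);
    }
    return count;
}
```

Because sscanf() only ever sees one line, a line with an extra or missing colon costs that one record and the next fgets() starts cleanly on the following line. Two caveats: a line longer than the buffer would arrive in pieces, and the description keeps its trailing space ("CD drive external 32x ") because the scanset copies everything up to the colon.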

--
Er*********@sun.com

Mar 31 '06 #2
On 2006-03-31, bf******@gmail.com <bf******@gmail.com> wrote:
[...]
The problem seems to be that each field I want to parse is
separated by a colon (:). Is there any way to tell fscanf() to parse up
until it reaches a colon, then stop and start scanning again, or
should I give up this approach and try to tokenize the input stream?


You put the colons in the format string:

if (5 != fscanf(fp, "%d:%s:%s:%f:%d" ...

But this still won't work quite right, because %s will make fscanf
stop at the spaces.

You can use %[^:] to mean "series of non-colons" so:

if (5 != fscanf(fp, "%d:%[^:]:%[^:]:%f:%d" ...

should do the trick.

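You also have to be careful that badly formatted input data can't
overflow the arrays you're storing the data in. fscanf provides a
format modifier for this: a maximum field width. Some implementations
can also allocate the buffers for you.

e.g.:

if (5 != fscanf(fp, "%d:%63[^:]:%63[^:]:%f:%d" ...

if your buffers for description and provider were 64 bytes long (the
width does not count the terminating '\0', so it must be one less than
the buffer size). An overlong field would be cut off at 63 characters;
the next input character is then not a colon, so fscanf returns fewer
than 5 and the line is rejected, which might not be acceptable. In that
case you could try %a[^:], a glibc extension that allocates the buffer
(POSIX.1-2008 spells it %m[^:]; see the fscanf manual).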
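
To see what a field width does when a field is too long, here is a small sketch using sscanf() on a string. The function name scanShort and the deliberately tiny 8-byte buffers are made up for the demonstration.

```c
#include <stdio.h>

/* Returns the number of fields sscanf converted (3 means success).
   desc and prov are assumed to be 8-byte buffers: 7 characters plus '\0'. */
int scanShort(const char *line, int *id, char desc[8], char prov[8])
{
    /* %7[^:] stores at most 7 non-colon characters, then the terminator. */
    return sscanf(line, "%d:%7[^:]:%7[^:]", id, desc, prov);
}
```

With the input "1030:48x:3COM" all three fields fit and the call returns 3. With "1010:CD drive external 32x :1MagiCopy" the description is cut to "CD driv", the literal colon in the format then fails to match the next character ('e'), and the call returns 2. So the width prevents the overflow, and the return value tells you the line did not fit.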

The other point is that, if you have any choice in the matter, C is not
the best language for this task. You'd be much better off with something
else: Python, Tcl, Perl, that kind of thing. Awk might be the perfect
choice.
Apr 1 '06 #3
Eric Sosman wrote:
Recommended approach: Use fgets() (but not gets()!!!)
to read each line into a big char[] array, and then pick
the line apart with other tools. sscanf() may be a choice
you'd find familiar -- and since sscanf() cannot run off
the end of its input array (and thus inadvertently bypass
line boundaries), some of the infelicities of fscanf()
disappear.


I would suggest he keep things as simple as possible. He could use
my ggets() to input the lines, and my toksplit to parse them.
toksplit was published here a few days ago, just search the group
archives. ggets is available on my page at:

<http://cbfalconer.home.att.net/download/ggets.zip>

Then the code will look much like:

char *ln, *tmp;
int ix;
char tok[MAXTOKEN + 1]; /* allow for '\0' always */

while (0 == ggets(&ln)) {
    tmp = ln; ix = 0;
    while (*tmp) {
        tmp = toksplit(tmp, ':', tok, MAXTOKEN);
        ix++; /* just to keep track of which token in line */
        /* code to modify and output from tok */
        /* probably best isolated in a separate function */
    }
    free(ln);
}

Notice that the only configuration constants are MAXTOKEN and what
the token delimiting character (':' here) actually is.
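
For readers without the archive at hand, here is a rough sketch of a field splitter with a similar call shape. It is a hypothetical stand-in named nextField, not the published toksplit, and its exact behaviour (no whitespace trimming, silent truncation at max characters) is an assumption.

```c
#include <stddef.h>

/* Copy the field starting at src into tok (at most max characters plus '\0')
   and return a pointer to the start of the next field, or to the
   terminating '\0' when this was the last field. */
const char *nextField(const char *src, char delim, char *tok, size_t max)
{
    size_t n = 0;

    while (*src != '\0' && *src != delim) {
        if (n < max)
            tok[n++] = *src;   /* characters beyond max are dropped */
        src++;
    }
    tok[n] = '\0';
    if (*src == delim)
        src++;                 /* step over the delimiter */
    return src;
}
```

Unlike strtok(), this version leaves the input line untouched and reports an empty field between two adjacent colons as an empty string, so field positions stay aligned. When driven by a while (*tmp) loop, an empty field at the very end of a line is not reported, because the pointer already sits on the terminator.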

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>

Apr 1 '06 #4
I am making some progress, but not much unfortunately. Using these two
code segments that I found in another post, I was able to parse out
each field from the text file; the output looks like this:

Line number: 1
Token: 1010
Token: CD drive external 32x
Token: 1MagiCopy
Token: 15.5
Token: 100

Line number: 2
Token: 1020
Token: CD drive external 40x
Token: 20th Century Fox
Token: 16.74
Token: 130

size_t get_line( FILE *f , char *line, size_t len )
{
    char *ptr;
    ptr = fgets( line, len, f );
    if( NULL == ptr ) {
        line[0] = '\0';
        return 0;
    }
    if( NULL != (ptr = strchr(line, DELIMITER)) ) *ptr = '\0';
    return strlen(line);
}

while( 0 != get_line( fp, data, sizeof(data)) ) {
    count++;
    printf( "Line number: %d\n", count );
    for( ptr0 = data; NULL != (ptr1 = strtok(ptr0, TOKEN)); ptr0 = NULL )
        printf( "Token: %s\n", ptr1 );
    putchar( '\n' );
}

What I was going to do was assign each field value into an array of
structures, but it gives me a segmentation fault. Is there another way
to achieve the main objective?

Apr 1 '06 #5
On 2006-04-01, bf******@gmail.com <bf******@gmail.com> wrote:
What I was going to do was assign each field value into an array of
structures, but it gives me a segmentation fault. Is there another way
to achieve the main objective?


If the main objective is just to print it all out again formatted
differently, you can maybe do that in the loop, and avoid having to
store the data.
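
A sketch of that print-as-you-go idea, with no array of structures at all. The function names formatRecord and reformatStream, the buffer sizes, and the column widths are guesses for illustration, and the description is passed through unchanged because the rules for rewriting it are not given.

```c
#include <stdio.h>

/* Parse one catalog line and write the reformatted record into out.
   Returns 1 on success, 0 if the line is malformed or out is too small. */
int formatRecord(const char *line, char *out, size_t outlen)
{
    int id, stock, n;
    char desc[64], prov[64];
    double cost;

    if (sscanf(line, "%d:%63[^:]:%63[^:]:%lf:%d",
               &id, desc, prov, &cost, &stock) != 5)
        return 0;

    /* Column widths are guesses; the total is simply cost times stock. */
    n = snprintf(out, outlen, "%-9d %-24s %-18s %8.2f %5d %10.2f",
                 id, desc, prov, cost, stock, cost * stock);
    return n >= 0 && (size_t)n < outlen;
}

/* Stream every line of in to out without storing any records. */
void reformatStream(FILE *in, FILE *out)
{
    char line[256], rec[256];

    while (fgets(line, sizeof line, in) != NULL)
        if (formatRecord(line, rec, sizeof rec))
            fprintf(out, "%s\n", rec);
}
```

Each line is parsed into local variables that are reused for the next line, so nothing has to be copied or kept alive. For the first sample line the cost prints as 15.50 and the total as 1550.00.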

But you should be able to fix the segmentation fault! The error might be
in part of the code we can't see-- it looks from "data, sizeof(data)"
that data is an array; where do you declare it? And how's the array of
structures created?

In any case, you reuse the same buffer for each line, so you're going to
have to actually copy the strings out somehow.

Guessing, but the problem may be that you're just copying the pointers,
but not duplicating the actual strings.

for( ptr0 = data; NULL != (ptr1 = strtok(ptr0, TOKEN)); ptr0 = NULL )

records[i].name = ptr1; /* very likely to be wrong */
records[i].name = strdup(ptr1); /* some chance of working */
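
To make the difference concrete, here is a small sketch that stores copies of the tokens rather than pointers into the line buffer. The struct record layout and the names copyString and fillRecord are hypothetical; copyString is there because strdup() comes from POSIX and older ISO C standards do not define it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: only two string members are shown. */
struct record {
    char *description;
    char *provider;
};

/* Return a freshly allocated copy of s, or NULL if malloc fails. */
char *copyString(const char *s)
{
    size_t len = strlen(s) + 1;
    char *p = malloc(len);

    if (p != NULL)
        memcpy(p, s, len);
    return p;
}

/* Split a writable "description:provider" buffer and store copies in r.
   Returns 1 on success, 0 on a missing field or allocation failure. */
int fillRecord(char *buf, struct record *r)
{
    char *desc = strtok(buf, ":");
    char *prov = strtok(NULL, ":\n");

    if (desc == NULL || prov == NULL)
        return 0;
    r->description = copyString(desc);
    r->provider = copyString(prov);
    if (r->description == NULL || r->provider == NULL) {
        free(r->description);
        free(r->provider);
        r->description = r->provider = NULL;
        return 0;
    }
    return 1;
}
```

Since every record owns its strings, reading the next line into the same buffer no longer changes records filled earlier. The caller is responsible for calling free() on both members when a record is no longer needed.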

HTH
Apr 1 '06 #6
bf******@gmail.com wrote:

I am making some progress, but not much unfortunately. Using these
two code segments that I found from another post I was able to
parse out each field as a text file, the output looks like this:


You reply to my posting, but ignore all that I suggested, and
refuse to quote proper context. I see no point in anyone
attempting to assist you further.

Apr 2 '06 #7
