Looking for advice on how to deal with array of structs

Hi,

I am reading a fairly large file a line at a time, doing some
processing, and filtering out bits of the line. I am storing the
interesting information in a struct and then printing it out. This
works without any problems. I would now like to filter "duplicate"
records. They aren't really duplicates, so I can't just qsort as is and
eliminate the matching rows. A number of fields will be the
same, but some of the fields will differ and the timestamp may be off
by a second or two. I want to eliminate records whose fields of interest
match and whose timestamps differ by less than n
seconds. Again, this is not a problem: I can get seconds since epoch
and compare, or just use difftime. My problem is this: I want to build
an array of structs that will hold lines that match, sort them on the
date member and then eliminate based on matching timestamps.

For example in the set:
foo,bar ...,a, Thu Jan 25 01:40:11 EST 2007
foo,bar ...,a, Thu Jan 25 01:45:35 EST 2007
foo,bar ...,a, Thu Jan 25 01:48:09 EST 2007
foo,bar ...,b, Thu Jan 25 01:40:12 EST 2007
foo,baz ..., Thu Jan 25 01:40:11 EST 2007

I would like to read the first 4 lines into structs, store them in
an array, sort them and then print them out, while eliminating the
"duplicate" line 4. I would not want to read the 5th line yet, because
it is dissimilar to the first 4: foo,baz instead of foo,bar. I am
ignoring the field that has a or b in it (one of the reasons a simple
sort will not work).
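
The sort-then-eliminate step described above could look roughly like the sketch below. It is only an illustration: the `record` layout is the generic one from this post (with the second member named `field2`), and the names `cmp_date` and `dedupe_group` are made up for the example. It assumes the array already holds one group of records whose fields of interest match.

```c
#include <stdlib.h>
#include <time.h>

/* Generic stand-in for the real record, which has about 12 fields. */
typedef struct {
    char field1[11];
    char field2[11];
    time_t date_secs;
} record;

/* qsort comparator: order records by timestamp, oldest first. */
static int cmp_date(const void *a, const void *b)
{
    const record *ra = a;
    const record *rb = b;
    double d = difftime(ra->date_secs, rb->date_secs);
    return (d > 0) - (d < 0);
}

/*
 * Sort recs[0..n-1] by date, then drop every record whose timestamp is
 * less than `window` seconds after the last record that was kept.
 * Returns the number of records left at the front of the array.
 */
static size_t dedupe_group(record *recs, size_t n, double window)
{
    size_t kept, i;

    if (n == 0)
        return 0;
    qsort(recs, n, sizeof *recs, cmp_date);
    kept = 1;
    for (i = 1; i < n; i++) {
        if (difftime(recs[i].date_secs, recs[kept - 1].date_secs) >= window)
            recs[kept++] = recs[i];
    }
    return kept;
}
```

With the four foo,bar lines from the example and a window of a few seconds, the 01:40:12 record sorts next to the 01:40:11 one and is dropped, leaving three records.
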

My problem comes from not knowing where to create the array, what size
to allocate, and how to re-initialize it when I move to the next set to
sort.

I have included a mix of pseudocode and real code. I have also made
everything generic. My ultimate questions are:

* Is it a good idea to declare the struct instance outside the while
loop and then reinitialize it every time through the loop, or would it
be better to make it local? The real struct is 132 bytes.

* Is this the best way to (re)initialize a struct (init_cr)?

* How do I (re)initialize an array of structs?

* Any comments on how I plan to tackle this?

Code:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

typedef struct {
    char field1[11];
    char field1[11];
    time_t date_secs;
} record;

int main(int argc, char *argv[]) {
    FILE *fp;
    fp = stdin;
    record cr, lr;
    int len = 512;
    char buf[len+1];

    record records[11];

    fp = fopen(argv[1], "r");
    if(fp == NULL) {
        fputs("Could not open file for reading", stderr);
        exit(1);
    }

    while(fgets(buf, len, fp)) {
        init_cr(&cr);

        /* split line into fields */
        /* process fields and save in struct */

        /* if the fields of interest match last fields */
        /*     save struct in next position in array */
        /*     increment counter for how many structs are saved */
        /*     resize the array to accommodate more structs if needed
        */
        /* else */
        /*     print array of structs */
        /*     reset array ??? */
    }
    return 0;
}

void init_cr(callrecord *cr) {
    cr->field1[0] = '\0';
    cr->field2[0] = '\0';
    cr->date_secs = 0;
}
Cliff

Jan 25 '07 #1
Cliff Martin wrote:
> Hi,
>
> I am reading a fairly large file a line at a time, doing some
> processing, and filtering out bits of the line. I am storing the
> interesting information in a struct and then printing it out. This
> works without any problems. I would now like to filter "duplicate"
> records. They aren't really duplicates, so I can't just qsort as is and
> eliminate the matching rows. A number of fields will be the
> same, but some of the fields will differ and the timestamp may be off
> by a second or two. I want to eliminate records whose fields of interest
> match and whose timestamps differ by less than n
> seconds. Again, this is not a problem: I can get seconds since epoch
> and compare, or just use difftime. My problem is this: I want to build
> an array of structs that will hold lines that match, sort them on the
> date member and then eliminate based on matching timestamps.
>
> For example in the set:
> foo,bar ...,a, Thu Jan 25 01:40:11 EST 2007
> foo,bar ...,a, Thu Jan 25 01:45:35 EST 2007
> foo,bar ...,a, Thu Jan 25 01:48:09 EST 2007
> foo,bar ...,b, Thu Jan 25 01:40:12 EST 2007
> foo,baz ..., Thu Jan 25 01:40:11 EST 2007
>
> I would like to read the first 4 lines into structs, store them in
> an array, sort them and then print them out, while eliminating the
> "duplicate" line 4. I would not want to read the 5th line yet, because
> it is dissimilar to the first 4: foo,baz instead of foo,bar. I am
> ignoring the field that has a or b in it (one of the reasons a simple
> sort will not work).
>
> My problem comes from not knowing where to create the array, what size
> to allocate, and how to re-initialize it when I move to the next set to
> sort.

From your requirements it appears that a linked list will be a better
option than an array of structs. It's easy to keep it sorted as you add
elements to the list or remove them.
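
As a rough illustration of the linked-list suggestion, the sketch below keeps the list ordered by `date_secs` on every insert. The `node` type and the functions `list_insert_sorted` and `list_clear` are hypothetical names, and the `record` layout is the generic one from the question (second member taken to be `field2`), not the real 132-byte struct.

```c
#include <stdlib.h>
#include <time.h>

typedef struct {
    char field1[11];
    char field2[11];
    time_t date_secs;
} record;

/* Hypothetical list node wrapping one record. */
typedef struct node {
    record rec;
    struct node *next;
} node;

/*
 * Insert a copy of *r so that the list stays ordered by date_secs.
 * Returns 0 on success, -1 if malloc fails (the list is left unchanged).
 */
static int list_insert_sorted(node **head, const record *r)
{
    node *n = malloc(sizeof *n);

    if (n == NULL)
        return -1;
    n->rec = *r;
    /* Walk the links until the next node is later than the new record. */
    while (*head != NULL &&
           difftime((*head)->rec.date_secs, r->date_secs) <= 0)
        head = &(*head)->next;
    n->next = *head;
    *head = n;
    return 0;
}

/* Free every node and leave the list empty, ready for the next group. */
static void list_clear(node **head)
{
    while (*head != NULL) {
        node *next = (*head)->next;
        free(*head);
        *head = next;
    }
}
```

Calling `list_clear(&head)` after printing a group is one way to handle the "reset" step, at the cost of a malloc and free per record.
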

> I have included a mix of pseudocode and real code. I have also made
> everything generic. My ultimate questions are:
>
> * Is it a good idea to declare the struct instance outside the while
> loop and then reinitialize it every time through the loop, or would it
> be better to make it local? The real struct is 132 bytes.

Depends on the desired lifetime for the structure instance.

> * Is this the best way to (re)initialize a struct (init_cr)?
>
> * How do I (re)initialize an array of structs?

Use a FOR loop.
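
A minimal sketch of both reinitialization questions, under the assumption that "reinitialize" means "set every member back to zero or an empty string". The function names `init_record` and `init_records` are invented for the example, and the `record` layout is the generic one from the question with the second member named `field2`. Assigning from a zeroed template resets every member at once, so nothing is missed when fields are added later.

```c
#include <stddef.h>
#include <time.h>

typedef struct {
    char field1[11];
    char field2[11];
    time_t date_secs;
} record;

/* Reset one record by assigning from an all-zero template. */
static void init_record(record *r)
{
    static const record empty = { {0}, {0}, 0 };

    *r = empty;
}

/* Reset the first n elements of an array with a for loop. */
static void init_records(record *recs, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        init_record(&recs[i]);
}
```
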

> * Any comments on how I plan to tackle this?
>
> Code:
> #include <stdio.h>
> #include <stdlib.h>
> #include <time.h>
>
> typedef struct {
>     char field1[11];
>     char field1[11];

Two objects in scope with the same identifier.

>     time_t date_secs;
> } record;
>
> int main(int argc, char *argv[]) {
>     FILE *fp;
>     fp = stdin;

This is not guaranteed to work on all C implementations.

>     record cr, lr;
>     int len = 512;
>     char buf[len+1];

Nor this.

>     record records[11];
>
>     fp = fopen(argv[1], "r");
>     if(fp == NULL) {
>         fputs("Could not open file for reading", stderr);
>         exit(1);

Use EXIT_FAILURE instead of 1, unless you have a good reason for using
the latter.

>     }
>
>     while(fgets(buf, len, fp)) {

fgets() will store only len-1 characters into the buffer so sizeof(buf)
would do.

>         init_cr(&cr);
>
>         /* split line into fields */
>         /* process fields and save in struct */
>
>         /* if the fields of interest match last fields */
>         /*     save struct in next position in array */
>         /*     increment counter for how many structs are saved */
>         /*     resize the array to accommodate more structs if needed
>         */

Note that you cannot resize a statically allocated array, unless it's a
VLA. Allocate using malloc() and resize with realloc().

>         /* else */
>         /*     print array of structs */
>         /*     reset array ??? */

If the next iteration will rewrite to the array, then probably this is
not needed.

>     }
>     return 0;
> }
>
> void init_cr(callrecord *cr) {
>     cr->field1[0] = '\0';
>     cr->field2[0] = '\0';
>     cr->date_secs = 0;
> }

I still feel a linked list of structures may serve your purpose better.
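
If the array approach is kept, the malloc()/realloc() advice and the "reset is probably not needed" remark can be combined as in the sketch below. Everything named here (`record_buf`, `buf_push`, `buf_reset`, `buf_free`, the starting capacity of 16) is an assumption made for illustration, and the `record` layout is the generic one from the question.

```c
#include <stdlib.h>
#include <time.h>

typedef struct {
    char field1[11];
    char field2[11];
    time_t date_secs;
} record;

/* Hypothetical growable buffer of records. */
typedef struct {
    record *items;    /* heap storage, NULL while nothing is allocated */
    size_t count;     /* records currently in use */
    size_t capacity;  /* records allocated */
} record_buf;

/* Append a copy of *r, doubling the allocation when it is full.
 * Returns 0 on success, -1 if realloc fails (old contents stay valid). */
static int buf_push(record_buf *b, const record *r)
{
    if (b->count == b->capacity) {
        size_t new_cap = b->capacity ? b->capacity * 2 : 16;
        record *p = realloc(b->items, new_cap * sizeof *p);

        if (p == NULL)
            return -1;
        b->items = p;
        b->capacity = new_cap;
    }
    b->items[b->count++] = *r;
    return 0;
}

/* "Reset" for the next group: keep the memory, forget the contents. */
static void buf_reset(record_buf *b)
{
    b->count = 0;
}

/* Release the storage once, after the last group has been printed. */
static void buf_free(record_buf *b)
{
    free(b->items);
    b->items = NULL;
    b->count = 0;
    b->capacity = 0;
}
```

Starting from `record_buf b = { NULL, 0, 0 };`, the loop would call `buf_push` for each matching record and `buf_reset` after printing a group, so the old elements are simply overwritten by the next group.
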

Jan 25 '07 #2
>> typedef struct {
>>     char field1[11];
>>     char field1[11];
>
> Two objects in scope with the same identifier.

It should be field2, but that is just the example. The real code has about 12
different identifiers, all of which are unique.

>> int main(int argc, char *argv[]) {
>>     FILE *fp;
>>     fp = stdin;
>
> This is not guaranteed to work on all C implementations.

Assigning to a file pointer? So I should just open stdin like a regular
file if the fp is NULL? Is there a compiler option to force this usage?
I'm using gcc.

>>     record cr, lr;
>>     int len = 512;
>>     char buf[len+1];
>
> Nor this.

What's wrong with this?

>>     exit(1);
>
> Use EXIT_FAILURE instead of 1, unless you have a good reason for using
> the latter.

OK.

>> while(fgets(buf, len, fp)) {
>
> fgets() will store only len-1 characters into the buffer so sizeof(buf)
> would do.

I did not know this. I use this style everywhere. I will correct it.

>> */
>
> Note that you cannot resize a statically allocated array, unless it's a
> VLA. Allocate using malloc() and resize with realloc().

What is a VLA - Very Large Array?

Thanks,

Cliff

Jan 25 '07 #3
OK, I am using the command line options -ansi and -pedantic. They tell
me some things I'm doing wrong.

I don't understand why this is wrong:

record cr, lr;

The compiler complains I'm mixing declarations and code.

The compiler does not catch the assigning stdin to fp, how can I
enforce that?

Cliff

Jan 25 '07 #4
On Jan 25, 9:16 am, "Cliff Martin" <cliff.mar...@gmail.com> wrote:
> OK, I am using the command line options -ansi and -pedantic. They tell
> me some things I'm doing wrong.
>
> I don't understand why this is wrong:
>
> record cr, lr;
>
> The compiler complains I'm mixing declarations and code.

Never mind about the mixing declarations and code, I just had some
lines out of order.

Cliff

Jan 25 '07 #5
Cliff Martin wrote:

(You need to quote more context. Remember not all of us are reading
this on Google Groups.)

> OK, I am using the command line options -ansi and -pedantic. They tell
> me some things I'm doing wrong.
>
> I don't understand why this is wrong:
>
> record cr, lr;
>
> The compiler complains I'm mixing declarations and code.

The preceding lines are:

int main(int argc, char *argv[]) {
    FILE *fp;
    fp = stdin;

So you have a declaration (`record ...`) following a statement
(`fp = stdin;`). You can't do this in C90 (which is what the
-ansi -pedantic options get you, I believe) [you /can/ in C99,
and it's a widely available extension].

Easy fix:

FILE *fp = stdin;

Now it's a declaration.

> The compiler does not catch the assigning stdin to fp, how can I
> enforce that?

You don't need to. (I think santosh may have meant that you were
mixing declarations and statements: assigning `stdin` to a FILE*
variable seems OK to me.)

--
Chris "electric hedgehog" Dollin
"A facility for quotation covers the absence of original thought." /Gaudy Night/

Jan 25 '07 #6

"santosh" <sa*********@gmail.com> wrote in message
news:11**********************@j27g2000cwj.googlegroups.com...
> Cliff Martin wrote:
>> Hi,
>>
>> I am reading a fairly large file a line at a time, doing some
>> processing, and filtering out bits of the line. I am storing the
>> interesting information in a struct and then printing it out. This
>> works without any problems. I would now like to filter "duplicate"
>> records. They aren't really duplicates, so I can't just qsort as is and
>> eliminate the matching rows. A number of fields will be the
>> same, but some of the fields will differ and the timestamp may be off
>> by a second or two. I want to eliminate records whose fields of interest
>> match and whose timestamps differ by less than n
>> seconds. Again, this is not a problem: I can get seconds since epoch
>> and compare, or just use difftime. My problem is this: I want to build
>> an array of structs that will hold lines that match, sort them on the
>> date member and then eliminate based on matching timestamps.
>>
>> For example in the set:
>> foo,bar ...,a, Thu Jan 25 01:40:11 EST 2007
>> foo,bar ...,a, Thu Jan 25 01:45:35 EST 2007
>> foo,bar ...,a, Thu Jan 25 01:48:09 EST 2007
>> foo,bar ...,b, Thu Jan 25 01:40:12 EST 2007
>> foo,baz ..., Thu Jan 25 01:40:11 EST 2007
>>
>> I would like to read the first 4 lines into structs, store them in
>> an array, sort them and then print them out, while eliminating the
>> "duplicate" line 4. I would not want to read the 5th line yet, because
>> it is dissimilar to the first 4: foo,baz instead of foo,bar. I am
>> ignoring the field that has a or b in it (one of the reasons a simple
>> sort will not work).
>>
>> My problem comes from not knowing where to create the array, what size
>> to allocate, and how to re-initialize it when I move to the next set to
>> sort.
>
> From your requirements it appears that a linked list will be a better
> option than an array of structs. It's easy to keep it sorted as you add
> elements to the list or remove them.

A somewhat simpler data structure is an array of pointers, each of which points
to a structure.

In this case, insert operations boil down to allocating memory for the new
structure, perhaps realloc()'ing the array of pointers (I'd arrange not to
do this every time), then moving a bunch of pointers down by one element and
putting the new pointer in.

It is likely that memmove()'ing a bunch of pointers is not more expensive
than traversing a linked list to find the right insert point.

However, on a modern machine and with less than 1E6 records, I doubt anyone
would notice a difference.
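
A sketch of the array-of-pointers insert described above, with the caveat that the names (`ptr_table`, `table_insert_sorted`, `table_clear`) and the doubling growth policy are assumptions, and the `record` layout is the generic one from the original question. The pointer array is grown only when full, then the tail is shifted with memmove() to open a slot.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

typedef struct {
    char field1[11];
    char field2[11];
    time_t date_secs;
} record;

/* Hypothetical table of pointers to heap-allocated records, kept sorted. */
typedef struct {
    record **ptrs;
    size_t count;
    size_t capacity;
} ptr_table;

/* Insert a copy of *r in date order. Returns 0 on success, -1 on failure. */
static int table_insert_sorted(ptr_table *t, const record *r)
{
    size_t pos = 0;
    record *copy;

    if (t->count == t->capacity) {      /* grow only when full */
        size_t new_cap = t->capacity ? t->capacity * 2 : 16;
        record **p = realloc(t->ptrs, new_cap * sizeof *p);

        if (p == NULL)
            return -1;
        t->ptrs = p;
        t->capacity = new_cap;
    }
    copy = malloc(sizeof *copy);
    if (copy == NULL)
        return -1;
    *copy = *r;

    /* Linear scan for the insert point; a binary search would also work. */
    while (pos < t->count &&
           difftime(t->ptrs[pos]->date_secs, r->date_secs) <= 0)
        pos++;

    /* Move the tail down by one element and put the new pointer in. */
    memmove(&t->ptrs[pos + 1], &t->ptrs[pos],
            (t->count - pos) * sizeof *t->ptrs);
    t->ptrs[pos] = copy;
    t->count++;
    return 0;
}

/* Free the records but keep the pointer array for the next group. */
static void table_clear(ptr_table *t)
{
    size_t i;

    for (i = 0; i < t->count; i++)
        free(t->ptrs[i]);
    t->count = 0;
}
```

The pointer array itself still needs a final `free(t.ptrs)` once all groups are done.
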

--
David T. Ashley (dt*@e3ft.com)
http://www.e3ft.com (Consulting Home Page)
http://www.dtashley.com (Personal Home Page)
http://gpl.e3ft.com (GPL Publications and Projects)
Jan 25 '07 #7
Cliff Martin wrote:
>
> OK, I am using the command line options -ansi and -pedantic. They
> tell me some things I'm doing wrong.
>
> I don't understand why this is wrong:
>
> record cr, lr;
>
> The compiler complains I'm mixing declarations and code.

Possibly because you are mixing declarations and code.

> The compiler does not catch the assigning stdin to fp, how can I
> enforce that?

The crystal ball is murky, but appears to be saying 'line 142'.

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discoverer of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews
Jan 25 '07 #8
On 24 Jan 2007 23:52:01 -0800, "santosh" <sa*********@gmail.com>
wrote:
> Cliff Martin wrote:
>> #include <stdio.h>
>> FILE *fp;
>> fp = stdin;
>
> This is not guaranteed to work on all C implementations.

Yes it is. Well, all conforming ones except freestanding which don't
have stdio at all, but I don't think that's what you meant.

Copying an _actual_ FILE
FILE fobject = * stdin;
and using it is not guaranteed to work, although usually it does.

Assigning _to_ stdin,out,err is not guaranteed to work, and often
doesn't.

Copying any FILE* including stdin,out,err to another pointer and using
it (consistent with the open and any other uses) is legal, and
sometimes useful, although in this example (snipped) it was not useful
because it was subsequently unconditionally overwritten.

>> char buf[len+1];
>
> Nor this.

Correct there.

>> while(fgets(buf, len, fp)) {
>
> fgets() will store only len-1 characters into the buffer so sizeof(buf)
> would do.

Up to len-1 characters from the input including the \n if reached,
PLUS the terminating null byte, for a total of at most the size given,
so yes sizeof(buf) would do.

Nit: and the parentheses aren't needed for sizeof primaryexpr, nor
sizeof *ptr or sizeof ptr->field or sizeof obj.field, although they
are for some more complicated expressions and typenames.
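
To make the two buffer points concrete, here is a small sketch. The constant `BUF_SIZE` and the function `count_lines` are hypothetical; the idea is only that a compile-time constant gives an ordinary array rather than a VLA (variable length array, a C99 feature), and that `sizeof buf` can be passed straight to fgets() because the count already includes room for the terminating null byte.

```c
#include <stdio.h>
#include <string.h>

enum { BUF_SIZE = 512 };  /* constant expression, so buf below is not a VLA */

/* Count the lines in fp, including a last line that has no '\n'.
 * Lines longer than the buffer arrive in several pieces and are
 * still counted once. */
static long count_lines(FILE *fp)
{
    char buf[BUF_SIZE];
    long lines = 0;
    int at_line_start = 1;

    /* fgets stores at most sizeof buf - 1 characters plus the '\0'. */
    while (fgets(buf, sizeof buf, fp) != NULL) {
        size_t n = strlen(buf);

        if (at_line_start)
            lines++;
        /* The next read starts a new line only if this piece ended one. */
        at_line_start = (n > 0 && buf[n - 1] == '\n');
    }
    return lines;
}
```
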

- David.Thompson1 at worldnet.att.net
Feb 6 '07 #9
Dave Thompson wrote:
>
> .... snip ...
>
> Copying any FILE* including stdin,out,err to another pointer and
> using it (consistent with the open and any other uses) is legal,
> and sometimes useful, although in this example (snipped) it was
> not useful because it was subsequently unconditionally overwritten.

No it wasn't (unconditionally overwritten). Look again. That was
the whole point of the extract, since stdin is a text file, and
when overwritten it would be a binary file.

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
<http://www.securityfocus.com/columnists/423>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discoverer of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews

Feb 6 '07 #10
On Tue, 06 Feb 2007 08:12:30 -0500, CBFalconer <cb********@yahoo.com>
wrote:
> Dave Thompson wrote:
> ... snip ...
>
>> Copying any FILE* including stdin,out,err to another pointer and
>> using it (consistent with the open and any other uses) is legal,
>> and sometimes useful, although in this example (snipped) it was
>> not useful because it was subsequently unconditionally overwritten.
>
> No it wasn't (unconditionally overwritten). Look again. That was
> the whole point of the extract, since stdin is a text file, and
> when overwritten it would be a binary file.

From original:

int main(int argc, char *argv[]) {
    FILE *fp;
    fp = stdin;
    record cr, lr;
    int len = 512;
    char buf[len+1];

    record records[11];

    fp = fopen(argv[1], "r");
    if(fp == NULL) {
        fputs("Could not open file for reading", stderr);
        exit(1);
    }

    while(fgets(buf, len, fp)) {
<snip rest>

Looks unconditional to me.

Setting a file pointer to either stdin or fopen (argv[i], m) and then
using it is indeed often useful, but not in the code posted.
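
For comparison, a conditional version of that idiom might look like the sketch below. The helper names `open_input` and `close_input` are made up for illustration; the point is only that `fp` ends up as either stdin or the result of fopen(), never both in sequence.

```c
#include <stdio.h>

/*
 * Return the stream to read from: the file named by argv[1] if one was
 * given, otherwise stdin. Returns NULL if the file cannot be opened.
 */
static FILE *open_input(int argc, char *argv[])
{
    if (argc > 1)
        return fopen(argv[1], "r");
    return stdin;
}

/* Close the stream only if it is a file this program opened itself. */
static void close_input(FILE *fp)
{
    if (fp != NULL && fp != stdin)
        fclose(fp);
}
```

In main() this would be used as `FILE *fp = open_input(argc, argv);`, which is also a declaration with an initializer and so stays valid under -ansi -pedantic.
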

- David.Thompson1 at worldnet.att.net
Feb 15 '07 #11
Dave Thompson wrote:
>
> .... snip ...
>
> From original:
> int main(int argc, char *argv[]) {
> .... snip ...
> fp = fopen(argv[1], "r");
> <snip rest>
>
> Looks unconditional to me.
>
> Setting a file pointer to either stdin or fopen (argv[i], m) and then
> using it is indeed often useful, but not in the code posted.

We are talking about different code.

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
<http://www.securityfocus.com/columnists/423>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discoverer of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews

Feb 15 '07 #12
