Safely working out how many rows in an array?

pkirk25

Assume an array of structs that is having rows added at random. By the
time it reaches your function, you have no idea if it has a few hundred
or over 10000 rows.

When your function receives this array as an argument, is there a safe
way to establish how many rows there are, or should I iterate over a
field I know will always be used and use the final value of the
iterator as the size of the array?

Sep 26 '06 #1

Richard Heathfield

pkirk25 said:

Assume an array of structs that is having rows added at random. By the
time it reaches your function, you have no idea if it has a few hundred
or over 10000 rows.

When your function receives this array as an argument, is there a safe
way to establish how many rows there are, or should I iterate over a
field I know will always be used and use the final value of the
iterator as the size of the array?

The trick is to keep track of how many elements the array has, and to pass
that information to functions that need it. Objects are relatively cheap.
Something as small as a size_t is practically free! So don't be squeamish
about using them.
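
A minimal sketch of that advice, assuming a made-up struct row (the thread never shows the real struct) and a hypothetical total_price function. The caller keeps the count and hands it over alongside the pointer:

```c
#include <stddef.h>

/* Hypothetical row type; the real struct is not shown in the thread. */
struct row {
    int id;
    double price;
};

/* Receives a pointer to the first row plus the number of rows in use. */
double total_price(const struct row *rows, size_t count)
{
    double total = 0.0;
    size_t i;

    for (i = 0; i < count; i++)
        total += rows[i].price;
    return total;
}
```

In the scope where the array itself is declared, the count can come from sizeof rows / sizeof rows[0]; for a dynamically allocated array it has to be tracked from the moment of allocation.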

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Sep 26 '06 #2

Keith Thompson

"pkirk25" <pa*****@kirks.net> writes:

Assume an array of structs that is having rows added at random. By the
time it reaches your function, you have no idea if it has a few hundred
or over 10000 rows.

When your function receives this array as an argument, is there a safe
way to establish how many rows there are, or should I iterate over a
field I know will always be used and use the final value of the
iterator as the size of the array?

You need to rethink your question.

It's not possible to pass an array as an argument in C. What you can
do is, for example, pass the address of (equivalently: a pointer to)
an array's first element as an argument. This pointer can then be
used to access the elements of the array, but it doesn't tell you how
many there are.

The language does a few things that seemingly conspire to make it
*look* like you're passing the array itself, but you're really not.
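
To illustrate the point, here is a small sketch (the function names are invented for this example). In the scope that declares the array, sizeof sees the whole array; inside a function that was handed the array, only a pointer is visible:

```c
#include <stddef.h>

/* The callee only ever sees a pointer, so sizeof gives the pointer's size. */
size_t size_seen_by_callee(const int *values)
{
    return sizeof values;
}

/* In the scope that declares the array, the element count is recoverable. */
size_t count_in_caller(void)
{
    int values[100];

    return sizeof values / sizeof values[0];
}
```

Whatever array is passed to size_seen_by_callee, the result is the size of a pointer, not the size of the array.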

As Richard Heathfield wrote, the most straightforward way to do this
is to keep track of the size yourself and pass it to your function as
an extra argument.

Other solutions exist.

You should also read section 6 of the comp.lang.c FAQ, available at
<http://www.c-faq.com>.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Sep 26 '06 #3

pkirk25

Richard Heathfield wrote:
[snip]

>
The trick is to keep track of how many elements the array has, and to pass
that information to functions that need it. Objects are relatively cheap.
Something as small as a size_t is practically free! So don't be squeamish

[snip]

A file has over 300k rows of which less than 100 are likely to be of
interest.

int ScanFile(FILE *srcFile,   /* not const: stdio functions take a plain FILE * */
             struct realm_structure *rlm,
             int *i)
{
    /*
      1. do something very clever to find a match
      2. update the struct so it's a useful matrix
      3. (*i)++
    */
    return 0;
}

Is this what you were thinking?

If I get the count wrong in any one function, I could end up confused.
But as a way of dealing with the problem, it does look very good.
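
One way to reduce the risk of getting the count wrong is to update it in exactly one place. The sketch below assumes a simplified stand-in for realm_structure and a hypothetical add_row helper; neither comes from the thread:

```c
#include <stddef.h>

/* Simplified stand-in for realm_structure; the real fields are unknown. */
struct realm {
    int id;
};

/*
 * Appends one row if there is room and bumps the caller's count.
 * Returns 0 on success, -1 if the array is already full.
 */
int add_row(struct realm *rows, size_t capacity, size_t *count, int id)
{
    if (*count >= capacity)
        return -1;
    rows[*count].id = id;
    (*count)++;
    return 0;
}
```

If every insertion goes through a helper like this, no other function needs to modify the count, only read it.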

Sep 26 '06 #4

Michael Mair

Richard Heathfield wrote:

pkirk25 said:

>>Assume an array of structs that is having rows added at random. By the
time it reaches your function, you have no idea if it has a few hundred
or over 10000 rows.

When your function receives this array as an argument, is there a safe
way to establish how many rows there are, or should I iterate over a
field I know will always be used and use the final value of the
iterator as the size of the array?

The trick is to keep track of how many elements the array has, and to pass
that information to functions that need it. Objects are relatively cheap.
Something as small as a size_t is practically free! So don't be squeamish
about using them.

Indeed.

@OP:
An alternative, which uses more space rather than less, is a data
structure other than an array, if one suits your problem better. Arrays
do not grow rows magically, so you have to deal with reallocation,
keeping track of the number of rows, etc.
If you use linked lists, for example, then inserting new rows is
comparatively cheap, and iteration costs the same as for an array
(up to a constant factor).
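
A rough sketch of the linked-list alternative, with names (struct node, struct list, list_push and so on) chosen only for this example. The list header carries its own count, so nothing has to be rediscovered later:

```c
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

/* The list header carries the element count alongside the head pointer. */
struct list {
    struct node *head;
    size_t count;
};

/* Inserts at the front in constant time; returns 0 on success, -1 on failure. */
int list_push(struct list *l, int value)
{
    struct node *n = malloc(sizeof *n);

    if (n == NULL)
        return -1;
    n->value = value;
    n->next = l->head;
    l->head = n;
    l->count++;
    return 0;
}

/* Visits every node once, just as a loop over an array would. */
long list_sum(const struct list *l)
{
    const struct node *n;
    long sum = 0;

    for (n = l->head; n != NULL; n = n->next)
        sum += n->value;
    return sum;
}

/* Releases all nodes and resets the header to the empty state. */
void list_free(struct list *l)
{
    while (l->head != NULL) {
        struct node *next = l->head->next;

        free(l->head);
        l->head = next;
    }
    l->count = 0;
}
```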

Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.

Sep 26 '06 #5

Andrea Laforgia

On 26 Sep 2006 16:02:31 -0700, "pkirk25" <pa*****@kirks.net> wrote:

>When your function receives this array as an argument, is there a safe
way to establish how many rows there are [...]

Usually, you pass the "dimension" of the array to your function.
A prototype of that function could be the following:

void yourFunction(int rows[], int rowCount);

Alternatively, you may want to use a "sentinel value".

An example of an array terminated by a sentinel value is argv, one
of the two parameters of the function main():

int main(int argc, char *argv[])

The standard prescribes that argv[argc] shall be a null pointer, so
the following code is perfectly legal:

#include <stdio.h>

int main(int argc, char *argv[])
{
int count;
for (count=0; argv[count]; count++)
;
printf("argc=%d\n", argc, count);
return 0;
}

Sep 26 '06 #6

Andrea Laforgia

On Wed, 27 Sep 2006 01:27:28 +0200, Andrea Laforgia
<a.********@andrealaforgia.it.invalid> wrote:

printf("argc=%d\n", argc, count);

Of course, it should be:

printf("argc=%d\n", count);

I wanted to demonstrate that argc == count.

Sep 26 '06 #7

William Ahern

On Tue, 26 Sep 2006 23:06:17 +0000, Richard Heathfield wrote:

pkirk25 said:

>Assume an array of structs that is having rows added at random. By the
time it reaches your function, you have no idea if it has a few hundred
or over 10000 rows.

When your function receives this array as an argument, is there a safe
way to establish how many rows there are, or should I iterate over a field
I know will always be used and use the final value of the iterator as
the size of the array?

The trick is to keep track of how many elements the array has, and to pass
that information to functions that need it. Objects are relatively cheap.
Something as small as a size_t is practically free! So don't be squeamish
about using them.

Put another way, don't ever discard information. Ideally, such information
is kept from the very beginning, and not derived at some intermediate
point.

Sep 26 '06 #8

Andrew Poelstra

On Tue, 2006-26-09 at 16:02 -0700, pkirk25 wrote:

Assume an array of structs that is having rows added at random. By the
time it reaches your function, you have no idea if it has a few hundred
or over 10000 rows.

When your function receives this array as an argument, is there a safe
way to establish how many rows there are, or should I iterate over a
field I know will always be used and use the final value of the
iterator as the size of the array?

I'm not sure if I understand your question, but the usual method for
doing this is to pass the length of the array into the function:

int my_function (char array[], size_t len);

--
Andrew Poelstra <http://www.wpsoftware.net/projects/>

Sep 26 '06 #9

Keith Thompson

Andrew Poelstra <ap*******@false.site> writes:

On Tue, 2006-26-09 at 16:02 -0700, pkirk25 wrote:
>Assume an array of structs that is having rows added at random. By the
time it reaches your function, you have no idea if it has a few hundred
or over 10000 rows.

When your function receives this array as an argument, is there a safe
way to establish how many rows there are, or should I iterate over a
field I know will always be used and use the final value of the
iterator as the size of the array?

I'm not sure if I understand your question, but the usual method for
doing this is to pass the length of the array into the function:

int my_function (char array[], size_t len);

And keep in mind that, in a parameter declaration (and *only* in a
parameter declaration), "char array[]" really means "char *array".

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Sep 27 '06 #10

pkirk25

Many thanks all.

I'm curious why people prefer a size_t instead of int? On my PC, a quick
sizeof shows both are the same size, though that could be down to the %d
in printf("%d", sizeof(tSize_t))

Sep 27 '06 #11

Une bévue

Keith Thompson <ks***@mib.org> wrote:

>
As Richard Heathfield wrote, the most straightforward way to do this
is to keep track of the size yourself and pass it to your function as
an extra argument.

i'm still a newb in C, i don't use the same way as Richard Heathfield
wrote, rather i initiate the array with null value, then i can
iterate...

is that way of doing wrong ???
--
une bévue

Sep 27 '06 #12

a.laforgia

pkirk25 wrote:

I'm curious why people prefer a size_t instead of int?

The ISO standard states that "size_t" is the type of sizeof, so you
should use it to specify the "size" of your data. It is not guaranteed
that size_t matches the int type, but only that it is an unsigned type.
Using size_t instead of int is much more correct, since it is portable
and since data size is obviously an unsigned value.
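
On the %d question: passing the result of sizeof to %d is a type mismatch (size_t is unsigned and need not be the width of int), so the output proves nothing. A minimal sketch, assuming a C99 library that supports the %zu conversion; format_size is a name made up for this example:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Formats a size_t with the matching C99 conversion, %zu.
 * Returns what snprintf returns: the number of characters needed.
 */
int format_size(char *buf, size_t buflen, size_t value)
{
    return snprintf(buf, buflen, "%zu", value);
}
```

With a C90-only library, the usual workaround is to cast the value to unsigned long and print it with %lu.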

Sep 27 '06 #13

Michael Mair

a.********@gmail.com wrote:

pkirk25 wrote:
>>I'm curious why people prefer a size_t instead of int?

The ISO standard states that "size_t" is the type of sizeof, so you
should use it to specify the "size" of your data. It is not guaranteed
that size_t matches the int type, but only that it is an unsigned type.
Using size_t instead of int is much more correct, since it is portable
and since data size is obviously an unsigned value.

Other examples:
strlen() returns size_t
malloc()'s parameter is of type size_t

There are several functions where one would expect a size_t return
value or parameter; that you find int (or another type) instead, is
mainly for historic reasons.

There is one main disadvantage: size_t is an unsigned integer type.
unsigned types need more care in some places; a classic is the
beloved
for (index = size-1; index >= 0; --index)
way of writing an endless loop.
There is no corresponding signed type which is a pity (POSIX gives
you ssize_t which is useful in many respects).
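
For completeness, one common way to count down safely with an unsigned index is to test before decrementing. This is only a sketch, and find_last is a hypothetical function written for this example:

```c
#include <stddef.h>

/*
 * Returns the index of the last element equal to key,
 * or count if no element matches.
 */
size_t find_last(const int *values, size_t count, int key)
{
    size_t index;

    /* The test compares the old value with 0 before the decrement,
       so the loop body only ever sees count-1 down to 0. */
    for (index = count; index-- > 0; ) {
        if (values[index] == key)
            return index;
    }
    return count;
}
```

This form also copes with count == 0, where size-1 would wrap around to a huge value.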

If I start something new and have the choice, then I go for size_t.
Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.

Sep 27 '06 #14

Richard Heathfield

Une bévue said:

Keith Thompson <ks***@mib.org> wrote:

>>
As Richard Heathfield wrote, the most straightforward way to do this
is to keep track of the size yourself and pass it to your function as
an extra argument.

i'm still a newb in C, i don't use the same way as Richard Heathfield
wrote, rather i initiate the array with null value, then i can
iterate...

is that way of doing wrong ???

Show us what you mean, using a small C program that compiles correctly, and
we'll find out together.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Sep 27 '06 #15

Keith Thompson

pe*******@laponie.com.invalid (Une bévue) writes:

Keith Thompson <ks***@mib.org> wrote:
>As Richard Heathfield wrote, the most straightforward way to do this
is to keep track of the size yourself and pass it to your function as
an extra argument.

i'm still a newb in C, i don't use the same way as Richard Heathfield
wrote, rather i initiate the array with null value, then i can
iterate...

is that way of doing wrong ???

It depends on what you mean. I don't know what "initiate the array
with null value" is supposed to mean.

One way to let a function know how many elements an array has is to
use a sentinel value, i.e., assign some unique value to the last
element of the array. If it happens to be an array of pointers, and a
null pointer is not otherwise a valid value, then marking the end of
the array with a null pointer value is a reasonable approach.
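
A short sketch of the null-pointer sentinel, using a hypothetical count_until_null function. It assumes the caller really did store a null pointer after the last valid element; without it the loop runs off the end of the array:

```c
#include <stddef.h>

/* Counts the entries that precede the terminating null pointer. */
size_t count_until_null(const char *const *items)
{
    size_t n = 0;

    while (items[n] != NULL)
        n++;
    return n;
}
```

Note that the count is rediscovered by walking the array on every call, which is the cost of not passing it along.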

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Sep 27 '06 #16

bert

Keith Thompson wrote:

pe*******@laponie.com.invalid (Une bévue) writes:
i'm still a newb in C, i don't use the same way as Richard Heathfield
wrote, rather i initiate the array with null value, then i can
iterate...

is that way of doing wrong ???

It depends on what you mean. I don't know what "initiate the array
with null value" is supposed to mean.

I think it means initially filling the whole array with
the end-of-array sentinel value. Then when you add
a new element, it already has a sentinel value after it.
--

Sep 27 '06 #17

Une bévue

bert <be************@btinternet.com> wrote:

>
I think it means initially filling the whole array with
the end-of-array sentinel value. Then when you add
a new element, it already has a sentinel value after it.

right, this method is ok ?

nb : this is the case when the max number of elements is known in
advance, otherwise i've a minimum count of elements to start with, when
the elements number is over this min the array is arranged such that i
don't have to malloc for it rather for the next i set to null.
--
une bévue

Sep 27 '06 #18

bert

Une bévue wrote:

bert <be************@btinternet.com> wrote:

I think it means initially filling the whole array with
the end-of-array sentinel value. Then when you add
a new element, it already has a sentinel value after it.

right, this method is ok ?

nb : this is the case when the max number of elements is known in
advance, otherwise i've a minimum count of elements to start with, when
the elements number is over this min the array is arranged such that i
don't have to malloc for it rather for the next i set to null.

Well, there's nothing that anybody could call wrong with
this method. True, some people would find sentinel
values not mixing well with dynamic reallocation, but it
still comes down to a matter of personal taste, and what
style or mixture you find readable and maintainable.
--

Sep 28 '06 #19

bert

Une bévue wrote:

bert <be************@btinternet.com> wrote:

I think it means initially filling the whole array with
the end-of-array sentinel value. Then when you add
a new element, it already has a sentinel value after it.

right, this method is ok ?

nb : this is the case when the max number of elements is known in
advance, otherwise i've a minimum count of elements to start with, when
the elements number is over this min the array is arranged such that i
don't have to malloc for it rather for the next i set to null.

Nobody could call it a wrong method. Some people find that
sentinel values do not mix well with dynamic reallocation, but
it's still just a matter of personal taste, and what style
or mixture you have found to be readable and maintainable.
--

Sep 28 '06 #20

Malcolm

"pkirk25" <pa*****@kirks.net> wrote in message

Many thanks all.

I'm curious why people prefer a size_t instead of int? On my PC, a quick
sizeof shows both are the same size, though that could be down to the %d
in printf("%d", sizeof(tSize_t))

It's an uglification of the language to get round the largely theoretical
problem, what if a memory object is larger than the range of an int?
--
www.personal.leeds.ac.uk/~bgy1mm
freeware games to download.

Sep 28 '06 #21

Chris Torek

In article <f6********************@bt.com>
Malcolm <re*******@btinternet.com> wrote:

>[size_t is] an uglification of the language to get round the
largely theoretical problem, what if a memory object is larger
than the range of an int?

Hardly "theoretical", since this is in fact the case on Solaris
on SPARC, MIPS, and Intel x86-64, to name three.
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.

Sep 28 '06 #22

Bill Reid

Michael Mair <Mi**********@invalid.invalid> wrote in message
news:4n************@individual.net...

a.********@gmail.com wrote:
pkirk25 wrote:

>I'm curious why people prefer a size_t instead of int?
The ISO standard states that "size_t" is the type of sizeof, so you
should use it to specify the "size" of your data. It is not guaranteed
that size_t matches the int type, but only that it is an unsigned type.
Using size_t instead of int is much more correct, since it is portable
and since data size is obviously an unsigned value.

Other examples:
strlen() returns size_t
malloc()'s parameter is of type size_t

So what's the deal about using say, an unsigned long (which is
what I usually use for data sizes) with parameters that are declared
as size_t?

Reason I ask is I was just fooling around with fread() and
fwrite(), using the same old unsigned longs that I always use
as data sizes, as the size of the elements to read and write.

The puzzling results: fread() works just fine, fwrite() blows
up and causes an exception, and writes extraneous garbage at
the end of the new file, UNTIL I deliberately declare an extra
size_t variable just to assign the existing unsigned long so I can
use THAT as the parameter.

Wazzup wid dat? I don't ever remember seeing something
like that before...

---
William Ernest Reid

Oct 2 '06 #23

Ian Collins

Bill Reid wrote:

Michael Mair <Mi**********@invalid.invalid> wrote in message
news:4n************@individual.net...

>>a.********@gmail.com wrote:

>>>pkirk25 wrote:

>>>>I'm curious why people prefer a size_t instead of int?

The ISO standard states that "size_t" is the type of sizeof, so you
should use it to specify the "size" of your data. It is not guaranteed
that size_t matches the int type, but only that it is an unsigned type.
Using size_t instead of int is much more correct, since it is portable
and since data size is obviously an unsigned value.

Other examples:
strlen() returns size_t
malloc()'s parameter is of type size_t

So what's the deal about using say, an unsigned long (which is
what I usually use for data sizes) with parameters that are declared
as size_t?

Reason I ask is I was just fooling around with fread() and
fwrite(), using the same old unsigned longs that I always use
as data sizes, as the size of the elements to read and write.

The puzzling results: fread() works just fine, fwrite() blows
up and causes an exception, and writes extraneous garbage at
the end of the new file, UNTIL I deliberately declare an extra
size_t variable just to assign the existing unsigned long so I can
use THAT as the parameter.

Example please. It shouldn't matter: you can pass an unsigned char to
fwrite (as long as the size fits) and it will be converted to whatever
type size_t is. Odds are your system uses unsigned long for size_t.

--
Ian Collins.

Oct 2 '06 #24

Keith Thompson

"Bill Reid" <ho********@happyhealthy.net> writes:

Michael Mair <Mi**********@invalid.invalid> wrote in message
news:4n************@individual.net...
>a.********@gmail.com wrote:
pkirk25 wrote:

>>I'm curious why people prefer a size_t instead of int?

The ISO standard states that "size_t" is the type of sizeof, so you
should use it to specify the "size" of your data. It is not guaranteed
that size_t matches the int type, but only that it is an unsigned type.
Using size_t instead of int is much more correct, since it is portable
and since data size is obviously an unsigned value.

Other examples:
strlen() returns size_t
malloc()'s parameter is of type size_t

So what's the deal about using say, an unsigned long (which is
what I usually use for data sizes) with parameters that are declared
as size_t?

If a prototype is in scope (which usually just means that you have a
"#include" for the appropriate header), this shouldn't be a problem,
as long as the value is within the range of both unsigned long and
size_t. The compiler knows the type of the argument expression
(unsigned long) and the type of the parameter (size_t), and provides
an implicit conversion if it's needed.

(If you *don't* have a prototype in scope, then you'll likely invoke
undefined behavior as soon as you call the function. Don't do that.)

Reason I ask is I was just fooling around with fread() and
fwrite(), using the same old unsigned longs that I always use
as data sizes, as the size of the elements to read and write.

The puzzling results: fread() works just fine, fwrite() blows
up and causes an exception, and writes extraneous garbage at
the end of the new file, UNTIL I deliberately declare an extra
size_t variable just to assign the existing unsigned long so I can
use THAT as the parameter.

My best guess is that you're missing the required "#include
<stdio.h>". If so, you should do (at least) two things: add the
"#include <stdio.h>", and learn how to invoke your compiler so it
warns you about errors like this.

Beyond that, it's impossible to guess what's happening without seeing
some actual code.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Oct 2 '06 #25

Bill Reid

Ian Collins <ia******@hotmail.com> wrote in message
news:4o************@individual.net...

Bill Reid wrote:

So what's the deal about using say, an unsigned long (which is
what I usually use for data sizes) with parameters that are declared
as size_t?

Reason I ask is I was just fooling around with fread() and
fwrite(), using the same old unsigned longs that I always use
as data sizes, as the size of the elements to read and write.

The puzzling results: fread() works just fine, fwrite() blows
up and causes an exception, and writes extraneous garbage at
the end of the new file, UNTIL I deliberately declare an extra
size_t variable just to assign the existing unsigned long so I can
use THAT as the parameter.
Example please. It shouldn't matter: you can pass an unsigned char to
fwrite (as long as the size fits) and it will be converted to whatever
type size_t is. Odds are your system uses unsigned long for size_t.

I believe it uses unsigned int which is the same 32 bits used for
unsigned long (and short too I think!). I keep using long because
I can't help but believe that long is a bigger number...

I'm not giving an example because I figured out what I was doing
wrong and man is my face red...I may describe it in another post,
but the actual code, NEVER!

---
William Ernest Reid

Oct 3 '06 #26

Bill Reid

Keith Thompson <ks***@mib.org> wrote in message
news:ln************@nuthaus.mib.org...

"Bill Reid" <ho********@happyhealthy.net> writes:

>
So what's the deal about using say, an unsigned long (which is
what I usually use for data sizes) with parameters that are declared
as size_t?

If a prototype is in scope (which usually just means that you have a
"#include" for the appropriate header), this shouldn't be a problem,
as long as the value is within the range of both unsigned long and
size_t. The compiler knows the type of the argument expression
(unsigned long) and the type of the parameter (size_t), and provides
an implicit conversion if it's needed.

Yeah, it was never a problem before, but you know...

"There can be only one possible explanation. It must be due to human
error. These kinds of things have cropped up before, and it was always
due to human error."

(If you *don't* have a prototype in scope, then you'll likely invoke
undefined behavior as soon as you call the function. Don't do that.)

Reason I ask is I was just fooling around with fread() and
fwrite(), using the same old unsigned longs that I always use
as data sizes, as the size of the elements to read and write.

The puzzling results: fread() works just fine, fwrite() blows
up and causes an exception, and writes extraneous garbage at
the end of the new file, UNTIL I deliberately declare an extra
size_t variable just to assign the existing unsigned long so I can
use THAT as the parameter.

My best guess is that you're missing the required "#include
<stdio.h>". If so, you should do (at least) two things: add the
"#include <stdio.h>", and learn how to invoke your compiler so it
warns you about errors like this.

Nope...

"I think you missed it."

Beyond that, it's impossible to guess what's happening without seeing
some actual code.

Not gonna happen, I figured out what the problem was. I was
stupidly free()ing the entire structure and associated text editor buffer
before writing it to the file. Strangely, that didn't seem to affect the
text in the buffer at all, but somehow goofed up the part of the
structure with the buffer size.

So it's all fixed now and works fine...just remember in the future that
when I ask a really stupid question, I'll probably figure it out myself on
my own in a week or so...

---
William Ernest Reid

Oct 3 '06 #27

Christopher Layne

Bill Reid wrote:

I believe it uses unsigned int which is the same 32 bits used for
unsigned long (and short too I think!). I keep using long because
I can't help but believe that long is a bigger number...

If you need an integer type for "sizes" or "lengths" start using size_t. Then
the issue of "is long, long enough?" becomes a non-issue. This is a reason
why size_t exists.

Oct 3 '06 #28
