Proper way to input a dynamically-allocated string

Michel Rouzic

I know it must sound like a newbie question, but I never really had to
bother with that before, and I couldn't even find an answer in the c.l.c
FAQ.

I'd like to know what's the really proper way to input a string into an
array of char that's dynamically allocated. I mean, I don't want to see
anything like char mystring[100]; I don't want to see any number
(if possible). I just want to declare something like char *mystring; and
then I don't know how to allocate it with just as many chars (plus the
space for the \0, of course) as you get from stdin.

I'd really like to know once and for all the smartest way of
inputting strings from stdin and storing them so that they take just
the needed space, and I don't want to see any number such as 100 or
10,000 or even 4,294,967,296 in my code. Can it be done?

Dec 9 '05 #1


Dag-Erling Smørgrav

"Michel Rouzic" <Mi********@yahoo.fr> writes:

[snip]

One common idiom is the following:

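#include <stdio.h>
#include <stdlib.h>

/* Like realloc(), but frees the old block if the resize fails. */
void *
frealloc(void *ptr, size_t sz)
{
    void *tmp;

    if ((tmp = realloc(ptr, sz)) != NULL)
        return (tmp);
    free(ptr);
    return (NULL);
}

/*
 * Read one line from f into a malloc()ed string, dropping the '\n'.
 * Returns NULL at end of file (nothing read) or if memory runs out.
 * Called get_line() because many <stdio.h> headers already declare
 * a POSIX getline() with a different signature.
 */
char *
get_line(FILE *f)
{
    char *str = NULL;
    size_t sz = 0;
    size_t len;
    int ch;

    for (len = 0; ; ++len) {
        ch = fgetc(f);
        if (ch == EOF && !len)
            return (NULL);
        if (len == sz) {
            /* grow to 1, 3, 7, 15, ... bytes */
            str = frealloc(str, sz = sz * 2 + 1);
            if (str == NULL)
                return (NULL);
        }
        if (ch == EOF || ch == '\n') {
            str[len] = '\0';
            return (str);
        }
        str[len] = ch;
    }
}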

However, on average, about 25% of the allocated memory will be wasted.
You can fix that by replacing

return (str);
with
return (frealloc(str, len + 1));

but you may still end up losing quite a bit to heap fragmentation,
depending on how good your system's malloc() implementation is.

DES
--
Dag-Erling Smørgrav - de*@des.no

Dec 9 '05 #2

Eric Sosman

Michel Rouzic wrote:

[snip]

Let's suppose you're reading complete '\n'-terminated lines,
the way fgets() does but with no explicit length limit. You
could do something like this (pseudocode, no error checking):

    buffer = <empty>
    do {
        expand buffer with realloc()
        append next input character
    } while (character wasn't '\n');
    expand buffer with realloc()
    append '\0'

For efficiency's sake you'd probably want to avoid quite
so many trips in and out of the memory allocator, so a refinement
would be to start with a roomier buffer and expand by more than
one character at a time if necessary. (My own function for doing
this -- everybody writes one eventually -- begins with 100 characters
and adds half the buffer's current size each time it needs to expand:
100, 150, 225, ...)

Once you've read the entire line you can, if you like, realloc()
the buffer one final time to trim it to the exact size. I find
that's seldom worth the bother: your program is probably going to
process the line and free() or re-use the buffer pretty soon.

--
Eric Sosman
es*****@acm-dot-org.invalid

Dec 9 '05 #3

slebetman

Michel Rouzic wrote:

[snip]

Reading from stdin is a bit of a problem. The usual answer is to use a buffer
to temporarily store the string:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#define BUFFER_SIZE 100

int main(void) {
    char buffer[BUFFER_SIZE];
    char *inputString = NULL;
    size_t inLen = 0;

    while (fgets(buffer, BUFFER_SIZE, stdin) != NULL) {
        size_t oldLen = inLen;
        char *tmp;

        inLen += strlen(buffer);
        /* +1 for the '\0'; realloc(NULL, n) behaves like malloc(n) */
        tmp = realloc(inputString, inLen + 1);
        if (tmp == NULL) {
            /* malloc or realloc failed */
            free(inputString);
            exit(EXIT_FAILURE);
        }
        inputString = tmp;
        /* append the new chunk after what is already there */
        strcpy(inputString + oldLen, buffer);
        /* check for newline */
        if (inLen > 0 && inputString[inLen-1] == '\n') {

            /* process input here */

            /* then remember to free inputString */
            free(inputString);
            inputString = NULL;
            inLen = 0;          /* start the next line from scratch */
        }
    }
    /* a last line without a trailing '\n' is still in inputString */
    free(inputString);
    return 0;
}

Dec 9 '05 #4

Michel Rouzic

Eric Sosman wrote:

[snip]

That's the method I like best. I wouldn't bother with making larger
buffers anyway; it would make things too complicated, and things don't
have to be sooo efficient when it comes to inputting strings. What
matters most is the result. I think I made your idea work pretty
well. Tell me if you spot anything wrong with it; I GDBed
it and the content of the memory looked fine:

int i;
char *mystring;

i=0;
mystring=NULL;
do
{
mystring=realloc(mystring, i+1);
mystring[i]=getchar();
i++;
}
while (mystring[i-1]!='\n');
mystring[i-1]='\0';

Dec 9 '05 #5

Keith Thompson

"Michel Rouzic" <Mi********@yahoo.fr> writes:
[snip]

> [...]
> mystring=realloc(mystring, i+1);
> [...]

The speed of that code while reading input probably isn't going to be
much of an issue, but the multiple calls to realloc() might cause
excessive heap fragmentation, which could cause problems elsewhere in
your program. (The standard doesn't use the term "heap", but you get the
idea.)

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Dec 9 '05 #6

Eric Sosman

Michel Rouzic wrote:

> [...]
> mystring=realloc(mystring, i+1);
> mystring[i]=getchar();
> [...]

That's the general idea. As written, though, it's not
very robust: it's oblivious to realloc() failures and to
end-of-file or errors on the standard input. Pay attention
to the Sixth Commandment at

http://www.lysator.liu.se/c/ten-commandments.html

Other observations: `int' should probably be `size_t',
and see Keith Thompson's response for one of the reasons a
character-at-a-time expansion may not work well. ("Others
will occur to your thought." -- Gandalf)


Dec 9 '05 #7

Michel Rouzic

Keith Thompson wrote:

"Michel Rouzic" <Mi********@yahoo.fr> writes:
[snip]
> [...] the multiple calls to realloc() might cause
> excessive heap fragmentation, which could cause problems elsewhere in
> your program. [...]

Does it mean that the elements of my array won't be contiguous?

Dec 9 '05 #8

Michel Rouzic

Eric Sosman wrote:

[snip]
> That's the general idea. As written, though, it's not
> very robust: it's oblivious to realloc() failures and to
> end-of-file or errors on the standard input. Pay attention
> to the Sixth Commandment
>
> "If a function be advertised to return an error code in the event of
> difficulties, thou shalt check for that code, yea, even though the
> checks triple the size of thy code..."

OK, I guess I should do that, but... well... as it says, it makes the
code much longer and harder to read, and, well, I wouldn't know
what to do with an error anyway. I mean, if the return code ain't
right, I might display something like "the return code ain't right\n"
and then not know what else to do.

realloc() failures? Never heard of that (maybe cuz I'm quite new to all
that). What can happen with it? And how can you get an EOF or some error
from stdin?

You know, I do care about why I should check for errors, what can
cause them, and what I should do about them, but so far (and you
can understand that) I like to keep my code as simple as possible,
mostly because I consider that my program only has to work as long as it is
used correctly (like, if a program is supposed to take a .wav file as
input, and the user gives it a .mp3 or anything else instead, I don't wanna
bother with doing stuff that will tell him "you need to input a .wav
file"; I'd rather let him have a segmentation fault).

Dec 9 '05 #9

Michel Rouzic

Eric Sosman wrote:

> Other observations: `int' should probably be `size_t'

As for the size_t thing, well, I could cast it for realloc, like this:
mystring=realloc(mystring, (size_t) i+1); other than that, I think I
should leave it as int, unless it is OK to do iterations and refer to
an element of an array with a size_t, which I doubt.

Dec 9 '05 #10

Jordan Abel

On 2005-12-09, Michel Rouzic <Mi********@yahoo.fr> wrote:

[snip]
> "If a function be advertised to return an error code in the event of
> difficulties, thou shalt check for that code, yea, even though the
> checks triple the size of thy code..."

unless you _really_ don't care whether it succeeds or fails

what are you going to do if a printf fails? what if you just want to
continue?
> realloc() failures? never heard of that (maybe cuz im quite new to all
> that), what can happen with it?
it returns a null pointer because it didn't find enough memory
> how can you get an EOF or some error from stdin?
someone types the EOF character, or interrupts [sending a signal on read
can possibly cause a read to fail in addition to calling a signal
handler]
> [...] I don't wanna
> bother with doing stuff that will tell him "you need to input a .wav
> file", i rather let him have a segmentation fault)

if you do something that can cause a segmentation fault, it could very
well cause worse.

Dec 9 '05 #11

grayhag

Eric Sosman wrote:

> [...] (My own function for doing
> this -- everybody writes one eventually -- begins with 100 characters
> and adds half the buffer's current size each time it needs to expand:
> 100, 150, 225, ...)

That's the way it was done in Kernighan and Pike's The Practice of
Programming, where they started with a buffer of size one
and doubled the size every time there was no space left.
It's a good "golden mean" type of idea, I think.

Dec 9 '05 #12

Jordan Abel

On 2005-12-10, grayhag <ue***@yahoo.com> wrote:

> [snip]
> The way it was done in K&P Programming Practice, where they started
> with a one char sized buffer and doubled the size every time there was
> no space left.

doubling it each time actually reduces the worst-case complexity [of the
entire operation] from O(n^2) to O(n). I'm not sure what complexity you
end up with when adding less than the full previous size; probably something
like n log n or n^1.5.

Dec 9 '05 #13

Flash Gordon

Michel Rouzic wrote:

> Eric Sosman wrote:
>> Other observations: `int' should probably be `size_t'
> as for the size_t thing, well i could cast it for realloc, like this :
> mystring=realloc(mystring, (size_t) i+1);

Why on earth would you think that? You *really* need to start working
through a decent text book.
> other than that, I think i
> should leave it to int, unless it is ok to do iterations and refer to
> some element of an array by a size_t, which i doubt

Again, what on earth makes you think that? Of course you can use a
size_t variable for indexing into an array.
--
Flash Gordon
Living in interesting times.
Although my email address says spam, it is real and I read it.

Dec 9 '05 #14

Flash Gordon

Michel Rouzic wrote:

> Keith Thompson wrote:
> <snip>
>> [...] the multiple calls to realloc() might cause
>> excessive heap fragmentation, which could cause problems elsewhere in
>> your program.
>
> does it mean that the elements of my array won't be contigous?

No, it means the free space in your heap will be fragmented. Any memory
block returned by *alloc is always contiguous.

Dec 9 '05 #15

Flash Gordon

Michel Rouzic wrote:

<snip>

"If a function be advertised to return an error code in the event of
difficulties, thou shalt check for that code, yea, even though the
checks triple the size of thy code..."

> ok, I guess I should do that but.... well... as it says, it makes the
> code much longer and makes it harder to read and, well, I wouldn't know
> what to do with an error anyways, I mean, if the return code ain't
> right, i might display something like "the return code ain't right\n"
> and don't know what I should do.
If nothing else you can terminate the program with an error message
saying that it has failed.
> realloc() failures? never heard of that (maybe cuz im quite new to all
> that), what can happen with it?
There might not be a large enough block of free memory because you have
fragmented it. Or you might just have run out of memory, or hit a limit
enforced by the OS. No resource is *ever* infinite.
> how can you get an EOF or some error
> from stdin?
On most systems there is a way for the user to signal EOF on stdin. Or
stdin might be redirected from a file or a pipe, which eventually runs out.
> [...] I don't wanna
> bother with doing stuff that will tell him "you need to input a .wav
> file", i rather let him have a segmentation fault)

It might not cause a segmentation violation. It might overwrite critical
data instead.

Dec 9 '05 #16

Inso Haggath

Jordan Abel wrote:

> [...] I'm not sure what complexity you
> end up with adding less than the full previous size. probably something
> like nlogn or n^1.5

Maybe use a diminishing multiplier for the additive part,
so that it gets smaller by a certain percentage
with each new allocation.

--
Make it as simple as possible, but not simpler
Albert Einstein

Dec 9 '05 #17

Eric Sosman

Jordan Abel wrote:

> [snip]
> doubling it each time actually reduces the worst-case complexity [of the
> entire operation] from O(n^2) to O(n). I'm not sure what complexity you
> end up with adding less than the full previous size. probably something
> like nlogn or n^1.5

Still O(n), just with a different multiplier. Assume you
start with a buffer of B characters and grow it by a factor
r > 1 whenever it fills up. Then (ignoring the rounding off
to integer sizes), you get successive buffers of size B, B*r,
B*r^2, ... until after k expansions you eventually get to
B*r^k >= n.

You've copied (potentially) the contents of all the
smaller buffers, hence you may have copied as many as

B + B*r + B*r^2 + ... + B*r^(k-1)
= B * (r^k - 1) / (r - 1)

characters. Noting that k is log_base_r(n/B) + x, 0 <= x < 1,
the total characters copied come to:

B * (r^(log_base_r(n/B) + x) - 1) / (r - 1)
= B * (n/B * r^x - 1) / (r - 1)
= (n * r^x - B) / (r - 1)
= O(n)

"Next time, we're gonna do ... FRACTIONS!" -- Tom Lehrer


Dec 9 '05 #18

Michel Rouzic

Flash Gordon wrote:

> Michel Rouzic wrote:
>> does it mean that the elements of my array won't be contigous?
>
> No, it means the free space in your heap will be fragmented. Any memory
> block returned by *alloc is always contiguous.

Oh, that's what I thought. But what are the consequences? I mean, I'll
have some memory occupied, some free space, and then my string. So, as
for the free space, does it mean it can only be used for something
small enough to fit in it, or will it just be wasted space?

Dec 10 '05 #19

Michel Rouzic

Flash Gordon wrote:

> Michel Rouzic wrote:
> <snip>
>> [...] I don't wanna
>> bother with doing stuff that will tell him "you need to input a .wav
>> file", i rather let him have a segmentation fault)
>
> It might not cause a segmentation violation. It might overwrite critical
> data instead.

um... I don't think you know what I'm referring to. The example I took is
the one of my program that reads .wav files to deal with them, without
checking that it actually is a .wav file. Basically, it will just look
at a precise place in a file for a 32-bit integer telling how many
bytes are to be read in the file. If you try to input a non-.wav file
instead, the 32-bit integer read will be bogus, and is likely to have a
value much higher than the number of bytes left in the file, so the
program will try to read even after it has read the whole file,
thus causing a segmentation fault.

so basically, as I said, if the user wants to input an mp3 file instead
of a .wav, it's at his own risk. And if you want to input an EOF
character at some point, well, it's at your own risk too. Maybe one day
I'll bother with making some stuff to check that kind of foolishness,
but so far I've got higher-priority things to do than this kind of
stuff (like making sure my program does what it's supposed to do)

Dec 10 '05 #20

Michel Rouzic

Flash Gordon wrote:

Michel Rouzic wrote:
Eric Sosman wrote:
Other observations: `int' should probably be `size_t'

as for the size_t thing, well i could cast it for realloc, like this :
mystring=realloc(mystring, (size_t) i+1);

Why on earth would you think that? You *really* need to start working
through a decent text book.
other than that, I think I
should leave it as int, unless it is OK to do iterations and refer to
some element of an array by a size_t, which I doubt

Again, what on earth makes you think that? Of course you can use a
size_t variable for indexing into an array.

ok cool, so why shouldn't I use an int for the size in a realloc, or
why again shouldn't I cast it to size_t?

Dec 10 '05 #21

pete

Michel Rouzic wrote:

Flash Gordon wrote:
Michel Rouzic wrote:
Eric Sosman wrote:
Other observations: `int' should probably be `size_t'

as for the size_t thing,
well i could cast it for realloc, like this :
mystring=realloc(mystring, (size_t) i+1);

Why on earth would you think that?
You *really* need to start working
through a decent text book.
other than that, I think I
should leave it as int,
unless it is OK to do iterations and refer to
some element of an array by a size_t, which I doubt

Again, what on earth makes you think that? Of course you can use a
size_t variable for indexing into an array.

ok cool, so why shouldn't I use an int for the size in a realloc, or
why again shouldn't I cast it to size_t?

The integer parameter type of realloc is size_t,
so casting an int argument to type size_t does nothing
that the prototype's implicit conversion wouldn't already do.

void *realloc(void *ptr, size_t size);
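With that prototype in scope (from <stdlib.h>), an integer argument is converted to size_t automatically. The pattern `mystring=realloc(mystring, ...)` quoted above has a separate problem: if realloc() fails, the only pointer to the old block is overwritten with NULL and the block leaks. A minimal sketch using a temporary instead; `append_char` is a hypothetical helper, and growing by one byte per character only mirrors the code under discussion, it is not a recommendation.

```c
#include <stdlib.h>

/* Appends c to the heap string *s, whose length is *len, growing the
   block by one byte. Returns 0 on success, -1 on failure; on failure
   *s is left unchanged and still valid, so the caller can free it. */
int append_char(char **s, size_t *len, char c)
{
    /* size_t arithmetic throughout; no cast is needed for realloc(). */
    char *tmp = realloc(*s, *len + 2);   /* room for c and '\0' */

    if (tmp == NULL)
        return -1;                        /* old block not lost */
    tmp[*len] = c;
    tmp[*len + 1] = '\0';
    ++*len;
    *s = tmp;
    return 0;
}
```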

Do you have some resources available to learn about size_t?

--
pete

Dec 10 '05 #22

websnarf

Michel Rouzic wrote:

I know it must sound like a newbie question, but I never really had to
bother with that before,
It's not. Even experienced programmers seem not to know the proper
answer to this question (hint: fgets() is hardly adequate.)
[....] and I didn't even find an answer in the c.l.c FAQ
Not much of a surprise there.
I'd like to know what's the really proper way for input a string in an
array of char that's dynamically allocated. I mean, I wish not to see
any such things as char mystring[100]; I don't want to see any number
(if possible) I just want to declare something like char *mystring; and
then I don't know how allocate it with just as many chars (with the
space for the \0 of course) as you get from stdin.
You have to understand, this is a foreign concept to many if not most
of the readers of this newsgroup. Every string container must have a
size, and "the C way" is to declare that size up front. You can search
the archives of this newsgroup for endless examples of this. The C
library is almost completely useless on this issue as well.
I'd really like to know once for all what's the smartest way of
inputing strings from stdin and storing them in a way so they take just
the needed space and I don't want to see any number such as 100 or
10,000 or even 4,294,967,296 in my code. Any way it can be done?

You can read my solution to this problem here:

http://www.pobox.com/~qed/userInput.html

The key point is that the C standard library does not provide a way to
read a line into a dynamically sized string: 1) gets() cannot prevent
buffer overflow and 2) fgets() is inadequate. So no matter what, for a
really correct and useful solution you have to roll your own algorithm
(but it is doable, as the link above demonstrates.)
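For illustration, here is an independent sketch of rolling your own. It is not the code at the link above. `read_line` is a hypothetical name. POSIX.1-2008 also standardizes a `getline()` function, but that is not part of ISO C. The buffer grows geometrically, so the number of realloc() calls is logarithmic in the line length, and a final realloc() trims it to the exact size.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads one line of any length from f, without the newline, into a new
   heap string that the caller must free(). Returns NULL at end of file
   when nothing was read, or if memory runs out. */
char *read_line(FILE *f)
{
    size_t cap = 0, len = 0;
    char *buf = NULL;
    int ch;

    while ((ch = getc(f)) != EOF && ch != '\n') {
        if (len + 1 >= cap) {                 /* need room for ch and '\0' */
            size_t newcap = cap ? cap * 2 : 16;
            char *tmp = realloc(buf, newcap);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap = newcap;
        }
        buf[len++] = (char)ch;
    }

    if (ch == EOF && len == 0)
        return NULL;                          /* no data: buf is still NULL */

    {
        /* Shrink to exactly len + 1 bytes, or allocate 1 byte for an empty
           line. If shrinking fails, the larger block is simply kept. */
        char *tmp = realloc(buf, len + 1);
        if (tmp != NULL)
            buf = tmp;
        else if (buf == NULL)
            return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

A typical loop is `while ((line = read_line(stdin)) != NULL) { ...; free(line); }`. The only fixed number left is the initial capacity, which affects speed, not the maximum line length.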

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Dec 10 '05 #23

Keith Thompson

"Michel Rouzic" <Mi********@yahoo.fr> writes:

Flash Gordon wrote:
Michel Rouzic wrote:
Eric Sosman wrote:
Other observations: `int' should probably be `size_t'

as for the size_t thing, well i could cast it for realloc, like this :
mystring=realloc(mystring, (size_t) i+1);

Why on earth would you think that? You *really* need to start working
through a decent text book.
other than that, I think I
should leave it as int, unless it is OK to do iterations and refer to
some element of an array by a size_t, which I doubt

Again, what on earth makes you think that? Of course you can use a
size_t variable for indexing into an array.

ok cool, so why shouldn't I use an int for the size in a realloc, or
why again shouldn't I cast it to size_t?

Why *should* you use an int?

The second argument of realloc() is of type size_t. You can use an
int if you like (it will be implicitly converted if you have a proper
"#include <stdlib.h>"), but there's no good reason to do so.
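One concrete reason not to: the implicit conversion does not catch a bad value. A negative int, for example from an off-by-one bug, silently becomes a huge size_t. Passing that to realloc() asks for an enormous block, which will almost certainly fail. A minimal sketch; `as_size` is a hypothetical helper that performs the same conversion the realloc() prototype would.

```c
#include <stddef.h>
#include <stdint.h>

/* Converts an int to size_t implicitly, exactly as passing it to
   realloc()'s size_t parameter would. Negative values wrap around
   modulo SIZE_MAX + 1 instead of producing an error. */
size_t as_size(int n)
{
    return n;
}
```

So `as_size(-1)` equals `SIZE_MAX`. Keeping lengths in size_t from the start avoids carrying a sign that the size can never legitimately use.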

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Dec 10 '05 #24

Maineline News

sl*******@yahoo.com wrote:

Michel Rouzic wrote:
I know it must sound like a newbie question, but I never really
had to bother with that before, and I didn't even find an answer
in the c.l.c FAQ

I'd like to know what's the really proper way for input a string
in an array of char that's dynamically allocated. I mean, I wish
not to see any such things as char mystring[100]; I don't want
to see any number (if possible) I just want to declare something
like char *mystring; and then I don't know how allocate it with
just as many chars (with the space for the \0 of course) as you
get from stdin.

I'd really like to know once for all what's the smartest way of
inputing strings from stdin and storing them in a way so they
take just the needed space and I don't want to see any number
such as 100 or 10,000 or even 4,294,967,296 in my code. Any way
it can be done?

From stdin is a bit of a problem. The usual answer is to use a
buffer to temporarily store the string:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

.... snip code ...

Or you can get my ggets routine at:
<http://cbfalconer.home.att.net/download/ggets.zip>
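The snipped code and ggets are not reproduced here. What follows is an independent sketch of the same general idea of reading through a temporary buffer: fgets() fills a small fixed array, and each piece is appended to a growing heap string until a newline arrives. `read_line_chunks` is a hypothetical name, and the 64-byte chunk size is arbitrary. It affects only the number of calls, not the maximum line length.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads a line of any length from f, without the newline, into a new
   heap string that the caller must free(). Returns NULL at end of file
   when nothing was read, or on allocation failure. */
char *read_line_chunks(FILE *f)
{
    char chunk[64];                          /* temporary buffer */
    char *line = NULL;
    size_t len = 0;

    while (fgets(chunk, sizeof chunk, f) != NULL) {
        size_t n = strlen(chunk);
        char *tmp = realloc(line, len + n + 1);
        if (tmp == NULL) {
            free(line);
            return NULL;
        }
        line = tmp;
        memcpy(line + len, chunk, n + 1);    /* copies the '\0' too */
        len += n;
        if (len > 0 && line[len - 1] == '\n') {
            line[--len] = '\0';              /* whole line read; drop '\n' */
            break;
        }
    }
    return line;       /* NULL if fgets() failed before reading anything */
}
```

One limitation of any fgets()-based approach is that a null byte inside the input line truncates it, because strlen() stops there.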

--
"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discoverer of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews

Dec 10 '05 #25

Richard Heathfield

we******@gmail.com said:

Michel Rouzic wrote:
I know it must sound like a newbie question, but I never really had to
bother with that before,
It's not. Even experienced programmers seem not to know the proper
answer to this question (hint: fgets() is hardly adequate.)

In my experience, you're wrong; how to do this is common knowledge amongst
experienced programmers.

I'd like to know what's the really proper way for input a string in an
array of char that's dynamically allocated. I mean, I wish not to see
any such things as char mystring[100]; I don't want to see any number
(if possible) I just want to declare something like char *mystring; and
then I don't know how allocate it with just as many chars (with the
space for the \0 of course) as you get from stdin.

You have to understand, this is a foreign concept to many if not most
of the readers of this newsgroup. Every string container must have a
size, and "the C way" is to declare that size up front. You can search
the archives of this newsgroup for endless examples of this.

Most people who ask questions here are newbies, which is why they're asking
questions; that's why we tend to give them simple answers. Nevertheless,
the "how do I get an entire line of input" thing has been asked and
satisfactorily answered many times here.
The C
library is almost completely useless on this issue as well.

That's like saying the toolkit you get with a new bicycle is useless. Well,
yes, it's not brilliant - but it's probably enough to get you up and
rolling on Christmas Day. Serious users will want better in due course, and
quite a few solutions to this problem have been presented in this newsgroup
in the past.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Dec 10 '05 #26

Mark McIntyre

On 9 Dec 2005 19:20:43 -0800, in comp.lang.c , we******@gmail.com
wrote:

Michel Rouzic wrote:
I know it must sound like a newbie question, but I never really had to
bother with that before,

It's not. Even experienced programmers seem not to know the proper
answer to this question

Don't be silly.

I'd like to know what's the really proper way for input a string in an
array of char that's dynamically allocated. I mean, I wish not to see
any such things as char mystring[100];

You have to understand, this is a foreign concept to many if not most
of the readers of this newsgroup.

Absolute rubbish. Of course, I do realise you're trolling. But you
really are a chump.

Dec 10 '05 #27

slebetman

Michel Rouzic wrote:

Flash Gordon wrote:

Again, what on earth makes you think that? Of course you can use a
size_t variable for indexing into an array.
ok cool, so why shouldn't I use an int for the size in a realloc,

Apart from the obvious fact that realloc is defined to accept size_t
instead of int, you should consider size_t to be self-documenting. That
is, it is immediately obvious what the variable means:

int len;

Oh, it's a number saying how long something is.

size_t len;

Oh, it's how long/large something is in memory.

or why again shouldn't I cast it to size_t?

Because, for very large arrays (I myself haven't seen arrays larger
than 2GB, but it is possible) on 64-bit (or 128-bit? someday...)
platforms, size_t may or may not be the same size as int. You don't
know, I don't know. But your compiler knows. Using size_t causes your
compiler to treat it properly. On 32-bit platforms size_t may end up
being the same as an unsigned int anyway, so don't worry about it
generating different code. Leave it to your compiler to decide.
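Asking the compiler is easy. A small sketch with a hypothetical `print_type_sizes` helper. The numbers printed depend entirely on the platform. For example, many 64-bit Unix-like and Windows systems use a 4-byte int but an 8-byte size_t, which is exactly the case where storing a size in an int can lose information.

```c
#include <stddef.h>
#include <stdio.h>

/* Prints the sizes this particular compiler uses for int, size_t and
   object pointers. Nothing here is fixed by the C standard. */
void print_type_sizes(void)
{
    printf("sizeof(int)    = %u\n", (unsigned)sizeof(int));
    printf("sizeof(size_t) = %u\n", (unsigned)sizeof(size_t));
    printf("sizeof(void *) = %u\n", (unsigned)sizeof(void *));
}
```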

Dec 10 '05 #28

slebetman

Michel Rouzic wrote:

Flash Gordon wrote:
Michel Rouzic wrote:
Keith Thompson wrote:

<snip>
The speed of that code while reading input probably isn't going to be
much of an issue, but the multiple calls to realloc() might cause
excessive heap fragmentation, which could cause problems elsewhere in
your program. (The standard doesn't use the term "heap", but you get the
idea.)

does it mean that the elements of my array won't be contiguous?

No, it means the free space in your heap will be fragmented. Any memory
block returned by *alloc is always contiguous.

oh, that's what I thought. But what are the consequences? I mean, I'll
have some memory occupied, some free space, and then my string, so, as
for the free space, does it mean it could only be used for something
small enough to fit it, or otherwise it will just be wasted space?

The consequence is that sooner or later malloc/realloc will fail
because it can't find a contiguous area of memory as large as the one
you requested. Coupled with your refusal to handle realloc failures,
this will result in a program crash.

Dec 10 '05 #29

Flash Gordon

Michel Rouzic wrote:

Flash Gordon wrote:
Michel Rouzic wrote:
<snip>
It might not cause a segmentation violation. It might overwrite critical
data instead.
um... I don't think you know what I'm referring to.

I do and you are WRONG.
The example I took is
the one of my program that reads .wav files to deal with them, without
checking that it actually is a .wav file. Basically, it will just look
at a precise place in a file for a 32-bit integer telling how many
bytes are to be read in the file. If you try to input a non-.wav file
instead, the 32-bit integer read will be bogus, and is likely to have a
value much higher than the number of bytes left in the file, so the
program will try to read even after it has read the whole file,
thus causing a segmentation fault.
There is absolutely NO guarantee that it will cause a segmentation
fault. For a start, that is a term that is not defined in the C standard;
secondly, it is a term not applicable to all systems; thirdly, almost any
action that could cause a segmentation fault could *also* overwrite some
critical data used by your application, such as the FILE structures,
possibly causing corruption of files on disk.
so basically, as I said, if the user wants to input an mp3 file instead
of a .wav, it's at his own risk. And if you want to input an EOF
character at some point, well, it's at your own risk too. Maybe one day
I'll bother with making some stuff to check that kind of foolishness,
but so far I've got higher-priority things to do than this kind of
stuff (like making sure my program does what it's supposed to do)

One of the *first* things to worry about is making sure that your input
data is correct. Besides the reasons I've already mentioned, i.e. the
risk of doing nasty things to your system (which are REAL risks,
although the most likely problem is corrupting either the output OR the
input file; yes, the input file CAN be corrupted), there is also the
risk that the format gets extended and a wav file contains things you
don't handle properly, causing your program to corrupt things despite
being given a real wav file.
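A minimal sketch of checking the input before trusting it. It assumes the canonical RIFF layout: a "RIFF" tag, a 32-bit little-endian size, a "WAVE" tag, then chunks each made of a 4-byte ID and a 32-bit little-endian length, with odd-length chunks padded to an even size. `find_wav_data` and `read_u32le` are hypothetical names. This is not a full WAV parser. It performs only the check relevant here: rejecting a declared length that runs past the end of the file. Because it uses ftell(), it also assumes files smaller than LONG_MAX bytes.

```c
#include <stdio.h>
#include <string.h>

/* Reads a 32-bit little-endian value byte by byte, independent of the
   host's byte order. Returns 0 on success, -1 on a short read. */
static int read_u32le(FILE *f, unsigned long *out)
{
    unsigned char b[4];
    if (fread(b, 1, 4, f) != 4)
        return -1;
    *out = (unsigned long)b[0] | ((unsigned long)b[1] << 8)
         | ((unsigned long)b[2] << 16) | ((unsigned long)b[3] << 24);
    return 0;
}

/* Verifies the RIFF/WAVE signature and walks the chunks to "data".
   On success returns 0, stores the data size, and leaves f at the
   first sample byte. Returns -1 for a non-WAV file or a chunk whose
   declared size exceeds the bytes actually left in the file. */
int find_wav_data(FILE *f, unsigned long *data_size)
{
    char id[4];
    unsigned long size;
    long file_len, pos;

    if (fseek(f, 0, SEEK_END) != 0 || (file_len = ftell(f)) < 0
        || fseek(f, 0, SEEK_SET) != 0)
        return -1;

    if (fread(id, 1, 4, f) != 4 || memcmp(id, "RIFF", 4) != 0)
        return -1;
    if (read_u32le(f, &size) != 0)            /* overall RIFF size */
        return -1;
    if (fread(id, 1, 4, f) != 4 || memcmp(id, "WAVE", 4) != 0)
        return -1;

    for (;;) {
        if (fread(id, 1, 4, f) != 4 || read_u32le(f, &size) != 0)
            return -1;                        /* no "data" chunk found */
        if ((pos = ftell(f)) < 0 || size > (unsigned long)(file_len - pos))
            return -1;                        /* claims more than exists */
        if (memcmp(id, "data", 4) == 0) {
            *data_size = size;
            return 0;
        }
        /* Skip this chunk; bodies are padded to an even length. */
        if (fseek(f, (long)(size + (size & 1)), SEEK_CUR) != 0)
            return -1;
    }
}
```

An mp3 fed to this function is rejected by the "RIFF" check. A truncated or lying WAV file is rejected by the size check instead of leading the program to read past the end of the file.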
--
Flash Gordon
Living in interesting times.
Although my email address says spam, it is real and I read it.

Dec 10 '05 #30

slebetman

Michel Rouzic wrote:

Flash Gordon wrote:
It might not cause a segmentation violation. It might overwrite critical
data instead.

<snip>

so basically, as I said, if the user wants to input an mp3 file instead
of a .wav, it's at his own risk. And if you want to input an EOF
character at some point, well, it's at your own risk too. Maybe one day
I'll bother with making some stuff to check that kind of foolishness,
but so far I've got higher-priority things to do than this kind of
stuff (like making sure my program does what it's supposed to do)

The program crashing due to memory allocation error is the programmer's
foolishness for not checking the return value of realloc. No need to
punish the user for your own faults.

Checking for errors is not foolishness. It is the responsibility of the
programmer, more so in fact than the other 'priority' things like
adding features. This is because unchecked errors will cause those
wonderful features you've developed to fail at the most unexpected
times - like when demoing your app to your client.

When you program in C, error handling, memory allocation, etc. are the
responsibility of the programmer. This is because C is really nothing
more than 'high level assembly'. If you find this uncomfortable, and if
you insist on not checking errors, then don't write in C. Languages
like Tcl or Perl are more suitable. All the low level errors are already
handled by the people who wrote the interpreters in C, so you don't have
to. Errors in scripting languages don't have the same serious
consequences as in C.

Dec 10 '05 #31

Christian Bau

In article <e4************@news.flash-gordon.me.uk>,
Flash Gordon <sp**@flash-gordon.me.uk> wrote:

One of the *first* things to worry about is making sure that your input
data is correct. Besides the reasons I've already mentioned, i.e. the
risk of doing nasty things to your system (which are REAL risks,
although the most likely problem is corrupting either the output OR the
input file; yes, the input file CAN be corrupted), there is also the
risk that the format gets extended and a wav file contains things you
don't handle properly, causing your program to corrupt things despite
being given a real wav file.

If the application that is programmed in such a careless way is
important and widespread, then some attacker will figure out how to
construct a file that will not only crash the computer, but will make it
do exactly what the attacker wants it to do.

Dec 10 '05 #32

slebetman

Christian Bau wrote:

In article <e4************@news.flash-gordon.me.uk>,
Flash Gordon <sp**@flash-gordon.me.uk> wrote:
One of the *first* things to worry about is making sure that your input
data is correct. Besides the reasons I've already mentioned, i.e. the
risk of doing nasty things to your system (which are REAL risks,
although the most likely problem is corrupting either the output OR the
input file; yes, the input file CAN be corrupted), there is also the
risk that the format gets extended and a wav file contains things you
don't handle properly, causing your program to corrupt things despite
being given a real wav file.

If the application that is programmed in such a careless way is
important and widespread, then some attacker will figure out how to
construct a file that will not only crash the computer, but will make it
do exactly what the attacker wants it to do.

Yes! The infamous "buffer overflow".
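A minimal sketch of the defence, assuming a made-up record format consisting of a one-byte length followed by that many bytes. `copy_record` is hypothetical. The key point is that a length read from the file is attacker-controlled. It must be checked against both the data actually supplied and the destination's capacity before memcpy() uses it.

```c
#include <stddef.h>
#include <string.h>

/* Copies a length-prefixed record from untrusted input into dst as a
   null-terminated string. Returns 0 on success, or -1 if the declared
   length exceeds the bytes supplied or would overflow dst. */
int copy_record(char *dst, size_t dst_cap,
                const unsigned char *src, size_t src_len)
{
    size_t declared;

    if (src_len < 1)
        return -1;
    declared = src[0];               /* 1-byte length prefix (assumed) */
    if (declared > src_len - 1)      /* claims more data than exists */
        return -1;
    if (declared >= dst_cap)         /* no room for the bytes plus '\0' */
        return -1;
    memcpy(dst, src + 1, declared);
    dst[declared] = '\0';
    return 0;
}
```

Without the two checks, a record claiming 200 bytes would make memcpy() write past the end of an 8-byte buffer. That is the classic overflow an attacker builds an exploit on.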

Dec 10 '05 #33

Malcolm

"Michel Rouzic" <Mi********@yahoo.fr> wrote

ok cool, so why shouldn't I use an int for the size in a realloc, or
why again shouldn't I cast it to size_t?

size_t is an uglification that will run through all your code, wrecking its
readability and elegance, as every memory size, and hence every array index,
and hence every count, has to be a size_t.

There are many subtle problems with the use of unsigned integers. Java
eliminated them, for very good reasons.
The problem with using integers, on the other hand, is largely theoretical.
The maximum memory size allowed by a compiler may exceed the size of an
integer.
It is perfectly plausible that a company may have more than 32767 employees.
It is also perfectly plausible that a C program may have to run on a machine
where int is 16 bits. It is not plausible that you will want to run the
payroll for a company with more than 30,000 employees on a machine with
16-bit integers. Hence we can happily use an int to hold the count of
employees, or a long if really paranoid.

Dec 10 '05 #34

Ben Pfaff

"Malcolm" <re*******@btinternet.com> writes:

"Michel Rouzic" <Mi********@yahoo.fr> wrote
ok cool, so why shouldn't I use an int for the size in a realloc, or
why again shouldn't I cast it to size_t?

size_t is an uglification that will run through all your code, wrecking its
readability and elegance, as every memory size, and hence every array index,
and hence every count, has to be a size_t.

I don't see why that is a problem. Much of my own code is
written that way. size_t is simply the natural type in C for the
size of something.
--
"Given that computing power increases exponentially with time,
algorithms with exponential or better O-notations
are actually linear with a large constant."
--Mike Lee

Dec 11 '05 #35

Malcolm

"Ben Pfaff" <bl*@cs.stanford.edu> wrote

size_t is an uglification that will run through all your code, wrecking
its readability and elegance, as every memory size, and hence every
array index, and hence every count, has to be a size_t.

I don't see why that is a problem. Much of my own code is
written that way. size_t is simply the natural type in C for the
size of something.

I'm going to do a thread on size_t sometime soon.

You have illustrated the problem however. Once you allow size_t, almost
every integer becomes a size_t, because most integers count something.

Dec 11 '05 #36

pete

Malcolm wrote:

"Ben Pfaff" <bl*@cs.stanford.edu> wrote
size_t is an uglification that will run through all your code, wrecking
its readability and elegance, as every memory size, and hence every
array index, and hence every count, has to be a size_t.

I don't see why that is a problem. Much of my own code is
written that way. size_t is simply the natural type in C for the
size of something.

I'm going to do a thread on size_t sometime soon.

You have illustrated the problem however.
Once you allow size_t, almost
every integer becomes a size_t,
because most integers count something.

Type int is good for return error codes or status codes.
Functions that do comparisons return type int.
A lot of stdio functions return type int.

The number of nodes in a list isn't tied to size_t.
I use long unsigned for counting those.


Dec 11 '05 #37

Flash Gordon

Malcolm wrote:

"Ben Pfaff" <bl*@cs.stanford.edu> wrote
size_t is an uglification that will run through all your code, wrecking
its readability and elegance, as every memory size, and hence every
array index, and hence every count, has to be a size_t.

I don't see why that is a problem. Much of my own code is
written that way. size_t is simply the natural type in C for the
size of something.

I'm going to do a thread on size_t sometime soon.

You have illustrated the problem however. Once you allow size_t, almost
every integer becomes a size_t, because most integers count something.

What is the problem with that? In any case, a lot of integers in code I
write are not counting the size of C objects (they might be scaled costs
which can even be negative, for example).

Dec 11 '05 #38

Joe Wright

Ben Pfaff wrote:

"Malcolm" <re*******@btinternet.com> writes:

"Michel Rouzic" <Mi********@yahoo.fr> wrote
ok cool, so why shouldn't I use an int for the size in a realloc, or
why again shouldn't I cast it to size_t?

size_t is an uglification that will run through all your code, wrecking its
readability and elegance, as every memory size, and hence every array index,
and hence every count, has to be a size_t.

I don't see why that is a problem. Much of my own code is
written that way. size_t is simply the natural type in C for the
size of something.

I have seldom defined a variable of type size_t. On DJGPP..

typedef long unsigned int size_t;

...is its declaration. In limits.h I find..

#define SSIZE_MAX 2147483647
#define INT_MAX 2147483647
#define LONG_MAX 2147483647L

...and so see no compelling reason to type anything size_t rather than int.

It is interesting to have functions prototyped with size_t parameters to
indicate positive values. Otherwise, int works perfectly well for me.

--
Joe Wright
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---

Dec 11 '05 #39

Malcolm

"Joe Wright" <jw*****@comcast.net> wrote

#define SSIZE_MAX 2147483647
#define INT_MAX 2147483647
#define LONG_MAX 2147483647L

..and so see no compelling reason to type anything size_t rather than int.

It is interesting to have functions prototyped with size_t parameters to
indicate positive values. Otherwise, int works perfectly well for me.

Take this function

/*
trivial function that counts number of occurrences of ch in str
*/
mystrcount(const char *str, int ch)

Now basically this function is always going to return small integers.
However, technically, someone could pass it a massive string, all set to one
character. Then an int would overflow, if size_t were bigger than an int.

Thus the function must return a size_t.

That means that the higher-level logic which calls it must also be written
with size_t, and the ugliness propagates
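One way to complete Malcolm's prototype is with the size_t return type he argues it needs. He gave only the signature, without a return type, so this body is an illustrative sketch. The count can never exceed the string's length, and an object's size always fits in a size_t, so this return type cannot overflow. An int could overflow on a system where strings longer than INT_MAX are possible.

```c
#include <stddef.h>

/* Counts the occurrences of ch in the null-terminated string str.
   Returns 0 for ch == '\0', since the terminator is not counted. */
size_t mystrcount(const char *str, int ch)
{
    size_t n = 0;

    for (; *str != '\0'; ++str)
        if (*str == (char)ch)
            ++n;
    return n;
}
```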

Dec 11 '05 #40

Joe Wright

Malcolm wrote:

"Joe Wright" <jw*****@comcast.net> wrote
#define SSIZE_MAX 2147483647
#define INT_MAX 2147483647
#define LONG_MAX 2147483647L

..and so see no compelling reason to type anything size_t rather than int.

It is interesting to have functions prototyped with size_t parameters to
indicate positive values. Otherwise, int works perfectly well for me.

Take this function

/*
trivial function that counts number of occurrences of ch in str
*/
mystrcount(const char *str, int ch)

Now basically this function is always going to return small integers.
However, technically, someone could pass it a massive string, all set to one
character. Then an int would overflow, if size_t were bigger than an int.

Size is not interesting. The maximum value of size_t and int are the
same (on my 32-bit Intel machine). There is no case (that I know of) for
an object representation to be larger than 2^31 bytes because the high
order 2 Gigs is reserved for the OS.
Thus the function must return a size_t.
If the value of size_t cannot exceed INT_MAX, what's the point?
That means that the higher-level logic which calls it must also be written
with size_t, and the ugliness propagates


Dec 11 '05 #41

Ben Pfaff

Joe Wright <jw*****@comcast.net> writes:

Ben Pfaff wrote:
"Malcolm" <re*******@btinternet.com> writes:
"Michel Rouzic" <Mi********@yahoo.fr> wrote

ok cool, so why shouldn't I use an int for the size in a realloc, or
why again shouldn't I cast it to size_t?
size_t is an uglification that will run through all your code,
wrecking its readability and elegance, as every memory size, and
hence every array index, and hence every count, has to be a size_t.

I don't see why that is a problem. Much of my own code is
written that way. size_t is simply the natural type in C for the
size of something.

I have seldom defined a variable of type size_t. On DJGPP..

typedef long unsigned int size_t;

..is its declaration. In limits.h I find..

#define SSIZE_MAX 2147483647
#define INT_MAX 2147483647
#define LONG_MAX 2147483647L

..and so see no compelling reason to type anything size_t rather than int.

If you're only interested in your implementation, then that's a
reasonable point of view.
--
"If I've told you once, I've told you LLONG_MAX times not to
exaggerate."
--Jack Klein

Dec 11 '05 #42

Jordan Abel

On 2005-12-11, Ben Pfaff <bl*@cs.stanford.edu> wrote:

Joe Wright <jw*****@comcast.net> writes:
Ben Pfaff wrote:
"Malcolm" <re*******@btinternet.com> writes:

"Michel Rouzic" <Mi********@yahoo.fr> wrote

>ok cool, so why shouldn't I use an int for the size in a realloc, or
>why again shouldn't I cast it to size_t?
>

size_t is an uglification that will run through all your code,
wrecking its readability and elegance, as every memory size, and
hence every array index, and hence every count, has to be a size_t.
I don't see why that is a problem. Much of my own code is
written that way. size_t is simply the natural type in C for the
size of something.

I have seldom defined a variable of type size_t. On DJGPP..

typedef long unsigned int size_t;

..is its declaration. In limits.h I find..

#define SSIZE_MAX 2147483647
#define INT_MAX 2147483647
#define LONG_MAX 2147483647L

..and so see no compelling reason to type anything size_t rather than int.

If you're only interested in your implementation, then that's a
reasonable point of view.

plus, that's SSIZE_MAX, which applies to the POSIX type ssize_t, not the
C type size_t. SIZE_MAX is probably about twice that (4294967295), being
unsigned.
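The limit that matters can be printed directly. This is a minimal sketch assuming a C99 compiler, for <stdint.h> and the `%zu` format. `print_limits` is hypothetical. SIZE_MAX is the largest size_t value, while SSIZE_MAX belongs to POSIX's signed ssize_t and is a different limit. On a 32-bit system like the one described above, SIZE_MAX would likely print as 4294967295, roughly twice INT_MAX.

```c
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Prints INT_MAX and SIZE_MAX for the current implementation. */
void print_limits(void)
{
    printf("INT_MAX  = %d\n", INT_MAX);
    printf("SIZE_MAX = %zu\n", (size_t)SIZE_MAX);
}
```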

Dec 11 '05 #43

Keith Thompson

"Malcolm" <re*******@btinternet.com> writes:

"Joe Wright" <jw*****@comcast.net> wrote
#define SSIZE_MAX 2147483647
#define INT_MAX 2147483647
#define LONG_MAX 2147483647L

..and so see no compelling reason to type anything size_t rather than int.

It is interesting to have functions prototyped with size_t parameters to
indicate positive values. Otherwise, int works perfectly well for me.

Take this function

/*
trivial function that counts number of occurrences of ch in str
*/
mystrcount(const char *str, int ch)

Now basically this function is always going to return small integers.
However, technically, someone could pass it a massive string, all set to one
character. Then an int would overflow, if size_t were bigger than an int.

Thus the function must return a size_t.

That means that the higher-level logic which calls it must also be written
with size_t, and the ugliness propagates

I fail to see what's ugly about it. What's wrong with using size_t to
represent sizes? That's what it's for.


Dec 11 '05 #44

Keith Thompson

Joe Wright <jw*****@comcast.net> writes:

Malcolm wrote:
"Joe Wright" <jw*****@comcast.net> wrote
#define SSIZE_MAX 2147483647
#define INT_MAX 2147483647
#define LONG_MAX 2147483647L

..and so see no compelling reason to type anything size_t rather than int.

It is interesting to have functions prototyped with size_t
parameters to indicate positive values. Otherwise, int works
perfectly well for me.

Take this function
/*
trivial function that counts number of occurrences of ch in str
*/
mystrcount(const char *str, int ch)
Now basically this function is always going to return small
integers. However, technically, someone could pass it a massive
string, all set to one character. Then an int would overflow, if
size_t were bigger than an int.

Size is not interesting. The maximum value of size_t and int are the
same (on my 32-bit Intel machine). There is no case (that I know of)
for an object representation to be larger than 2^31 bytes because the
high order 2 Gigs is reserved for the OS.

Yes, *on your 32-bit Intel machine*. That's fine if you don't care
about portability. (Personally, I like my code to be reasonably
portable even if it's only going to run on one platform.)

Thus the function must return a size_t.

If the value of size_t cannot exceed INT_MAX, what's the point?

Clarity and portability.


Dec 11 '05 #45

Jordan Abel

Joe Wright <jw*****@comcast.net> writes:

"Joe Wright" <jw*****@comcast.net> wrote

#define SSIZE_MAX 2147483647
#define INT_MAX 2147483647
#define LONG_MAX 2147483647L

Size is not interesting. The maximum value of size_t and int are the
same (on my 32-bit Intel machine).

That is not true. The macro you are looking at is not the one that
defines the maximum value for size_t.

Note the extra "S". That is talking about the POSIX typedef "ssize_t".

Dec 11 '05 #46

Michel Rouzic

Malcolm wrote:

"Michel Rouzic" <Mi********@yahoo.fr> wrote
ok cool, so why shouldn't I use an int for the size in a realloc, or
why again shouldn't I cast it to size_t?
size_t is an uglification that will run through all your code, wrecking its
readability and elegance, as every memory size, and hence every array index,
and hence every count, has to be a size_t.

There are many subtle problems with the use of unsigned integers.

Yeah, I know that. Once I performed a division by an unsigned int minus
another unsigned int; the problem was that the first one was always
smaller than the second, so my result was about one billion times
smaller than expected. Since then I tend to avoid unsigned integers;
most of the time I'm not even getting close to 2 billion in my integer
values.
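That symptom can be reproduced with a small sketch. The function names are hypothetical. With unsigned operands, `a - b` does not become negative when `a < b`. Instead it wraps modulo UINT_MAX + 1, giving a value near 4 billion for a 32-bit unsigned int, so dividing by it produces a tiny positive result. Converting to a signed or floating type before subtracting avoids the wrap.

```c
/* Divides x by (a - b) computed in unsigned arithmetic. If a < b the
   difference wraps to a huge positive value, so the result is tiny. */
double ratio_unsigned(unsigned a, unsigned b, double x)
{
    return x / (a - b);
}

/* Same calculation, but subtracting as double so the difference can be
   negative. (A signed integer type wide enough would also work.) */
double ratio_signed(unsigned a, unsigned b, double x)
{
    return x / ((double)a - (double)b);
}
```

For example, `ratio_signed(3, 5, 10.0)` gives -5.0, while `ratio_unsigned(3, 5, 10.0)` gives a small positive number.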

Java eliminated them, for very good reasons.
The problem with using int, on the other hand, is largely theoretical.
The maximum memory size allowed by a compiler may exceed the largest
value an int can hold.
It is perfectly plausible that a company may have more than 32767 employees.
It is also perfectly plausible that a C program may have to run on a machine
where int is 16 bits. It is not plausible that you will want to run the
payroll for a company with more than 30,000 employees on a machine with
16-bit integers. Hence we can happily use an int to hold the count of
employees, or a long if really paranoid.

I tend to write int32_t instead of int, and include stdint.h, so that I
know I'll be using 32-bit integers on every platform. Some people
reported that it couldn't compile with their compiler, though; I guess
that's because stdint.h is a C99 addition.

Dec 18 '05 #47

websnarf

Michel Rouzic wrote:

I tend to write int32_t instead of int, and include stdint.h, so that I
know I'll be using 32-bit integers on every platform. Some people
reported that it couldn't compile with their compiler, though; I guess
that's because stdint.h is a C99 addition.

Yeah, there's no reason to do that. I've written a stdint.h
alternative called "pstdint.h" which you can obtain here:
http://www.pobox.com/~qed/pstdint.h . It's a plug-in replacement for
stdint.h (it's just missing a few things like WINT_MAX, which nobody
should care about anyway) that has a much higher chance of being
portable to most people's platforms.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Dec 19 '05 #48

Stan Milam

Michel Rouzic wrote:

Malcolm wrote:
Java eliminated them, for very good reasons.

Java is inferior.

Mar 25 '06 #49
