Buffer or Realloc?

bwaichu

Is it generally better to set-up a buffer (fixed sized array) and read
and write to that
buffer even if it is larger than what is being written to it? Or is it
better to allocate memory and realloc it for the size of the what is
being written each time? In other words, what is the decision factor
between deciding to use a fixed size buffer or allocating memory
space and reallocing the size?

I don't think the code below is optimal since the size of the "buffer"
becomes extremely small when I realloc it. I also spend time
re-allocating
it. And taking this to the next step, if I wrote to a linked list or a
red black
tree, I would have to allocate memory. In that case, how do I
determine
the best size?

Note: The code below contains some posix items. Please ignore those
items. I have also left out some error coding below as well.
The
below was written to just learn the regex API.

int
main(void) {

regex_t preg;
char *string = strdup("hello1hello4hello2hello3 are you hello");
char *match;
regmatch_t pmatch[1];
size_t nmatch = 1;
size_t len = 0;
size_t offset = 0;
int test;

match = calloc(BUFSIZ, sizeof(char));
if ( regcomp(&preg, "hello[41]", REG_BASIC) != 0)
perror("regcomp failed");

while (string[offset] != '\0') {
test = regexec((const regex_t *)&preg, &string[offset],
nmatch, pmatch,0);
if (test == REG_NOMATCH || test !=0) {
break;
}
len = pmatch[0].rm_eo-pmatch[0].rm_so;
/* is this optimal or should I use a fixed sized buffer? */
match = realloc(match, len+1);
strlcpy(match, &string[offset],
len+1);
(void)printf("matched string: %s\n", match);
offset += pmatch[0].rm_eo;

}
(void)printf("original string: %s\n", string);
free(match);
regfree(&preg);
exit(0);
}

Aug 27 '06 #1

Subscribe Post Reply

3831

Bill Pursell

bw*****@yahoo.com wrote:

Is it generally better to set-up a buffer (fixed sized array) and read
and write to that
buffer even if it is larger than what is being written to it? Or is it
better to allocate memory and realloc it for the size of the what is
being written each time? In other words, what is the decision factor
between deciding to use a fixed size buffer or allocating memory
space and reallocing the size?

I generally try to avoid fixed buffers larger than BUFSIZ, my
thinking being that stack limitations will generally be more
restrictive than anything else. (That's not a language issue,
however.) For me, it's more of an aesthetic thing: if 100s
of variable declarations in the same scope are bad, then
declaring big arrays is also bad.

One thing that is odd in your code is to downsize with
realloc. Generally, programs are greedy: once they've
acquired the memory, it's easiest to hold onto it. Malloc
a buffer that you think should handle the general case
and only realloc if it turns out to be too small. Realloc
by doubling the size and then hold onto that memory
forever. If that's feasible, of course. If you just consumed
98% of the machines memory to do an operation that
is a 1 in a 10^276 case in a long-running program, it
makes sense to release the memory. It depends on
the situation.

--
Bill Pursell

Aug 28 '06 #2

Michael Mair

bw*****@yahoo.com schrieb:

Is it generally better to set-up a buffer (fixed sized array) and read
and write to that
buffer even if it is larger than what is being written to it? Or is it
better to allocate memory and realloc it for the size of the what is
being written each time? In other words, what is the decision factor
between deciding to use a fixed size buffer or allocating memory
space and reallocing the size?

If it is conceivable that your "large enough" buffer can be
too small and if it is detrimental for your programme because
you caught this case but cannot handle whatever ought to be
in the buffer because of its lack of size, then consider
allocating and reallocating memory.

There are systems where you only have freestanding implementations,
i.e. lack <stdlib.h>. There, you obviously have to make do with
fixed size buffers.

I don't think the code below is optimal since the size of the "buffer"
becomes extremely small when I realloc it. I also spend time
re-allocating
it. And taking this to the next step, if I wrote to a linked list or a
red black
tree, I would have to allocate memory. In that case, how do I
determine
the best size?

Use a "reasonable" start size and choose a "reasonable"
minimum size.
"Reasonable" depends on your problem domain and the resources
you have at your beck and call.
Ideally, the allocated buffer is never resized at all but does
not claim a substantial amount of your memory.
Do not reallocate all the time; if you run into malloc()
failure somewhere else in your programme, then you could try
as fallback to resize allocated buffers to minimum and malloc()
again.

One thing to keep in mind: Try out your realloc() fallback for
very small initial buffer sizes and your error handling strategies
for very large initial buffer sizes (e.g. try to
malloc((size_t)-1)). Without test, the code is not worth anything
but as a show of goodwill...

Note: The code below contains some posix items. Please ignore those
items. I have also left out some error coding below as well.
The
below was written to just learn the regex API.

You could have left them out; providing compiling C code is your
best bet for getting good answers.

You forgot to
#include <stdlib.h>
#include <stdio.h>

int
main(void) {

regex_t preg;
char *string = strdup("hello1hello4hello2hello3 are you hello");
char *match;
regmatch_t pmatch[1];
size_t nmatch = 1;
size_t len = 0;
size_t offset = 0;
int test;

match = calloc(BUFSIZ, sizeof(char));

As you do not know the value of BUFSIZ, you might allocate
either 1 or one million bytes.
Go with a fixed initial value.
sizeof (char) always equals 1. If you want to base the allocation
on the type of match, then use
match = calloc(YOUR_INIT_BUFFER_SIZE, sizeof *match);

You also forgot to check whether calloc() succeeded. This is
outright stupid. No, I do not want to imply that you are stupid.
But it does not cost you anything to "prove" that you may go
on from here and failure to prove that may lead to a segmentation
fault at best and ungracious programme failure if an important
customer is standing right beside you together with your boss
on the other end of the scale.

>
if ( regcomp(&preg, "hello[41]", REG_BASIC) != 0)
perror("regcomp failed");

while (string[offset] != '\0') {
test = regexec((const regex_t *)&preg, &string[offset],
nmatch, pmatch,0);
if (test == REG_NOMATCH || test !=0) {
break;
}
len = pmatch[0].rm_eo-pmatch[0].rm_so;
/* is this optimal or should I use a fixed sized buffer? */
match = realloc(match, len+1);

Once again, you discount the possibility of allocation failure.
The right way is
char *tmp;
....
tmp = realloc(match, len+1);
if (tmp == 0) {
/* Your error handling here; match is still available */
}
match = tmp;

You believe that there is no bug in the regex functions.
This is very trustful and certainly honours the writers of that
library -- however, if len is assigned a negative value
(which becomes a large positive value), then you may run
out of memory or inadvertently free(match) if len+1 == 0.

This is the part I meant above: Write a fallback and test it.
This is no paranoia but
- good practice for you; recovering gracefully from a malloc()
failure is hard
- may help you catch hard-to-catch bugs
- makes other people trust _your_ code -- if I evaluate code
and find it to be of the "malloc() always succeeds variety",
I am less than impressed.

strlcpy(match, &string[offset],
len+1);
(void)printf("matched string: %s\n", match);
offset += pmatch[0].rm_eo;

}
(void)printf("original string: %s\n", string);
free(match);
regfree(&preg);
exit(0);
}

Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.

Aug 28 '06 #3

bwaichu

Thanks for all the great suggestions. I still need to think through
how I want to
handle the buffering. I can do bound checking and use an initial
buffer of 256 or 512. If the len of the string is larger, than I can
just double up or higher. I can do the check before the reallocation.
If the string is not larger than, I can skip the reallocation.

<OT>
My final re-write will just be moving this into my http server as a
function or two. I want to parse and search http requests and send
responses. I am buffering the sockets with a fixed buffer of 8196. I
want to dump the finds into a structure that will either be part of a
linked list or a red black tree. I have started to use threads, so
that might change as well. The biggest challenge will be buffering and
writing regex against URIs.
</OT>

Here's the code cleaned up with some error checking (buffering will be
redone over the next day or two):

#include <sys/types.h>
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <err.h>

int
main(void) {

regex_t preg;
char *string = strdup("hello1hello4hello2hello3 are you hello");
char *match;
char *new_match;
regmatch_t pmatch;
size_t nmatch = 1;
size_t len = 0;
size_t offset = 0;
int test;

match = calloc(BUFSIZ, sizeof(char));
if (match == NULL)
errx(-1, "calloc() could not allocate space!\n");

if (regcomp(&preg, "hello[41]", REG_BASIC) != 0)
puts("regcomp() failed to compile\n");

while (string[offset] != '\0') {
test = regexec((const regex_t *)&preg, &string[offset],
nmatch, &pmatch,0);
if (test !=0) {
break;
}
len = pmatch.rm_eo-pmatch.rm_so;
if ( (new_match = realloc(match, len+1)) == NULL) {
break;
}
match = new_match;
strlcpy(match, &string[offset],
len+1);
(void)printf("matched string: %s\n", match);
offset += pmatch.rm_eo;
}
(void)printf("original string: %s\n", string);
free(match);
match = NULL;
regfree(&preg);
exit(EXIT_SUCCESS);
}

Aug 28 '06 #4

Eric Sosman

bw*****@yahoo.com wrote On 08/27/06 18:37,:

Is it generally better to set-up a buffer (fixed sized array) and read
and write to that
buffer even if it is larger than what is being written to it? Or is it
better to allocate memory and realloc it for the size of the what is
being written each time? In other words, what is the decision factor
between deciding to use a fixed size buffer or allocating memory
space and reallocing the size?

There's no hard-and-fast rule, just as there's no
hard-and-fast rule for choosing between arrays and trees.
Each has its virtues and its drawbacks.

Some considerations for allocating "buffers," where
the somewhat vague term is understood to mean "places to
store varying amounts of stuff:"

- If you know an upper bound on the size of the stuff
(or if you're willing to report failure if the stuff
is larger), a fixed-size buffer is attractive because
of its simplicity.

- If the upper bound is large, it may be a good idea to
avoid declaring the buffer as an `auto' variable.
Many systems have less `auto' memory than dynamic or
static memory.

- A statically-allocated buffer may well be the ultimate
in simplicity, but it has some disadvantages. First,
it exists and occupies memory even when it's not being
used; for a large buffer this may be too wasteful.
Second (like any static object), it creates difficulties
if you want to write functions that are re-entrant or
can be called recursively.

- How many instances of "stuff" must the program keep track
of simultaneously? If there will be many of them, it is
often a good idea to put each in a dynamic allocation of
the minimum size, possibly by using realloc() on a buffer
obtained from malloc(), or possibly by copying from the
initial buffer to a new buffer specially allocated for
each instance.

- If an instance of "stuff" can be very large, consider
breaking it up into smaller pieces and storing those
pieces in another data structure. It may be easier to
manage a linked list of a thousand 1-megabyte chunks
than to find storage for a gigabyte array.

- If you decide to malloc() a buffer and then expand it as
needed with realloc(), it is usually better to expand by
a "magnification factor" than by a fixed amount. That is,
it is usually better to increase the buffer size by 50%
or 100% than to add 100 or 1000 bytes at every expansion.

- A popular technique used by many malloc() implementations
(but not by all) is to add about 8 bytes of bookkeeping
data to each allocation, and to round the total up to a
multiple of 8. (Or perhaps the magic number is 4 bytes,
or maybe 16 -- it depends.) Consequence: If you make a
large number of small allocations, a sizeable fraction of
the total memory will be expended in "overhead" rather
than in "payload." Corollary: If you've got a 200-byte
buffer that holds only 199 bytes of "stuff," shrinking
it with realloc() probably won't do much.

- Some malloc() implementations add their 8 bytes (or so)
of overhead but then round up to the next power of two.
If you request 1024 bytes, you'll wind up reserving 2048
for an "efficiency" of only 50%. To guard against this
effect, avoid power-of-two requests; if it makes sense
in terms of the rest of the program, aim for allocations
that are (nominally) a little less than a power of two
in size.

--
Er*********@sun.com

Aug 28 '06 #5

websnarf

bw*****@yahoo.com wrote:

Is it generally better to set-up a buffer (fixed sized array) and read
and write to that
buffer even if it is larger than what is being written to it? Or is it
better to allocate memory and realloc it for the size of the what is
being written each time? In other words, what is the decision factor
between deciding to use a fixed size buffer or allocating memory
space and reallocing the size?

Fixed size arrays are good for things that whose contents *CANNOT*
exceed the size of that array. A string usually is never that kind of
thing.

So usually the decision is quite simple: if you are dealing with
mutable strings *NEVER* use a fixed size array (this is a definitive
mark of an amateur and is usually accompanied by either buffer overflow
errors or ruthless truncation), but instead use the reallocing buffer
strategy. There is a common tendency in C programmers to conflate char
arrays and strings -- the two are different things conceptually and
this has implications for what you are actually doing.

If you just need a block buffer or something like that, or you are
transforming your algorithm to run through chunks of a stream at a
time, then obviously you can just use a fixed sized buffer.

I don't think the code below is optimal since the size of the "buffer"
becomes extremely small when I realloc it. I also spend time
re-allocating
it. And taking this to the next step, if I wrote to a linked list or a
red black
tree, I would have to allocate memory. In that case, how do I
determine
the best size?

Note: The code below contains some posix items. Please ignore those
items. I have also left out some error coding below as well.
The
below was written to just learn the regex API.

int
main(void) {

regex_t preg;
char *string = strdup("hello1hello4hello2hello3 are you hello");
char *match;
regmatch_t pmatch[1];
size_t nmatch = 1;
size_t len = 0;
size_t offset = 0;
int test;

match = calloc(BUFSIZ, sizeof(char));

if ( regcomp(&preg, "hello[41]", REG_BASIC) != 0)
perror("regcomp failed");

while (string[offset] != '\0') {
test = regexec((const regex_t *)&preg, &string[offset],
nmatch, pmatch,0);
if (test == REG_NOMATCH || test !=0) {
break;
}
len = pmatch[0].rm_eo-pmatch[0].rm_so;
/* is this optimal or should I use a fixed sized buffer? */
match = realloc(match, len+1);

This is wrong is what it is. realloc may return with NULL. That means
that the line above will leak the original contents of match, any time
realloc happens to fail. This call is also unnecessary if it decreases
the size of the allocation in match.

strlcpy(match, &string[offset],
len+1);

Obviously this will fail, any time match is NULL. Also you should use
memcpy(), since its generally faster.

(void)printf("matched string: %s\n", match);
offset += pmatch[0].rm_eo;

}
(void)printf("original string: %s\n", string);
free(match);
regfree(&preg);
exit(0);
}

malloc, calloc and realloc are fairly slow function calls. So one
thing you should generally do is try to minimize the number of times
you call such functions. When reusing a buffer like you are doing
above, you only want to realloc more space if you need it.

There is a simple numeric size strategy for keeping the number of
reallocations down to a minimum while wasting a constant factor in
space (less than 100% extra) that you can use that is usually an
optimal trade off between speed and space: always realloc to the least
power of 2 greater than the size that you need,and never decrease the
size of a buffer with realloc. On a 32 bit machine this will means
that you never call realloc on the same buffer more than 32 times, and
you will never use more than an extra 100% over largest size that you
would have ever needed.

So I would recode your snippet something like as follows:

size_t mlen = 0;

match = NULL;

if ( regcomp(&preg, "hello[41]", REG_BASIC) != 0)
perror("regcomp failed");

while (string[offset] != '\0') {
test = regexec((const regex_t *)&preg, &string[offset],
nmatch, pmatch,0);
if (test == REG_NOMATCH || test !=0) {
break;
}
len = pmatch[0].rm_eo-pmatch[0].rm_so;
if (len mlen) {
char *tmpmatch;
while (mlen <= len) mlen = (mlen>0) ? 2*mlen : 1;
tmpmatch = realloc (match, mlen);
if (NULL == tmpmatch) break; /* Out of memory? */
match = tmpmatch;
}
memcpy (match, &string[offset], len);
match[len] = '\0';
[etc ...]

Once you get used to this idea, you will find that it gets kind of
repetitve and tiresome to implement this same idea over and over again.
You will probably want to throw the concept into a library somewhere
and wonder why someone hasn't done this before you.

Well someone has: http://bstring.sf.net/

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Aug 28 '06 #6

Flash Gordon

we******@gmail.com wrote:

bw*****@yahoo.com wrote:
>Is it generally better to set-up a buffer (fixed sized array) and read
and write to that
buffer even if it is larger than what is being written to it? Or is it
better to allocate memory and realloc it for the size of the what is
being written each time? In other words, what is the decision factor
between deciding to use a fixed size buffer or allocating memory
space and reallocing the size?

Fixed size arrays are good for things that whose contents *CANNOT*
exceed the size of that array. A string usually is never that kind of
thing.

So usually the decision is quite simple: if you are dealing with
mutable strings *NEVER* use a fixed size array (this is a definitive
mark of an amateur and is usually accompanied by either buffer overflow
errors or ruthless truncation),

OK, so for the string I've got to prepare as part of a message to the UK
Government gateway where the specification says the string has a maximum
length of 10 characters I should not use a fixed size buffer but a
reallocating buffer? Presumable so that I don't immediately allocate the
maximum size I should start at 1 byte and increase the size 1 byte at a
time so avoid wasting memory?

You should not say to *NEVER* do something as invariably someone can
come up with a *real* example where that rule does not apply, such as
the one I just mentioned. I've got plenty of other interfaces where I am
either preparing a string (and yes, I do want a C string for simplicity
of manipulation such as using sprintf to create it) which has either a
*fixed* final length or a small but definite upper bound.

but instead use the reallocing buffer
strategy. There is a common tendency in C programmers to conflate char
arrays and strings -- the two are different things conceptually and
this has implications for what you are actually doing.

True. Life would be much easier if C had a true string type that
automatically resized, but it doesn't, and when you know a reasonable
and definite upper bound on size a fixed length array is perfectly
adequate to the job.

If you just need a block buffer or something like that, or you are
transforming your algorithm to run through chunks of a stream at a
time, then obviously you can just use a fixed sized buffer.

True.

<snip>

malloc, calloc and realloc are fairly slow function calls. So one
thing you should generally do is try to minimize the number of times
you call such functions. When reusing a buffer like you are doing
above, you only want to realloc more space if you need it.

In addition you can hit memory fragmentation issues.
--
Flash Gordon
See, I don't always disagree with everything you say.

Aug 28 '06 #7

Michael Mair

Flash Gordon schrieb:

we******@gmail.com wrote:
>bw*****@yahoo.com wrote:

>>Is it generally better to set-up a buffer (fixed sized array) and read
and write to that
buffer even if it is larger than what is being written to it? Or is it
better to allocate memory and realloc it for the size of the what is
being written each time? In other words, what is the decision factor
between deciding to use a fixed size buffer or allocating memory
space and reallocing the size?

Fixed size arrays are good for things that whose contents *CANNOT*
exceed the size of that array. A string usually is never that kind of
thing.

So usually the decision is quite simple: if you are dealing with
mutable strings *NEVER* use a fixed size array (this is a definitive
mark of an amateur and is usually accompanied by either buffer overflow
errors or ruthless truncation),

OK, so for the string I've got to prepare as part of a message to the UK
Government gateway where the specification says the string has a maximum
length of 10 characters I should not use a fixed size buffer but a
reallocating buffer? Presumable so that I don't immediately allocate the
maximum size I should start at 1 byte and increase the size 1 byte at a
time so avoid wasting memory?

You should not say to *NEVER* do something as invariably someone can
come up with a *real* example where that rule does not apply, such as
the one I just mentioned. I've got plenty of other interfaces where I am
either preparing a string (and yes, I do want a C string for simplicity
of manipulation such as using sprintf to create it) which has either a
*fixed* final length or a small but definite upper bound.

In such cases, there often is some sort of inbetween:
Think of a piece of code that generates C identifiers which can be
restricted to a certain length, say 6 or 31 characters.
I'd generate them full-length, with the realloc() approach -- obviously
starting with a sufficiently large initial buffer -- and then shorten
them by a well-defined algorithm to the required length while retaining
uniqueness within a set of "visible" identifiers.
It is even possible that the final, _immutable_ version is copied to
a fixed-length array.
BTW: The "1 byte" remark IMO is insulting the intelligence of the
involved participants and was rather unnecessary -- even though
"*NEVER*" and "*ALWAYS*" make my teeth ache...

but instead use the reallocing buffer

>strategy. There is a common tendency in C programmers to conflate char
arrays and strings -- the two are different things conceptually and
this has implications for what you are actually doing.

True. Life would be much easier if C had a true string type that
automatically resized, but it doesn't, and when you know a reasonable
and definite upper bound on size a fixed length array is perfectly
adequate to the job.

Which is different how from what Paul wrote?
If there is a definite upper bound, you typically do not do
"anything wild" with the arrays/strings.
<snip>

>malloc, calloc and realloc are fairly slow function calls. So one
thing you should generally do is try to minimize the number of times
you call such functions. When reusing a buffer like you are doing
above, you only want to realloc more space if you need it.

In addition you can hit memory fragmentation issues.

Indeed. A "reasonably" large starting buffer with the option for
reallocation often is one of the better ways to go. Determining
"reasonable" of course may be a hard problem subject to changes
throughout the programme's lifetime :-)
Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.

Aug 28 '06 #8

bwaichu

Michael Mair wrote:

>
Indeed. A "reasonably" large starting buffer with the option for
reallocation often is one of the better ways to go. Determining
"reasonable" of course may be a hard problem subject to changes
throughout the programme's lifetime :-)

That is where I am struggling a bit. I know my system uses 512 blocks,
so I am considering allocating 512 bytes from the onset and increasing
it as mulitple of 2 if it needs to be larger. However, BUFSIZ is
defined
on my system to be 1024. But starting at 1024 might be overkill.

So how do you manage buffer size and speed with portability? If I were
to
move to another system with a different block size, any code written to
improve performance wouldn't have an impact. Should I start with the
BUFSIZ since it's in the standard library and check for boundaries on
that number and mulitple it up by a factor of 2?

<OT>
The problem is that I doubt anything in this example program would
exceed
80 characters. The problem is that I am on a 512 block system, so
wouldn't
it cost more to write allocated memory that isn't divisble with the
block size?
</OT>

<code correction>
Before someone tangents the discussion, the regcomp part of the code
should exit on failure.

if (regcomp(&preg, "hello[41]", REG_BASIC) != 0) {
puts("regcomp() failed to compile\n");
exit(EXIT_FAILURE);
}

</code correction>

Aug 28 '06 #9

Flash Gordon

Michael Mair wrote:

Flash Gordon schrieb:
>we******@gmail.com wrote:
>>bw*****@yahoo.com wrote:

Is it generally better to set-up a buffer (fixed sized array) and read
and write to that
buffer even if it is larger than what is being written to it? Or is it
better to allocate memory and realloc it for the size of the what is
being written each time? In other words, what is the decision factor
between deciding to use a fixed size buffer or allocating memory
space and reallocing the size?

Fixed size arrays are good for things that whose contents *CANNOT*
exceed the size of that array. A string usually is never that kind of
thing.

So usually the decision is quite simple: if you are dealing with
mutable strings *NEVER* use a fixed size array (this is a definitive
mark of an amateur and is usually accompanied by either buffer overflow
errors or ruthless truncation),

OK, so for the string I've got to prepare as part of a message to the
UK Government gateway where the specification says the string has a
maximum length of 10 characters I should not use a fixed size buffer
but a reallocating buffer? Presumable so that I don't immediately
allocate the maximum size I should start at 1 byte and increase the
size 1 byte at a time so avoid wasting memory?

You should not say to *NEVER* do something as invariably someone can
come up with a *real* example where that rule does not apply, such as
the one I just mentioned. I've got plenty of other interfaces where I
am either preparing a string (and yes, I do want a C string for
simplicity of manipulation such as using sprintf to create it) which
has either a *fixed* final length or a small but definite upper bound.

In such cases, there often is some sort of inbetween:
Think of a piece of code that generates C identifiers which can be
restricted to a certain length, say 6 or 31 characters.
I'd generate them full-length, with the realloc() approach -- obviously
starting with a sufficiently large initial buffer -- and then shorten
them by a well-defined algorithm to the required length while retaining
uniqueness within a set of "visible" identifiers.
It is even possible that the final, _immutable_ version is copied to
a fixed-length array.

Since malloced blocks have a certain amount of overhead in terms of
<OT>heap management structures</OTand overhead in terms of calling
malloc/free plus the extra work (however small) of ensuring you don't
leak memory and handle malloc failures I find it far better to use a
fixed length buffer when there is something putting a definite and not
unreasonable limit on the size. Also your suggestion can still lead to
memory fragmentation.

BTW: The "1 byte" remark IMO is insulting the intelligence of the
involved participants and was rather unnecessary -- even though
"*NEVER*" and "*ALWAYS*" make my teeth ache...

Possibly I should have started at two bytes so you could fit more than
an empty string ;-)

> but instead use the reallocing buffer

>>strategy. There is a common tendency in C programmers to conflate char
arrays and strings -- the two are different things conceptually and
this has implications for what you are actually doing.

True. Life would be much easier if C had a true string type that
automatically resized, but it doesn't, and when you know a reasonable
and definite upper bound on size a fixed length array is perfectly
adequate to the job.

Which is different how from what Paul wrote?
If there is a definite upper bound, you typically do not do
"anything wild" with the arrays/strings.

In part I was agreeing with Paul (it can happen), but also pointing out
that he was ignoring the fact that in many situations there are definite
and reasonably small upper bounds, in which case a fixed sized buffer is
the easiest thing to get right.

<snip>

>>malloc, calloc and realloc are fairly slow function calls. So one
thing you should generally do is try to minimize the number of times
you call such functions. When reusing a buffer like you are doing
above, you only want to realloc more space if you need it.

In addition you can hit memory fragmentation issues.

Indeed. A "reasonably" large starting buffer with the option for
reallocation often is one of the better ways to go. Determining
"reasonable" of course may be a hard problem subject to changes
throughout the programme's lifetime :-)

This depends very much on the likely patter of memory usage and the
lifetime of the program but is definitely the right way to go in some
situations.
--
Flash Gordon

Aug 29 '06 #10

bwaichu

The below compiles and passes lint, but that doesn't mean my approach
is good or that I did not over look something. The style could use
some work.

Please dig in and tear it apart.
Here's my first attempt at a resize buffering function:

/* len is the length of the string */
void *
adj_buffer(char **buffer, size_t len) {

char *f_buffer;
char *new_buffer;
size_t num = 0;

new_buffer = NULL;
f_buffer = *buffer;

if (len BUFSIZ) {
if (len / BUFSIZ == 0) {
num = 1;
}
else if ( len % BUFSIZ == 0 ){
num = len / BUFSIZ;
}
else {
num = len / BUFSIZ;
num = (size_t)pow(2, num - 1);
}
} else
return f_buffer;

if (num && SIZE_MAX / num < BUFSIZ) {
errno = ENOMEM;
err(1, "overflow");
}
if ( (new_buffer = realloc(f_buffer, num * BUFSIZ)) == NULL) {
free(f_buffer);
buffer = NULL;
return (NULL);
}
f_buffer = new_buffer;
bzero(f_buffer, sizeof(f_buffer));
return f_buffer;
}

Aug 29 '06 #11

Ian Collins

bw*****@yahoo.com wrote:

The below compiles and passes lint, but that doesn't mean my approach
is good or that I did not over look something. The style could use
some work.

Please dig in and tear it apart.
Here's my first attempt at a resize buffering function:

/* len is the length of the string */
void *
adj_buffer(char **buffer, size_t len) {

Any reason to return void* when this works with char*?

char *f_buffer;
char *new_buffer;
size_t num = 0;

new_buffer = NULL;
f_buffer = *buffer;

if (len BUFSIZ) {
if (len / BUFSIZ == 0) {
num = 1;

You can't get here if len BUFSIZ.

}
else if ( len % BUFSIZ == 0 ){
num = len / BUFSIZ;
}
else {
num = len / BUFSIZ;
num = (size_t)pow(2, num - 1);

What is this bit doing?

Assuming you are allocating chunks of BUFSIZ bytes, why not just num+1?

}
} else
return f_buffer;

if (num && SIZE_MAX / num < BUFSIZ) {
errno = ENOMEM;
err(1, "overflow");
}

Why not test against let at the top?

if ( (new_buffer = realloc(f_buffer, num * BUFSIZ)) == NULL) {
free(f_buffer);
buffer = NULL;
return (NULL);
}
f_buffer = new_buffer;
bzero(f_buffer, sizeof(f_buffer));

Do you want to do this? I'd have though you would want to preserve the
contents of the original buffer.

return f_buffer;
}

--
Ian Collins.

Aug 29 '06 #12

bwaichu

/* len is the length of the string */
char *
adj_buffer(char **buffer, size_t len) {

char *f_buffer;
char *new_buffer;
size_t num = 0;

new_buffer = NULL;
f_buffer = *buffer;

/* test for a multiple of the buffer
* This is where I am not sure on the best approach. I know that
BUFSIZ is
* 1024, but the system's block size is 512. Should I just multiply
against
* 512? But what if I port to a system with a different block
count?
*/

/* if len is less than BUFSIZ just return all ready allocated buffer */

if (len BUFSIZ) {
if (len / BUFSIZ == 0) {
num = 1;
}
else if ( len % BUFSIZ == 0 ){
num = len / BUFSIZ;
}
else {
num = len / BUFSIZ;

/* corrected for when num truncates to 1; any advantage to using a
multiple of 2? */

num = (size_t)pow(2, (num = 1) ? 2 : (num - 1));
}
} else
return f_buffer;

/* test to avoid overflow after determining num */

if (num && SIZE_MAX / num < BUFSIZ) {
errno = ENOMEM;
err(1, "overflow");
}
if ( (new_buffer = realloc(f_buffer, num * BUFSIZ)) == NULL) {
free(f_buffer);
buffer = NULL;
return (NULL);
}
f_buffer = new_buffer;
bzero(f_buffer, sizeof(f_buffer));
return f_buffer;
}

Aug 29 '06 #13

bwaichu

Made a typo in the above. Here's a more correct version:

/* len is the length of the string */
char *
adj_buffer(char **buffer, size_t len) {

char *f_buffer;
char *new_buffer;
size_t num = 0;

new_buffer = NULL;
f_buffer = *buffer;

if (len BUFSIZ) {
if (len / BUFSIZ == 0) {
num = 1;
}
else if ( len % BUFSIZ == 0 ){
num = len / BUFSIZ;
}
else {
num = len / BUFSIZ;
num = (size_t)pow(2, (num == 1) ? 2 : (num - 1));
}
} else
return f_buffer;

if (num && SIZE_MAX / num < BUFSIZ) {
errno = ENOMEM;
err(1, "overflow");
}
if ( (new_buffer = realloc(f_buffer, num * BUFSIZ)) == NULL) {
free(f_buffer);
buffer = NULL;
return (NULL);
}
f_buffer = new_buffer;
bzero(f_buffer, sizeof(f_buffer));
return f_buffer;
}

Aug 29 '06 #14

SM Ryan

"bw*****@yahoo.com" <bw*****@yahoo.comwrote:
# Is it generally better to set-up a buffer (fixed sized array) and read
# and write to that

It's generally better to not overrun your buffer. Otherwise if
you've written your code to expect all code in one string all at
once, then your alloc/realloc the buffer to be big enough. If
you've written your code to stream through the file, then you
use a fixed size buffer.

--
SM Ryan http://www.rawbw.com/~wyrmwif/
JUSTICE!
Justice is dead.

Aug 29 '06 #15

Michael Mair

bw*****@yahoo.com schrieb:

Michael Mair wrote:

>>Indeed. A "reasonably" large starting buffer with the option for
reallocation often is one of the better ways to go. Determining
"reasonable" of course may be a hard problem subject to changes
throughout the programme's lifetime :-)

That is where I am struggling a bit. I know my system uses 512 blocks,
so I am considering allocating 512 bytes from the onset and increasing
it as mulitple of 2 if it needs to be larger. However, BUFSIZ is
defined on my system to be 1024. But starting at 1024 might be
overkill.

Some of the things you should think about
- overall memory of the system
- average memory for your programme
- standard use cases: The stuff users do 90% of the time
- corner cases: Do you wish to support a constellation which
means heavy abuse of the original intent of your programme?
If no: Fixed size buffer and dying gracefully may be an option
if your standard use cases permit that.
Example: You write a programme for text files and someone
gives you a stream of 1e10 random bytes.
If yes: Split the stuff into non-standard but reasonable use
cases and conceivable yet very improbable extreme cases.
If everything else permits, go for a reasonable upper limit
for the reasonable use cases.

As for starting at a power of 2, reread Eric Sosman's
reply: <1156776725.625170@news1nwk>

So how do you manage buffer size and speed with portability? If I were
to
move to another system with a different block size, any code written to
improve performance wouldn't have an impact. Should I start with the
BUFSIZ since it's in the standard library and check for boundaries on
that number and mulitple it up by a factor of 2?

I would not start with BUFSIZ -- it is a value entirely outside
of your control.
If you think that you will deal only with 80 character lines then
think of the users -- if one of them uses text files for, say,
LaTeX input, then they may write whole paragraphs in one line.
If the first two limits above permit it, then it is perfectly
permissible to start at 500 or 1000 elements in your buffer.
If your system gives you only 32K of memory to play with, then
finding out the malloc() overhead and allocating 128-OVERHEAD
and (2*last+OVERHEAD) when increasing may be much more reasonable.

<OT>
The problem is that I doubt anything in this example program would
exceed
80 characters. The problem is that I am on a 512 block system, so
wouldn't
it cost more to write allocated memory that isn't divisble with the
block size?
</OT>

If this slows down your programme and you can prove that by measuring
and you really need your programme to run faster, then start at block
size - OVERHEAD -- usually, you do not need the additional bit of
speed and measuring wastes more time then you and your users can
ever gain by optimized buffer sizes.
This does not apply if your application is something used millions
of times by the system and there are many copies of the system out
there.
Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.

Aug 29 '06 #16

CBFalconer

"bw*****@yahoo.com" wrote:

>
The below compiles and passes lint, but that doesn't mean my
approach is good or that I did not over look something. The
style could use some work.

Please dig in and tear it apart.

Here's my first attempt at a resize buffering function:

.... snip code ...

Take a look at my ggets function, available at:

<http://cbfalconer.home.att.net/download/>

which is written in purely standard C and is portable.

--
Chuck F (cb********@yahoo.com) (cb********@maineline.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.netUSE maineline address!

Aug 29 '06 #17

websnarf

bw*****@yahoo.com wrote:

The below compiles and passes lint,

That's because lint is bad at math.

[...] but that doesn't mean my approach
is good or that I did not over look something. The style could use
some work.

Style is far less relevant to its functionality, and in this case its
especially true.

Please dig in and tear it apart.

Here's my first attempt at a resize buffering function:

/* len is the length of the string */
void *
adj_buffer(char **buffer, size_t len) {

Ok, so you get the old buffer and the new length you want, but you
don't have any indication of what the old length used to be. So you
are not going to be able to determine the case of when the buffer is
already big enough (and you should do nothing) from inside this
routine. This pushes partial memory management issues upward into your
calling code rather than having it all done at the lower levels. I
would prefer a design where all memory management issues were solved at
once in a single function.

char *f_buffer;
char *new_buffer;
size_t num = 0;

new_buffer = NULL;
f_buffer = *buffer;

if (len BUFSIZ) {
if (len / BUFSIZ == 0) {
num = 1;

This condition can never happen. You may as well omit it.

}
else if ( len % BUFSIZ == 0 ){
num = len / BUFSIZ;

This is not necessarily a power of 2. This looks like its taking an
exact division whenever you are aligned or something. You should drop
this case.

}
else {
num = len / BUFSIZ;
num = (size_t)pow(2, num - 1);

Ok, this is raising the number to a power of 2. Its not finding a
power of 2 greater than
the number, which is different.

The formula you want is:

num = (size_t) pow (2.0, ceil (log (len / (double) BUZSIZ) /
log(2)));

However, this is horrendously expensive to calculate. Instead do this:

int i;
for (num=BUFSIZ; num < len; num += num) {}
num /= BUFSIZ;

Which should be caparatively faster.

}
} else
return f_buffer;

if (num && SIZE_MAX / num < BUFSIZ) {
errno = ENOMEM;
err(1, "overflow");
}
if ( (new_buffer = realloc(f_buffer, num * BUFSIZ)) == NULL) {
free(f_buffer);
buffer = NULL;

Lint didn't call you on this? This is an assignment with no
discernable effect. And given what you are doing you probably mean
*buffer = NULL;

return (NULL);
}
f_buffer = new_buffer;
bzero(f_buffer, sizeof(f_buffer));

bzero is not ANSI.

return f_buffer;
}

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Aug 29 '06 #18

bwaichu

we******@gmail.com wrote:

bw*****@yahoo.com wrote:
Ok, so you get the old buffer and the new length you want, but you
don't have any indication of what the old length used to be. So you
are not going to be able to determine the case of when the buffer is
already big enough (and you should do nothing) from inside this
routine. This pushes partial memory management issues upward into your
calling code rather than having it all done at the lower levels. I
would prefer a design where all memory management issues were solved at
once in a single function.

What is the best way to manage the current size of the buffer? The
first pass through here, I know, what the initial buffer size was, but
the second time through here, I do not, so I agree with your assessment
above. What is the best way to calculate the size
of a memory allocated buffer? I assume I should do it as part of this
function, so it's self-contained.

Ok, this is raising the number to a power of 2. Its not finding a
power of 2 greater than
the number, which is different.

The formula you want is:

num = (size_t) pow (2.0, ceil (log (len / (double) BUZSIZ) /
log(2)));

However, this is horrendously expensive to calculate. Instead do this:

int i;
for (num=BUFSIZ; num < len; num += num) {}
num /= BUFSIZ;

Which should be caparatively faster.

I really like that approach. I think I went too complicated at the
onset.

if ( (new_buffer = realloc(f_buffer, num * BUFSIZ)) == NULL) {
free(f_buffer);
buffer = NULL;

Lint didn't call you on this? This is an assignment with no
discernable effect. And given what you are doing you probably mean
*buffer = NULL;

I meant (same difference):

f_buffer = NULL;

Maybe I am using lint incorrectly. What flags would have caught that?

>
return (NULL);
}
f_buffer = new_buffer;
bzero(f_buffer, sizeof(f_buffer));

bzero is not ANSI.

Would memset (which is ANSI) be a better alternative or should I skip
this step?

memset(f_buffer, 0, sizeof(f_buffer));

?

Cheers,

Brian

Aug 29 '06 #19

Flash Gordon

bw*****@yahoo.com wrote:

Made a typo in the above. Here's a more correct version:

/* len is the length of the string */
char *
adj_buffer(char **buffer, size_t len) {

char *f_buffer;
char *new_buffer;
size_t num = 0;

new_buffer = NULL;
f_buffer = *buffer;

Why not initialise new_buffer and f_buffer on definition?

if (len BUFSIZ) {
if (len / BUFSIZ == 0) {
num = 1;

It can still never reach this line. Just as Ian said before.

}
else if ( len % BUFSIZ == 0 ){
num = len / BUFSIZ;
}
else {
num = len / BUFSIZ;
num = (size_t)pow(2, (num == 1) ? 2 : (num - 1));

Why would anyone use a floating point maths function to calculate how
much to increase buffer size by? That is just asking for things to slow
down.

}
} else
return f_buffer;

if (num && SIZE_MAX / num < BUFSIZ) {
errno = ENOMEM;
err(1, "overflow");
}
if ( (new_buffer = realloc(f_buffer, num * BUFSIZ)) == NULL) {
free(f_buffer);
buffer = NULL;
return (NULL);
}
f_buffer = new_buffer;
bzero(f_buffer, sizeof(f_buffer));

Why use bzero which is not standard C when there is a perfectly standard
memset function? In any case, as Ian suggested, I doubt that this is
what you want to do. Just as Ian doubted it.

return f_buffer;
}

--
Flash Gordon

Aug 29 '06 #20

bwaichu

One problem I see with the above is that I am not setting any ceiling
on how big the buffer can get. Should I restrict size or let realloc()
handle that?

Here's my second attempt at building a function to handle buffers:

/* adj the size of the buffer based on the length of the input string
*/
char *
adj_buffer(char **buffer, size_t len, size_t *curr_buff_size) {

char *f_buffer;
char *new_buffer;
size_t num;

f_buffer = *buffer;
new_buffer = NULL;

if ( *curr_buff_size < len) {
for(num = BUFSIZ; num < len; num += BUFSIZ)
; /* nothing */
}
else
return f_buffer;

if ( (new_buffer = realloc(f_buffer, num)) == NULL) {
free(f_buffer);
f_buffer = NULL;
return (NULL);
}
f_buffer = new_buffer;
memset(f_buffer, 0, num);
*curr_buff_size = num;
return f_buffer;
}

Aug 29 '06 #21

Ian Collins

bw*****@yahoo.com wrote:

One problem I see with the above is that I am not setting any ceiling
on how big the buffer can get. Should I restrict size or let realloc()
handle that?

Above what? You haven't provided any context. Please do in future.

Here's my second attempt at building a function to handle buffers:

/* adj the size of the buffer based on the length of the input string
*/
char *
adj_buffer(char **buffer, size_t len, size_t *curr_buff_size) {

char *f_buffer;
char *new_buffer;
size_t num;

f_buffer = *buffer;
new_buffer = NULL;

if ( *curr_buff_size < len) {
for(num = BUFSIZ; num < len; num += BUFSIZ)
; /* nothing */

num is now inappropriately named as it is a size.

Why not stick with a simple division?

}
else
return f_buffer;

if ( (new_buffer = realloc(f_buffer, num)) == NULL) {
free(f_buffer);
f_buffer = NULL;
return (NULL);
}
f_buffer = new_buffer;
memset(f_buffer, 0, num);

Again, are you sure you want to do this? If so, please explain why.

*curr_buff_size = num;
return f_buffer;
}

--
Ian Collins.

Aug 29 '06 #22

bwaichu

Ian Collins wrote:

if ( *curr_buff_size < len) {
for(num = BUFSIZ; num < len; num += BUFSIZ)
; /* nothing */

num is now inappropriately named as it is a size.

You lost me. Is this just a question of style? I can call it
something
else. It's just the size of what I will pass to realloc below. I
probably
should just call it size to mirror the function.

I have since decided to increase the buffer as:

num +=num

which should reduce my calls, but I endup with a large buffer.

Why not stick with a simple division?

Division of what? Can you provide an example?

}
else
return f_buffer;

if ( (new_buffer = realloc(f_buffer, num)) == NULL) {
free(f_buffer);
f_buffer = NULL;
return (NULL);
}
f_buffer = new_buffer;
memset(f_buffer, 0, num);

Again, are you sure you want to do this? If so, please explain why.

Are you talking about filling up the allocated space with 0's? If so,
the only reason why I am doing it is so that I am not passing allocated
memory filled up with garbage. I am just writing this as if I had
called
calloc again. What is the con of doing this?

Aug 29 '06 #23

Ian Collins

bw*****@yahoo.com wrote:

Ian Collins wrote:

>>> if ( *curr_buff_size < len) {
for(num = BUFSIZ; num < len; num += BUFSIZ)
; /* nothing */

num is now inappropriately named as it is a size.

You lost me. Is this just a question of style? I can call it
something
else. It's just the size of what I will pass to realloc below. I
probably
should just call it size to mirror the function.

That's a more appropriate name.

I have since decided to increase the buffer as:

num +=num

which should reduce my calls, but I endup with a large buffer.

>>Why not stick with a simple division?

Division of what? Can you provide an example?

Like you had before,

num = len / BUFSIZ;

if( len % BUFSIZ != 0 ) ++num;

>>
Again, are you sure you want to do this? If so, please explain why.

Are you talking about filling up the allocated space with 0's? If so,
the only reason why I am doing it is so that I am not passing allocated
memory filled up with garbage. I am just writing this as if I had
called
calloc again. What is the con of doing this?

You destroy the data that was in the original buffer, whereas realloc
preserves it.

--
Ian Collins.

Aug 29 '06 #24

CBFalconer

"bw*****@yahoo.com" wrote:

>
One problem I see with the above is that I am not setting any
ceiling on how big the buffer can get. Should I restrict size or
let realloc() handle that?

There is no 'above' here, so your post is meaningless. Google is
not usenet, it is only a poor imitation of an interface to usenet.
You should always ensure your articles can stand by themselves,
which means adequate quoting. See my sig below (which is slightly
out of date) and especially the referenced URLs.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>

Aug 29 '06 #25

bwaichu

Ian Collins wrote:

Division of what? Can you provide an example?
Like you had before,

num = len / BUFSIZ;

if( len % BUFSIZ != 0 ) ++num;

What is the advantage of one over the other? If I use the above,
I have to check to make sure I don't overflow. But I immediately
know how big num has to be. I can also avoid a call to realloc
if I know I am going to overflow.

With incrementing the for loop, I have to waste an unknown number of
loops to arrive at the correct size, but I can avoid the num * size
idiom.

I changed it because I wanted to get away from the num * size idiom.
It seems less error prone to just pass a size. And the code is easier
to read. I also don't have to worry about testing to see if I am
bigger
than SIZE_MAX as I can never become bigger than size_t.

Now, if I can figure out a way to avoid starting at BUFSIZ when I know
the
size of the original buffer is greater, than I might be able to save
some
for loops. This is where I am tempted to go back to the old approach.

Are you talking about filling up the allocated space with 0's? If so,
the only reason why I am doing it is so that I am not passing allocated
memory filled up with garbage. I am just writing this as if I had
called
calloc again. What is the con of doing this?
You destroy the data that was in the original buffer, whereas realloc
preserves it.

So your suggestion is that if I am really just wrapping a function
around realloc,
then I should shouldn't change the behavior of realloc? I can agree
with that
approach. It's something I overlooked.

Aug 29 '06 #26

bwaichu

CBFalconer wrote:

There is no 'above' here, so your post is meaningless. Google is
not usenet, it is only a poor imitation of an interface to usenet.
You should always ensure your articles can stand by themselves,
which means adequate quoting. See my sig below (which is slightly
out of date) and especially the referenced URLs.

I have been clicking show options. I meant below. Typo on my part.

I used to use Pine, but I figured this would be easier.

Aug 29 '06 #27

websnarf

bw*****@yahoo.com wrote:

we******@gmail.com wrote:
bw*****@yahoo.com wrote:
Ok, so you get the old buffer and the new length you want, but you
don't have any indication of what the old length used to be. So you
are not going to be able to determine the case of when the buffer is
already big enough (and you should do nothing) from inside this
routine. This pushes partial memory management issues upward into your
calling code rather than having it all done at the lower levels. I
would prefer a design where all memory management issues were solved at
once in a single function.

What is the best way to manage the current size of the buffer? The
first pass through here, I know, what the initial buffer size was, but
the second time through here, I do not, so I agree with your assessment
above. What is the best way to calculate the size
of a memory allocated buffer? I assume I should do it as part of this
function, so it's self-contained.

Well, you should *pass in* the old size as an additional parameter.
Unfortunately, C does not let you extract what the allocation size is
for a given pointer which has been allocated, so you have to track it
by yourself.

Since the size gets modified by your reallocating it, you want to pass
in the pointer to the variable holding the length so that you can
change it on the way out. I would recommend something like the
following:

int strSizeAdjust (char ** str, int * memLength, int minSize) {
int i;
char * newStr;

if (*memLength minSize) return 0; /* No size adjustment
needed */
for (i=*memLength; i <= minSize; i++) {};
newStr = (char *) realloc (*str, memLength * sizeof (char));
if (NULL == newStr) return -__LINE__; /* Out of memory error */
*str = newStr;
*memLength = i;
return 1; /* A size adjustment was made */
}

if ( (new_buffer = realloc(f_buffer, num * BUFSIZ)) == NULL) {
free(f_buffer);
buffer = NULL;
Lint didn't call you on this? This is an assignment with no
discernable effect. And given what you are doing you probably mean
*buffer = NULL;

I meant (same difference):

f_buffer = NULL;

Maybe I am using lint incorrectly. What flags would have caught that?

Dunno, its been a while since I last used the real lint. I'm just
surpised it didn't catch a "do nothing" operation.

return (NULL);
}
f_buffer = new_buffer;
bzero(f_buffer, sizeof(f_buffer));
bzero is not ANSI.

Would memset (which is ANSI) be a better alternative or should I skip
this step?

memset(f_buffer, 0, sizeof(f_buffer));

You can skip the step, and if performance matters to you, you probably
should. sizeof() is the wrong operator to be using here. The length
of the buffer being pointered to is num*BUFSIZ, not sizeof(f_buffer).

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Aug 30 '06 #28

SM Ryan

"bw*****@yahoo.com" <bw*****@yahoo.comwrote:
# One problem I see with the above is that I am not setting any ceiling
# on how big the buffer can get. Should I restrict size or let realloc()
# handle that?

# if ( *curr_buff_size < len) {
# for(num = BUFSIZ; num < len; num += BUFSIZ)
# ; /* nothing */
# }

Increasing buffer size by a constant factor rather than a constant
increment keeps the algorithm linear in time and space. If you
increase the buffer by a factor of f, f>1, you risk overshooting
the space by a (f-1) times the file size. The code is linear, but
it can still have a large factor on its linearity.

On virtual memory systems with most files, this will work within
the address space. As 64-bit pointer vm machines become popular,
this will fit even easier. So you can have something that is likely
to work on typical files. If that's good enough, you're done.

However it will not work on all systems with all files. If you
want that, you have to stream through the file, possibly setting
up buffer frames and other techniques.

If you're willing to restrict your code to a subset of systems
such as unix, then you can also use memory mapping of files. This
essentially uses kernel paging to allocate a sufficiently large
buffer and fill it with file in one operation.

--
SM Ryan http://www.rawbw.com/~wyrmwif/
She broke your heart and inadvertently drove men to deviant lifestyles.

Aug 30 '06 #29

Buffer or Realloc?

Similar topics