Bytes | Developer Community
Malcolm's new book

The webpages for my new book are now up and running.

The book, Basic Algorithms, describes many of the fundamental algorithms
used in practical programming, with a bias towards graphics. It covers
mathematical routines from the basics up, including floating point
arithmetic; compression techniques, including the GIF and JPEG file
formats; hashing; red-black trees; 2D and 3D graphics; colour spaces;
machine learning with neural networks, hidden Markov models, and fuzzy
logic; clustering; fast memory allocators; and expression parsing.

(Follow the links)

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Jul 24 '07
Ed Jensen wrote:
Keith Thompson <ks***@mib.org> wrote:
>But, as we've discussed here before, malloc doesn't behave properly on
all systems. On some systems, malloc can return a non-null result
even if the memory isn't actually available. The memory isn't
actually allocated until you try to write to it.

Wouldn't simply switching to calloc() solve that problem, since it
allocates memory and writes to all of it?
The situations in which the system makes malloc believe that it has
acquired enough memory to satisfy the caller, while in reality the system
has lied to malloc, are cases of huge allocation requests, or at least
requests significantly larger than the amount of physical memory available.
In such cases calloc, by attempting to write to its acquired memory, will
likely trigger the operating system's memory manager to kill the process
(or another one) that much earlier than with malloc, where the same is
likely to happen when the user's functions actually try to use all the
memory that malloc apparently returned.

Also consider the performance implications of writing to, possibly,
gigabytes of memory, especially on a multiuser system.

So calloc doesn't really solve the problem Keith was talking about.

Aug 22 '07 #201
Ed Jensen wrote, On 22/08/07 17:16:
Keith Thompson <ks***@mib.org> wrote:
>But, as we've discussed here before, malloc doesn't behave properly on
all systems. On some systems, malloc can return a non-null result
even if the memory isn't actually available. The memory isn't
actually allocated until you try to write to it.

Wouldn't simply switching to calloc() solve that problem, since it
allocates memory and writes to all of it?
Not necessarily. It is common for the OS to provide zeroed memory, so
calloc will not have to write to it if the memory has been freshly
obtained from the OS, which is exactly when there is a potential problem.

Also, calloc does not resize blocks, and although it is not obvious from
the quoted material the original discussion was about growing buffers
using realloc.
--
Flash Gordon
Aug 22 '07 #202
Ed Jensen <ej*****@visi.com> writes:
Keith Thompson <ks***@mib.org> wrote:
>But, as we've discussed here before, malloc doesn't behave properly on
all systems. On some systems, malloc can return a non-null result
even if the memory isn't actually available. The memory isn't
actually allocated until you try to write to it.

Wouldn't simply switching to calloc() solve that problem, since it
allocates memory and writes to all of it?
I don't know. It would if the calloc() implementation is smart enough
to detect an error while writing to the allocated memory, causing it
to deallocate the memory and return a null pointer. Possibly it does
so. Possibly calloc() can't do this, because it doesn't have the
opportunity to detect the write error before the OS starts killing
processes. And possibly zeroing the allocated memory doesn't require
physically writing to it.

I could probably do some research and answer the question for some
particular system, but other systems could behave differently, so I
won't bother.

And I *shouldn't* have to use calloc() rather than malloc() if
malloc() (assuming it works properly) exactly meets my requirements.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Aug 22 '07 #203
Keith Thompson wrote:
CBFalconer <cb********@yahoo.com> writes:
>Keith Thompson wrote:
>>CBFalconer <cb********@yahoo.com> writes:
... snip ...
>>>>
That is precisely the purpose of ggets(char **). It is simple and
can't crash. Take a look at:

<http://cbfalconer.home.att.net/download/>

It can't crash if malloc and realloc behave properly. It initially
mallocs 112 bytes, then reallocs more space 128 bytes at a time for
long lines.

But, as we've discussed here before, malloc doesn't behave properly on
all systems. On some systems, malloc can return a non-null result
even if the memory isn't actually available. The memory isn't
actually allocated until you try to write to it. Of course, by then
it's too late to indicate the failure via the result of malloc, so the
system kills your process -- or, perhaps worse, some other process.

I'm sure you dislike the idea of catering to such systems as much as I
do, but you might consider implementing a way to (optionally) limit
the maximum line length, to avoid attempting to allocate a gigabyte of
memory if somebody feeds your program a file with a gigabyte-long line
of text.

I would resist any such change. The chances of running into such a
malloc failure are extremely low, especially since the memory is
used as soon as allocated. The idea is to avoid any limits, rather
than make special adjustments for remote possibilities. A maximum
limit would also complicate the error-returning problem.

If a malloc failure is so unlikely, why do you bother to check whether
malloc returns a null pointer?

I just tried your test program, "./tggets /dev/zero". The process
grew to over a gigabyte before it died. I don't think it actually
crashed, but it easily could have (I'm not going to try it on a system
that I share with anybody else). (/dev/zero acts as an endless source
of null characters.)
No, when realloc fails the new data is returned to the stream and
the function returns what it had received with an error marker.
This allows the user to examine it, free the memory, and go back to
getting input.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Aug 22 '07 #204
CBFalconer <cb********@yahoo.com> writes:
[...]
No, when realloc fails the new data is returned to the stream and
the function returns what it had received with an error marker.
This allows the user to examine it, free the memory, and go back to
getting input.
Only if realloc reports its failure by returning a null pointer.

But even if realloc works properly, ggets doesn't provide a way for
the user to ask it not to attempt to allocate more than N bytes. If
my program allocates all the memory that it's permitted to, it might
have some bad impact on the rest of the system. I might reasonably
want to read a text file that may have very long lines (allocating
approximately only as much memory as necessary to hold each line), but
reject any line over, say, a megabyte. ggets doesn't let me exercise
that control.

Of course, since the code is public domain, I can always add such a
capability myself.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Aug 22 '07 #205
pete <pf*****@mindspring.com> wrote:
>
I suspect that fgetc and fputc exist, at least in part,
to simplify the standard's description of input and output.
No, fgetc and fputc long predated the standard. Back in the dark ages,
getc and putc were *only* implemented as macros with fgetc and fputc
being the corresponding functions.

-Larry Jones

Even though we're both talking english, we're not speaking the same language.
-- Calvin
Aug 22 '07 #206
la************@ugs.com wrote:
>
pete <pf*****@mindspring.com> wrote:

I suspect that fgetc and fputc exist, at least in part,
to simplify the standard's description of input and output.

No, fgetc and fputc long predated the standard.
Back in the dark ages,
getc and putc were *only* implemented as macros with fgetc and fputc
being the corresponding functions.
Thank you.

--
pete
Aug 22 '07 #207
Keith Thompson wrote:
Ed Jensen <ej*****@visi.com> writes:
>Keith Thompson <ks***@mib.org> wrote:
>>But, as we've discussed here before, malloc doesn't behave
properly on all systems. On some systems, malloc can return a
non-null result even if the memory isn't actually available.
The memory isn't actually allocated until you try to write to it.

Wouldn't simply switching to calloc() solve that problem, since
it allocates memory and writes to all of it?

I don't know. It would if the calloc() implementation is smart
enough to detect an error while writing to the allocated memory,
causing it to deallocate the memory and return a null pointer.
Possibly it does so. Possibly calloc() can't do this, because it
doesn't have the opportunity to detect the write error before the
OS starts killing processes. And possibly zeroing the allocated
memory doesn't require physically writing to it.
Why should it? What's to prevent the system from using 'copy on
non-zero write' rather than 'copy on write' as the solution?

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Aug 23 '07 #208
Keith Thompson wrote:
CBFalconer <cb********@yahoo.com> writes:
[...]
>No, when realloc fails the new data is returned to the stream and
the function returns what it had received with an error marker.
This allows the user to examine it, free the memory, and go back to
getting input.

Only if realloc reports its failure by returning a null pointer.
If it doesn't, you don't have a C system.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Aug 23 '07 #209
Flash Gordon <sp**@flash-gordon.me.uk> wrote:
>Wouldn't simply switching to calloc() solve that problem, since it
allocates memory and writes to all of it?

Not necessarily. It is common for the OS to provide zeroed memory, so
calloc will not have to write to it if the memory has been freshly
obtained from the OS, which is when there is a potential problem.
That's interesting. Would malloc() followed by memset() (to manually
zero the entire contents of the allocated memory) resolve the problem?

I assume you would need some OS-specific code to catch a failed
memset() (assuming the "real" allocation happens during the write, and
you'd need to catch the failure somehow).
Also, calloc does not resize blocks, and although it is not obvious from
the quoted material the original discussion was about growing buffers
using realloc.
That's also interesting. I've never tried to realloc() memory
allocated with calloc() before, and thus never gave it any thought.
Aug 23 '07 #210
Keith Thompson said:

<snip>
But I can reduce the risk of that kind of crash by limiting the amount
of memory I allocate to some "reasonable" size; for example, I might
want to handle very long lines, but reject lines longer than a megabyte.
Even if malloc and realloc misbehave, that could still be a useful
feature. ggets, in its current form, doesn't let me do that.

Chuck, think of this as a friendly suggestion for a new and useful
feature, not necessarily as a bug report.
It has been suggested to him already on several occasions. If he
rejected all the other suggestions, I don't reckon he's about to change
his mind. It's a shame, however, since it renders ggets unrecommendable
as a serious routine for use in production code.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Aug 23 '07 #211
Ed Jensen wrote, On 23/08/07 15:10:
Flash Gordon <sp**@flash-gordon.me.uk> wrote:
>>Wouldn't simply switching to calloc() solve that problem, since it
allocates memory and writes to all of it?
Not necessarily. It is common for the OS to provide zeroed memory, so
calloc will not have to write to it if the memory has been freshly
obtained from the OS, which is when there is a potential problem.

That's interesting.
I just realised that what I wrote could be misinterpreted. It is common
for the OS to provide zeroed memory, but the reason has nothing to do
with calloc; it is to prevent you getting possibly sensitive data from
some other program. Given that this occurs, calloc could be written as:
IF memory available in free list THEN
    zero memory and return pointer to it
ELSE
    get memory from OS and return pointer to it without zeroing it
Would malloc() followed by memset() (to manually
zero the entire contents of the allocated memory) resolve the problem?
Depends. The compiler could be clever enough to replace that with a call
to calloc.
I assume you would need some OS-specific code to catch a failed
memset() (assuming the "real" allocation happens during the write, and
you'd need to catch the failure somehow).
Since you have to go the system specific route, you might as well go the
system specific route of finding a way to disable lazy allocation.
>Also, calloc does not resize blocks, and although it is not obvious from
the quoted material the original discussion was about growing buffers
using realloc.

That's also interesting. I've never tried to realloc() memory
allocated with calloc() before, and thus never gave it any thought.
You can, although any extra memory will obviously not be zeroed.

The original discussion was about a buffer allocated with malloc then
grown with realloc (no zeroing of memory involved). Someone suggested
using calloc to sidestep the lazy allocation problem.
--
Flash Gordon
Aug 23 '07 #212

"Keith Thompson" <ks***@mib.orgwrote in message
news:ln************@nuthaus.mib.org...
But I can reduce the risk of that kind of crash by limiting the amount
of memory I allocate to some "reasonable" size; for example, I might
want to handle very long lines, but reject lines longer than a megabyte.
Even if malloc and realloc misbehave, that could still be a useful
feature. ggets, in its current form, doesn't let me do that.

Chuck, think of this as a friendly suggestion for a new and useful
feature, not necessarily as a bug report.
The problem is that you are asking for something inherently very
difficult: to accept lines of arbitrary size, but reject "maliciously long
lines". If you've got some sort of model of the input you expect then you
can maybe discriminate, but this is highly advanced AI programming, not
low-level code for an input function.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Aug 23 '07 #213
Malcolm McLean wrote, On 23/08/07 20:37:
>
"Keith Thompson" <ks***@mib.orgwrote in message
news:ln************@nuthaus.mib.org...
>But I can reduce the risk of that kind of crash by limiting the amount
of memory I allocate to some "reasonable" size; for example, I might
want to handle very long lines, but reject lines longer than a megabyte.
Even if malloc and realloc misbehave, that could still be a useful
feature. ggets, in its current form, doesn't let me do that.

Chuck, think of this as a friendly suggestion for a new and useful
feature, not necessarily as a bug report.
The problem is that you are asking for something inherently very
difficult: to accept lines of arbitrary size, but reject "maliciously
long lines". If you've got some sort of model of the input you expect
then you can maybe discriminate, but this is highly advanced AI
programming, not low-level code for an input function.
Keith's point is that if the user of the library function could specify
a maximum size (possibly 0 meaning unlimited) then the user of the
library function could decide on some suitable upper bound.
--
Flash Gordon
Aug 23 '07 #214

"Flash Gordon" <sp**@flash-gordon.me.ukwrote in message
Keith's point is that if the user of the library function could specify a
maximum size (possibly 0 meaning unlimited) then the user of the library
function could decide on some suitable upper bound.
-1 for unlimited. Demands for zero-length objects should be honoured. But
then the parameter cannot be a size_t :-)

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Aug 23 '07 #215
Malcolm McLean wrote:
>
"Keith Thompson" <ks***@mib.orgwrote in message
news:ln************@nuthaus.mib.org...
>But I can reduce the risk of that kind of crash by limiting the amount
of memory I allocate to some "reasonable" size; for example, I might
want to handle very long lines, but reject lines longer than a megabyte.
Even if malloc and realloc misbehave, that could still be a useful
feature. ggets, in its current form, doesn't let me do that.

Chuck, think of this as a friendly suggestion for a new and useful
feature, not necessarily as a bug report.
The problem is that you are asking for something inherently very
difficult: to accept lines of arbitrary size, but reject "maliciously
long lines". If you've got some sort of model of the input you expect
then you can maybe discriminate, but this is highly advanced AI
programming, not low-level code for an input function.
I don't see what's AI-like about this at all. The approximate expected line
length is something that the caller should know. A generic, reusable
routine like ggets/getline cannot know about this. However, what it can do
is accept a parameter specifying an upper limit on the amount of memory to
attempt to allocate, or the number of characters to attempt to read. If you
*really* want unlimited size, you could signal your intention by passing a
special value like zero. This would allow the get line function to be
tailored at each invocation, depending on what the application deems
reasonable at that point.

Aug 23 '07 #216
Flash Gordon <sp**@flash-gordon.me.uk> writes:
Malcolm McLean wrote, On 23/08/07 20:37:
>"Keith Thompson" <ks***@mib.org> wrote in message
news:ln************@nuthaus.mib.org...
>>But I can reduce the risk of that kind of crash by limiting the amount
of memory I allocate to some "reasonable" size; for example, I might
want to handle very long lines, but reject lines longer than a megabyte.
Even if malloc and realloc misbehave, that could still be a useful
feature. ggets, in its current form, doesn't let me do that.

Chuck, think of this as a friendly suggestion for a new and useful
feature, not necessarily as a bug report.
The problem is that you are asking for something inherently very
difficult: to accept lines of arbitrary size, but reject
"maliciously long lines". If you've got some sort of model of the
input you expect then you can maybe discriminate, but this is highly
advanced AI programming, not low-level code for an input function.

Keith's point is that if the user of the library function could
specify a maximum size (possibly 0 meaning unlimited) then the user of
the library function could decide on some suitable upper bound.
Exactly. It's not difficult at all.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Aug 23 '07 #217

"santosh" <sa*********@gmail.comwrote in message
news:fa**********@aioe.org...
Malcolm McLean wrote:
I don't see what's AI-like about this at all. The approximate expected line
length is something that the caller should know. A generic, reusable
routine like ggets/getline cannot know about this. However, what it can do
is accept a parameter specifying an upper limit on the amount of memory to
attempt to allocate, or the number of characters to attempt to read. If you
*really* want unlimited size, you could signal your intention by passing a
special value like zero. This would allow the get line function to be
tailored at each invocation, depending on what the application deems
reasonable at that point.
Let's say the input is English-language sentences. Up to about 2000
characters is no problem. Above that, it could be malicious or it could be a
legit sentence.
By Markov modelling English text I can filter out a lot of garbage-type
inputs. That leaves legitimate long sentences and malicious ones composed
with the Markov model or a similar one. So we can do a semantic check -
certain verbs can take only certain subjects and objects, for instance. A
few violations such as "she shot the bolt" we can ignore, but lots like "a
rabbit shot dark dreams furiously" we can reject. Eventually we accept only
genuine English sentences, unless the attacker is really very good indeed.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm


Aug 23 '07 #218
Malcolm McLean wrote, On 23/08/07 21:44:
>
"Flash Gordon" <sp**@flash-gordon.me.ukwrote in message
>Keith's point is that if the user of the library function could
specify a maximum size (possibly 0 meaning unlimited) then the user of
the library function could decide on some suitable upper bound.
-1 for unlimited. Demands for zero-length objects should be honoured.
But then the parameter cannot be a size_t :-)
No, I said 0 for unlimited because that is exactly what I meant. Asking
for at most 0 bytes of input is not sensible IMHO. There is also a
long-standing tradition (I'm not specifically referring to C here) of
using a 0 limit to mean unlimited. Also it allows you to use the correct
type and pass in any valid size.

There are reasons why doing a malloc(0) and getting back a pointer where
no memory has been allocated can be useful, which is probably why some
implementations did it.
--
Flash Gordon
Aug 23 '07 #219
Malcolm McLean wrote:
>
"santosh" <sa*********@gmail.comwrote in message
news:fa**********@aioe.org...
>Malcolm McLean wrote:
I don't see what's AI-like about this at all. The approximate expected line
length is something that the caller should know. A generic, reusable
routine like ggets/getline cannot know about this. However, what it can do
is accept a parameter specifying an upper limit on the amount of memory
to attempt to allocate, or the number of characters to attempt to read.
If you *really* want unlimited size, you could signal your intention by
passing a special value like zero. This would allow the get line function
to be tailored at each invocation, depending on what the application deems
reasonable at that point.
Let's say the input is English-language sentences. Up to about 2000
characters is no problem. Above that, it could be malicious or it could be
a legit sentence.
By Markov modelling English text I can filter out a lot of garbage-type
inputs. That leaves legitimate long sentences and malicious ones composed
with the Markov model or a similar one. So we can do a semantic check -
certain verbs can take only certain subjects and objects, for instance. A
few violations such as "she shot the bolt" we can ignore, but lots like "a
rabbit shot dark dreams furiously" we can reject. Eventually we accept
only genuine English sentences, unless the attacker is really very good
indeed.
Right, but the point was that the actual input function should provide the
means to retrieve lines of any length, including unlimited. The ggets
routine earlier in the thread provides no mechanism to tell it to stop at a
particular value, presumably determined in the caller, by sophisticated
Markov modelling or just common sense.

Aug 23 '07 #220
>Malcolm McLean wrote, On 23/08/07 21:44:
>-1 for unlimited. Demands for zero-length objects should be honoured.
But then the parameter cannot be a size_t :-)
In article <9l************@news.flash-gordon.me.uk>
Flash Gordon <sp**@flash-gordon.me.uk> wrote:
>No, I said 0 for unlimited because that is exactly what I meant. Asking
for at most 0 bytes of input is not sensible IMHO.
Indeed. However, Malcolm McLean's irrational fear of "size_t" aside,
passing -1 would work perfectly: (size_t)-1 is SIZE_MAX, which is
the largest possible value. If the size of the input line exceeds
SIZE_MAX, we have a paradox. :-)
>There is also a long-standing tradition (I'm not specifically
referring to C here) of using a 0 limit to mean unlimited.
This tends to depend on the situation. Sometimes a limit of zero
means "not allowed to do it"; sometimes that makes no sense and it
means "unlimited".
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Aug 23 '07 #221
"Malcolm McLean" <re*******@btinternet.comwrites:
"Flash Gordon" <sp**@flash-gordon.me.ukwrote in message
>Keith's point is that if the user of the library function could
specify a maximum size (possibly 0 meaning unlimited) then the user
of the library function could decide on some suitable upper bound.
-1 for unlimited. Demands for zero-length objects should be
honoured. But then the parameter cannot be a size_t :-)
No, I'd use 0 for unlimited, since it's a common convention and C does
not support zero-sized objects.

But even if you choose to use -1, (size_t)-1 is a perfectly reasonable
way to specify an effectively unlimited line length.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Aug 23 '07 #222
Peter J. Holzer wrote:
>
On 2007-08-21 23:56, pete <pf*****@mindspring.com> wrote:
Harald van Dĳk wrote:
>
pete wrote:
Harald van Dĳk wrote:
pete wrote:
Harald van Dĳk wrote:
Is the correct way to cast the char to unsigned char,
or is it to
reinterpret the char as an unsigned char?
neither is the "obvious" way.
I have no argument against that.

--
pete
Aug 24 '07 #223
Keith Thompson said:
"Malcolm McLean" <re*******@btinternet.comwrites:
>"Flash Gordon" <sp**@flash-gordon.me.ukwrote in message
>>Keith's point is that if the user of the library function could
specify a maximum size (possibly 0 meaning unlimited) then the user
of the library function could decide on some suitable upper bound.
-1 for unlimited. Demands for zero-length objects should be
honoured. But then the parameter cannot be a size_t :-)

No, I'd use 0 for unlimited, since it's a common convention and C does
not support zero-sized objects.

But even if you choose to use -1, (size_t)-1 is a perfectly reasonable
way to specify an effectively unlimited line length.
In the library I'm working on right now, I use (size_t)-1 to indicate
"whatever" - in data capture, it means "the programmer doesn't have a
particular upper limit in mind", in array access it means "on the end,
please", and so on. Appropriate #defines disambiguate these meanings.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Aug 24 '07 #224
Keith Thompson wrote:
>
... snip ...
>
That's a good point.

But I can reduce the risk of that kind of crash by limiting the
amount of memory I allocate to some "reasonable" size; for example,
I might want to handle very long lines, but reject lines longer than
a megabyte. Even if malloc and realloc misbehave, that could still
be a useful feature. ggets, in its current form, doesn't let me
do that.

Chuck, think of this as a friendly suggestion for a new and useful
feature, not necessarily as a bug report.
I think I already gave my reasons for disagreeing.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Aug 25 '07 #225
Flash Gordon wrote:
Malcolm McLean wrote, On 23/08/07 20:37:
>"Keith Thompson" <ks***@mib.orgwrote in message
>>But I can reduce the risk of that kind of crash by limiting the
amount of memory I allocate to some "reasonable" size; for
example, I might want to handle very long lines, but reject lines
longer than a megabyte. Even if malloc and realloc misbehave,
that could still be a useful feature. ggets, in its current
form, doesn't let me do that.

Chuck, think of this as a friendly suggestion for a new and
useful feature, not necessarily as a bug report.
The problem is that you are asking for something inherently very
difficult: to accept lines of arbitrary size, but reject
"maliciously long lines". If you've got some sort of model of
the input you expect then you can maybe discriminate, but this
is highly advanced AI programming, not low-level code for an
input function.

Keith's point is that if the user of the library function could
specify a maximum size (possibly 0 meaning unlimited) then the
user of the library function could decide on some suitable upper
bound.
I wrote ggets() to replace gets(). It maintains the simplicity -
you supply only the address of a pointer, which will receive the
pointer to the next input line. The only other thing to worry
about is the return value, which can be 0 (good), EOF (EOF) or
positive non-zero (I/O error). Now you have to remember to arrange
to free() that pointer at some time. You can also copy it
elsewhere, embed it in a linked list, etc. etc.

However, use is always totally safe. The input action will never
overwrite anything. If you put any limits on it, sooner or later
those will bite. Or they are one more parameter to "get right"
before calling. The simplest parameter is no parameter. It is
fairly hard to get that one wrong.

What you can do, without noticeable harm (except to force the user
to initialize something) is say that ggets() will free the pointer
at the beginning of execution, whenever it is non-NULL. I don't
like it, because it complicates the usage, and I have a problem
remembering to do anything, including creating an object and
presetting it to NULL each time it is passed to ggets() (after
freeing, after any earlier ggets() call). You can see how the
specification gets out of hand very rapidly.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Aug 25 '07 #226
CBFalconer <cb********@yahoo.com> writes:
Flash Gordon wrote:
[...]
>Keith's point is that if the user of the library function could
specify a maximum size (possibly 0 meaning unlimited) then the
user of the library function could decide on some suitable upper
bound.

I wrote ggets() to replace gets(). It maintains the simplicity -
you supply only the address of a pointer, which will receive the
pointer to the next input line. The only other thing to worry
about is the return value, which can be 0 (good), EOF (EOF) or
positive non-zero (I/O error). Now you have to remember to arrange
to free() that pointer at some time. You can also copy it
elsewhere, embed it in a linked list, etc. etc.

However, use is always totally safe. The input action will never
overwrite anything. If you put any limits on it, sooner or later
those will bite. Or they are one more parameter to "get right"
before calling. The simplest parameter is no parameter. It is
fairly hard to get that one wrong.
[...]

A program using ggets(), reading from an arbitrary input file, can
attempt to allocate an arbitrarily large amount of memory. It will
eventually fail cleanly (assuming malloc and realloc work the way
they're required to), but even so, allocating as much memory as you
can may have negative consequences. I can write a program that calls
malloc() in a loop to see how much I can allocate, but I wouldn't want
to run it on a shared system.

I'm not suggesting changing the default behavior, just providing a way
for the user to change it. You could either provide a routine to set
a maximum size for future calls (though that could introduce issues
for threaded environments), or provide an additional function that
lets you specify a limit. (The behavior on exceeding the limit would
have to be defined.)

Heck, since it's public domain, I might go ahead and make some changes
myself. Naturally you're under no obligation to accept them; and if I
distribute it myself I'll certainly give you credit. I'll do this in
my copious free time, of course, so don't hold your breath.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Aug 26 '07 #227
Keith Thompson wrote:
CBFalconer <cb********@yahoo.comwrites:
>Flash Gordon wrote:
[...]
>>Keith's point is that if the user of the library function could
specify a maximum size (possibly 0 meaning unlimited) then the
user of the library function could decide on some suitable upper
bound.

I wrote ggets() to replace gets(). It maintains the simplicity -
you supply only the address of a pointer, which will receive the
pointer to the next input line. The only other thing to worry
about is the return value, which can be 0 (good), EOF (end of file),
or positive non-zero (I/O error). Now you have to remember to arrange
to free() that pointer at some time. You can also copy it
elsewhere, embed it in a linked list, etc. etc.

However, use is always totally safe. The input action will never
overwrite anything. If you put any limits on it, sooner or later
those will bite. Or they are one more parameter to "get right"
before calling. The simplest parameter is no parameter. It is
fairly hard to get that one wrong.
[...]

A program using ggets(), reading from an arbitrary input file, can
attempt to allocate an arbitrarily large amount of memory. It will
eventually fail cleanly (assuming malloc and realloc work the way
they're required to), but even so, allocating as much memory as you
can may have negative consequences. I can write a program that calls
malloc() in a loop to see how much I can allocate, but I wouldn't want
to run it on a shared system.

I'm not suggesting changing the default behavior, just providing a way
for the user to change it. You could either provide a routine to set
a maximum size for future calls (though that could introduce issues
for threaded environments), or provide an additional function that
lets you specify a limit. (The behavior on exceeding the limit would
have to be defined.)

Heck, since it's public domain, I might go ahead and make some changes
myself. Naturally you're under no obligation to accept them; and if I
distribute it myself I'll certainly give you credit. I'll do this in
my copious free time, of course, so don't hold your breath.
Since it is PD you can do whatever you wish. However I request
that, if you change the header file in any way, you also change the
routine's name.

Note the simplicity and safety of the demo file reverse.c.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Aug 26 '07 #228
On Fri, 03 Aug 2007 09:55:56 -0700, ms*******@yahoo.com wrote:
On Aug 3, 1:18 am, "Malcolm McLean" <regniz...@btinternet.comwrote:
>>size_t didn't use to be part of the language. It's a relatively new
>>invention designed to solve the problem of memory buffers bigger than the
>>range of an int. Of course such buffers should seldom arise, if you obey the
>>convention that int is the "natural" integer size of the machine.
>
>I understand this is a software group, and people here may not be as
>familiar with hardware as either a hardware group or assembly language
>group would be. But that statement is totally incorrect. Historically
>memory address range has almost always been larger than native data
For micros, minis, and (early) "data processing" machines, _usually_.
>size. One particular bit-slice processor had a native data size of 4-
>bits, yet an address bus 20 bits wide. The popular 8085 has a 16-bit
>address bus but only an 8-bit native data size. The 8086 that the IBM-
Also Intel 8080, Motorola 6809, and MOS Tech (IIRC) 6502.
>PC was based on had a 20-bit address bus with a 16-bit native data
>size. I could go on with further examples. The point is that it is
>actually very rare that the "natural" integer size is as large as the
>addressable memory range. Thus the need for a special type to hold the
>size of address ranges.
I wouldn't say _very_ rare, but certainly far from universal.
>FYI the one counter-example I can think of is the Data General Eclipse
>S/140 that had a 15 bit address bus and a 16 bit data size, but it
>used memory paging to access more than 32K words. And for those who
>may wonder why not just add that 1 extra bit, bit #0 (the MSB) was
>used for indirect addressing.
I think (some?) HP minis also had 16b data 15+1b address.

A much more widespread counterexample was IBM S/360 (and clones) with
32b data word (but support for 16b and 8b) and 24b address initially,
only later growing to 31b. Another mini (perhaps supermini) case was
DEC PDP-6 and -10 with 36b data and 18(+5)b address. Motorola 68k (in
original Apple Macintosh) was also nominal 32b data 24b address.

And there were a lot of "scientific" machines like IBM 704/709 et seq,
Univac, CDC, Cyber, Cray with data 36b to 72b but address much less.

- formerly david.thompson1 || achar(64) || worldnet.att.net
Aug 26 '07 #229
Keith Thompson wrote:
>
CBFalconer <cb********@yahoo.comwrites:
Malcolm McLean wrote:
>
... snip ...
>
The function is a reasonable drop-in for fgets() in a non-security,
non-safety critical environment. It means the programmer doesn't
have to worry about buffer size. A malicious user can crash things
by passing a massive line to the function, but we don't all have
to consider that possibility.
That is precisely the purpose of ggets(char **). It is simple and
can't crash. Take a look at:

<http://cbfalconer.home.att.net/download/>

It can't crash if malloc and realloc behave properly. It initially
mallocs 112 bytes, then reallocs more space 128 bytes at a time for
long lines.

But, as we've discussed here before, malloc doesn't behave properly on
all systems. On some systems, malloc can return a non-null result
even if the memory isn't actually available. The memory isn't
actually allocated until you try to write to it. Of course, by then
it's too late to indicate the failure via the result of malloc, so the
system kills your process -- or, perhaps worse, some other process.

I'm sure you dislike the idea of catering to such systems as much as I
do, but you might consider implementing a way to (optionally) limit
the maximum line length, to avoid attempting to allocate a gigabyte of
memory if somebody feeds your program a file with a gigabyte-long line
of text.
I don't think there's any point in attempting
to publish portable code for nonconforming implementations of C.

--
pete
Aug 26 '07 #230
Keith Thompson wrote:
CBFalconer <cb********@yahoo.comwrites:
>Malcolm McLean wrote:
>>>
... snip ...
>>>
The function is a reasonable drop-in for fgets() in a non-security,
non-safety critical environment. It means the programmer doesn't
have to worry about buffer size. A malicious user can crash things
by passing a massive line to the function, but we don't all have
to consider that possibility.

That is precisely the purpose of ggets(char **). It is simple and
can't crash. Take a look at:

<http://cbfalconer.home.att.net/download/>

It can't crash if malloc and realloc behave properly. It initially
mallocs 112 bytes, then reallocs more space 128 bytes at a time for
long lines.

But, as we've discussed here before, malloc doesn't behave properly on
all systems. On some systems, malloc can return a non-null result
even if the memory isn't actually available. The memory isn't
actually allocated until you try to write to it. Of course, by then
it's too late to indicate the failure via the result of malloc, so the
system kills your process -- or, perhaps worse, some other process.
<OT>
Do the said systems at least deliver a signal to the process chosen to be
terminated? This is one of the purposes behind the existence of signals?
Perhaps ENOMEM or similar?
</OT>

<snip>

Aug 26 '07 #231
On 2007-08-26 17:36, santosh <sa*********@gmail.comwrote:
Keith Thompson wrote:
>It can't crash if malloc and realloc behave properly. It initially
mallocs 112 bytes, then reallocs more space 128 bytes at a time for
long lines.

But, as we've discussed here before, malloc doesn't behave properly on
all systems. On some systems, malloc can return a non-null result
even if the memory isn't actually available. The memory isn't
actually allocated until you try to write to it. Of course, by then
it's too late to indicate the failure via the result of malloc, so the
system kills your process -- or, perhaps worse, some other process.
However, there is nothing the programmer can do about that. The system
administrator can: He can turn off overcommitment (if the system allows
it - in Linux that was only added in 2.5.x), he can add more swapspace
and/or impose memory limits on individual processes.
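Off-topic for standard C, but to illustrate that last point: on POSIX systems a process can also impose such a limit on itself. A sketch, assuming a platform that supports RLIMIT_AS (Linux does); the function name cap_memory is my own invention:

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/resource.h>   /* POSIX, not standard C */

/* Sketch: cap this process's own address space at roughly 256 MB, so
 * a runaway allocation makes malloc return NULL cleanly instead of
 * inviting the system's out-of-memory killer.  Assumes RLIMIT_AS is
 * supported; check your platform's documentation. */
int cap_memory(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_AS, &rl) != 0)
        return -1;
    rl.rlim_cur = 256UL * 1024 * 1024;  /* new soft limit: 256 MB */
    return setrlimit(RLIMIT_AS, &rl);
}
```

Once the cap is in place, an attempt to malloc() a gigabyte fails with a null return instead of being overcommitted.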

><OT>
Do the said systems at least deliver a signal to the process chosen to be
terminated?
On Linux it is SIGKILL - which is one of the two signals which cannot be
caught or ignored.
This is one of the purposes behind the existence of signals?
Yes. In fact, there are some other signals (SIGXCPU, SIGXFSZ) used to
signal excessive resource usage which can be caught or ignored. Using
SIGKILL for exceeding memory usage was probably a bad design decision.
After all, a process can free memory, but it can't decrease CPU usage.

A special signal (something like SIGLOWMEM) might be useful. It should be
sent to processes before the situation gets really desperate, and maybe
only to processes which explicitly request it. Unfortunately, I don't
know of any system which does this.
Perhaps ENOMEM or similar?
ENOMEM is no name for a signal, but for a value for errno. After a
failure of malloc, errno may indeed contain that value.

hp
--
_ | Peter J. Holzer | I know I'd be respectful of a pirate
|_|_) | Sysadmin WSR | with an emu on his shoulder.
| | | hj*@hjp.at |
__/ | http://www.hjp.at/ | -- Sam in "Freefall"
Aug 26 '07 #232
Peter J. Holzer wrote:
On 2007-08-26 17:36, santosh <sa*********@gmail.comwrote:
>Keith Thompson wrote:
>>It can't crash if malloc and realloc behave properly. It initially
mallocs 112 bytes, then reallocs more space 128 bytes at a time for
long lines.

But, as we've discussed here before, malloc doesn't behave properly on
all systems. On some systems, malloc can return a non-null result
even if the memory isn't actually available. The memory isn't
actually allocated until you try to write to it. Of course, by then
it's too late to indicate the failure via the result of malloc, so the
system kills your process -- or, perhaps worse, some other process.

However, there is nothing the programmer can do about that. The system
administrator can: He can turn off overcommitment (if the system allows
it - in Linux that was only added in 2.5.x), he can add more swapspace
and/or impose memory limits on individual processes.

>><OT>
Do the said systems at least deliver a signal to the process chosen to be
terminated?

On Linux it is SIGKILL - which is one of the two signals which cannot be
caught or ignored.
Which is, I suppose, as good as no signal at all.
>This is one of the purposes behind the existence of signals?

Yes. In fact, there are some other signals (SIGXCPU, SIGXFSZ) used to
signal excessive resource usage which can be caught or ignored. Using
SIGKILL for exceeding memory usage was probably a bad design decision.
After all, a process can free memory, but it can't decrease CPU usage.

A special signal (something like SIGLOWMEM) might be useful. It should be
sent to processes before the situation gets really desperate, and maybe
only to processes which explicitly request it. Unfortunately, I don't
know of any system which does this.
Yes, any catchable signal indicating resource constraint would do. It would
give the program a chance to deallocate resources and continue running, or
terminate cleanly, without a coredump.
>Perhaps ENOMEM or similar?

ENOMEM is no name for a signal, but for a value for errno. After a
failure of malloc, errno may indeed contain that value.
Thanks for correcting that slip-up.

Aug 26 '07 #233
Peter J. Holzer <hj*********@hjp.atwrote:
>
A special signal (something like SIGLOWMEM) might be useful. It should be
sent to processes before the situation gets really desparate, and maybe
only to processes which explicitely request it. Unfortunately, I don't
know of any system which does this.
AIX does - it sends SIGDANGER (whose default action is ignore) to all
processes when memory runs low. It only starts sending SIGKILLs when
the situation gets really desperate and it starts with processes that
are large users of memory that *don't* have handlers for SIGDANGER.

-Larry Jones

Another casualty of applied metaphysics. -- Hobbes
Aug 26 '07 #234
CBFalconer <cb********@yahoo.comwrites:
Keith Thompson wrote:
[...]
>Heck, since it's public domain, I might go ahead and make some changes
myself. Naturally you're under no obligation to accept them; and if I
distribute it myself I'll certainly give you credit. I'll do this in
my copious free time, of course, so don't hold your breath.

Since it is PD you can do whatever you wish. However I request
that, if you change the header file in any way, you also change the
routine's name.
Hmm. What I was thinking of doing was a drop-in replacement for ggets
and fgets; changing the routine names would make that impossible.
Note the simplicity and safety of the demo file reverse.c.
(It's freverse.c.) When I run "./freverse < /dev/zero", the process
grows continuously until I kill it.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Aug 26 '07 #235
pete <pf*****@mindspring.comwrites:
Keith Thompson wrote:
[...]
>I'm sure you dislike the idea of catering to such systems as much as I
do, but you might consider implementing a way to (optionally) limit
the maximum line length, to avoid attempting to allocate a gigabyte of
memory if somebody feeds your program a file with a gigabyte-long line
of text.

I don't think there's any point in attempting to publish portable
code for nonconforming implementations of C.
Sure, but in this case what I'm suggesting is adding a new feature
that could be useful even on conforming implementations; it's merely a
bit more important on certain non-conforming systems.

If the new feature catered to non-conforming systems but caused
problems on conforming systems, I'd agree with you (though sometimes
such things are still necessary, alas).

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Aug 26 '07 #236
Keith Thompson wrote:
CBFalconer <cb********@yahoo.comwrites:
>Keith Thompson wrote:

[...]
>>Heck, since it's public domain, I might go ahead and make some
changes myself. Naturally you're under no obligation to accept
them; and if I distribute it myself I'll certainly give you
credit. I'll do this in my copious free time, of course, so
don't hold your breath.

Since it is PD you can do whatever you wish. However I request
that, if you change the header file in any way, you also change
the routine's name.

Hmm. What I was thinking of doing was a drop-in replacement for
ggets and fgets; changing the routine names would make that
impossible.
>Note the simplicity and safety of the demo file reverse.c.

(It's freverse.c.) When I run "./freverse < /dev/zero", the process
grows continuously until I kill it.
No, it's patiently trying to find the EOF marker. :-) Try "man
/dev/zero" too.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Aug 27 '07 #237
CBFalconer <cb********@yahoo.comwrites:
Keith Thompson wrote:
>CBFalconer <cb********@yahoo.comwrites:
>>Keith Thompson wrote:
[...]
>>>Heck, since it's public domain, I might go ahead and make some
changes myself. Naturally you're under no obligation to accept
them; and if I distribute it myself I'll certainly give you
credit. I'll do this in my copious free time, of course, so
don't hold your breath.

Since it is PD you can do whatever you wish. However I request
that, if you change the header file in any way, you also change
the routine's name.

Hmm. What I was thinking of doing was a drop-in replacement for
ggets and fgets; changing the routine names would make that
impossible.
>>Note the simplicity and safety of the demo file reverse.c.

(It's freverse.c.) When I run "./freverse < /dev/zero", the process
grows continuously until I kill it.

No, it's patiently trying to find the EOF marker. :-) Try "man
/dev/zero" too.
That's exactly the point. I know how /dev/zero works; did you really
think I didn't? (For anyone who doesn't know, "/dev/zero" is a
pseudo-file on Unix-like systems that, on reading, appears as an
endless stream of '\0' characters.)

As a programmer, I may not have control over the content of the files
I read, particularly stdin. If I use gets(), that means I run the
risk of a buffer overflow. ggets() is a vast improvement, but I still
run the risk of attempting to allocate a potentially unbounded amount
of memory. ggets() doesn't give the programmer the ability to set an
upper bound on the amount of memory allocated.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Aug 27 '07 #238
On 2007-08-26 22:21, Keith Thompson <ks***@mib.orgwrote:
CBFalconer <cb********@yahoo.comwrites:
>Note the simplicity and safety of the demo file reverse.c.

(It's freverse.c.) When I run "./freverse < /dev/zero", the process
grows continuously until I kill it.
You aren't patient enough. It will stop growing when it has consumed all
available space:

% limit addressspace 200M
% ./freverse </dev/zero
Reversing stdin to stdout
0 chars in 0 lines
./freverse < /dev/zero  4.62s user 0.58s system 93% cpu 5.550 total

The problem with arbitrary limits is that they are, well, arbitrary.

You may think that you never need to reverse lines longer than 1000000
characters. But then somebody comes along with an input file with a line
of 1000001 characters. With a hard coded limit that won't work. So for
every limit you want a way to configure it at run-time (commandline
switch, config file, environment variable, etc.) which adds extra
complexity. So I think you should think twice whether this is necessary
or whether restricting resource usage by external means is good enough.

hp

--
_ | Peter J. Holzer | I know I'd be respectful of a pirate
|_|_) | Sysadmin WSR | with an emu on his shoulder.
| | | hj*@hjp.at |
__/ | http://www.hjp.at/ | -- Sam in "Freefall"
Aug 27 '07 #239

"Peter J. Holzer" <hj*********@hjp.atwrote in message
news:sl************************@zeno.hjp.at...
However, there is nothing the programmer can do about that. The system
administrator can: He can turn off overcommitment (if the system allows
it - in Linux that was only added in 2.5.x), he can add more swapspace
and/or impose memory limits on individual processes.
I think that's the real answer. Users and processes should have a memory
budget. If a user wants an especially large memory space, for instance to
reverse an encyclopedia, he has got to request it specially.

However that's got to be done at the system level. At the moment we do have
to deal with systems that will gobble endless resources.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Aug 27 '07 #240
Keith Thompson wrote:
>
.... snip ...
>
As a programmer, I may not have control over the content of the
files I read, particularly stdin. If I use gets(), that means I
run the risk of a buffer overflow. ggets() is a vast improvement,
but I still run the risk of attempting to allocate a potentially
unbounded amount of memory. ggets() doesn't give the programmer
the ability to set an upper bound on the amount of memory
allocated.
Nobody removed fgets() :-)

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Aug 27 '07 #241
CBFalconer <cb********@yahoo.comwrites:
Keith Thompson wrote:
... snip ...
>>
As a programmer, I may not have control over the content of the
files I read, particularly stdin. If I use gets(), that means I
run the risk of a buffer overflow. ggets() is a vast improvement,
but I still run the risk of attempting to allocate a potentially
unbounded amount of memory. ggets() doesn't give the programmer
the ability to set an upper bound on the amount of memory
allocated.

Nobody removed fgets() :-)
Good point. But fgets doesn't let me read a million-character line
without first allocating a million bytes to hold it.

The capability I'm proposing is to be able to read a line up to some
large size N without first allocating a full N bytes of space, where N
is perhaps determined not by the likely size of an input line but by
how much memory I'm willing to allocate. Neither fgets (which
requires me to allocate N bytes first) nor ggets (which will happily
allocate 10*N bytes for certain inputs) lets me do this, at least not
directly.
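A sketch of what such a capped reader might look like; the name fgetline_max and its return convention are inventions for illustration, not part of ggets or any published library. It grows the buffer in 128-byte steps like ggets, but stops at a caller-supplied ceiling and reports truncation:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical capped line reader.  Reads into a growing buffer,
 * never allocating more than maxlen bytes.  Returns:
 *   0   complete line read (stored through *lnptr, '\n' stripped)
 *   1   line truncated at maxlen-1 chars; rest left in the stream
 *   EOF end of input with no data
 *  -1   allocation failure
 */
int fgetline_max(char **lnptr, size_t maxlen, FILE *fp)
{
    size_t cap = maxlen < 112 ? maxlen : 112, len = 0;
    char *buf = malloc(cap ? cap : 1);
    int ch;

    if (buf == NULL)
        return -1;
    while ((ch = getc(fp)) != EOF && ch != '\n') {
        if (len + 1 >= cap) {
            if (cap >= maxlen) {        /* hit the ceiling */
                ungetc(ch, fp);
                buf[len] = '\0';
                *lnptr = buf;
                return 1;
            }
            size_t ncap = cap + 128 > maxlen ? maxlen : cap + 128;
            char *tmp = realloc(buf, ncap);
            if (tmp == NULL) { free(buf); return -1; }
            buf = tmp;
            cap = ncap;
        }
        buf[len++] = (char)ch;
    }
    if (ch == EOF && len == 0) { free(buf); return EOF; }
    buf[len] = '\0';
    *lnptr = buf;
    return 0;
}
```

On truncation the unread tail is left in the stream, so the next call picks it up; that is one possible answer to the "what to do with the rest of the line" question.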

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Aug 27 '07 #242
Keith Thompson wrote:
CBFalconer <cb********@yahoo.comwrites:
>Keith Thompson wrote:
... snip ...
>>As a programmer, I may not have control over the content of the
files I read, particularly stdin. If I use gets(), that means I
run the risk of a buffer overflow. ggets() is a vast improvement,
but I still run the risk of attempting to allocate a potentially
unbounded amount of memory. ggets() doesn't give the programmer
the ability to set an upper bound on the amount of memory
allocated.
Nobody removed fgets() :-)

Good point. But fgets doesn't let me read a million-character line
without first allocating a million bytes to hold it.

The capability I'm proposing is to be able to read a line up to some
large size N without first allocating a full N bytes of space, where N
is perhaps determined not by the likely size of an input line but by
how much memory I'm willing to allocate. Neither fgets (which
requires me to allocate N bytes first) nor ggets (which will happily
allocate 10*N bytes for certain inputs) lets me do this, at least not
directly.
It is fairly simple code. Show us what ktgets() looks like.

--
Joe Wright
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---
Aug 27 '07 #243
Joe Wright <jo********@comcast.netwrites:
Keith Thompson wrote:
>CBFalconer <cb********@yahoo.comwrites:
>>Keith Thompson wrote:
... snip ...
As a programmer, I may not have control over the content of the
files I read, particularly stdin. If I use gets(), that means I
run the risk of a buffer overflow. ggets() is a vast improvement,
but I still run the risk of attempting to allocate a potentially
unbounded amount of memory. ggets() doesn't give the programmer
the ability to set an upper bound on the amount of memory
allocated.
Nobody removed fgets() :-)
Good point. But fgets doesn't let me read a million-character line
without first allocating a million bytes to hold it.
The capability I'm proposing is to be able to read a line up to some
large size N without first allocating a full N bytes of space, where N
is perhaps determined not by the likely size of an input line but by
how much memory I'm willing to allocate. Neither fgets (which
requires me to allocate N bytes first) nor ggets (which will happily
allocate 10*N bytes for certain inputs) lets me do this, at least not
directly.
It is fairly simple code. Show us what ktgets() looks like.
It doesn't exist, and it's entirely possible that it never will.

I should take a closer look at Richard Heathfield's fgetline(); I
think it already does exactly what I'm suggesting ggets should be
extended to do. But it's a bit more complex to use.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Aug 27 '07 #244
Joe Wright wrote:
Keith Thompson wrote:
>CBFalconer <cb********@yahoo.comwrites:
.... snip ...
>>
>>Nobody removed fgets() :-)

Good point. But fgets doesn't let me read a million-character
line without first allocating a million bytes to hold it.

The capability I'm proposing is to be able to read a line up to
some large size N without first allocating a full N bytes of
space, where N is perhaps determined not by the likely size of
an input line but by how much memory I'm willing to allocate.
Neither fgets (which requires me to allocate N bytes first) nor
ggets (which will happily allocate 10*N bytes for certain inputs)
lets me do this, at least not directly.

It is fairly simple code. Show us what ktgets() looks like.
Just take the ggets source, change the name, and add a size_t
parameter. It only comes into play when expanding the storage.
Now you have some more problems, including:

1. How to signal that this is a truncated line.
2. What to do with terminal '\n's.
3. How to train the users.

None of which I want to have anything to do with.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Aug 28 '07 #245
Malcolm McLean said:

<snip>
In practice fgets() is too hard for the average programmer to use
correctly, as has time after time been demonstrated here.
No, it isn't. What is often demonstrated here is *ignorance* of the
proper way to use fgets. Ignorance is curable (although not in all
cases, it appears).
I caused
outrage by suggesting that most programs would be safer if they
replaced fgets() with gets(). However I was right.
No, your suggestion is incorrect. No program is safe if it calls gets.
That is not to say that all programs calling fgets are safe, of course
- but replacing an fgets call with a gets call is just plain stupid.
The real answer is not to use gets() but something like ggets().
No, it isn't, for reasons which were pointed out when ggets first
appeared five years ago and which have been reiterated at various times
ever since.

Malcolm, I really really wish you'd stop talking such junk all the time.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Aug 29 '07 #246

"Flash Gordon" <sp**@flash-gordon.me.ukwrote in message
news:7d************@news.flash-gordon.me.uk...
You missed one major reason why it is not suitable for a lot of use. On
memory exhaustion it throws away the probably large amount of input it has
received. Personally I would consider that completely unacceptable for a
lot of uses.
It would be better if ggets() took some action against the half-read data
currently in the stream, to prevent it from being called again and possibly
returning a wrong result.
The last thing you want is wrong but reasonable-seeming results, which is
exactly what partially-read lines are likely to generate.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Aug 29 '07 #247
"Malcolm McLean" <re*******@btinternet.comwrites:
In practice fgets() is too hard for the average programmer to use
correctly, as has time after time been demonstrated here. I caused
outrage by suggesting that most programs would be safer if they
replaced fgets() with gets(). However I was right.
Let me guess. You were alone in this belief on that occasion as well?

--
Ben.
Aug 29 '07 #248

"Ben Bacarisse" <be********@bsb.me.ukwrote in message
news:87************@bsb.me.uk...
"Malcolm McLean" <re*******@btinternet.comwrites:
In practice fgets() is too hard for the average programmer to use
correctly, as has time after time been demonstrated here. I caused
outrage by suggesting that most programs would be safer if they
replaced fgets() with gets(). However I was right.

Let me guess. You were alone in this belief on that occasion as well?
Yes. About two years later Steve Summit edited the FAQ to point out that
fgets() is problematic if lines overflow the buffer length, which was the
point I was making all along. He never said I was right.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Aug 30 '07 #249
"Malcolm McLean" <re*******@btinternet.comwrites:
"Ben Bacarisse" <be********@bsb.me.ukwrote in message
news:87************@bsb.me.uk...
>"Malcolm McLean" <re*******@btinternet.comwrites:
In practice fgets() is too hard for the average programmer to use
correctly, as has time after time been demonstrated here. I caused
outrage by suggesting that most programs would be safer if they
replaced fgets() with gets(). However I was right.

Let me guess. You were alone in this belief on that occasion as well?
Yes. About two years later Steve Summit edited the FAQ to point out
that fgets() is problematic if lines overflow the buffer length, which
was the point I was making all along. He never said I was right.
No. fgets is not problematic. fgets behaves in a defined, non-problematic
way.
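For what it's worth, that defined behaviour makes the long-line case easy to detect: if the buffer fgets fills contains no '\n' and end-of-file has not been reached, the line did not fit. A sketch of the common idiom (the wrapper name is my own):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch of the usual fgets idiom: detect whether the line fit.
 * Returns 1 if a complete line was read into buf (newline stripped),
 * 0 if the line was longer than the buffer (the excess is read and
 * discarded so the next call starts on a fresh line), EOF at end of
 * input. */
int get_bounded_line(char *buf, size_t n, FILE *fp)
{
    char *nl;
    int ch;

    if (fgets(buf, (int)n, fp) == NULL)
        return EOF;
    nl = strchr(buf, '\n');
    if (nl != NULL) {
        *nl = '\0';                     /* strip the newline */
        return 1;
    }
    if (feof(fp))                       /* final line lacked a '\n' */
        return 1;
    while ((ch = getc(fp)) != EOF && ch != '\n')
        ;                               /* discard rest of long line */
    return 0;
}
```

Memory use is bounded by the caller's buffer, at the price of silently dropping the tail of over-long lines; whether that trade-off is acceptable is exactly what this thread has been arguing about.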
Aug 30 '07 #250
