473,789 Members | 2,500 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

General method for dynamically allocating memory for a string

I have searched the internet for malloc and dynamic malloc; however, I still
don't know or readily see what is general way to allocate memory to char *
variable that I want to assign the substring that I found inside of a
string.

Any ideas?
Aug 30 '06
94 4778
jacob navia <ja***@jacob.re mcomp.frwrites:
Joe Wright wrote:
>Richard Heathfield wrote:
[ snip all ]
Some time ago now I attacked the GC problem, as I saw it, with what
I called GE or Garbage Elimination.
It has to do with wrapping malloc, calloc, realloc and free such
that they all come to GE first. GE is basically a manager of a
linked list of allocation data, including addresses and sizes.
GE.h adds 'size_t size(void *p);' to the vocabulary, size of the
allocation.
User calls to *alloc are simply recorded in the list and then passed
to their libc namesakes. Calls to free() are looked up in the list
and if found result in a call to libc free() and deletion from the
list. If the user calls free(x) and x is not in the list, we simply
return, a NOP.
What do you think of it?

I am afraid you are actually repeating the code of malloc.
The system function malloc has been probably coded with great care to
do exactly that:

Find a free block in the list of free blocks!

You are just adding a layer that does the same.
But that layer in the system malloc() is not exposed to the user. The
only *portable* way to do this kind of thing is to add a wrapper
around *alloc() and free(). Such a wrapper can detect errors that
malloc() can't (such as free()ing the same pointer twice).

It could significantly hurt performance if you keep track of
allocations with a simple linear linked list.

--
Keith Thompson (The_Other_Keit h) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Sep 2 '06 #81
>Some time ago now I attacked the GC problem, as I saw it, with what I
>called GE or Garbage Elimination.

It has to do with wrapping malloc, calloc, realloc and free such that
they all come to GE first. GE is basically a manager of a linked list of
allocation data, including addresses and sizes.

GE.h adds 'size_t size(void *p);' to the vocabulary, size of the allocation.

User calls to *alloc are simply recorded in the list and then passed to
their libc namesakes. Calls to free() are looked up in the list and if
found result in a call to libc free() and deletion from the list. If the
user calls free(x) and x is not in the list, we simply return, a NOP.

What do you think of it?
I'd like to see it have a debugging mode with these characteristics :

(1) Attempting to free something not in the list results in a call to abort().
Or perhaps just logs it.

(2) Allocated memory is initialized to a machine-specific bit pattern
that matches neither all bits 0 nor the bit pattern of a null
pointer, and which is likely to cause program aborts when used as
a pointer accidentally. It should be pessimally aligned. This
pattern should be known to users of the platform so it can easily
be spotted in memory dumps by a debugger. Example: on 32-bit
machines where a null pointer does not use this pattern (no,
0xdeadbeef is *NOT* always used as the bit pattern for a null
pointer, even on machines with 32-bit pointers): 0xdeadbeef.

(3) Deallocated memory is scribbled on with a machine-specific bit
pattern different from the one used in (2) above. The objective
here is to find errors like:

free(foo);
free(foo->next);
where freed memory is used. Example: on 32-bit machines where a
null pointer does not use this pattern: 0xf3eebee1.

Another useful mode is to have (2) and (3) use random bit patterns.
Also, code like this could easily have a memory-leak-detection feature.

Sep 3 '06 #82
Keith Thompson wrote:
jacob navia <ja***@jacob.re mcomp.frwrites:
>Joe Wright wrote:
>>Richard Heathfield wrote:
[ snip all ]
Some time ago now I attacked the GC problem, as I saw it, with what
I called GE or Garbage Elimination.
It has to do with wrapping malloc, calloc, realloc and free such
that they all come to GE first. GE is basically a manager of a
linked list of allocation data, including addresses and sizes.
GE.h adds 'size_t size(void *p);' to the vocabulary, size of the
allocation.
User calls to *alloc are simply recorded in the list and then passed
to their libc namesakes. Calls to free() are looked up in the list
and if found result in a call to libc free() and deletion from the
list. If the user calls free(x) and x is not in the list, we simply
return, a NOP.
What do you think of it?
I am afraid you are actually repeating the code of malloc.
The system function malloc has been probably coded with great care to
do exactly that:

Find a free block in the list of free blocks!

You are just adding a layer that does the same.

But that layer in the system malloc() is not exposed to the user. The
only *portable* way to do this kind of thing is to add a wrapper
around *alloc() and free(). Such a wrapper can detect errors that
malloc() can't (such as free()ing the same pointer twice).

It could significantly hurt performance if you keep track of
allocations with a simple linear linked list.
I could use some sort of hash I guess but according to the second rule
(for experts only) I haven't optimized it yet. :-)

--
Joe Wright
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---
Sep 3 '06 #83
av
On Sat, 02 Sep 2006 20:34:19 GMT, Keith Thompson wrote:
>jacob navia <ja***@jacob.re mcomp.frwrites:
>Joe Wright wrote:
>>Richard Heathfield wrote:
[ snip all ]
Some time ago now I attacked the GC problem, as I saw it, with what
I called GE or Garbage Elimination.
It has to do with wrapping malloc, calloc, realloc and free such
that they all come to GE first. GE is basically a manager of a
linked list of allocation data, including addresses and sizes.
GE.h adds 'size_t size(void *p);' to the vocabulary, size of the
allocation.
User calls to *alloc are simply recorded in the list and then passed
to their libc namesakes. Calls to free() are looked up in the list
and if found result in a call to libc free() and deletion from the
list. If the user calls free(x) and x is not in the list, we simply
return, a NOP.
What do you think of it?

I am afraid you are actually repeating the code of malloc.
The system function malloc has been probably coded with great care to
do exactly that:

Find a free block in the list of free blocks!

You are just adding a layer that does the same.

But that layer in the system malloc() is not exposed to the user. The
only *portable* way to do this kind of thing is to add a wrapper
around *alloc() and free(). Such a wrapper can detect errors that
malloc() can't (such as free()ing the same pointer twice).

It could significantly hurt performance if you keep track of
allocations with a simple linear linked list.
wrong, in the 90% of case (how malloc and free have to use) e.g.
something like

{ p1=malloc(n1);
p2=malloc(n2);
p3=malloc(n3);
p4=malloc(n4);
...NO MALLOC HERE
free(p4);
free(p3);
free(p2);
free(p1);
}

the search of p1, p2, p3, p4 is O(1) if there is a search like

free(void* k)
{int i;
...
for(i=last; i>=0; --i)
if(vettore[i]==k) break;
if(i<0) \\ error;

}
Sep 3 '06 #84
av
On Sat, 02 Sep 2006 20:03:01 +0100, Flash Gordon wrote:
>jacob navia wrote:
>Joe Wright wrote:
>>Richard Heathfield wrote:

[ snip all ]

Some time ago now I attacked the GC problem, as I saw it, with what I
called GE or Garbage Elimination.

It has to do with wrapping malloc, calloc, realloc and free such that
they all come to GE first. GE is basically a manager of a linked list
of allocation data, including addresses and sizes.

GE.h adds 'size_t size(void *p);' to the vocabulary, size of the
allocation.

User calls to *alloc are simply recorded in the list and then passed
to their libc namesakes. Calls to free() are looked up in the list and
if found result in a call to libc free() and deletion from the list.
If the user calls free(x) and x is not in the list, we simply return,
a NOP.
for me, free a pointer that is not realised from malloc is an error
that has to be segnaled
>>What do you think of it?

I think that if an invalid pointer is passed it should abort the program
since something serious has gone wrong and you don't know how much else
has been messed up.
>I am afraid you are actually repeating the code of malloc.
The system function malloc has been probably coded with great care to do
exactly that:

Find a free block in the list of free blocks!

You are just adding a layer that does the same.


No, his layer does *not* do the same as the C library. Most
implementation s will crash at some point after you have called free with
an invalid pointer. Joe Wright's wrapper will prevent that.
for me a crash that show an error is better than the "silent running
wrong"
Sep 3 '06 #85

In article <ln************ @nuthaus.mib.or g>, Keith Thompson <ks***@mib.orgw rites:
Joe Wright wrote:
Some time ago now I attacked the GC problem, as I saw it, with what
I called GE or Garbage Elimination.
It has to do with wrapping malloc, calloc, realloc and free such
that they all come to GE first. GE is basically a manager of a
linked list of allocation data, including addresses and sizes.
GE.h adds 'size_t size(void *p);' to the vocabulary, size of the
allocation.
User calls to *alloc are simply recorded in the list and then passed
to their libc namesakes. Calls to free() are looked up in the list
and if found result in a call to libc free() and deletion from the
list. If the user calls free(x) and x is not in the list, we simply
return, a NOP.
My inclination would be to have an optional callback for this and
other error conditions; that would let the library user log the
error, or abort, or whatever seems appropriate for the application.

But the general idea is a good one, and many of the projects I've
worked on had similar allocation wrappers. (They also often support
things like generating allocation reports at program termination, to
help detect leaks.)
It could significantly hurt performance if you keep track of
allocations with a simple linear linked list.
It *could*, particularly due to caching effects (linked lists generally
have poor cache behavior), but there's ample room for optimization if
that becomes an issue, and I suspect that for most C programs it
won't. If a performance-critical program is doing dynamic memory
allocation in the inner loop, that suggests the program might stand a
bit of optimization.

IME, C programs tend to do significantly less dynamic allocation than
programs written in languages with more dynamic-allocation sugar,
presumably for the obvious reason - the programmer has to write all
the allocation code explicitly.

And the tradeoff advantages are sufficient to justify a significant
performance cost in many cases anyway; for example, just duplicating
the malloc housekeeping data somewhere not adjacent to the allocated
storage itself, and using it as a preliminary check, prevents dup-free
heap-smashing attacks, which have been very successful against a wide
range of programs. It's a cheap security measure.

--
Michael Wojcik mi************@ microfocus.com

Poe said that poetry was exact.
But pleasures are mechanical
and know beforehand what they want
and know exactly what they want. -- Elizabeth Bishop
Sep 3 '06 #86
av <av@ala.awrites :
On Sat, 02 Sep 2006 20:34:19 GMT, Keith Thompson wrote:
>>jacob navia <ja***@jacob.re mcomp.frwrites:
[...]
>>Find a free block in the list of free blocks!

You are just adding a layer that does the same.

But that layer in the system malloc() is not exposed to the user. The
only *portable* way to do this kind of thing is to add a wrapper
around *alloc() and free(). Such a wrapper can detect errors that
malloc() can't (such as free()ing the same pointer twice).

It could significantly hurt performance if you keep track of
allocations with a simple linear linked list.

wrong, in the 90% of case (how malloc and free have to use) e.g.
something like

{ p1=malloc(n1);
p2=malloc(n2);
p3=malloc(n3);
p4=malloc(n4);
...NO MALLOC HERE
free(p4);
free(p3);
free(p2);
free(p1);
}

the search of p1, p2, p3, p4 is O(1) if there is a search like
[snip]

Yes, that can happen in *some* cases -- which is why I wrote that it
*could* significantly hurt performance.

--
Keith Thompson (The_Other_Keit h) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Sep 3 '06 #87
Joe Wright wrote:
Richard Heathfield wrote:
Some time ago now I attacked the GC problem, as I saw it, with what I
called GE or Garbage Elimination.

It has to do with wrapping malloc, calloc, realloc and free such that
they all come to GE first. GE is basically a manager of a linked list of
allocation data, including addresses and sizes.

GE.h adds 'size_t size(void *p);' to the vocabulary, size of the allocation.

User calls to *alloc are simply recorded in the list and then passed to
their libc namesakes. Calls to free() are looked up in the list and if
found result in a call to libc free() and deletion from the list. If the
user calls free(x) and x is not in the list, we simply return, a NOP.

What do you think of it?
This does not solve the general problem of garbage collection. GC
typically eliminates the need for free() by automatically recycling
memory any time that memory stops being referenced by any pointer that
is ultimately reachable from within the program's current run state.
>From what I can tell, what you've done is simply enhanced the memory
model by adding a size(void *p) function, and have a method for
detecting bogus calls to free() (including double frees.) There is no
end of ways to enhance C's memory model, and this is certainly one of
them, but this is not GC. True GCs are somewhat difficult to implement
in C and ultimately rely on platform specific behavior (to access all
live run-time autos, and all writable data areas, for example). Though
they certainly exist (the Boehm GC mechanism, for example.)

That all being said, I fully endorse the idea of enhancing C's memory
model. A big reason why people have run away from C and gone to other
langauges is because the whole malloc/free, programming model is very
hard to sustain especially given the bear-bones support that the
language gives you. But in doing such enhancements, you should not
target an attempt to duplicate GC, since that truly changes programming
paradigms -- C has way too much cruft in it to support a true paradigm
shift without substantial change to the language (comparable to what
C++ or Objective C did.)

What you want to do is to go ahead and embrace C's basic way of doing
things (after all that's the environment) we are in, but attack all its
main weaknesses. I.e., when a GC advocate says "C's memory model is
bad because of reason/example x" you want to be able to respond, "No,
that's easily detectable and correctable with an enhanced C memory
manager that does y". With this in mind, let us return to your "GE"
enhancement.

In your case, rather than *IGNORING* attempts to free garbage, you
should generate some sort of diagnostic to provide the programmer with
information telling them that they did something wrong. In fact you
can do this:

/* In some include file somewhere, say "estdlib.h" */
#define free(x) free_enhanced ((x), __FILE__, __LINE__)
void free_enhanced (void *, const char *, int);

/* In some module, say estdlib.c */
void free_enhanced (void * ptr, const char * file, int line) {
if (wasLegallyAllo cated (ptr))
specialfree (ptr); /* find the real header, then free */
else
diagnostic_badf ree (file, line, ptr);
}

So that in your error log, or message or whatever you decide to do, you
know exactly which call to free is failing. Its very important for the
programmer to know that this bad thing is going in in his/her program
as this may be symptomatic of more serious problems in the program, and
simply avoiding this one anomily is likely to be a bandage that just
doesn't do the surgeon's job.

Another really easy to implement feature is to simply keep track of the
total amount of memory currently allocated, as well as the lifetime
maximum allocated by the program. You could then provide two simple
functions that just returned these values. These are usually
sufficient hints for most programmers to know if they are leaking
memory or not.

As long as you are tracking all allocations in a linked list, you also
might as well provide a memory traversal mechanism. I.e., an ability
to walk through all allocated memory locations. Why would you do this?
Well you would do it in conjunction with an enhancement that tracked
*where* each allocation came from:

/* In some include file somewhere, say "estdlib.h" */
#define malloc(x) malloc_enhanced ((x), __FILE__, __LINE__)
void * malloc_enhanced (size_t, const char *, int);

/* In some module, say estdlib.c */
struct enhancedMemHdr {
struct enhancedMemHdr * linkNext;
size_t sz;
const char * moduleOrg;
int lineNumberOrg;
char mem[1]; /* struct hack */
};

void * malloc_enhanced (size_t sz, const char * file, int line) {
struct enhancedMemHdr * ptr;
if (!sz) {
diagnostic_badm alloc (file, line, sz);
return NULL;
}
/* store sz, file & line as well: */
ptr = specialmalloc (sz, file, line);
if (!ptr) {
diagnostic_outo fmemory (file, line, sz); /* don't abort */
}
return ptr.mem;
}

So the point is not to look into the memory itself while you walk the
allocations (you wouldn't have type information, so that would be kind
of useless) but rather you would be interested in *where* the memory
got allocated. So you could do some simple statistics to figure out
where your memory was mostly being allocated for.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Sep 3 '06 #88
"Herbert Rosenau" <os****@pc-rosenau.dewrote in message
news:wm******** *************** ****@JUPITER1.P C-ROSENAU.DE...
On Fri, 1 Sep 2006 11:01:21 UTC, "Philip Potter"
<ph***********@ xilinx.comwrote :
Richard, while you have earned a lot of my respect and admiration
through
your posting, I do feel that you are on a little bit of a crusade here.
You
seem to be arguing that because garbage collection can't protect against
all
memory leaks without care from the user, it is worthless. This is a
somewhat
ridiculous argument - after all, manual memory management has precisely
the
same property.

No. It is quite more easy to write errorfree programs using
malloc()/free() than having a defective GC. A GC that can't handle
perfectly each and any dynamic memory is perfectly unuseable per
design.
GCs don't give you a license not to think, but I didn't see anybody
arguing
that they do.

GC as it should be designed araises this claim.
I'm not sure what this sentence means. Assuming "araises" is a typo for
"raises", you're changing the definition of a GC to be something which is
impossible to implement, and then rebutting it. It's a straw man argument.

The fact of the matter is that GCs are not inherently broken. Java and LISP
programs do actually work without crashing, provided the programmer
understands what they are doing. (And if the programmer doesn't know what
they are doing, well, all is lost.)
In C it will fail
always miserably, so it is useless.
I'm prepared to talk about whether or not GCs are suited to C, but blind
assertions do little to convince me that they are not.

I personally feel that GCs are suited to some forms of programming but not
others. A programmer using a GC still has to take care - but less care IMHO
than one using malloc()/free(); GCs reduce development time and maintainence
costs, at the cost of less efficient memory management and bigger runtime
footprint - lower performance, in short.

If this isn't what the Standards committee think what C is about, then I'm
not going to argue. But I'm also not going to argue with anybody who plugs a
GC into C for their own personal use.

Philip

Sep 4 '06 #89
we******@gmail. com wrote:

[ snipped ]

Thanks Paul for a very thoughtful and helpful critique of GE. Some of
your suggested enhancements are provided for. GE.h has a new prototype
allowing 'void * Free(void *);' and provides a typedef of the node
structure for the list. Free returns a pointer to the beginning of the
list. Also besides 'size_t size(void *)' there is 'size_t sizeall(void)'
which sums all the sizes.

Also 'void freeall(void)' which trips through the list freeing each
allocation and removing its node in turn. I think I have to revisit this
function to ensure allocations get freed in reverse order so that I
don't free **a before I free all the *a pointers.

Thanks again.

--
Joe Wright
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---
Sep 4 '06 #90

This thread has been closed and replies have been disabled. Please start a new discussion.

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.