
GC with lots of small ones

Hi,

I wonder if anybody can comment on whether what I see is normal in FW 1.1,
and how to avoid it.

I have a .NET assembly which creates literally thousands of temporary strings
and other objects when running. Usually it is something like
{
    string s = ...; // some value
    // some local processing here
    ...
}
so the expectation is that the GC will collect the string some time later,
once it is no longer referenced. However, it looks like in lots of cases
where strings are returned to the calling method, the GC has trouble finding
such references and cleaning them up, especially when objects are created on
one thread and processed on another.

The same happens with arrays, hashtables, etc.

I've seen some of these temporary objects survive through tens of GC
cycles. However, when the number of such temporary objects is low (less than
20000 or so), the GC seems to be able to do the job.

Because of this, the assembly starts choking on memory after 2-4 hours of
heavy-duty use. VM size easily grows 10 times or more. My intuition is that
the GC times out before finding the majority of freed references, maybe
because it relocates a lot of data during the first phase?

Are there any "real" recommendations on which techniques should be used to
make an app more GC-friendly? E.g. don't create more than 1000 objects
per minute, or always set temp strings to null after use, or something like
that?

Thanks
Alex
Jul 21 '05 #1
10 Replies



"AlexS" <sa***********@SPAMsympaticoPLEASE.ca> wrote in message
news:O$**************@tk2msftngp13.phx.gbl...
> I've seen that some such temporary objects survive through tens of GC
> cycles. However, when number of such temporary objects is low - less than
> 20000 or so, GC seems to be able to do the job.

It sounds like a lot of objects are being promoted. Are you just working with
strings, collections, etc., or are you using other, more complicated objects?

> Because of this, assembly starts choking on memory in 2-4 hours during
> heavy duty use. VM grows 10 and more times very easily. My intuition is
> that GC times out before finding majority of freed references - maybe
> because it relocates lot of data during first phase?

The GC usually only does a generation 0 sweep, doing gen 1 and 2 sweeps less
often. If your objects are living just long enough to make it to gen 1, then
you may end up with longer cleanup times. Are you using any objects with
finalizers? An object with a finalizer causes its entire object graph to be
promoted, making it effectively an automatic gen 1. If you are using objects
with finalizers, make sure you are disposing them (or, if you wrote them,
provide an IDisposable implementation that calls GC.SuppressFinalize).
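
A minimal sketch of the pattern described above, assuming a hypothetical NativeBuffer class that owns a block of unmanaged memory (the class name and its handle field are made up for illustration, they are not from the poster's code). Dispose releases the resource and calls GC.SuppressFinalize, so the object no longer has to survive an extra collection just to be finalized.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper around an unmanaged allocation (illustrative only).
sealed class NativeBuffer : IDisposable
{
    private IntPtr handle;   // stands in for a real unmanaged resource
    private bool disposed;

    public NativeBuffer(int size)
    {
        handle = Marshal.AllocHGlobal(size);
    }

    public void Dispose()
    {
        Release();
        // The finalizer has nothing left to do, so tell the GC to skip it.
        GC.SuppressFinalize(this);
    }

    ~NativeBuffer()
    {
        // Safety net for callers that forget to call Dispose.
        Release();
    }

    private void Release()
    {
        if (disposed) return;
        Marshal.FreeHGlobal(handle);
        handle = IntPtr.Zero;
        disposed = true;
    }
}

static class Program
{
    static void Main()
    {
        // The using statement calls Dispose even if an exception is thrown.
        using (NativeBuffer buffer = new NativeBuffer(1024))
        {
            Console.WriteLine("working with the buffer");
        }
    }
}
```

Plain strings and collections have no finalizers, so this only matters for types such as brushes, streams, or handles.
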
> Are there any "real" recommendations, which techniques should be used to
> make app more GC-friendly? E.g. like, don't create more than 1000 objects
> per minute, or always set temp strings after use to null or something like
> this?

It'd be hard to be sure about the object creation rate, and I doubt that's
the case. I've seen benchmarks of millions of allocations and deallocations
in a minute's time, so I don't think object restrictions are going to help.
Nor will setting temp variables to null, unless you are in a particular
circumstance.

In situations like this (ignoring string interning):

{
    string s = "a very large string indeed";
    DoSomething(s);
    DoSomethingTimeConsumingButUnrelatedToS();
    s = "another huge string"; // the first instance of s can be collected here
    DoSomethingElse(s);
}

the first string stays referenced during the time-consuming call, whereas in

{
    string s = "a very large string indeed";
    DoSomething(s);
    s = null; // the first instance of s can be collected here
    DoSomethingTimeConsumingButUnrelatedToS();
    s = "another huge string";
    DoSomethingElse(s);
}

it can be collected before that call starts. But I expect that situation to
be rather rare, and unless the strings are truly huge (several megs at the
least) I wouldn't bother with it.

Jul 21 '05 #2

Hi Alex,

Jay B. Harlow gave me this link, which I find very interesting:

http://msdn.microsoft.com/architectu...l/scalenet.asp

It has a lot to say about how the garbage collector works.

Cor
Jul 21 '05 #3

Daniel,

see in text

Thanks
Alex

"Daniel O'Connell [C# MVP]" <onyxkirx@--NOSPAM--comcast.net> wrote in
message news:%2***************@TK2MSFTNGP11.phx.gbl...

"AlexS" <sa***********@SPAMsympaticoPLEASE.ca> wrote in message
news:O$**************@tk2msftngp13.phx.gbl...
> It sounds like a lot of objects are being promoted. Are you just working
> with strings, collections, etc., or are you using other, more complicated
> objects?

Mostly it's strings and various collections: hashtables, ArrayLists, simple
arrays. I wouldn't say they are "more complicated". Is a hashtable of classes
which contain collections of strings more complicated? I am not sure here.
But according to CLR Profiler, the problem seems to be with promotions.


>> Because of this, assembly starts choking on memory in 2-4 hours during
>> heavy duty use. VM grows 10 and more times very easily. My intuition is
>> that GC times out before finding majority of freed references - maybe
>> because it relocates lot of data during first phase?
>
> The GC usually only does a generation 0 sweep, doing gen 1 and 2 sweeps
> less often. If your objects are living just long enough to make it to gen
> 1, then you may end up with longer cleanup times. Are you using any objects
> with finalizers? An object with a finalizer causes its entire object graph
> to be promoted, making it effectively an automatic gen 1. If you are using
> objects with finalizers, make sure you are disposing them (or, if you wrote
> them, provide an IDisposable implementation that calls
> GC.SuppressFinalize).

No finalizers. It was easy to find and fix leaks like SolidBrushes, where
I can use Dispose. Not so for strings.

> but I expect that situation to be rather rare and unless the strings are
> truly huge (several megs at the least) I wouldn't bother with it.

Some of the strings were collected more efficiently when I used the second
variant. If s=null; is absent, strings are shown as floating around in the
heap: relocated and live. It doesn't always happen, but it happens a lot,
especially in loops and, it seems, in recursive calls. It's strange, though,
that you think it only matters for big strings. My heap is full of small
ones, 0.1-10K. I've also seen lots of chunks from String.Split.

I wonder if there is a real difference for the GC between

return <string expression>

and

string str=<string expression>;
return str;

?

Thanks
Alex


Jul 21 '05 #4


>> It sounds like a lot of objects are being promoted. Are you just working
>> with strings, collections, etc., or are you using other, more complicated
>> objects?
>
> Mostly it's strings and various collections - hashtables, arraylist,
> simple arrays. I wouldn't say they are "more complicated". Is hashtable of
> classes, which contain collections of strings more complicated? I am not
> sure here. But according to CLR profiler problem seems to be with
> promotions.

Hrm, nothing that needs finalization or disposal, so I wouldn't consider
anything complicated here. By the sounds of it, your objects are living long
enough to survive to generation 2, which could be a problem as the program
runs for a while.

How long does the processing take on these strings? And are there a lot of
duplicated strings?
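
One way to watch promotion happen on a small scale is GC.GetGeneration. The sketch below is an illustration only, not a reproduction of the poster's workload: it allocates a string, keeps a reference to it, and prints its generation after each forced collection. The exact numbers depend on the runtime and GC mode, so no output is claimed here.

```csharp
using System;

static class Program
{
    static void Main()
    {
        // A freshly allocated small object normally starts in generation 0.
        string s = new string('x', 1000);
        Console.WriteLine("before any collection: gen " + GC.GetGeneration(s));

        // s is still referenced, so it survives and is usually promoted.
        GC.Collect();
        Console.WriteLine("after one collection:  gen " + GC.GetGeneration(s));

        GC.Collect();
        Console.WriteLine("after two collections: gen " + GC.GetGeneration(s));

        // Make sure s counts as reachable up to this point.
        GC.KeepAlive(s);
    }
}
```

An object that is still referenced while a couple of collections go by ends up in the oldest generation, which is collected least often.
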


> Some of strings were collected more efficiently when I used second
> variant. If s=null; is absent, strings are shown as floating around in
> heap - relocated and live. It happens not always, but happens a lot.
> Especially in loops, and, it seems, in recursive calls. But strange, that
> you think it is only for big strings. My heap is full of small ones -
> 0.1-10K. I've seen also lots of chunks from String.Split.

Without knowing the specifics, I do have some thoughts. Is your design such
that you create a string, do some work that allocates a good many
objects (enough to trigger a gen 1 collection), then create another into the
same variable? If so, it's possible you are prolonging the life of your
strings into a higher generation; nulling would fix that. If this algorithm
is highly recursive, it might actually be a serious source of memory
problems.

> I wonder if there is real difference for GC between
>
> return <string expression>
>
> and
>
> string str=<string expression>;
> return str;

There shouldn't be. The JIT would probably generate similar or identical
code.
Jul 21 '05 #5

Daniel, thanks

Looks like you confirm some of my suspicions.
> How long does the processing take on these strings? And are there a lot of
> duplicated strings?

In terms of absolute time, less than 1 second. In terms of how many objects
could be created during this period, hundreds if not thousands. Also,
strings could be created in one thread and passed to another before becoming
obsolete.

>> I've seen also lots of chunks from String.Split.
>
> Without knowing the specifics, I do have some thoughts. Is your design
> such that you create a string, do some work that allocates a good many
> objects (enough to trigger a gen 1 collection), then create another into
> the same variable? If so, it's possible you are prolonging the life of your
> strings into a higher generation; nulling would fix that. If this algorithm
> is highly recursive, it might actually be a serious source of memory
> problems.

Lots of code with such behavior. Because I have thousands of objects and
calls, CLR Profiler literally chokes. A standard profile log is 50-100MB,
which usually kills it. Exceptions, hanging, not enough memory - I've seen
it all :-(

>> I wonder if there is real difference for GC between
>>
>> return <string expression>
>>
>> and
>>
>> string str=<string expression>;
>> return str;
>
> There shouldn't be. The JIT would probably generate similar or identical
> code.


Some small consolation :-)

Thanks, Daniel.

I am now trying to think of some way to clean up this mess.

Jul 21 '05 #6


> Lots of code with such behavior. Because I have thousands of objects and
> calls, clr profiler literally chokes. Standard profile log is 50-100MB,
> which kills it usually. Exceptions, hanging, not enough memory - I've seen
> it all :-(

Hrmm, this isn't good.

> I am trying now to think out some way to clean up this mess.


From what I understand, I'm afraid the best course may be to redesign your
app. I think the problem is inherent to the design. You either need to
serialize processing so objects disappear quickly, or change the object
allocation code so that the allocations occur just before the calculations
and disappear right after. Multithreading may be a big part of this.

Are a lot of your strings identical?
Jul 21 '05 #7


"Daniel O'Connell [C# MVP]" <onyxkirx@--NOSPAM--comcast.net> wrote in
message news:%2****************@tk2msftngp13.phx.gbl...


> From what I understand, I'm afraid the best course may be to redesign your
> app. I think the problem is inherent to the design. You either need to
> serialize processing so objects disappear quickly, or change the object
> allocation code so that the allocations occur just before the calculations
> and disappear right after. Multithreading may be a big part of this.
>
> Are a lot of your strings identical?


If you add the same string to 2 different collections, are the collection
items identical? I think not.

Could you expand a bit on serializing processing to make objects disappear
quickly? I am not sure I see how it could be done when collections are
filled by recursion or strings are passed between threads.

Unfortunately a redesign is out of the question: thousands of lines,
developed by several people.

So, to sum up:
- if string creation and processing before the reference is released take
long enough to outlast GC0 and GC1 collections, the strings could be lost
in the heap
- if a big string is replaced by another big string, like str=<big string>;
<process>; str=<another big string>, it is better to set str=null or
str=String.Empty before the next assignment
- if objects could exist for a long time, they should be nulled explicitly
- if an object implements IDisposable, it must be disposed explicitly before
the next assignment
- when objects are passed between threads or async methods, null them
explicitly
- when objects are passed between recursive calls, null them explicitly
- avoid String.Concat as much as possible

Doesn't look very convincing, what do you think? I have never seen most of
this in simple applications, where objects are not highly volatile. But I see
all of it (I mean the negative impact on the heap) now that I am doing real
processing of real files: parsing, editing.

Did I miss anything?

Thanks
Alex

Jul 21 '05 #8


>> From what I understand, I'm afraid the best course may be to redesign
>> your app. I think the problem is inherent to the design. You either need
>> to serialize processing so objects disappear quickly, or change the object
>> allocation code so that the allocations occur just before the calculations
>> and disappear right after. Multithreading may be a big part of this.
>>
>> Are a lot of your strings identical?
>
> If you add same string to 2 different collections - are the collection
> items identical? I think not.

Generally not, but if you tend to have large numbers of strings that are
identical, you may save memory by interning (or you may not; I forget what
happens when you intern a string that will never be referenced again).
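
As a small illustration of what interning does, here is a sketch using String.Intern. The strings are made up for the example. Two equal strings built at run time are separate objects, while their interned versions are one shared instance.

```csharp
using System;

static class Program
{
    static void Main()
    {
        // Built at run time, so these are two distinct objects with equal content.
        string a = new string('a', 3);
        string b = new string('a', 3);
        Console.WriteLine(object.ReferenceEquals(a, b));   // False

        // Intern returns the single pooled instance for that content.
        string ia = string.Intern(a);
        string ib = string.Intern(b);
        Console.WriteLine(object.ReferenceEquals(ia, ib)); // True
    }
}
```

On the open question above: as far as I know, the runtime keeps its own reference to every interned string, so a string that is interned and never used again is not reclaimed while the process runs. Interning therefore seems worthwhile only for values that really do repeat many times; treat that as something to measure rather than a rule.
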
> Could you expand a bit on serializing processing to make objects disappear
> quickly? I am not sure I see how it could be done when collections are
> filled by recursion or strings are passed between threads.

It really isn't, it was a suggestion for a potential redesign.

> Unfortunately redesign is out of question - thousands of lines, which were
> developed by several people.

That's unfortunate... I hope you can figure out a way to reduce memory usage.
I would rarely recommend this, but perhaps you should insert a GC.Collect(2)
call on a timer that fires every half hour and see if it clears gen 2 for
you. It is a hack, but if the problem is over-promotion, it just might work.
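
A rough sketch of that hack, assuming a console host and System.Threading.Timer (the thread does not say what kind of application this is, so the hosting code here is only a placeholder). The 30-minute interval is the one suggested above.

```csharp
using System;
using System.Threading;

static class Program
{
    static void Main()
    {
        TimeSpan interval = TimeSpan.FromMinutes(30);

        // The callback runs on a thread-pool thread and forces a collection
        // of all generations up to and including generation 2.
        Timer gcTimer = new Timer(state => GC.Collect(2), null, interval, interval);

        // Placeholder for the real application's work.
        Console.WriteLine("Press Enter to exit.");
        Console.ReadLine();

        // Stop the timer when the application shuts down.
        gcTimer.Dispose();
    }
}
```

Keeping the timer in a variable that stays in use (or in a static field) matters, because a timer that is no longer referenced can itself be collected and stop firing.
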
> So, if to sum up
> - if string creation and processing before releasing reference take some
> time bigger than GC0 and GC1, they could be lost in heap
> - if big string is replaced by another big string, like str=<big string>;
> <process>; str=<another big string> better to use str=null or
> str=String.Empty before next assignment
> - if objects could exist for long time, they should be nulled explicitly

No, if variables could exist for a long time, they should be nulled. It's not
possible to null an object ;).

> - if object implements IDispose it must be disposed before next assigment
> explicitly

Yes.

> - when objects are passed between threads or asynch methods - null them
> explicitly
> - when objects are passed between recursive calls - null them explicitly

Neither of these is true. Nulling a variable won't change anything.

> - try to avoid as much as possible string.concat

Maybe, maybe not. String.Concat can be efficient if you are dealing with 2 or
3 strings, but if you are doing more than that, definitely go with
StringBuilder.
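
To make the difference concrete, here is a small sketch with two hypothetical helpers, JoinWithConcat and JoinWithBuilder (the names and the sample data are invented for the example). Both produce the same text; the first allocates a new intermediate string on every pass through the loop, the second appends into one growable buffer.

```csharp
using System;
using System.Text;

static class Program
{
    // Each += creates a new string and copies everything accumulated so far.
    static string JoinWithConcat(string[] parts)
    {
        string result = "";
        foreach (string part in parts)
        {
            result += part;
        }
        return result;
    }

    // StringBuilder appends into an internal buffer and builds one final string.
    static string JoinWithBuilder(string[] parts)
    {
        StringBuilder sb = new StringBuilder();
        foreach (string part in parts)
        {
            sb.Append(part);
        }
        return sb.ToString();
    }

    static void Main()
    {
        string[] parts = { "GC", " with", " lots", " of", " small", " ones" };
        Console.WriteLine(JoinWithConcat(parts) == JoinWithBuilder(parts)); // True
    }
}
```

For two or three pieces the plain concatenation is fine; the intermediate strings only add up when the loop runs many times.
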



Jul 21 '05 #9

Hi Alex
> So, if to sum up
> - if string creation and processing before releasing reference take some
> time bigger than GC0 and GC1, they could be lost in heap

True. If your object survives a generation 0 collection, then dies, you've
got a "mid-life crisis", and that object (in your case a string) will be in
memory much longer than you need it.

> - if big string is replaced by another big string, like str=<big string>;
> <process>; str=<another big string> better to use str=null or
> str=String.Empty before next assignment

By changing the reference from the first string, you've abandoned it in
memory and the GC will take care of it. So nulling won't help, but won't hurt
either.

> - if objects could exist for long time, they should be nulled explicitly

Only if they are member variables, and the container object is still alive.

> - if object implements IDispose it must be disposed before next assigment
> explicitly

That's generally good practice. Consider using the C# using pattern where
appropriate.

> - when objects are passed between threads or asynch methods - null them
> explicitly
> - when objects are passed between recursive calls - null them explicitly

Nulling them won't help them get collected any faster unless they are
members.

> - try to avoid as much as possible string.concat

Um, maybe. StringBuilder is your friend if you are creating many large
strings.

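
The member-variable case can be sketched as follows, assuming a hypothetical long-lived Parser class (its name, field, and method are illustrative, not taken from the poster's code). While the Parser is alive, anything its field points to stays reachable, so clearing the field is what lets the temporary data go.

```csharp
using System;

// Hypothetical long-lived object, for example one kept for the whole run.
sealed class Parser
{
    // A member field: whatever it references lives as long as the Parser does.
    private string[] lastTokens;

    public int CountTokens(string line)
    {
        lastTokens = line.Split(' ');
        int count = lastTokens.Length;

        // Without this assignment the array and every token in it would stay
        // reachable through the field until the next call or until the
        // Parser itself is collected.
        lastTokens = null;
        return count;
    }
}

static class Program
{
    static void Main()
    {
        Parser parser = new Parser();
        Console.WriteLine(parser.CountTokens("a b c")); // 3
    }
}
```

A local variable inside CountTokens would not need this, which matches the point above that nulling only helps for members of objects that are still alive.
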
If you haven't already, check out
Rico Mariani's blog (http://weblogs.asp.net/ricom/)
Brad Abrams' blog (http://weblogs.asp.net/brada/archive...24/140645.aspx)
Improving .NET Application Performance and Scalability Chapter 5 (http://msdn.microsoft.com/library/de...etchapt05.asp).
Hope that helps
-Chris


--

This posting is provided "AS IS" with no warranties, and confers no rights. Use of included script samples are subject to the terms specified at
http://www.microsoft.com/info/cpyright.htm

Note: For the benefit of the community-at-large, all responses to this message are best directed to the newsgroup/thread from which they originated.

Jul 21 '05 #10

Thanks, Chris

I just want to confirm that you confirmed my findings. As most of the strings
and other objects are members of long-lived containers, most of my points
seem to be in line with what you said. I wonder if the next version of the FW
will be better in this respect. Note that this is heavy-duty processing: I
have hundreds of thousands of objects created and existing during the app's
lifetime.

Now I have a much cleaner picture and a better-behaved app. I am only sad
that I now have to find out how to make CLR Profiler behave more correctly:
- It eats too much memory, with log files exceeding 100MB
- All too frequently it crashes with exceptions, or says there is not enough
disk space (while there is plenty), when displaying some graph
- It doesn't allow filtering out namespaces, objects and calls

Anyway, I managed to make the app eat less heap. That's already
progress.

Rgds
Alex

PS:
By the way, I found StringBuilder helps with small ones too. What counts as
big: 8 chars or 80?
Jul 21 '05 #11
