Bytes IT Community

How does "new" work in a loop?

I'm just learning C#. I'm writing a program (using Visual C# 2005 on
WinXP) to combine several files into one (HKSplit is a popular
freeware program that does this, but it requires all input and output
to be within one directory, and I want to be able to combine files
from different directories into another directory of my choice).

My program seems to work fine, but I'm wondering about this loop:
for (int i = 0; i < numFiles; i++)
{
    // read next input file

    FileStream fs = new FileStream(fileNames[i],
        FileMode.Open, FileAccess.Read, FileShare.Read);
    Byte[] inputBuffer = new Byte[fs.Length];

    fs.Read(inputBuffer, 0, (int)fs.Length);
    fs.Close();

    // append to output stream previously opened as fsOut

    fsOut.Write(inputBuffer, 0, (int)inputBuffer.Length);
    progBar.Value++;
} // for int i

As you can see, the objects fs and inputBuffer are both created as
"new" each time through the loop, which could be many times. I didn't
think this would work; I just tried it to see what kind of error
message I would get, and I was surprised when it ran. Every test run
has produced perfect results.

So what is happening here? Is the memory being reused, or am I piling
up objects on the heap that will only go away when my program ends, or
am I creating a huge memory leak?

I can see that fs might go away after fs.Close(), but I don't
understand why I'm allowed to recreate the byte array over and over,
without ever disposing of it. I have verified with the debugger that
the array has a different size each time the input file size changes,
so it really is being reallocated each time through the loop, rather
than just being reused. I've tried to find explanations of how "new"
works in a loop, but I haven't been able to so far. Any help,
including pointers to the VS docs or a popular book on C#, would be
appreciated.
Jul 6 '06 #1
51 Replies


Well, "new" always creates a new instance of the object. The object
that was assigned to the variable during the previous cycle will be
flagged so the garbage collector can release its memory. Now, since the
GC does not clear memory immediately, there is a period of time when
the old object stays in memory, so there is a chance that you will use
too much memory before the GC is invoked, but not a great chance. It is
not something that I would worry about unless you plan to join files
where each part is a gig in size. :)


"Tony Sinclair" <no@spam.com> wrote in message
news:jv********************************@4ax.com...
<snip>

Jul 6 '06 #2

The memory is allocated during each loop and is in concept released
when the loop ends. In reality the memory is marked to be released by
the GC (garbage collector) at some future point in time. I have found
that if you were to do this loop only a few times and the program was
idle, the memory would be freed, but if this loop involves a lot of
files and very little idle time is returned to the system, you will run
out of memory.

That being said, I would at the very least place fs = null and
inputBuffer = null at the end of the loop.

A better solution for fs would be the using statement, which would
force GC and return the memory.

using (FileStream fs = new FileStream(fileNames[i], FileMode.Open,
    FileAccess.Read, FileShare.Read))
{
    Byte[] inputBuffer = new Byte[fs.Length];

    // do something with fs

    inputBuffer = null;
}

Regards,
John

"Tony Sinclair" <no@spam.com> wrote in message
news:jv********************************@4ax.com...
<snip>

Jul 6 '06 #3


Tony Sinclair wrote:
<snip>
So what is happening here? Is the memory being reused, or am I piling
up objects on the heap that will only go away when my program ends, or
am I creating a huge memory leak?
Unlikely that you are creating a memory leak. C# uses garbage
collection. When the object goes out of scope (in your case, the }
marked // for int i), the object is destroyed. The next time through
the loop, a new one is created.

Matt

Jul 6 '06 #4

John J. Hughes II <no@invalid.com> wrote:
The memory is allocated during each loop and in concept released when the
loop ends. In reality the memory is marked to be released by GC (garbage
collection) as some future point in time. I have found that if you were to
do this loop only a few times and the program was idle the memory would be
freed but if this loop involves a lot of files and very little idle time is
returned to the system you will run out of memory.

That being said I would at the very least place fs=null and inputbuffer =
null at the end of the loop.
Why? It serves no purpose - and code which serves no purpose is just
distracting, IMO.
A better solution for fs would be the using statment which would force GC
and return the memory.
It wouldn't return the memory - but it *would* close/dispose the stream
in all situations, whether or not there's an exception.
Closing/disposing the stream doesn't return any memory, but it releases
the handle on the file.
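
For what it's worth, here is a minimal sketch of the OP's loop body
rewritten with "using" (it keeps the original single-Read approach and
variable names, so treat it as an illustration rather than tested code):

```csharp
// Hypothetical rework of the loop body from the original post.
// "using" guarantees fs.Dispose() runs even if Read or Write throws,
// releasing the file handle promptly; it does not reclaim memory.
using (FileStream fs = new FileStream(fileNames[i],
    FileMode.Open, FileAccess.Read, FileShare.Read))
{
    byte[] inputBuffer = new byte[fs.Length];
    fs.Read(inputBuffer, 0, (int)fs.Length);
    fsOut.Write(inputBuffer, 0, inputBuffer.Length);
} // fs is disposed here, exception or not
progBar.Value++;
```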

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Jul 6 '06 #5

Matt <ma********@sprynet.com> wrote:

<snip>
Unlikely that you are creating a memory leak. C# uses garbage
collection. When the object goes out of scope (in your case, the }
marked // for int i) the object is destroyed. The next time through
the loop, a new one is created.
The object is *not* destroyed when it reaches the end of the scope.
.NET does not have deterministic garbage collection. Instead, the
object's memory will be released *at some point* after it is last used.

In fact, this could be before the end of the scope - the GC could kick
in before progBar.Value++ and free both fs and inputBuffer.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Jul 6 '06 #6

Peter Rilling <pe***@nospam.rilling.net> wrote:
Well, "new" always create a new instance of the object. The object was
assigned to the variable during the previous cycle will be flagged so the
garbage collector can release memory.
Not quite - the "old" object isn't marked in any way (which would
basically be like reference counting). Instead, every time the GC kicks
in, all the "live" objects in the system are marked, and after that's
finished anything which *isn't* marked can be destroyed (or finalized,
if it has a finalizer).

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Jul 6 '06 #7

"Matt" <ma********@sprynet.com> wrote:

So what is happening here? Is the memory being reused, or am I piling
up objects on the heap that will only go away when my program ends, or
am I creating a huge memory leak?

Unlikely that you are creating a memory leak. C# uses garbage
collection. When the object goes out of scope (in your case, the }
marked // for int i) the object is destroyed. The next time through
the loop, a new one is created.
It isn't freed immediately on every loop, or when it goes out of scope,
unless it's a value type (struct in C#). The GC kicks in on an as-needed
basis: memory pressure, repeated allocations, etc. That's one thing, and
is easy.

The other half of the story is resources. The GC isn't a resource
manager, so you shouldn't rely on it to manage things like file handles,
since it won't release them in a timely manner. That's why it's
important to use "using" with FileStream and other classes which
implement IDispose.

-- Barry

--
http://barrkel.blogspot.com/
Jul 6 '06 #8

Barry Kelly <ba***********@gmail.com> wrote:
important to use "using" with FileStream and other classes which
implement IDispose.
IDisposable rather

-- Barry

--
http://barrkel.blogspot.com/
Jul 6 '06 #9

Jon Skeet [C# MVP] wrote:
John J. Hughes II <no@invalid.com> wrote:
>The memory is allocated during each loop and in concept released when the
loop ends. In reality the memory is marked to be released by GC (garbage
collection) as some future point in time. I have found that if you were to
do this loop only a few times and the program was idle the memory would be
freed but if this loop involves a lot of files and very little idle time is
returned to the system you will run out of memory.

That being said I would at the very least place fs=null and inputbuffer =
null at the end of the loop.

Why? It serves no purpose - and code which serves no purpose is just
distracting, IMO.
Well, it actually serves one purpose, at least occasionally.

If the first generation of the heap is full when the buffer is going to
be allocated, a garbage collection kicks in to free some memory. If you
have removed the reference to the previous buffer, it can be collected,
otherwise not.

For that purpose, you can just as well set the reference to null before
you create the new buffer:

inputBuffer = null;
inputBuffer = new Byte[fs.Length];
I wouldn't allocate a new buffer for every file, though. I would use a
reasonably sized buffer to read chunks from the files:

while ((len = fs.Read(inputBuffer, 0, 4096)) > 0) {
    fsOut.Write(inputBuffer, 0, len);
}
fs.Close();
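
Putting the chunked read together with "using", the whole copy could be
sketched as a helper like this (the method name and buffer size are my
own choices, not from the thread):

```csharp
// Sketch: append each input file to fsOut through one reusable buffer,
// so memory use stays flat no matter how large the files are.
static void AppendFiles(string[] fileNames, FileStream fsOut)
{
    byte[] buffer = new byte[4096];
    int len;

    foreach (string name in fileNames)
    {
        using (FileStream fs = new FileStream(name,
            FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            while ((len = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                fsOut.Write(buffer, 0, len);
            }
        }
    }
}
```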
>A better solution for fs would be the using statment which would force GC
and return the memory.

It wouldn't return the memory - but it *would* close/dispose the stream
in all situations, whether or not there's an exception.
Closing/disposing the stream doesn't return any memory, but it releases
the handle on the file.
Jul 7 '06 #10

Jon Skeet [C# MVP] wrote:
Matt <ma********@sprynet.com> wrote:

<snip>
>Unlikely that you are creating a memory leak. C# uses garbage
collection. When the object goes out of scope (in your case, the }
marked // for int i) the object is destroyed. The next time through
the loop, a new one is created.

The object is *not* destroyed when it reaches the end of the scope.
.NET does not have deterministic garbage collection. Instead, the
object's memory will be released *at some point* after it is last used.

In fact, this could be before the end of the scope - the GC could kick
in before progBar.Value++ and free both fs and inputBuffer.
But at that point there are still references to those objects.

Actually, the buffer won't be collectable until after the next buffer
has been created in the next iteration of the loop, when the reference
is replaced by the reference to the new buffer.
Jul 7 '06 #11

Göran Andersson <gu***@guffa.com> wrote:
In fact, this could be before the end of the scope - the GC could kick
in before progBar.Value++ and free both fs and inputBuffer.

But at that point there are still references to those objects.
Not if the variables are in registers and have been overwritten. The JIT
compiler can detect the variable's lifetime, it doesn't necessarily last
out the whole lexical scope.
Actually, the buffer won't be collectable until after the next buffer
has been created in the next iteration of the loop, when the reference
is replaced by the reference to the new buffer.
Ditto.

-- Barry

--
http://barrkel.blogspot.com/
Jul 7 '06 #12

Barry Kelly wrote:
Göran Andersson <gu***@guffa.com> wrote:
>>In fact, this could be before the end of the scope - the GC could kick
in before progBar.Value++ and free both fs and inputBuffer.
But at that point there are still references to those objects.

Not if the variables are in registers and have been overwritten. The JIT
compiler can detect the variable's lifetime, it doesn't necessarily last
out the whole lexical scope.
So you mean that the scope of a variable only lasts one single
iteration of the loop?
>Actually, the buffer won't be collectable until after the next buffer
has been created in the next iteration of the loop, when the reference
is replaced by the reference to the new buffer.

Ditto.

-- Barry
Jul 7 '06 #13

P: n/a
Göran Andersson wrote:
The object is *not* destroyed when it reaches the end of the scope.
.NET does not have deterministic garbage collection. Instead, the
object's memory will be released *at some point* after it is last used.

In fact, this could be before the end of the scope - the GC could kick
in before progBar.Value++ and free both fs and inputBuffer.

But at that point there are still references to those objects.

Actually, the buffer won't be collectable until after the next buffer
has been created in the next iteration of the loop, when the reference
is replaced by the reference to the new buffer.
Nope. In release mode, the JIT is smart enough to work out when a
variable can no longer be read, and will not count that variable as a
live root.

Jon

Jul 7 '06 #14

Göran Andersson <gu***@guffa.com> wrote:
Barry Kelly wrote:
Göran Andersson <gu***@guffa.com> wrote:
>In fact, this could be before the end of the scope - the GC could kick
in before progBar.Value++ and free both fs and inputBuffer.
But at that point there are still references to those objects.
Not if the variables are in registers and have been overwritten. The JIT
compiler can detect the variable's lifetime, it doesn't necessarily last
out the whole lexical scope.

So you mean that the scope of a variable only last one single iteration
of the loop?
I mean that the "FileStream fs" in the OP may be GCd before
progBar.Value++, like Jon says. The variable's lifetime may be smaller
than its scope. Scope is a lexical concept that exists only at compile
time.

-- Barry

--
http://barrkel.blogspot.com/
Jul 7 '06 #15

Göran Andersson wrote:
Why? It serves no purpose - and code which serves no purpose is just
distracting, IMO.

Well, it actually serves one purpose, at least occasionally.
Not in release mode - see the other replies.

Jon

Jul 7 '06 #16

Well Jon, you can cite what is supposed to happen, but I have to deal
with what really happens. I write services that run constantly and in
some cases don't return much idle time back to the system for days. I
have found that <var> = null on non-disposable values and
using(<statement>) allows my program to maintain an even memory
allocation and stops the memory creep. I will grant you that in my code
I am probably using them to excess, but having my customers tell me of
memory errors after running my program for X +/- days depending on load
can be really hard to track down; this stopped after adding the
set-to-null statements and using statements.

Note in forms applications I normally don't use them as much, seeing as
the system is normally idle.

As you say "IMO" ;>

Regards,
John

"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com...
<snip>

Jul 7 '06 #17


Barry Kelly wrote:
"Matt" <ma********@sprynet.com> wrote:

So what is happening here? Is the memory being reused, or am I piling
up objects on the heap that will only go away when my program ends, or
am I creating a huge memory leak?
Unlikely that you are creating a memory leak. C# uses garbage
collection. When the object goes out of scope (in your case, the }
marked // for int i) the object is destroyed. The next time through
the loop, a new one is created.

It isn't freed immediately on every loop, or when it goes out of scope,
unless it's a value type (struct in C#). The GC kicks in on an as-needed
basis: memory pressure, repeated allocations, etc. That's one thing, and
is easy.
Yes, I know this. The OP was a C++ programmer, I was giving it to him
in C++ context. GC is deterministic, it will kick in when it makes
sense to kick in.
>
The other half of the story is resources. The GC isn't a resource
manager, so you shouldn't rely on it to manage things like file handles,
since it won't release them in a timely manner. That's why it's
important to use "using" with FileStream and other classes which
implement IDispose.
True and a good point. I was just explaining why it wasn't a memory
leak, but your explanation is better for this purpose.

Thanks
Matt

Jul 7 '06 #18

Jon Skeet [C# MVP] wrote:
Göran Andersson wrote:
>>The object is *not* destroyed when it reaches the end of the scope.
.NET does not have deterministic garbage collection. Instead, the
object's memory will be released *at some point* after it is last used.

In fact, this could be before the end of the scope - the GC could kick
in before progBar.Value++ and free both fs and inputBuffer.
But at that point there are still references to those objects.

Actually, the buffer won't be collectable until after the next buffer
has been created in the next iteration of the loop, when the reference
is replaced by the reference to the new buffer.

Nope. In release mode, the JIT is smart enough to work out when a
variable can no longer be read, and will not count that variable as a
live root.

Jon
Does that mean that the reference is removed from the variable?
Otherwise the garbage collector will still see the reference and can't
collect the object.
Jul 7 '06 #19

Barry Kelly wrote:
Göran Andersson <gu***@guffa.com> wrote:
>Barry Kelly wrote:
>>Göran Andersson <gu***@guffa.com> wrote:

In fact, this could be before the end of the scope - the GC could kick
in before progBar.Value++ and free both fs and inputBuffer.
But at that point there are still references to those objects.
Not if the variables are in registers and have been overwritten. The JIT
compiler can detect the variable's lifetime, it doesn't necessarily last
out the whole lexical scope.
So you mean that the scope of a variable only last one single iteration
of the loop?

I mean that the "FileStream fs" in the OP may be GCd before
progBar.Value++, like Jon says. The variable's lifetime may be smaller
than its scope. Scope is a lexical concept that exists only at compile
time.

-- Barry
Does that mean that the compiler adds code to remove the reference from
the fs variable? As long as the reference is there, the garbage
collector won't collect the object.
Jul 7 '06 #20

Göran Andersson <gu***@guffa.com> wrote:
Barry Kelly wrote:
I mean that the "FileStream fs" in the OP may be GCd before
progBar.Value++, like Jon says. The variable's lifetime may be smaller
than its scope. Scope is a lexical concept that exists only at compile
time.

Does that mean that the compiler adds code to remove the reference from
the fs variable? As long as the reference is there, the garbage
collector won't collect the object.
If the 'fs' variable is enregistered, or its location on the stack is
reused for another variable in the interest of reducing stack
consumption, then it may be overwritten and thus won't be visible to the
GC any more.

The JIT doesn't maintain the lifetime of a variable for its entire
lexical scope, except maybe if you've compiled to debug and are running
under the debugger.

You'd be surprised by what the GC will collect. I know I was. I've been
investigating a bug since yesterday evening that was most enlightening,
with respect to this behaviour. It can even collect objects referred to
by the object whose instance method is currently on the stack, under the
right circumstances.
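
A contrived sketch of that last point, assuming a release-mode JIT that
tracks liveness precisely (the Worker class and its members are invented
for illustration):

```csharp
using System;

class Worker
{
    IntPtr handle; // stands in for an unmanaged resource

    ~Worker()
    {
        // Imagine this frees 'handle'.
    }

    public void Run()
    {
        IntPtr h = handle;
        // 'this' is never read again after the line above, so the GC is
        // free to collect the Worker - and run its finalizer - while
        // Run() is still executing, pulling the resource out from under us.
        UseHandle(h);
        // GC.KeepAlive(this); // the usual fix: extends the lifetime
    }

    static void UseHandle(IntPtr h) { /* ... */ }
}
```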

-- Barry

--
http://barrkel.blogspot.com/
Jul 7 '06 #21



"Barry Kelly" <ba***********@gmail.com> wrote in message
news:qt********************************@4ax.com...
Göran Andersson <gu***@guffa.com> wrote:
Barry Kelly wrote:
I mean that the "FileStream fs" in the OP may be GCd before
progBar.Value++, like Jon says. The variable's lifetime may be smaller
than its scope. Scope is a lexical concept that exists only at compile
time.
Does that mean that the compiler adds code to remove the reference from
the fs variable? As long as the reference is there, the garbage
collector won't collect the object.

If the 'fs' variable is enregistered, or its location on the stack is
reused for another variable in the interest of reducing stack
consumption, then it may be overwritten and thus won't be visible to the
GC any more.

The JIT doesn't maintain the lifetime of a variable for its entire
lexical scope, except maybe if you've compiled to debug and are running
under the debugger.

You'd be surprised by what the GC will collect. I know I was. I've been
investigating a bug since yesterday evening that was most enlightening,
with respect to this behaviour. It can even collect objects referred to
by the object whose instance method is currently on the stack, under the
right circumstances.

-- Barry
The issue here is that when the GC finds an object to collect, it must
follow all the links from that object and collect those first. If it hits a
reference loop, it stops at the object that refers to the start of the
collection link and works backwards.

Mike Ober.

Jul 7 '06 #22

Göran Andersson <gu***@guffa.com> wrote:
Nope. In release mode, the JIT is smart enough to work out when a
variable can no longer be read, and will not count that variable as a
live root.
Does that mean that the reference is removed from the variable?
Otherwise the garbage collector will still see the reference and can't
collect the object.
No, it just means that the variable's value isn't considered when the
GC works out which references are "live".

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Jul 7 '06 #23

Michael D. Ober <obermd.@.alum.mit.edu.nospam> wrote:
The issue here is that when the GC finds an object to collect, it must
follow all the links from that object and collect those first. If it hits a
reference loop, it stops at the object that refers to the start of the
collection link and works backwards.
Firstly, I don't see how that's relevant to this situation.

Secondly, it's just not true. There is nothing to say that a "parent"
object can only be collected after its "children". For one thing, there
can be a cyclic reference, in which case both can be collected in
either order.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Jul 7 '06 #24

John J. Hughes II <no@invalid.com> wrote:
Well Jon, you can cite what is supposed to happen, but I have to deal with
what really happens. I write services that run constantly and in some
cases don't return much idle time back to the system for days. I have
found that <var>=null on non-disposable values and using(<statement>) allows
my program to maintain an even memory allocation and stops the memory creep.
Obviously you should use using statements - but that *doesn't* reclaim
any memory.

However, setting things to null when they're about to go out of scope
is *not* helping you. Really, it's not. If you believe it is, please
write a short but complete program that demonstrates it in release
mode. I can easily write a short but complete program which
demonstrates objects being garbage collected before variables referring
to them reach the end of their lexical scope.
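
A short but complete sketch along those lines (release build, run
outside the debugger; WeakReference lets us observe the collection
without keeping the array alive):

```csharp
using System;

class Program
{
    static void Main()
    {
        byte[] data = new byte[1000];
        WeakReference wr = new WeakReference(data);

        // 'data' is never read after this point, so in a release build
        // the JIT stops reporting it as a live root - even though the
        // variable is still in scope.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        Console.WriteLine(wr.IsAlive); // typically False in release mode
    }
}
```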
I will grant you in my code I am probably using them to excess but having my
customers tell me of memory errors after running my program for X+/- days
depending on load can be really hard to track down, this stopped after
adding the set to null statements and using statements.
Using statements may well have made a difference to that (particularly
with classes with finalizers) but I'm afraid I just don't believe that
setting variables to null when they're about to go out of scope does
any good - and it clutters up the code.
Note in forms applications I normally don't use them as much being as the
system is normally idle.

As you say "IMO" ;>
Well, I've got good evidence that things can be garbage collected
without local variables being set to null before they go out of scope -
do you have any evidence that *just* setting things to null helps?

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Jul 7 '06 #25

On Thu, 06 Jul 2006 13:37:47 -0700, Tony Sinclair <no@spam.com> wrote:
>Any help,
including pointers to the VS docs or a popular book on C#, would be
appreciated.
My sincere gratitude to everyone who responded. I was unaware of both
the "using" statement's use in this context, and the difference in GC
behaviour between debug and release versions. I also found Mr.
Skeet's essay on software sins (on his blog) very interesting.

Thank you,
Tony
Jul 7 '06 #26

Matt <ma********@sprynet.com> wrote:
Unlikely that you are creating a memory leak. C# uses garbage
collection. When the object goes out of scope (in your case, the }
marked // for int i) the object is destroyed. The next time through
the loop, a new one is created.
It isn't freed immediately on every loop, or when it goes out of scope,
unless it's a value type (struct in C#). The GC kicks in on an as-needed
basis: memory pressure, repeated allocations, etc. That's one thing, and
is easy.

Yes, I know this. The OP was a C++ programmer, I was giving it to him
in C++ context. GC is deterministic, it will kick in when it makes
sense to kick in.
But that's the point - it's *different* to C++, so saying what would
happen in a C++ context is misleading. (And the GC is
non-deterministic, not deterministic. I suspect that's what you meant
to say.)

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Jul 7 '06 #27

I reckon that if the thing that you are creating is a GDI object like a font,
the system can run out of resources before it gets around to GCing them.

Everything is not as automagic as sometimes assumed.

Tony Sinclair wrote:
I'm just learning C#. I'm writing a program (using Visual C# 2005 on
WinXP) to combine several files into one (HKSplit is a popular
freeware program that does this, but it requires all input and output
to be within one directory, and I want to be able to combine files
from different directories into another directory of my choice).

My program seems to work fine, but I'm wondering about this loop:
for (int i = 0; i < numFiles; i++)
{
// read next input file

FileStream fs = new FileStream(fileNames[i],
FileMode.Open, FileAccess.Read, FileShare.Read);
Byte[] inputBuffer = new Byte[fs.Length];

fs.Read(inputBuffer, 0, (int)fs.Length);
fs.Close();

//append to output stream previously opened as fsOut

fsOut.Write(inputBuffer, 0, (int) inputBuffer.Length);
progBar.Value++;
} // for int i

As you can see, the objects fs and inputBuffer are both created as
"new" each time through the loop, which could be many times. I didn't
think this would work; I just tried it to see what kind of error
message I would get, and I was surprised when it ran. Every test run
has produced perfect results.

So what is happening here? Is the memory being reused, or am I piling
up objects on the heap that will only go away when my program ends, or
am I creating a huge memory leak?

I can see that fs might go away after fs.Close(), but I don't
understand why I'm allowed to recreate the byte array over and over,
without ever disposing of it. I have verified with the debugger that
the array has a different size each time the input file size changes,
so it really is being reallocated each time through the loop, rather
than just being reused. I've tried to find explanations of how "new"
works in a loop, but I haven't found any so far. Any help,
including pointers to the VS docs or a popular book on C#, would be
appreciated.
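
For reference, here is a sketch of how the same concatenation could be
written with using blocks, so each stream is closed even if an exception
is thrown. The ConcatFiles method and the FileConcat class name are
invented for this illustration, not taken from the original program:

```csharp
using System.IO;

public static class FileConcat
{
    // Sketch: concatenate the named files into outputPath.
    // using guarantees Dispose/Close runs even if Read or Write throws.
    public static void ConcatFiles(string[] fileNames, string outputPath)
    {
        using (FileStream fsOut = new FileStream(outputPath,
            FileMode.Create, FileAccess.Write))
        {
            foreach (string name in fileNames)
            {
                using (FileStream fs = new FileStream(name,
                    FileMode.Open, FileAccess.Read, FileShare.Read))
                {
                    byte[] inputBuffer = new byte[fs.Length];
                    // Read may legally return fewer bytes than requested,
                    // so write only what was actually read.
                    int read = fs.Read(inputBuffer, 0, inputBuffer.Length);
                    fsOut.Write(inputBuffer, 0, read);
                } // fs is disposed here; no explicit Close or null needed
            }
        }
    }
}
```

The progress-bar update from the original loop would go after the inner
using block, exactly where fs.Close() used to be.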
Jul 7 '06 #28


Jon wrote:
Matt <ma********@sprynet.com> wrote:
Unlikely that you are creating a memory leak. C# uses garbage
collection. When the object goes out of scope (in your case, the }
marked // for int i) the object is destroyed. The next time through
the loop, a new one is created.
>
It isn't freed immediately on every loop, or when it goes out of scope,
unless it's a value type (struct in C#). The GC kicks in on an as-needed
basis: memory pressure, repeated allocations, etc. That's one thing, and
is easy.
Yes, I know this. The OP was a C++ programmer, I was giving it to him
in C++ context. GC is deterministic, it will kick in when it makes
sense to kick in.

But that's the point - it's *different* to C++, so saying what would
happen in a C++ context is misleading. (And the GC is
non-deterministic, not deterministic. I suspect that's what you meant
to say.)
Yes, that's what I meant to say. Brain was on "off" this morning before
the first cup of coffee :)

I'm not sure it's misleading to help people understand things in the
language they know best. I came from the C world; moving into C++ was a
rather ugly experience back then (back in <mumble>) that was made easier
by thinking of it as first a "better C" (prototypes, being able to
define variables inline, etc). Only when I really "got" the syntax and
structure of C++ could I begin to think that way. I think C# is the same
way.

Just my $0.02, of course.
Matt

Jul 7 '06 #29

Ok, next time I find the time I'll do that...

By the way, I don't believe that using and/or setting a value to null
directly releases memory; it just marks the memory as not being needed.
I have found that this is more relevant in nested instances, when each
class or value tells the compiler the value is no longer needed. It
seems to help in loops where the new instance is used, but does not have
as big an impact.

Personally I don't think it really clutters code, for a couple of
reasons. First of all, setting a value to null at the end of its logical
use reminds me not to use it later, a sort of note saying it's not
available. The second reason is I normally use them in the dispose call,
which is forced by the using statement.

So you end up with something like:

public class myClass : IDisposable
{
    byte[] data = new byte[1000];
    public void Dispose()
    {
        data = null;
    }
    public void dosomething()
    {
    }
}

public void fun()
{
    using (myClass c = new myClass())
        c.dosomething();
}

Regards,
John

"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com...
John J. Hughes II <no@invalid.com> wrote:
>Well Jon, you can cite what is supposed to happen, but I have to deal
with what really happens. I write services that run constantly and in
some cases don't return much idle time back to the system for days. I
have found that <var>=null on non-disposable values and
using(<statement>) allows my program to maintain an even memory
allocation and stops the memory creep.

Obviously you should use using statements - but that *doesn't* reclaim
any memory.

However, setting things to null when they're about to go out of scope
is *not* helping you. Really, it's not. If you believe it is, please
write a short but complete program that demonstrates it in release
mode. I can easily write a short but complete program which
demonstrates objects being garbage collected before variables referring
to them reach the end of their lexical scope.
>I will grant you in my code I am probably using them to excess, but
having my customers tell me of memory errors after running my program
for X+/- days depending on load can be really hard to track down; this
stopped after adding the set-to-null statements and using statements.

Using statements may well have made a difference to that (particularly
with classes with finalizers) but I'm afraid I just don't believe that
setting variables to null when they're about to go out of scope does
any good - and it clutters up the code.
>Note in forms applications I normally don't use them as much being as the
system is normally idle.

As you say "IMO" ;>

Well, I've got good evidence that things can be garbage collected
without local variables being set to null before they go out of scope -
do you have any evidence that *just* setting things to null helps?

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Jul 7 '06 #30

Matt <ma********@sprynet.com> wrote:
I'm not sure it's misleading to help people understand things in the
language they know best.
It is when the truth is different.

You said that objects are cleaned up when they fall out of scope. That
is simply not true. It implies deterministic clean-up which doesn't
exist in .NET.

In particular, someone who has been told and believes that objects are
destroyed deterministically is likely to use finalizers to implement
C++-style RAII - which simply won't work in .NET.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Jul 7 '06 #31

John J. Hughes II <no@invalid.com> wrote:
Ok, next time I find the time I'll do that...

By the way, I don't believe that using and/or setting a value to null
directly releases memory; it just marks the memory as not being needed.
No, it does no such thing. Memory is never marked as not being needed -
the "mark" part of "mark and sweep" is marking things which *are* still
needed. Now, if the JIT can tell that a variable is no longer
reachable, it won't use that variable as a root when considering which
objects are still in use.
I have found that this is more relevant in nested instances when each
class or value tells the compiler the value is no longer needed. It
seems to help in loops where the new instance is used but does not
have as big an impact.
That suggests you believe you have some evidence that it has an effect.
I really doubt that you have - in release mode at least. (In debug mode
it would make a difference, but that's not a good reason to add more
code in, IMO.)

Here's some code which demonstrates that the GC doesn't need anything
to be set to null in order to finalize and then free it:

using System;

class Test
{
~Test()
{
Console.WriteLine ("Finalizer called");
}

static void Main()
{
Test t = new Test();

Console.WriteLine ("Calling GC");
GC.Collect();
GC.WaitForPendingFinalizers();

Console.WriteLine ("End of method");
}
}

The results are:
Calling GC
Finalizer called
End of method

So the finalizer is being called before the end of the method - no need
for nulling the variable out. Now, I know that the finalizer being
called isn't the same thing as the object being freed, but it shows
that the GC considers it not to be needed any more.
Personally I don't think it really clutters code, for a couple of
reasons. First of all, setting a value to null at the end of its logical
use reminds me not to use it later, a sort of note saying it's not
available. The second reason is I normally use them in the dispose call,
which is forced by the using statement.

So you end up with something like:

public class myClass : IDisposable
{
    byte[] data = new byte[1000];
    public void Dispose()
    {
        data = null;
    }
    public void dosomething()
    {
    }
}

public void fun()
{
    using (myClass c = new myClass())
        c.dosomething();
}
If your class doesn't use any unmanaged resources either directly or
indirectly, there's really very little point in implementing
IDisposable in the first place. Just let the object get collected when
the GC notices it's not used - I don't think you're doing anything to
improve garbage collection using the above, but you're forcing yourself
to remember to use the using statement when you really don't need to.
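
As a sketch of that distinction (the class names here are invented for
illustration): a type that holds only managed memory has no need for
IDisposable, while a type that owns a stream - and through it an OS file
handle - should implement it and forward the call:

```csharp
using System;
using System.IO;

// Holds only managed memory: no IDisposable needed.
// The GC reclaims the array whenever the instance becomes unreachable.
public class ManagedBuffer
{
    private byte[] data = new byte[1000];
    public int Length { get { return data.Length; } }
}

// Owns a StreamWriter (an unmanaged file handle indirectly), so it
// implements IDisposable and forwards Dispose rather than nulling fields.
public class LogWriter : IDisposable
{
    private readonly StreamWriter writer;
    public LogWriter(string path) { writer = new StreamWriter(path); }
    public void Write(string line) { writer.WriteLine(line); }
    public void Dispose() { writer.Dispose(); }
}
```

Callers would then use the disposable type in a using statement, e.g.
using (LogWriter log = new LogWriter("app.log")) { log.Write("started"); }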

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Jul 7 '06 #32

I do agree the memory is not marked... poor verbiage on my part.

I don't think your example really proves anything, since you are calling
garbage collection. I have no argument that when GC runs it will clean
up memory that is not being used. I personally believe that all
references to a variable are not removed in a timely fashion unless you
tell them to be. The key here is timely.

Again, as I have said, I had a problem with memory creep. The only
change I made was to add using statements; the problem slowed down but
was not eliminated. The second change was to add value=null statements
(shotgun blast style), and the problem went away. Since it was a
production system I used great care to change as little as possible, so
I really don't think I fixed any other problems.

If at some point in the near future I can give you code which proves my
point I will be happy to, but the last time I had the problem it
required a system running full blown for 14 days on average.

That being said, I may have gotten my head wet and decided it was
raining when it was snowing. I decided to use an umbrella and my head is
not wet now.

Regards,
John

"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com...
John J. Hughes II <no@invalid.com> wrote:
>Ok, next time I find the time I'll do that...

By the way, I don't believe that using and/or setting a value to null
directly releases memory; it just marks the memory as not being needed.

No, it does no such thing. Memory is never marked as not being needed -
the "mark" part of "mark and sweep" is marking things which *are* still
needed. Now, if the JIT can tell that a variable is no longer
reachable, it won't use that variable as a root when considering which
objects are still in use.
>I have found that this is more relevant in nested instances when each
class or value tells the compiler the value is no longer needed. It
seems to help in loops where the new instance is used but does not
have as big an impact.

That suggests you believe you have some evidence that it has an effect.
I really doubt that you have - in release mode at least. (In debug mode
it would make a difference, but that's not a good reason to add more
code in, IMO.)

Here's some code which demonstrates that the GC doesn't need anything
to be set to null in order to finalize and then free it:

using System;

class Test
{
~Test()
{
Console.WriteLine ("Finalizer called");
}

static void Main()
{
Test t = new Test();

Console.WriteLine ("Calling GC");
GC.Collect();
GC.WaitForPendingFinalizers();

Console.WriteLine ("End of method");
}
}

The results are:
Calling GC
Finalizer called
End of method

So the finalizer is being called before the end of the method - no need
for nulling the variable out. Now, I know that the finalizer being
called isn't the same thing as the object being freed, but it shows
that the GC considers it not to be needed any more.
>Personally I don't think it really clutters code, for a couple of
reasons. First of all, setting a value to null at the end of its logical
use reminds me not to use it later, a sort of note saying it's not
available. The second reason is I normally use them in the dispose call,
which is forced by the using statement.

So you end up with something like:

public class myClass : IDisposable
{
    byte[] data = new byte[1000];
    public void Dispose()
    {
        data = null;
    }
    public void dosomething()
    {
    }
}

public void fun()
{
    using (myClass c = new myClass())
        c.dosomething();
}

If your class doesn't use any unmanaged resources either directly or
indirectly, there's really very little point in implementing
IDisposable in the first place. Just let the object get collected when
the GC notices it's not used - I don't think you're doing anything to
improve garbage collection using the above, but you're forcing yourself
to remember to use the using statement when you really don't need to.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Jul 7 '06 #33

John J. Hughes II <no@invalid.com> wrote:
I do agree the memory is not marked... poor verbiage on my part.

I don't think your example really proves anything since you are calling
garbage collection.
Well, I can make an example which ends up garbage collecting due to
other activity if you want. It'll do the same thing. Just change the
call to GC.Collect() to

for (int i=0; i < 10000000; i++)
{
byte[] b = new byte[1000];
}

and you'll see the same thing.
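
Spelled out in full, that modified program looks something like this (a
sketch; as ever, exact GC and finalizer timing is not guaranteed, and
the behaviour described assumes a release build):

```csharp
using System;

class Test
{
    ~Test()
    {
        Console.WriteLine("Finalizer called");
    }

    static void Main()
    {
        Test t = new Test();

        Console.WriteLine("Allocating");
        // No explicit GC.Collect: allocation pressure alone lets the GC
        // run, and t - never set to null - can still be finalized before
        // the loop ends, because it is never read after this point.
        for (int i = 0; i < 10000000; i++)
        {
            byte[] b = new byte[1000];
        }

        Console.WriteLine("End of method");
    }
}
```

In a release build, "Finalizer called" typically appears before "End of
method", just as in the GC.Collect() version.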
I have no argument that when GC runs it will clean up
memory that is not being used. I personally believe that all references to
a variable are not removed in a timely fashion unless you tell them to be.
The key here is timely.
It's not a matter of the reference being removed. It's a case of the
release-mode garbage collector ignoring variables which are no longer
relevant.
Again, as I have said, I had a problem with memory creep. The only
change I made was to add using statements; the problem slowed down but
was not eliminated.
And *that* can have a significant impact - because many classes which
implement IDisposable also have finalizers which are suppressed when
you call Dispose. That really *does* affect when the memory can be
freed, and can make a big difference.
The second change was to add value=null statement (shotgun blast style) and
the problem went away. Since it was a production system I used great care
to change as little as possible so I really don't think I fixed any other
problems.
I'm afraid I still don't believe you saw what you claimed to be seeing
- not on a production system. You *would* see improvements in a
debugger, but that's a different matter.
If at some point in the near future I can give you code which proves my
point I will be happy to, but the last time I had the problem it
required a system running full blown for 14 days on average.

That being said, I may have gotten my head wet and decided it was
raining when it was snowing. I decided to use an umbrella and my head is
not wet now.
I really suspect you were mistaken, I'm afraid.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Jul 7 '06 #34

"Michael D. Ober" <obermd.@.alum.mit.edu.nospam> wrote:
The issue here is that when the GC finds an object to collect, it must
follow all the links from that object and collect those first. If it hits a
reference loop, it stops at the object that refers to the start of the
collection link and works backwards.
I think this is a different issue, but I still need to comment:

A copying garbage collector actually works differently. The CLR
collector is compacting, which is a kind of copying collector. It looks
for objects that are *alive*, and any space that's left over is garbage
that can be collected. The fact that collection time is inversely
proportional to the amount of garbage means that, with enough garbage,
GC should always outperform manual memory allocation. I'll refer you to
this for more information:

http://citeseer.ist.psu.edu/appel87garbage.html

-- Barry

--
http://barrkel.blogspot.com/
Jul 7 '06 #35


Jon wrote:
Matt <ma********@sprynet.com> wrote:
I'm not sure it's misleading to help people understand things in the
language they know best.

It is when the truth is different.

You said that objects are cleaned up when they fall out of scope. That
is simply not true. It implies deterministic clean-up which doesn't
exist in .NET.

In particular, someone who has been told and believes that objects are
destroyed deterministically is likely to use finalizers to implement
C++-style RAII - which simply won't work in .NET.
I stand corrected. Thanks for the better explanation.

Matt

Jul 7 '06 #36

Jon Skeet [C# MVP] wrote:
Göran Andersson <gu***@guffa.com> wrote:
>>Nope. In release mode, the JIT is smart enough to work out when a
variable can no longer be read, and will not count that variable as a
live root.
Does that mean that the reference is removed from the variable?
Otherwise the garbage collector will still see the reference and can't
collect the object.

No, it just means that the variable's value isn't considered when the
GC works out which references are "live".
How would the GC know the scope of the variable, when the scope is
something that only the compiler is aware of?
Jul 7 '06 #37

That's why the Font class implements IDisposable. When you call Dispose
it will free the GDI resource, so that it doesn't matter when the object
is garbage collected.

Ian Semmel wrote:
I reckon that if the thing that you are creating is a GDI object like a
font, the system can run out of resources before it gets around to GCing
them.

Everything is not as automagic as sometimes assumed.

Tony Sinclair wrote:
>I'm just learning C#. I'm writing a program (using Visual C# 2005 on
WinXP) to combine several files into one (HKSplit is a popular
freeware program that does this, but it requires all input and output
to be within one directory, and I want to be able to combine files
from different directories into another directory of my choice).

My program seems to work fine, but I'm wondering about this loop:
for (int i = 0; i < numFiles; i++)
{
// read next input file

FileStream fs = new FileStream(fileNames[i],
FileMode.Open, FileAccess.Read, FileShare.Read);
Byte[] inputBuffer = new Byte[fs.Length];

fs.Read(inputBuffer, 0, (int)fs.Length);
fs.Close();

//append to output stream previously opened as fsOut

fsOut.Write(inputBuffer, 0, (int) inputBuffer.Length);
progBar.Value++;
} // for int i

As you can see, the objects fs and inputBuffer are both created as
"new" each time through the loop, which could be many times. I didn't
think this would work; I just tried it to see what kind of error
message I would get, and I was surprised when it ran. Every test run
has produced perfect results.
So what is happening here? Is the memory being reused, or am I piling
up objects on the heap that will only go away when my program ends, or
am I creating a huge memory leak?
I can see that fs might go away after fs.Close(), but I don't
understand why I'm allowed to recreate the byte array over and over,
without ever disposing of it. I have verified with the debugger that
the array has a different size each time the input file size changes,
so it really is being reallocated each time through the loop, rather
than just being reused. I've tried to find explanations of how "new"
works in a loop, but I haven't found any so far. Any help,
including pointers to the VS docs or a popular book on C#, would be
appreciated.
Jul 7 '06 #38

Barry Kelly wrote:
Göran Andersson <gu***@guffa.com> wrote:
>Barry Kelly wrote:
>>I mean that the "FileStream fs" in the OP may be GCd before
progBar.Value++, like Jon says. The variable's lifetime may be smaller
than its scope. Scope is a lexical concept that exists only at compile
time.
Does that mean that the compiler adds code to remove the reference from
the fs variable? As long as the reference is there, the garbage
collector won't collect the object.

If the 'fs' variable is enregistered, or its location on the stack is
reused for another variable in the interest of reducing stack
consumption, then it may be overwritten and thus won't be visible to the
GC any more.
Yes, it might. On the other hand it might not.

In this case it's not very likely that the stack space will be reused
inside the loop, is it? It's needed for the fs variable in the next
iteration of the loop.
The JIT doesn't maintain the lifetime of a variable for its entire
lexical scope, except maybe if you've compiled to debug and are running
under the debugger.

You'd be surprised by what the GC will collect. I know I was. I've been
investigating a bug since yesterday evening that was most enlightening,
with respect to this behaviour. It can even collect objects referred to
by the object whose instance method is currently on the stack, under the
right circumstances.

-- Barry
Jul 7 '06 #39

Göran Andersson <gu***@guffa.com> wrote:
If the 'fs' variable is enregistered, or its location on the stack is
reused for another variable in the interest of reducing stack
consumption, then it may be overwritten and thus won't be visible to the
GC any more.

Yes, it might. On the other hand it might not.

In this case it's not very likely that the stack space will be reused
inside the loop, is it?
It's unpredictably likely. Compile this program in release mode and run
it:

---8<---
using System;

class App
{
class A
{
public void Foo()
{
}

~A()
{
Console.WriteLine("A finalized.");
}
}

static unsafe void Main()
{
for (int i = 0; i < 2; ++i)
{
Console.WriteLine("Loop Start");
A a = new A();
a.Foo();
GC.Collect();
GC.WaitForPendingFinalizers();
Console.WriteLine("Loop End");
}
}
}
--->8---

What would you expect it to output? If the slot or whatever for "a"
isn't freed up until the next iteration, this is what I'd expect:

---8<---
Loop Start
Loop End
Loop Start
A finalized
Loop End
A finalized
--->8---

But that isn't what happens:

---8<---
Loop Start
A finalized.
Loop End
Loop Start
A finalized.
Loop End
--->8---

GC can be surprisingly proactive. I've made an entry on my blog today on
precisely this topic:

http://barrkel.blogspot.com/2006/07/...collector.html

-- Barry

--
http://barrkel.blogspot.com/
Jul 8 '06 #40

Göran Andersson <gu***@guffa.com> wrote:
Jon Skeet [C# MVP] wrote:
Göran Andersson <gu***@guffa.com> wrote:
>Nope. In release mode, the JIT is smart enough to work out when a
variable can no longer be read, and will not count that variable as a
live root.
Does that mean that the reference is removed from the variable?
Otherwise the garbage collector will still see the reference and can't
collect the object.
No, it just means that the variable's value isn't considered when the
GC works out which references are "live".

How would the GC know the scope of the variable, when the scope is
something that only the compiler is aware of?
One possible implementation: the C# compiler compiles to IL, and the JIT
produces the actual code. The IL contains ldloc and stloc for locals,
and thus the JIT can make a note of where the last use of a variable
occurs for each basic block. Hence it can produce tables which indicate
which stack locations / registers are valid roots for given instruction
pointer ranges.

It needs to do analysis like this anyway to make sensible decisions on
enregistering. Without knowing when a variable is no longer needed, and
thus a register freed up for use by another variable, code wouldn't be
as performant as it could be.

-- Barry

--
http://barrkel.blogspot.com/
Jul 8 '06 #41

Göran Andersson <gu***@guffa.com> wrote:
No, it just means that the variable's value isn't considered when the
GC works out which references are "live".
How would the GC know the scope of the variable, when the scope is
something that only the compiler is aware of?
I can't remember (and can't find in the specs) whether the compiler
adds some information in the IL or whether the JIT works it out. Either
way, it clearly happens :)

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Jul 8 '06 #42

Jon Skeet [C# MVP] <sk***@pobox.com> wrote:
I can't remember (and can't find in the specs) whether the compiler
adds some information in the IL or whether the JIT works it out. Either
way, it clearly happens :)
The information isn't in the IL. It may be in the PDB. Quote from CLI
specs (v1.1):

"ilasm allows nested local variable scopes to be provided and allows
locals in nested scopes to share the same location as those in the outer
scope. The information about local names, scoping, and overlapping of
scoped locals is persisted to the PDB (debugger symbol) file rather than
the PE file itself."

In Java class files, there is a mechanism for specifying the extent of a
local variable. I've never used it when writing compilers targeting the
JVM, so I don't know if it actually gets used by the JIT, or whether it
does its own work. I would expect it to do its own work, since that kind
of work (def-use / use-def chains / SSA graph) is needed to implement
other compiler optimizations.

-- Barry

--
http://barrkel.blogspot.com/
Jul 8 '06 #43

Barry Kelly wrote:
Göran Andersson <gu***@guffa.com> wrote:
>Jon Skeet [C# MVP] wrote:
>>Göran Andersson <gu***@guffa.com> wrote:
Nope. In release mode, the JIT is smart enough to work out when a
variable can no longer be read, and will not count that variable as a
live root.
Does that mean that the reference is removed from the variable?
Otherwise the garbage collector will still see the reference and can't
collect the object.
No, it just means that the variable's value isn't considered when the
GC works out which references are "live".
How would the GC know the scope of the variable, when the scope is
something that only the compiler is aware of?

One possible implementation: the C# compiler compiles to IL, and the JIT
produces the actual code. The IL contains ldloc and stloc for locals,
and thus the JIT can make a note of where the last use of a variable
occurs for each basic block. Hence it can produce tables which indicate
which stack locations / registers are valid roots for given instruction
pointer ranges.

It needs to do analysis like this anyway to make sensible decisions on
enregistering. Without knowing when a variable is no longer needed, and
thus a register freed up for use by another variable, code wouldn't be
as performant as it could be.

-- Barry
But you yourself said that:

"Scope is a lexical concept that exists only at compile time."

I guess that's not really so, then.
Jul 8 '06 #44

Göran Andersson <gu***@guffa.com> wrote:
But you yourself said that:

"Scope is a lexical concept that exists only at compile time."

I guess that's not really so, then.
I think you're confusing scope with GC reachability.

In compiler theory, the word "scope" is overloaded. It can either refer
to (i) the extent of source code for which the identifier is valid ("the
scope of a variable") or (ii) the set of identifiers which are valid for
the current position while parsing the source code ("the variable isn't
in scope").

It is implemented with the compiler's symbol table. After the compiler
has finished parsing and has resolved identifiers, scope no longer
exists. The information may be carried forward to the PDB for debugging,
but that's the end of it.

GC reachability is the set of rules by which objects in a graph are
determined to be alive or eligible for collection. Reachability is
typically defined by (i) a set of object roots and (ii) the transitive
closure of objects referenced by these roots.

The point is that the set of object roots at a particular location in
compiled code does not necessarily correspond exactly with the variables
which are lexically in scope at that location in the original source
code.

A variable being lexically "in scope" does not imply that it is GC
reachable.

-- Barry

--
http://barrkel.blogspot.com/
Jul 8 '06 #45

Barry Kelly <ba***********@gmail.com> wrote:
One possible implementation: the C# compiler compiles to IL, and the JIT
produces the actual code. The IL contains ldloc and stloc for locals,
and thus the JIT can make a note of where the last use of a variable
occurs for each basic block. Hence it can produce tables which indicate
which stack locations / registers are valid roots for given instruction
pointer ranges.
There's another reason why it needs this info: so it can adjust all
pointers to relocated objects after a GC has just finished.

-- Barry

--
http://barrkel.blogspot.com/
Jul 8 '06 #46

Barry Kelly wrote:
Göran Andersson <gu***@guffa.com> wrote:
>But you yourself said that:

"Scope is a lexical concept that exists only at compile time."

I guess that's not really so, then.

I think you're confusing scope with GC reachability.

In compiler theory, the word "scope" is overloaded. It can either refer
to (i) the extent of source code for which the identifier is valid ("the
scope of a variable") or (ii) the set of identifiers which are valid for
the current position while parsing the source code ("the variable isn't
in scope").

It is implemented with the compiler's symbol table. After the compiler
has finished parsing and has resolved identifiers, scope no longer
exists. The information may be carried forward to the PDB for debugging,
but that's the end of it.

GC reachability is the set of rules by which objects in a graph are
determined to be alive or eligible for collection. Reachability is
typically defined by (i) a set of object roots and (ii) the transitive
closure of objects referenced by these roots.

The point is that the set of object roots at a particular location in
compiled code does not necessarily correspond exactly with the variables
which are lexically in scope at that location in the original source
code.

A variable being lexically "in scope" does not imply that it is GC
reachable.

-- Barry
No, I'm not at all confused about what scope is. It's a bit surprising
how much the GC knows about it, though. Even if it doesn't know the
"available" scope of the variables, it seems to know the "utilized"
scope, or the active lifetime of the variables (which may be shorter
than the physical lifetime).

Is there any information that supports the theory that the GC knows when
a reference is no longer reachable? Can we trust that it will always be
able to collect objects that won't be used?

Does the scope matter? Will there be a difference between:

for (int i=0; i<1000; i++) {
byte[] buffer = new byte[10000];
}

and:

byte[] buffer;
for (int i=0; i<1000; i++) {
buffer = new byte[10000];
}

Will it with certainty always be able to collect the previous buffer?
Will it never differ from this?

byte[] buffer;
for (int i=0; i<1000; i++) {
buffer = new byte[10000];
buffer = null;
}
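
One way to probe those variants is to keep a WeakReference to the
previous buffer and see whether a collection reclaims it. This is a
sketch, and only suggestive - the GC gives no hard guarantees about
timing - but in a release build all three variants normally behave the
same:

```csharp
using System;

class BufferLoop
{
    static void Main()
    {
        WeakReference previous = null;
        byte[] buffer = null;
        for (int i = 0; i < 5; i++)
        {
            buffer = new byte[10000];
            if (previous != null)
            {
                GC.Collect();
                // In a release build the old array is normally collected
                // here, whether or not buffer was explicitly nulled first.
                Console.WriteLine("old buffer alive: " + previous.IsAlive);
            }
            previous = new WeakReference(buffer);
        }
    }
}
```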
Jul 8 '06 #47

Göran Andersson <gu***@guffa.com> wrote:
Barry Kelly wrote:
Göran Andersson <gu***@guffa.com> wrote:
But you yourself said that:

"Scope is a lexical concept that exists only at compile time."

I guess that's not really so, then.
I think you're confusing scope with GC reachability.

No, I'm not at all confused about what scope is.
I don't understand how you could have quoted me in this context without
you being mistaken.
It's a bit surprising
how much the GC knows about it, though.
The GC doesn't know anything about scope. That's what I've been trying
to explain to you. The scope information is *LOST* after compile time.
The thing that the GC knows about IS NOT SCOPE.
it seems to know the "utilized"
scope, or the active lifetime of the variables (which may be shorter
than the physical lifetime).
Like I said in the other messages, the JIT needs this info (variable
lifetime - not scope) for enregistering and stack reuse, and so it
calculates it, and the GC needs this data for adjusting pointers after a
collection.
Is there any information that supports the theory that the GC knows when
a reference is no longer reachable?
The JIT can only detect the last use of a given variable definition (in
the Static Single Assignment (SSA) model of "variable definition"). It's
the JIT compiler that is doing the analysis, not the GC.

I recommend that you Google up on:

* Use-Definition chain, Definition-Use chain (ud-chain, du-chain)
* Static Single Assignment (SSA; a more modern approach)

Alternatively, you can look up use-def / def-use chains in the Dragon
book (Compilers: Principles, Techniques, and Tools, by Aho, Sethi &
Ullman).
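
The kind of last-use analysis described above can be pictured on a small fragment. The annotations below are hypothetical, added by hand to show what a def-use analysis would conclude (`ReadValue` is a made-up helper, not a real API):

```csharp
// Hypothetical hand-annotation of last uses, as a def-use (du-chain)
// analysis would see them. After a variable's last use, the JIT may
// reuse its register or stack slot, and any object it referenced
// loses that root regardless of lexical scope.
int a = ReadValue();      // definition of a
int b = a * 2;            // use of a -- last use: a is dead below here
int c = b + 1;            // use of b -- last use: b is dead below here
Console.WriteLine(c);     // use of c -- last use: c is dead below here
```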
Can we trust that it will always be
able to collect objects that won't be used?
No, it can only collect objects which aren't used. For example:

---8<---
object x = new object();
Halting_Problem(); // might not return
Console.WriteLine(x);
--->8---

The JIT clearly can't determine that x is dead at the point of calling
Halting_Problem(), so the GC can't collect x.
Does the scope matter?
THE SCOPE DOESN'T EXIST IN IL. The scope is GONE, GONE, GONE, ALL GONE,
after the C# compiler has produced IL. Use ILDASM to decompile an
assembly some time. You will notice that THERE IS NO SCOPE INFORMATION
in the dump. There is only a list of local variables per method.
Will there be a difference between:

for (int i=0; i<1000; i++) {
byte[] buffer = new byte[10000];
}

and:

byte[] buffer;
for (int i=0; i<1000; i++) {
buffer = new byte[10000];
}

Will it always, with certainty, be able to collect the previous buffer?
And will the behavior ever differ from this?
It's entirely implementation defined, based on how smart the JIT is at
recognizing that variables are no longer needed. It's a function of the
sophistication of the compiler. It's in the JIT's interest to discover
when variables are no longer needed, because that creates room for other
variables to be enregistered, or stack space minimized.
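
The flip side of this implementation-defined eagerness is that code sometimes needs an object to stay reachable *past* its last managed use (for example while unmanaged code still holds a handle into it). The documented way to pin down the lifetime is GC.KeepAlive; a minimal sketch, where `DoWorkThatDependsOn` is a hypothetical helper:

```csharp
// Sketch: forcing a lifetime the JIT would otherwise cut short.
using System;

class KeepAliveDemo
{
    static void Main()
    {
        object resource = new object();

        DoWorkThatDependsOn(resource);  // hypothetical helper

        // Without this call, the JIT is free to treat "resource" as
        // dead immediately after its last use above. GC.KeepAlive is
        // an opaque "use" that extends reachability to this point.
        GC.KeepAlive(resource);
    }

    static void DoWorkThatDependsOn(object o) { /* ... */ }
}
```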

-- Barry

--
http://barrkel.blogspot.com/
Jul 8 '06 #48

Barry Kelly wrote:
Göran Andersson <gu***@guffa.com> wrote:
>Barry Kelly wrote:
>>Göran Andersson <gu***@guffa.com> wrote:

But you yourself said that:

"Scope is a lexical concept that exists only at compile time."

I guess that's not really so, then.
I think you're confusing scope with GC reachability.
No, I'm not at all confused about what scope is.

I don't understand how you could have quoted me in this context without
you being mistaken.
No, I can see that. Hopefully it will dawn on you.
>It's a bit surprising
how much the GC knows about it, though.

The GC doesn't know anything about scope. That's what I've been trying
to explain to you. The scope information is *LOST* after compile time.
The thing that the GC knows about IS NOT SCOPE.
If you read more than just one sentence at a time, perhaps you would
understand what I am saying, instead of getting stuck on a single word.
>it seems to know the "utilized"
scope, or the active lifetime of the variables (which may be shorter
than the physical lifetime).

Like I said in the other messages, the JIT needs this info (variable
lifetime - not scope) for enregistering and stack reuse, and so it
calculates it, and the GC needs this data for adjusting pointers after a
collection.
No, it doesn't. It uses the same information for that as it did to
determine which objects can be collected. If it used different
information in the phases, it would mess up the references.
>Can we trust that it will always be
able to collect objects that won't be used?

No, it can only collect objects which aren't used. For example:

---8<---
object x = new object();
Halting_Problem(); // might not return
Console.WriteLine(x);
--->8---

The JIT clearly can't determine that x is dead at the point of calling
Halting_Problem(), so the GC can't collect x.
Well, that is obvious, isn't it?

Ok, let me rephrase the question a bit more precisely:

Can we trust that it will always be able to collect objects that cannot
possibly be used later in the execution?
>Does the scope matter?

THE SCOPE DOESN'T EXIST IN IL. The scope is GONE, GONE, GONE, ALL GONE,
after the C# compiler has produced IL. Use ILDASM to decompile an
assembly some time. You will notice that THERE IS NO SCOPE INFORMATION
in the dump. There is only a list of local variables per method.
Will you PLEASE STOP SHOUTING!

I wasn't asking if the scope was existing in the IL code. I was asking
if the scope mattered.

If you please try to look beyond your hangup on this word, maybe you
could try to understand the question?
Jul 9 '06 #49

Göran Andersson <gu***@guffa.com> wrote:

<snip>
Ok, let me rephrase the question a bit more precisely:

Can we trust that it will always be able to collect objects that cannot
possibly be used later in the execution?
On the current implementations, in release mode? I believe so, in
simple cases. The JIT doesn't do complex analysis, so if you had:

bool first=true;

object bigObject = (...);

for (int i=0; i < 100000; i++)
{
if (first)
{
useObject (bigObject);
first = false;
}

// Code not using bigObject
}

then the JIT wouldn't work out that first could never become true after
the first iteration and bigObject would therefore never be used after
that point. That's one of the few situations where it might make sense
to set a local variable to null.
I looked at the CLR spec and I found *very* little about garbage
collection. No guarantees about this kind of thing at all. Normally I'm
a spec hound in terms of only coding to the spec, but the ramifications
of only trusting to the spec in this case are so horrible that I
believe it makes more sense to go with what happens in reality.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Jul 9 '06 #50
