Boxing and Unboxing ??

According to Troelsen in "C# and the .NET Platform"
"Boxing can be formally defined as the process of explicitly converting a value
type into a corresponding reference type."

I think that my biggest problem with this process is that the terms "value type"
and "reference type" mean something entirely different than what they mean on
every other platform in every other language. Normally a value type is the
actual data itself stored in memory (such as an integer), and a reference type
is simply the address of this data.

It seems that .NET has made at least one of these two terms mean something
entirely different. Can someone please give me a quick overview of what the
terms "value type" and "reference type" actually mean in terms of their
underlying architecture?
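To make the distinction concrete, here is a minimal C# sketch (type names are illustrative): assigning a value type copies the data itself, while assigning a reference type copies only the reference, so both variables name the same object.

```csharp
using System;

// A value type: assignment copies the bits.
struct PointStruct { public int X; }

// A reference type: assignment copies only the reference (the address).
class PointClass { public int X; }

class Demo
{
    static void Main()
    {
        var s1 = new PointStruct { X = 1 };
        var s2 = s1;              // full copy of the data
        s2.X = 99;
        Console.WriteLine(s1.X);  // 1 - s1 is unaffected

        var c1 = new PointClass { X = 1 };
        var c2 = c1;              // copies the reference; both name the same object
        c2.X = 99;
        Console.WriteLine(c1.X);  // 99 - c1 sees the change
    }
}
```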
Jan 12 '07
"Ignacio Machin \( .NET/ C# MVP \)" <machin TA laceupsolutions.com>
wrote:
> | It is comparable to the slight extra overhead that polymorphism requires
> | yet providing much more versatile code.
>
> Not at all, they are two very different things.

I don't agree with this statement - polymorphism is exactly what boxing
is about. It allows an int, a "fundamental" type in most statically
typed languages, to be stored polymorphically in a location of type
object.
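A short C# illustration of that point - the box is what lets an int sit in a location typed as object:

```csharp
using System;

class BoxingDemo
{
    static void Main()
    {
        int i = 42;
        object o = i;                    // boxing: the int is copied into a heap object
        int j = (int)o;                  // unboxing: an explicit cast copies it back out
        Console.WriteLine(o.GetType()); // System.Int32
        Console.WriteLine(j);            // 42

        // The box is what allows an int to be stored polymorphically
        // alongside other types in a location of type object.
        object[] mixed = { 1, "two", 3.0 };
        foreach (object item in mixed)
            Console.WriteLine(item);
    }
}
```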

-- Barry

--
http://barrkel.blogspot.com/
Jan 17 '07 #151

"Bruce Wood" <br*******@canada.com> wrote in message
news:11**********************@a75g2000cwd.googlegroups.com...
>
Peter Olcott wrote:
>"Bruce Wood" <br*******@canada.com> wrote in message
news:11**********************@a75g2000cwd.googlegroups.com...
>In order to improve computer language design one must look for ways to reduce
the number of details that the programmer must keep track of. The distinction
between value types and reference types as separate types is one detail that
might be able to be eliminated.

Perhaps, but the only ways I can think of to eliminate this detail
create even more ugly details, or result in horrid inefficiencies. I
can't for the life of me figure out how to unify the C# type system
(and thus eliminate the value / reference distinction and the need to
pay any attention to it at all) without introducing far more details
for the programmer to keep track of. I freely admit that this may be a
lack of imagination on my part. :-)
These details might still exist under the covers; I am only proposing that the
distinction between value type and reference type be made entirely transparent
to the C# programmer.
>
>It might be able to be eliminated with no degradation of performance in
terms of increases in either time or space (CPU cycles or RAM).

AND while making things better (simpler) for the programmer, not worse.

Jan 17 '07 #152

"Barry Kelly" <ba***********@gmail.com> wrote in message
news:e6********************************@4ax.com...
"Ignacio Machin \( .NET/ C# MVP \)" <machin TA laceupsolutions.com>
wrote:
> > | It is comparable to the slight extra overhead that polymorphism requires
> > | yet providing much more versatile code.
> >
> > Not at all, they are two very different things.
>
> I don't agree with this statement - polymorphism is exactly what boxing
> is about. It allows an int, a "fundamental" type in most statically
> typed languages, to be stored polymorphically in a location of type
> object.
If you simply keep everything in a box, then there is no boxing and unboxing
overhead, merely boxing initialization. Small value types such as integer and
double can have the functions that use them serve as their box.
>
-- Barry

--
http://barrkel.blogspot.com/

Jan 17 '07 #153

Peter Olcott wrote:
"Bruce Wood" <br*******@canada.com> wrote in message
news:11**********************@51g2000cwl.googlegroups.com...
OK... this discussion is descending into silly territory. Perhaps this
will help. I make the following claims.

1. The price for the unified type model and additional expressive power
of C++ (where everything is a value unless you "manually" take a
reference to it) is that you are forced to take careful note of lots of
picky details. In particular, you have to litter your code with & and
*, and _know when to do so_ and when not to.

I am not recommending making C# more like C++, just the opposite. I am
recommending that C# reduce its complexity even more, and do this in a way that
neither reduces speed nor increases space very much. Polymorphism both reduces
speed and increases space, yet the benefits far outweigh the cost, because the
benefits are large and the cost is small.
Yes, but HOW? I can't for the life of me see how, or even see a way to
approach the problem. I've thought of only three alternatives so far,
and in all cases the cure is worse than the disease, as it were:

1. Unify in favour of reference types. Everything is (or appears to be)
a reference type. If everything really is a reference type, then
performance goes in the crapper as every int, bool, and double goes
onto the heap and requires a pointer dereference. Whether everything
appears to be or really is a reference type, whenever you want to pass
a value (like an int) to a method by value, you have to say something
special, like "val" (as opposed to the current "ref" which would become
the default). This would litter your code with "val" markers on method
headers, and if you forgot one, you might hose your caller. Some
languages take this approach (FORTRAN did, I'm not sure about
Smalltalk). It doesn't strike me as helping matters at all.

2. Unify in favour of value types. This is effectively what C++ does.
The problem is that then _almost_ every time I pass an object to a
method I have to remember to say "ref". This just ends up peppering my
method signatures with "ref" all over the place, and if I forget to add
a "ref" then I end up with a horribly inefficient call as some monster
instance is loaded onto the stack. Guess what every newbie out there
will be doing? All this gives me is a bunch more busywork (saying "ref"
all over the place) for no gain that I can see, other than the ability
to pass an object on the stack that one time out of 100 when it's
really what I want.

3. Keep the distinction, but hide it somehow. Besides not really
understanding how this would work, it still doesn't help me, because
the distinction is _important_. It really does matter which semantics I
choose for a type, just as it matters in C++ whether I choose to pass
an object/value by value or by reference. It deeply affects my program
and how it works. It matters a lot whether an assignment gives me a
copy or a reference to the same object instance. I'm not sure how to
abstract that away.

As I said, I freely admit that this may be lack of imagination on my
part. Feel free to propose another solution.

What would this "abstracting away" look like?

2. Far from liberating the programmer from "having to worry about value
versus reference types", C++ throws it right in your face and forces
you to deal with it in almost every line of code. By contrast, C# does
something conceptually ugly but practically beautiful. By dividing
types into value types and reference types, C# forces you to make the
choice once, up front. From then on the language normally does what you
would expect with the type.

3. C# is simpler to code in for what appears to be a trivial reason,
but turns out to be pivotal: in order to get the behaviour you want,
you don't have to say anything. You don't have to say &, or *, or ref,
or out. You just declare a variable, work with it, pass it to methods,
and everything works the way you would like it to. As with all
heuristic rules, this one is not absolute. Sometimes the language
_doesn't_ do what you want by default, and you have to say "ref" or
"out". However, look around the Framework classes and see how often you
see those two keywords. They're very, very rarely needed.

4. Point #3 means that C# is easier to learn for newbies. One just
writes code, and by and large it does "the right thing" without any
extra tweaking. The same cannot be said for C++. I was amazed in a
previous job by how many people were guessing at when to use & and when
to pass a variable straight into a method, and when they had to
dereference with * and when they didn't. These were experienced
programmers. The only time this happens in C# is in that
one-in-a-hundred case in which you need ref, out, or a clone.

5. Far from "improving programmer productivity," unifying the type
model in C# in the style of C++ would reduce programmer productivity
and increase programmer confusion. I saw it happen in C and C++. I see
no reason why the same change in C# would not produce the same results.

6. There is a price for the greater ease of use of C#: there are some
occasionally useful C++ idioms that simply can't be done in C#. One
that comes to mind is deciding to pass an object instance on the stack.
Another is storing a reference to an arbitrary variable for later
update. C# is certainly less powerful than C++. However, I think that
the C# team has done an excellent job of eliminating power where it's
the kind of power that usually gets you into big trouble and is only
occasionally legitimately useful. Others may disagree.
Jan 17 '07 #154
Peter Olcott <pe********@yahoo.com> wrote:

<snip>
These details might still exist under the covers, I am only proposing that the
distinction between value type and reference type be made entirely transparent
to the C# programmer.
If the distinction is entirely transparent, how am I meant to say that
I want one thing to be treated with value semantics and another with
reference semantics?

Note that when it comes to parameter passing, there are 4 options
(leaving "out" aside for the moment):

1) Pass value type argument by value
2) Pass reference type argument by value
3) Pass value type argument by reference
4) Pass reference type argument by reference

2) and 3) are quite similar (although not the same), but the others are
very different. In other words, I believe there are more semantics (all
of which are useful in some situations) than your proposal allows.
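The four combinations can be sketched in C# (type and method names are illustrative):

```csharp
using System;

struct Big { public int N; }   // a value type
class Node { public int N; }   // a reference type

class PassingDemo
{
    // 1) Value type by value: the callee gets a private copy.
    static void F1(Big b)      { b.N = 99; }
    // 2) Reference type by value: the reference is copied, the object is shared.
    static void F2(Node n)     { n.N = 99; }
    // 3) Value type by reference: the callee can change the caller's variable.
    static void F3(ref Big b)  { b.N = 99; }
    // 4) Reference type by reference: the callee can even rebind the caller's
    //    variable to a different object.
    static void F4(ref Node n) { n = new Node { N = 77 }; }

    static void Main()
    {
        var b = new Big { N = 1 };   F1(b);       Console.WriteLine(b.N);  // 1
        var n = new Node { N = 1 };  F2(n);       Console.WriteLine(n.N);  // 99
        var b2 = new Big { N = 1 };  F3(ref b2);  Console.WriteLine(b2.N); // 99
        var n2 = new Node { N = 1 }; F4(ref n2);  Console.WriteLine(n2.N); // 77
    }
}
```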

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Jan 17 '07 #155

"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com...
Peter Olcott <pe********@yahoo.com> wrote:

<snip>
>These details might still exist under the covers, I am only proposing that the
distinction between value type and reference type be made entirely transparent
to the C# programmer.

If the distinction is entirely transparent, how am I meant to say that
I want one thing to be treated with value semantics and another with
reference semantics?

Note that when it comes to parameter passing, there are 4 options
(leaving "out" aside for the moment):

1) Pass value type argument by value
2) Pass reference type argument by value
3) Pass value type argument by reference
4) Pass reference type argument by reference
Pass almost everything by reference except items that are [int]-sized or
smaller and do not need to be changed by the called function. Large items that
need to be protected from change would be passed by reference using the [in]
parameter qualifier, indicating that they are read-only. By [passing by
reference] I mean only that the machine address of the data is passed, not the
actual data itself.
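For what it's worth, later versions of C# (7.2 onward, well after this thread) added an `in` parameter modifier that is close to this proposal: the argument is passed by reference, but the callee may not modify it. A sketch, with an illustrative struct:

```csharp
using System;

// An illustrative "large" value type that would be wasteful to copy.
struct Matrix4x4ish
{
    public double M00, M01, M02, M03;
    public double M10, M11, M12, M13;
}

class InDemo
{
    // 'in' passes the struct by reference while forbidding modification,
    // roughly the read-only [in] qualifier proposed above.
    static double TopLeft(in Matrix4x4ish m)
    {
        // m.M00 = 5;  // compile error: cannot assign to an 'in' parameter
        return m.M00;
    }

    static void Main()
    {
        var m = new Matrix4x4ish { M00 = 3.5 };
        Console.WriteLine(TopLeft(in m) == 3.5);  // True - and no copy is made
    }
}
```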
>
2) and 3) are quite similar (although not the same), but the others are
very different. In other words, I believe there are more semantics (all
of which are useful in some situations) than your proposal allows.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Jan 17 '07 #156

"Bruce Wood" <br*******@canada.com> wrote in message
news:11**********************@s34g2000cwa.googlegroups.com...
>
Peter Olcott wrote:
>"Bruce Wood" <br*******@canada.com> wrote in message
news:11**********************@51g2000cwl.googlegroups.com...
OK.. this discussion is descending into silly territory. Perhaps this
will help. I make the following claims.

1. The price for the unified type model and additional expressive power
of C++ (where everything is a value unless you "manually" take a
reference to it) is that you are forced to take careful note of lots of
picky details. In particular, you have to litter your code with & and
*, and _know when to do so_ and when not to.

I am not recommending making C# more like C++, just the opposite. I am
recommending that C# reduce its complexity even more, and do this in a way
that neither reduces speed nor increases space very much. Polymorphism both
reduces speed and increases space, yet the benefits far outweigh the cost,
because the benefits are large and the cost is small.

Yes, but HOW? I can't for the life of me see how, or even see a way to
approach the problem. I've thought of only three alternatives so far,
and in all cases the cure is worse than the disease, as it were:

1. Unify in favour of reference types. Everything is (or appears to be)
a reference type. If everything really is a reference type, then
performance goes in the crapper as every int, bool, and double goes
onto the heap and requires a pointer dereference. Whether everything
A pointer dereference is not expensive. I just benchmarked it at only 16% more
total time, in a tight loop.
appears to be or really is a reference type, whenever you want to pass
a value (like an int) to a method by value, you have to say something
special, like "val" (as opposed to the current "ref" which would become
There is no [val] or [ref]; the idea is to remove these concepts from the
language domain. In their place are [in] (input, read-only), [out] (output,
write-only) and [io] (input/output, read/write).
the default). This would litter your code with "val" markers on method
headers, and if you forgot one, you might hose your caller. Some
languages take this approach (FORTRAN did, I'm not sure about
Smalltalk). It doesn't strike me as helping matters at all.

2. Unify in favour of value types. This is effectively what C++ does.
The problem is that then _almost_ every time I pass an object to a
method I have to remember to say "ref". This just ends up peppering my
method signatures with "ref" all over the place, and if I forget to add
a "ref" then I end up with a horribly inefficient call as some monster
instance is loaded onto the stack. Guess what every newbie out there
will be doing? All this gives me is a bunch more busywork (saying "ref"
all over the place) for no gain that I can see, other than the ability
to pass an object on the stack that one time out of 100 when it's
really what I want.

3. Keep the distinction, but hide it somehow. Besides not really
understanding how this would work, it still doesn't help me, because
the distinction is _important_. It really does matter which semantics I
choose for a type, just as it matters in C++ whether I choose to pass
an object/value by value or by reference. It deeply affects my program
and how it works. It matters a lot whether an assignment gives me a
copy or a reference to the same object instance. I'm not sure how to
abstract that away.

As I said, I freely admit that this may be lack of imagination on my
part. Feel free to propose another solution.

What would this "abstracting away" look like?
>[rest of quoted message #154 snipped]
Jan 17 '07 #157
Peter Olcott <No****@SeeScreen.com> wrote:

<snip>
1) Pass value type argument by value
2) Pass reference type argument by value
3) Pass value type argument by reference
4) Pass reference type argument by reference

Pass most everything by reference except items that are [int] or smaller and do
not need to be changed by the called function. Large items that need to be
protected from change would be passed by reference using the [in] parameter
qualifier indicating that they are read-only. When I am referring to the term
[passing by reference] I am only referring to the fact that the machine address
of the data is passed, and not the actual data itself.
Right. So how do I differentiate a method which changes the contents of
the "object I pass in" and a method which changes the value of the
variable to refer to a completely different object? They are different
semantics, and both are useful at times. How does your scheme allow
them to be differentiated?
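Concretely, the two semantics Jon is distinguishing look like this in C# (names are illustrative):

```csharp
using System;

class Person { public string Name = "Alice"; }

class RefDemo
{
    // Mutates the object that the caller's reference points at.
    static void Rename(Person p) { p.Name = "Bob"; }

    // Rebinds the caller's variable to a completely different object.
    static void Replace(ref Person p) { p = new Person { Name = "Carol" }; }

    static void Main()
    {
        var a = new Person();
        Rename(a);
        Console.WriteLine(a.Name);  // Bob - same object, contents changed

        Replace(ref a);
        Console.WriteLine(a.Name);  // Carol - variable now names a new object
    }
}
```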

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Jan 17 '07 #158
Peter Olcott wrote:
"Bruce Wood" <br*******@canada.com> wrote in message
news:11**********************@s34g2000cwa.googlegroups.com...
[...]
1. Unify in favour of reference types. Everything is (or appears to be)
a reference type. If everything really is a reference type, then
performance goes in the crapper as every int, bool, and double goes
onto the heap and requires a pointer dereference. Whether everything

A pointer dereference is not expensive. I just benchmarked it at only 16% more
total time, in a tight loop.
First, "only 16%" is quite a significant performance hit for a feature
of questionable usefulness. The performance hit from virtual methods,
interfaces, delegates, etc. is at least one that you only have to take
when you use those features. This one would affect every single
operation.

Also, your benchmark ignores the effects of all these pointers on cache
performance, as well as the additional work the GC would have to
perform if *everything* were referred to by reference, increasing the
number of pointers and heap objects in the average program by a factor
of... ten or more?

Jesse

Jan 17 '07 #159
Peter Olcott wrote:

Peter, what you wrote doesn't make sense. It implies either you don't
know what boxing means, or you don't know what a reference type is, or
both.
> If you simply keep everything in a box
Boxes are on the heap. They are on the heap to avoid lifetime issues. If
they were not on the heap, and were instead on the stack for locals,
then one could store such a local in a global structure and violate
memory safety. For example, permitting that would permit the following
(in C for your ease of understanding):

---8<---
#include <stdio.h>

static int *value;

void store(int *x)
{
    value = x;
}

int *retrieve(void)
{
    return value;
}

void do_store(void)
{
    int x = 42;  // here's my local
    store(&x);   // here I am passing it by reference (boxed)
}

void recurse(int count)
{
    if (count > 0)
        recurse(count - 1);
}

int main(void)
{
    do_store();
    recurse(10);                 // trashing stack
    printf("%d\n", *retrieve()); // whups! CORRUPTED!
    return 0;
}
--->8---

We don't store fundamental types like 'int' on the heap for performance
reasons. That's why they are value types. Value types are usually copied
instead of passed by reference. When they are passed by reference (via
'ref'), then their usefulness is severely constrained in order to avoid
the above problem (demonstrated in the C program).
>, then there is no boxing and unboxing overhead, merely boxing
initialization. Small value types such as integer and double can have the
functions that use them serve as their box.
Functions cannot serve as a box. Functions are code. Boxes are objects
allocated on the heap. Functions are not mutable objects. You can't
store values inside functions.

-- Barry

--
http://barrkel.blogspot.com/
Jan 17 '07 #160
Peter Olcott wrote:
> A pointer dereference is not expensive. I just benchmarked it at only 16%
> more total time, in a tight loop.
A pointer dereference could potentially take seconds, if the page being
pointed to has been paged out by the OS. The memory hierarchy, and costs
of virtual memory lookup if the page isn't in the TLB, are substantial.
Of course you don't see them in a tight loop because all the caches etc.
aren't missing in that case. But the costs are substantial when missed.

-- Barry

--
http://barrkel.blogspot.com/
Jan 17 '07 #161
Bruce Wood wrote:
> By the way, C# has no concept of "friend", and there's really no way to
> fake it.
InternalsVisibleToAttribute has some similarities to friend.
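A minimal sketch of the attribute Arne mentions (assembly names are illustrative). It is a coarser, assembly-level analogue of C++ `friend`: instead of befriending a single class, a library opens its `internal` members to one named assembly.

```csharp
// In the library assembly (e.g. MyLib) - AssemblyInfo.cs or any source file:
using System.Runtime.CompilerServices;

// Members marked 'internal' become visible to code compiled into the
// assembly named MyFriendAssembly (commonly a unit-test project).
[assembly: InternalsVisibleTo("MyFriendAssembly")]

internal class Hidden
{
    internal static int Secret() => 42;
}
```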

Arne
Jan 28 '07 #162
