Boxing and Unboxing ??

Peter Olcott

According to Troelsen in "C# and the .NET Platform"
"Boxing can be formally defined as the process of explicitly converting a value
type into a corresponding reference type."

I think that my biggest problem with this process is that the terms "value type"
and "reference type" mean something entirely different than what they mean on
every other platform in every other language. Normally a value type is the
actual data itself stored in memory (such as an integer), and a reference type
is simply the address of this data.

It seems that .NET has made at least one of these two terms mean something
entirely different. Can someone please give me a quick overview of what the
terms "value type" and "reference type" actually mean in terms of their
underlying architecture?

Jan 12 '07 #1

Bob Graham

Value types are stored on the "Stack" and go away, as it were, immediately
when they go out of scope. Mostly numeric types and structs.
Reference types are stored on the "Heap" and are garbage collected when
the system feels like it. References to ref types are passed normally as
a pointer to the address. Value types are passed a copy of the value.
I'm sure someone with more years under their Microsoft belt will chime
in here with a more explicit and concise answer, but this is basically how
it is.
Bob

Jan 12 '07 #2

Peter Olcott

What I am looking for is all of the extra steps that form what is referred to as
boxing and unboxing. In C/C++ converting a value type to a reference type is a
very simple operation and I don't think that there are any runtime steps at all.
All the steps are done at compile time. Likewise for converting a reference type
to a value type.

In C/C++:
    int X = 56;
    int *Y = &X;
Now both X and *Y hold 56, and Y is a reference to X.

Jan 12 '07 #3

Bob Graham

From Troelsen's Professional C#:

"Given that .NET defines two major categories of types (value based and
reference based), you may occasionally need to represent a variable of
one category as a variable of the other category. C# provides a very simple
mechanism, termed boxing, to convert a value type to a reference type.
Assume that you have created a variable of type short:

    // Make a short value type.
    short s = 25;

If, during the course of your application, you wish to represent this
value type as a reference type, you would "box" the value as follows:

    // Box the value into an object reference.
    object objShort = s;

Boxing can be formally defined as the process of explicitly converting
a value type into a corresponding reference type by storing the variable
in a System.Object. When you box a value, the CLR allocates a new object
on the heap and copies the value type's value (in this case, 25) into that
instance. What is returned to you is a reference to the newly allocated
object. Using this technique, .NET developers have no need to make use
of a set of wrapper classes used to temporarily treat stack data as
heap-allocated objects.

The opposite operation is also permitted through unboxing. Unboxing
is the process of converting the value held in the object reference back
into a corresponding value type on the stack. The unboxing operation begins
by verifying that the receiving data type is equivalent to the boxed type,
and if so, it copies the value back into a local stack-based variable.
For example, the following unboxing operation works successfully, given
that the underlying type of the objShort is indeed a short (you'll examine
the C# casting operator in detail in the next chapter, so hold tight for
now):

    // Unbox the reference back into a corresponding short.
    short anotherShort = (short)objShort;"

I'll stop there due to my distaste for violating copyrights. You may want
to pick up this book for your language jump. It's more about the language
and makes a lot of comparisons to C/C++ and Java.
Bob
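
To make the type check in the quoted passage concrete, here is a minimal sketch of my own (not from the book). The class and method names are made up for illustration. It boxes a short, unboxes it back, and then shows that unboxing the same box as an int fails at runtime, because the boxed type is short and not int.

```csharp
using System;

public static class BoxingDemo
{
    // Boxes a short and unboxes it again; returns the round-tripped value.
    public static short RoundTrip(short s)
    {
        object boxed = s;       // boxing: a new heap object receives a copy of s
        return (short)boxed;    // unboxing: type check, then copy back out
    }

    // Returns true if unboxing a boxed short as an int throws.
    public static bool UnboxAsIntThrows(short s)
    {
        object boxed = s;
        try
        {
            int i = (int)boxed; // the box holds a short, so this cast fails
            Console.WriteLine(i);
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(RoundTrip(25));        // 25
        Console.WriteLine(UnboxAsIntThrows(25)); // True
    }
}
```

If you need an int out of that box, unbox to the exact type first and convert afterwards: (int)(short)boxed.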

Jan 12 '07 #4

Bob Graham

But Generics are a more powerful alternative that you may want to read
up on. They get rid of boxing and unboxing penalties.
Bob
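
As a rough illustration of the generics point, the sketch below sums the same numbers twice: once through the non-generic ArrayList, where every int is boxed on the way in and unboxed on the way out, and once through List<int>, which stores the ints directly. The class and method names are invented for this example, and it makes no claim about how large the penalty is in practice.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class GenericsDemo
{
    // ArrayList stores object references, so each int is boxed by Add
    // and has to be unboxed with a cast when it is read back.
    public static int SumBoxed(int count)
    {
        ArrayList list = new ArrayList();
        for (int i = 0; i < count; i++)
        {
            list.Add(i);            // boxing
        }
        int sum = 0;
        foreach (object o in list)
        {
            sum += (int)o;          // unboxing
        }
        return sum;
    }

    // List<int> stores the ints themselves: no boxing, no casts.
    public static int SumGeneric(int count)
    {
        List<int> list = new List<int>();
        for (int i = 0; i < count; i++)
        {
            list.Add(i);
        }
        int sum = 0;
        foreach (int n in list)
        {
            sum += n;
        }
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(SumBoxed(10));    // 45
        Console.WriteLine(SumGeneric(10));  // 45
    }
}
```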

Jan 12 '07 #5

Dave Sexton

Hi Bob,

> Value types are stored on the "Stack" and go away, as it were, immediately
> when they go out of scope.

They can also be stored on the heap when they are fields of an object, for
instance. I like to think of value types as being in-line in terms of
memory. In other words, they can live anywhere since it's their "value"
that's important. On the contrary, reference types must live somewhere
where their "reference" can be used - the heap in .NET.

> Mostly numeric types and structs.

If you include enums then you've named them all, although they are all
really structures (structs, if you want to use the term loosely). A value
type in the .NET framework is any object that derives from
System.ValueType. The C# compiler, though, requires you to specify the
struct keyword instead of class, but that just means your class derives from
System.ValueType.
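
A small sketch of both points, using made-up type names (SamplePoint, SampleColor, SampleHolder): a struct field lives in-line inside whatever contains it, including an object on the heap, and reflection reports that structs, enums and the built-in numeric types all derive from System.ValueType while classes do not.

```csharp
using System;

public struct SamplePoint { public int X; public int Y; }
public enum SampleColor { Red, Green }

// Position is stored in-line inside each SampleHolder object, so its
// X and Y live on the heap even though SamplePoint is a value type.
public class SampleHolder
{
    public SamplePoint Position;
}

public static class ValueTypeDemo
{
    public static bool DerivesFromValueType(Type t)
    {
        return t.IsSubclassOf(typeof(ValueType));
    }

    public static int HeldX()
    {
        SampleHolder holder = new SampleHolder();
        holder.Position.X = 5;      // writes directly into the heap object
        return holder.Position.X;
    }

    public static void Main()
    {
        Console.WriteLine(DerivesFromValueType(typeof(int)));          // True
        Console.WriteLine(DerivesFromValueType(typeof(SamplePoint)));  // True
        Console.WriteLine(DerivesFromValueType(typeof(SampleColor)));  // True
        Console.WriteLine(DerivesFromValueType(typeof(SampleHolder))); // False
        Console.WriteLine(HeldX());                                    // 5
    }
}
```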

<snip>

--
Dave Sexton
http://davesexton.com/blog

Jan 13 '07 #6

Dave Sexton

Hi Peter,

Your definitions are correct even in .NET. The real difference between the
framework and some of the other platforms you may be accustomed to is in the
management of memory, i.e., garbage collection.

--
Dave Sexton
http://davesexton.com/blog

Jan 13 '07 #7

Peter Olcott

So a reference type is not anything at all like what the term "reference type"
means everywhere outside of the .NET architecture. They probably should have
chosen different names, such as Managed Heap Type and Stack Type; this would
have been far less misleading.

What I really want to see is the underlying architecture of Managed Heap Type
and Stack Type. In particular, is there a whole lot of extra baggage for this
"value type" (Stack Type), as there seems to be for the Managed Heap Type
(reference type)?

Jan 13 '07 #8

Peter Olcott

So with Generics, do boxing and unboxing become obsolete?

Jan 13 '07 #9

Peter Olcott

Dave Sexton wrote:
> Your definitions are correct even in .NET. The real difference between the
> framework and some of the other platforms you may be accustomed to is in the
> management of memory, i.e., garbage collection.

It seems that .NET adds a whole lot of extra baggage to these otherwise very
simple terms.
int X = 56; // refers to 56 (value type)
int* Y = &X; // Y refers to the address of 56 (reference type)
That is all there is to it, no runtime cost involved at all, no complex
underlying infrastructure.

Jan 13 '07 #10

Jesse McGrew

Well, if you're familiar with Delphi or Java, you've already seen
reference types. Class instances in those languages are always stored
as pointers to data on the heap, just like reference types in .NET, and
when you access an object's fields, you're implicitly dereferencing the
pointer. In Delphi, records are equivalent to value types; in Java,
primitives like int and double are.

A value type is a type that's normally passed by value, and whose
contents *can* (but don't have to) live on the stack. A reference type
is always passed by reference, and always lives on the heap. A variable
of a value type takes up the entire size of the type, and assigning one
such variable to another copies the contents; a variable of a reference
type only takes up the size of a pointer, and assigning one to another
simply makes both variables point to the same data.
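
The assignment behaviour described above can be sketched like this. ValuePoint and RefPoint are hypothetical types written just for this example: one is a struct, the other a class with the same single field.

```csharp
using System;

public struct ValuePoint { public int X; }
public class RefPoint { public int X; }

public static class AssignmentDemo
{
    // Assigning a struct copies its contents, so changing the copy
    // leaves the original untouched.
    public static int CopyThenMutateStruct()
    {
        ValuePoint a = new ValuePoint();
        a.X = 1;
        ValuePoint b = a;   // b is an independent copy
        b.X = 2;
        return a.X;
    }

    // Assigning a class variable copies only the reference, so both
    // variables point at the same object on the heap.
    public static int AliasThenMutateClass()
    {
        RefPoint a = new RefPoint();
        a.X = 1;
        RefPoint b = a;     // b refers to the same object as a
        b.X = 2;
        return a.X;
    }

    public static void Main()
    {
        Console.WriteLine(CopyThenMutateStruct());  // 1
        Console.WriteLine(AliasThenMutateClass());  // 2
    }
}
```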

Boxing means copying a value type onto the heap, along with some type
information, so that it can be used like any other instance of
System.Object. This is because even though all types in .NET derive
from System.Object (a reference type), value types are stored
differently. To keep polymorphism and garbage collection working, the
data has to be copied at runtime, because you can't just use a pointer
to a value type on the stack as a managed reference - for example, you
might store that pointer in a global variable, where it would have to
live on after the function returns and its stack frame is destroyed.

Unboxing is the reverse - copying the contents of a boxed value type
(from the heap) back onto the stack so you can work with it in its
usual form.
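
A short sketch of why a box is not the same thing as a C pointer to the original variable (the names here are invented for the example): the box holds its own copy, so later changes to the local do not show through it, and boxing the same variable twice produces two separate heap objects.

```csharp
using System;

public static class BoxCopyDemo
{
    // The box is a snapshot of x taken at the moment of boxing.
    public static int BoxedValueAfterOriginalChanges()
    {
        int x = 56;
        object boxed = x;   // copies 56 into a new heap object
        x = 99;             // changes only the local variable
        return (int)boxed;
    }

    // Each boxing operation allocates its own object.
    public static bool TwoBoxesShareIdentity()
    {
        int x = 56;
        object first = x;
        object second = x;
        return object.ReferenceEquals(first, second);
    }

    public static void Main()
    {
        Console.WriteLine(BoxedValueAfterOriginalChanges()); // 56
        Console.WriteLine(TwoBoxesShareIdentity());          // False
    }
}
```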

Jesse

Jan 13 '07 #11

Peter Olcott

Well that is a little more clear now, thanks. So the "value types" have less
baggage? I try to understand these things in the same way that I understand
their equivalents in C and C++. I try to understand them in terms of the
underlying machine operations in assembly language.

With .NET this is a little trickier because it has another layer in-between, and
does not seem to be able to directly expose the actual platform specific
assembly language of what it is doing. In C or C++ I simply tell the compiler to
output assembly language, then I can see everything.

Jan 13 '07 #12

Bruce Wood

Peter Olcott wrote:
> It seems that .NET adds a whole lot of extra baggage to these otherwise very
> simple terms.
> int X = 56; // refers to 56 (value type)
> int* Y = &X; // Y refers to the address of 56 (reference type)
> That is all there is to it, no runtime cost involved at all, no complex
> underlying infrastructure.

Yes, but you're comparing apples to oranges.

One of the explicit goals of C# (and Java) is to disallow the kind of
pointer aliasing that your example demonstrates, and all of the
security issues that that implies. In C# (and Java) you can't just
"take the address of" something. There is no "&" operator in either
language (unless, in C#, you resort to "unsafe" code).

Both languages are garbage collected, and both languages prevent us
(the programmers) from arbitrarily messing with memory.

This means that in both languages, you can't just take the address of a
value type (like your int X) and treat that as a reference type. If you
want to treat a value type as an object (a reference type), the runtime
must box it into a structure on the heap, like all other objects, and
then you can have a reference to it.

In brief, C# does _not_ allow you the same kind of low-level control
that C++ does. If you move from C++ to C# you lose expressive power. On
the other hand, you also lose a lot of constructs that allow you to
royally hose yourself. Using your example, you can't return the pointer
Y from a function and then later use that pointer into a
no-longer-valid part of the stack to hammer whatever might be there. No
can do in C# and Java, because neither language allows you to take the
address of an arbitrary variable.

C# is much more like Java than it is like C++, IMHO, which doesn't mean
that comparisons can't be made between C# and C++... just that many of
the concepts don't match up precisely.

Jan 13 '07 #13

Jesse McGrew

Peter Olcott wrote:
> Well that is a little more clear now, thanks. So the "value types" have less
> baggage? I try to understand these things in the same way that I understand
> their equivalents in C and C++. I try to understand them in terms of the
> underlying machine operations in assembly language.

Value types do have less baggage (in their unboxed form). For example,
int is a value type - you wouldn't want to have to dereference
pointers, call methods, etc. every time you added or compared two
integers. But they also have less functionality, because you can only
take full advantage of inheritance and polymorphism when you're using
reference types, just like you can only do it with pointers and
references in C++. You need reference types to get that kind of OOP
behavior, as well as to implement structures like trees and lists.

Take the following C# definitions:

struct Value {
public int foo;
}

class Ref {
public int bar;
}

Value my_val;
Ref my_ref = new Ref(); // a bare "Ref my_ref;" would not refer to any object yet

my_val.foo = my_ref.bar = 0;

The equivalent in C++ would be:

class Value {
public:
int foo;
};

class Ref {
public:
int bar;
};

Value my_val;
Ref * my_ref = new Ref; // the pointer must point at an object before it is used

my_val.foo = my_ref->bar = 0;

Every time you declare a variable or field of the type Ref, you're
really declaring a pointer; and when you call its methods or access its
fields, you still write "." in C#, but it works like "->".

With .NET this is a little trickier because it has another layer in-between, and
does not seem to be able to directly expose the actual platform specific
assembly language of what it is doing. In C or C++ I simply tell the compiler to
output assembly language, then I can see everything.

You can view the assembly code in Visual Studio 2005. Run the program,
hit pause to break into the debugger, then right-click on a source line
and choose "Go to Disassembly".

Jesse

Jan 13 '07 #14

Jon Skeet [C# MVP]

Bob Graham <rv************************@sbcglobal.net> wrote:

Value types are stored on the "Stack" and go away, as it were, immediately
when they go out of scope.

Value types aren't always stored on the stack.

See http://www.pobox.com/~skeet/csharp/memory.html

Mostly numeric types and structs.
Reference types are stored on the "Heap" and are garbage collected when
the system feels like it. References to ref types are passed normally as
a pointer to the address. Value types are passed a copy of the value.

It's simpler than that - the value of the expression is always passed
by value unless you use ref/out - it's just that with reference types,
the value of the expression *is* a reference.

See http://www.pobox.com/~skeet/csharp/parameters.html for more
details.
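
A short sketch of that distinction, using StringBuilder only because it is a convenient mutable reference type; the class and method names are invented for the example. Changing the object through the parameter is visible to the caller, but reassigning the parameter is not, unless it is declared ref.

```csharp
using System;
using System.Text;

static class ParameterDemo
{
    // The reference is passed by value: the callee sees the same object...
    static void Append(StringBuilder sb) { sb.Append(" world"); }

    // ...but assigning to the parameter only changes the callee's local copy.
    static void Reassign(StringBuilder sb) { sb = new StringBuilder("replaced"); }

    // With ref, the caller's variable itself is passed, so reassignment is visible.
    static void ReassignByRef(ref StringBuilder sb) { sb = new StringBuilder("replaced"); }

    public static string Run()
    {
        StringBuilder sb = new StringBuilder("hello");
        Append(sb);                     // caller's object is now "hello world"
        Reassign(sb);                   // no effect on the caller's variable
        string before = sb.ToString();
        ReassignByRef(ref sb);          // caller's variable now refers to a new object
        return before + "|" + sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints hello world|replaced
    }
}
```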

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Jan 13 '07 #15

Jon Skeet [C# MVP]

Peter Olcott <No****@SeeScreen.com> wrote:

So a reference type is not anything at all like what the term "reference type"
means everywhere outside of the .NET architecture.

"Reference type" means exactly the same in Java as it means in .NET.


Jan 13 '07 #16

Jon Skeet [C# MVP]

Jesse McGrew <jm*****@gmail.com> wrote:

A value type is a type that's normally passed by value, and whose
contents *can* (but don't have to) live on the stack. A reference type
is always passed by reference, and always lives on the heap.

Saying that reference types are "passed by reference" leads to
misunderstandings. Reference type instances are never passed at all -
there's no expression whose value is the instance itself, only the
reference. That reference is passed by value.

See http://www.pobox.com/~skeet/csharp/parameters.html for more details
of this distinction, and why it's an important one to make.


Jan 13 '07 #17

Peter Olcott

"Jesse McGrew" <jm*****@gmail.com> wrote in message
news:11**********************@l53g2000cwa.googlegroups.com...

Peter Olcott wrote:
"Jesse McGrew" <jm*****@gmail.com> wrote in message
news:11*********************@s34g2000cwa.googlegroups.com...
Well, if you're familiar with Delphi or Java, you've already seen
reference types. Class instances in those languages are always stored
as pointers to data on the heap, just like reference types in .NET, and
when you access an object's fields, you're implicitly dereferencing the
pointer. In Delphi, records are equivalent to value types; in Java,
primitives like int and double are.

A value type is a type that's normally passed by value, and whose
contents *can* (but don't have to) live on the stack. A reference type
is always passed by reference, and always lives on the heap. A variable
of a value type takes up the entire size of the type, and assigning one
such variable to another copies the contents; a variable of a reference
type only takes up the size of a pointer, and assigning one to another
simply makes both variables point to the same data.

Boxing means copying a value type onto the heap, along with some type
information, so that it can be used like any other instance of
System.Object. This is because even though all types in .NET derive
from System.Object (a reference type), value types are stored
differently. To keep polymorphism and garbage collection working, the
data has to be copied at runtime, because you can't just use a pointer
to a value type on the stack as a managed reference - for example, you
might store that pointer in a global variable, where it would have to
live on after the function returns and its stack frame is destroyed.

Unboxing is the reverse - copying the contents of a boxed value type
(from the heap) back onto the stack so you can work with it in its
usual form.

Jesse

Well that is a little more clear now, thanks. So the "value types" have less
baggage? I try to understand these things in the same way that I understand
their equivalents in C and C++. I try to understand them in terms of the
underlying machine operations in assembly language.

Value types do have less baggage (in their unboxed form). For example,
int is a value type - you wouldn't want to have to dereference
pointers, call methods, etc. every time you added or compared two
integers. But they also have less functionality, because you can only

Does that mean that you do have to call a method every time you add or compare
two integers that are stored in reference types?

take full advantage of inheritance and polymorphism when you're using
reference types, just like you can only do it with pointers and
references in C++. You need reference types to get that kind of OOP
behavior, as well as to implement structures like trees and lists.

Take the following C# definitions:

struct Value {
public int foo;
}

class Ref {
public int bar;
}

Value my_val;
Ref my_ref = new Ref(); // a bare "Ref my_ref;" would not refer to any object yet

my_val.foo = my_ref.bar = 0;

The equivalent in C++ would be:

class Value {
public:
int foo;
};

class Ref {
public:
int bar;
};

Value my_val;
Ref * my_ref = new Ref; // the pointer must point at an object before it is used

my_val.foo = my_ref->bar = 0;

Every time you declare a variable or field of the type Ref, you're
really declaring a pointer; and when you call its methods or access its
fields, you still write "." in C#, but it works like "->".

With .NET this is a little trickier because it has another layer in-between, and
does not seem to be able to directly expose the actual platform specific
assembly language of what it is doing. In C or C++ I simply tell the compiler to
output assembly language, then I can see everything.

You can view the assembly code in Visual Studio 2005. Run the program,
hit pause to break into the debugger, then right-click on a source line
and choose "Go to Disassembly".

Is that actual Intel machine specific assembly language, or the .NET virtual
machine assembly language?

Jesse

Jan 13 '07 #18

Barry Kelly

Peter Olcott wrote:

"Jesse McGrew" <jm*****@gmail.com> wrote:
[...]

Does that mean that you do have to call a method every time you add or compare
two integers that are stored in reference types?

From the point of view of C#, an integer (or any other value type) is
only boxed if it's been assigned to a location of type 'object' -
whether local variable, argument or field.

Value types that are fields of a reference type are stored inline in the
memory for that object on the heap.

For example:

class A { int x; }

... can be imagined as being roughly equivalent (from a memory layout
perspective) to this in C:

typedef void *MethodTable; // CLR implementation detail
typedef struct A_ { MethodTable *mt; int x; } *A;

In fact, you can't add two boxed integers in C#, since it's got no way
to represent them as anything other than 'object'. You need to cast them
to 'int' to add them - and that unboxes them.

Example:

object x = 42; // x now contains a boxed int
object y = 10; // as does y
// Console.WriteLine(x + y); // does not compile: can't add object to object

int unboxedX = (int) x;
int unboxedY = (int) y;
Console.WriteLine(unboxedX + unboxedY); // etc.

You can view the assembly code in Visual Studio 2005. Run the program,
hit pause to break into the debugger, then right-click on a source line
and choose "Go to Disassembly".

Is that actual Intel machine specific assembly language, or the .NET virtual
machine assembly language?

Why don't you try it and see, before asking this kind of question?

It's the actual Intel machine code. Be aware of the usual gotchas re
Debug and Release mode.

You can get a higher-quality disassembly, with more correct CLR symbols,
with the MS symbol server (SRV* etc.) combined with SOS.DLL (or use
WinDbg with SOS).

-- Barry

--
http://barrkel.blogspot.com/

Jan 13 '07 #19

Barry Kelly

Barry Kelly wrote:

Peter Olcott wrote:
"Jesse McGrew" <jm*****@gmail.com> wrote:
[...]

Does that mean that you do have to call a method every time you add or compare
two integers that are stored in reference types?

From the point of view of C#, an integer (or any other value type) is
only boxed if it's been assigned to a location of type 'object' -
whether local variable, argument or field.

I should hasten to point out one thing though: when calling a method on
a value type (e.g. ToString() or GetHashCode()) that hasn't been
(re)declared or overridden in the value type, the value type needs to be
boxed to be passed as the 'this' argument (whether it be
System.Object::ToString(), System.Object::GetHashCode(), etc.)

It's the same principle ('this' in these cases is typically of type
'object'), but it is a little hidden.

-- Barry

--
http://barrkel.blogspot.com/

Jan 13 '07 #20

Peter Olcott

"Barry Kelly" <ba***********@gmail.com> wrote in message
news:2b********************************@4ax.com...

Peter Olcott wrote:
"Jesse McGrew" <jm*****@gmail.com> wrote:
[...]

Does that mean that you do have to call a method every time you add or compare
two integers that are stored in reference types?

From the point of view of C#, an integer (or any other value type) is
only boxed if it's been assigned to a location of type 'object' -
whether local variable, argument or field.

Value types that are fields of a reference type are stored inline in the
memory for that object on the heap.

For example:

class A { int x; }

... can be imagined as being roughly equivalent (from a memory layout
perspective) to this in C:

typedef void *MethodTable; // CLR implementation detail
typedef struct A_ { MethodTable *mt; int x; } *A;

In fact, you can't add two boxed integers in C#, since it's got no way
to represent them as anything other than 'object'. You need to cast them
to 'int' to add them - and that unboxes them.

So a member function can not add two integer members without unboxing them
first? That would sound like horrendous design.

Example:

object x = 42; // x now contains a boxed int
object y = 10; // as does y
Console.WriteLine(x + y); // can't add object to object

int unboxedX = (int) x;
int unboxedY = (int) y;
Console.WriteLine(unboxedX + unboxedY); // etc.

You can view the assembly code in Visual Studio 2005. Run the program,
hit pause to break into the debugger, then right-click on a source line
and choose "Go to Disassembly".

Is that actual Intel machine specific assembly language, or the .NET virtual
machine assembly language?

Why don't you try it and see, before asking this kind of question?

It's the actual Intel machine code. Be aware of the usual gotchas re
Debug and Release mode.

You can get a higher-quality disassembly, with more correct CLR symbols,
with the MS symbol server (SRV* etc.) combined with SOS.DLL (or use
WinDbg with SOS).

-- Barry

--
http://barrkel.blogspot.com/

Jan 13 '07 #21

Bruce Wood

So a member function can not add two integer members without unboxing them
first? That would sound like horrendous design.

No. _Only_ if you declare the integer as being of type "object". Viz:

public class SimpleClass
{
private int a;
private int b;

public SimpleClass(int first, int second)
{
this.a = first;
this.b = second;
}

public int Sum()
{
return this.a + this.b;
}
}

No boxing, no unboxing. The integers are stored on the heap with the
object's state, and used directly in the sum operation.

Versus:
public class SimpleClass
{
private object a;
private object b;

public SimpleClass(int first, int second)
{
this.a = first; // Causes boxing
this.b = second; // Causes boxing
}

public int Sum()
{
return (int)this.a + (int)this.b; // "int" cast causes unboxing
}
}

In the first case, the integers are value types, stored just as they
would be in C++. In the second case, the fields are declared as "object", a
reference type, so the CLR allocates space for the integers on the heap and
copies the values "first" and "second" into the boxes on the heap. The object
state then maintains references to the two boxes. In order to use the
values, you must fetch them from the boxes on the heap, or "unbox" them
(which, in reality, is a type check plus a pointer dereference, which is probably
what you would expect anyway).
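
One detail that is easy to see in code: the cast that unboxes also checks the type of the box, so it has to name the exact value type that was boxed. The sketch below assumes only the base class library; the class name UnboxDemo is hypothetical.

```csharp
using System;

static class UnboxDemo
{
    // Tries to unbox a boxed int directly as a long.
    public static bool UnboxToWrongTypeThrows()
    {
        object boxed = 42;                // a boxed System.Int32
        try
        {
            long wrong = (long)boxed;     // unboxing needs the exact type: throws
            Console.WriteLine(wrong);     // not reached
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    // Unboxes to the real type first, then converts the plain value.
    public static long UnboxThenConvert()
    {
        object boxed = 42;
        return (long)(int)boxed;          // unbox as int, then widen to long
    }

    static void Main()
    {
        Console.WriteLine(UnboxToWrongTypeThrows()); // prints True
        Console.WriteLine(UnboxThenConvert());       // prints 42
    }
}
```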

Jan 13 '07 #22

Peter Olcott

"Bruce Wood" <br*******@canada.com> wrote in message
news:11*********************@v45g2000cwv.googlegroups.com...

So a member function can not add two integer members without unboxing them
first? That would sound like horrendous design.

No. _Only_ if you declare the integer as being of type "object". Viz:

public class SimpleClass
{
private int a;
private int b;

public SimpleClass(int first, int second)
{
this.a = first;
this.b = second;
}

public int Sum()
{
return this.a + this.b;
}
}

No boxing, no unboxing. The integers are stored on the heap with the
object's state, and used directly in the sum operation.

Versus:
public class SimpleClass
{
private object a;
private object b;

public SimpleClass(int first, int second)
{
this.a = first; // Causes boxing
this.b = second; // Causes boxing
}

public int Sum()
{
return (int)this.a + (int)this.b; // "int" cast causes unboxing
}
}

In the first case, the integers are value types, stored just as they
would be in C++. In the second case, they are declared as reference
types, so the CLR allocates space for them on the heap and copies the
values "first" and "second" into the boxes on the heap. The object
state then maintains references to the two boxes. In order to use the
values, you must fetch them from the boxes on the heap, or "unbox" them
(which, in reality, is just a pointer dereference, which is probably
what you would expect anyway).

Well that's not too bad then. It would seem that good design might be able to
completely eliminate the boxing and unboxing overhead penalty. Is it possible to
pass data around as unboxed data? Can I pass the address of a struct, so that a
class member can update this struct without boxing and unboxing?

What is the best way to get one class to update the struct data of another
class?

Jan 13 '07 #23

Barry Kelly

Peter Olcott wrote:

Well that's not too bad then. It would seem that good design might be able to
completely eliminate the boxing and unboxing overhead penalty.

Yup. Nobody I know with experience worries much about this.

Is it possible to pass data around as unboxed data?

Yes - declare your types rather than using 'object'.

Can I pass the address of a struct, so that a
class member can update this struct without boxing and unboxing?

You can, but in a strictly downwards (call stack) fashion, via the 'ref'
modifier on arguments. You can't safely store the address.

With unsafe code, you can use the '&' operator to get the address, and
basically write C code to manipulate the data. But that's unsafe code:
it's not verifiable, it won't work if the executable is run from a
network location, and almost certainly won't work if you're (e.g.)
writing an ASP.NET application for hosting on a server somewhere -
unless you control the server & permissions completely.

What is the best way to get one class to update the struct data of another
class?

By calling methods on the other class.

-- Barry

--
http://barrkel.blogspot.com/

Jan 13 '07 #24

Barry Kelly

Peter Olcott wrote:

"Barry Kelly" <ba***********@gmail.com> wrote:

So a member function can not add two integer members without unboxing them
first? That would sound like horrendous design.

If you carefully read what I wrote, you'll notice:

From the point of view of C#, an integer (or any other value type) is
only boxed if it's been assigned to a location of type 'object' -
whether local variable, argument or field.

You *cannot*, I repeat *CANNOT*, have two boxed integer members in C# -
the members would need to be of type *object*, not int, in order for
them to be boxed.

-- Barry

--
http://barrkel.blogspot.com/

Jan 13 '07 #25

Peter Olcott

"Barry Kelly" <ba***********@gmail.com> wrote in message
news:t4********************************@4ax.com...

Peter Olcott wrote:

Well that's not too bad then. It would seem that good design might be able to
completely eliminate the boxing and unboxing overhead penalty.

Yup. Nobody I know with experience worries much about this.

Is it possible to pass data around as unboxed data?

Yes - declare your types rather than using 'object'.

Can I pass the address of a struct, so that a
class member can update this struct without boxing and unboxing?

You can, but in a strictly downwards (call stack) fashion, via the 'ref'
modifier on arguments. You can't safely store the address.

So I can call one member function from another member function of a different
class and pass the address of a struct to the second member function so that
the second member function can directly update the contents of the struct,
without any boxing and unboxing overhead? If the answer is yes, then what is the
syntax for doing this?

With unsafe code, you can use the '&' operator to get the address, and
basically write C code to manipulate the data. But that's unsafe code:
it's not verifiable, it won't work if the executable is run from a
network location, and almost certainly won't work if you're (e.g.)
writing an ASP.NET application for hosting on a server somewhere -
unless you control the server & permissions completely.

What is the best way to get one class to update the struct data of another
class?

By calling methods on the other class.

-- Barry

--
http://barrkel.blogspot.com/

Jan 13 '07 #26

Peter Olcott

"Barry Kelly" <ba***********@gmail.com> wrote in message
news:2d********************************@4ax.com...

Peter Olcott wrote:

"Barry Kelly" <ba***********@gmail.com> wrote:

So a member function can not add two integer members without unboxing them
first? That would sound like horrendous design.

If you carefully read what I wrote, you'll notice:

From the point of view of C#, an integer (or any other value type) is
only boxed if it's been assigned to a location of type 'object' -
whether local variable, argument or field.

You *cannot*, I repeat *CANNOT*, have two boxed integer members in C# -
the members would need to be of type *object*, not int, in order for
them to be boxed.

-- Barry

I carefully read it, yet, did not fully understand the meaning of all of the
terminology that was used. For one thing, I don't see why there is ever any need
for boxing and unboxing. I know that there is no such need in C++. I also know
that it must somehow support GC, and that is why it is needed. I don't see how
it supports GC. Is it something like maintaining a chain of pointers indicating
who owns what?

--
http://barrkel.blogspot.com/

Jan 13 '07 #27

Barry Kelly

Peter Olcott wrote:

You can, but in a strictly downwards (call stack) fashion, via the 'ref'
modifier on arguments. You can't safely store the address.
So I can call one member function from another member function of a different
class and pass the address of a struct to the second member function so that
the second member function can directly update the contents of the struct,
without any boxing and unboxing overhead?

Yes.

If the answer is yes, then what is the
syntax for doing this?

void Foo(ref MyStruct value) { } // declaration

// ...
MyStruct myStructValue;
// ...
Foo(ref myStructValue); // usage

There is also an 'out', which is similar but (a) argument need not be
definitely assigned when passed in (but it will be definitely assigned
after the call) and (b) it is treated as unassigned in the body of the
method taking the parameter, and will be so treated until it's assigned
(and must be assigned before the method returns).

But be sure to measure that:

1) MyStruct being a struct (value type) is the right thing to do.
Typically, if sizeof(MyStruct) is greater than (say) 16 bytes, it's
looking like it might be too big. Of course, there are exceptions to
this, like in all performance work. Measure, etc.

2) The savings by passing by-ref outweigh the fact that it's a mutable
reference. In other words, beware that there's no const by-ref
mechanism.
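
A slightly fuller sketch of the ref and out syntax shown above. The struct Counter and the method names are made up for illustration; the point is only that both modifiers let the callee write directly into the caller's struct, with no boxing involved.

```csharp
using System;

struct Counter { public int Count; }

static class RefOutDemo
{
    // ref: the caller's variable must already be assigned; the callee may read and write it.
    static void Increment(ref Counter c) { c.Count++; }

    // out: the callee must assign the parameter before returning.
    static void Create(out Counter c)
    {
        c = new Counter();
        c.Count = 100;
    }

    public static int Run()
    {
        Counter counter = new Counter();
        Increment(ref counter);
        Increment(ref counter);        // counter.Count is now 2

        Counter fresh;                 // deliberately left unassigned
        Create(out fresh);             // assigned by the callee: Count is 100

        return counter.Count + fresh.Count;
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints 102
    }
}
```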

-- Barry

--
http://barrkel.blogspot.com/

Jan 14 '07 #28

Barry Kelly

Peter Olcott wrote:

I carefully read it, yet, did not fully understand the meaning of all of the
terminology that was used. For one thing, I don't see why there is ever any need
for boxing and unboxing.

Consider how these things would be implemented in a memory-safe[1]
manner without boxing (whether manual boxing like Java 1.4, or
autoboxing like C# and Java 1.5+):

* IEnumerable
* ArrayList or List<object> (take your pick)
* Component.Tag

I know that there is no such need in C++.

If you try to create a C++ analogue of IEnumerable in a memory-safe way,
you'll need to reinvent boxing. In other words, you'll need some way of
unifying all values into some interface that can be queried for type and
safely converted into its actual value.
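
As a rough illustration of "queried for type and safely converted", here is a sketch that walks a non-generic IEnumerable holding mixed values. The class and method names are invented; ArrayList is used only as a convenient IEnumerable of object.

```csharp
using System;
using System.Collections;

static class EnumerableDemo
{
    // Sums only the elements that are boxed ints, ignoring everything else.
    public static int SumInts(IEnumerable items)
    {
        int sum = 0;
        foreach (object item in items)
        {
            if (item is int)        // query the box for its type
            {
                sum += (int)item;   // safe conversion back to the value
            }
        }
        return sum;
    }

    public static int Run()
    {
        ArrayList mixed = new ArrayList();
        mixed.Add(1);       // boxed int
        mixed.Add("two");   // string: already a reference type, no boxing
        mixed.Add(3);       // boxed int
        mixed.Add(4.5);     // boxed double, skipped by SumInts
        return SumInts(mixed);
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints 4
    }
}
```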

And just because a feature is useful doesn't mean that it is necessary.
C++ isn't memory-safe.

I also know
that it must somehow support GC, and that is why it is needed.

GC is an orthogonal issue to boxing per se. Autoboxing, however,
requires some kind of GC, even if it's as dumb as reference counting, if
it's to be sane (IMHO).

I don't see how it supports GC. Is it something like maintaining a chain
of pointers indicating who owns what?

GC in no way requires boxing. GC follows the same references you program
with. There are no magic references behind the scenes.

[1] By "memory-safe", I mean that it's provably impossible to violate
the language's memory model. See e.g. type safety on Wikipedia for more
info:

http://en.wikipedia.org/wiki/Type_safety

-- Barry

--
http://barrkel.blogspot.com/

Jan 14 '07 #29

Barry Kelly

Barry Kelly wrote:

* Component.Tag

That ought to be Control.Tag - my Delphiness showing.

-- Barry

--
http://barrkel.blogspot.com/

Jan 14 '07 #30

Arne Vajhøj

Peter Olcott wrote:

So with Generics Boxing and UnBoxing becomes obsolete?

Not in general.

Generics make boxing and unboxing obsolete in the context
of storing value types in collections.
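
A side-by-side sketch of the collection case, assuming the standard ArrayList and List<T> classes; the class GenericsDemo and its methods are hypothetical. Both methods compute the same sum, but only the ArrayList version boxes on the way in and unboxes on the way out.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

static class GenericsDemo
{
    public static int SumArrayList()
    {
        ArrayList list = new ArrayList();
        for (int i = 1; i <= 3; i++) list.Add(i);   // each Add boxes the int
        int sum = 0;
        foreach (object o in list) sum += (int)o;   // each cast unboxes
        return sum;
    }

    public static int SumGenericList()
    {
        List<int> list = new List<int>();
        for (int i = 1; i <= 3; i++) list.Add(i);   // stored as plain ints
        int sum = 0;
        foreach (int n in list) sum += n;           // no cast, no unboxing
        return sum;
    }

    static void Main()
    {
        Console.WriteLine(SumArrayList());    // prints 6
        Console.WriteLine(SumGenericList());  // prints 6
    }
}
```

Outside collections, assigning a value type to a variable of type object still boxes, which is why the answer is "not in general".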

Arne

Jan 14 '07 #31

Arne Vajhøj

Peter Olcott wrote:

What I am looking for is all of the extra steps that form what is referred to as
boxing and unboxing. In C/C++ converting a value type to a reference type is a
very simple operation and I don't think that there are any runtime steps at all.
All the steps are done at compile time. Likewise for converting a reference type
to a value type.

in C/C++
int X = 56;
int *Y = &X;
Now both X and *Y hold 56, and Y is a reference to X;

That code is not equivalent to what we are discussing in C#.

In fact it does not really have any equivalent in C# (not using
unsafe code).

Arne

Jan 14 '07 #32

Bruce Wood

Peter Olcott wrote:

"Barry Kelly" <ba***********@gmail.com> wrote in message
news:2d********************************@4ax.com...
Peter Olcott wrote:

"Barry Kelly" <ba***********@gmail.com> wrote:

So a member function can not add two integer members without unboxing them
first? That would sound like horrendous design.
If you carefully read what I wrote, you'll notice:

From the point of view of C#, an integer (or any other value type) is
only boxed if it's been assigned to a location of type 'object' -
whether local variable, argument or field.
You *cannot*, I repeat *CANNOT*, have two boxed integer members in C# -
the members would need to be of type *object*, not int, in order for
them to be boxed.

-- Barry

I carefully read it, yet, did not fully understand the meaning of all of the
terminology that was used. For one thing, I don't see why there is ever any need
for boxing and unboxing. I know that there is no such need in C++.

That's because in C# (and Java) you _can't_ say:

int x = 3;
int *p = &x;

because the address-of "&" operator simply doesn't exist (in C#, outside
unsafe code at least). You can't take the address of an arbitrary variable.

I also know that it must somehow support GC, and that is why it is needed.

Well, more to the point, a language that supports garbage collection
can't allow one to take addresses of arbitrary memory locations, as the
garbage collector could then never determine what objects were
referenced and which weren't (because an address into the midst of an
object would then be legal).

Is it something like maintaining a chain of pointers indicating who owns what?

Well, sort of. The GC walks the stack and all static objects, looking
for references to objects on the heap. It then follows references
stored in those objects, etc, until it exhausts the network of
references. Any objects left thus unmarked are available for
collection.

Of course, it's rather more complex than that, but you get the idea. If
you allow references into the midst of objects, then it's much more
difficult to decide what is referenced and what isn't.

In C# and Java, every reference that you can directly manipulate in
code is to a valid object on the heap. That's why, if you want to treat
an int as an object (and thus have a reference to it) then the CLR has
to create an object wrapper for it and put it on the heap.
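
A tiny sketch of that wrapper being created: each conversion of the same int to object produces its own heap object, so the two references differ even though the values are equal. The class name BoxIdentityDemo is made up for the example.

```csharp
using System;

static class BoxIdentityDemo
{
    // Boxes the same int twice and compares the results.
    public static bool TwoBoxesAreDistinctButEqual()
    {
        int n = 7;
        object first = n;    // first wrapper object on the heap
        object second = n;   // a second, separate wrapper
        bool sameObject = object.ReferenceEquals(first, second);  // false: two objects
        bool sameValue = first.Equals(second);                    // true: both hold 7
        return !sameObject && sameValue;
    }

    static void Main()
    {
        Console.WriteLine(TwoBoxesAreDistinctButEqual()); // prints True
    }
}
```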

Jan 14 '07 #33

Peter Olcott

"Barry Kelly" <ba***********@gmail.com> wrote in message
news:bv********************************@4ax.com...

Peter Olcott wrote:

You can, but in a strictly downwards (call stack) fashion, via the 'ref'
modifier on arguments. You can't safely store the address.

So I can call one member function from another member function of a different
class and pass the address of a struct to the second member function so that
the second member function can directly update the contents of the struct,
without any boxing and unboxing overhead?

Yes.

If the answer is yes, then what is the
syntax for doing this?

void Foo(ref MyStruct value) { } // declaration

// ...
MyStruct myStructValue;
// ...
Foo(ref myStructValue); // usage

There is also an 'out', which is similar but (a) argument need not be
definitely assigned when passed in (but it will be definitely assigned
after the call) and (b) it is treated as unassigned in the body of the
method taking the parameter, and will be so treated until it's assigned
(and must be assigned before the method returns).

But be sure to measure that:

1) MyStruct being a struct (value type) is the right thing to do.
Typically, if sizeof(MyStruct) is greater than (say) 16 bytes, it's
looking like it might be too big. Of course, there are exceptions to
this, like in all performance work. Measure, etc.

2) The savings by passing by-ref outweigh the fact that it's a mutable
reference. In other words, beware that there's no const by-ref
mechanism.

There is no inherent reason why this could not be added to the language as a
compile time feature later on. It might be simpler to stick with the established
convention and simply make an [in] equivalent of an [out] parameter, instead of
using the somewhat less obvious [const]. There would be no reason to distinguish
between [in] by reference and [in] by value, they could all be passed by
reference, or anything larger than [int] could always be passed by reference.
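
For readers on newer compilers: C# 7.2 and later do have an "in" parameter modifier along these lines, which passes by reference but makes the parameter read-only in the callee. The sketch below assumes such a compiler (it will not build with the C# 2.0 tools discussed in this thread); the struct Big and the method names are invented.

```csharp
using System;

struct Big { public long A, B, C, D; }

static class InDemo
{
    // 'in' passes a reference to the caller's struct, but the callee may not modify it.
    static long Sum(in Big b)
    {
        // b.A = 0;  // would not compile: an 'in' parameter is read-only
        return b.A + b.B + b.C + b.D;
    }

    public static long Run()
    {
        Big big = new Big();
        big.A = 1; big.B = 2; big.C = 3; big.D = 4;
        return Sum(in big);   // no copy of the 32-byte struct is required, and no boxing
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints 10
    }
}
```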

It is good to know that aggregate data can be passed by reference without the
boxing and unboxing overhead, if need be.

-- Barry

--
http://barrkel.blogspot.com/

Jan 14 '07 #34

Peter Olcott

"Barry Kelly" <ba***********@gmail.com> wrote in message
news:u9********************************@4ax.com...

Peter Olcott wrote:

I carefully read it, yet, did not fully understand the meaning of all of the
terminology that was used. For one thing, I don't see why there is ever any need
for boxing and unboxing.

Consider how these things would be implemented in a memory-safe[1]
manner without boxing (whether manual boxing like Java 1.4, or
autoboxing like C# and Java 1.5+):

* IEnumerable
* ArrayList or List<object> (take your pick)
* Component.Tag

I know that there is no such need in C++.

If you try to create a C++ analogue of IEnumerable in a memory-safe way,
you'll need to reinvent boxing. In other words, you'll need some way of
unifying all values into some interface that can be queried for type and
safely converted into its actual value.

And just because a feature is useful doesn't mean that it is necessary.
C++ isn't memory-safe.

I also know
that it must somehow support GC, and that is why it is needed.

GC is an orthogonal issue to boxing per se. Autoboxing, however,
requires some kind of GC, even if it's as dumb as reference counting, if
it's to be sane (IMHO).

I don't see how it supports GC. Is it something like maintaining a chain
of pointers indicating who owns what?

GC in no way requires boxing. GC follows the same references you program
with. There are no magic references behind the scenes.

[1] By "memory-safe", I mean that it's provably impossible to violate
the language's memory model. See e.g. type safety on Wikipedia for more
info:

http://en.wikipedia.org/wiki/Type_safety

A strongly typed language like C++ effectively prevents any accidental type
errors, why bother with more than this?

-- Barry

--
http://barrkel.blogspot.com/

Jan 14 '07 #35

Peter Olcott

"Arne Vajhøj" <ar**@vajhoej.dk> wrote in message
news:45***********************@news.sunsite.dk...

Peter Olcott wrote:
What I am looking for is all of the extra steps that form what is referred to
as boxing and unboxing. In C/C++ converting a value type to a reference type
is a very simple operation and I don't think that there are any runtime steps
at all. All the steps are done at compile time. Likewise for converting a
reference type to a value type.

in C/C++
int X = 56;
int *Y = &X;
Now both X and *Y hold 56, and Y is a reference to X;

That code is not equivalent to what we are discussing in C#.

In fact it does not really have any equivalent in C# (not using
unsafe code).

Arne

Couldn't there possibly be a way to create safe code that does not ever require
any extra runtime overhead? Couldn't all the safety checking somehow be done at
compile time?

Jan 14 '07 #36

Peter Olcott

"Bruce Wood" <br*******@canada.com> wrote in message
news:11**********************@a75g2000cwd.googlegroups.com...

Peter Olcott wrote:
"Barry Kelly" <ba***********@gmail.com> wrote in message
news:2d********************************@4ax.com...
Peter Olcott wrote:

"Barry Kelly" <ba***********@gmail.com> wrote:

So a member function can not add two integer members without unboxing them
first? That would sound like horrendous design.

If you carefully read what I wrote, you'll notice:

From the point of view of C#, an integer (or any other value type) is
only boxed if it's been assigned to a location of type 'object' -
whether local variable, argument or field.

You *cannot*, I repeat *CANNOT*, have two boxed integer members in C# -
the members would need to be of type *object*, not int, in order for
them to be boxed.

-- Barry

I carefully read it, yet, did not fully understand the meaning of all of the
terminology that was used. For one thing, I don't see why there is ever any need
for boxing and unboxing. I know that there is no such need in C++.

That's because in C# (and Java) you _can't_ say:

int x = 3;
int *p = &x;

because the "&" operator simply doesn't exist. You can't take the
address of an arbitrary variable.

I still don't see any reason why a completely type safe language can not be
constructed without the need for any runtime overhead. You could even allow
constructs such as the above, and still be completely type safe, merely disallow
type casting.

>I also know that it must somehow support GC, and that is why it is needed.

Well, more to the point, a language that supports garbage collection
can't allow one to take addresses of arbitrary memory locations, as the
garbage collector could then never determine what objects were
referenced and which weren't (because an address into the midst of an
object would then be legal).

It could do this, but, then you have the issue of reference counting, more extra
overhead. You don't have this problem when data is simply passed by address with
no assignment to another pointer variable.

>Is it something like maintaining a chain of pointers indicating who owns
what?

Well, sort of. The GC walks the stack and all static objects, looking
for references to objects on the heap. It then follows references
stored in those objects, etc, until it exhausts the network of
references. Any objects left thus unmarked are available for
collection.

Global data is disallowed?

Of course, it's rather more complex than that, but you get the idea. If
you allow references into the midst of objects, then it's much more
difficult to decide what is referenced and what isn't.

In C# and Java, every reference that you can directly manipulate in
code is to a valid object on the heap. That's why, if you want to treat
an int as an object (and thus have a reference to it) then the CLR has
to create an object wrapper for it and put it on the heap.

I still don't see any need for a wrapper. Do you mean for reference counting?

Jan 14 '07 #37

Jesse McGrew

Peter Olcott wrote:

[1] By "memory-safe", I mean that it's provably impossible to violate the
language's memory model. See e.g. type safety on Wikipedia for more
info:

http://en.wikipedia.org/wiki/Type_safety

A strongly typed language like C++ effectively prevents any accidental type
errors, why bother with more than this?

Because this also prevents *intentional* type errors, which is
important for running code in a sandbox. Your web browser can guarantee
that a Java applet embedded into a page won't crash your system or
delete all your files, because Java enforces type safety at all levels;
this is the same sort of thing.

Jesse

Jan 14 '07 #38

Jesse McGrew

Peter Olcott wrote:

"Bruce Wood" <br*******@canada.com> wrote in message
news:11**********************@a75g2000cwd.googlegroups.com...

[...]

That's because in C# (and Java) you _can't_ say:

int x = 3;
int *p = &x;

because the "&" operator simply doesn't exist. You can't take the
address of an arbitrary variable.

I still don't see any reason why a completely type safe language can not be
constructed without the need for any runtime overhead. You could even allow
constructs such as the above, and still be completely type safe, merely disallow
type casting.

If you disallow type casting, you neuter the language. You need to be
able to cast instances of derived classes to their bases and back. You
can do the first kind of cast without any runtime overhead, but you
need *some* runtime overhead to cast a base instance to its actual
derived class, even in C++ with dynamic_cast<>.

(The overhead in C++ isn't for performing the actual cast, but for
verifying that the cast is valid - that the object actually belongs to
the class you're casting it to. In C#, that's usually the case, but for
unboxing casts there's also overhead for copying the value out of its
box.)
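
As a rough illustration of that point, here is a small C# sketch. The Animal,
Dog and Cat classes and the CastDemo helper are invented for the example; they
are not from anyone's code in this thread. The upcast needs no runtime work,
while the downcast and the unboxing cast each involve a runtime type check.

```csharp
using System;

// Hypothetical class hierarchy, used only for illustration.
public class Animal { }
public class Dog : Animal { }
public class Cat : Animal { }

public static class CastDemo
{
    // Upcast: verified entirely at compile time, no runtime test needed.
    public static Animal Upcast(Dog d)
    {
        return d;
    }

    // Downcast with 'as': the runtime checks the object's actual type
    // and yields null when the object is not a Dog.
    public static bool IsReallyDog(Animal a)
    {
        return (a as Dog) != null;
    }

    // Downcast with a cast expression: same runtime check, but a failed
    // check throws InvalidCastException instead of producing null.
    public static bool CastThrows(Animal a)
    {
        try
        {
            Dog d = (Dog)a;
            return false;       // the cast passed its runtime check
        }
        catch (InvalidCastException)
        {
            return true;        // the object was not actually a Dog
        }
    }

    // Unboxing: a type check plus a copy of the value out of the box.
    public static int Unbox(object boxed)
    {
        return (int)boxed;
    }
}
```

CastDemo.IsReallyDog(new Cat()) returns false, and CastDemo.CastThrows(new
Cat()) returns true because the cast fails its runtime check.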

Well, more to the point, a language that supports garbage collection
can't allow one to take addresses of arbitrary memory locations, as the
garbage collector could then never determine what objects were
referenced and which weren't (because an address into the midst of an
object would then be legal).

It could do this, but, then you have the issue of reference counting, more extra
overhead. You don't have this problem when data is simply passed by address with
no assignment to another pointer variable.

The desire to avoid that overhead (as well as other problems with
reference counting) is, presumably, why .NET uses a garbage collector
instead.

Well, sort of. The GC walks the stack and all static objects, looking
for references to objects on the heap. It then follows references
stored in those objects, etc, until it exhausts the network of
references. Any objects left thus unmarked are available for
collection.
Global data is disallowed?

No, that's what "static objects" refers to. In C#, you typically only
store global data by putting it in the static fields of a class. (There
are a couple other types of global data used with C++/CLI: bare global
variables and gcroots.)

Of course, it's rather more complex than that, but you get the idea. If
you allow references into the midst of objects, then it's much more
difficult to decide what is referenced and what isn't.

In C# and Java, every reference that you can directly manipulate in
code is to a valid object on the heap. That's why, if you want to treat
an int as an object (and thus have a reference to it) then the CLR has
to create an object wrapper for it and put it on the heap.

I still don't see any need for a wrapper. Do you mean for reference counting?

The wrapper is there so that the int on the heap can be treated like
any other object, with a type pointer, virtual methods, etc. If it were
just stored on the heap as a plain integer, there'd be no way for your
code (and more importantly, the garbage collector) to tell it apart
from a float or an object reference at runtime.

Boxing lets you write a method like this:

public static void PrintIt(object foo)
{
    Console.WriteLine("Thanks for this " + foo.GetType().Name + ": " +
        foo.ToString());
}

And then pass in *any* value, whether it's an integer, a structure, or
an object reference. An unboxed integer is just a number, with no type
information other than that stored in the compiler's internals; a boxed
integer is a full-fledged instance of a class derived from
System.Object.
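
A quick sketch of what that looks like from the caller's side. The Point struct
and the Describe helper are made up for this example; Describe does the same
thing as PrintIt but returns the string so the result is easy to inspect.

```csharp
using System;

// Hypothetical value type, defined only for this example.
public struct Point
{
    public int X;
    public int Y;

    public override string ToString()
    {
        return "(" + X + ", " + Y + ")";
    }
}

public static class BoxingDemo
{
    // Takes 'object', so any value type passed in is boxed by the caller.
    public static string Describe(object foo)
    {
        return foo.GetType().Name + ": " + foo.ToString();
    }
}
```

Describe(42) gives "Int32: 42", a Point with X = 1 and Y = 2 gives
"Point: (1, 2)", and Describe("hi") gives "String: hi". The int and the Point
are boxed at the call; the string is already a reference type and is passed
as is.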

Jesse

Jan 14 '07 #39

Bruce Wood

Peter Olcott wrote:

"Bruce Wood" <br*******@canada.com> wrote in message
news:11**********************@a75g2000cwd.googlegroups.com...
That's because in C# (and Java) you _can't_ say:

int x = 3;
int *p = &x;

because the "&" operator simply doesn't exist. You can't take the
address of an arbitrary variable.

I still don't see any reason why a completely type safe language can not be
constructed without the need for any runtime overhead. You could even allow
constructs such as the above, and still be completely type safe, merely disallow
type casting.

I also know that it must somehow support GC, and that is why it is needed.
Well, more to the point, a language that supports garbage collection
can't allow one to take addresses of arbitrary memory locations, as the
garbage collector could then never determine what objects were
referenced and which weren't (because an address into the midst of an
object would then be legal).

It could do this, but, then you have the issue of reference counting, more extra
overhead. You don't have this problem when data is simply passed by address with
no assignment to another pointer variable.

C# supports pass-by-reference using the "ref" keyword.
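
For instance, a minimal sketch (RefDemo and its method names are just
placeholders for illustration) showing that "ref" lets a method work on the
caller's variable directly, with no boxing and no pointer syntax:

```csharp
public static class RefDemo
{
    // 'ref' passes the caller's variable itself, not a copy and not a box.
    public static void Swap(ref int a, ref int b)
    {
        int tmp = a;
        a = b;
        b = tmp;
    }

    // Without 'ref' the method only changes its own copy of the value.
    public static void IncrementCopy(int n)
    {
        n++;
    }

    // With 'ref' the change is visible to the caller.
    public static void IncrementInPlace(ref int n)
    {
        n++;
    }
}
```

After int x = 1, y = 2; RefDemo.Swap(ref x, ref y); the caller sees x == 2 and
y == 1. IncrementCopy(x) leaves x alone, IncrementInPlace(ref x) changes it.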

However, I don't see how a language that allowed one to take the
address of arbitrary data could implement garbage collection. Even with
reference counting, the theory is that an _object_ counts references to
itself. An int, however, isn't an object. You're faced with the problem
of an object counting references to itself _or to a piece of data that it
holds_. How could you engineer a system whereby object A could keep
track of this sort of thing:

int *p = &(A.X);
int *q = p;

How does the object A now know that there are two references to it, p
and q, which point to a field inside A and not to A itself?

I don't see how you could automate this kind of reference counting,
even in C++, but then I'm no C++ guru.

Is it something like maintaining a chain of pointers indicating who owns
what?
Well, sort of. The GC walks the stack and all static objects, looking
for references to objects on the heap. It then follows references
stored in those objects, etc, until it exhausts the network of
references. Any objects left thus unmarked are available for
collection.
Global data is disallowed?

No. Global data is allowed. That's what I meant by "static".

Of course, it's rather more complex than that, but you get the idea. If
you allow references into the midst of objects, then it's much more
difficult to decide what is referenced and what isn't.

In C# and Java, every reference that you can directly manipulate in
code is to a valid object on the heap. That's why, if you want to treat
an int as an object (and thus have a reference to it) then the CLR has
to create an object wrapper for it and put it on the heap.

I still don't see any need for a wrapper. Do you mean for reference counting?

C# and Java don't do reference counting. They walk the network of
object references at garbage collection time. "Mark and sweep."
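
To give a feel for the idea, here is a toy model of mark and sweep. It is only
a sketch: Node, MarkDemo and the explicit "heap" list are invented for the
example, and the real CLR collector is a good deal more involved (it is
generational and compacts memory, for one thing). The point is just that
liveness is decided by reachability from the roots, not by counting.

```csharp
using System.Collections.Generic;

// Toy stand-in for a heap object: a name plus outgoing references.
public class Node
{
    public string Name;
    public List<Node> References = new List<Node>();

    public Node(string name)
    {
        Name = name;
    }
}

public static class MarkDemo
{
    // Mark phase: everything reachable from the roots is considered live.
    public static HashSet<Node> Mark(IEnumerable<Node> roots)
    {
        HashSet<Node> marked = new HashSet<Node>();
        Stack<Node> pending = new Stack<Node>(roots);
        while (pending.Count > 0)
        {
            Node n = pending.Pop();
            if (!marked.Add(n))
            {
                continue;   // already visited, so cycles terminate
            }
            foreach (Node child in n.References)
            {
                pending.Push(child);
            }
        }
        return marked;
    }

    // Sweep phase: whatever was not marked is available for collection.
    public static List<Node> Sweep(IEnumerable<Node> heap, HashSet<Node> marked)
    {
        List<Node> garbage = new List<Node>();
        foreach (Node n in heap)
        {
            if (!marked.Contains(n))
            {
                garbage.Add(n);
            }
        }
        return garbage;
    }
}
```

Note that two objects that refer to each other in a cycle are handled without
any special effort, which is one of the classic problems with plain reference
counting.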

I guess a good summary would be to say that the more regular the
situation, the easier it is to write good code to deal with it. By
forcing every collectable object to be the same, and allowing
references only to objects on the heap (apart from pass-by-ref, which
doesn't enter into garbage collection), C# and Java make it easier on
the garbage collector, which allows the GC to be more efficient.

Once you open up the language to allow arbitrary addressing of objects
and the values within them, you create a nightmare situation for the
garbage collector. Not that a sufficiently clever team of people
couldn't do it, I suppose, but it adds a lot of additional complexity,
and one has to ask exactly what would be gained? Java has demonstrated
that you can write perfectly good code without the ability to take
arbitrary addresses, pointer arithmetic, and the other stuff that C and
C++ pointers provide. There are some domains where the power of C / C++
pointers is arguably a great boon, but for most programming problems it
isn't required. So, you don't lose very much, and you gain a much
simpler garbage collector and better run-time security.

And yes, in .NET 2.0 you can pretty much avoid boxing (and unboxing)
altogether. It was difficult in .NET 1.1 because all of the standard
collections were collections of Object, and so storing values in a
Hashtable or an ArrayList (aka Vector in C++) meant incurring boxing
overhead. Even in .NET 1.1, however, you could roll your own
collections that didn't box or unbox, but they had to be type-specific.
.NET 2.0's generics (aka templates in C++) eliminate this problem. I
wouldn't say that boxing is a thing of the past, but more than 90% of
boxing in .NET 1.1 was in collections, and that's no longer necessary.

So the runtime penalty is almost non-existent, assuming that you use
appropriate language constructs.
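
A small side-by-side sketch of the two styles (CollectionDemo and its method
names are placeholders for illustration). ArrayList and List<T> are the real
framework classes; everything else is made up:

```csharp
using System.Collections;
using System.Collections.Generic;

public static class CollectionDemo
{
    // .NET 1.1 style: ArrayList stores 'object', so each int is boxed
    // on Add and has to be unboxed with a cast when it is read back.
    public static int SumBoxed(int count)
    {
        ArrayList list = new ArrayList();
        for (int i = 0; i < count; i++)
        {
            list.Add(i);            // boxes i
        }
        int sum = 0;
        foreach (object o in list)
        {
            sum += (int)o;          // unboxes
        }
        return sum;
    }

    // .NET 2.0 style: List<int> stores the ints themselves, no boxing.
    public static int SumGeneric(int count)
    {
        List<int> list = new List<int>();
        for (int i = 0; i < count; i++)
        {
            list.Add(i);
        }
        int sum = 0;
        foreach (int n in list)
        {
            sum += n;
        }
        return sum;
    }
}
```

Both methods return the same sum; the difference is only in what happens to
each int on the way in and out of the collection.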

Personally, I'm glad that arbitrary addressing was never put into Java
or C#. When I moved from C / C++ to Java I wondered how I would ever do
without the "&" operator, but I quickly realized that for the type of
software I write (business software) it really isn't needed. If,
however, I ever go back to writing real-time switching systems, I will
no doubt want C++ back again. Each tool has its uses, and C# is, in my
opinion, better suited to most day-to-day programming problems than is
C++. However, there are places that C# won't take you, where C++ is
much better suited.

Jan 14 '07 #40

Jon Skeet [C# MVP]

Peter Olcott <No****@SeeScreen.com> wrote:

I carefully read it, yet, did not fully understand the meaning of all of the
terminology that was used. For one thing, I don't see why there is ever any need
for boxing and unboxing. I know that there is no such need in C++. I also know
that it must somehow support GC, and that is why it is needed. I don't see how
it supports GC. Is it something like maintaining a chain of pointers indicating
who owns what?

It's not required for C++ because C++ doesn't have a single type
hierarchy. You can't treat an int as if it were an object.
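
In C#, by contrast, the single hierarchy is easy to see with a little
reflection. This is only a sketch; HierarchyDemo and its methods are names
made up for the example:

```csharp
using System;

public static class HierarchyDemo
{
    // Walks from int (System.Int32) up through its base types.
    public static string BaseChain()
    {
        Type t = typeof(int);
        string chain = t.Name;
        while (t.BaseType != null)
        {
            t = t.BaseType;
            chain += " -> " + t.Name;
        }
        return chain;
    }

    // An int can be assigned to an 'object' variable without a cast;
    // the compiler inserts the boxing conversion.
    public static bool IntIsObject()
    {
        object o = 42;
        return o is int && o.Equals(42);
    }
}
```

BaseChain() returns "Int32 -> ValueType -> Object", so int really does sit
under System.Object, and boxing is what lets a plain int be handled through a
reference of that root type.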

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Jan 14 '07 #41

Jon Skeet [C# MVP]

Peter Olcott <No****@SeeScreen.com> wrote:

<snip>

It is good to know that aggregate data can be passed by reference without the
boxing and unboxing overhead, if need be.

Normally aggregate data is stored in a reference type to start with,
where there's no boxing penalty anyway.

Last time you were concerned with the performance penalty of boxing and
unboxing, we proved that in the benchmark you were worried about, the
cost of boxing and unboxing was negligible. Now, do you have a
different evidence-based reason for worrying about the penalty? If not,
I'd suggest you get some more experience (and start profiling) before
worrying about it any more.
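
If it helps as a starting point for measuring, here is a bare-bones timing
sketch. BoxingBenchmark and its method names are invented for the example, and
this is not the benchmark referred to above. Stopwatch is the real
System.Diagnostics class. Treat any numbers it produces with caution: results
depend heavily on the JIT, the build configuration and the machine, so a
proper profiler is still the better tool.

```csharp
using System;
using System.Diagnostics;

public static class BoxingBenchmark
{
    // Sums the loop counter without ever leaving the int type.
    public static long SumPlain(int iterations)
    {
        long sum = 0;
        for (int i = 0; i < iterations; i++)
        {
            int v = i;
            sum += v;
        }
        return sum;
    }

    // Same sum, but each value takes a round trip through a box.
    public static long SumBoxed(int iterations)
    {
        long sum = 0;
        for (int i = 0; i < iterations; i++)
        {
            object boxed = i;       // box
            sum += (int)boxed;      // unbox
        }
        return sum;
    }

    // Runs the action once and reports elapsed milliseconds.
    public static long TimeMs(Action action)
    {
        Stopwatch sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```

A call such as BoxingBenchmark.TimeMs(delegate {
BoxingBenchmark.SumBoxed(10000000); }) can then be compared against the same
call with SumPlain, ideally in a release build and repeated several times.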

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Jan 14 '07 #42

Barry Kelly

Peter Olcott wrote:

Couldn't there possibly be a way to create safe code that does not ever require
any extra runtime overhead? Couldn't all the safety checking somehow be done at
compile time?

What safety checking are you talking about? Work with the C# types (such
as 'int') and there won't be any boxing, the work will be done at
compile time, and you won't pay any costs. You only need to worry about
boxing if you need to store an int or other value type into a location
of type 'object' (which isn't too often, but occasionally useful, in my
experience).
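
A minimal sketch of where boxing does and does not happen (BoxCopyDemo and its
methods are names invented for illustration). It also shows that a box holds a
copy of the value, not a reference to the original variable:

```csharp
public static class BoxCopyDemo
{
    // Plain int arithmetic: no 'object' location involved, nothing is boxed.
    public static int Add(int a, int b)
    {
        return a + b;
    }

    // Boxing copies the value into a new heap object, so later changes
    // to the original variable are not visible through the box.
    public static int BoxIsACopy()
    {
        int x = 1;
        object boxed = x;   // boxing happens here
        x = 2;              // changes x only
        return (int)boxed;  // the box still holds 1
    }

    // Each boxing conversion produces a separate object.
    public static bool TwoBoxesAreDistinct()
    {
        int x = 5;
        object a = x;
        object b = x;
        return !object.ReferenceEquals(a, b) && a.Equals(b);
    }
}
```

BoxIsACopy() returns 1, and TwoBoxesAreDistinct() returns true: the two boxes
are equal in value but are different objects on the heap.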

If you're talking about creating a pointer to int, and keeping it safe,
then the only verifiable way to do that is via ref parameters we talked
about in another branch of this thread. That's because the compiler can
guarantee things relating to the flow of code. Storing such references
in other structures isn't allowed because it can trivially create
dangling references:

// theoretical field type
ref int _savedX;

void Foo(ref int x)
{
    _savedX = ref x;
}

void Bar(int x)
{
    Foo(ref x);
}

void Baz()
{
    Bar(42);
    Console.WriteLine(_savedX); // uh oh - reading from invalid location
}

That's why such references are restricted to parameters only.

-- Barry

--
http://barrkel.blogspot.com/

Jan 14 '07 #43

Barry Kelly

Peter Olcott wrote:

A strongly typed language like C++ effectively prevents any accidental type
errors,

Not all errors. If you take the address of a variable, C++ doesn't do
anything to ensure that the variable you've taken the address of lives
longer than the variable which stores the taken address. This is more what I
mean by memory safety, over type safety. It's a far stronger commitment.

The mere existence of access violations in commercial programs is
evidence enough for this.

why bother with more than this?

There is also another class of error: intentional errors, to (e.g.)
violate security when running in a browser as another poster indicated,
or in some hosted process such as a web hosting provider's ASP.NET
context, or in a SQL Server 2005 process, etc.

-- Barry

--
http://barrkel.blogspot.com/

Jan 14 '07 #44

Barry Kelly

Bruce Wood wrote:

However, I don't see how a language that allowed one to take the
address of arbitrary data could implement garbage collection.

It's actually possible, albeit with conservative collection. The
Boehm-Demers-Weiser collector can be linked with C++ to give it GC, for
example.

Once you open up the language to allow arbitrary addressing of objects
and the values within them, you create a nightmare situation for the
garbage collector.

Actually, it wouldn't be that much of a problem, except certain rules
would start applying. For example, if you ever took the address of a
local variable or parameter, that variable would have to be moved out to
the heap behind the scenes (a lot like variable capture in anonymous
delegates). Similarly, taking the address of a field would become an
interior pointer, and would keep the object alive.
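
For anyone who hasn't run into variable capture, a short sketch of the C# 2.0
behaviour being compared to (CaptureDemo and the Counter delegate type are
invented for the example):

```csharp
public static class CaptureDemo
{
    // Hypothetical delegate type, declared here so the sketch stands alone.
    public delegate int Counter();

    public static Counter MakeCounter()
    {
        int count = 0;          // looks like an ordinary local variable...
        return delegate
        {
            count++;            // ...but the anonymous method captures it,
            return count;       // so it has to outlive MakeCounter's frame
        };
    }
}
```

Each call to MakeCounter() returns a delegate with its own count: calling one
counter twice gives 1 and then 2, while a second counter starts again at 1.
The local survives after MakeCounter returns because the compiler keeps it in
a heap object rather than on the stack.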

Personally, I'm glad that arbitrary addressing was never put into Java
or C#. When I moved from C / C++ to Java I wondered how I would ever do
without the "&" operator

Don't forget that C# has a unary '&' operator in unsafe code.

-- Barry

--
http://barrkel.blogspot.com/

Jan 14 '07 #45

Peter Olcott

"Jesse McGrew" <jm*****@gmail.com> wrote in message
news:11**********************@l53g2000cwa.googlegroups.com...

Peter Olcott wrote:

[1] By "memory-safe", I mean that it's provably impossible to violate the
language's memory model. See e.g. type safety on Wikipedia for more
info:

http://en.wikipedia.org/wiki/Type_safety

A strongly typed language like C++ effectively prevents any accidental type
errors, why bother with more than this?

Because this also prevents *intentional* type errors, which is
important for running code in a sandbox. Your web browser can guarantee
that a Java applet embedded into a page won't crash your system or
delete all your files, because Java enforces type safety at all levels;
this is the same sort of thing.

Jesse

Ah, I see. So now we can have safe ActiveX components that are embedded in
webpages. There is no longer a tradeoff between the safety of Java and the
functionality of ActiveX.

Jan 14 '07 #46

Peter Olcott

"Jesse McGrew" <jm*****@gmail.com> wrote in message
news:11*********************@v45g2000cwv.googlegroups.com...

Peter Olcott wrote:
>"Bruce Wood" <br*******@canada.com> wrote in message
news:11**********************@a75g2000cwd.googlegroups.com...
[...]

That's because in C# (and Java) you _can't_ say:

int x = 3;
int *p = &x;

because the "&" operator simply doesn't exist. You can't take the
address of an arbitrary variable.

I still don't see any reason why a completely type safe language can not be
constructed without the need for any runtime overhead. You could even allow
constructs such as the above, and still be completely type safe, merely disallow
type casting.

If you disallow type casting, you neuter the language. You need to be
able to cast instances of derived classes to their bases and back. You
can do the first kind of cast without any runtime overhead, but you
need *some* runtime overhead to cast a base instance to its actual
derived class, even in C++ with dynamic_cast<>.

(The overhead in C++ isn't for performing the actual cast, but for
verifying that the cast is valid - that the object actually belongs to
the class you're casting it to. In C#, that's usually the case, but for
unboxing casts there's also overhead for copying the value out of its
box.)

I was not referring to this kind of type casting. This is not literally type
casting from one entirely different type to another. It looks like the really
dangerous type casting is casting from an integer to a pointer to a function;
this is the kind of type casting that allows malicious code such as viruses and
worms to exist and take control.

Well, more to the point, a language that supports garbage collection
can't allow one to take addresses of arbitrary memory locations, as the
garbage collector could then never determine what objects were
referenced and which weren't (because an address into the midst of an
object would then be legal).

It could do this, but, then you have the issue of reference counting, more extra
overhead. You don't have this problem when data is simply passed by address with
no assignment to another pointer variable.

The desire to avoid that overhead (as well as other problems with
reference counting) is, presumably, why .NET uses a garbage collector
instead.

That does not really eliminate reference counting, it merely delegates it to the
GC.

Well, sort of. The GC walks the stack and all static objects, looking
for references to objects on the heap. It then follows references
stored in those objects, etc, until it exhausts the network of
references. Any objects left thus unmarked are available for
collection.

Global data is disallowed?

No, that's what "static objects" refers to. In C#, you typically only
store global data by putting it in the static fields of a class. (There
are a couple other types of global data used with C++/CLI: bare global
variables and gcroots.)

Of course, it's rather more complex than that, but you get the idea. If
you allow references into the midst of objects, then it's much more
difficult to decide what is referenced and what isn't.

In C# and Java, every reference that you can directly manipulate in
code is to a valid object on the heap. That's why, if you want to treat
an int as an object (and thus have a reference to it) then the CLR has
to create an object wrapper for it and put it on the heap.

I still don't see any need for a wrapper. Do you mean for reference counting?

The wrapper is there so that the int on the heap can be treated like
any other object, with a type pointer, virtual methods, etc. If it were
just stored on the heap as a plain integer, there'd be no way for your
code (and more importantly, the garbage collector) to tell it apart
from a float or an object reference at runtime.

Okay, now I am getting it.

Boxing lets you write a method like this:

public static void PrintIt(object foo)
{
    Console.WriteLine("Thanks for this " + foo.GetType().Name + ": " +
        foo.ToString());
}

And then pass in *any* value, whether it's an integer, a structure, or
an object reference. An unboxed integer is just a number, with no type
information other than that stored in the compiler's internals; a boxed

Which cease to exist at runtime.

integer is a full-fledged instance of a class derived from
System.Object.

Jesse

Jan 14 '07 #47

Peter Olcott

"Bruce Wood" <br*******@canada.com> wrote in message
news:11**********************@q2g2000cwa.googlegroups.com...

Peter Olcott wrote:
>"Bruce Wood" <br*******@canada.com> wrote in message
news:11**********************@a75g2000cwd.googlegroups.com...
That's because in C# (and Java) you _can't_ say:

int x = 3;
int *p = &x;

because the "&" operator simply doesn't exist. You can't take the
address of an arbitrary variable.

I still don't see any reason why a completely type safe language can not be
constructed without the need for any runtime overhead. You could even allow
constructs such as the above, and still be completely type safe, merely disallow
type casting.

I also know that it must somehow support GC, and that is why it is needed.

Well, more to the point, a language that supports garbage collection
can't allow one to take addresses of arbitrary memory locations, as the
garbage collector could then never determine what objects were
referenced and which weren't (because an address into the midst of an
object would then be legal).

It could do this, but, then you have the issue of reference counting, more extra
overhead. You don't have this problem when data is simply passed by address with
no assignment to another pointer variable.

C# supports pass-by-reference using the "ref" keyword.

However, I don't see how a language that allowed one to take the
address of arbitrary data could implement garbage collection. Even with
reference counting, the theory is that an _object_ counts references to
itself. An int, however, isn't an object. You're faced with the problem
of an object counting references to itself _or to a piece of data that it
holds_. How could you engineer a system whereby object A could keep
track of this sort of thing:

int *p = &(A.X);
int *q = p;

How does the object A now know that there are two references to it, p
and q, which point to a field inside A and not to A itself?

I don't see how you could automate this kind of reference counting,
even in C++, but then I'm no C++ guru.

>Is it something like maintaining a chain of pointers indicating who owns
what?

Well, sort of. The GC walks the stack and all static objects, looking
for references to objects on the heap. It then follows references
stored in those objects, etc, until it exhausts the network of
references. Any objects left thus unmarked are available for
collection.

Global data is disallowed?

No. Global data is allowed. That's what I meant by "static".

Of course, it's rather more complex than that, but you get the idea. If
you allow references into the midst of objects, then it's much more
difficult to decide what is referenced and what isn't.

In C# and Java, every reference that you can directly manipulate in
code is to a valid object on the heap. That's why, if you want to treat
an int as an object (and thus have a reference to it) then the CLR has
to create an object wrapper for it and put it on the heap.

I still don't see any need for a wrapper. Do you mean for reference counting?

C# and Java don't do reference counting. They walk the network of
object references at garbage collection time. "Mark and sweep."

I guess a good summary would be to say that the more regular the
situation, the easier it is to write good code to deal with it. By
forcing every collectable object to be the same, and allowing
references only to objects on the heap (apart from pass-by-ref, which
doesn't enter into garbage collection), C# and Java make it easier on
the garbage collector, which allows the GC to be more efficient.

Ah, so we could create a new parameter qualifier that works like [out] and [ref]
yet in the opposite direction. We could have an [in] parameter qualifier that
allows all large objects (larger than int) to be passed by reference, yet these
are all read-only objects. The compiler does not allow writing to them. This way
we avoid the unnecessary overhead of making copies of large objects just to
avoid accidentally making changes to these large objects.

Once you open up the language to allow arbitrary addressing of objects
and the values within them, you create a nightmare situation for the
garbage collector. Not that a sufficiently clever team of people
couldn't do it, I suppose, but it adds a lot of additional complexity,
and one has to ask exactly what would be gained? Java has demonstrated
that you can write perfectly good code without the ability to take
arbitrary addresses, pointer arithmetic, and the other stuff that C and
C++ pointers provide. There are some domains where the power of C / C++
pointers is arguably a great boon, but for most programming problems it
isn't required. So, you don't lose very much, and you gain a much
simpler garbage collector and better run-time security.

And yes, in .NET 2.0 you can pretty much avoid boxing (and unboxing)
altogether. It was difficult in .NET 1.1 because all of the standard
collections were collections of Object, and so storing values in a
Hashtable or an ArrayList (aka Vector in C++) meant incurring boxing
overhead. Even in .NET 1.1, however, you could roll your own
collections that didn't box or unbox, but they had to be type-specific.
.NET 2.0's generics (aka templates in C++) eliminate this problem. I
wouldn't say that boxing is a thing of the past, but more than 90% of
boxing in .NET 1.1 was in collections, and that's no longer necessary.

So the runtime penalty is almost non-existent, assuming that you use
appropriate language constructs.

Personally, I'm glad that arbitrary addressing was never put into Java
or C#. When I moved from C / C++ to Java I wondered how I would ever do
without the "&" operator, but I quickly realized that for the type of
software I write (business software) it really isn't needed. If,
however, I ever go back to writing real-time switching systems, I will
no doubt want C++ back again. Each tool has its uses, and C# is, in my
opinion, better suited to most day-to-day programming problems than is
C++. However, there are places that C# won't take you, where C++ is
much better suited.

It might be possible to design a language that has essentially all of the
functional capabilities of the lower level languages, without the requirement
of ever directly dealing with pointers. I myself have always avoided pointers
(since the early 1980's); they were always too difficult to debug. Instead of
using pointers I used static arrays; at least in this case I could print out the
subscripts. Now that I know C++, I can still avoid pointers by using the STL
constructs.

Jan 14 '07 #48

Peter Olcott

"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com...

Peter Olcott <No****@SeeScreen.com> wrote:

<snip>

>It is good to know that aggregate data can be passed by reference without the
boxing and unboxing overhead, if need be.

Normally aggregate data is stored in a reference type to start with,
where there's no boxing penalty anyway.

Last time you were concerned with the performance penalty of boxing and
unboxing, we proved that in the benchmark you were worried about, the
cost of boxing and unboxing was negligible. Now, do you have a
different evidence-based reason for worrying about the penalty? If not,
I'd suggest you get some more experience (and start profiling) before
worrying about it any more.

I want to fully understand exactly how the underlying architecture works so that
I can design it from the ground up using the best means. With C++ I already know
exactly what kind of machine code anything and everything will translate
into. I need to acquire this degree of understanding of .NET before I begin
using it.

The systems that I am developing are not business information systems where
something can be 10,000-fold slower than necessary and there is no way for
anyone to notice the difference. In some cases a two-fold difference in the
speed of an elemental operation can noticeably affect response time. I am not
comfortable switching to C# until I know every detail of exactly how to at least
match the performance of native code C++.

Jan 14 '07 #49

Peter Olcott

"Barry Kelly" <ba***********@gmail.com> wrote in message
news:96********************************@4ax.com...

Peter Olcott wrote:

>Couldn't there possibly be a way to create safe code that does not ever require
any extra runtime overhead? Couldn't all the safety checking somehow be done at
compile time?

What safety checking are you talking about? Work with the C# types (such
as 'int') and there won't be any boxing, the work will be done at
compile time, and you won't pay any costs. You only need to worry about
boxing if you need to store an int or other value type into a location
of type 'object' (which isn't too often, but occasionally useful, in my
experience).

If you're talking about creating a pointer to int, and keeping it safe,
then the only verifiable way to do that is via ref parameters we talked
about in another branch of this thread. That's because the compiler can
guarantee things relating to the flow of code. Storing such references
in other structures isn't allowed because it can trivially create
dangling references:

// theoretical field type
ref int _savedX;

void Foo(ref int x)
{
    _savedX = ref x;
}

void Bar(int x)
{
    Foo(ref x);
}

void Baz()
{
    Bar(42);
    Console.WriteLine(_savedX); // uh oh - reading from invalid location
}

That's why such references are restricted to parameters only.

That would seem to be a fine restriction. Now if we can only add an [in]
parameter qualifier that passes all large objects by reference, yet makes them
read-only. Objects the size of [int] or smaller can be passed by value, yet
still as read-only. The compiler flags all write access to these [in] parameters
as an error, at compile time. We can do the same sort of thing for the [out]
parameter qualifier, (now all data is passed by reference, and is read-write)
and thus have no need for the [ref] parameter qualifier.

Now the programmer could do the right thing with these parameters without even
the need for understanding the underlying mechanisms of pass by value or pass by
reference. Now it becomes pass by I-want-to-change-it and pass by
I-want-to-make-sure-it-won't-be-changed.
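
For what it's worth, much later versions of C# (7.2 onward) did add an "in"
parameter modifier along roughly these lines: the argument is passed by
reference but the method may not assign to it. "ref" was kept as well, so it
is not a replacement. A small sketch, where Matrix4 and InDemo are names
invented for the example and the code needs a compiler that supports C# 7.2
or newer:

```csharp
// Hypothetical "large" value type, for illustration only.
public readonly struct Matrix4
{
    public readonly double M11, M22, M33, M44;

    public Matrix4(double m11, double m22, double m33, double m44)
    {
        M11 = m11;
        M22 = m22;
        M33 = m33;
        M44 = m44;
    }
}

public static class InDemo
{
    // 'in' passes m by reference but read-only: writing "m = ..." inside
    // this method is rejected at compile time.
    public static double Trace(in Matrix4 m)
    {
        return m.M11 + m.M22 + m.M33 + m.M44;
    }

    // 'out' goes the other way: the method must assign the value.
    public static void MakeIdentity(out Matrix4 m)
    {
        m = new Matrix4(1, 1, 1, 1);
    }
}
```

The struct is declared readonly here so that reading its fields through the
"in" reference does not force the compiler to make defensive copies.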

Jan 14 '07 #50
