/CLR floating point performance, inter-assembly function call performance

Bern McCarty

I have run an experiment to try to learn some things about floating point
performance in managed C++. I am using Visual Studio 2003. I was hoping to
get a feel for whether or not it would make sense to punch out from managed
code to native code (I was using IJW) in order to do some amount of floating
point work and, if so, approximately how much floating point work that
would need to be.

To do this, I made a program that applies a 3x3 matrix to an array of 3D
points (all doubles here, folks). The program contains a function that
applies 10 different matrices to the same test data set of 5,000,000 3D
points. It does this by invoking a workhorse function that does the actual
floating point operations. That function takes an input array of 3D points,
an output array of 3D points, a point count, and the matrix to use. There
are no __gc types in this program; it's just pointers, structs and native
arrays. The outer test function looks like this:

void test_applyMatrixToDPoints(TestData *tdP, int ptsPerMultiply)
{
    int jIterations = tdP->pointCnt / ptsPerMultiply;
    for (int i = 0 ; i < tdP->matrixCnt ; ++i)
    {
        for (int j = 0 ; j < jIterations; ++j)
        {
            // managed-to-native transitions happen here in V2
            DMatrix3d_multiplyDPoint3dArray(tdP->matrices + i,
                                            &tdP->outPts[j*ptsPerMultiply],
                                            &tdP->inPts[j*ptsPerMultiply],
                                            ptsPerMultiply);
        }
    }
}

The program calls the above routine 8 times and records the elapsed time of
each call. On the first call, the above function calls the workhorse
function only once for each of the 10 matrices. In other words, it applies
a matrix to all 5,000,000 points in the test data set with a single call to
the workhorse function. In the next call to the above function, it passes
only 50,000 points per call to the workhorse, then 5,000, then 500, et
cetera, until we get all the way down to 5, and then finally 1, where there
is a call to DMatrix3d_multiplyDPoint3dArray() for each and every one of
the 5,000,000 3D points in the test data set.

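To make the call counts concrete, here is a small sketch (plain C++, not part of the original program; the helper names are hypothetical) that computes how many times test_applyMatrixToDPoints calls the workhorse for a given granularity. Like the original, it uses integer division, so if pointCnt were not a multiple of ptsPerMultiply the leftover points would be silently skipped; every granularity described above divides 5,000,000 evenly, so that does not affect these runs.

```cpp
#include <cassert>

// Workhorse calls made by one run of the outer test function:
// one call per chunk of ptsPerMultiply points, for each matrix.
long long callsPerRun(long long pointCnt, long long matrixCnt, long long ptsPerMultiply)
{
    assert(ptsPerMultiply > 0);
    return matrixCnt * (pointCnt / ptsPerMultiply);
}

// Points that the integer division above would leave unprocessed.
long long skippedPoints(long long pointCnt, long long ptsPerMultiply)
{
    assert(ptsPerMultiply > 0);
    return pointCnt % ptsPerMultiply;
}
```

With 10 matrices and 5,000,000 points, the first run makes 10 calls and the final run (1 point per call) makes 50,000,000.
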
I was hoping someone could help interpret the results. At first I made 3
versions of this program. In all 3 versions, the
DMatrix3d_multiplyDPoint3dArray function was in geometry.dll and the rest
of the code was in test.exe. The 3 versions were merely different
combinations of native versus IL for the two executables:

     test.exe   geometry.dll (contains workhorse function)
     --------   ----------------
v1)  native     native
v2)  managed    native
v3)  managed    managed

Here are the results. All numbers are elapsed time in seconds for calls to
the outer function described.

Native->Native:
0.953
0.968
0.968
0.953
0.968
0.952
1.093
1.39
Final run is 146% of first run.
Final run is 127% of previous run.

Managed->Native
0.968
0.968
0.968
0.969
0.968
0.968
1.124
1.952
Final run is 202% of first run.
Final run is 174% of previous run.

Managed->Managed
0.984
1.016
0.985
1
1
1.032
1.516
4.469
Final run is 454% of first run.
Final run is 295% of previous run.

This surprised me in two ways. First, I thought that for version 2 the
penalty imposed by managed->native transitions would be worse. It's there
(you can see performance drop off more as the call granularity becomes very
fine toward the end), but it isn't as large as I would have guessed. More
surprising was that the managed->managed version, which didn't have any
managed->native transitions slowing it down at all, dropped off far worse!
The early calls to the test function compare very closely between versions
2 and 3, suggesting that the raw floating point performance of the managed
and native workhorse functions is quite similar. So this seemed to point the
finger at function call overhead. Is function call overhead simply higher
for managed code than for native code? On a hunch I made a fourth version of
the program that was also managed->managed but eliminated the inter-assembly
call: I just linked everything from geometry.dll right into test.exe. It
made a big difference; the results are below. Is there some
security/stack-walking work going on in the inter-DLL case, maybe? Or does
it really make sense that managed inter-assembly calls are that much slower
than the equivalent intra-assembly calls? Explanations welcome. On the final
call, where the call granularity is finest, the inter-assembly version takes
217% of the time that the intra-assembly version takes. That seems awfully
harsh.

Managed->Managed (one big test.exe)
1
0.999
0.984
1.015
0.984
1.015
1.093
2.061
Final run is 206% of first run.
Final run is 189% of previous run.

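The percentages quoted under each series can be re-derived from the raw timings. The sketch below (hypothetical helper names; rounding to the nearest whole percent is an assumption about how the figures were computed) reproduces them, along with the 217% inter-assembly versus intra-assembly comparison.

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// part as a percentage of whole, rounded to the nearest whole percent.
int percentOf(double part, double whole)
{
    assert(whole > 0.0);
    return static_cast<int>(std::lround(100.0 * part / whole));
}

// Elapsed seconds for the 8 runs, copied from the results above.
const std::vector<double> nativeToNative  = {0.953, 0.968, 0.968, 0.953, 0.968, 0.952, 1.093, 1.39};
const std::vector<double> managedToNative = {0.968, 0.968, 0.968, 0.969, 0.968, 0.968, 1.124, 1.952};
const std::vector<double> managedInterAsm = {0.984, 1.016, 0.985, 1.0, 1.0, 1.032, 1.516, 4.469};
const std::vector<double> managedIntraAsm = {1.0, 0.999, 0.984, 1.015, 0.984, 1.015, 1.093, 2.061};

// {final run as % of first run, final run as % of previous run}
std::pair<int, int> finalRunRatios(const std::vector<double>& t)
{
    assert(t.size() >= 2);
    double last = t.back();
    return {percentOf(last, t.front()), percentOf(last, t[t.size() - 2])};
}
```

For example, finalRunRatios(managedInterAsm) gives {454, 295}, and percentOf(managedInterAsm.back(), managedIntraAsm.back()) gives the 217% figure (4.469 s versus 2.061 s).
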
Even with the improvement from eliminating the inter-assembly calls, I find
the relative performance of the version that has to make managed->native
transitions and the all-managed version difficult to comprehend. Why does
managed->managed function call overhead seem to be worse than even
managed->native function call overhead?

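One rough way to put numbers on that call overhead: assume the floating point work is identical in every run, so that all of the extra time in the final run comes from the extra calls (ignoring cache and timer effects), and divide the extra time by the extra number of calls. The first run makes 10 workhorse calls and the final run makes 50,000,000. This is a back-of-the-envelope sketch, and the helper name is hypothetical.

```cpp
#include <cassert>

// Rough per-call overhead in nanoseconds. Assumes the floating point work is
// the same in both runs, so all extra elapsed time is charged to extra calls.
double overheadPerCallNs(double firstSecs, double lastSecs,
                         long long firstCalls, long long lastCalls)
{
    assert(lastCalls > firstCalls);
    return (lastSecs - firstSecs) * 1e9 /
           static_cast<double>(lastCalls - firstCalls);
}
```

Plugging in the first and last timings gives roughly 8.7 ns per call for native->native, 19.7 ns for managed->native, 21.2 ns for managed->managed within one executable, and 69.7 ns for managed->managed across assemblies. These are estimates derived from the posted timings, not separate measurements.
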
I tried to make sure that page faults weren't affecting my test runs and the
results I got were very consistent from run to run.

Bern McCarty
Bentley Systems, Inc.

P.S. For the curious, here is what DMatrix3d_multiplyDPoint3dArray looks
like. It makes no function calls, and it is all compiled into IL.

/* "Public" is presumably a project-specific macro; its definition is not shown. */
Public void DMatrix3d_multiplyDPoint3dArray
(
    const DMatrix3d *pMatrix,   /* 3x3 matrix, stored as three columns */
    DPoint3d        *pResult,   /* output array of numPoint points */
    const DPoint3d  *pPoint,    /* input array of numPoint points */
    int              numPoint
)
{
    int i;
    double x, y, z;
    DPoint3d *pResultPoint;

    for (i = 0, pResultPoint = pResult;
         i < numPoint;
         i++, pResultPoint++)
    {
        x = pPoint[i].x;
        y = pPoint[i].y;
        z = pPoint[i].z;

        pResultPoint->x = pMatrix->column[0].x * x
                        + pMatrix->column[1].x * y
                        + pMatrix->column[2].x * z;

        pResultPoint->y = pMatrix->column[0].y * x
                        + pMatrix->column[1].y * y
                        + pMatrix->column[2].y * z;

        pResultPoint->z = pMatrix->column[0].z * x
                        + pMatrix->column[1].z * y
                        + pMatrix->column[2].z * z;
    }
}

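For anyone who wants to reproduce the native->native baseline, here is a portable standard C++ sketch of the pieces shown above. The struct layouts are inferred from the member accesses in the post (pMatrix->column[k].x and so on) and the TestData fields from the outer test function; both are assumptions, not the real Bentley headers. Timing with std::chrono, and the name timeApplyMatrixToDPoints, are also assumptions, since the post does not say which timer was used. Built natively, this measures only the v1 case; it cannot reproduce managed/native transitions.

```cpp
#include <cassert>
#include <chrono>

// Layouts inferred from the post; the real headers may differ.
struct DPoint3d  { double x, y, z; };
struct DMatrix3d { DPoint3d column[3]; };   // 3x3 matrix stored as columns

// Hypothetical test container, with fields named as in the post.
struct TestData {
    int pointCnt;
    int matrixCnt;
    const DMatrix3d* matrices;
    DPoint3d* outPts;
    const DPoint3d* inPts;
};

// Workhorse: pResult[i] = M * pPoint[i], same arithmetic as the post.
void DMatrix3d_multiplyDPoint3dArray(const DMatrix3d* pMatrix, DPoint3d* pResult,
                                     const DPoint3d* pPoint, int numPoint)
{
    const DPoint3d* c = pMatrix->column;
    for (int i = 0; i < numPoint; ++i) {
        double x = pPoint[i].x, y = pPoint[i].y, z = pPoint[i].z;
        pResult[i].x = c[0].x * x + c[1].x * y + c[2].x * z;
        pResult[i].y = c[0].y * x + c[1].y * y + c[2].y * z;
        pResult[i].z = c[0].z * x + c[1].z * y + c[2].z * z;
    }
}

// The outer test function from the post, returning elapsed seconds.
double timeApplyMatrixToDPoints(const TestData* tdP, int ptsPerMultiply)
{
    assert(ptsPerMultiply > 0);
    auto start = std::chrono::steady_clock::now();
    int jIterations = tdP->pointCnt / ptsPerMultiply;
    for (int i = 0; i < tdP->matrixCnt; ++i)
        for (int j = 0; j < jIterations; ++j)
            DMatrix3d_multiplyDPoint3dArray(tdP->matrices + i,
                                            &tdP->outPts[j * ptsPerMultiply],
                                            &tdP->inPts[j * ptsPerMultiply],
                                            ptsPerMultiply);
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

A caller would fill a TestData with 5,000,000 points and 10 matrices and call timeApplyMatrixToDPoints once per granularity, as the original program does.
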
Nov 17 '05 #1


Yan-Hong Huang[MSFT]

Hello Bern,

Generally speaking, the v1 JIT does not currently perform all the
FP-specific optimizations that the VC++ backend does, making floating point
operations more expensive for now. That may be why managed->managed is more
expensive than managed->unmanaged in your test.

So for areas that make heavy use of floating point arithmetic, please use
a profiler to find the fragments where the overhead costs you the most, and
keep each such fragment entirely in unmanaged code.

Also, work to minimize the number of transitions you make. If you have some
unmanaged code or an interop call sitting in a loop, make the entire loop
unmanaged. That way you'll pay the transition cost only twice, rather than
on every iteration of the loop.

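The advice to move the whole loop to the other side of the boundary can be illustrated in portable C++. In this sketch a hypothetical global counter stands in for the managed/native boundary (real transitions cannot be shown in standard C++), and each call across it is charged two transitions, one in and one back out, matching "pay the transition cost twice". All names here are made up for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for a managed/native boundary: each call that
// crosses it is charged two transitions (one in, one back out).
static long long g_transitions = 0;

// Per-element entry point on the "other side" of the boundary.
double scaleOne(double v, double k)
{
    g_transitions += 2;
    return v * k;
}

// Before: the loop lives on the caller's side, so every iteration crosses.
void scaleEachAcrossBoundary(double* v, int n, double k)
{
    assert(n >= 0);
    for (int i = 0; i < n; ++i)
        v[i] = scaleOne(v[i], k);
}

// After: the whole loop lives on the callee's side, so only one call crosses.
void scaleMany(double* v, int n, double k)
{
    assert(n >= 0);
    g_transitions += 2;
    for (int i = 0; i < n; ++i)
        v[i] *= k;
}
```

For n elements the first form is charged 2n transitions and the second only 2, while both produce the same results.
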
Looking at the IL code, we can see that interop calls involve some extra IL
instructions, so minimizing the number of transitions saves those
instructions and improves performance.

For some more information, you can refer to this chapter online:
"Chapter 7 - Improving Interop Performance"
http://msdn.microsoft.com/library/en...pt07.asp?frame=true#scalenetchapt07_topic12

Hope that helps.

Best regards,
Yanhong Huang
Microsoft Community Support

Get Secure! - www.microsoft.com/security
This posting is provided "AS IS" with no warranties, and confers no rights.

Nov 17 '05 #2

Bern McCarty

From reading various things, I had already recognized what you state as
the current conventional wisdom. I went to the trouble of posting my results
in the hope of getting some feedback on why they run so strongly against
that conventional wisdom. Please consider:

1) Floating point performance of managed code. At least in this little
test scenario, floating point performance of managed code doesn't seem to
be a problem at all. In the first of the 8 calls in a test run,
DMatrix3d_multiplyDPoint3dArray is asked to apply the matrix to a whopping
5,000,000 3D points per call. So it just sits there doing floating point
operations in a 5,000,000-iteration loop, with no function calls in that
loop at all. In that case the managed version took only 3% longer than the
all-native version (0.984 s versus 0.953 s). It seems logical, then, to rule
out floating point performance as the culprit when things quickly get worse
in the later calls, where the call granularity to
DMatrix3d_multiplyDPoint3dArray becomes very fine. It makes more sense to
attribute the slowdown in the fine-grained cases to function call overhead,
not to floating point performance.

2) The expense of transitions. What am I doing wrong? The version of my
test program that involves a transition in the call from
test_applyMatrixToDPoints to DMatrix3d_multiplyDPoint3dArray is actually
FASTER than the all-managed version (true for both the intra-assembly and
inter-assembly cases). Furthermore, the finer-grained the calls, the more
the managed->native version outperforms the managed->managed versions.
Since we already established that the raw floating point performance of the
loop inside DMatrix3d_multiplyDPoint3dArray is essentially equivalent
between the managed and native versions, and the conventional wisdom is
that managed->native transitions are expensive and bad, what is to blame for
the poor relative performance of the managed->managed versions? The
managed->managed version is flat-out beaten by the version that makes a
transition on each and every call. It would seem that there is some serious
penalty associated with making regular managed->managed function calls, not
managed->native calls. What might be responsible for it, and is it something
I have any control over?

3) The surprising difference in cost between inter-assembly and
intra-assembly managed->managed calls. Can someone explain this difference
and is there anything that can be done about it besides making my program
one enormous executable?

4) How can I step through JIT compiled code in assembly language in a
debugger for a release executable so that I can see what is going on? I
want the JIT to produce "non debug" x86 instructions and yet I want to step
through them to see what they do. Tips appreciated. Can I do this with the
VS.NET debugger? Windbg? How?
Nov 17 '05 #3

Yan-Hong Huang[MSFT]

Hi Bern,

By using ildasm.exe, you can look into the IL code of the assembly to see
the difference between inter-assembly and intra-assembly managed->managed
calls.

At the same time, I have forwarded your questions to our product team for
their opinions. I will report back here as soon as possible.

Thanks.

Best regards,
Yanhong Huang
Microsoft Community Support

Nov 17 '05 #4

Kang Su Gatlin [MS]

Bern, you're seeing what looks like a manifestation of the "double thunk"
(aka "double P/Invoke") problem. When your managed code calls the managed
code in the DLL through the standard Win32 DLL mechanism, the call first
goes through a native stub, so you end up transitioning from managed to
native and then back to managed.

Try #using the DLL you compiled as managed, rather than using the standard
Win32 DLL mechanism. This should help. Let us know whether it helps, or if
this makes no sense.
Thanks,

Kang Su Gatlin
Visual C++ Program Manager

--------------------
| From: "Bern McCarty" <be**********@bentley.com>
| References: <eS**************@TK2MSFTNGP09.phx.gbl> <kG**************@cpmsftngxa10.phx.gbl>
| Subject: Re: /CLR floating point performance, inter-assembly function call performance
| Date: Thu, 6 May 2004 08:59:11 -0400

Nov 17 '05 #5

Yan-Hong Huang[MSFT]

Hello Bern,

Are you still monitoring this thread? We just held a discussion among
PSS, SDE and PM.

The matrix of tested combinations is:

     test.exe   geometry.dll (contains workhorse function)
     --------   ----------------
v1)  native     native
v2)  managed    native
v3)  managed    managed

The key point is that we think the third variation uses exported functions
and an import library to call the function in geometry.dll, as is certainly
the case with the first two. If so, it is a mistake to think that there are
no transitions in this scenario. In fact, there are twice as many
transitions in variation 3 as in variation 2. The reason is the import
library. Import libraries are a native construct: any time managed code
calls a DLL function through a stub in the import lib, a managed->native
transition must happen. Then, since the actual implementation of the
function in the DLL is managed, there must be another transition back to
managed. This is very costly, as you found out.

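Applying that explanation to the experiment: under an assumed counting convention in which a v2 call costs two transitions (into native code and back) and a v3 call through the import library costs twice that, the number of transitions per run is just the call count times the per-call figure. The sketch below (hypothetical names) does that arithmetic; it is a model of the explanation above, not a measurement.

```cpp
#include <cassert>

// Assumed model: v2 pays 2 transitions per call (managed->native and back);
// v3 through the import-lib stub pays twice as many, per the explanation.
enum class Variant { ManagedToNative, ManagedToManagedViaImportLib };

int transitionsPerCall(Variant v)
{
    return v == Variant::ManagedToNative ? 2 : 4;
}

// Total transitions for one run of the outer test function.
long long transitionsPerRun(Variant v, long long pointCnt,
                            long long matrixCnt, long long ptsPerMultiply)
{
    assert(ptsPerMultiply > 0);
    long long calls = matrixCnt * (pointCnt / ptsPerMultiply);
    return transitionsPerCall(v) * calls;
}
```

At the finest granularity (50,000,000 calls) this gives 100,000,000 transitions for v2 and 200,000,000 for v3; at the coarsest (10 calls) it gives 20 and 40.
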
The good news is that there is a way around these transitions for the
managed/managed case. Here is a small example:

Code for DLL:
public __value class Utils {          // Must have a public managed type (__value or __gc)
public:
    static int func(int i, int j) {   // Must be static unless you don't mind creating instances
        return i + j;
    }
};

Code for EXE:
#using <testdll.dll>                  // Pull in the types defined in assembly testdll.dll

int main() {
    return Utils::func(0, 0);         // Call the function
}

This will eliminate all transitions from the call from the exe into the DLL.

I will email our SDE and let him look into this post also. If you have any
more concerns, please feel free to post here. Or you can contact us by
removing online from my email address here. Thanks very much.

Best regards,
Yanhong Huang
Microsoft Community Support

Nov 17 '05 #6

Bern McCarty

Yes, I'm here. Thanks for the answer; that makes a certain amount of sense.
I'll see if I can verify it.

I gather that in Whidbey the performance of my inter-assembly
managed->managed version would be much better without my changing anything.
Yes?

-Bern
Nov 17 '05 #7

Bern McCarty

I tried to take the suggestion of doing a #using <geometry.dll> instead of
including the corresponding header files, but when I did that the result
would not compile:

C:\mycode\geomTest\\test.cpp(74) : error C3861: 'DMatrix3d_multiplyDPoint3dArray': identifier not found, even with argument-dependent lookup

Then I thought, well, maybe I should include the header files AND do a
#using <geometry.dll>, but make sure NOT to link with geometry.lib. Then
the problem just moves to link time:

test.obj : error LNK2001: unresolved external symbol "void __cdecl DMatrix3d_multiplyDPoint3dArray(struct _dMatrix3d const *,struct _dPoint3d *,struct _dPoint3d const *,int)" (?bsiDMatrix3d_multiplyDPoint3dArray@@$$J0YAXPBU_dMatrix3d@@PAU_dPoint3d@@PBU2@H@Z)

Here is what I can find on the function in the disassembled geometry.dll (I
omitted the body):

.method /*0600003F*/ public static void modopt([mscorlib/* 23000001 */]System.Runtime.CompilerServices.CallConvCdecl/* 01000001 */)
        DMatrix3d_multiplyDPoint3d(valuetype _dMatrix3d/* 02000005 */ modopt([Microsoft.VisualC/* 23000002 */]Microsoft.VisualC.IsConstModifier/* 01000002 */)* pMatrix,
                                   valuetype _dPoint3d/* 02000006 */* pPoint) cil managed
// SIG: 00 02 20 05 01 0F 20 09 11 14 0F 11 18

Perhaps I am doing something wrong, but it appears that you cannot give the
compiler/linker the information it needs to call global functions that
were compiled into IL via /CLR simply by referencing the assembly at compile
time. Does that mean that, to avoid the inter-assembly double P/Invoke, I
have no choice but to wrap all of the functionality in my geometry library
in __gc classes? That would be a shame, since I am able to call it as is
just fine; it is just too slow.

Will the double P/Invoke that I am seeing in this case go away as of
Whidbey?

-Bern
Nov 17 '05 #8

Yan-Hong Huang[MSFT]

Hi Bern,

Based on my experience, the best way is to verify it by testing on Whidbey.
You can install a preview build from MSDN Subscriber Downloads.

For the second issue, I think you need to use a __gc wrapper class to expose
it. Please refer to MSDN for details; I think the relevant sample is
"MCppWrapper Sample: Demonstrates Wrapping a C++ DLL with Managed
Extensions"
http://msdn.microsoft.com/library/de...us/vcsample/html/vcsammcppwrappersampledemonstrateswrappingcdllwithmanagedextensions.asp

Do you have any further concerns about the performance issue? If so, please
feel free to post here. I am glad to work with you on it. Thanks very much.

Best regards,
Yanhong Huang
Microsoft Community Support

Nov 17 '05 #9

Bern McCarty

Yes I am concerned about performance. I had hoped that IJW could be used to
compile nearly all of our existing C++ application into IL and that that
would eliminate the need for many managed->native transitions and would also
free us to begin using GC types throughout our application over time. Our
application consists of quite a number of .dlls and there are tons of
inter-dll calls. But now what I've learned is that, though the code
compiles, links and runs, every inter-dll call is suffering the double
P/Invoke problem so indeed my code is littered with managed->unmanaged
transitions.

Sure I could wrap every single function/method in my entire application in a
GC class, but then IJW isn't at all suitable for what I thought it was. Like
I said, it compiles, links and runs which is impressive. It's just too slow
and that's too bad. I would still like an answer to know if the double
P/Invoke problem will really be fixed in the final Whidbey release. I saw
where someone from Microsoft hedged on that saying that it might not be. I
hope that is not the case.

As for the Visual Studio 2005 Tech Preview on MSDN, I've already looked at
it. I had so much trouble with it I gave up on it. I found myself editing
delivered headers just to try to get stuff to compile. Then the result
would crash. I haven't seen anyone else posting C++ issues in here that
related to this Whidbey build and I kind of reached the conclusion that the
VC++ team didn't really circle the wagons for this particular build. I can
only assume that they have other better quality builds that people in other
programs have access to. I also found that "search" did not work for the
MSDN library that came with the build and I find that terribly crippling.

Bern McCarty
Bentley Systems, Inc.

Nov 17 '05 #10

Yan-Hong Huang[MSFT]

Hello Bern,

I have contacted our VC++ developer about it. Unfortunately, in this
situation the call will still cost that much, even in VS Whidbey. Here is
the developer's response.

-----------------------------

It will still cost that much. The problem is that the managed DLL loading
mechanism exposes only managed types, not standalone functions, and exported
functions must be callable from native code because of the import lib. The
kicker is that native code cannot use or call functions that are members of
managed types.

If the API needs to be exposed to both native and managed code through one
DLL, that is possible. Here is my trivial example again, extended to expose
both a managed interface and a native interface.

File: testdll.cpp
Compile: cl /clr /LD testdll.cpp

// managed interface
public __value class Utils {
public:
    static int func(int i, int j) {
        return i + j;
    }
};

// native interface
__declspec(dllexport)
int func(int i, int j) {
    return Utils::func(i, j);
}

File: managed.cpp
Compile: cl /clr managed.cpp

// Use the managed mechanism to access the API. Note that #using pulls in
// types, not standalone functions, so our API must be a member of a managed
// type, in this case the class Utils. Also, we have not linked to the
// import lib.
#using <testdll.dll>

int main() {
    return Utils::func(0, 0);
}

File: native.cpp
Compile: cl native.cpp /link testdll.lib

// Now we link with the import lib.
int func(int, int);
int main() {
    return func(0, 0);
}

-----------------------------

I understand that migrating the code to managed C++ wrapper classes is a
lot of work. However, if performance is important to you, from what we have
discussed so far there seems to be no easier way to do it yet. As the
developer mentioned, the managed DLL loading mechanism exposes only managed
types, not standalone functions. For the time being, to improve the
performance of inter-assembly calls, the exported functions need to be
implemented as members of wrapper classes.

If you have any more concerns on it, please feel free to post here. Thanks
very much for your understanding.

Best regards,
Yanhong Huang
Microsoft Community Support

Get Secure! - www.microsoft.com/security
This posting is provided "AS IS" with no warranties, and confers no rights.

Nov 17 '05 #11

Bern McCarty

Thank you for the answer. It was very important for me to know that this
will remain the same in Whidbey. The Quake II .NET effort
(http://www.codeproject.com/managedcpp/Quake2.asp) is something I've seen
Microsoft demonstrate a couple of times, and I found it quite compelling. I
guess the Quake II .NET program suffers from the double P/Invoke on its
inter-mixed-mode-assembly calls too, then? Or did they wrap everything that
is called inter-assembly and then change every call site to call the wrapped
functions instead?

It seems to me that the suggestion for how to wrap things is backwards. My
implementation methods already exist. I don't want to touch them. I just
want to wrap them as they are. I would think to do it like this instead:

Compile: cl /clr /LD testdll.cpp
namespace myNameSpace {
// native interface (existing implementation, left untouched)
__declspec(dllexport)
int func(int i, int j) {
return i + j;
}
}

// managed interface
public __value class Utils {
public:
static int func(int i, int j) {
return myNameSpace::func(i, j);
}
};

After seeing Quake II .NET, my thought was that I could take our large,
complex and extensible multi-DLL program and arrange our build process so
that we could experiment with a /CLR-compiled version for a good long time
while we learned the ins and outs of C++ interop and how to begin to
introduce GC types into its implementation and its documented interfaces.
Furthermore, I thought that I could slowly add the /CLR switch to individual
source files as I found time to conquer them. That has turned out to be a
fair amount of work for us because many of our source files are in fact .c
files. When compiling them with /CLR I ran into problems at link time, and
ultimately realized that Microsoft was dropping support for the /CLR switch
for C source code anyway. I then realized that modules needed to be converted
to C++ before adding the /CLR switch, and that's where it begins to take quite
a bit of effort in such a large application.
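Converting a .c file to C++ usually involves a few mechanical fixes. The sketch below uses hypothetical names (`Point3d`, `point3d_allocArray`, `point3d_sumX`) that are not from Bern's code base. It shows two common ones: the implicit `void*` conversion that C allows must become an explicit cast, and wrapping the declarations in `extern "C"` keeps the unmangled symbol names so that callers still compiled as C continue to link. Name mangling is one common cause of link errors after such a conversion, not necessarily the one hit here.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical shared header content: the same declarations are seen by
// callers compiled as C and by callers compiled as C++ (with or without /clr).
#ifdef __cplusplus
extern "C" {   // keep the unmangled C symbol names after the file becomes C++
#endif

typedef struct Point3d { double x, y, z; } Point3d;

Point3d *point3d_allocArray(int count);
double point3d_sumX(const Point3d *pts, int count);

#ifdef __cplusplus
}
#endif

// Former .c implementation, now compiled as C++.
extern "C" Point3d *point3d_allocArray(int count)
{
    // C accepted the implicit conversion from void*; C++ requires the cast.
    return static_cast<Point3d *>(
        std::calloc(static_cast<std::size_t>(count), sizeof(Point3d)));
}

extern "C" double point3d_sumX(const Point3d *pts, int count)
{
    double sum = 0.0;
    for (int i = 0; i < count; ++i)
        sum += pts[i].x;
    return sum;
}
```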

But these many .dlls call each other. To get decent inter-assembly call
performance I need to wrap all my functions as static methods of a GC type,
and, here is the kicker, I then have to change each and every call site in
code that has been compiled with /CLR so that it calls the GC-wrapped
versions instead. That makes it a rather complex task to effect this
transition slowly over time. You have to leave alone the call sites that
aren't compiled into IL yet, but alter all the others. The split between IL
and x86 is always changing as I manage to add the /CLR switch to new
modules. I guess I would have to use the preprocessor to substitute calls to
the wrapped versions at the right call sites. Certainly possible, but a fair
amount of work.
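The preprocessor substitution mentioned here could look roughly like the following sketch. It assumes MSVC's predefined `_MANAGED` macro (documented as defined when `/clr` is set) to tell IL call sites from native ones. `myNameSpace::func`, `Utils`, `CALL_FUNC`, and `addPair` are hypothetical names, and `Utils` is a plain struct standing in for the `__value class` wrapper so that the sketch compiles with any C++ compiler.

```cpp
// Existing native implementation (in the real code, a __declspec(dllexport)
// function in another DLL). Hypothetical name.
namespace myNameSpace {
    inline int func(int i, int j) { return i + j; }
}

// Stand-in for the managed wrapper class (public __value class Utils in the
// thread's example). It only forwards to the native implementation.
struct Utils {
    static int func(int i, int j) { return myNameSpace::func(i, j); }
};

// MSVC defines _MANAGED when a file is compiled with /clr. Files that are
// already IL call through the wrapper; native files keep calling the export.
#if defined(_MANAGED)
#  define CALL_FUNC(i, j) Utils::func((i), (j))
#else
#  define CALL_FUNC(i, j) myNameSpace::func((i), (j))
#endif

// A call site written once against the macro switches target automatically
// when its source file gains the /clr switch.
int addPair(int a, int b)
{
    return CALL_FUNC(a, b);
}
```

The catch is that every existing call site still has to be rewritten against the macro once, after which adding /clr to a file no longer requires touching its calls.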

It seems like lots of folks who see Quake II .NET are going to take it to
heart and try the same thing that I have tried and ultimately end up facing
the very same problem. It would be nice if the linker could optionally
generate and include GC wrappers for my exported functions. Imagine that I
supply a namespace::classname for a GC class that I want the linker to
create and then the linker dutifully adds static methods to that class which
simply wrap each of my exports. It could even generate a .h file for me that
mapped the native name to the appropriate method in the generated GC wrapper
class.
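Short of linker support, part of this idea can be approximated by hand with the preprocessor. A rough sketch, assuming the exports can be listed in one place: an X-macro list (`MY_EXPORTS`, a hypothetical name) drives generation of one forwarding static method per export. `MyWrappers`, `addInts`, and `scaleX` are illustrative names, and the plain struct stands in for the GC wrapper class.

```cpp
// Hypothetical export list: one entry per exported native function, giving
// return type, name, parameter list, and argument list.
#define MY_EXPORTS(X)                                   \
    X(int,    addInts, (int i, int j),       (i, j))    \
    X(double, scaleX,  (double x, double s), (x, s))

// Existing native implementations (unchanged).
namespace myNameSpace {
    int addInts(int i, int j) { return i + j; }
    double scaleX(double x, double s) { return x * s; }
}

// Expands to one forwarding static method per export list entry.
#define MAKE_WRAPPER(ret, name, params, args) \
    static ret name params { return myNameSpace::name args; }

// In a /clr build this would be the GC wrapper class; a plain struct keeps
// the sketch portable.
struct MyWrappers {
    MY_EXPORTS(MAKE_WRAPPER)
};

#undef MAKE_WRAPPER
```

The same list could also drive the header that maps each native name to its wrapper method, so the export list is maintained in only one place.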

Then I could maintain both the traditional all-native build of my
application and a piecemeal mixed-mode build of our application while we
work toward adding the /CLR switch to virtually everything in the
application. I'm trying to follow the Quake II .NET lead, but my app is a
lot larger and more complex, and doing it all in one fell swoop isn't
practical.

Bern McCarty
Bentley Systems, Inc.
Nov 17 '05 #12

Yan-Hong Huang[MSFT]

Hi Bern,

We will do our best to find out how Quake II .NET handles this. Your idea is
also good, and I will forward it to the product group. We are also looking
for the most convenient way to migrate code efficiently.

Thanks very much.

Best regards,
Yanhong Huang
Microsoft Community Support

Nov 17 '05 #13

Yan-Hong Huang[MSFT]

Hello Bern,

Do you have any more concerns about it? If there is anything we can do,
please feel free to post here.

Thanks very much.

Best regards,
Yanhong Huang
Microsoft Community Support

Nov 17 '05 #14