1.1. One of our libraries is a mixed-mode dll assembly consisting of
one managed C++ library, and several unmanaged C++ libraries. We are
using managed C++ as a bridge between managed .NET code and unmanaged
C++ code, which I'm sure is a fairly common practice. The managed C++
library is compiled with /CLR whereas all other libraries are compiled
without /CLR because they are strictly native C++ code.
Now let me take you on a journey.
A class like the following is written in unmanaged C++:
////////////////////////////////////////////////////////////////
/// UnmanagedClass.h
class UnmanagedClass
{
public:
    virtual bool returnsFalse() const;
};
/// UnmanagedClass.cpp
bool
UnmanagedClass::returnsFalse() const
{
    return false;
}
////////////////////////////////////////////////////////////////
Then consider the following managed code:
////////////////////////////////////////////////////////////////
/// ManagedClass.cpp
void
ManagedClass::foo()
{
    UnmanagedClass* obj = new UnmanagedClass();
    bool res = obj->returnsFalse();
    assert(!res); // <= this assertion fails
}
////////////////////////////////////////////////////////////////
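For comparison, here is a minimal sketch of the same call made entirely
in native code, with no managed-to-unmanaged transition involved. The
names NativeOnlyClass and nativeFoo are hypothetical stand-ins for the
classes above, not code from our project. Compiled as plain C++, I
would expect the assertion to hold:

```cpp
#include <cassert>

// Hypothetical stand-in with the same shape as UnmanagedClass above,
// kept in a single native translation unit.
class NativeOnlyClass
{
public:
    virtual ~NativeOnlyClass() {}
    virtual bool returnsFalse() const { return false; }
};

// Hypothetical stand-in for ManagedClass::foo(), compiled as native code.
// Returns the value seen by the caller so it can be checked from outside.
bool nativeFoo()
{
    NativeOnlyClass* obj = new NativeOnlyClass();
    bool res = obj->returnsFalse();
    assert(!res); // holds when caller and callee are both native
    delete obj;
    return res;
}
```

Calling nativeFoo() from a native main() is enough to exercise it; the
only difference from the failing case is the absence of the transition.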
I have been able to consistently reproduce this problem by running an
example very similar to the above *before* executing any of my other
code. At first I could hardly believe what I was seeing -- I actually
watched a function return "false", but the caller got back "true"! Not
good at all. Looking at the machine code, the "returnsFalse" function
is as follows:
07778D90 push ebp
07778D91 mov ebp,esp
07778D93 sub esp,8
07778D96 mov dword ptr [ebp-8],0CCCCCCCCh
07778D9D mov dword ptr [ebp-4],0CCCCCCCCh
07778DA4 mov dword ptr [ebp-4],ecx
07778DA7 xor al,al
07778DA9 mov esp,ebp
07778DAB pop ebp
07778DAC ret 4
////////////////////////////////////////////////////////////////
The boolean result of the function is returned in the "AL" register,
which does equal 0 at the time when the "ret" instruction is executed.
Here are all the register values at the time "ret" is executed:
////////////////////////////////////////////////////////////////
EAX = 07751400 EBX = 0012F594 ECX = 07455B48 EDX = 07455B98
ESI = 001532A8 EDI = 00000000 EIP = 0775147E ESP = 0012F58C
EBP = 0012F5E4 EFL = 00000246
////////////////////////////////////////////////////////////////
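To double-check my reading of that dump: AL is simply the low 8 bits of
EAX, so the value can be verified with a few lines of plain C++. This is
only a sketch; the helper lowByte and the constant kEaxAtRet are names I
made up for illustration, and the constant is copied from the dump above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: AL is the low 8 bits of the 32-bit EAX register.
inline std::uint8_t lowByte(std::uint32_t eax)
{
    return static_cast<std::uint8_t>(eax & 0xFFu);
}

// EAX as shown in the register dump taken at the "ret" instruction.
const std::uint32_t kEaxAtRet = 0x07751400u;
```

With these definitions, lowByte(kEaxAtRet) evaluates to 0, which matches
the "xor al,al" in the function body.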
The caller looks like this:
00000038 push 170470h
0000003d call F8D18698
00000042 movzx ebx,al <=== control returns here (expecting result in "AL")
00000045 movzx eax,bl
00000048 mov dword ptr [ebp-0Ch],eax
0000004b nop
0000004c mov ebx,dword ptr [ebp-0Ch]
0000004f jmp 00000051
00000051 mov eax,ebx
00000053 pop ebx
00000054 pop esi
00000055 pop edi
00000056 mov esp,ebp
00000058 pop ebp
00000059 ret
////////////////////////////////////////////////////////////////
When control returns to the caller (at 00000042h), the registers
have changed to the following:
////////////////////////////////////////////////////////////////
EAX = 00000001 EBX = 07751470 ECX = 00000004 EDX = 00000000
ESI = 07455B48 EDI = 07455B98 EBP = 0012F63C ESP = 0012F624
////////////////////////////////////////////////////////////////
As you can see, the "AL" register has been set to "01". In fact, the
entire EAX register has been set to "00000001". Since the "ret"
instruction should never modify the contents of the AL register, I had
another look at the point in time when the "ret 4" is executed. The
call stack reveals the likely culprit:
////////////////////////////////////////////////////////////////
libjss.dll!cse::ResourceRequirement::equals(
const cse::ResourceRequirement & rhs={...}) Line 27 C++
mscorwks.dll!7925c098()
^^^^^^^^^^^^^^^^^^^ probable culprit ^^^^^^^^^^^^^^^^^^^^^^^^^^^
libjss.dll!utl::equals<cse::ResourceRequirement>(
cse::ResourceRequirement* lhs = 0x07455b48,
cse::ResourceRequirement* rhs = 0x07455b98) Line 134 + 0x15 bytes C++
libjss.dll!jss.ResourceRequirement.Equals(
System.Object rhs = 0x04ce6ba0)
Line 33 + 0xc bytes C++
demo_cs.exe!demo_cs.demo_cs.evilBug() Line 143 + 0x9 bytes C#
demo_cs.exe!demo_cs.demo_cs.Main(string[] args = {Length=5})
Line 317 C#
////////////////////////////////////////////////////////////////
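One hypothesis that would fit both register dumps, and it is only a
guess on my part, is that the transition layer tests the return value
as a 4-byte quantity instead of the 1-byte C++ bool. The native function
only clears AL, so the upper 24 bits of EAX still hold leftover data
(EAX = 07751400 at the "ret"), and a nonzero test over all 32 bits
would produce 1. The sketch below models that idea in plain C++. The
function names are hypothetical, and nothing here is taken from
mscorwks.dll itself.

```cpp
#include <cassert>
#include <cstdint>

// Model of a caller that follows the native convention:
// only AL (the low byte of EAX) carries the bool result.
inline bool readAsByteBool(std::uint32_t eax)
{
    return (eax & 0xFFu) != 0;
}

// Model of a caller that (by assumption) tests the whole 32-bit
// register, as one would for a 4-byte boolean.
inline bool readAsDwordBool(std::uint32_t eax)
{
    return eax != 0;
}

// The 32-bit interpretation normalized to 0 or 1.
inline std::uint32_t normalizedEax(std::uint32_t eax)
{
    return readAsDwordBool(eax) ? 1u : 0u;
}
```

Feeding in the EAX value from the first dump (0x07751400), the byte-wide
reading gives false, while the 32-bit reading gives true and normalizes
to 1, which is what the second dump shows in EAX. That agreement does not
prove the hypothesis; it only shows the dumps do not contradict it.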
The register mangling must occur in mscorwks.dll, which I assume is
responsible for managing calls from managed into unmanaged code. I
found that if I compile the native C++ projects with the /CLR switch,
the problem goes away (because there is no longer a transition into
unmanaged code). However, compiling with "/CLR" has significant
disadvantages, including:
1. can't use pre-compiled headers -- they make a BIG difference to
compilation performance
2. debugging of unmanaged code becomes very difficult
I should say that I have followed the instructions for mixed-mode DLLs
that are described in a KB article called "Converting Managed Extensions
for C++ Projects from Pure Intermediate Language to Mixed Mode".
The debugging problem when using "/CLR" _also_ implicates mscorwks.dll.
Basically when I step into unmanaged C++ code, the debugger says "there
is no source code for the current location". Looking at the call stack,
I can see that mscorwks.dll!7925c098 is on TOP of the call stack, and
the native C++ code that I am trying to debug is next down on the call
stack. The end result is that debugging the native C++ code (that was
compiled with "/CLR") is a very painful exercise.
So I have to choose between:
a) having my code execute as I wrote it ("/CLR" switch turned ON for
native C++)
-OR-
b) being able to debug the code ("/CLR" switch turned OFF for native
C++)
In other words, if I want to be able to debug the code, then I must
accept the fact that the code will not always execute as I wrote it.
Not a good situation to be in! I don't see what I can do now except
try to get help.
Please ... help ....
-Adam McKee // sp**@adam-mckee.net