Dictionary and std::multimap

Hello, Newsgroupians:

I've a question regarding dictionaries. I have an array of elements that I
created, and I'm trying to sort those elements into various sections. Here's
the template of my data type.

DT
{
    int nSize;
    ...
}

I'm trying to create a dictionary, which resembles a jagged array, but I
can't figure out how to do this. I'm trying to sort it according to the
nSize.

Example: Here's a jagged array of the structure. nSize is the key, which is
before the colon.

1: Car1, Car2, Car3
3: Car4
8: Car5, Car6

From the example, I have three sizes -- but this, of course can vary. How
can I create a dictionary that would allow me to mimic the std::multimap<>?
I've been trying Dictionary<int, List<DT>>, but I can't figure out how to add
or access the dictionary. One thing I would like to do is determine the
number of keys in the dictionary, which I can do by dictionary.Keys.Count, but
then how can I get a count of all the values associated with a specific key?

Thank you, all.
Trecius
Jun 27 '08 #1
On Mon, 21 Apr 2008 12:05:00 -0700, Trecius
<Tr*****@discussions.microsoft.com> wrote:
[...]
From the example, I have three sizes -- but this, of course, can vary. How
can I create a dictionary that would allow me to mimic the
std::multimap<>?
I don't know anything about std::multimap<> specifically, but...
I've been trying Dictionary<int, List<DT>>, but I can't figure out how
to add or access the dictionary.
To add:

Assuming:

Dictionary<int, List<DT>> dict;
DT dtAdd;

Then:

if (dict.ContainsKey(dtAdd.nSize))
{
    dict[dtAdd.nSize].Add(dtAdd);
}
else
{
    List<DT> list = new List<DT>();

    list.Add(dtAdd);
    dict.Add(dtAdd.nSize, list);
}

I'm not really sure what you mean by "access", but the above code includes
an example of how to actually access an element in the Dictionary<>
instance.
One thing I would like to do is determine the
number of keys in the dictionary, which I can do by
dictionary.Keys.Count, but
then how can I get a count of all the values associated with a specific
key?
For example, where "nSize" is a specific key:

dict[nSize].Count;
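Putting the add logic and the per-key count together, here is a complete, compilable sketch; the DT class and the sample data below are stand-ins for the poster's actual type:

```csharp
using System;
using System.Collections.Generic;

public class DT
{
    public int nSize;      // the grouping key
    public string Name;    // stand-in payload
    public DT(int size, string name) { nSize = size; Name = name; }
}

public static class MultiMapDemo
{
    // Groups the items by nSize, multimap-style.
    public static Dictionary<int, List<DT>> Build(DT[] items)
    {
        Dictionary<int, List<DT>> dict = new Dictionary<int, List<DT>>();

        foreach (DT dtAdd in items)
        {
            if (dict.ContainsKey(dtAdd.nSize))
            {
                dict[dtAdd.nSize].Add(dtAdd);
            }
            else
            {
                List<DT> list = new List<DT>();
                list.Add(dtAdd);
                dict.Add(dtAdd.nSize, list);
            }
        }
        return dict;
    }

    public static void Main()
    {
        DT[] items =
        {
            new DT(1, "Car1"), new DT(1, "Car2"), new DT(1, "Car3"),
            new DT(3, "Car4"),
            new DT(8, "Car5"), new DT(8, "Car6"),
        };

        Dictionary<int, List<DT>> dict = Build(items);

        Console.WriteLine(dict.Keys.Count); // distinct keys: 3
        Console.WriteLine(dict[1].Count);   // values under key 1: 3
        Console.WriteLine(dict[8].Count);   // values under key 8: 2
    }
}
```

Running it prints 3, 3, 2, matching the jagged-array example from the original post.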

Hope that helps.

Pete
Jun 27 '08 #2
Thank you, Mr. Duniho. That is exactly what I'm looking for. Thank you again.
Trecius

Jun 27 '08 #3
Peter Duniho wrote:
On Mon, 21 Apr 2008 12:05:00 -0700, Trecius
<Tr*****@discussions.microsoft.com> wrote:
>[...]
From the example, I have three sizes -- but this, of course can vary.
How
can I create a dictionary that would allow me to mimic the
std::multimap<>?

I don't know anything about std::multimap<> specifically, but...
>I've been trying Dictionary<int, List<DT>>, but I can't figure out
how to add
or access the dictionary.

To add:

Assuming:

Dictionary<int, List<DT>> dict;
DT dtAdd;

Then:

if (dict.ContainsKey(dtAdd.nSize))
{
    dict[dtAdd.nSize].Add(dtAdd);
}
else
{
    List<DT> list = new List<DT>();

    list.Add(dtAdd);
    dict.Add(dtAdd.nSize, list);
}
A little less redundant code:

List<DT> multi;

if (!dict.TryGetValue(dtAdd.nSize, out multi))
    dict.Add(dtAdd.nSize, multi = new List<DT>());

multi.Add(dtAdd);
>
I'm not really sure what you mean by "access", but the above code
includes an example of how to actually access an element in the
Dictionary<> instance.
>One thing I would like to do is determine the
number of keys in the dictionary, which I can do by
dictionary.Keys.Count, but
then how can I get a count of all the values associated with a
specific key?

For example, where "nSize" is a specific key:

dict[nSize].Count;

Hope that helps.

Pete

Jun 27 '08 #4
On Mon, 21 Apr 2008 14:19:22 -0700, Ben Voigt [C++ MVP]
<rb*@nospam.nospam> wrote:
A little less redundant code:

List<DT> multi;

if (!dict.TryGetValue(dtAdd.nSize, out multi))
    dict.Add(dtAdd.nSize, multi = new List<DT>());

multi.Add(dtAdd);
Not that this would normally matter, but...in a previous test I found that
TryGetValue() was slower than calling ContainsKey() followed by an access of
the specific element.

I never did figure out why that was, and it seems counter-intuitive to
me.  But that's what I found.

Because of that, I tend to use ContainsKey() rather than TryGetValue().  The
version of the above code with TryGetValue() only removes a single call to
Add(), at the expense of being slightly harder to read.  Even if it
performed better, the reduced obviousness of the code would IMHO argue in
favor of the alternative, but being harder to read _and_ performing more
poorly is a pretty negative combination, even if the differences are
slight.

YMMV. People who are really more C++ programmers than C# programmers have
a tendency to love highly-compressed code. There's a sort of "let's see
how much functionality we can squeeze into a single line" mentality going
on. :) Personally, I don't mind my code to be spread out a bit,
especially when it's easier to see what's going on.
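For what it's worth, a minimal Stopwatch harness for timing the two idioms might look like the sketch below; the method names and parameters here are invented (not the benchmark discussed later in this thread), and the usual micro-benchmark caveats apply: warm-up pass for the JIT, release build, several runs.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class LookupBench
{
    // Builds a histogram using ContainsKey followed by an indexer access.
    public static Dictionary<int, int> ViaContainsKey(int[] values)
    {
        Dictionary<int, int> dict = new Dictionary<int, int>();
        foreach (int v in values)
        {
            if (dict.ContainsKey(v))
                dict[v] = dict[v] + 1;   // second lookup through the indexer
            else
                dict[v] = 1;
        }
        return dict;
    }

    // Builds the same histogram using a single TryGetValue lookup.
    public static Dictionary<int, int> ViaTryGetValue(int[] values)
    {
        Dictionary<int, int> dict = new Dictionary<int, int>();
        foreach (int v in values)
        {
            int count;
            if (dict.TryGetValue(v, out count))
                dict[v] = count + 1;
            else
                dict[v] = 1;
        }
        return dict;
    }

    public static void Main()
    {
        Random rng = new Random(12345);
        int[] values = new int[5000];
        for (int i = 0; i < values.Length; i++)
            values[i] = rng.Next(65536);   // mostly-missing keys

        ViaContainsKey(values);  // warm-up so the JIT cost isn't timed
        ViaTryGetValue(values);

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < 100; i++) ViaContainsKey(values);
        Console.WriteLine("ContainsKey: " + sw.ElapsedMilliseconds + " msec.");

        sw = Stopwatch.StartNew();
        for (int i = 0; i < 100; i++) ViaTryGetValue(values);
        Console.WriteLine("TryGetValue: " + sw.ElapsedMilliseconds + " msec.");
    }
}
```

Both methods produce identical histograms; only the lookup idiom differs, so any timing gap is attributable to that.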

Pete
Jun 27 '08 #5
Not that this would normally matter, but...in a previous test I found
that TryGetValue() was slower than calling ContainsKey() followed by an
access of the specific element.

I never did figure out why that was, and it seems counter-intuitive to
me. But that's what I found.

Because of that, I tend to use ContainsKey() rather than TryGetValue(). The
version of the above code with TryGetValue() only removes a
single call to Add(), at the expense of being slightly harder to
read. Even if it performed better, the reduced obviousness of the
code would IMHO argue in favor of the alternative, but being harder
to read _and_ performing more poorly is a pretty negative
combination, even if the differences are slight.
Ok, you could use ContainsKey and then either add a new element or retrieve
the existing one. You could even make a class implementing IDictionary that
auto-instantiates missing members and hides the details of how it does so.
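A sketch of that auto-instantiating idea, as a small wrapper rather than a full IDictionary implementation; the class name and members here are invented for illustration:

```csharp
using System.Collections.Generic;

// The indexer creates the value list on first access, so callers
// never have to write the ContainsKey/TryGetValue dance themselves.
public class MultiMap<TKey, TValue>
{
    private readonly Dictionary<TKey, List<TValue>> dict =
        new Dictionary<TKey, List<TValue>>();

    public List<TValue> this[TKey key]
    {
        get
        {
            List<TValue> list;
            if (!dict.TryGetValue(key, out list))
                dict.Add(key, list = new List<TValue>());
            return list;
        }
    }

    public int KeyCount
    {
        get { return dict.Count; }
    }
}
```

With such a wrapper, adding collapses to map[dtAdd.nSize].Add(dtAdd). One caveat: merely reading map[key] for a missing key creates an empty list, which a lookup-only caller may not want.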
>
YMMV. People who are really more C++ programmers than C# programmers
have a tendency to love highly-compressed code. There's a sort of
"let's see how much functionality we can squeeze into a single line"
mentality going on. :) Personally, I don't mind my code to be
spread out a bit, especially when it's easier to see what's going on.
Well, I wasn't actually trying to make the code denser.

I was trying to first establish an invariant (dict contains an element for
dtAdd.nSize).

Once the invariant is established you no longer need two versions of the
Add-to-list code that need to be maintained (ok, now it's a single function
call but it might not stay that way).
>
Pete

Jun 27 '08 #6
Not that this would normally matter, but...in a previous test I found
that TryGetValue() was slower than calling ContainsKey() followed by an
access of the specific element.

I never did figure out why that was, and it seems counter-intuitive to
me. But that's what I found.
I'd like to see that benchmark... because disassembling the code there is no
way that can be true, unless TryGetValue causes a performance bug in the
JIT.

Also Jon has a very relevant blog entry here:
http://msmvps.com/blogs/jon.skeet/ar...tiful-one.aspx
Jun 27 '08 #7
Shame I didn't spot this thread earlier... But!

.NET 3.5 introduces ILookup<TKey,TValue>, which is comparable to a
multimap; the default implementation (Lookup<TKey,TValue>) is immutable,
but it is trivial to write a mutable implementation; in fact, one is in
MiscUtil - EditableLookup<TKey,TValue>:

http://www.pobox.com/~skeet/csharp/miscutil/

And you can use it in 2.0 as well (the MiscUtil build stubs out the
missing interfaces in the 2.0 configuration).
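For readers on 3.5, the immutable default implementation can also be built in one call with Enumerable.ToLookup; a quick sketch, grouping strings by length as a stand-in for grouping DT by nSize:

```csharp
using System;
using System.Linq;

public static class LookupDemo
{
    public static void Main()
    {
        string[] cars = { "Car1", "Car22", "Car333", "Car444" };

        // ILookup<int, string>: each distinct length maps to every
        // string of that length; indexing a missing key yields an
        // empty sequence rather than throwing.
        ILookup<int, string> byLength = cars.ToLookup(s => s.Length);

        Console.WriteLine(byLength.Count);       // distinct keys: 3
        Console.WriteLine(byLength[6].Count());  // values under key 6: 2
        Console.WriteLine(byLength[9].Count());  // absent key: 0
    }
}
```

The resulting lookup is a snapshot of the source sequence; to add items afterwards you need a mutable implementation such as the EditableLookup mentioned above.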

Marc
Jun 27 '08 #8
On Tue, 22 Apr 2008 06:52:13 -0700, Ben Voigt [C++ MVP]
<rb*@nospam.nospam> wrote:
>Not that this would normally matter, but...in a previous test I found
that TryGetValue() was slower than calling Contains() followed by an
access of the specific element.

I never did figure out why that was, and it seems counter-intuitive to
me. But that's what I found.

I'd like to see that benchmark... because disassembling the code there
is no
way that can be true, unless TryGetValue causes a performance bug in the
JIT.
Nothing would make me happier than to have another pair of eyes look at
the benchmark and explain my observations (if only to show that I did
something wrong in timing the code).

I don't have handy access to the code right now, but I'll post when I do.
Fair warning: I don't have time to clean it up and it was written in
pursuit of a different question. (Here's the context:
http://groups.google.com/group/micro...c79d93deffc692)

Pete
Jun 27 '08 #9
On Tue, 22 Apr 2008 17:29:32 -0700, Willy Denoyette [MVP]
<wi*************@telenet.be> wrote:
Hmmm... as expected, there is no clear winner.
Well, actually my expectation isn't that there's no clear winner, but
rather that the TryGetValue() version is consistently faster.

What I don't understand is why, at least on my computer (apparently not
yours), TryGetValue() is consistently slower.  These tests are very
repeatable, and while the variance is definitely higher on my computer
than yours, the slowest time for ContainsKey() is only slower than one test
of TryGetValue(), and frankly that's only because it's the very first test
(which in my tests has consistently always been very slow...slow enough
that I think it'd actually be more fair to run a couple of throw-away
tests and then run five of each, throwing out the high and low).

Speaking of that slow first test, I'll point out that even on your
computer, if you toss out the first run, ContainsKey() is faster than
TryGetValue() for all but one other test.  I'm not really convinced that
your tests show "no clear winner". I'd be more willing to accept that
statement if you presented a run that includes a couple of throw-away
tests and still showed a random distribution across both tests. Given the
apparently closer results on your computer, doing that is much more
important than it would be for my tests.

But in any case, I'm still without an explanation as to why there should
be such a clear difference on my computer.

Pete
Jun 27 '08 #10
"Peter Duniho" <Np*********@nnowslpianmk.com> wrote in message
news:op***************@petes-computer.local...
When I say there is no clear winner, I mean a *Clear* winner between both
_HistDict2 and _HistDict3, I don't consider a difference of ~1% a clear
winner.

Your sample benchmark does not really measure the performance of ContainsKey
or TryGetValue, you are measuring a lot more.

So, let's only consider what happens in :

if (dict.ContainsKey(value))
    ...
if (dict.TryGetValue(value, out count))
    ...
and forget about the remainder of the methods.

When you look at the JITted code you'll see that "ContainsKey" directly
calls into "FindEntry", while "TryGetValue" prepares a call stack before it
calls "FindEntry"; after the return from "FindEntry", TryGetValue stores the
returned value (if the key is found) or a default value (if no key is found)
in the caller's stack before returning.
So basically both are executing the same code (the FindEntry function),
except that TryGetValue takes a 15-instruction overhead in case the key is
not found (which is most often the case in your benchmark), or an overhead
of 16 instructions when the key is found.

Both foreach loops take:
loop with ContainsKey:
159/174
loop with TryGetValue:
174/190 instructions to complete (not found/found).
This represents a difference of 9% in number of instructions (but much less
in execution time (<5%)), in favor of "ContainsKey".
Obviously, the difference reflected by the benchmark is much less (< 2%), as
the foreach loop takes only 30% of the total execution time.

Here are the results of some new runs, now after the program was compiled
without /o (yes this runs faster than an optimized build).

Iterations: 5000
Data elements: 5000
Data range: 65536
Tests to run: 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2

_HistDict2: 13080 msec.
_HistDict2: 13080 msec.
_HistDict2: 13169 msec.
_HistDict2: 13060 msec.
_HistDict2: 13173 msec.
_HistDict2: 13068 msec.
_HistDict3: 13275 msec.
_HistDict3: 13167 msec.
_HistDict3: 13233 msec.
_HistDict3: 13181 msec.
_HistDict3: 13268 msec.
_HistDict3: 13175 msec.

Validating results...done.

Note that I changed your line here:

Console.WriteLine(hptd.Method.Name + ": " + _sw.ElapsedMilliseconds + "
msec.");

Willy.

Jun 27 '08 #11
So, let's only consider what happens in :
>
if (dict.ContainsKey(value))
..
if (dict.TryGetValue(value, out count))
...
and forget about the remainder of the methods.

When you look at the JITted code you'll see that "ContainsKey"
directly calls into "FindEntry", while "TryGetValue" prepares a call
stack before it calls "FindEntry"; after the return from "FindEntry",
TryGetValue stores the returned value (if the key is found) or a
default value (if no key is found) in the caller's stack before returning.
So basically both are executing the same code (the FindEntry
function), except that TryGetValue takes a 15-instruction overhead in
case the key is not found (which is most often the case in your
benchmark), or an overhead of 16 instructions when the key is found.
Victim of the 32 byte threshold for inlining then. TryGetValue is a very
simple function and *ought* to be inlined, but because it is over 32 bytes
of IL, it isn't.

Like I surmised earlier... JIT performance bug.
>
Both foreach loops take:
loop with ContainsKey:
159/174
loop with TryGetValue:
174/190 instructions to complete (not found/found).
Is this including the second call to FindEntry from the ContainsKey/found
variant? Seems like the call to operator[] should have added a lot more
than 15 instructions in that case.
Jun 27 '08 #12
"Ben Voigt [C++ MVP]" <rb*@nospam.nospam> wrote in message
news:ua****************@TK2MSFTNGP02.phx.gbl...
>So, let's only consider what happens in :

if (dict.ContainsKey(value))
..
if (dict.TryGetValue(value, out count))
...
and forget about the remainder of the methods.

When you look at the JITTED code you'll see that "ContainsKey"
directly calls into "FindEntry", while "TryGetValue" prepares a call
stack before it calls "FindEntry", after the return from "FindEntry",
TryGetValue stores the returned value (if Key is found) or a default
value (if no Key found) in the callers stack before returning.
So basically both are executing the same code (the FindEntry
function), except that TryGetValue takes a 15 instruction overhead in
case the key not found (which is most often the case in your
benchmark) or an overhead of 16 instruction when the key was found.

Victim of the 32 byte threshold for inlining then. TryGetValue is a very
simple function and *ought* to be inlined, but because it is over 32 bytes
of IL, it isn't.
The overhead is not (entirely) call overhead; TryGetValue returns the value
of the K/V pair, or a default value depending on the type of the value, as
an out argument. The call overhead is only 5 instructions on X86, and on
X64 there is no overhead at all.

Like I surmised earlier... JIT performance bug.
>>
Both foreach loops take:
loop with ContainsKey:
159/174
loop with TryGetValue:
174/190 instructions to complete (not found/found).

Is this including the second call to FindEntry from the ContainsKey/found
variant? Seems like the call to operator[] should have added a lot more
than 15 instructions in that case.
What do you mean by the second call to FindEntry? There is only one call to
FindEntry per loop.

Jun 27 '08 #13
Oops, hit the Send key too soon ;-)

Willy.

"Ben Voigt [C++ MVP]" <rb*@nospam.nospam> wrote in message
news:ua****************@TK2MSFTNGP02.phx.gbl...
>>
Both foreach loops take:
loop with ContainsKey:
159/174
loop with TryGetValue:
174/190 instructions to complete (not found/found).

Is this including the second call to FindEntry from the ContainsKey/found
variant? Seems like the call to operator[] should have added a lot more
than 15 instructions in that case.
These are the loops I'm talking about.

foreach (int value in rgiValues)
{
    if (dict.ContainsKey(value))
    {
        dict[value] = dict[value] + 1;
    }
    else
    {
        dict[value] = 1;
    }
}

Above takes 159 instructions to complete when Key is not found, 174 when
found.
Here ContainsKey directly calls FindEntry...

foreach (int value in rgiValues)
{
    int count;
    if (dict.TryGetValue(value, out count))
    {
        dict[value] = count + 1;
    }
    else
    {
        dict[value] = 1;
    }
}

Above takes 174 instructions when key is found, 190 when not found. Here
FindEntry is called from inside TryGetValue .
Note that this code does not take advantage of the value returned by
TryGetValue, which somewhat defeats its purpose.

Willy.

Jun 27 '08 #14
Willy Denoyette [MVP] <wi*************@telenet.be> wrote:
These are the loops I'm talking about.

foreach (int value in rgiValues)
{
    if (dict.ContainsKey(value))
    {
        dict[value] = dict[value] + 1;
    }
    else
    {
        dict[value] = 1;
    }
}

Above takes 159 instructions to complete when Key is not found, 174 when
found.
Here ContainsKey directly calls FindEntry...
Is that taking into account the indexer also calling FindEntry (on the
"get") though?

That's the odd part for me - ContainsKey followed by an indexer get
needs to look through the hashtable twice, whereas TryGetValue should
only need to look through it once.

Are you saying that the overhead for TryGetValue is the same as the
overhead of effectively calling FindEntry an extra time?
>
foreach (int value in rgiValues)
{
    int count;
    if (dict.TryGetValue(value, out count))
    {
        dict[value] = count + 1;
    }
    else
    {
        dict[value] = 1;
    }
}

Above takes 174 instructions when key is found, 190 when not found. Here
FindEntry is called from inside TryGetValue .
Note that this code does not take advantage of the value returned by
TryGetValue, which somewhat defeats its purpose.
Yes it does - it uses "count + 1" instead of "dict[value] + 1", and
uses the actual return value (the boolean) to determine whether or not
it was found.

Which part of the result do you believe isn't being used?

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
World class .NET training in the UK: http://iterativetraining.co.uk
Jun 27 '08 #15
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP*********************@msnews.microsoft.com...
Willy Denoyette [MVP] <wi*************@telenet.be> wrote:
Is that taking into account the indexer also calling FindEntry (on the
"get") though?
Oh, I see what Ben was talking about. No, I did not (but agreed, I should
have), nor did I consider the possible intermediate GC runs and the fact
that the Dictionary (the backing store) is dynamically extended during the
run. The Dictionary starts with the default size, but gets extended several
times during the run; each extension costs ~2000 cycles. The backing store
finally gets so big that it ends up on the LOH.
That's the odd part for me - ContainsKey followed by an indexer get
needs to look through the hashtable twice, whereas TryGetValue should
only need to look through it once.
It only has to look through the hashtable twice if the key was found.
Are you saying that the overhead for TryGetValue is the same as the
overhead of effectively calling FindEntry an extra time?
No I'm not. I'm only discussing the results of the benchmark, run with the
same command-line arguments as posted by Peter.
I'm only comparing both methods when the key is not found (which is quite
common with 64K possible random values for 5000 entries), in which case
ContainsKey is the winner. Cases where the key is found are a lot more
complex to compare; TryGetValue is the clear winner when the majority of
keys are found in the dictionary, as illustrated here:

Iterations: 5000
Data elements: 5000
Data range: 100
Tests to run: 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2

_HistDict2: 2936 msec.
_HistDict2: 2915 msec.
_HistDict2: 2913 msec.
_HistDict2: 2912 msec.
_HistDict2: 2906 msec.
_HistDict2: 2924 msec.
_HistDict3: 2206 msec.
_HistDict3: 2204 msec.
_HistDict3: 2199 msec.
_HistDict3: 2200 msec.
_HistDict3: 2194 msec.
_HistDict3: 2204 msec.
Here the number of random keys is limited to 100, and TryGetValue is ~25%
faster due to the extra FindEntry in ContainsKey.
It's obvious why this is much faster than the test with 64K keys; only a
few extensions of the backing store have to be done in this case.

>>
foreach (int value in rgiValues)
{
int count;
if (dict.TryGetValue(value, out count))
{
dict[value] = count + 1;
}
else
{
dict[value] = 1;
}
}

Above takes 174 instructions when key is found, 190 when not found. Here
FindEntry is called from inside TryGetValue .
>Note that this code does not take advantage of the value returned by
TryGetValue, which somewhat defeats its purpose.

Yes it does - it uses "count + 1" instead of "dict[value] + 1", and
uses the actual return value (the boolean) to determine whether or not
it was found.
True, but only in the case where the key is found.
Note that the above code could have been written like...
...
dict.TryGetValue(value, out count);
dict[value] = count + 1;
...
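The reason that works: TryGetValue assigns default(TValue) -- 0 for an int -- to its out argument when the key is absent, so the histogram loop needs no branch at all. A self-contained sketch of the idiom (the class and method names are invented for illustration):

```csharp
using System.Collections.Generic;

public static class Histogram
{
    // Counts occurrences of each value using a single lookup per element.
    public static Dictionary<int, int> Count(IEnumerable<int> values)
    {
        Dictionary<int, int> dict = new Dictionary<int, int>();
        foreach (int value in values)
        {
            int count;
            dict.TryGetValue(value, out count); // count == 0 when key absent
            dict[value] = count + 1;
        }
        return dict;
    }
}
```

For example, Count(new[] { 5, 5, 5, 9 }) yields a dictionary mapping 5 to 3 and 9 to 1.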
Willy.

Jun 27 '08 #16
On Wed, 23 Apr 2008 17:48:41 -0700, Willy Denoyette [MVP]
<wi*************@telenet.be> wrote:
[...]
Here the number of random keys is limited to 100, and TryGetValue is
~25% faster due to the extra FindEntry in ContainsKey.
It's obvious why this is much faster than the test with 64K keys; only
a few extensions of the backing store have to be done in this case.
Okay, so I take it that the bottom line here is that there are at least
two issues at work:

TryGetValue() _is_ in fact "inferior" in the sense that it can't be
inlined and so results in more overhead for the operation when the key
isn't found. (This isn't to say it's actually an inferior approach...just
that there's something about it that could be considered "inferior").

The other issue is, of course, that the particular input I provided
exposes the worst-case scenario in which the key is usually not found. In
other words, even if TryGetValue() can be considered to be "inferior" in
some specific way, that is only going to be apparent in some specific
scenario.

The part I still don't understand: why TryGetValue() should be _faster_
when the key is found, given that you wrote that both forms of the whole
loop take 174 instructions in that case. Shouldn't the two approaches be
equivalent in that case, rather than TryGetValue() showing a marked
improvement? Is my lack of comprehension due to a naïve understanding of
what "174 instructions" means? That is, that the instruction count isn't
a direct measure of the time the .NET code will take to execute?

I admit, I don't really know when you wrote about "instructions" what
exactly you're talking about. Is that IL instructions? Or the final x86
code? Either way, I can easily see how the count of instructions does not
necessarily tell you precisely how fast the code will execute.

In the end, I think I am most satisfied that, once again, it's been shown
that performance-tuning this sort of thing is fraught with difficulty, if
not entirely pointless. :)

Pete
Jun 27 '08 #17
"Peter Duniho" <Np*********@nnowslpianmk.com> wrote in message
news:op***************@petes-computer.local...

Ok, let me show you the (raw) call graphs for both foreach loops (taken from
a CPU instrumentation analyzer).

1) ContainsKey - Key found case:

caller (the method)              callee
Start
  |
  3
  ---------------------------> FindEntry
                                 74
  <---------------------------
  9
  ---------------------------> get_Item
                                 93 (includes a call to FindEntry)
  <---------------------------
  7
  ---------------------------> Insert
                                 85
  <---------------------------
  19
  |
  ----return to Start

2) TryGetValue - Key found case:

caller (the method)              callee
Start
  |
  6
  ---------------------------> TryGetValue (calls FindEntry)
                                 94
  <---------------------------
  12
  ---------------------------> Insert
                                 85
  <---------------------------
  17
  |
  ----return to Start

3) ContainsKey - Key NOT found case:

caller (the method)              callee
Start
  |
  3
  ---------------------------> FindEntry
                                 41
  <---------------------------
  10
  ---------------------------> Insert
                                 88
  <---------------------------
  17
  |
  ----return to Start

4) TryGetValue - Key NOT found case:

caller (the method)              callee
Start
  |
  6
  ---------------------------> TryGetValue (calls FindEntry)
                                 56
  <---------------------------
  10
  ---------------------------> Insert
                                 88
  <---------------------------
  17
  |
  ----return to Start
Notes:
- the figures are x86 instruction counts
- the instruction counts for a callee include its sub-calls; the actual
call graph is deeper than shown here.

This gives us for:

ContainsKey
  Key found: 290 instructions
  Key not found: 159 instructions

TryGetValue
  Key found: 214 instructions
  Key not found: 177 instructions

Notice that ContainsKey is somewhat faster when the key does not exist,
while ContainsKey is considerably slower when the key exists.

Some more remarks:
Beware that the instruction count isn't a direct measure of the time taken
to execute a loop; it can only be used as a good estimate if a major portion
of the loop executes the same call graph. Modern processors have become so
fast that the real bottleneck is the memory system (cache speed, sizes and
hierarchy). Running the same benchmark on a 20% faster CPU (clock speed)
won't yield a 20% performance gain, while running on a slower CPU with a
"better" memory system might give you better results.
The performance of these kinds of benchmarks is now constrained by the
memory system (memory bound); things like memory bandwidth, throughput and
latency have long been more important than CPU clock rates.

Willy.


Jun 27 '08 #18
On Thu, 24 Apr 2008 02:29:34 -0700, Willy Denoyette [MVP]
<wi*************@telenet.be> wrote:
[...]
This gives us for:
ContainsKey
Key found: 290 instructions
Key not found: 159 instructions
TryGetValue
Key found: 214 instructions
Key not found: 177 instructions
Okay, these numbers are quite a bit different from the ones you posted
earlier. With these, it's much easier to understand the time
differences. Thanks!

Pete
Jun 27 '08 #19
