using ref keyword performance

Can you explain why using the ref keyword for passing parameters is
slower than passing the parameters by value?
I wrote two examples to test it:
// using ref
// (requires: using System; using System.Collections.Generic;)
static void Main(string[] args)
{
    List<TimeSpan> times = new List<TimeSpan>();
    DateTime start;
    DateTime end;
    for (int j = 0; j < 1000; j++)
    {
        start = DateTime.Now;
        int k = 0;
        for (int i = 0; i < 30000000; i++)
        {
            Test(ref k);
        }
        end = DateTime.Now;
        times.Add(end - start);
    }

    // average the 1000 samples
    double secondDiffs = 0;
    foreach (TimeSpan var in times)
    {
        secondDiffs += var.TotalSeconds;
    }

    Console.WriteLine("using ref!");
    Console.WriteLine(secondDiffs/times.Count);
    Console.ReadLine();
}

static void Test(ref int k)
{
    int j = k;
}

// without ref
static void Main(string[] args)
{
    List<TimeSpan> times = new List<TimeSpan>();
    DateTime start;
    DateTime end;
    for (int j = 0; j < 1000; j++)
    {
        start = DateTime.Now;
        int k = 0;
        for (int i = 0; i < 30000000; i++)
        {
            Test(k);
        }
        end = DateTime.Now;
        times.Add(end - start);
    }

    // average the 1000 samples
    double secondDiffs = 0;
    foreach (TimeSpan var in times)
    {
        secondDiffs += var.TotalSeconds;
    }

    Console.WriteLine("without ref!");
    Console.WriteLine(secondDiffs/times.Count);
    Console.ReadLine();
}

static void Test(int k)
{
    int j = k;
}
Results:
without ref!
0,256625
using ref!
0,26428125

Why does the 'without ref' test run faster than the 'using ref' test?
As I understand it, in the 'without ref' test the parameter is passed by value, so
a new storage location is created each time we enter the Test method.
In the 'using ref' test we pass the parameter by reference, so rather than
creating a new storage location for the variable
in the function member declaration, the same storage location is used.
From my point of view it should be faster.
Can you give some comments about this situation, please?
Thanks.

Nov 30 '06 #1
Not sure I would trust the DateTime for such micro measurements. Have you
tried using one of the more precise methods available?
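
One such method is System.Diagnostics.Stopwatch, which has shipped with the framework since .NET 2.0. The following is only a minimal sketch: the class name StopwatchSketch, the helper TimeLoop, and the Read workload are hypothetical stand-ins, not code from the original post.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

public static class StopwatchSketch
{
    // Hypothetical workload standing in for the poster's Test(ref k).
    // NoInlining keeps the JIT from removing the call entirely.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int Read(ref int k)
    {
        return k;
    }

    // Times 'iterations' calls to Read and returns the elapsed seconds.
    public static double TimeLoop(int iterations)
    {
        int k = 0;
        int sink = 0;
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            sink += Read(ref k);
        }
        watch.Stop();
        return watch.Elapsed.TotalSeconds;
    }

    public static void Main()
    {
        // IsHighResolution reports whether a high-resolution counter backs the timer.
        Console.WriteLine("High resolution: {0}", Stopwatch.IsHighResolution);
        Console.WriteLine("Elapsed: {0:0.000000} s", TimeLoop(30000000));
    }
}
```

Unlike DateTime.Now, the elapsed value here does not depend on the system clock's update interval, which matters when the difference being measured is a few milliseconds.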

--
With regards
Anders Borum / SphereWorks
Microsoft Certified Professional (.NET MCP)
Nov 30 '06 #2
> Why does the 'without ref' test run faster than the 'using ref' test?
> As I understand it, in the 'without ref' test the parameter is passed by value,
> so a new storage location is created each time we enter the Test method.
> In the 'using ref' test we pass the parameter by reference, so rather than
> creating a new storage location for the variable
> in the function member declaration, the same storage location is used.
> From my point of view it should be faster.

There are two points of overhead that keep ref parameters from working faster:

1. The address of the value must be retrieved before passing it as a ref
parameter.
2. That pointer has to be dereferenced in the called method before the value
can be used.

> Can you give some comments about this situation, please? Thanks.

The only real difference between passing a parameter by value and passing
a parameter by reference is that a pointer is used to pass by reference. So,
the overhead is getting the address of the value to pass and dereferencing
the pointer to get the value in the method that is called.
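
To make the distinction concrete, here is a small sketch of what the two calling conventions mean to the caller. The names RefSemantics, IncrementCopy, and IncrementInPlace are made up for illustration and do not appear elsewhere in this thread.

```csharp
using System;

public static class RefSemantics
{
    // By value: the method works on its own copy of the argument,
    // so the caller's variable is left untouched.
    public static void IncrementCopy(int k)
    {
        k++;
    }

    // By reference: the method receives the address of the caller's
    // variable and reads and writes through it.
    public static void IncrementInPlace(ref int k)
    {
        k++;
    }

    public static void Main()
    {
        int a = 1;
        IncrementCopy(a);
        Console.WriteLine(a); // prints 1
        IncrementInPlace(ref a);
        Console.WriteLine(a); // prints 2
    }
}
```

For an int, the copy and the pointer are the same size or nearly so, which is why passing by reference saves nothing here and only adds the indirection.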

The tests that you ran were not optimal for getting accurate timings. Here
is the code that I used.

First, this is my HighResolutionTimer class:

using System;
using System.Diagnostics;
using System.Runtime.InteropServices;
using System.Security;

namespace RefTest
{
    public class HighResolutionTimer
    {
        // private fields...
        private long m_Frequency;
        private long m_StartCounter;
        private long m_StopCounter;

        // constructors...
        public HighResolutionTimer() : this(false) { }
        public HighResolutionTimer(bool start)
        {
            if (!QueryPerformanceFrequency(out m_Frequency))
            {
                Debug.WriteLine("HighResolutionTimer.ctor(): Error occurred while calling QueryPerformanceFrequency.");
                return;
            }

            if (start)
                Start();
        }

        // win32 api methods...
        [SuppressUnmanagedCodeSecurity]
        [DllImport("kernel32.dll")]
        [return: MarshalAs(UnmanagedType.Bool)]
        private static extern bool QueryPerformanceCounter(
            [Out] out long lpPerformanceCount);

        [SuppressUnmanagedCodeSecurity]
        [DllImport("kernel32.dll")]
        [return: MarshalAs(UnmanagedType.Bool)]
        private static extern bool QueryPerformanceFrequency(
            [Out] out long lpFrequency);

        // private methods...
        // elapsed counter ticks divided by ticks per second gives seconds
        private double CalcDuration()
        {
            return ((double)(m_StopCounter - m_StartCounter)) / (double)m_Frequency;
        }

        // public methods...
        public void Reset()
        {
            m_StartCounter = 0;
            m_StopCounter = 0;
        }
        public void Start()
        {
            Reset();
            if (!QueryPerformanceCounter(out m_StartCounter))
                Debug.WriteLine("HighResolutionTimer.Start(): Error occurred while calling QueryPerformanceCounter.");
        }
        public double Stop()
        {
            if (!QueryPerformanceCounter(out m_StopCounter))
            {
                Debug.WriteLine("HighResolutionTimer.Stop(): Error occurred while calling QueryPerformanceCounter.");
                return Double.NaN;
            }

            return Duration;
        }

        // public overridden methods...
        public override string ToString()
        {
            return CalcDuration().ToString("0.######") + " seconds";
        }

        // public properties...
        public double Duration
        {
            get
            {
                return CalcDuration();
            }
        }
    }
}

Second, here is a helper CodeTimer class that I use for timing code:

using System;

namespace RefTest
{
    public static class CodeTimer
    {
        private static double Average(double[] values)
        {
            if (values == null)
                throw new ArgumentNullException("values");

            int valueCount = values.Length;

            if (valueCount == 0)
                return 0.0d;

            double sum = 0.0d;
            for (int i = 0; i < valueCount; i++)
                sum += values[i];

            return sum / valueCount;
        }

        public delegate void TimingCode();

        public static double Execute(TimingCode code)
        {
            if (code == null)
                throw new ArgumentNullException("code");

            const int NUM_SAMPLES = 100;

            double[] timings = new double[NUM_SAMPLES];
            HighResolutionTimer timer = new HighResolutionTimer();
            for (int i = 0; i < NUM_SAMPLES; i++)
            {
                timer.Reset();

                GC.Collect();
                GC.WaitForPendingFinalizers();
                GC.Collect();

                timer.Start();
                code();
                timer.Stop();

                timings[i] = timer.Duration;
            }

            return Average(timings);
        }
    }
}

And finally, here's the Program class for my test console application:

using System;
using System.Runtime.CompilerServices;

namespace RefTest
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("TestWithoutRef: {0:###,###,##0.000000}", CodeTimer.Execute(TestWithoutRefLoop));
            Console.WriteLine("TestWithRef: {0:###,###,##0.000000}", CodeTimer.Execute(TestWithRefLoop));

            Console.ReadLine();
        }

        static void TestWithRefLoop()
        {
            int result;
            for (int i = 0; i < 50000000; i++)
                result = TestWithRef(ref i);
        }
        static void TestWithoutRefLoop()
        {
            int result;
            for (int i = 0; i < 50000000; i++)
                result = TestWithoutRef(i);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        static int TestWithRef(ref int k)
        {
            return k;
        }
        [MethodImpl(MethodImplOptions.NoInlining)]
        static int TestWithoutRef(int k)
        {
            return k;
        }
    }
}

In VS 2005, create a new console application and add those files to get a
more accurate test. Here are the timings that I get (in seconds):

TestWithoutRef: 0.192421
TestWithRef: 0.194921

So, according to my results, passing a parameter by reference 50,000,000
times costs approximately 2.5 milliseconds more in total than passing it by
value. Yee-ha! This is not something to worry about. :-)
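
To put that figure in per-call terms, the arithmetic can be sketched as below. The class and method names (PerCallCost, ExtraNanosecondsPerCall) are hypothetical; the inputs are the two timings quoted above.

```csharp
using System;

public static class PerCallCost
{
    // Extra cost per call in nanoseconds, given two total timings in seconds.
    public static double ExtraNanosecondsPerCall(double withRef, double withoutRef, int calls)
    {
        return (withRef - withoutRef) / calls * 1e9;
    }

    public static void Main()
    {
        // 0.0025 s spread over 50,000,000 calls is roughly 0.05 ns per call.
        double ns = ExtraNanosecondsPerCall(0.194921, 0.192421, 50000000);
        Console.WriteLine("{0:0.000} ns per call", ns);
    }
}
```

At about 0.05 ns per call, the difference is smaller than a single clock cycle on a GHz-class CPU, so it is probably within measurement noise.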

----

If you're interested in seeing what is going on under the covers, let's take
a look at the IL that is generated:

static void TestWithoutRefLoop()
{
    int result;
    for (int i = 0; i < 50000000; i++)
        result = TestWithoutRef(i);
}

.method private hidebysig static void TestWithoutRefLoop() cil managed
{
.maxstack 2
.locals init (
[0] int32 i)
L_0000: ldc.i4.0
L_0001: stloc.0
L_0002: br.s L_000f
L_0004: ldloc.0
L_0005: call int32 RefTest.Program::TestWithoutRef(int32)
L_000a: pop
L_000b: ldloc.0
L_000c: ldc.i4.1
L_000d: add
L_000e: stloc.0
L_000f: ldloc.0
L_0010: ldc.i4 50000000
L_0015: blt.s L_0004
L_0017: ret
}

static void TestWithRefLoop()
{
    int result;
    for (int i = 0; i < 50000000; i++)
        result = TestWithRef(ref i);
}

.method private hidebysig static void TestWithRefLoop() cil managed
{
.maxstack 2
.locals init (
[0] int32 i)
L_0000: ldc.i4.0
L_0001: stloc.0
L_0002: br.s L_0010
L_0004: ldloca.s i
L_0006: call int32 RefTest.Program::TestWithRef(int32&)
L_000b: pop
L_000c: ldloc.0
L_000d: ldc.i4.1
L_000e: add
L_000f: stloc.0
L_0010: ldloc.0
L_0011: ldc.i4 50000000
L_0016: blt.s L_0004
L_0018: ret
}

These methods only differ by one byte in length and the reason is found at
L_0004. In TestWithoutRefLoop, the "ldloc.0" instruction is used. This simply
loads the local variable at index 0 ('i') onto the stack. Because we're passing
by value, that's all that's needed to make the call to TestWithoutRef(int32).
However, in TestWithRefLoop, the "ldloca.s i" instruction is used. This is
one byte larger because there is a byte for the instruction and a byte to
indicate the index of the local to use. And, instead of loading the specified
local variable onto the stack, it loads the *address* of said local variable
in order to set up the TestWithRef(int32&) method call. On my machine, when
I look at the optimized JITted code for these methods, I see the following
x86:

TestWithoutRefLoop:

00000000 push esi
00000001 xor esi,esi
00000003 mov ecx,esi
00000005 call dword ptr ds:[00913070h]
0000000b inc esi
0000000c cmp esi,2FAF080h
00000012 jl 00000003
00000014 pop esi
00000015 ret

TestWithRefLoop

00000000 push eax
00000001 xor eax,eax
00000003 mov dword ptr [esp],eax
00000006 xor edx,edx
00000008 mov dword ptr [esp],edx
0000000b cmp dword ptr [esp],2FAF080h
00000012 jge 00000029
00000014 lea ecx,[esp]
00000017 call dword ptr ds:[0091306Ch]
0000001d inc dword ptr [esp]
00000020 cmp dword ptr [esp],2FAF080h
00000027 jl 00000014
00000029 pop ecx
0000002a ret

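Obviously, a lot more work is necessary at the x86 level to get the address
of the local variable: because its address is taken, 'i' is kept in memory on
the stack ([esp]) instead of in a register (esi).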

Now, let's look at the methods that get called.

static int TestWithoutRef(int k)
{
    return k;
}

.method private hidebysig static int32 TestWithoutRef(int32 k) cil managed noinlining
{
.maxstack 8
L_0000: ldarg.0
L_0001: ret
}

static int TestWithRef(ref int k)
{
    return k;
}

.method private hidebysig static int32 TestWithRef(int32& k) cil managed noinlining
{
.maxstack 8
L_0000: ldarg.0
L_0001: ldind.i4
L_0002: ret
}
In this case, TestWithRef has one additional instruction: "ldind.i4". This
instruction takes the managed pointer on the top of the evaluation stack
and loads the int32 value indirectly from it (hence "ldind"). IOW, this is
the pointer dereference that needs to happen before the value can be used
(in this case, returned).

For completeness, here's the x86 of the optimized JITted code:

TestWithoutRef

00000000 mov eax,ecx
00000002 ret

TestWithRef

00000000 mov eax,dword ptr [ecx]
00000002 ret

Obviously, there is a lot less going on here than at the calling site. The
only difference is the pointer dereference. So, most of the overhead that
we observed takes place in the calling site. But, IMO, it is negligible. There's
nothing to get worked up about. Take a deep breath. If you need to be concerned
about performance at this low of a level, you probably shouldn't be working
in a garbage-collected environment. :-)

Best Regards,
Dustin Campbell
Developer Express Inc.

Nov 30 '06 #3

Dustin Campbell,
Wow! Thanks for the great explanation of this situation!
It's all clear to me now.
I wonder if this situation will repeat on an x64 machine.
I'll try to check it according to your explanations.
Thanks! :)

Nov 30 '06 #4
Using ref is faster, but only with value types. With reference types you are only passing around a 4- or 8-byte pointer to the object (depending on the architecture). The best way to test this is probably to make a struct that uses the StructLayout attribute and sets its Size. Below is an example of a 1024-byte struct.

I wrote some tests a while ago to examine the difference with various sized structs, and it seems the speed difference is small unless you are using big structures (128 bytes plus). When I get home I will post the results.

If speed is this important to you, you might want to consider C++ instead of C#.

[StructLayout(LayoutKind.Explicit,Size=1024)]
public struct TestValue
{
}
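
A comparison along those lines could be set up roughly as follows. This is only a sketch under stated assumptions: the struct is renamed BigValue, the First field, the ByValue and ByRef methods, and the call count are all hypothetical, and no timings are claimed here since they depend on the machine and the JIT.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// 1024-byte value type; the single field exists only so there is something to read.
[StructLayout(LayoutKind.Explicit, Size = 1024)]
public struct BigValue
{
    [FieldOffset(0)]
    public int First; // hypothetical field, not in the original struct
}

public static class BigStructSketch
{
    // By value: the struct's contents are copied for every call.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int ByValue(BigValue v)
    {
        return v.First;
    }

    // By reference: only the address of the caller's struct is passed.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int ByRef(ref BigValue v)
    {
        return v.First;
    }

    public static void Main()
    {
        const int Calls = 10000000; // arbitrary count for illustration
        BigValue value = new BigValue();
        value.First = 42;
        int sink = 0;

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < Calls; i++)
            sink += ByValue(value);
        watch.Stop();
        Console.WriteLine("by value: {0:0.000000} s", watch.Elapsed.TotalSeconds);

        watch = Stopwatch.StartNew();
        for (int i = 0; i < Calls; i++)
            sink += ByRef(ref value);
        watch.Stop();
        Console.WriteLine("by ref:   {0:0.000000} s", watch.Elapsed.TotalSeconds);
    }
}
```

Changing the Size value and rerunning would be one way to look for the struct size at which the two timings start to diverge.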
---
Posted via DotNetSlackers.com
Dec 1 '06 #6
> Using ref is faster but only with Value type objects.

Do you have research to show that? I've exhaustively demonstrated the opposite.
Best Regards,
Dustin Campbell
Developer Express Inc.
Dec 1 '06 #7
