Swap two integers without using temporary variable

in*************@gmail.com wrote:

OK, before you all stab me with your OT pitchfork, I just want to
mention that this question is indeed related to c.l.c++. My question
is, does the well-known method to swap two integers produce undefined
behaviour in c++? The method is:

a ^= b ^= a ^= b;

If you write it on a single line, it would appear to be UB since you modify
the variable a twice between sequence points (and even if by some extended
exegesis of the standard it would turn out to be well-defined, it would
still be poor form; but the in the absence of obvious separating sequence
points, the burden to prove well-defined behavior would be on whoever codes
that way. Presumably, it would take a whole paragraph in comments to
justify that line).

However

a ^= b;
b ^= a;
a ^= b;

has well-defined behavior.

It's a clever trick,

No, it isn't. It's just a form of code obfuscation. If you want to swap a
and be, just say so:

swap( a, b );

but I have a feeling something may go wrong if I
use it in C++, although quick tests with a few compilers produce
correct output. So is the code guaranteed to work correctly on all
compilers?

See above.
Best

Kai-Uwe Bux

Nov 26 '07 #2

On Nov 26, 1:13 pm, Kai-Uwe Bux <jkherci...@gmx.netwrote:

indrawati.ya...@gmail.com wrote:
OK, before you all stab me with your OT pitchfork, I just want to
mention that this question is indeed related to c.l.c++. My question
is, does the well-known method to swap two integers produce undefined
behaviour in c++? The method is:

a ^= b ^= a ^= b;

If you write it on a single line, it would appear to be UB since you modify
the variable a twice between sequence points (and even if by some extended
exegesis of the standard it would turn out to be well-defined, it would
still be poor form; but the in the absence of obvious separating sequence
points, the burden to prove well-defined behavior would be on whoever codes
that way. Presumably, it would take a whole paragraph in comments to
justify that line).

However

a ^= b;
b ^= a;
a ^= b;

has well-defined behavior.

I wanted to say so.

It's a clever trick,

No, it isn't. It's just a form of code obfuscation. If you want to swap a
and be, just say so:

swap( a, b );

and what is the implementation of swap?
One may like to implement the int swap as the above three lines.

regards,
FM.

Nov 26 '07 #3

terminator wrote:

On Nov 26, 1:13 pm, Kai-Uwe Bux <jkherci...@gmx.netwrote:
>indrawati.ya...@gmail.com wrote:
OK, before you all stab me with your OT pitchfork, I just want to
mention that this question is indeed related to c.l.c++. My question
is, does the well-known method to swap two integers produce undefined
behaviour in c++? The method is:

a ^= b ^= a ^= b;

If you write it on a single line, it would appear to be UB since you
modify the variable a twice between sequence points (and even if by some
extended exegesis of the standard it would turn out to be well-defined,
it would still be poor form; but the in the absence of obvious separating
sequence points, the burden to prove well-defined behavior would be on
whoever codes that way. Presumably, it would take a whole paragraph in
comments to justify that line).

However

a ^= b;
b ^= a;
a ^= b;

has well-defined behavior.

I wanted to say so.

It's a clever trick,

No, it isn't. It's just a form of code obfuscation. If you want to swap a
and be, just say so:

swap( a, b );

and what is the implementation of swap?

That is up to the implementor of the library.

One may like to implement the int swap as the above three lines.

So what? Even if std::swap should happen to be implemented in such a way
that it boils down to the above three lines for built-in scalar types, that
still would not imply that

a ^= b; b ^= a; a ^= b;

conveys intend as well and as succinctly as

std::swap( a, b );

There is no point in littering code with hacks. Note: the same applies to

{
int dummy = a;
a = b;
b = dummy;
}

although the case is somewhat weaker. You also would not want to write that
over and over again in your code.
Best

Kai-Uwe Bux

Nov 26 '07 #4

Marco Manfredini

in*************@gmail.com wrote:

OK, before you all stab me with your OT pitchfork, I just want to
mention that this question is indeed related to c.l.c++. My question
is, does the well-known method to swap two integers produce undefined
behaviour in c++? The method is:

a ^= b ^= a ^= b;

It's a clever trick, but I have a feeling something may go wrong if I
use it in C++, although quick tests with a few compilers produce
correct output. So is the code guaranteed to work correctly on all

A really *stone old* trick, that doesn't work if a and b occupy the same
storage location.

void swap(int &a , int &b)
{
a^=b; b^=a; a^= b;
}

int x=1;
swap(x,x);

will produce 0 in x and can hardly be considered a valid implementation
of swap - which btw. should terminate the other discussion as well.

Nov 26 '07 #5

Marco Manfredini wrote:

in*************@gmail.com wrote:
>OK, before you all stab me with your OT pitchfork, I just want to
mention that this question is indeed related to c.l.c++. My question
is, does the well-known method to swap two integers produce undefined
behaviour in c++? The method is:

a ^= b ^= a ^= b;

It's a clever trick, but I have a feeling something may go wrong if I
use it in C++, although quick tests with a few compilers produce
correct output. So is the code guaranteed to work correctly on all

A really *stone old* trick, that doesn't work if a and b occupy the same
storage location.

Ah that was it. I remembered vaguely a discussion of several tricks like
this on this group. Unfortunately, I was not able to recall the exact
shortcoming that was pointed out back then (I considered the case a == b
not &a == &b --- and a== b worked).

void swap(int &a , int &b)
{
a^=b; b^=a; a^= b;
}

int x=1;
swap(x,x);

will produce 0 in x and can hardly be considered a valid implementation
of swap - which btw. should terminate the other discussion as well.

Yup.
Thanks a lot

Kai-Uwe Bux

Nov 26 '07 #6

Ron Natalie

Kai-Uwe Bux wrote:

>
So what? Even if std::swap should happen to be implemented in such a way
that it boils down to the above three lines for built-in scalar types, that
still would not imply that

a ^= b; b ^= a; a ^= b;

conveys intend as well and as succinctly as

std::swap( a, b );

There is no point in littering code with hacks. Note: the same applies to

std::swap gets rid of having to worry about it. Presumably your
implementation knows the best way. Once you get past some simple
scalar values, the classes can define their own specialization for
swap.

The people who thing the XOR behavior is a neat idea totally lose the
concept that it's probably the least efficient way to do so. Even
the presumbalby space inefficent:
t = a; a = b; t = b;

might not actually allocate any memory for the temporary value, it
may in fact just be an transient register that may have been allocated
anyway.

LD R1, a
LD R2, b
ST a, R2
ST b, R1

where the XOR debacle
LD R1, a
LD R2, b
XOR R1,R1,R2
XOR R2,R2,R1
XOR R1,R1,R2
ST a, R1
ST b, R2

Nov 26 '07 #7

On Nov 26, 10:56 am, indrawati.ya...@gmail.com wrote:

OK, before you all stab me with your OT pitchfork, I just want to
mention that this question is indeed related to c.l.c++. My question
is, does the well-known method to swap two integers produce undefined
behaviour in c++? The method is:

a ^= b ^= a ^= b;

It's a clever trick,

It's a stupid trick, which doesn't work reliably in any language.

As written above, it's undefined behavior in C++. But even if
you introduce sequence points, it doesn't work.

but I have a feeling something may go wrong if I
use it in C++, although quick tests with a few compilers produce
correct output.

Only in a few special cases. The basic algorithm doesn't work
reliably, so any implementation using it is bound to fail in
some cases.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Nov 27 '07 #8

On Nov 26, 4:02 pm, Ron Natalie <r...@spamcop.netwrote:

Kai-Uwe Bux wrote:

So what? Even if std::swap should happen to be implemented in such a way
that it boils down to the above three lines for built-in scalar types, that
still would not imply that

a ^= b; b ^= a; a ^= b;

conveys intend as well and as succinctly as

std::swap( a, b );

There is no point in littering code with hacks. Note: the same applies to

std::swap gets rid of having to worry about it. Presumably your
implementation knows the best way. Once you get past some simple
scalar values, the classes can define their own specialization for
swap.

The people who thing the XOR behavior is a neat idea totally lose the
concept that it's probably the least efficient way to do so. Even
the presumbalby space inefficent:
t = a; a = b; t = b;

might not actually allocate any memory for the temporary value, it
may in fact just be an transient register that may have been allocated
anyway.

LD R1, a
LD R2, b
ST a, R2
ST b, R1

where the XOR debacle
LD R1, a
LD R2, b
XOR R1,R1,R2
XOR R2,R2,R1
XOR R1,R1,R2
ST a, R1
ST b, R2

even old x86 machines support some 'exg' opcode that swaps two
registers in just one instruction ,so a good compiler can handle
triple assignment much better than you mentioned in case both ints
have register storage.The xor trick has less chance of such
optimization , but it is a joyfull solution to a programming delima(I
like it more than add/subtract solution).

regards,
FM

Nov 27 '07 #9

On Nov 27, 4:21 pm, terminator <farid.mehr...@gmail.comwrote:

On Nov 26, 4:02 pm, Ron Natalie <r...@spamcop.netwrote:
The people who thing the XOR behavior is a neat idea totally lose the
concept that it's probably the least efficient way to do so. Even
the presumbalby space inefficent:
t = a; a = b; t = b;

might not actually allocate any memory for the temporary value, it
may in fact just be an transient register that may have been allocated
anyway.

LD R1, a
LD R2, b
ST a, R2
ST b, R1

where the XOR debacle
LD R1, a
LD R2, b
XOR R1,R1,R2
XOR R2,R2,R1
XOR R1,R1,R2
ST a, R1
ST b, R2

even old x86 machines support some 'exg' opcode that swaps two
registers in just one instruction ,so a good compiler can
handle triple assignment much better than you mentioned in
case both ints have register storage.The xor trick has less
chance of such optimization, but it is a joyfull solution to
a programming delima(I like it more than add/subtract
solution).

I seem to recall some 8086 compilers actually recognizing the
classical swap idiom (with the temporary) and generating the
exch instruction to implement it. Modern x86 compilers don't,
however, probably because on more recent x86 processors, xchg
has an implied lock prefix, which acts as a memory fence (which
in turn means that the instruction is considerably slower than
it would be otherwise).

I just ran some quick benchmarks on an Intel based Linux machine
here (using g++ 4.1.0, -O3), using the following "swappers":

struct SwapperClassic
{
void operator()( int& a, int& b )
{
int tmp = a ;
a = b ;
b = tmp ;
}
} ;

struct SwapperXor
{
void operator()( int& a, int& b )
{
a ^= b ;
b ^= a ;
a ^= b ;
}
} ;

struct SwapperAsm
{
void operator()( int& a, int& b )
{
asm ( "movl %[a], %%eax\n xchgl %%eax,%[b] \n movl %%eax,%
[a]"
: [a] "+m" (a), [b] "+m" (b) : : "%eax" ) ;
}
} ;

On this particular machine (not sure of its actual spec's, which
processor or the clock frequency), I got:

ns per machine memory
iter. instr. accesses

SwapperClassic: 1.7 5 5
SwapperXor: 2.5 9 7
SwapperAsm: 84.4 4 4

Tests run on 500 million iterations. In this case, the actual
function was invoked within a virtual member function, and
swapped two member variables. And the last two columns do not
include the standard function prefix or postfix.

A quick glance at the generated assembler showed that none of
the three versions used any local variables.

And as you can see, the cost of the memory fence due to the
implicit lock prefix on the xchgl instruction is very, very
high. The generated code uses one less instruction, and has one
less memory access, but requires roughly 50 times more time to
run.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Nov 28 '07 #10

Juha Nieminen

James Kanze wrote:

ns per machine memory
iter. instr. accesses

SwapperClassic: 1.7 5 5
SwapperXor: 2.5 9 7

Couldn't the compiler recognize the "I'm trying to outsmart the
compiler when swapping these integers by using the xor trick" idiom and
change it to the classic swapping code?

After all, many compilers detect quite some idioms. For example many
compilers detect that something like "a = (a << 8) | (a >24);" is
actually a rotation and compile that as one single opcode.

Nov 28 '07 #11

Pete Becker

On 2007-11-28 09:08:01 -0500, Juha Nieminen <no****@thanks.invalidsaid:

James Kanze wrote:
>ns per machine memory
iter. instr. accesses

SwapperClassic: 1.7 5 5
SwapperXor: 2.5 9 7

Couldn't the compiler recognize the "I'm trying to outsmart the
compiler when swapping these integers by using the xor trick" idiom and
change it to the classic swapping code?

No, because the xor hack has different semantics from a normal swap.
When you swap a variable with itself its value isn't changed. When you
xor it with itself it gets set to 0.

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference
(www.petebecker.com/tr1book)

Nov 28 '07 #12

Juha Nieminen wrote:

James Kanze wrote:
ns per machine memory
iter. instr. accesses

SwapperClassic: 1.7 5 5
SwapperXor: 2.5 9 7

Couldn't the compiler recognize the "I'm trying to outsmart the
compiler when swapping these integers by using the xor trick" idiom and
change it to the classic swapping code?

Only if it could prove that the swap didn't involve the same
objects. Since swapping usually takes place through references
and pointers, this would be non-trivial (and not applicable in
most of the really critical cases, in sort, for example). So
you have an optimization that is expensive to check, can rarely
be applied, and would never be applicable to well written code
anyway. Not very interesting.

After all, many compilers detect quite some idioms. For example many
compilers detect that something like "a = (a << 8) | (a >24);" is
actually a rotation and compile that as one single opcode.

That's because such an expression is *always* a rotation. And
because there's no easier nor more idiomatic way of expressing
it in C++. Neither is true for the xor swap.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Nov 29 '07 #13

On Nov 28, 1:03 pm, James Kanze <james.ka...@gmail.comwrote:

On Nov 27, 4:21 pm, terminator <farid.mehr...@gmail.comwrote:

On Nov 26, 4:02 pm, Ron Natalie <r...@spamcop.netwrote:
The people who thing the XOR behavior is a neat idea totally lose the
concept that it's probably the least efficient way to do so. Even
the presumbalby space inefficent:
t = a; a = b; t = b;
might not actually allocate any memory for the temporary value, it
may in fact just be an transient register that may have been allocated
anyway.
LD R1, a
LD R2, b
ST a, R2
ST b, R1
where the XOR debacle
LD R1, a
LD R2, b
XOR R1,R1,R2
XOR R2,R2,R1
XOR R1,R1,R2
ST a, R1
ST b, R2
even old x86 machines support some 'exg' opcode that swaps two
registers in just one instruction ,so a good compiler can
handle triple assignment much better than you mentioned in
case both ints have register storage.The xor trick has less
chance of such optimization, but it is a joyfull solution to
a programming delima(I like it more than add/subtract
solution).

I seem to recall some 8086 compilers actually recognizing the
classical swap idiom (with the temporary) and generating the
exch instruction to implement it. Modern x86 compilers don't,
however, probably because on more recent x86 processors, xchg
has an implied lock prefix, which acts as a memory fence (which
in turn means that the instruction is considerably slower than
it would be otherwise).

I just ran some quick benchmarks on an Intel based Linux machine
here (using g++ 4.1.0, -O3), using the following "swappers":

struct SwapperClassic
{
void operator()( int& a, int& b )
{
int tmp = a ;
a = b ;
b = tmp ;
}
} ;

struct SwapperXor
{
void operator()( int& a, int& b )
{
a ^= b ;
b ^= a ;
a ^= b ;
}
} ;

struct SwapperAsm
{
void operator()( int& a, int& b )
{
asm ( "movl %[a], %%eax\n xchgl %%eax,%[b] \n movl %%eax,%
[a]"
: [a] "+m" (a), [b] "+m" (b) : : "%eax" ) ;
}
} ;

On this particular machine (not sure of its actual spec's, which
processor or the clock frequency), I got:

ns per machine memory
iter. instr. accesses

SwapperClassic: 1.7 5 5
SwapperXor: 2.5 9 7
SwapperAsm: 84.4 4 4

Tests run on 500 million iterations. In this case, the actual
function was invoked within a virtual member function, and
swapped two member variables. And the last two columns do not
include the standard function prefix or postfix.

A quick glance at the generated assembler showed that none of
the three versions used any local variables.

And as you can see, the cost of the memory fence due to the
implicit lock prefix on the xchgl instruction is very, very
high. The generated code uses one less instruction, and has one
less memory access, but requires roughly 50 times more time to
run.

I wrote that **if both ints have register storage class.Exchanging to
memory **has to lock** the bus (I would be suprised otherwise)for it
is the simplest facility to handle a semaphor.

regards,
FM.

Nov 29 '07 #14

On Nov 28, 5:08 pm, Juha Nieminen <nos...@thanks.invalidwrote:

James Kanze wrote:
ns per machine memory
iter. instr. accesses

SwapperClassic: 1.7 5 5
SwapperXor: 2.5 9 7

Couldn't the compiler recognize the "I'm trying to outsmart the
compiler when swapping these integers by using the xor trick" idiom and
change it to the classic swapping code?

After all, many compilers detect quite some idioms. For example many
compilers detect that something like "a = (a << 8) | (a >24);" is
actually a rotation and compile that as one single opcode.

as soome body wrote upthread,'triple_xor_swap(a,a)' would reset 'a' to
zero .Actually my compiler generates 'xor r r' when it needs the 'r'
register to be zero.
OTH 'triple_assign_swap' has well defined behavior it will lead to
straight-forward code that dose not even need to be optimized for some
machines.on a machine that can simultaniously address source and
destination of a move instruction as memory locations ,we might have
something like this:

push reg;
move reg a;
move a b;
move b reg;
pop reg;

regards,
FM.

Nov 29 '07 #15

On Nov 29, 12:52 pm, James Kanze <james.ka...@gmail.comwrote:

Juha Nieminen wrote:
James Kanze wrote:
ns per machine memory
iter. instr. accesses
SwapperClassic: 1.7 5 5
SwapperXor: 2.5 9 7
Couldn't the compiler recognize the "I'm trying to outsmart the
compiler when swapping these integers by using the xor trick" idiom and
change it to the classic swapping code?

Only if it could prove that the swap didn't involve the same
objects. Since swapping usually takes place through references
and pointers, this would be non-trivial (and not applicable in
most of the really critical cases, in sort, for example). So
you have an optimization that is expensive to check, can rarely
be applied, and would never be applicable to well written code
anyway. Not very interesting.

After all, many compilers detect quite some idioms. For example many
compilers detect that something like "a = (a << 8) | (a >24);" is
actually a rotation and compile that as one single opcode.

That's because such an expression is *always* a rotation.

old C books say that it is shift,is the behavoir changed since then?
why not to have a shift (<<) operator along with a rotation(<<<)?

regards,
FM.

Nov 29 '07 #16

Victor Bazarov

terminator wrote:

On Nov 29, 12:52 pm, James Kanze <james.ka...@gmail.comwrote:
>Juha Nieminen wrote:
>>James Kanze wrote:
ns per machine memory
iter. instr. accesses
SwapperClassic: 1.7 5 5
SwapperXor: 2.5 9 7
Couldn't the compiler recognize the "I'm trying to outsmart the
compiler when swapping these integers by using the xor trick" idiom
and change it to the classic swapping code?

Only if it could prove that the swap didn't involve the same
objects. Since swapping usually takes place through references
and pointers, this would be non-trivial (and not applicable in
most of the really critical cases, in sort, for example). So
you have an optimization that is expensive to check, can rarely
be applied, and would never be applicable to well written code
anyway. Not very interesting.

>> After all, many compilers detect quite some idioms. For example
many compilers detect that something like "a = (a << 8) | (a >>
24);" is actually a rotation and compile that as one single opcode.

That's because such an expression is *always* a rotation.

old C books say that it is shift,is the behavoir changed since then?

Nope.

why not to have a shift (<<) operator along with a rotation(<<<)?

Not sure what James meant by the <<*always* a rotation>bit. My
guess is that in 1970 the machines for which C was developed didn't
have a rotation instruction, and that's why C didn't get a rotation
operator. Whether '<<<' would be the right symbol for it, I am not
sure. Triple-character operators are really only 50% different from
their two-character counterparts and would probably be much less
favorable than, say, '<%' or '>%'.

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask

Nov 29 '07 #17

terminator wrote:

On Nov 29, 12:52 pm, James Kanze <james.ka...@gmail.comwrote:
>Juha Nieminen wrote:
James Kanze wrote:
ns per machine memory
iter. instr. accesses
SwapperClassic: 1.7 5 5
SwapperXor: 2.5 9 7
Couldn't the compiler recognize the "I'm trying to outsmart the
compiler when swapping these integers by using the xor trick" idiom and
change it to the classic swapping code?

Only if it could prove that the swap didn't involve the same
objects. Since swapping usually takes place through references
and pointers, this would be non-trivial (and not applicable in
most of the really critical cases, in sort, for example). So
you have an optimization that is expensive to check, can rarely
be applied, and would never be applicable to well written code
anyway. Not very interesting.

After all, many compilers detect quite some idioms. For example many
compilers detect that something like "a = (a << 8) | (a >24);" is
actually a rotation and compile that as one single opcode.

That's because such an expression is *always* a rotation.

old C books say that it is shift,is the behavoir changed since then?

I thaught a << 8 is a shift and a >24 is a shift. So,

( a << 8 ) | ( a >24 )

would be a rotation assuming a bitlength of 32.
[snip]
Best

Kai-Uwe Bux

Nov 29 '07 #18

Shadowman

James Kanze wrote:

On Nov 27, 4:21 pm, terminator <farid.mehr...@gmail.comwrote:
>On Nov 26, 4:02 pm, Ron Natalie <r...@spamcop.netwrote:
>>The people who thing the XOR behavior is a neat idea totally lose the
concept that it's probably the least efficient way to do so. Even
the presumbalby space inefficent:
t = a; a = b; t = b;

>>might not actually allocate any memory for the temporary value, it
may in fact just be an transient register that may have been allocated
anyway.

>> LD R1, a
LD R2, b
ST a, R2
ST b, R1

>>where the XOR debacle
LD R1, a
LD R2, b
XOR R1,R1,R2
XOR R2,R2,R1
XOR R1,R1,R2
ST a, R1
ST b, R2

>even old x86 machines support some 'exg' opcode that swaps two
registers in just one instruction ,so a good compiler can
handle triple assignment much better than you mentioned in
case both ints have register storage.The xor trick has less
chance of such optimization, but it is a joyfull solution to
a programming delima(I like it more than add/subtract
solution).

I seem to recall some 8086 compilers actually recognizing the
classical swap idiom (with the temporary) and generating the
exch instruction to implement it. Modern x86 compilers don't,
however, probably because on more recent x86 processors, xchg
has an implied lock prefix, which acts as a memory fence (which
in turn means that the instruction is considerably slower than
it would be otherwise).

I just ran some quick benchmarks on an Intel based Linux machine
here (using g++ 4.1.0, -O3), using the following "swappers":

struct SwapperClassic
{
void operator()( int& a, int& b )
{
int tmp = a ;
a = b ;
b = tmp ;
}
} ;

struct SwapperXor
{
void operator()( int& a, int& b )
{
a ^= b ;
b ^= a ;
a ^= b ;
}
} ;

struct SwapperAsm
{
void operator()( int& a, int& b )
{
asm ( "movl %[a], %%eax\n xchgl %%eax,%[b] \n movl %%eax,%
[a]"
: [a] "+m" (a), [b] "+m" (b) : : "%eax" ) ;
}
} ;

What's the advantage of using struct and operator(), instead of simply:

void swapperX(int& a, int& b) ?
--
SM
rot13 for email

Nov 29 '07 #19

Joe Greer

Kai-Uwe Bux <jk********@gmx.netwrote in
news:fi**********@murdoch.acc.Virginia.EDU:

terminator wrote:

>On Nov 29, 12:52 pm, James Kanze <james.ka...@gmail.comwrote:
>>Juha Nieminen wrote:
James Kanze wrote:
ns per machine memory
iter. instr. accesses
SwapperClassic: 1.7 5 5
SwapperXor: 2.5 9 7
Couldn't the compiler recognize the "I'm trying to outsmart the
compiler when swapping these integers by using the xor trick"
idiom and change it to the classic swapping code?

Only if it could prove that the swap didn't involve the same
objects. Since swapping usually takes place through references
and pointers, this would be non-trivial (and not applicable in
most of the really critical cases, in sort, for example). So
you have an optimization that is expensive to check, can rarely
be applied, and would never be applicable to well written code
anyway. Not very interesting.

After all, many compilers detect quite some idioms. For example
many
compilers detect that something like "a = (a << 8) | (a >24);"
is actually a rotation and compile that as one single opcode.

That's because such an expression is *always* a rotation.

old C books say that it is shift,is the behavoir changed since then?

I thaught a << 8 is a shift and a >24 is a shift. So,

( a << 8 ) | ( a >24 )

would be a rotation assuming a bitlength of 32.
[snip]
Best

Kai-Uwe Bux

I don't know about now, but in the olden days, << always 0 filled and >>
was machine dependent. On the machines I used back then, if a were
signed, then >1 filled that is, the sign bit got carried so that a >>
1 was always equivalent to a / 2. That would mean that the above would
generate a mess for a signed a rather than a rotation.

joe

Nov 29 '07 #20

REH

On Nov 29, 12:51 pm, Shadowman <funqbjzna...@pbzpnfg.argwrote:

James Kanze wrote:
On Nov 27, 4:21 pm, terminator <farid.mehr...@gmail.comwrote:
On Nov 26, 4:02 pm, Ron Natalie <r...@spamcop.netwrote:
The people who thing the XOR behavior is a neat idea totally lose the
concept that it's probably the least efficient way to do so. Even
the presumbalby space inefficent:
t = a; a = b; t = b;

>might not actually allocate any memory for the temporary value, it
may in fact just be an transient register that may have been allocated
anyway.

> LD R1, a
LD R2, b
ST a, R2
ST b, R1

>where the XOR debacle
LD R1, a
LD R2, b
XOR R1,R1,R2
XOR R2,R2,R1
XOR R1,R1,R2
ST a, R1
ST b, R2

even old x86 machines support some 'exg' opcode that swaps two
registers in just one instruction ,so a good compiler can
handle triple assignment much better than you mentioned in
case both ints have register storage.The xor trick has less
chance of such optimization, but it is a joyfull solution to
a programming delima(I like it more than add/subtract
solution).

I seem to recall some 8086 compilers actually recognizing the
classical swap idiom (with the temporary) and generating the
exch instruction to implement it. Modern x86 compilers don't,
however, probably because on more recent x86 processors, xchg
has an implied lock prefix, which acts as a memory fence (which
in turn means that the instruction is considerably slower than
it would be otherwise).

I just ran some quick benchmarks on an Intel based Linux machine
here (using g++ 4.1.0, -O3), using the following "swappers":

struct SwapperClassic
{
void operator()( int& a, int& b )
{
int tmp = a ;
a = b ;
b = tmp ;
}
} ;

struct SwapperXor
{
void operator()( int& a, int& b )
{
a ^= b ;
b ^= a ;
a ^= b ;
}
} ;

struct SwapperAsm
{
void operator()( int& a, int& b )
{
asm ( "movl %[a], %%eax\n xchgl %%eax,%[b] \n movl %%eax,%
[a]"
: [a] "+m" (a), [b] "+m" (b) : : "%eax" ) ;
}
} ;

What's the advantage of using struct and operator(), instead of simply:

void swapperX(int& a, int& b) ?

Better chances for optimization with templates like std::foreach, et.
al.

REH

Nov 29 '07 #21

On Nov 29, 4:10 pm, terminator <farid.mehr...@gmail.comwrote:

On Nov 29, 12:52 pm, James Kanze <james.ka...@gmail.comwrote:
Juha Nieminen wrote:
After all, many compilers detect quite some idioms. For example many
compilers detect that something like "a = (a << 8) | (a >24);" is
actually a rotation and compile that as one single opcode.

That's because such an expression is *always* a rotation.

I should have been more precise: on a 32 bit machine, it is
always a rotation (or it is always a rotation over the low 32
bits).

old C books say that it is shift,is the behavoir changed since
then?

You have old C books that say that the expression "(a << 8) | (a

>24)" is a shift? << and >are shift operators, but when
used in the above expression, they implement a rotation.

why not to have a shift (<<) operator along with a rotation(<<<)?

I don't know why C doesn't have a rotation operator; it has
everything else. There are a few algorithms where it would be
useful. (And some of those algorithms, such as DES encryption,
were known to the authors of C. If I'm not mistaken, the Unix
crypt() function uses a rotation.)

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Nov 30 '07 #22

On Nov 29, 4:21 pm, "Victor Bazarov" <v.Abaza...@comAcast.netwrote:

[...]

Not sure what James meant by the <<*always* a rotation>bit.

The expression that Juha presented.

My guess is that in 1970 the machines for which C was
developed didn't have a rotation instruction, and that's why C
didn't get a rotation operator.

I think they did (although it's been a long, long time since I
looked at the machine instructions for a PDP-11). They didn't
have a machine instruction for pre-increment nor for
post-increment, however, but they're both there.

Whether '<<<' would be the right symbol for it, I am not sure.
Triple-character operators are really only 50% different from
their two-character counterparts and would probably be much
less favorable than, say, '<%' or '>%'.

Agreed. Even more favorable might be keywords, or "pre-defined"
functions: ror and rol, for example.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Nov 30 '07 #23

On Nov 29, 7:21 pm, Joe Greer <jgr...@doubletake.comwrote:

Kai-Uwe Bux <jkherci...@gmx.netwrote
innews:fi**********@murdoch.acc.Virginia.EDU:

After all, many compilers detect quite some idioms. For example
many
compilers detect that something like "a = (a << 8) | (a >24);"
is actually a rotation and compile that as one single opcode.

>That's because such an expression is *always* a rotation.

old C books say that it is shift,is the behavoir changed since then?

I thaught a << 8 is a shift and a >24 is a shift. So,

( a << 8 ) | ( a >24 )

would be a rotation assuming a bitlength of 32.

Yes. I just sort of assumed that (and that everyone else would,
since it seems so obvious).

[snip]

I don't know about now, but in the olden days, << always 0
filled and >was machine dependent.

It depends on the type. As a general rule, I only use shifts
and bitwise operators on unsigned types, so there are no
problems.

Juha's expression is wide-spread, and is the idiomatic way to
express a rotation in C/C++. It would, I believe, be
immediately recognized as a rotation by anyone who has worked on
code where rotations were needed. (It's used in the reference
implementations of MD5 and SHA1, for example.)

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Nov 30 '07 #24

On Nov 29, 3:54 pm, terminator <farid.mehr...@gmail.comwrote:

On Nov 28, 1:03 pm, James Kanze <james.ka...@gmail.comwrote:
On Nov 27, 4:21 pm, terminator <farid.mehr...@gmail.comwrote:
On Nov 26, 4:02 pm, Ron Natalie <r...@spamcop.netwrote:
The people who thing the XOR behavior is a neat idea totally lose the
concept that it's probably the least efficient way to do so. Even
the presumbalby space inefficent:
t = a; a = b; t = b;
might not actually allocate any memory for the temporary
value, it may in fact just be an transient register that
may have been allocated anyway.
LD R1, a
LD R2, b
ST a, R2
ST b, R1
where the XOR debacle
LD R1, a
LD R2, b
XOR R1,R1,R2
XOR R2,R2,R1
XOR R1,R1,R2
ST a, R1
ST b, R2
even old x86 machines support some 'exg' opcode that swaps
two registers in just one instruction ,so a good compiler
can handle triple assignment much better than you
mentioned in case both ints have register storage.The xor
trick has less chance of such optimization, but it is a
joyfull solution to a programming delima(I like it more
than add/subtract solution).

I seem to recall some 8086 compilers actually recognizing
the classical swap idiom (with the temporary) and generating
the exch instruction to implement it. Modern x86 compilers
don't, however, probably because on more recent x86
processors, xchg has an implied lock prefix, which acts as a
memory fence (which in turn means that the instruction is
considerably slower than it would be otherwise).

I just ran some quick benchmarks on an Intel based Linux
machine here (using g++ 4.1.0, -O3), using the following
"swappers":
struct SwapperClassic
{
void operator()( int& a, int& b )
{
int tmp = a ;
a = b ;
b = tmp ;
}
} ;

struct SwapperXor
{
void operator()( int& a, int& b )
{
a ^= b ;
b ^= a ;
a ^= b ;
}
} ;

struct SwapperAsm
{
void operator()( int& a, int& b )
{
asm ( "movl %[a], %%eax\n xchgl %%eax,%[b] \n movl %%eax,%
[a]"
: [a] "+m" (a), [b] "+m" (b) : : "%eax" ) ;
}
} ;

On this particular machine (not sure of its actual spec's,
which processor or the clock frequency), I got:

ns per machine memory
iter. instr. accesses

SwapperClassic: 1.7 5 5
SwapperXor: 2.5 9 7
SwapperAsm: 84.4 4 4

Tests run on 500 million iterations. In this case, the actual
function was invoked within a virtual member function, and
swapped two member variables. And the last two columns do not
include the standard function prefix or postfix.

A quick glance at the generated assembler showed that none of
the three versions used any local variables.

And as you can see, the cost of the memory fence due to the
implicit lock prefix on the xchgl instruction is very, very
high. The generated code uses one less instruction, and has
one less memory access, but requires roughly 50 times more
time to run.

I wrote that **if both ints have register storage
class.Exchanging to memory **has to lock** the bus (I would be
suprised otherwise)for it is the simplest facility to handle a
semaphor.

There is no such thing as "register storage class" in C++. If
you have to read both int's into a register, then the xchg
instruction is just an extra, unneeded instruction (on most
machines, anyway). And there is no real reason why xchg should
lock the bus, any more that incr or decr or any other
instruction which might require two (or more) memory accesses,
and even less reason why it should implement a memory fence.
You already have separate instructions/prefixes for those
things, which can be used in the cases where they are needed.
(The original 8086, and at least through the 80386, did not lock
the bus, nor does the normal exchange instruction on any other
architecture I've used.) And for most multi-thread algorithms,
xchg isn't enough; you need a cas instruction (or some other
combinations of instructions: g++ uses "lock add" for its atomic
increment, for example---which can't be implemented using xchg).

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Nov 30 '07 #25