How to optimize an algorithm for G4/Pentium 4

Hi,

I wish to write an algorithm in C++. My intention is to run it on a Mac
G4, however it would be nice to have the same program compile and run on
a Pentium 4.

The program will have to do a lot of the following with 32-bit integers:

int a=an arbitrary value;
int b=an arbitrary value;
int c=an arbitrary value;

int d = a*b; (discard the upper 32 bits)
if (d==c) do something;
int e = 2 << an arbitrary value from 1...31;
int f = d & e;

And so on; it's a series of ands, unsigned multiplies, and so on. It's
all contained in a massive loop that goes trillions of times. There's
no floating-point at all.

These calculations can be done in parallel. How do I make GCC output
Altivec instructions or SIMD instructions? How do I make it do an
unsigned multiply of 2 32-bit numbers and discard the upper 32 bits?

Also, is there a way of testing for carry? For example,

int a = something;
int b = something;
int c = a+b;

if (carry bit is set) do something;

Jul 23 '05 #1
4 Replies


Richard wrote:
) Also, is there a way of testing for carry? For example,
)
) int a = something;
) int b = something;
) int c = a+b;
)
) if (carry bit is set) do something;

You mean like overflow? If you're using unsigned integers, simply test for c < a.
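
A minimal sketch of that test, assuming 32-bit unsigned operands (the function name add_carries is made up for illustration). Unsigned addition wraps modulo 2^32, so the wrapped sum ends up smaller than an operand exactly when a carry came out of the top bit:

```cpp
#include <cstdint>

// Hypothetical helper: returns true when a + b produces a carry out of bit 31.
// Unsigned arithmetic wraps modulo 2^32, so the wrapped sum is smaller than
// 'a' exactly when the true sum did not fit in 32 bits.
bool add_carries(std::uint32_t a, std::uint32_t b) {
    std::uint32_t c = a + b;  // well-defined wraparound for unsigned types
    return c < a;
}
```

For example, add_carries(0xFFFFFFFFu, 1u) is true, while add_carries(1u, 2u) is false. The same trick does not carry over to signed int, where overflow is undefined behaviour in C++.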
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
Jul 23 '05 #2

Richard Cavell wrote:

> These calculations can be done in parallel. How do I make GCC output
> Altivec instructions or SIMD instructions? How do I make it do an
> unsigned multiply of 2 32-bit numbers and discard the upper 32 bits?


AltiVec doesn't have a 32 x 32 multiply /per se/ but you might want to
check out vBasicOps:
<http://developer.apple.com/hardware/ve/downloads/vBasicOpsPB.sit.hqx>.
This would probably still be more efficient than straight scalar code.
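
To show why a missing 32 x 32 multiply is not fatal, here is a scalar sketch of how the low 32 bits of a product can be assembled from 16-bit halves, which is the kind of decomposition a vector unit with only 16-bit multiplies would need. This is an illustration only, not the vBasicOps code (which I have not reproduced here), and the function name mul_low32_from_halves is made up:

```cpp
#include <cstdint>

// Hypothetical helper: low 32 bits of a * b built from 16-bit halves.
// a * b = a_lo*b_lo + ((a_hi*b_lo + a_lo*b_hi) << 16) + ((a_hi*b_hi) << 32),
// and the last term vanishes modulo 2^32.
std::uint32_t mul_low32_from_halves(std::uint32_t a, std::uint32_t b) {
    std::uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    std::uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    std::uint32_t low = a_lo * b_lo;                  // 16 x 16 -> 32, no loss
    std::uint32_t cross = a_hi * b_lo + a_lo * b_hi;  // may wrap; only its low
                                                      // 16 bits survive the shift
    return low + (cross << 16);                       // wraps modulo 2^32
}
```

The result matches a plain unsigned 32-bit multiply, e.g. mul_low32_from_halves(0xFFFFFFFFu, 0xFFFFFFFFu) gives 1.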

For more AltiVec info check out the AltiVec mailing list at
<http://www.simdtech.org/altivec>.

Paul
Jul 23 '05 #3

Richard Cavell wrote:
> I wish to write an algorithm in C++. My intention is to run it on a Mac
> G4, however it would be nice to have the same program compile and run on
> a Pentium 4.
> [snip]
> Also, is there a way of testing for carry? For example,
>
> int a = something;
> int b = something;
> int c = a+b;
>
> if (carry bit is set) do something;


"Carry" is universally understood only for unsigned addition.
Subtraction or signed quantities may cause confusion.

Some machines (such as x86) set the Carry bit after the subtraction
(x - y) to be the Borrow:
CarryOut(3 - 4) is 1.
Other machines (such as PowerPC) set the Carry bit after the subtraction
(x - y) to be the Carry from (x + ~y + 1) [both additions done
by the same adder at the same time, by setting the CarryIn to 1]
which is the complement of the Borrow:
CarryOut(3 - 4) is 0.
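
The two conventions can be modelled in portable C++ as a rough sketch. The function names below are invented for illustration, and the code simulates the flag values rather than reading any real hardware flag:

```cpp
#include <cstdint>

// x86-style convention: the carry flag after x - y is the borrow.
bool carry_after_sub_x86(std::uint32_t x, std::uint32_t y) {
    return x < y;
}

// PowerPC-style convention: the carry flag after x - y is the carry out of
// x + ~y + 1, computed here in 64 bits so that bit 32 holds the carry.
bool carry_after_sub_ppc(std::uint32_t x, std::uint32_t y) {
    std::uint64_t wide = static_cast<std::uint64_t>(x)
                       + static_cast<std::uint32_t>(~y)
                       + 1u;
    return (wide >> 32) != 0;
}
```

For 3 - 4 the first function returns true and the second returns false, matching the CarryOut values above; for any x and y the two results are complements of each other.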

Depending on implementation technology, the PowerPC convention may
save a few hardware logic gates. The x86 convention is better for
many programming tasks because it saves instructions when the Borrow
of a subtraction is used as the CarryIn of a following addition or
logical operation.

--
John Reiser, jr*****@BitWagon.com
Jul 23 '05 #4

On 13/02/2005, Richard Cavell wrote in message
<cu**********@nnrp.waia.asn.au>:
> it's a series of ands, unsigned multiplies, and so on. It's
> all contained in a massive loop that goes trillions of times. There's
> no floating-point at all.
>
> These calculations can be done in parallel. How do I make GCC output
> Altivec instructions or SIMD instructions? How do I make it do an
> unsigned multiply of 2 32-bit numbers and discard the upper 32 bits?


Note that GCC 4.0, which will be included free with OS X 10.4,
will do auto-vectorization and is expected to be far more
efficient at optimization:

http://developer.apple.com/macosx/tiger/xcode2.html

In the meantime my advice is simply to code the damn thing in
the simplest way possible and see what the compiler makes of it.
The optimization built into compilers these days continues to
surprise me.
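
As a sketch of what "the simplest way possible" might look like for the operations in the original post: the function name count_matches, the array-based layout, and passing the mask e as a parameter are all assumptions made for illustration. Using unsigned 32-bit types makes the multiply keep only the low 32 bits, with no special instruction needed:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical kernel: for each element, multiply (keeping the low 32 bits),
// compare against c[i], and mask the product with e. Returns the number of
// elements where the product equalled c[i]; masked products go into f.
std::size_t count_matches(const std::uint32_t* a, const std::uint32_t* b,
                          const std::uint32_t* c, std::uint32_t e,
                          std::uint32_t* f, std::size_t n) {
    std::size_t matches = 0;
    for (std::size_t i = 0; i < n; ++i) {
        std::uint32_t d = a[i] * b[i];  // unsigned multiply wraps modulo 2^32
        matches += (d == c[i]);         // branch-free count of equal products
        f[i] = d & e;
    }
    return matches;
}
```

With GCC 4.0 or later, building with something like -O3 -ftree-vectorize, plus -maltivec on the G4 or -msse2 on the Pentium 4, asks the compiler to try vectorizing the loop. Whether it succeeds for this particular loop depends on the compiler version and target, so it is worth inspecting the generated assembly.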

Simon.
--
Using pre-release version of newsreader.
Please tell me if it does weird things.
Jul 23 '05 #5
