Connecting Tech Pros Worldwide Help | Site Map

How to optimize an algorithm for G4/Pentium 4

 
LinkBack Thread Tools Search this Thread
  #1  
Old July 23rd, 2005, 12:49 AM
Richard Cavell
Guest
 
Posts: n/a
Default How to optimize an algorithm for G4/Pentium 4

Hi,

I wish to write an algorithm in C++. My intention is to run it on a Mac
G4, however it would be nice to have the same program compile and run on
a Pentium 4.

The program will have to do a lot of the following with 32 bit integers:

int a=an arbitrary value;
int b=an arbitrary value;
int c=an arbitrary value;

int d = a*b; (discard the upper 32 bits)
if (d==c) do something;
int e = 2 << an arbitrary value from 1...31;
int f = d & e;

And so on; it's a series of ands, unsigned multiplies, and so on. It's
all contained in a massive loop that goes trillions of times. There's
no floating-point at all.

These calculations can be done in parallel. How do I make GCC output
Altivec instructions or SIMD instructions? How do I make it do an
unsigned multiply of 2 32-bit numbers and discard the upper 32 bits?

Also, is there a way of testing for carry? For example,

int a = something;
int b = something;
int c = a+b;

if (carry bit is set) do something;


  #2  
Old July 23rd, 2005, 12:50 AM
Willem
Guest
 
Posts: n/a
Default Re: How to optimize an algorithm for G4/Pentium 4

Richard wrote:
) Also, is there a way of testing for carry? For example,
)
) int a = something;
) int b = something;
) int c = a+b;
)
) if (carry bit is set) do something;

You mean like overflow ? If you're using unsigneds, simply test for c < a.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
  #3  
Old July 23rd, 2005, 12:50 AM
Paul Russell
Guest
 
Posts: n/a
Default Re: How to optimize an algorithm for G4/Pentium 4

Richard Cavell wrote:[color=blue]
>
> These calculations can be done in parallel. How do I make GCC output
> Altivec instructions or SIMD instructions? How do I make it do an
> unsigned multiply of 2 32-bit numbers and discard the upper 32 bits?
>[/color]

AltiVec doesn't have a 32 x 32 multiply /per se/ but you might want to
check out vBasicOps:
<http://developer.apple.com/hardware/ve/downloads/vBasicOpsPB.sit.hqx>.
This would probably still be more efficient than straight scalar code.

For more AltiVec info check out the AltiVec mailing list at
<http://www.simdtech.org/altivec>.

Paul
  #4  
Old July 23rd, 2005, 12:53 AM
John Reiser
Guest
 
Posts: n/a
Default Re: How to optimize an algorithm for G4/Pentium 4

Richard Cavell wrote:[color=blue]
> I wish to write an algorithm in C++. My intention is to run it on a Mac
> G4, however it would be nice to have the same program compile and run on
> a Pentium 4.[/color]
[snip][color=blue]
> Also, is there a way of testing for carry? For example,
>
> int a = something;
> int b = something;
> int c = a+b;
>
> if (carry bit is set) do something;[/color]

"Carry" is universally understood only for unsigned addition.
Subtraction or signed quantities may cause confusion.

Some machines (such as x86) set the Carry bit after the subtraction
(x - y) to be the Borrow:
CarryOut(3 - 4) is 1.
Other machines (such as PowerPC) set the Carry bit after the subtraction
(x - y) to be the Carry from (x + ~y + 1) [both additions done
by the same adder at the same time, by setting the CarryIn to 1]
which is the complement of the Borrow:
CarryOut(3 - 4) is 0.

Depending on implementation technology, the PowerPC convention may
save a few hardware logic gates. The x86 convention is better for
many programming tasks because it saves instructions when the Borrow
of a subtraction is used as the CarryIn of a following addition or
logical operation.

--
John Reiser, jreiser@BitWagon.com
  #5  
Old July 23rd, 2005, 12:55 AM
Simon Slavin
Guest
 
Posts: n/a
Default Re: How to optimize an algorithm for G4/Pentium 4

On 13/02/2005, Richard Cavell wrote in message
<cum90u$77f$1@nnrp.waia.asn.au>:
[color=blue]
> it's a series of ands, unsigned multiplies, and so on. It's
> all contained in a massive loop that goes trillions of times. There's
> no floating-point at all.
>
> These calculations can be done in parallel. How do I make GCC output
> Altivec instructions or SIMD instructions? How do I make it do an
> unsigned multiply of 2 32-bit numbers and discard the upper 32 bits?[/color]

Note that GCC 4.0, which will be included free with OS X 10.4,
will do auto-vectorization and is expected to be far more
efficient at optimization:

http://developer.apple.com/macosx/tiger/xcode2.html

In the meantime my advice is simply to code the damn thing in
the simplest way possible and see what the compiler makes of it.
The optimization built into compilers these days continues to
surprise me.

Simon.
--
Using pre-release version of newsreader.
Please tell me if it does weird things.
 

Bookmarks

Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On

Popular Articles

What is Bytes?

We are a network of experts and professionals in IT and software development that help one another with answers to tough questions and share insights. Get the best answers to your questions from over 220,840 network members.