How to optimize an algorithm for G4/Pentium 4

Richard Cavell

Hi,

I wish to write an algorithm in C++. My intention is to run it on a Mac
G4, however it would be nice to have the same program compile and run on
a Pentium 4.

The program will have to do a lot of the following with 32 bit integers:

int a=an arbitrary value;
int b=an arbitrary value;
int c=an arbitrary value;

int d = a*b; (discard the upper 32 bits)
if (d==c) do something;
int e = 2 << an arbitrary value from 1...31;
int f = d & e;

And so on; it's a series of ands, unsigned multiplies, and so on. It's
all contained in a massive loop that goes trillions of times. There's
no floating-point at all.

These calculations can be done in parallel. How do I make GCC output
Altivec instructions or SIMD instructions? How do I make it do an
unsigned multiply of 2 32-bit numbers and discard the upper 32 bits?

Also, is there a way of testing for carry? For example,

int a = something;
int b = something;
int c = a+b;

if (carry bit is set) do something;

Jul 23 '05 #1

Subscribe Post Reply

1701

Willem

Richard wrote:
) Also, is there a way of testing for carry? For example,
)
) int a = something;
) int b = something;
) int c = a+b;
)
) if (carry bit is set) do something;

You mean like overflow ? If you're using unsigneds, simply test for c < a.
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

Jul 23 '05 #2

Paul Russell

Richard Cavell wrote:

These calculations can be done in parallel. How do I make GCC output
Altivec instructions or SIMD instructions? How do I make it do an
unsigned multiply of 2 32-bit numbers and discard the upper 32 bits?

AltiVec doesn't have a 32 x 32 multiply /per se/ but you might want to
check out vBasicOps:
<http://developer.apple.com/hardware/ve/downloads/vBasicOpsPB.sit.hqx>.
This would probably still be more efficient than straight scalar code.

For more AltiVec info check out the AltiVec mailing list at
<http://www.simdtech.org/altivec>.

Paul

Jul 23 '05 #3

John Reiser

Richard Cavell wrote:

I wish to write an algorithm in C++. My intention is to run it on a Mac
G4, however it would be nice to have the same program compile and run on
a Pentium 4. [snip] Also, is there a way of testing for carry? For example,

int a = something;
int b = something;
int c = a+b;

if (carry bit is set) do something;

"Carry" is universally understood only for unsigned addition.
Subtraction or signed quantities may cause confusion.

Some machines (such as x86) set the Carry bit after the subtraction
(x - y) to be the Borrow:
CarryOut(3 - 4) is 1.
Other machines (such as PowerPC) set the Carry bit after the subtraction
(x - y) to be the Carry from (x + ~y + 1) [both additions done
by the same adder at the same time, by setting the CarryIn to 1]
which is the complement of the Borrow:
CarryOut(3 - 4) is 0.

Depending on implementation technology, the PowerPC convention may
save a few hardware logic gates. The x86 convention is better for
many programming tasks because it saves instructions when the Borrow
of a subtraction is used as the CarryIn of a following addition or
logical operation.

--
John Reiser, jr*****@BitWagon.com

Jul 23 '05 #4

Simon Slavin

On 13/02/2005, Richard Cavell wrote in message
<cu**********@nnrp.waia.asn.au>:

it's a series of ands, unsigned multiplies, and so on. It's
all contained in a massive loop that goes trillions of times. There's
no floating-point at all.

These calculations can be done in parallel. How do I make GCC output
Altivec instructions or SIMD instructions? How do I make it do an
unsigned multiply of 2 32-bit numbers and discard the upper 32 bits?

Note that GCC 4.0, which will be included free with OS X 10.4,
will do auto-vectorization and is expected to be far more
efficient at optimization:

http://developer.apple.com/macosx/tiger/xcode2.html

In the meantime my advice is simply to code the damn thing in
the simplest way possible and see what the compiler makes of it.
The optimization built into compilers these days continues to
surprise me.

Simon.
--
Using pre-release version of newsreader.
Please tell me if it does weird things.

Jul 23 '05 #5

Similar topics

Optimize my stored procedure

by: mjuricek | last post by:

I'm having some problems to optimize my stored procedure (select statement with joins) What I'm trying to do is calculate total work. My situation: I have 3 tables I'm using -Input (char...

Microsoft SQL Server

Fastest sort algorithm?

by: Richard Cavell | last post by:

Hi, I have a structure of 3 16-bit values. I have either 65000, 256000 or 1 million instances of this structure. I want to sort the lot of them according to the number contained in one of the...

C / C++

simple algorithm for finding primes

by: someone else | last post by:

hi all I'm a newbie to this group. my apologies if I break any rules. I've wrote a simple program to find the first 1,000,000 primes, and to find all primes within any range (up to 200 *...

C / C++

polynomial extended euclidean algorithm

by: Tiza Naziri | last post by:

Hi, Anybody have an idea on how to start writing a C code for generating the inverse of finite field GF(2^8) using extended Euclidean algorithm? What I mean is how to represent a polynomial,...

C / C++

How to optimize Code

by: Gaffar | last post by:

Hello, I am Handling a project in ( ASP.NET with C#.NET) in which a module is slow and inefficient. How to optimize the code. Please give me some suggestions regarding this. If you know any...

C# / C Sharp

optimize Pentium 4 and above bug!?

by: Ludde | last post by:

I beleave I have found a bug in the C++ compiler of Visual Studio 200 The correct output for the code below is "Color is: 99b2cce5" but if I turn on the Pentium 4 and above (/G7) options I get...

.NET Framework

ASPNET Performance Problem : Pentium 3 is better than Pentium 4

by: arkam | last post by:

Hi, I wanted to test and compare performances of Webservices / Assemblies and Remoting in dotnet using aspnet clients ! I performed a lot of tests and there are the results : (less is better)...

ASP.NET

Hi, how can I optimize the following code?

by: kenneth | last post by:

I was trying to use multiple thread to optimize my following code, but met some problems, anyone can help me? k are initialized. int computePot() { int i, j; for( i=0; i<500; i++ ) { for(...

C / C++

Optimize an IEnumerable

by: timor.super | last post by:

Hi group, imagine I want to find number of values of a word in a string ... Look at my code : List<stringthingsToFind = new List<string>(new string {"error", "values"}); MyClass myClass =...

C# / C Sharp

How to turn on java script in a villaon keypad mobile phone

by: Charles Arthur | last post by:

How do i turn on java script on a villaon, callus and itel keypad mobile phone

Java

Batch import of multiple excel files into the database

by: ryjfgjl | last post by:

If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...

Data Management

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

General

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

Windows Server