must go faster! - .NET Framework

bill

I am trying to figure out if I can use SSE to help execute arithmetic
operations faster. I have 900 values that must each be scaled with a divide
and multiply. This happens repeatedly. Any examples I can be pointed to
would be greatly appreciated. I realize I could do just one multiply
(instead of multiply and divide) but I still want to do 900 (or as many as I
can) at once.

Any ideas would be appreciated.

Bill

Nov 17 '05 #1

Steve McLellan

Hi,

If you post the algorithm, people may be able to help optimise it. We do a
lot of intensive maths and use vector libraries (from Apple and Intel) to
take care of the low-level stuff, like multiplication and division. We use
the Intel Integrated Performance Primitives library, and the performance
gain over hand-coded loops is incredible (especially on Intel processors).
That said, it may be possible to squeeze some performance out simply by
optimising the algorithm (as you say, combining the multiplication and
division).

Steve

Nov 17 '05 #2

bill

OK, here we go. A little mix of pseudocode and real code.

For 900 blocks
    read signed integer values - there are 4
    scale values ; scaled value = read value / 32768 * 360
    store value as double
next block

The read value is a 16-bit signed int.
The stored scaled value is of type double.
I realize I could just do: (double)round((double)read value/91.02222)

But if I could do a vector, I could go fast, maybe do 900 at a time. I'm
just not up on single instruction, multiple data (SIMD) stuff.

Just an example, please.
Thanks,
Bill
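
A minimal scalar sketch of the loop described above, useful as a baseline before trying SSE. The function name `scale_values` and the flat array layout (900 blocks of 4 values, so n = 3600) are assumptions for illustration, not something stated in the post.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: scales n raw 16-bit readings to degrees.
   Mirrors the pseudocode: scaled = read / 32768 * 360, stored as double. */
void scale_values(const int16_t *in, double *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] / 32768.0 * 360.0;
}
```

For the case in the post, this would be called with n = 900 * 4.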

Nov 17 '05 #3

Fredrik Wahlgren

Multiplication should be faster than division. Thus, instead of division by
91.0222, you can multiply by 0.010986328125

/Fredrik
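
For this particular constant the swap looks safe: 0.010986328125 is exactly 360/32768 (that is, 45/4096), so it is representable in a double without rounding, whereas 91.02222 is only a rounded approximation of 32768/360. The sketch below, with a hypothetical function name, checks every possible 16-bit input and counts the cases where the two forms disagree.

```c
#include <stdint.h>

/* Hypothetical check: counts 16-bit inputs for which divide-then-multiply
   and the single multiply give different doubles. */
int count_mismatches(void)
{
    const double k = 0.010986328125; /* 360 / 32768, exactly 45 / 4096 */
    int mismatches = 0;

    for (int32_t v = -32768; v <= 32767; ++v) {
        double two_ops = v / 32768.0 * 360.0; /* original formula */
        double one_op  = v * k;               /* single multiply  */
        if (two_ops != one_op)
            ++mismatches;
    }
    return mismatches;
}
```

Both forms are exact here because 32768 is a power of two and the products fit easily in a double's 53-bit significand. That would not hold for an arbitrary divisor.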

Nov 17 '05 #4

Steve McLellan

Hi,

Like Fredrik said, you can make that a multiplication. Vectorised code is
useful for doing (like we do here) multiple iterations of exponentials,
logs, Fourier transforms etc. over 65000 digit chunks. For 900 blocks I'd
expect this kind of thing to be practically instantaneous on a modern
processor.

Steve

Nov 17 '05 #5

Gilles Vollant [MVP]

Did you try compiling with SSE and SSE2 enabled (with Visual C++ 6.0 plus the
Processor Pack, or Visual Studio .NET)?
Perhaps this can help. This is just an idea...
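
As a rough illustration of what hand-written SSE2 could look like for the scaling discussed in this thread, here is a sketch using the intrinsics from `<emmintrin.h>`. The function name `scale_to_degrees_simd` and the flat input array are assumptions. Note that a 128-bit SSE register holds only two doubles, so "900 at once" is not possible; the loop below handles four inputs per iteration and falls back to plain C when SSE2 is not enabled at compile time.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

/* Hypothetical helper: out[i] = in[i] * 360 / 32768 for n values. */
void scale_to_degrees_simd(const int16_t *in, double *out, size_t n)
{
    const double k = 360.0 / 32768.0;
    size_t i = 0;

#ifdef HAVE_SSE2
    const __m128d vk = _mm_set1_pd(k);
    for (; i + 4 <= n; i += 4) {
        /* Load four 16-bit values into the low 64 bits. */
        __m128i v16 = _mm_loadl_epi64((const __m128i *)(in + i));
        /* Duplicate each value into a 32-bit lane, then shift right
           arithmetically to sign-extend it. */
        __m128i v32 = _mm_srai_epi32(_mm_unpacklo_epi16(v16, v16), 16);
        /* Convert lanes 0-1 and lanes 2-3 to pairs of doubles. */
        __m128d lo = _mm_cvtepi32_pd(v32);
        __m128d hi = _mm_cvtepi32_pd(_mm_unpackhi_epi64(v32, v32));
        _mm_storeu_pd(out + i,     _mm_mul_pd(lo, vk));
        _mm_storeu_pd(out + i + 2, _mm_mul_pd(hi, vk));
    }
#endif
    /* Scalar tail (or the whole array when SSE2 is unavailable). */
    for (; i < n; ++i)
        out[i] = in[i] * k;
}
```

Whether this beats what the compiler generates on its own (for example with `/arch:SSE2` on 32-bit MSVC or `-msse2` on GCC) would need to be measured; for a few thousand values the difference may be small.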

Nov 17 '05 #6

Fred Hebert

I am new to MSC and .NET, coming from a Unix/Linux & Borland background. I
am evaluating using MS tools instead of Borland tools for our PC/Windows
apps.

Anyhow, are you guys telling me that the expression "32768 * 360" is not
optimized by the compiler? I thought all compilers would optimize all
constant mathematical expressions. For documentation purposes I often use
things like "60 * 60 * 24" to represent seconds in a day rather than just
putting 86400. I have looked at the actual code generated and the
constants have been evaluated to a single value, at least on other
compilers...
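
A small sketch of the distinction at issue here. An expression made only of constants, like 60 * 60 * 24, is an integer constant expression and is evaluated at compile time. In the thread's formula, however, the variable comes first, so `raw / 32768.0 * 360.0` groups as `(raw / 32768.0) * 360.0` and contains no all-constant subexpression to fold. The names below are hypothetical, and `_Static_assert` assumes a C11 compiler.

```c
/* Folded by the compiler: an enum value must be a compile-time constant. */
enum { SECONDS_PER_DAY = 60 * 60 * 24 };
_Static_assert(SECONDS_PER_DAY == 86400, "constant expression is folded");

/* The parenthesised constant can be folded, leaving one multiply. */
double scale_folded(short raw)
{
    return raw * (360.0 / 32768.0);
}

/* Here the divide and the multiply both involve the variable, so the
   compiler generally has to keep two operations unless it is allowed
   to reassociate floating-point math. */
double scale_unfolded(short raw)
{
    return raw / 32768.0 * 360.0;
}
```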

Back to the original question, I had some ideas about your original
discussion:

1. Array libraries are great. It is often much quicker to look up complex
math than to calculate it.

2. You didn't specify what you were doing with these numbers, or the degree
of accuracy needed. Can you use binary math instead of real math? I mean
there are several published routines that use binary approximations for
things like sin(), cos() and others that are really fast, and the accuracy
is good enough for calculating rotations of things being displayed on the
screen.

3. Again I was not sure if your example was exactly what you were trying to
do, or just a short for instance, but there are a lot of shortcuts you can
take if high precision is not necessary. Take a look at some of the math
that gamers use to do quick calculations for drawing and scaling objects on
the screen. They are not high precision, but they are FAST.
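
To make the lookup idea in point 1 concrete for this thread's data: a 16-bit input has only 65536 possible values, so every result can be precomputed once. The sketch below uses hypothetical names and is only an illustration. The table occupies 512 KB, so for a single multiply it may well be slower than computing directly; tables tend to pay off for costlier functions such as sin() or cos().

```c
#include <stdint.h>

/* Hypothetical table: one precomputed double per possible 16-bit input. */
static double degrees_lut[65536];

/* Fill the table once at start-up. */
void init_degrees_lut(void)
{
    for (int32_t v = -32768; v <= 32767; ++v)
        degrees_lut[(uint16_t)v] = v * (360.0 / 32768.0);
}

/* Look up the scaled value; the cast maps negative inputs to
   the upper half of the table. */
double lookup_degrees(int16_t raw)
{
    return degrees_lut[(uint16_t)raw];
}
```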

Nov 17 '05 #8

Carl Daniel [VC++ MVP]

Of course they are - I'm not sure how you reached that conclusion based on
this thread, but...

-cd

Nov 17 '05 #9

Fred Hebert

From the statement "I realize I could do just one multiply (instead of
multiply and divide) but" in the first message, and "Multiplication should
be faster than division. Thus, instead of division by 91.0222, you can
multiply by 0.010986328125" in the fourth message. I would not think that
made any difference.

I haven't looked at the machine code generated by the MS compiler yet, and
probably won't. I just thought it sounded odd to be worrying about things
like that or order of operators. My experience is that modern compilers
generally optimize those things pretty well.

The guys sounded like they were fairly familiar with math routines, I was
just wondering why they were worrying about something that I thought was
insignificant. 20 years ago programmers had to be concerned about the small
details. Good programmers often took time to "optimize" their code for the
compiler, manually align structures on word boundaries, etc...

Aren't modern compilers wonderful?

Nov 17 '05 #10

Bruno van Dooren

It has been a long time since I looked at the details, but division is
generally slower than multiplication. The reason is that the algorithms
involved in division are more complex than the routines for multiplication.

The sort of optimization you talk about is done only with integer operations
(at least that is my experience with the TI compiler for DSPs). The reason
is that most floating-point math is impossible to convert to a 100%
equivalent result if decimal values are used.

For example, in my naive programming newbie years I had a piece of code like
this:

current = 0;
while (current != 1.2)
{
    // do stuff
    current += 0.3;
}

Can you guess what happened? Right: the loop never finished. The reason is
that repeatedly adding 0.3 need not land exactly on 1.2, because neither
value is exactly representable in binary floating point (check the
floating-point definition if you don't believe me).
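
One common way to avoid this trap is to drive the loop with an integer counter and derive the floating-point value from it, so no exact equality test on a float is needed. A minimal sketch, with a hypothetical function name and assuming positive `step` and `limit`:

```c
/* Hypothetical helper: runs the loop body once per step from 0 up to
   (but not including) limit, and returns how many times it ran. */
int count_steps(double step, double limit)
{
    /* Round the step count to the nearest integer instead of
       comparing doubles for equality. */
    int steps = (int)(limit / step + 0.5);
    int done = 0;

    for (int i = 0; i < steps; ++i) {
        double current = i * step; /* recomputed each time, so rounding
                                      error does not accumulate */
        (void)current;             /* placeholder for "do stuff" */
        ++done;
    }
    return done;
}
```

With a step of 0.3 and a limit of 1.2 the loop runs four times and then stops, regardless of how the individual sums round.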

Floating-point math is not simple, and there are a number of rules you have
to follow if you want to have accurate results.

Multiplying by the reciprocal of a number is not 100% the same as doing a
division. It would be inexcusable if a compiler changed the functionality of
a program without guaranteeing the same results.

Dividing an integer by 2, however, is often replaced by shifting the bits 1
position to the right, because it is faster and it has the exact same
result.

kind regards,
Bruno.

Nov 17 '05 #11

Severian

On Fri, 18 Mar 2005 09:31:11 +0100, "Bruno van Dooren"
<mi******@hotmail.com> wrote:

<snip>

dividing an integer by 2 however is often replaced by shifting the bits 1
position to the right, because it is faster and it has the exact same
result.

In standard C and C++, this is necessarily true only for unsigned
values!

The result of right-shifting a negative signed quantity is
implementation-defined.

--
Sev
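
A short sketch of the difference, with hypothetical function names. For unsigned values the shift and the division agree for every input. For signed values, C99 and later define division as truncating toward zero, so -3 / 2 is -1; on typical two's-complement targets with an arithmetic shift, -3 >> 1 gives -2 instead, which is why the example does not rely on it.

```c
#include <stdint.h>

/* Returns 1 if x >> 1 equals x / 2 for every 16-bit unsigned value. */
int unsigned_shift_matches_divide(void)
{
    for (uint32_t x = 0; x <= 0xFFFFu; ++x)
        if ((x >> 1) != (x / 2))
            return 0;
    return 1;
}

/* Signed halving written as a division: well defined for negative
   inputs, truncating toward zero in C99 and later. */
int signed_half(int x)
{
    return x / 2;
}
```

Compilers that turn signed division by 2 into a shift typically add a small correction for negative inputs first, so the visible result still matches the division.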

Nov 17 '05 #12
