must go faster!

I am trying to figure out if I can use SSE to help execute arithmetic
operations faster. I have 900 values that must each be scaled with a divide
and multiply. This happens repeatedly. Any examples I can be pointed to
would be greatly appreciated. I realize I could do just one multiply
(instead of multiply and divide) but I still want to do 900 (or as many as I
can) at once.

Any ideas would be appreciated.

Bill
Nov 17 '05 #1
Hi,

If you post the algorithm, people may be able to help optimise it. We do a
lot of intensive maths and use vector libraries (from Apple and Intel) to
take care of the low-level stuff, like multiplication and division. We use
the Intel Integrated Performance Primitives library and the performance is
incredible (especially on Intel libraries) over doing hand-coded loops. That
said, it may be possible to squeeze some performance out simply by
optimising the algorithm (as you say, combining the multiplication and
division).

Steve
Nov 17 '05 #2
Ok, here we go. A little mix of pseudocode and real code.

For 900 blocks
    read signed integer values - there are 4
    scale values ; scaled value = read value / 32768 * 360
    store value as double
next block

The read value is a 16-bit signed int.
The stored scaled value is of type double.
I realize I could just do: (double)round((double)read value / 91.02222)

But if I could do a vector, I could go fast. Maybe do 900 at a time. I'm
just not up on single instruction multiple data stuff.

Just an example, please.
Thanks,
Bill
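
For reference, the pseudocode above could be written as a plain C++ loop roughly like this. It is only a sketch: the names `Block` and `scale_blocks` are made up for illustration, and the layout of four 16-bit readings per block is an assumption taken from the description.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical layout: each block holds 4 signed 16-bit readings.
using Block = std::array<std::int16_t, 4>;

// Scale every reading to degrees (value / 32768 * 360) and store the
// results as doubles, 4 per block, in input order.
std::vector<double> scale_blocks(const std::vector<Block>& blocks) {
    std::vector<double> out;
    out.reserve(blocks.size() * 4);
    for (const Block& b : blocks) {
        for (std::int16_t raw : b) {
            // Convert to double first so the division is not done in integers.
            out.push_back(static_cast<double>(raw) / 32768.0 * 360.0);
        }
    }
    return out;
}
```

A loop this simple is also a reasonable baseline to time before reaching for anything fancier.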





"Steve McLellan" <sjm AT fixerlabs DOT com> wrote in message
news:e0**************@TK2MSFTNGP09.phx.gbl...
Hi,

If you post the algorithm, people may be able to help optimise it. We do a
lot of intensive maths and use vector libraries (from Apple and Intel) to
take care of the low-level stuff, like multiplication and division. We use
the Intel Integrated Performance Primitives library and the performance is
incredible (especially on Intel libraries) over doing hand-coded loops.
That said, it may be possible to squeeze some performance out simply by
optimising the algorithm (as you say, combining the multiplication and
division).

Steve
Nov 17 '05 #3
Multiplication should be faster than division. Thus, instead of division by
91.0222, you can multiply by 0.010986328125

/Fredrik
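
A small sketch of that idea, with made-up names (`kDegreesPerCount`, `to_degrees`). The factor is 360 / 32768, which reduces to 45 / 4096; because the denominator is a power of two, the constant is exactly representable as a double, so in this particular case the multiply gives the same result as the divide-then-multiply.

```cpp
#include <cstdint>

// 360 / 32768 == 45 / 4096, exactly representable in binary floating point.
constexpr double kDegreesPerCount = 360.0 / 32768.0;
static_assert(kDegreesPerCount == 0.010986328125, "scale factor mismatch");

// One multiplication per value instead of a divide followed by a multiply.
inline double to_degrees(std::int16_t raw) {
    return static_cast<double>(raw) * kDegreesPerCount;
}
```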

Nov 17 '05 #4
Hi,

Like Fredrik said, you can make that a multiplication. Vectorised code is
useful for doing (like we do here) multiple iterations of exponentials,
logs, Fourier transforms etc. over 65000 digit chunks. For 900 blocks I'd
expect this kind of thing to be practically instantaneous on a modern
processor.

Steve

Nov 17 '05 #5
Did you try compiling with SSE and SSE2 enabled (with Visual C++ 6.0 plus the
Processor Pack, or Visual Studio .NET)?
Perhaps this can help. This is just an idea...
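
If the compiler does not vectorise the loop by itself, the conversion can also be written by hand with the SSE2 intrinsics from `<emmintrin.h>`. What follows is a rough sketch, not a tuned implementation: the function name `scale_to_degrees` and the flat input array are assumptions, and it falls back to a scalar loop when SSE2 is not available at compile time. It handles four readings per iteration, since one SSE register holds only two doubles.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  // SSE2 intrinsics
#define SCALE_HAVE_SSE2 1
#endif

// Convert n signed 16-bit readings to degrees (raw * 360 / 32768) as doubles.
void scale_to_degrees(const std::int16_t* in, double* out, std::size_t n) {
    const double k = 360.0 / 32768.0;
    std::size_t i = 0;
#ifdef SCALE_HAVE_SSE2
    const __m128d vk = _mm_set1_pd(k);
    for (; i + 4 <= n; i += 4) {
        // Load four int16 values (64 bits) into the low half of a register.
        __m128i v16 = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(in + i));
        // Sign-extend to 32 bits: duplicate each 16-bit lane, then shift
        // arithmetically right by 16.
        __m128i v32 = _mm_srai_epi32(_mm_unpacklo_epi16(v16, v16), 16);
        // Each conversion turns the low two int32 lanes into two doubles.
        __m128d lo = _mm_cvtepi32_pd(v32);
        __m128d hi = _mm_cvtepi32_pd(_mm_unpackhi_epi64(v32, v32));
        _mm_storeu_pd(out + i, _mm_mul_pd(lo, vk));
        _mm_storeu_pd(out + i + 2, _mm_mul_pd(hi, vk));
    }
#endif
    // Scalar tail, and the whole job when SSE2 is unavailable.
    for (; i < n; ++i) {
        out[i] = static_cast<double>(in[i]) * k;
    }
}
```

Whether this beats the plain loop for only a few thousand values is something to measure rather than assume.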
Nov 17 '05 #6
I am new to MSC and .NET, coming from a Unix/Linux & Borland background. I
am evaluating using MS tools instead of Borland tools for our PC/Windows
apps.

Anyhow, are you guys telling me that the expression "32768 * 360" is not
optimized by the compiler? I thought all compilers would optimize all
constant mathematical expressions. For documentation purposes I often use
things like "60 * 60 * 24" to represent seconds in a day rather than just
putting 86400. I have looked at the actual code generated and the
constants have been evaluated to a single value, at least on other
compilers...
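
A quick way to see both sides of this, as a sketch with made-up names: an expression built only from constants is folded at compile time, but in "value / 32768 * 360" the operators group left to right, so the two constants never form a constant subexpression on their own.

```cpp
// static_assert only accepts constant expressions, so the fact that this
// compiles shows the product is evaluated by the compiler.
constexpr int kSecondsPerDay = 60 * 60 * 24;
static_assert(kSecondsPerDay == 86400, "folded at compile time");

// Parses as (raw / 32768) * 360: there is no "32768 * 360" to fold.
inline double scale_as_written(double raw) { return raw / 32768 * 360; }

// With an integer operand the first division truncates toward zero,
// so almost every 16-bit input collapses to 0.
inline int scale_in_integers(int raw) { return raw / 32768 * 360; }
```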

Back to the original question, I had some ideas about your original
discussion:

1. Array libraries are great. It is often much quicker to look up complex
math than to calculate it.

2. You didn't specify what you were doing with these numbers, or the degree
of accuracy needed. Can you use binary math instead of real math? I mean
there are several published routines that use binary approximations for
things like sin(), cos() and others that are really fast, and the accuracy
is good enough for calculating rotations of things being displayed on the
screen.

3. Again I was not sure if your example was exactly what you were trying to
do, or just a quick illustration, but there are a lot of shortcuts you can
take if high precision is not necessary. Take a look at some of the math
that gamers use to do quick calculations for drawing and scaling objects on
the screen. They are not high precision, but they are FAST.
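
As one illustration of the integer-only approach, here is a fixed-point sketch that keeps the angle in hundredths of a degree. The unit and the name `to_centidegrees` are assumptions for the example; whether that resolution is enough depends on what the numbers are used for.

```cpp
#include <cstdint>

// Angle in hundredths of a degree, using integer arithmetic only.
inline std::int32_t to_centidegrees(std::int16_t raw) {
    // |raw| * 36000 is at most 32768 * 36000 = 1,179,648,000, which fits
    // in a 32-bit signed integer. The division truncates toward zero.
    return static_cast<std::int32_t>(raw) * 36000 / 32768;
}
```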
Nov 17 '05 #8
"Fred Hebert" <fh*****@hotmail.com> wrote in message
news:Xn*******************************@207.46.248. 16...
Anyhow, are you guys telling me that the expression "32768 * 360" is not
optimized by the compiler? I thought all compilers would optimize all
constant mathematical expressions. For documentation purposes I often use
things like "60 * 60 * 24" to represent seconds in a day rather than just
putting 86400. I have looked at the actual code generated and the
constants have been evaluated to a single value, at least on other
compilers...


Of course they are - I'm not sure how you reached that conclusion based on
this thread, but...

-cd
Nov 17 '05 #9
"Carl Daniel [VC++ MVP]"
<cp*****************************@mvps.org.nospam > wrote in
news:ON**************@TK2MSFTNGP12.phx.gbl:

Of course they are - I'm not sure how you reached that conclusion
based on this thread, but...

-cd


From the statement "I realize I could do just one multiply (instead of
multiply and divide) but" in the first message, and "Multiplication should
be faster than division. Thus, instead of division by 91.0222, you can
multiply by 0.010986328125" in the fourth message. I would not think that
made any difference.

I haven't looked at the machine code generated by the MS compiler yet, and
probably won't. I just thought it sounded odd to be worrying about things
like that or order of operators. My experience is that modern compilers
generally optimize those things pretty well.

The guys sounded like they were fairly familiar with math routines, I was
just wondering why they were worrying about something that I thought was
insignificant. 20 years ago programmers had to be concerned about the small
details. Good programmers often took time to "optimize" their code for the
compiler, manually align structures on word boundaries, etc...

Aren't modern compilers wonderful?
Nov 17 '05 #10
it has been a long time since i looked at the details, but division is
generally slower than multiplication. the reason is that the algorithms
involved in division are more complex than the routines for multiplication.

the sort of optimization you talk about is done only with integer operations
(at least that is my experience with the TI compiler for DSPs). the reason
is that most floating point math is impossible to convert to a 100 %
equivalent result if decimal values are used.

for example, in my naive programming newbie years i had a piece of code like
this:

current = 0;
while (current != 1.2)
{
    // do stuff
    current += 0.3;
}

can you guess what happened? right, the loop never finished. the reason for
this is that adding 0.3 four times in floating point does not necessarily
give exactly 1.2 (check the floating point definition if you don't believe me).

floating point math is not simple, and there are a number of rules you have
to follow if you want to have accurate results.
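
A sketch of two common ways around that trap, with made-up helper names. Whether the exact `!=` test in the loop above ever becomes false depends on the types and precision involved, which the post does not state, so the example uses the well-known case of adding 0.1 ten times in double precision, where the sum is not exactly 1.0. The tolerance value is an arbitrary choice.

```cpp
#include <cmath>

// Add `step` to a running total `count` times.
inline double accumulate_steps(double step, int count) {
    double total = 0.0;
    for (int i = 0; i < count; ++i) total += step;
    return total;
}

// Compare with a tolerance instead of using == or !=.
inline bool nearly_equal(double a, double b, double tol = 1e-9) {
    return std::fabs(a - b) <= tol;
}

// Loop with an ordered comparison and half a step of slack, so that
// termination does not depend on hitting the limit exactly.
inline int count_iterations(double step, double limit) {
    int iterations = 0;
    for (double current = 0.0; current < limit - step / 2; current += step) {
        ++iterations;  // "do stuff" would go here
    }
    return iterations;
}
```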

multiplying with the reciprocal of a number is not 100% the same as doing a
division. it would be inexcusable if a compiler changed the functionality of
a program without guaranteeing the same results.
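
This is easy to check empirically. The sketch below (the function name is made up) counts how often x / d and x * (1.0 / d) disagree. For a divisor like 3 they sometimes differ in the last bit, for example at x = 5. For a power of two such as 32768 the reciprocal is exact, so the two forms agree, which is why the multiply is safe for the scaling discussed in this thread.

```cpp
// Count the integers x in [1, limit] for which dividing by d and
// multiplying by the precomputed reciprocal give different doubles.
inline int count_mismatches(double d, int limit) {
    const double reciprocal = 1.0 / d;
    int mismatches = 0;
    for (int x = 1; x <= limit; ++x) {
        if (x / d != x * reciprocal) ++mismatches;
    }
    return mismatches;
}
```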

dividing an integer by 2 however is often replaced by shifting the bits 1
position to the right, because it is faster and it has the exact same
result.

kind regards,
Bruno.
Nov 17 '05 #11
On Fri, 18 Mar 2005 09:31:11 +0100, "Bruno van Dooren"
<mi******@hotmail.com> wrote:

<snip>
dividing an integer by 2 however is often replaced by shifting the bits 1
position to the right, because it is faster and it has the exact same
result.


In standard C and C++, this is necessarily true only for unsigned
values!

The result of right-shifting a signed quantity is
implementation-defined.

--
Sev
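
To make the difference concrete, a small sketch with made-up helper names. Right-shifting a negative value was implementation-defined until C++20, which defines it as an arithmetic shift. Even then it rounds toward negative infinity, while signed division truncates toward zero, so the two disagree for negative odd values. `half_flooring` spells out the arithmetic-shift result without actually shifting a negative number.

```cpp
#include <cstdint>

// For unsigned values, shifting right by one and dividing by two agree.
inline bool unsigned_shift_matches_divide() {
    for (std::uint32_t u = 0; u <= 0xFFFFu; ++u) {
        if ((u >> 1) != (u / 2)) return false;
    }
    return true;
}

// Signed division truncates toward zero (guaranteed since C++11).
inline int half_truncating(int x) { return x / 2; }

// What an arithmetic right shift by one computes: floor(x / 2).
// Not valid for the most negative int, where -x would overflow.
inline int half_flooring(int x) {
    return (x >= 0) ? x / 2 : -((-x + 1) / 2);
}
```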
Nov 17 '05 #12
