must go faster!

I am trying to figure out whether I can use SSE to execute arithmetic
operations faster. I have 900 values that must each be scaled with a divide
and a multiply. This happens repeatedly. Any examples I can be pointed to
would be greatly appreciated. I realize I could do just one multiply
(instead of a multiply and a divide), but I still want to do 900 (or as many
as I can) at once.

Any ideas would be appreciated.

Bill
Nov 17 '05 #1
Hi,

If you post the algorithm, people may be able to help optimise it. We do a
lot of intensive maths and use vector libraries (from Apple and Intel) to
take care of the low-level stuff, like multiplication and division. We use
the Intel Integrated Performance Primitives library, and the performance is
incredible compared with hand-coded loops (especially with the Intel
libraries). That said, it may be possible to squeeze some performance out
simply by optimising the algorithm (as you say, combining the multiplication
and division).

Steve
Nov 17 '05 #2
OK, here we go. A little mix of pseudocode and real code.

For 900 blocks
    read signed integer values - there are 4
    scale values; scaled value = read value / 32768 * 360
    store value as double
next block

The read value is a 16-bit signed int.
The stored scaled value is of type double.
I realize I could just do: (double)round((double)read value / 91.02222)

But if I could do a vector, I could go fast, maybe do 900 at a time. I'm
just not up on single-instruction, multiple-data stuff.

Just an example, please.
Thanks,
Bill
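
For reference, the pseudocode above might be written as a plain C loop like the one below. This is only a sketch: the function name scale_readings and the assumption that the 900 blocks of 4 readings sit in one contiguous array of 3600 int16_t values are illustrative, not taken from the post. Note that the cast to double has to happen before the divide; in integer arithmetic raw / 32768 would truncate to 0 (or to -1 for -32768).

```c
#include <stddef.h>
#include <stdint.h>

/* Scale raw 16-bit readings to degrees: value / 32768 * 360.
   Hypothetical helper; assumes all readings are contiguous in memory. */
void scale_readings(const int16_t *raw, double *out, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        /* Convert to double first so the division is not an integer division. */
        out[i] = (double)raw[i] / 32768.0 * 360.0;
    }
}
```

A compiler with vectorisation enabled may already turn a loop like this into SIMD code, so it is probably worth measuring before hand-writing anything.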





Nov 17 '05 #3
Multiplication should be faster than division. Thus, instead of division by
91.0222, you can multiply by 0.010986328125

/Fredrik
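
The factor quoted above is 360 / 32768 = 45 / 4096, which a double can hold exactly, so for 16-bit inputs the single multiply should give bit-identical results to the divide-then-multiply form. A small sketch to check that claim; the function and macro names are made up for illustration:

```c
#include <stdint.h>

/* 360 / 32768 reduces to 45 / 4096, which a double represents exactly. */
#define DEG_PER_COUNT 0.010986328125

/* The original formulation: one divide and one multiply per value. */
double scale_divide_multiply(int16_t raw)
{
    return (double)raw / 32768.0 * 360.0;
}

/* The suggested formulation: a single multiply per value. */
double scale_multiply_only(int16_t raw)
{
    return (double)raw * DEG_PER_COUNT;
}

/* Count the 16-bit inputs for which the two formulations disagree. */
int count_mismatches(void)
{
    int mismatches = 0;
    for (int v = INT16_MIN; v <= INT16_MAX; ++v) {
        if (scale_divide_multiply((int16_t)v) != scale_multiply_only((int16_t)v))
            ++mismatches;
    }
    return mismatches;
}
```

Because every intermediate value here fits comfortably in a double's mantissa, count_mismatches() is expected to return 0.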

Nov 17 '05 #4
Hi,

Like Fredrik said, you can make that a multiplication. Vectorised code is
useful for doing (like we do here) multiple iterations of exponentials,
logs, Fourier transforms etc. over 65000 digit chunks. For 900 blocks I'd
expect this kind of thing to be practically instantaneous on a modern
processor.

Steve

Nov 17 '05 #5
Did you try compiling with SSE and SSE2 enabled (with Visual C++ 6.0 plus
the Processor Pack, or Visual Studio .NET)?
Perhaps this can help. This is just an idea...
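
Since an example was asked for, here is a rough sketch of what a hand-written SSE2 version could look like, using the intrinsics from emmintrin.h. The function name scale_readings_sse2 and the contiguous int16_t layout are assumptions for illustration. SSE2 works on two doubles per register, so this handles four readings per iteration, not 900 at once. The preprocessor guard falls back to the scalar loop when SSE2 is not available.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

/* Hypothetical helper: out[i] = raw[i] * (360 / 32768) for count readings. */
void scale_readings_sse2(const int16_t *raw, double *out, size_t count)
{
    size_t i = 0;
#if HAVE_SSE2
    const __m128d scale = _mm_set1_pd(0.010986328125); /* 360 / 32768 */
    for (; i + 4 <= count; i += 4) {
        /* Load four 16-bit values into the low 64 bits of a register. */
        __m128i v16 = _mm_loadl_epi64((const __m128i *)(raw + i));
        /* Sign-extend to 32 bits: duplicate each value into a 32-bit lane,
           then shift right arithmetically by 16. */
        __m128i v32 = _mm_srai_epi32(_mm_unpacklo_epi16(v16, v16), 16);
        /* Convert the low pair and the high pair to double, then scale. */
        __m128d lo = _mm_mul_pd(_mm_cvtepi32_pd(v32), scale);
        __m128d hi = _mm_mul_pd(_mm_cvtepi32_pd(_mm_unpackhi_epi64(v32, v32)), scale);
        /* Unaligned stores, so out needs no special alignment. */
        _mm_storeu_pd(out + i, lo);
        _mm_storeu_pd(out + i + 2, hi);
    }
#endif
    /* Scalar tail, and the whole job when SSE2 is not available. */
    for (; i < count; ++i)
        out[i] = raw[i] * 0.010986328125;
}
```

Building this needs SSE2 code generation enabled, for example /arch:SSE2 with the 32-bit MSVC compiler or -msse2 with GCC; on x64 targets SSE2 is part of the baseline. Whether it beats the plain loop for only a few thousand values would have to be measured.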
Nov 17 '05 #6
I am new to MSC and .NET, coming from a Unix/Linux & Borland background. I
am evaluating using MS tools instead of Borland tools for our PC/Windows
apps.

Anyhow, are you guys telling me that the expression "32768 * 360" is not
optimized by the compiler? I thought all compilers would optimize all
constant mathematical expressions. For documentation purposes I often use
things like "60 * 60 * 24" to represent seconds in a day rather than just
putting 86400. I have looked at the actual code generated and the
constants have been evaluated to a single value, at least on other
compilers...

Back to the original question, I had some ideas about your original
discussion:

1. Array libraries are great. It is often much quicker to look up complex
math than to calculate it.

2. You didn't specify what you were doing with these numbers, or the degree
of accuracy needed. Can you use binary math instead of real math? I mean
there are several published routines that use binary approximations for
things like sin(), cos() and others that are really fast, and the accuracy
is good enough for calculating rotations of things being displayed on the
screen.

3. Again, I was not sure whether your example was exactly what you were
trying to do or just a quick for-instance, but there are a lot of shortcuts
you can take if high precision is not necessary. Take a look at some of the
math that gamers use to do quick calculations for drawing and scaling
objects on the screen. They are not high precision, but they are FAST.
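
On the constant-folding question, a small sketch may help separate the two cases. An integer constant expression such as 60 * 60 * 24 has to be evaluated by the compiler (it can even be used as an array size). The scaling expression from this thread is different: value / 32768 * 360 parses as (value / 32768) * 360, so the two constants are never adjacent and there is no constant subexpression to fold unless it is grouped by hand. The names below are hypothetical.

```c
#include <stdint.h>

/* Integer constant expression: evaluated at compile time. */
enum { SECONDS_PER_DAY = 60 * 60 * 24 };

/* Compile-time check: the array size is only valid if the value is 86400. */
typedef char seconds_per_day_is_folded[(SECONDS_PER_DAY == 86400) ? 1 : -1];

/* Parsed as (raw / 32768.0) * 360.0: a divide followed by a multiply. */
double scale_left_to_right(int16_t raw)
{
    return raw / 32768.0 * 360.0;
}

/* The parenthesised constants form one subexpression the compiler can fold,
   leaving a single multiply at run time. */
double scale_grouped(int16_t raw)
{
    return raw * (360.0 / 32768.0);
}
```

Both functions return the same values here because the divisor is a power of two; the grouping only changes how much work is left for run time.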
Nov 17 '05 #8
"Fred Hebert" <fh*****@hotmail.com> wrote in message
news:Xn*******************************@207.46.248.16...
> Anyhow, are you guys telling me that the expression "32768 * 360" is not
> optimized by the compiler? I thought all compilers would optimize all
> constant mathematical expressions. For documentation purposes I often use
> things like "60 * 60 * 24" to represent seconds in a day rather than just
> putting 86400. I have looked at the actual code generated and the
> constants have been evaluated to a single value, at least on other
> compilers...

Of course they are - I'm not sure how you reached that conclusion based on
this thread, but...

-cd
Nov 17 '05 #9
"Carl Daniel [VC++ MVP]"
<cp*****************************@mvps.org.nospam> wrote in
news:ON**************@TK2MSFTNGP12.phx.gbl:

> Of course they are - I'm not sure how you reached that conclusion
> based on this thread, but...
>
> -cd

From the statement "I realize I could do just one multiply (instead of
multiply and divide) but" in the first message, and "Multiplication should
be faster than division. Thus, instead of division by 91.0222, you can
multiply by 0.010986328125" in the fourth message. I would not think that
made any difference.

I haven't looked at the machine code generated by the MS compiler yet, and
probably won't. I just thought it sounded odd to be worrying about things
like that or order of operators. My experience is that modern compilers
generally optimize those things pretty well.

The guys sounded like they were fairly familiar with math routines; I was
just wondering why they were worrying about something that I thought was
insignificant. 20 years ago programmers had to be concerned about the small
details. Good programmers often took time to "optimize" their code for the
compiler, manually align structures on word boundaries, etc...

Aren't modern compilers wonderful?
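
One caveat on the floating-point side, offered as a sketch and not a statement about any particular compiler: as far as I know, a compiler in its default strict floating-point mode will not replace a division by 91.02222 with a multiplication on its own, because the reciprocal is not exactly representable and the last bits of the result could change. Division by a power of two such as 32768 is the usual exception. The hypothetical functions below only show that the rounded divisor and the exact factor give slightly different answers, which is why that rewrite tends to be done by hand.

```c
#include <stdint.h>

/* Division by the rounded constant quoted in the thread. */
double scale_rounded_divisor(int16_t raw)
{
    return (double)raw / 91.02222;
}

/* Multiplication by the exact factor 360 / 32768. */
double scale_exact_factor(int16_t raw)
{
    return (double)raw * 0.010986328125;
}

/* Difference between the two results for one input. */
double scale_difference(int16_t raw)
{
    return scale_exact_factor(raw) - scale_rounded_divisor(raw);
}
```

For the extreme input -32768 the exact factor gives -360.0, while the rounded divisor lands a few millionths of a degree away, which may or may not matter depending on the accuracy needed.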
Nov 17 '05 #10
