Hi,
Does anyone know of any link that describes the (relative)
performance of all kinds of C operations? E.g., how fast is "add"
compared with "multiplication" on a typical machine?
Thanks!
--
B. Y.
mrby wrote: Hi,
Does anyone know of any link that describes the (relative) performance of all kinds of C operations? E.g., how fast is "add" compared with "multiplication" on a typical machine?
Pages with this sort of information are scattered about
the Web. Most that I've seen have been highly system-specific,
whether they say they are or not. One page I saw made a fairly
serious effort to assign "costs" to various C constructs, but
it seemed rooted in the days when optimizers were less radical
and when the speed disparity between CPU and memory was not so
enormous.
Nowadays it is very nearly meaningless to ask whether an
addition is slower or faster than a multiplication, even if the
data types are specified. If the operands of the addition are
in memory while those of the multiplication are in registers,
the multiplication will likely finish far sooner. A division
will likely finish far sooner; even taking the square root of
a register-resident value will likely be quicker than performing
an addition that incurs two memory references.
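A rough way to see this effect on one's own machine is to time a loop of additions that fetch their operands from scattered memory locations against a loop of multiplications on a value that can stay in a register. The sketch below is only an illustration: the function names, the constants used to scramble the indices, and the use of clock() are assumptions of this example, and the timings it prints will vary with the compiler, the optimization level, and the hardware.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Additions whose operands come from scattered memory locations. */
static unsigned sum_scattered(const unsigned *data, const unsigned *order, size_t n)
{
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[order[i]];          /* one add, one unpredictable load */
    return sum;
}

/* The same number of multiplications on a value that can stay in a register. */
static unsigned mul_in_registers(unsigned seed, size_t n)
{
    unsigned x = seed | 1u;
    for (size_t i = 0; i < n; i++)
        x = x * 1664525u + 1013904223u; /* one multiply (plus an add) per step */
    return x;
}

/* Time both loops for n elements and print the results; returns 0 on success. */
static int compare_add_and_multiply(size_t n)
{
    unsigned *data = malloc(n * sizeof *data);
    unsigned *order = malloc(n * sizeof *order);
    if (data == NULL || order == NULL) {
        free(data);
        free(order);
        return -1;
    }

    unsigned r = 12345u;                /* arbitrary seed for the index scrambler */
    for (size_t i = 0; i < n; i++) {
        data[i] = (unsigned)i;
        r = r * 1664525u + 1013904223u;
        order[i] = (unsigned)((r >> 8) % n);   /* pseudo-random index below n */
    }

    clock_t t0 = clock();
    unsigned s = sum_scattered(data, order, n);
    clock_t t1 = clock();
    unsigned m = mul_in_registers(s, n);
    clock_t t2 = clock();

    /* Printing the results keeps the compiler from discarding the loops. */
    printf("scattered adds:      %.3f s (result %u)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC, s);
    printf("register multiplies: %.3f s (result %u)\n",
           (double)(t2 - t1) / CLOCKS_PER_SEC, m);

    free(data);
    free(order);
    return 0;
}
```

Calling compare_add_and_multiply((size_t)1 << 24) from main() uses arrays that are larger than the caches of most current desktop processors; with a small n everything fits in cache and the gap should shrink or disappear.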
You can probably get an answer by ignoring the effects of
memory, of the multiple levels of cache, and of pipelining --
but the answer you get by ignoring reality is not likely to be
very helpful. It's like noticing that jet airplanes are faster
than bicycles, and therefore choosing an airplane for a one-mile
journey.
In all but a tiny and steadily diminishing fraction of cases,
micro-optimization is a waste of time and effort. Choose a good
algorithm without worrying about whether it uses multiplications
or additions, pointer arithmetic or array indexing. Then code it
as clearly and robustly as you can. Go no further unless and
until you have measured the resulting performance and found it
inadequate.
--
Eric Sosman es*****@acm-dot-org.invalid
mrby wrote: Does anyone know of any link that describes the (relative) performance of all kinds of C operations? E.g., how fast is "add" compared with "multiplication" on a typical machine?
Well, things are not that simple, but I have an old page with this kind
of information that still stands up: http://www.pobox.com/~qed/optimize.html
where I stated that: "Arithmetic operation performance is ordered
roughly by: transcendental functions, square root, modulo, divide,
multiply, add/subtract/multiply by power of 2/divide by power of
2/modulo by power of 2." I guess I should have added addition,
subtraction and logic operations at the end. Anyhow, this list is
still largely true on pretty much all architectures. The reason why
all architectures seem to perform so similarly on these operations is
that the best or near best techniques for building logic that perform
all those operations in hardware are generally well known. This is a
reality worth noting.
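The cheap end of that list, multiply, divide and modulo by a power of 2, sits next to add and subtract because each can be done with a single shift or mask. Here is a minimal sketch of the equivalences for unsigned operands; the function names are made up for this example. For signed operands, division and modulo of negative values do not match a plain shift or mask, so the unsigned restriction matters. Most compilers make this substitution on their own when the divisor is a constant.

```c
#include <stdint.h>

/* For unsigned operands, each pair below computes the same value. */
static uint32_t mul8_plain(uint32_t y) { return y * 8u; }
static uint32_t mul8_shift(uint32_t y) { return y << 3; }

static uint32_t div8_plain(uint32_t y) { return y / 8u; }
static uint32_t div8_shift(uint32_t y) { return y >> 3; }

static uint32_t mod8_plain(uint32_t y) { return y % 8u; }
static uint32_t mod8_mask(uint32_t y)  { return y & 7u; }
```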
However, it should be noted that over time, memory bandwidth has not
kept pace with modern CPU speeds. Because of this, operations even as
slow as transcendental functions are now no longer slower than
first-time uncached memory accesses. So if you think of memory access as an
"operation", this would be the one major one which has changed its
relative performance over time (by getting slower).
--
Paul Hsieh http://www.pobox.com/~qed/ http://bstring.sf.net/
On 29 Apr 2006 05:55:30 -0700, "mrby" <bi******@gmail.com> wrote in
comp.lang.c: Hi,
Does anyone know of any link that describes the (relative) performance of all kinds of C operations? E.g., how fast is "add" compared with "multiplication" on a typical machine?
Thanks!
The real reason that your question is meaningless is that there is no
such thing as a "typical machine", as far as C is concerned. This is
not just theoretical: there is a vast difference between 8-bit
microcontrollers such as an 8051 or AVR on the one hand, and 64-bit
desktop processors like Pentium or PowerPC on the other.
I do a lot of work these days with a DSP where multiplication and
addition take exactly the same amount of time, one clock cycle. And
it can also do MAC, multiply two numbers and add them to an
accumulator, in one clock cycle. Or it can square a number and add it
to an accumulator, all in one clock cycle.
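In C source, the loops that map onto such a MAC unit look like the sketch below: a dot product, where each iteration is one multiply-accumulate, and a sum of squares. The function names and the 16-bit operand and 64-bit accumulator widths are assumptions of this example, not details of any particular DSP; whether a compiler actually emits one MAC instruction per iteration depends on the target and the toolchain.

```c
#include <stddef.h>
#include <stdint.h>

/* Dot product: each iteration is one multiply-accumulate (MAC). */
static int64_t dot_product(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc = 0;                      /* wide accumulator to avoid overflow */
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)a[i] * b[i];      /* multiply, then add to accumulator */
    return acc;
}

/* Sum of squares: each iteration squares a value and accumulates it. */
static int64_t sum_of_squares(const int16_t *a, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)a[i] * a[i];
    return acc;
}
```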
--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://c-faq.com/
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++ http://www.contrib.andrew.cmu.edu/~a...FAQ-acllc.html
In article <mr********************************@4ax.com>, Jack Klein
<ja*******@spamcop.net> wrote:
... I do a lot of work these days with a DSP where multiplication and addition take exactly the same amount of time, one clock cycle. And it can also do MAC, multiply two numbers and add them to an accumulator, in one clock cycle. Or it can square a number and add it to an accumulator, all in one clock cycle.
VERY interesting. While this is irrelevant to C, do you know how the
DSP accomplishes this? IIRC the complexity of multiplication is higher
than that of addition (perhaps the DSP parallelizes the operation
better?).
--Ron Bruck
Ronald Bruck wrote: VERY interesting. While this is irrelevant to C, do you know how the DSP accomplishes this? IIRC the complexity of multiplication is higher than that of addition (perhaps the DSP parallelizes the operation better?).
So we are to assume that these one cycle quotations apply to peak
throughput, not total latency?
On Thu, 15 Jun 2006 18:29:20 GMT, Tim Prince
<ti***********@sbcglobal.net> wrote: So we are to assume that these one cycle quotations apply to peak throughput, not total latency?
<OT>
I am using a TI F2812, a similar processor (I believe) to the one Jack
is referring to. The "one op per clock cycle" can be sustained as long
as the code & operands reside in the CPU internal RAM.
The throughput is reduced drastically when accessing data in internal
FLASH memory, or external memory (of any kind), since that requires
inserting a few wait states on each memory access cycle.
But still, as long as the location of the data is similar, the time
required to perform an addition, multiplication or MAC operation
remains identical.
</OT>
From: websn...@gmail.com (Paul Hsieh)
Date: Sat, Apr 29 2006 3:57 pm mrby wrote: Does anyone know of any link that describes the (relative) performance of all kinds of C operations? E.g., how fast is "add" compared with "multiplication" on a typical machine?
Well, things are not that simple, but I have an old page with this kind of information that still stands up:
http://www.pobox.com/~qed/optimize.html
I didn't read your whole page but had a look at the table in the
section "Strictly for beginners". Can you explain why
"x = y << 3" would be faster than "x = y * 8"? Or why
"if( ((a-b)|(c-d)|(e-f))==0 )" would be faster than
"if( a==b && c==d && e==f )"?
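For reference, the two forms of the comparison can be checked against each other as below. This is only a sketch with made-up function names, restricted to unsigned operands, where a difference is zero exactly when the two values are equal. The usual argument for the second form is that it evaluates every term without short-circuiting and so needs a single test instead of up to three conditional branches; whether that is actually faster depends on the compiler and the machine.

```c
#include <stdbool.h>
#include <stdint.h>

/* Short-circuit form: may compile to as many as three conditional branches. */
static bool all_equal_plain(uint32_t a, uint32_t b, uint32_t c,
                            uint32_t d, uint32_t e, uint32_t f)
{
    return a == b && c == d && e == f;
}

/* OR-of-differences form: every difference is computed, then tested once. */
static bool all_equal_ored(uint32_t a, uint32_t b, uint32_t c,
                           uint32_t d, uint32_t e, uint32_t f)
{
    return ((a - b) | (c - d) | (e - f)) == 0;
}
```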
Spiros Bousbouras
An off-topic answer to an off-topic question; I shall keep it short:
In article <15**********************@math.usc.edu>
Ronald Bruck <br***@math.usc.edu> wrote: ... do you know how the DSP accomplishes [single-cycle multiply / MAC]? IIRC the complexity of multiplication is higher than that of addition (perhaps the DSP parallelizes the operation better?).
Look up "Booth multiplier".
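For anyone who wants the idea without the lookup, here is a software sketch of radix-2 Booth recoding for 16-bit signed operands. It is an illustration of the recoding rule only, not a description of how any particular DSP is wired: the function name is made up, and a hardware multiplier would typically form and sum the partial products in parallel instead of looping over the bits.

```c
#include <stdint.h>

/* Radix-2 Booth multiplication of two 16-bit signed values.
 * The multiplier is scanned from bit 0 upward; the multiplicand is
 * subtracted where a run of 1 bits starts and added where it ends. */
static int32_t booth_multiply(int16_t multiplicand, int16_t multiplier)
{
    unsigned bits = (uint16_t)multiplier;  /* two's-complement bit pattern */
    int64_t shifted = multiplicand;        /* multiplicand times 2^i */
    int64_t product = 0;
    unsigned prev = 0;                     /* implicit bit to the right of bit 0 */

    for (int i = 0; i < 16; i++) {
        unsigned cur = (bits >> i) & 1u;
        if (cur == 1u && prev == 0u)
            product -= shifted;            /* start of a run of 1s: subtract */
        else if (cur == 0u && prev == 1u)
            product += shifted;            /* end of a run of 1s: add */
        prev = cur;
        shifted += shifted;                /* double for the next bit position */
    }
    return (int32_t)product;
}
```

A multiplier with long runs of identical bits needs very few add or subtract steps, which is the property the recoding exploits.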
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
On Thu, 15 Jun 2006 18:29:20 GMT, Tim Prince <ti***********@sbcglobal.net> wrote: VERY interesting. While this is irrelevant to C, do you know how the DSP accomplishes this?
An 8x8 multiply can be accomplished in one lookup into a 64K-word table.
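A direct product table needs one entry per pair of operands, so 64K (2^16) entries cover two 8-bit operands; indexing directly by two 16-bit operands would take 2^32 entries. The sketch below shows the 8-bit case. The names are hypothetical, and the table is filled by repeated addition so that no multiply instruction is needed to build it.

```c
#include <stdint.h>

static uint16_t product_table[256 * 256];  /* 65,536 entries of 16 bits each */

/* Fill the table once; entry (a << 8) | b holds a * b. */
static void build_product_table(void)
{
    for (unsigned a = 0; a < 256; a++) {
        uint16_t p = 0;                     /* running value of a * b */
        for (unsigned b = 0; b < 256; b++) {
            product_table[(a << 8) | b] = p;
            p = (uint16_t)(p + a);          /* next product by one addition */
        }
    }
}

/* Multiply two 8-bit values with a single table lookup. */
static uint16_t table_multiply(uint8_t a, uint8_t b)
{
    return product_table[((unsigned)a << 8) | b];
}
```

build_product_table() has to be called once before the first table_multiply().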
--
#include <standard.disclaimer>
_
Kevin D Quitt USA 91387-4454 96.37% of all statistics are made up