Bytes | Software Development & Data Engineering Community

number of machine instructions - for algorithm measuring

ben
hello,

I'm trying to answer a "prove an upper bound on the number of machine
instructions required to process M connections on N objects" exercise
from an algorithms book, and I don't understand a very basic detail of
how I should go about answering it. Here's a simplified example
question and example programme to hopefully get at what I'd like to
know:

Prove an upper bound on the number of machine instructions required to
process N objects using the programme below. You may
assume, for example, that any C assignment statement always requires
less than c instructions, for some fixed constant c.

/* a very simple and silly example "algorithm" */
#include <stdio.h>
#define N 10000

int main(void)
{
    int n = 0;
    int x[3] = { 1, 4, 9 };
    unsigned long total = 0;

    while (n < N) {
        total += x[n % 3];
        n++;
    }

    printf("%lu\n", total);
    return 0;
}

I know the "upper bound" aspect of the question isn't particularly
meaningful with this programme: once the value of N has been decided,
there's nothing variable about the number of times the loop runs, so
the upper and lower bounds are the same fixed value, determined by N.
But I wanted to leave the question as unchanged as possible.

To answer the exercise, is it a case of assigning a certain number of
machine instructions to each and every C statement, or operation, in
the loop, adding those up, and then multiplying by the number of
iterations?

It's the "You may assume, for example, that any C assignment statement
always requires less than c instructions, for some fixed constant c"
part I'm unsure about.

How exactly would you go about answering the above exercise (ignoring
the silliness of it)?

Thanks, ben.
Nov 14 '05 #1

ben wrote:
hello,


Not about CLC...

What do you do in big-Oh with constants?

E.g. an inner loop takes O(35n) [35 opcodes per pass, not 35 statements
though] -- that's O(n) time. Then if the outer loop is O(n log n), the
total is the product, O(n^2 log n) time.

... so the question is really moot.

If you want really precise timings you're going to have to sample.
Compilers vary too much, and they have too many options, to enumerate
all possible results.

Tom

Nov 14 '05 #2
ben
In article <11**********************@o13g2000cwo.googlegroups .com>, Tom
St Denis <to********@gmail.com> wrote:
e.g. inner loop takes O(35n) [35 opcodes per pass, not 35 statements
though]


Yes, that's the part I'm asking about and hoping to find out
about. My question is about this part of the exercise:

"You may assume, for example, that any C assignment statement always
requires less than c instructions, for some fixed constant c."

How would you carry out that part with the example loop? Just one
iteration is all that's needed.

The higher-level aspect of the exercise that you're talking about,
Tom, I understand -- that's why I gave a very simple bit of code,
labelled it silly, and stated that "after the value of N has been
decided there's nothing variable about the number of times the loop is
run -- upper and lower bound the same -- one fixed, by N, value. but i
wanted to leave the question as unchanged as possible". There's no
need for the variable-run aspect to get at what I'm asking about.

So basically, how do you apply "You may assume, for example, that any C
assignment statement always requires less than c instructions, for some
fixed constant c" to

while (n < N) {
    total += x[n % 3];
    n++;
}

?

Thanks, ben.
Nov 14 '05 #3


ben wrote:
In article <11**********************@o13g2000cwo.googlegroups .com>, Tom
St Denis <to********@gmail.com> wrote:
e.g. inner loop takes O(35n) [35 opcodes per pass, not 35 statements
though]
yes, that's the part that i'm asking about and hoping to find out
about. my question is about this part of the exercise question:

"You may assume, for example, that any C assignment statement always
requires less than c instructions, for some fixed constant c."

how would you carry out that part with the example loop -- just for one
loop is all that's needed.


I am not sure how this is meant -- plainly just looking for assignments
seems stupid; "operation" is more meaningful.

[snip]
so basically how do you apply "You may assume, for example, that any C
assignment statement always requires less than c instructions, for some
fixed constant c." to

while (n < N) {
    total += x[n % 3];
    n++;
}


Assignments: 2 per loop -> 2*c*N. Complete nonsense.

Operations:
    n % 3          -> temp1
    x + temp1      -> temp2
    total + *temp2 -> total
or:
    *temp2         -> temp3
    total + temp3  -> total

    n + 1          -> n

and of course
    n < N -> stay in the loop
    else  -> leave the loop
or an intermediate assignment of n < N -> temp0 and a test afterwards.

That would be [5..7]*c*N.

IMO, the exercise should be stated in a clearer way.
Cheers
Michael
--
E-Mail: Mine is a gmx dot de address.

Nov 14 '05 #4
ben
In article <38*************@individual.net>, Michael Mair
<Mi**********@invalid.invalid> wrote:
[snip]

Would be [5..7]*c*N.


Right, great.

IMO, the exercise should be stated in a clearer way.


Great. It was unclear to me also, partly for the "assignment" reason,
which I also thought was odd, and partly through not understanding
exactly how it should be broken down. So I wasn't sure if it was
unclear to me through my lack of knowledge, or due to the exercise
being badly written -- a bit of both was the answer to that.

Thanks very much for the reply, Michael,

ben.
Nov 14 '05 #5
ben wrote:

<snip>




In my opinion, this is a waste of time. Instruction count no longer
has much of a bearing on how much time a program takes to execute. To
use your example program, with N set to 0x4FFFFFFFu:
[mark@icepick ~]$ gcc -save-temps -Wall -O3 -o bar bar.c && time ./bar && wc -l bar.s
1968526669

real    0m16.683s
user    0m16.682s
sys     0m0.002s
44 bar.s   <- lines of assembly
[mark@icepick ~]$ gcc -save-temps -Wall -O3 -march=pentium4 -o bar bar.c && time ./bar && wc -l bar.s
1968526669

real    0m4.313s
user    0m4.312s
sys     0m0.001s
78 bar.s   <- lines of assembly
As you can see, a change in an optimization setting roughly doubled the
assembly output, but resulted in a 4x speed improvement. I picked the
quickest of 3 trials each, for the record.

It may be an interesting academic exercise, but its real-world
applicability is questionable. Probably not the answer you wanted, but
an answer nonetheless.
Mark F. Haigh
mf*****@sbcglobal.net

Nov 14 '05 #6
ben
In article <11**********************@g14g2000cwa.googlegroups .com>,
Mark F. Haigh <mf*****@sbcglobal.net> wrote:
ben wrote:

[snip]


Mark,

Well, it sounds like a good point. If you're right, then it's a very
good answer and one I'd definitely want. The exercise is just from a
book; I don't have to do it as stated. If the way that's detailed
isn't so useful and there's a much more useful way, then I'd prefer to
learn and do it a way that's useful. I do intend to use this stuff in
the real world -- that's what I'm learning it for.

I'm just wondering, though: if you're comparing operation counts of
code compiled using the same settings, and the different programmes
are written in a similar fashion (that is, for example, not comparing
one programme that unrolls a loop against another that doesn't -- code
written in the same style), then is the number of operations a
reasonable way to compare? Basically, use the same base settings and
style. (Although "code written in the same style" is probably pretty
ambiguous.)

A further thought: different operations take different amounts of
time, don't they? I think so, and that would make operation counting
less useful. For instance, I think accessing a word of memory that
isn't in the CPU's cache can take several clock cycles, whereas if the
data is in cache the access might take just one clock cycle. Correct?

So basically you're saying timing is the way to measure algorithms?

Thanks, ben
Nov 14 '05 #7
ben wrote:

well, it sounds like a good point. if you're right then it's a very
good answer and one i'd definitely want. the exercise is just from a
book. i don't have to do it as stated by the exercise. if the way
that's detailed isn't so useful and there's a much more useful way
then i'd prefer to learn and do it a way that's useful. i do intend
to use this stuff in the real world -- that's what i'm learning it for.

Good decision. Can't remember who said "In theory, theory and practice
are the same; in practice, they're different."

i'm just wondering though, if you're comparing operation counts of
code compiled using the same settings, and the different programmes
are written in a similar fashion (that is, for example, not comparing
one programme that unrolls a loop and another that doesn't -- code
written in the same style) then the number of operations is a
reasonable way to compare? basically, use the same base settings and
style. (although "code written in the same style" is probably pretty
ambiguous)

These days, it would be unwise to manually unroll a loop in code
without timings to back it up. Compilers typically understand the
associated branching costs (whether cheap or expensive, cost of
mispredicted branches, etc.) and pipeline scheduling better than most
people do.

further thought: different operations take different amounts of time
don't they? -- i think so. so that'd make operation counting less
useful.

Yes, although it depends on the processor architecture exactly how
different the numbers can be. Some instructions can be executed in
parallel. Some instructions may need to be internally broken down into
smaller micro-ops, each of which is individually executed.
for instance i think accessing a word of memory that isn't in the
cpu's cache can take several clock cycles, whereas if that data is in
cache the access might take just one clock cycle. correct?

Worse, much of the time, unless you're dealing with older or embedded
processors. It can take hundreds or even thousands of cycles on newer
fast machines. Typically it's not one cycle from cache, either, and it
will depend on which cache it's coming from.

For the sake of illustration, let's revisit a previous topic -- loop
unrolling. Let's say you manually unrolled a bunch of loops all over
your code. Are you sure that this new bulk of code does not force some
other critical part of code out of cache, resulting in a several-hundred-cycle
penalty every time it has to be re-fetched? Now the several
cycles you saved look like nothing. Think of the compiler as your
friend that helps you find the sweet spot amongst many different
competing factors.

so basically you're saying timing is the way to measure algorithms?


Absolutely. First, optimize algorithmic complexity. Second, implement
it cleanly. Third, play with your compiler settings/hints and profile
it. Fourth and last, which is rarely needed, micro-optimize.

The fastest linked list in the world will get creamed by a slow hash or
tree if it's used inappropriately. Compilers do a better job of
optimizing if the code is implemented cleanly. Profiling shows you
where the bulk of _time_ is spent, so you don't go chasing false leads
and trying to optimize things that will give you a .00001% increase.
Mark F. Haigh
mf*****@sbcglobal.net

Nov 14 '05 #8
ben
Mark,

OK, thanks for the info.

ben.


Nov 14 '05 #9
