Bytes | Software Development & Data Engineering Community
Execution speed question

I am performing simulations on networks (graphs). I have a question on
speed of execution (assuming very ample memory for now). I simplify the
details of my simulation below, as the question I ask applies more
generally than my specific case. I would greatly appreciate general
feedback in terms of computing and of course considerations specific to
implementation in Python.

The nodes in my network may be ON or OFF. The network starts off with
all nodes in the OFF state. I loop through the nodes. For each node
that is OFF, I consider some probability of it turning ON based on the
states of its neighbours. I MUST GO THROUGH ALL NODES BEFORE DECIDING
WHICH ONES TO TURN ON.

So my question is whether it is faster to

1. loop through a list of ALL nodes and check for OFF nodes using ifs

or to

2. loop through a container of OFF nodes and remove from this when they
turn ON

The second would result in looping through fewer nodes, especially as the
simulation progresses, but how does the cost of removal compare with
cheap ifs, and would the extra memory usage affect performance?

I can appreciate that the cost of the if check, the number of nodes, and
the type of container I use will all come into the answer.

In my case, the ifs are cheap boolean queries (whether the node is ON or
OFF). The number of nodes is very large: millions for sure, maybe tens
of millions. If considering (2), take note of my BOLD text above, which
means I can't remove nodes as I iterate through them in the main loop.

I naturally started coding with (2), but couldn't decide on the best data
structure in Python. A set seemed ideal for speedy removal, but then I
can't iterate through it without popping. An ordered list? Some
creative solution with numpy arrays?

There is also the complication that, since we are in interpreted Python,
what is theoretically the best data structure may not in reality be
optimal unless it is a built-in object or coded externally in a
compiled module.

Of course, I will start experimenting to see what the execution
difference is, but I would appreciate some suggestions from others re
which is best and also on best data structure for (2).
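For what it's worth, one way to get set-style O(1) removal while still honouring the "decide only after the full sweep" constraint is to record the switches during the sweep and apply them in a batch afterwards. A minimal sketch, assuming a hypothetical `should_turn_on` rule standing in for the real neighbour-based probability:

```python
import random

def should_turn_on(node):
    # Hypothetical stand-in for the real neighbour-based probability rule.
    return random.random() < 0.5

def step(off_nodes, on_nodes):
    # Full sweep first: no mutation while iterating over the set.
    to_switch = [n for n in off_nodes if should_turn_on(n)]
    # Batch removal afterwards; set removal is O(1) on average per node.
    off_nodes.difference_update(to_switch)
    on_nodes.update(to_switch)

off_nodes = set(range(1000))
on_nodes = set()
step(off_nodes, on_nodes)
```

This keeps the per-step cost proportional to the number of OFF nodes, at the price of one temporary list per sweep.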

I'm not a newbie, so you can get technical with me Python-wise and
algorithm-wise. I realise it is a 'basic' question, but it is something
that I have always wondered about (cheap ifs versus extra structure), and
with the number of nodes I am considering, it actually becomes an issue.

Many Thanks,
Suresh
Jul 25 '08 #1
On Jul 25, 7:57 pm, Suresh Pillai <stochash...@yahoo.ca> wrote:
The nodes in my network may be ON or OFF. The network starts off with
all nodes in the OFF state. I loop through the nodes. For each node
that is OFF, I consider some probability of it turning ON based on the
states of its neighbours. I MUST GO THROUGH ALL NODES BEFORE DECIDING
WHICH ONES TO TURN ON.

So my question is whether it is faster to

1. loop through a list of ALL nodes and check for OFF nodes using ifs
I'd recommend using 'filter' and list comprehensions.
>>> import random
>>> class Node:
...     def __init__(self):
...         self.on = False
...     def toggle(self):
...         self.on = random.choice([True, False])
...
>>> nodes = [Node() for i in range(0, 10000)]
>>> for node in nodes:
...     node.toggle()
...
>>> off_nodes = filter(lambda x: not x.on, nodes)
>>> len(off_nodes)
5050
Jul 25 '08 #2
I'd recommend using 'filter' and list comprehensions.

Look at using reduce(). You can collect information about all of the
nodes without necessarily building a large, intermediate list in the
process.
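As a hedged illustration of that idea: reduce() can fold over the node list and accumulate a summary (here, just a count of OFF nodes) without materialising a filtered copy. The Node class mirrors the one in the earlier reply; in Python 3, reduce lives in functools.

```python
from functools import reduce  # a builtin in Python 2

class Node(object):
    def __init__(self, on=False):
        self.on = on

# Every third node starts ON; the rest are OFF.
nodes = [Node(on=(i % 3 == 0)) for i in range(9)]

# Count OFF nodes without building an intermediate list.
off_count = reduce(lambda acc, node: acc + (not node.on), nodes, 0)
```

Whether this beats a plain loop in CPython is another question; it avoids the intermediate list, not the per-node function-call overhead.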

You might get some ideas from here
[http://en.wikipedia.org/wiki/Antiobjects].

Jul 25 '08 #3
On Jul 25, 10:57 am, Suresh Pillai <stochash...@yahoo.ca> wrote:
[...]
So my question is whether it is faster to

1. loop through a list of ALL nodes and check for OFF nodes using ifs

or to

2. loop through a container of OFF nodes and remove from this when they
turn ON
or 3. build a new list every iteration instead of deleting from the old
one:

while processing:
    new_off_list = []
    for x in off_list:
        if goes_on(x):
            on_list.append(x)
        else:
            new_off_list.append(x)
    off_list = new_off_list
    generation += 1

Iain
Jul 25 '08 #4
On Jul 25, 1:46 pm, Iain King <iaink...@gmail.com> wrote:
[...]
or 3. build a new list every iteration instead of deleting from the old
one: [code snipped]
I was curious to what extent the different methods varied in time, so
I checked it out. Here there are three procedures: test_every which
matches your (1), destructive which matches your (2), and constructive
which is (3) as I've outlined above.

On varying the size of the dataset I get this (probability a node goes
on = 50%):

Length of initial list: 100000
Test every: 1.16085492357
Destructive: 2.592310272
Constructive: 0.850312458886

Length of initial list: 200000
Test every: 2.48013843287
Destructive: 9.20894689718
Constructive: 1.73562198439

Length of initial list: 400000
Test every: 5.00652267447
Destructive: 44.9696004134
Constructive: 3.51687329373

Length of initial list: 800000
Test every: 9.67657648655
Destructive: 220.57583941
Constructive: 7.06614485537
and changing the probability that a node goes on (dataset size =
200000):

Probability goes on: 1/2
Test every: 2.24765364513
Destructive: 9.28801971614
Constructive: 1.62770773816

Probability goes on: 1/4
Test every: 4.77387350904
Destructive: 13.4432467571
Constructive: 3.45467140006

Probability goes on: 1/8
Test every: 11.0514899721
Destructive: 18.4026878278
Constructive: 6.86778036177

Probability goes on: 1/16
Test every: 22.5896021593
Destructive: 25.7784044083
Constructive: 13.8631404605

Probability goes on: 1/32
Test every: 49.7667941179
Destructive: 39.3652502735
Constructive: 27.2527219598

Probability goes on: 1/64
Test every: 91.0523955153
Destructive: 65.7747103963
Constructive: 54.4087322936

Code:

import random
from timeit import Timer

SIZE = 100000
MAX = 2

def goes_on(x):
    return random.randint(1, MAX) == 1

def test_every():
    print "Test every:",
    nodes = range(SIZE)
    is_on = [False for x in xrange(SIZE)]
    count = SIZE
    while count:
        for i, x in enumerate(nodes):
            if not is_on[i] and goes_on(x):
                is_on[i] = True
                count -= 1

def destructive():
    print "Destructive:",
    off_list = range(SIZE)
    on_list = []
    count = SIZE
    while count:
        for i in xrange(len(off_list) - 1, -1, -1):
            x = off_list[i]
            if goes_on(x):
                on_list.append(x)
                del off_list[i]
                count -= 1

def constructive():
    print "Constructive:",
    off_list = range(SIZE)
    on_list = []
    count = SIZE
    while count:
        new_off_list = []
        for x in off_list:
            if goes_on(x):
                on_list.append(x)
                count -= 1
            else:
                new_off_list.append(x)
        off_list = new_off_list

#SIZE = 200000
while True:
    print "Length of initial list:", SIZE
    #print "Probability goes on: 1/%d" % MAX
    print Timer("test_every()", "from __main__ import test_every").timeit(1)
    print Timer("destructive()", "from __main__ import destructive").timeit(1)
    print Timer("constructive()", "from __main__ import constructive").timeit(1)
    print
    SIZE *= 2
    #MAX *= 2

Conclusions:

On size, (2) really doesn't like bigger datasets, taking quadratically
longer as the size increases, while (1) and (3) happily increase linearly.
(3) is faster.

On probability it's (1) who's the loser, while (2) and (3) are happy.
(3) is once again faster.

I think (2)'s poor performance is being amplified by how Python
handles lists and list deletions; the effect may be stymied in other
languages, or by using other data structures in Python (like a
dictionary or a user-made list class). If you were short on memory
then (2) would have an advantage, but as it is, (3) is the clear
winner.
I'm a fan of list comprehensions, and it feels like they could be nice
here, but since we are making two lists at once I don't see how
to... anyone see how to use them (or 'map' if you want to be old
school)?
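One possible answer, offered as a sketch rather than the canonical idiom: a single comprehension can't emit two lists, but a small partition helper does the split in one pass:

```python
def partition(pred, items):
    """Split items into (matching, non_matching) lists in a single pass."""
    yes, no = [], []
    for x in items:
        # Pick the destination list with a conditional expression.
        (yes if pred(x) else no).append(x)
    return yes, no

# Example: split 0..9 into evens (would-be on_list) and odds (off_list).
on_list, off_list = partition(lambda x: x % 2 == 0, range(10))
```

Two separate comprehensions would also work, but at the cost of traversing (and calling the predicate on) the list twice.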

Iain
Jul 25 '08 #5
On Jul 25, 9:54 pm, Jeff <jeffo...@gmail.com> wrote:
Look at using reduce(). You can collect information about all of the
nodes without necessarily building a large, intermediate list in the
process.
From the OP's description, I assumed there'd be a list of all nodes,
from which he wishes to derive a 2nd list of specific nodes. reduce()
applies "a function of two arguments cumulatively to the items of a
sequence, from left to right, so as to reduce the sequence to a single
value", which doesn't seem to me to be what the OP was asking for. I
could understand using map() across the filter'd list, or embedding
the conditional check within the map'd function and ignoring filter(),
but at no point does the OP ask to perform any kind of function based
on two nodes...

I may have misunderstood your point, though :) Could you provide a
quick code sample to clarify?
Jul 25 '08 #6
That's a good comparison for the general question I posed. Thanks,
although I do believe lists are less than ideal here and that a
different data structure should be used.

To be more specific to my case:
As mentioned in my original post, I also have the specific condition that
one does not know which nodes to turn ON until after all the
probabilities are calculated (let's say we take the top m, for example).
In this case, the second and third will perform worse, as the second
will require removals from the list after the fact and the third will
require another loop through the nodes to build the new list.
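For that top-m variant, one hedged sketch is to do the scoring sweep first, pick the winners with heapq.nlargest, and only then do a single batch removal. Here `score()` is a hypothetical placeholder for the real neighbour-based calculation:

```python
import heapq
import random

def score(node):
    # Hypothetical placeholder for the neighbour-based probability.
    return random.random()

def top_m_step(off_nodes, m):
    # Full sweep first: score every OFF node before deciding anything.
    scored = [(score(n), n) for n in off_nodes]
    # nlargest is O(n log m), cheaper than a full sort for small m.
    return {n for _, n in heapq.nlargest(m, scored)}

off_nodes = set(range(1000))
winners = top_m_step(off_nodes, 10)
off_nodes -= winners  # one batch removal, after all scores are known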
Jul 25 '08 #7
Iain King wrote:
I think (2)'s poor performance is being amplified by how python
handles lists and list deletions; the effect may be stymied in other
languages
Delete is O(n) (or "O(n/2) on average", if you prefer), while append is
amortized O(1).
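A quick, hedged way to see that asymmetry (absolute numbers vary by machine; only the ordering matters):

```python
from timeit import timeit

n = 100000
# Deleting the front element shifts all the remaining items: O(n).
t_del = timeit('del xs[0]', setup='xs = list(range(%d))' % n, number=1000)
# Appending to the end is amortized O(1).
t_app = timeit('xs.append(0)', setup='xs = []', number=1000)
```

On any CPython build, t_del comes out orders of magnitude larger than t_app for a list this size.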

Unless I'm missing something, your example keeps going until it's
flagged *all* nodes as "on", which, obviously, kills performance for the
first version as the probability goes down. The OP's question was about
a single pass (but he did mention "as the simulation progresses", so I
guess it's fair to test a complete simulation.)

Btw, if the nodes can be enumerated, I'd probably do something like:

node_list = ... get list of nodes ...
random.shuffle(node_list)

start = 0
end = len(node_list)
step = end / MAX

while start < end:

    for i in xrange(start, start + step):
        ... switch on node_list[i] ...

    ... do whatever you want to do after a step ...

    # prepare for next simulation step
    start += step
    step = max((len(node_list) - start) / MAX, 1)

which is near O(n) overall, and mostly constant wrt. the probability for
each pass (where the probability is 1:MAX).

Might need some tuning; tweak as necessary.

</F>

Jul 25 '08 #8
On Fri, 25 Jul 2008 16:51:42 +0200, Fredrik Lundh wrote:
Unless I'm missing something, your example keeps going until it's
flagged *all* nodes as "on", which, obviously, kills performance for the
first version as the probability goes down. The OP's question was about
a single pass (but he did mention "as the simulation progresses", so I
guess it's fair to test a complete simulation.)
I was referring to multiple passes, as in Iain's test cases. Although not
necessarily till all nodes are ON, let's say to a large proportion at
least.
Jul 25 '08 #9
On Jul 25, 3:39 pm, Suresh Pillai <stochash...@yahoo.ca> wrote:
[...]
As mentioned in my original post, I also have the specific condition that
one does not know which nodes to turn ON until after all the
probabilities are calculated (let's say we take the top m for example).
So you need to loop through twice regardless? i.e. loop once to
gather data on the off nodes, do some calculation to work out what to turn
on, then loop again to turn on the relevant nodes? If so, then I
think the functions above remain the same, becoming the 2nd loop.
Every iteration you do a first loop over the off_nodes (or over them all
for (1)) to gather the data on them, perform your calculation, and
then perform one of the above functions (minus the setup code at the
beginning; basically starting at the 'for') as a second loop, with the
goes_on function now returning a value based on the calculation
(rather than doing the calculation itself as I had it). Performance should
be similar.

Iain
Jul 25 '08 #10
