On Jul 25, 1:46 pm, Iain King <iaink...@gmail.com> wrote:
> On Jul 25, 10:57 am, Suresh Pillai <stochash...@yahoo.ca> wrote:

> > I am performing simulations on networks (graphs). I have a question
> > about speed of execution (assuming very ample memory for now). I
> > simplify the details of my simulation below, as the question applies
> > more generally than my specific case. I would greatly appreciate
> > general feedback, both on the computing issues and of course on
> > considerations specific to an implementation in Python.
> >
> > The nodes in my network may be ON or OFF. The network starts off
> > with all nodes in the OFF state. I loop through the nodes. For each
> > node that is OFF, I consider some probability of it turning ON based
> > on the states of its neighbours. I MUST GO THROUGH ALL NODES BEFORE
> > DECIDING WHICH ONES TO TURN ON.

> > So my question is whether it is faster to
> >
> > 1. loop through a list of ALL nodes and check for OFF nodes using ifs,
> >
> > or to
> >
> > 2. loop through a container of OFF nodes and remove them from it when
> > they turn ON,

> or 3. build a new list every iteration instead of deleting from the old
> one:
>
> while processing:
>     new_off_list = []
>     for x in off_list:
>         if goes_on(x):
>             on_list.append(x)
>         else:
>             new_off_list.append(x)
>     off_list = new_off_list
>     generation += 1

> Iain

I was curious to what extent the different methods varied in time, so I
checked it out. There are three procedures below: test_every, which
matches your (1); destructive, which matches your (2); and constructive,
which is (3) as I outlined above.

On varying the size of the dataset (probability a node goes on = 50%;
all times in seconds):

  Size      Test every      Destructive      Constructive
  100000     1.16085492357    2.592310272     0.850312458886
  200000     2.48013843287    9.20894689718   1.73562198439
  400000     5.00652267447   44.9696004134    3.51687329373
  800000     9.67657648655  220.57583941      7.06614485537

And changing the probability that a node goes on (dataset size =
200000; times in seconds):

  P(goes on)   Test every      Destructive     Constructive
  1/2           2.24765364513   9.28801971614   1.62770773816
  1/4           4.77387350904  13.4432467571    3.45467140006
  1/8          11.0514899721   18.4026878278    6.86778036177
  1/16         22.5896021593   25.7784044083   13.8631404605
  1/32         49.7667941179   39.3652502735   27.2527219598
  1/64         91.0523955153   65.7747103963   54.4087322936

Code:

import random
from timeit import Timer

SIZE = 100000   # number of nodes
MAX = 2         # a node goes on with probability 1/MAX

def goes_on(x):
    return random.randint(1, MAX) == 1

def test_every():
    # (1) scan every node each generation, skipping those already on
    print "Test every:",
    nodes = range(SIZE)
    is_on = [False for x in xrange(SIZE)]
    count = SIZE
    while count:
        for i, x in enumerate(nodes):
            if not is_on[i] and goes_on(x):
                is_on[i] = True
                count -= 1

def destructive():
    # (2) delete nodes from the off-list as they turn on
    print "Destructive:",
    off_list = range(SIZE)
    on_list = []
    count = SIZE
    while count:
        # iterate backwards so deletions don't shift unvisited indices
        for i in xrange(len(off_list) - 1, -1, -1):
            x = off_list[i]
            if goes_on(x):
                on_list.append(x)
                del off_list[i]
                count -= 1

def constructive():
    # (3) build a fresh off-list each generation
    print "Constructive:",
    off_list = range(SIZE)
    on_list = []
    count = SIZE
    while count:
        new_off_list = []
        for x in off_list:
            if goes_on(x):
                on_list.append(x)
                count -= 1
            else:
                new_off_list.append(x)
        off_list = new_off_list

#SIZE = 200000
while True:
    print "Length of initial list:", SIZE
    #print "Probability goes on: 1/%d" % MAX
    print Timer("test_every()", "from __main__ import test_every").timeit(1)
    print Timer("destructive()", "from __main__ import destructive").timeit(1)
    print Timer("constructive()", "from __main__ import constructive").timeit(1)
    print
    SIZE *= 2
    #MAX *= 2
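One caveat before the conclusions: in this benchmark goes_on() is pure
chance and ignores neighbour states, so it makes no difference that
nodes are moved onto on_list mid-pass. In your real simulation, where
goes_on depends on the neighbours, every decision has to be made
against the OLD states and committed only after the pass, per your
capitalized constraint. A minimal sketch of (3) adapted that way (the
goes_on(x, old_state) signature and the old_state array are my
inventions, not your API):

new_off_list = []
newly_on = []
for x in off_list:
    if goes_on(x, old_state):     # reads neighbours' OLD states only
        newly_on.append(x)
    else:
        new_off_list.append(x)
for x in newly_on:
    old_state[x] = True           # commit all new states together
on_list.extend(newly_on)
off_list = new_off_list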

Conclusions:

On size, (2) really doesn't like bigger datasets: its time grows
roughly quadratically (each doubling of the list roughly quadruples it:
2.6 s, 9.2 s, 45 s, 221 s), while (1) and (3) happily increase
linearly. (3) is faster.

On probability, (1) is the loser: it rescans every node each
generation, including the ones already on, so the extra generations at
low probabilities hit it hardest, while (2) and (3) only ever touch the
shrinking off-list. (3) is once again faster.

I think (2)'s poor performance comes from how Python handles list
deletions: del on a list element is O(n), because everything after it
is shifted down, so a full pass over the off-list costs O(n^2). The
effect may be mitigated in other languages, or by using other data
structures in Python (like a dictionary, a set, or a user-made list
class). If you were short on memory then (2) would have an advantage,
but as it is, (3) is the clear winner.
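For instance, here is an untested sketch of the destructive idea on top
of a set, whose remove() is O(1) instead of a list's O(n) del (though
the list(off_set) snapshot each generation still costs O(len(off_set)),
much like (3)):

off_set = set(xrange(SIZE))
on_list = []
while off_set:
    # snapshot so we can mutate off_set while looping over it
    for x in list(off_set):
        if goes_on(x):
            on_list.append(x)
            off_set.remove(x)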

I'm a fan of list comprehensions, and it feels like they could be nice
here, but since we're building two lists at once I don't see how...
anyone see how to use them (or 'map', if you want to be old school)?
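(One untested thought: because goes_on() is random, it must be called
exactly once per node, so you could record the decisions first and then
build both lists from them with comprehensions:

decisions = [(x, goes_on(x)) for x in off_list]
on_list.extend(x for x, went_on in decisions if went_on)
off_list = [x for x, went_on in decisions if not went_on]

Whether the extra pass and the tuple overhead beat the single explicit
loop is another question.)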

Iain