Hello there :) ,
I am a python newbie and need to run following code for a task in an
external simulation programm called "Abaqus" which makes use of python
to access the mesh (ensamble of nodes with xy coordinates) of a
certain geometrical model.
[IN is the starting input containing the nodes to be check, there are
some double nodes with the same x and y coordinates which need to be
removed. SN is the output containing such double nodes]
Code: Select all
for i in range(len(IN)): #scan all elements of the list IN
for j in range(len(IN)):
if i <j:
if IN[i].coordinates[0] == IN[j].coordinates[0]:
if IN[i].coordinates[1] == IN[j].coordinates[1]:
SN.append(IN[i].label)
Unfortunately my len(IN) is about 100.000 and the running time about
15h !!!! :(
Any idea to improve it?
I have already tried to group the "if statements" in a single one:
Code: Select all
if i <j and if IN[i].coordinates[0] == IN[j].coordinates[0] and
if IN[i].coordinates[1] == IN[j].coordinates[1]:
but no improvements.
Many thanks, Alex 26 1531
AlexUnfortunately my len(IN) is about 100.000 and the running time
Alexabout 15h !!!! :(
AlexAny idea to improve it?
numpy? http://numpy.scipy.org/ http://www.scipy.org/Numpy_Example_List
More immediately, note that you are building a list of len(IN) ints every
time through the inner loop. A quick hit might be this simple change:
indexes = range(len(IN))
for i in indexes: #scan all elements of the list IN
for j in indexes:
if i != j:
if (IN[i].coordinates[0] == IN[j].coordinates[0] and
IN[i].coordinates[1] == IN[j].coordinates[1]):
SN.append(IN[i].label)
Skip
Alexzive wrote:
Hello there :) ,
I am a python newbie and need to run following code for a task in an
external simulation programm called "Abaqus" which makes use of python
to access the mesh (ensamble of nodes with xy coordinates) of a
certain geometrical model.
[IN is the starting input containing the nodes to be check, there are
some double nodes with the same x and y coordinates which need to be
removed. SN is the output containing such double nodes]
Code: Select all
for i in range(len(IN)): #scan all elements of the list IN
for j in range(len(IN)):
if i <j:
if IN[i].coordinates[0] == IN[j].coordinates[0]:
if IN[i].coordinates[1] == IN[j].coordinates[1]:
SN.append(IN[i].label)
Unfortunately my len(IN) is about 100.000 and the running time about
15h !!!! :(
Any idea to improve it?
I have already tried to group the "if statements" in a single one:
Code: Select all
if i <j and if IN[i].coordinates[0] == IN[j].coordinates[0] and
if IN[i].coordinates[1] == IN[j].coordinates[1]:
but no improvements.
Many thanks, Alex
When you're looking for duplicates an efficient solution is likely to be
based on a set or dict object.
# untested
from collections import defaultdict
groups = defaultdict(list)
for item in IN:
c = item.coordinates
groups[c[0], c[1]].append(item.label)
SN = []
for labels in groups.itervalues():
if len(labels) 1:
SN.extend(labels) # or labels[1:] if you want to keep one item
Peter
Code: Select all
for i in range(len(IN)): #scan all elements of the list IN
for j in range(len(IN)):
if i <j:
if IN[i].coordinates[0] == IN[j].coordinates[0]:
if IN[i].coordinates[1] == IN[j].coordinates[1]:
SN.append(IN[i].label)
Unfortunately my len(IN) is about 100.000 and the running time about
15h !!!! :(
Any idea to improve it?
[snip]
I have already tried to group the "if statements" in a single one:
Code: Select all
if i <j and if IN[i].coordinates[0] == IN[j].coordinates[0] and
if IN[i].coordinates[1] == IN[j].coordinates[1]:
but no improvements.
It's like rearranging deckchairs on the Titanic :) Yes, it may
give a speed up, but what's 3 seconds when you're waiting 15hr :)
Not knowing the len(IN[x].coordinates) or their structure, if
it's a list of len==2, you should be able to just do
if i <j and IN[i].coordinates == IN[j].coordinates
or
if i <j and IN[i].coordinates[:2] == IN[j].coordinates[:2]
However, again, this is just polish. The big problem is that you
have an O(N^2) algorithm that's killing you.
1) use xrange instead of range to save eating memory with a huge
unneeded array.
2) unless you need to append duplicate labels, you know that when
I and J are swapped, you'll reach the same condition again, so it
might be worth writing the outer loops to eliminate this
scenario, and in the process, but just starting at i+1 rather
than i, you can forgo the check if "i<>j".
Such changes might look something like
for i in xrange(len(IN)):
for j in xrange(i+1, len(IN)):
if IN[i].coordinates == IN[j].coordinates:
SN.append(IN[i].label)
If my college algorithms memory serves me sufficiently, this
reduces your O(N^2) to O(N log N) which will garner you some
decent time savings.
tkc
Skip:
indexes = range(len(IN))
for i in indexes: #scan all elements of the list IN
for j in indexes:
Nope, use xrange in both situations, and save a list.
Tim Chase:
for i in xrange(len(IN)):
for j in xrange(i+1, len(IN)):
if IN[i].coordinates == IN[j].coordinates:
SN.append(IN[i].label)
If my college algorithms memory serves me sufficiently, this
reduces your O(N^2) to O(N log N) which will garner you some
decent time savings.
That's O(n^2) still, it's just half matrix, a triangle.
Bye,
bearophile
On Thu, 18 Sep 2008 05:25:02 0700, Alexzive wrote:
Hello there :) ,
I am a python newbie and need to run following code for a task in an
external simulation programm called "Abaqus" which makes use of python
to access the mesh (ensamble of nodes with xy coordinates) of a certain
geometrical model.
[IN is the starting input containing the nodes to be check, there are
some double nodes with the same x and y coordinates which need to be
removed. SN is the output containing such double nodes]
Code: Select all
for i in range(len(IN)): #scan all elements of the list IN
for j in range(len(IN)):
if i <j:
if IN[i].coordinates[0] == IN[j].coordinates[0]:
if IN[i].coordinates[1] == IN[j].coordinates[1]:
SN.append(IN[i].label)
Here's a better version of your algorithm, one which avoids the minor
inefficiencies but keeps the huge inefficiency:
for node1 in IN:
for node2 in IN:
if node1 is not node2:
if node1.coordinates == node2.coordinates:
SN.append(node1.label)
This assumes that node.coordinates is a type where equality is defined.
If they are a tuple or list, that should work fine.
But the huge inefficiency is that you are checking each node not once,
not twice, but 100,000 times! So you have to iterate 10,000,000,000
times, which is going to be slow no matter what you do. Especially in
pure Python.
Here's a better idea: iterate over the list once only:
seen = set()
for node in IN:
coords = tuple(node.coordinates)
if coords in seen:
SN.append(node.label)
else:
seen.add(coords)
Hope this helps.

Steven
Tim Chase:
> for i in xrange(len(IN)): for j in xrange(i+1, len(IN)): if IN[i].coordinates == IN[j].coordinates: SN.append(IN[i].label)
If my college algorithms memory serves me sufficiently, this reduces your O(N^2) to O(N log N) which will garner you some decent time savings.
That's O(n^2) still, it's just half matrix, a triangle.
Ah, good catch as I'm thinking about it more, you're right...it's
O((N^2)/2) which is just O(N^2). Sigh. I'm not awake enough
yet. However, dividing the time by 2 (from 15hr to 7.5hr) is
still a significant savings in my book :)
However, if listcomprehensions are faster as well, you might be
able to do something like
SN = [d.label
for (i,d) in enumerate(IN)
for j in xrange(i+1, len(IN))
if d.coordinates == IN[j].coordinates
]
or the slightly more verbose (but closer to my original code) version
SN = [IN[i].label
for i in xrange(len(IN))
for j in xrange(i+1, len(IN))
if IN[i].coordinates == IN[j].coordinates
]
To the OP: As always, throw some timing code on the various
samples you get back from the list and see what works for you.
tkc
On Thu, 20080918 at 07:57 0500, Tim Chase wrote:
Code: Select all
for i in range(len(IN)): #scan all elements of the list IN
for j in range(len(IN)):
if i <j:
if IN[i].coordinates[0] == IN[j].coordinates[0]:
if IN[i].coordinates[1] == IN[j].coordinates[1]:
SN.append(IN[i].label)
Such changes might look something like
for i in xrange(len(IN)):
for j in xrange(i+1, len(IN)):
if IN[i].coordinates == IN[j].coordinates:
SN.append(IN[i].label)
If you aren't checking j values less than i, you might want to do both
SN.append(IN[i].label)
SN.append(IN[j].label)
on the same pass.
If my college algorithms memory serves me sufficiently, this
reduces your O(N^2) to O(N log N) which will garner you some
decent time savings.
tkc
 http://mail.python.org/mailman/listinfo/pythonlist
On Sep 18, 8:25 am, Alexzive <zasaconsult...@gmail.comwrote:
Hello there :) ,
I am a python newbie and need to run following code for a task in an
external simulation programm called "Abaqus" which makes use of python
to access the mesh (ensamble of nodes with xy coordinates) of a
certain geometrical model.
[IN is the starting input containing the nodes to be check, there are
some double nodes with the same x and y coordinates which need to be
removed. SN is the output containing such double nodes]
Code: Select all
for i in range(len(IN)): #scan all elements of the list IN
for j in range(len(IN)):
if i <j:
if IN[i].coordinates[0] == IN[j].coordinates[0]:
if IN[i].coordinates[1] == IN[j].coordinates[1]:
SN.append(IN[i].label)
Unfortunately my len(IN) is about 100.000 and the running time about
15h !!!! :(
Any idea to improve it?
I have already tried to group the "if statements" in a single one:
Code: Select all
if i <j and if IN[i].coordinates[0] == IN[j].coordinates[0] and
if IN[i].coordinates[1] == IN[j].coordinates[1]:
but no improvements.
Many thanks, Alex
dup=set()
SN=[]
for item in IN:
c=item.coordinates[0], item.coordinates[1]
if c in dup:
SN.append(item.label)
else:
dup.add(c)
psyco might help a fair bit (10x40x) here ;>
perhaps look at dumping the data into sqlite then pulling it back out.
It (or the other databases) are designed for tossing around large lumps
of data.
Alexzive wrote:
Hello there :) ,
I am a python newbie and need to run following code for a task in an
external simulation programm called "Abaqus" which makes use of python
to access the mesh (ensamble of nodes with xy coordinates) of a
certain geometrical model.
[IN is the starting input containing the nodes to be check, there are
some double nodes with the same x and y coordinates which need to be
removed. SN is the output containing such double nodes]
Code: Select all
for i in range(len(IN)): #scan all elements of the list IN
for j in range(len(IN)):
if i <j:
if IN[i].coordinates[0] == IN[j].coordinates[0]:
if IN[i].coordinates[1] == IN[j].coordinates[1]:
SN.append(IN[i].label)
Unfortunately my len(IN) is about 100.000 and the running time about
15h !!!! :(
Any idea to improve it?
I have already tried to group the "if statements" in a single one:
Code: Select all
if i <j and if IN[i].coordinates[0] == IN[j].coordinates[0] and
if IN[i].coordinates[1] == IN[j].coordinates[1]:
but no improvements.
Many thanks, Alex
 http://mail.python.org/mailman/listinfo/pythonlist

Vapour Forge
Jake Anderson
Project Manager
Mobile: 0412 897 125
Email: ja**@vapourforge.com
Web Page: www.vapourforge.com
Your source for custom IT services
On Sep 18, 11:18*am, prueba...@latinmail.com wrote:
dup=set()
SN=[]
for item in IN:
* *c=item.coordinates[0], item.coordinates[1]
* *if c in dup:
* * * SN.append(item.label)
* *else:
* * * dup.add(c)
+1 for O(N)
If item.coordinates is just an (x, y) pair, you can skip building c
and save a little memory:
seen_coords = set()
for node in IN:
if node.coordinates in seen_coords:
SN.append(node.label)
else:
seen_coords.add(node.coordinates)
seen_coords gets populated with references to the existing
node.coordinates objects, instead of new tuples.
Geoff GT
On Thu, 18 Sep 2008 Alexzive wrote:
>I am a python newbie and need to run following code for a task in an external simulation programm called "Abaqus" which makes use of python to access the mesh (ensamble of nodes with xy coordinates) of a certain geometrical model.
[IN is the starting input containing the nodes to be check, there are some double nodes with the same x and y coordinates which need to be removed. SN is the output containing such double nodes]
Code: Select all
for i in range(len(IN)): #scan all elements of the list IN
for j in range(len(IN)):
if i <j:
if IN[i].coordinates[0] == IN[j].coordinates[0]:
if IN[i].coordinates[1] == IN[j].coordinates[1]:
SN.append(IN[i].label)
I did not test the syntax, but here is an idea with sorted lists.
It should be O(NlogN).
def sk(x):
return x.coordinates[0]
IN.sort(key=sk)
for i in xrange(len(IN)):
for j in xrange(i+1, len(IN)):
if IN[i].coordinates[0] == IN[j].coordinates[0]:
if IN[i].coordinates[1] == IN[j].coordinates[1]:
SN.append(IN[i].label)
else:
break
Harald
On Sep 18, 2:25*pm, Alexzive <zasaconsult...@gmail.comwrote:
Hello there :) ,
I am a python newbie and need to run following code for a task in an
external simulation programm called "Abaqus" which makes use of python
to access the mesh (ensamble of nodes with xy coordinates) of a
certain geometrical model.
[IN is the starting input containing the nodes to be check, there are
some double nodes with the same x and y coordinates which need to be
removed. SN is the output containing such double nodes]
Code: Select all
* * for i in range(len(IN)): #scan all elements of the list IN
* * * for j in range(len(IN)):
* * * * if i <j:
* * * * *if IN[i].coordinates[0] == IN[j].coordinates[0]:
* * * * * *if IN[i].coordinates[1] == IN[j].coordinates[1]:
* * * * * * * SN.append(IN[i].label)
Unfortunately my len(IN) is about 100.000 and the running time about
15h !!!! :(
Any idea to improve it?
I have already tried to group the "if statements" in a single one:
Code: Select all
* * if i <j and if IN[i].coordinates[0] == IN[j].coordinates[0]and
if IN[i].coordinates[1] == IN[j].coordinates[1]:
A simple O(N) algorithm:
from collections import defaultdict
d = defaultdict(int)
for x in IN:
d[x] += 1
SN = [x for (x, c) in d.iteritems() if c 1]

Paul Hankin
On Sep 18, 2:25*pm, Alexzive <zasaconsult...@gmail.comwrote:
Hello there :) ,
I am a python newbie and need to run following code for a task in an
external simulation programm called "Abaqus" which makes use of python
to access the mesh (ensamble of nodes with xy coordinates) of a
certain geometrical model.
[IN is the starting input containing the nodes to be check, there are
some double nodes with the same x and y coordinates which need to be
removed. SN is the output containing such double nodes]
Code: Select all
* * for i in range(len(IN)): #scan all elements of the list IN
* * * for j in range(len(IN)):
* * * * if i <j:
* * * * *if IN[i].coordinates[0] == IN[j].coordinates[0]:
* * * * * *if IN[i].coordinates[1] == IN[j].coordinates[1]:
* * * * * * * SN.append(IN[i].label)
Using only a little extra storage to compute IN (but O(N log N)
time complexity):
import itertools
IN.sort()
SN = [k for k, v in itertools.groupby(IN) if len(list(v)) 1]

Paul Hankin
Bruno Desthuilliers:
def doubles9():
...
SN = []
sn_append = SN.append
Few more:
import psyco
def doubles10():
dup = set()
SN = []
for item in IN:
c = item.coordinates
if c in dup:
SN.append(item)
else:
dup.add(c)
return SN
psyco.bind(doubles10)
from collections import deque
def doubles11():
dup = set()
dup_add = dup.add
SN = deque()
sn_append = SN.append
for item in IN:
c = item.coordinates
if c in dup:
sn_append(item)
else:
dup_add(c)
return SN
def doubles12():
dup = set()
SN = deque()
for item in IN:
c = item.coordinates
if c in dup:
SN.append(item)
else:
dup.add(c)
return SN
psyco.bind(doubles12)
Timings:
doubles00 : 0.522365288653
doubles01 : 0.247219812198
doubles02 : 0.237889823898
doubles03 : 0.238638921389
doubles04 : 0.23821698217
doubles05 : 0.177042495425
doubles06 : 0.13166199162
doubles08 : 0.00569725197252
doubles09 : 0.00418566685667
doubles10 : 0.00192086920869
doubles11 : 0.00403324533245
doubles12 : 0.00184026840268
Hopefully that's fast enough :)
Bye,
bearophile
On Thu, 18 Sep 2008 20:43:00 +0200, Bruno Desthuilliers wrote:
Now the obvious winner is pruebono  even unoptimized, using sets seems
to be *way* faster than even the most optimized corrected version of
your algorithm.
I'm not being snarky about losing priority here, but I submitted
essentially the same solution two hours earlier than pruebono.
Are people not seeing my posts? Have I been killfiled by everyone except
Mensator? I also asked a question about HTTPError and I haven't seen any
responses at all.

Steven
bearophile:
doubles12 : 0.00184026840268
For fun, and to have an almost baseline reference, I have done a
little benchmark in D too:
import std.stdio: writefln;
// Using Tango std lib you don't need all this
version (Win32) {
import std.c.windows.windows;
double clock() {
long t;
QueryPerformanceCounter(&t);
return cast(double)t / queryPerformanceFrequency;
}
long queryPerformanceFrequency;
static this() {
QueryPerformanceFrequency(&queryPerformanceFrequen cy);
}
}
union N {
struct { int x, y; }
long xy;
}
auto IN = [
N(4, 9), N(5, 0), N(6, 6), N(7, 2), N(3, 6), N(9, 6), N(0, 1), N(1,
6),
N(0, 5), N(1, 2), N(8, 9), N(5, 4), N(1, 6), N(7, 6), N(9, 1), N(7,
6),
N(0, 1), N(7, 4), N(7, 4), N(8, 4), N(8, 4), N(3, 5), N(9, 6), N(6,
1),
N(3, 4), N(4, 5), N(0, 5), N(6, 3), N(2, 4), N(1, 6), N(9, 5), N(1,
2),
N(5, 8), N(8, 5), N(3, 1), N(9, 4), N(9, 4), N(3, 3), N(4, 8), N(9,
7),
N(8, 4), N(6, 2), N(1, 5), N(5, 8), N(8, 6), N(0, 8), N(5, 2), N(3,
4),
N(0, 5), N(4, 4), N(2, 9), N(7, 7), N(1, 0), N(4, 2), N(5, 7), N(0,
4),
N(2, 5), N(0, 8), N(7, 3), N(9, 1), N(0, 4), N(5, 0), N(4, 9), N(0,
6),
N(3, 0), N(3, 0), N(3, 9), N(8, 3), N(7, 9), N(8, 5), N(7, 6), N(1,
5),
N(0, 6), N(5, 9), N(6, 8), N(0, 0), N(4, 1), N(3, 3), N(5, 4), N(5,
3),
N(6, 1), N(5, 4), N(4, 5), N(5, 8), N(4, 1), N(3, 6), N(1, 9), N(0,
5),
N(6, 5), N(5, 5), N(6, 0), N(0, 9), N(2, 6), N(0, 7), N(5, 9), N(7,
3),
N(7, 9), N(5, 4), N(4, 9), N(2, 9)
];
N[] doubles13() {
size_t[N] dup; // used as set
N[] SN;
foreach (item; IN) {
if (item in dup)
SN ~= item;
else
dup[item] = 0;
}
return SN;
}
N[] doubles0() {
N[] SN;
for (int i; i < IN.length; i++)
for (int j; j < IN.length; j++)
if (i != j)
if (IN[i] == IN[j])
SN ~= IN[i];
return SN;
}
void test_results() {
size_t[N] set1, set2;
foreach (n; doubles0())
set1[n] = 0;
foreach (n; doubles13())
set2[n] = 0;
if (set1.keys.sort != set2.keys.sort)
throw new Error("ERROR");
}
void main() {
int n = 150_000;
test_results();
int count = n;
auto t0 = clock();
while (count)
doubles13();
auto t1 = clock();
writefln("%0.10f", cast(double)(t1  t0) / n);
}
doubles13() needs 0.0000270.000037 seconds (*), about 6075 times
faster than doubles12, this means about 34 seconds instead of 15h (on
the original computer).
Using C++ with GCC (using a <hash_mapfor dup and a <vectorfor SN)
you can probably go 1040% faster still :)
(*) Probably there's such variability of timings because the current
DMD compiler I have used doesn't add "align" instructions in the asm
it produces as GCC does. With modern CPUs very sensitive to code
alignment this leads to even 2030% runtime differences in small tight
loops.
Bye,
bearophile
Steven D'Aprano wrote:
On Thu, 18 Sep 2008 20:43:00 +0200, Bruno Desthuilliers wrote:
>Now the obvious winner is pruebono  even unoptimized, using sets seems to be *way* faster than even the most optimized corrected version of your algorithm.
I'm not being snarky about losing priority here, but I submitted
essentially the same solution two hours earlier than pruebono.
Are people not seeing my posts? Have I been killfiled by everyone except
Mensator? I also asked a question about HTTPError and I haven't seen any
responses at all.
Yes, after figuring out what to do from the original post, I saw yours
and then Pruebono's and decided that since two people had submitted the
jackpot algorithm, I need not say more. I will say this: this solution
amounts to finding equivalence classes (the sets of items with a given
'key') and then finding the classes (sets) with more than one member.
Defaultdict is great for this.
tjr
En Fri, 19 Sep 2008 02:11:51 0300, Tino Wildenhain <ti**@wildenhain.de>
escribió:
Also I never saw a list where the threading often goes wrong like
this here  is there any special setup or is it just peoples MUA
which screws up?
Perhaps it's due to the newsgroup/list duality...

Gabriel Genellina
Bruno Desthuilliers wrote:
Alexzive a écrit :
>Hello there :) ,
I am a python newbie and need to run following code for a task in an external simulation programm called "Abaqus" which makes use of python to access the mesh (ensamble of nodes with xy coordinates) of a certain geometrical model.
[IN is the starting input containing the nodes to be check, there are some double nodes with the same x and y coordinates which need to be removed. SN is the output containing such double nodes]
Code: Select all for i in range(len(IN)): #scan all elements of the list IN for j in range(len(IN)): if i <j: if IN[i].coordinates[0] == IN[j].coordinates[0]: if IN[i].coordinates[1] == IN[j].coordinates[1]: SN.append(IN[i].label) Unfortunately my len(IN) is about 100.000 and the running time about 15h !!!! :(
Any idea to improve it?
A couple ones have been submitted. Harald gets a point about the
redundant tests (even if his solution seems to be broken, cf below) 
your inner loop should have looked like :
for j in xrange(i+1, len(IN))
Now the obvious winner is pruebono  even unoptimized, using sets seems
to be *way* faster than even the most optimized corrected version of
your algorithm.
Here's a quick bench  please everyone doublecheck it to make sure it's ok:
<snip code>
Results here (py2.5, gentoo linux, athlonxp1800, 512 ram):
>>test_results()
True
>>test_times()
doubles0 : 1.55667901039
doubles1 : 0.719144105911
doubles2 : 0.703393936157
doubles3 : 0.700654983521
doubles4 : 0.706257104874
doubles5 : 0.528184890747
doubles6 : 0.461633205414
doubles8 : 0.0134379863739
doubles9 : 0.0108540058136
>>>
Not surprisingly, half less iterations makes for half less time.
Aliasing, as often, proves to be a good optimization too. But obviously,
using the correct data structure / algorithm combo is the key : simpler
code, and 115 times faster (143 times with aliasing). If pruebono
solution's is correct (and it as AFAICT), your 15 hours computation
should by now take less than 10 minutes...
Ubuntu 8.04 core2 2.6(i think)
without psycho
doubles0 : 0.610555171967
doubles1 : 0.29314494133
doubles2 : 0.286273956299
doubles3 : 0.281984090805
doubles4 : 0.28240609169
doubles5 : 0.207377910614
doubles6 : 0.156388044357
doubles8 : 0.00533080101013
doubles9 : 0.00458884239197
with psycho
doubles0 : 0.127684116364
doubles1 : 0.069571018219
doubles2 : 0.064826965332
doubles3 : 0.0702300071716
doubles4 : 0.0647261142731
doubles5 : 0.0522589683533
doubles6 : 0.0437579154968
doubles8 : 0.00190806388855
doubles9 : 0.00214099884033
On this small test its a variance between ~6x to 2X still its basically
free so why not ;>
On Sep 18, 7:42*pm, Steven D'Aprano <st...@REMOVETHIS
cybersource.com.auwrote:
I'm not being snarky about losing priority here, but I submitted
essentially the same solution two hours earlier than pruebono.
My apologies (seriosuly). In this case, I think it might just be
haste. For what it's worth, I saw your post (on Google Groups), but I
skipped over it. You wrote two solutions, one slow and one fast (the
latter being the same as pruebono's). You put the slow one at the
top, I saw
for ...
for ...
and went straight to the next message without reading the better
solution. I knew that there was only one for loop necessary, so I
didn't bother reading on. Actually, I missed pruebono's post, too,
until after I figured it out myself (but before I posted).
That several people came up with the nigh exact same solution, modulo
variable names only, says something about the Zen of Python.
Geoff GT
On Thu, 18 Sep 2008 Bruno Desthuilliers wrote:
># Harald : uncomment this and run test_results. As far as I can tell, it # doesn't yields the expected results
## IN7 = IN[:] ## def sortk7(n): ## return n.coordinates[0]
## def doubles7(): ## "is ordering better ?  Nope Sir, it's broken..." ## IN7.sort(key=sortk) ## SN = [] ## sn_append = SN.append ## in_len = len(IN) ## for i in xrange(in_len): ## node_i = IN[i] ## coords_i = node_i.coordinates ## for j in xrange(i+1, in_len): ## if coords_i[0] == IN[j].coordinates[0]: ## if coords_i[1] == IN[j].coordinates[1]: ## sn_append(node_i) ## else: ## break ## return SN
....
>Results here (py2.5, gentoo linux, athlonxp1800, 512 ram):
>test_results()
True
>test_times()
doubles0 : 1.55667901039 doubles1 : 0.719144105911 doubles2 : 0.703393936157 doubles3 : 0.700654983521 doubles4 : 0.706257104874 doubles5 : 0.528184890747 doubles6 : 0.461633205414 doubles8 : 0.0134379863739 doubles9 : 0.0108540058136
When you change my code then do it right. :)
You forgot to change the IN to IN7 at _every_ place.
And the sortk should be sortk7 in _both_ places.
I never let the code run before myself. I just wrote it
in the newsreader. But now i did and I have a second
version as bonus.
IN7 = IN[:]
def sortk7(n):
return n.coordinates[0], n.coordinates[1]
def doubles7():
IN7.sort(key=sortk7)
SN = []
sn_append = SN.append
in_len = len(IN7)
for i in xrange(in_len):
node_i = IN7[i]
coords_i = node_i.coordinates
for j in xrange(i+1, in_len):
if coords_i[0] == IN7[j].coordinates[0]:
if coords_i[1] == IN7[j].coordinates[1]:
sn_append(node_i)
else:
break
return SN
def comp7( x, y ):
return cmp( x.coordinates, y.coordinates )
def doubles7a():
IN7.sort( comp7 )
SN = []
sn_append = SN.append
in_len = len(IN7)
for i in xrange(in_len):
node_i = IN7[i]
for j in xrange(i+1, in_len):
node_j = IN7[j]
if comp7( node_i, node_j ) == 0:
sn_append(node_i)
else:
break
return SN
Here are the results. (py2.5, WindowsXP, Pentium4, 2.6GHz, 1.5GB):
My version is not so bad.
doubles0 : 1.03830598582
doubles1 : 0.47943719104
doubles2 : 0.487412506338
doubles3 : 0.475924733451
doubles4 : 0.466548681466
doubles5 : 0.340487967046
doubles6 : 0.278480365521
doubles7 : 0.0953190978183
doubles7a : 0.0784233750379
doubles8 : 0.010236496538
doubles9 : 0.00742803903848
Harald
On Sep 18, 7:42 pm, Steven D'Aprano <st...@REMOVETHIS
cybersource.com.auwrote:
On Thu, 18 Sep 2008 20:43:00 +0200, Bruno Desthuilliers wrote:
Now the obvious winner is pruebono  even unoptimized, using sets seems
to be *way* faster than even the most optimized corrected version of
your algorithm.
I'm not being snarky about losing priority here, but I submitted
essentially the same solution two hours earlier than pruebono.
Are people not seeing my posts? Have I been killfiled by everyone except
Mensator? I also asked a question about HTTPError and I haven't seen any
responses at all.

Steven
I see your post now, but I didn't notice it at the time I posted.
Could be because I am using the noticeable bad Google groups interface
that is known for being unreliable. Duplicate solutions on Usenet are
almost a given and I consider duplicate solutions a good thing, it
means that other people will be able to understand that code.
In any case I am not here for glory I am posting under a pseudonym so
nobody discovers that I am slacking off at work reading Usen[carrier
lost]
Gabriel Genellina wrote:
En Fri, 19 Sep 2008 02:11:51 0300, Tino Wildenhain <ti**@wildenhain.de>
escribió:
>Also I never saw a list where the threading often goes wrong like this here  is there any special setup or is it just peoples MUA which screws up?
Perhaps it's due to the newsgroup/list duality...
Actually, the situation is a c.l.p <==pythonlist <==>
gmane.c.p.general triality.
On Sat, 20 Sep 2008 19:01:42 +0200, Bruno Desthuilliers wrote:
Once again, sorry
if me missing your correct answer drives you paranoid :)
What do you mean by that? How many other people have been talking about
me?
*wink*

Steven
On Sep 20, 9:20*pm, Steven D'Aprano <st...@REMOVETHIS
cybersource.com.auwrote:
On Sat, 20 Sep 2008 19:01:42 +0200, Bruno Desthuilliers wrote:
Once again, sorry
if me missing your correct answer drives you paranoid :)
What do you mean by that? How many other people have been talking about
me?
*wink*

Steven
Why, no fewer than usual!
*wink*
km wrote:
how abt this ?
N = len(IN)
for k in range(N):
for j in range(N):
if j >= k: # or k <= j
doSomething()
This has the root problem that the "if" statement is evaluated
N*N times, which is ugly/slow O(N^2) behavior. My solution
managed to reduce it by a constant multiplier, but several folks
proposed a more elegant O(N) solution which was leaps & bounds
faster.
tkc This discussion thread is closed Replies have been disabled for this discussion. Similar topics
8 posts
views
Thread by Hans Georg Krauthaeuser 
last post: by

6 posts
views
Thread by ad 
last post: by

29 posts
views
Thread by Olaf Baeyens 
last post: by

2 posts
views
Thread by Irfan Akram 
last post: by

9 posts
views
Thread by Vjay77 
last post: by

5 posts
views
Thread by adhag 
last post: by

11 posts
views
Thread by perspolis 
last post: by

30 posts
views
Thread by Alf P. Steinbach 
last post: by

10 posts
views
Thread by Bernhard Reinhardt 
last post: by

3 posts
views
Thread by Jeff 
last post: by
          