By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
440,480 Members | 1,146 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 440,480 IT Pros & Developers. It's quick & easy.

please help with optimisation of this code - update of given table according to another table

P: n/a
Hi I need your help...
I am implementing the method that updates given table (table is
represented as list of lists of strings) according to other table (some
kind of merging)...

This method takes following arguments:
t1 - table we would like to update
t2 - table we would like to take data
from
keyColumns - list of key indexes e.g. [0,1]
columnsToBeUpdated - list of column indexes we would like to update in
our table T1 e.g [2,4]

Let's say we have a table T1:

A B C D E
---------------
1 4 5 7 7
3 4 0 0 0

and we call a method mergeTable(T1, T2, [0,1], [2,4])

It means that we would like to update columns C and E of table T1 with
data from table T2 but only in case the key columns A and B are equal
in both tables.... I grant that the given key is unique in both tables
so if I find a row with the same key in table T2 I do merging, stop and
go to next row in table T1...

Let's say T2 looks following:

A B C D E
---------------
2 2 8 8 8
1 4 9 9 9

So after execution of our mergeTable method, the table T1 should look
like :

A B C D E
1 4 9 7 9
3 4 0 0 0

The 2nd row ['3', '4', '0' ,'0', '0'] didn't change because there was
no row in table T2 with key = 3 ,4

The main part of my algorithm now looks something like ...

merge(t1, t2, keyColumns, columnsToBeUpdated)

........

for row_t1 in t1:
for row_t2 in t2:
if [row_t1[i] for i in keyColumns] == [row_t2[j] for j
in keyColumns]:
# the keys are the same
for colName in columnsToBeUpdated:
row_t1[colName] = row_t2[colName]

# go outside the inner loop - we found a row with
# the same key in the table
break

In my algorithm I have 2 for loops and I have no idea how to optimise
it (maybe with map? )
I call this method for very large data and the performance is a
critical issue for me :(

I will be grateful for any ideas
Thanks in advance!

Nov 8 '06 #1
Share this Question
Share on Google+
4 Replies


P: n/a
In <11*********************@b28g2000cwb.googlegroups. com>, Farraige wrote:
Let's say we have a table T1:

A B C D E
---------------
1 4 5 7 7
3 4 0 0 0

and we call a method mergeTable(T1, T2, [0,1], [2,4])

It means that we would like to update columns C and E of table T1 with
data from table T2 but only in case the key columns A and B are equal
in both tables.... I grant that the given key is unique in both tables
so if I find a row with the same key in table T2 I do merging, stop and
go to next row in table T1...

Let's say T2 looks following:

A B C D E
---------------
2 2 8 8 8
1 4 9 9 9

So after execution of our mergeTable method, the table T1 should look
like :

A B C D E
1 4 9 7 9
3 4 0 0 0

The 2nd row ['3', '4', '0' ,'0', '0'] didn't change because there was
no row in table T2 with key = 3 ,4

The main part of my algorithm now looks something like ...

merge(t1, t2, keyColumns, columnsToBeUpdated)

.......

for row_t1 in t1:
for row_t2 in t2:
if [row_t1[i] for i in keyColumns] == [row_t2[j] for j
in keyColumns]:
# the keys are the same
for colName in columnsToBeUpdated:
row_t1[colName] = row_t2[colName]

# go outside the inner loop - we found a row with
# the same key in the table
break

In my algorithm I have 2 for loops and I have no idea how to optimise
it (maybe with map? )
I call this method for very large data and the performance is a
critical issue for me :(
Just go through the first table once and build a mapping key->row and then
go through the second table once and look for each row if the key is in
the mapping. If yes: update columns. This runs in O(2*rows) instead if
O(rows**2).

def update_table(table_a, table_b, key_columns, columns_to_be_updated):
def get_key(row):
return tuple(row[x] for x in key_columns)

key2row = dict((get_key(row), row) for row in table_a)
for row in table_b:
row_to_be_updated = key2row.get(get_key(row))
if row_to_be_updated is not None:
for column in columns_to_be_updated:
row_to_be_updated[column] = row[column]
def main():
table_a = [[1, 4, 5, 7, 7],
[3, 4, 0, 0, 0]]
table_b = [[2, 2, 8, 8, 8],
[1, 4, 9, 9, 9]]
update_table(table_a, table_b, (0, 1), (2, 4))
for row in table_a:
print row

Ciao,
Marc 'BlackJack' Rintsch
Nov 8 '06 #2

P: n/a
On 2006-11-08, Farraige <fa******@go2.plwrote:
>
...

The main part of my algorithm now looks something like ...

merge(t1, t2, keyColumns, columnsToBeUpdated)

.......

for row_t1 in t1:
for row_t2 in t2:
if [row_t1[i] for i in keyColumns] == [row_t2[j] for j
in keyColumns]:
# the keys are the same
for colName in columnsToBeUpdated:
row_t1[colName] = row_t2[colName]

# go outside the inner loop - we found a row with
# the same key in the table
break

In my algorithm I have 2 for loops and I have no idea how to optimise
it (maybe with map? )
I call this method for very large data and the performance is a
critical issue for me :(

I will be grateful for any ideas
One idea would be to precompute the list comprehensions in the if test.

p2 = [[row_t2[i] for i in keyColums] for row_t2 in t2]
for row_t1 in t1:
proj1 = [row_t1[i] for i in keyColumns]
for row_t1, proj2 in izip(t2, p2):
if proj1 == proj2:
...
--
Antoon Pardon
Nov 8 '06 #3

P: n/a
At Wednesday 8/11/2006 07:18, Farraige wrote:
for row_t1 in t1:
for row_t2 in t2:
if [row_t1[i] for i in keyColumns] == [row_t2[j] for j
in keyColumns]:
# the keys are the same
for colName in columnsToBeUpdated:
row_t1[colName] = row_t2[colName]

# go outside the inner loop - we found a row with
# the same key in the table
break

In my algorithm I have 2 for loops and I have no idea how to optimise
it (maybe with map? )
I call this method for very large data and the performance is a
critical issue for me :(
def getkey(row):
return tuple([row[i] for i in keyColumns])

Sort both tables using .sort(key=getkey); then traverse both in
paralell like in classic merge; in 1 pass you get the job done (not
counting the two initial sorts)
--
Gabriel Genellina
Softlab SRL

__________________________________________________
Correo Yahoo!
Espacio para todos tus mensajes, antivirus y antispam ˇgratis!
ˇAbrí tu cuenta ya! - http://correo.yahoo.com.ar
Nov 8 '06 #4

P: n/a
Thank you all for your very valuable clues!
Thanks to you I got nearly 97% perfomance improvement !!!!!!!!!!!!!!! I
can't believe it :)))
Once again thank you

Best wishes
Farraige

Nov 8 '06 #5

This discussion thread is closed

Replies have been disabled for this discussion.