please help with optimisation of this code - update of given table according to another table

Farraige

Hi I need your help...
I am implementing the method that updates given table (table is
represented as list of lists of strings) according to other table (some
kind of merging)...

This method takes following arguments:
t1 - table we would like to update
t2 - table we would like to take data
from
keyColumns - list of key indexes e.g. [0,1]
columnsToBeUpdated - list of column indexes we would like to update in
our table T1 e.g [2,4]

Let's say we have a table T1:

A B C D E
---------------
1 4 5 7 7
3 4 0 0 0

and we call a method mergeTable(T1, T2, [0,1], [2,4])

It means that we would like to update columns C and E of table T1 with
data from table T2 but only in case the key columns A and B are equal
in both tables.... I grant that the given key is unique in both tables
so if I find a row with the same key in table T2 I do merging, stop and
go to next row in table T1...

Let's say T2 looks following:

A B C D E
---------------
2 2 8 8 8
1 4 9 9 9

So after execution of our mergeTable method, the table T1 should look
like :

A B C D E
1 4 9 7 9
3 4 0 0 0

The 2nd row ['3', '4', '0' ,'0', '0'] didn't change because there was
no row in table T2 with key = 3 ,4

The main part of my algorithm now looks something like ...

merge(t1, t2, keyColumns, columnsToBeUpdated)

........

for row_t1 in t1:
for row_t2 in t2:
if [row_t1[i] for i in keyColumns] == [row_t2[j] for j
in keyColumns]:
# the keys are the same
for colName in columnsToBeUpdated:
row_t1[colName] = row_t2[colName]

# go outside the inner loop - we found a row with
# the same key in the table
break

In my algorithm I have 2 for loops and I have no idea how to optimise
it (maybe with map? )
I call this method for very large data and the performance is a
critical issue for me :(

I will be grateful for any ideas
Thanks in advance!

Nov 8 '06 #1

Subscribe Reply

1859

Marc 'BlackJack' Rintsch

In <11*********************@b28g2000cwb.googlegroups. com>, Farraige wrote:

Let's say we have a table T1:

A B C D E
---------------
1 4 5 7 7
3 4 0 0 0

and we call a method mergeTable(T1, T2, [0,1], [2,4])

It means that we would like to update columns C and E of table T1 with
data from table T2 but only in case the key columns A and B are equal
in both tables.... I grant that the given key is unique in both tables
so if I find a row with the same key in table T2 I do merging, stop and
go to next row in table T1...

Let's say T2 looks following:

A B C D E
---------------
2 2 8 8 8
1 4 9 9 9

So after execution of our mergeTable method, the table T1 should look
like :

A B C D E
1 4 9 7 9
3 4 0 0 0

The 2nd row ['3', '4', '0' ,'0', '0'] didn't change because there was
no row in table T2 with key = 3 ,4

The main part of my algorithm now looks something like ...

merge(t1, t2, keyColumns, columnsToBeUpdated)

.......

for row_t1 in t1:
for row_t2 in t2:
if [row_t1[i] for i in keyColumns] == [row_t2[j] for j
in keyColumns]:
# the keys are the same
for colName in columnsToBeUpdated:
row_t1[colName] = row_t2[colName]

# go outside the inner loop - we found a row with
# the same key in the table
break

In my algorithm I have 2 for loops and I have no idea how to optimise
it (maybe with map? )
I call this method for very large data and the performance is a
critical issue for me :(

Just go through the first table once and build a mapping key->row and then
go through the second table once and look for each row if the key is in
the mapping. If yes: update columns. This runs in O(2*rows) instead if
O(rows**2).

def update_table(table_a, table_b, key_columns, columns_to_be_updated):
def get_key(row):
return tuple(row[x] for x in key_columns)

key2row = dict((get_key(row), row) for row in table_a)
for row in table_b:
row_to_be_updated = key2row.get(get_key(row))
if row_to_be_updated is not None:
for column in columns_to_be_updated:
row_to_be_updated[column] = row[column]
def main():
table_a = [[1, 4, 5, 7, 7],
[3, 4, 0, 0, 0]]
table_b = [[2, 2, 8, 8, 8],
[1, 4, 9, 9, 9]]
update_table(table_a, table_b, (0, 1), (2, 4))
for row in table_a:
print row

Ciao,
Marc 'BlackJack' Rintsch

Nov 8 '06 #2

Antoon Pardon

On 2006-11-08, Farraige <fa******@go2.plwrote:

>
...

The main part of my algorithm now looks something like ...

merge(t1, t2, keyColumns, columnsToBeUpdated)

.......

for row_t1 in t1:
for row_t2 in t2:
if [row_t1[i] for i in keyColumns] == [row_t2[j] for j
in keyColumns]:
# the keys are the same
for colName in columnsToBeUpdated:
row_t1[colName] = row_t2[colName]

# go outside the inner loop - we found a row with
# the same key in the table
break

In my algorithm I have 2 for loops and I have no idea how to optimise
it (maybe with map? )
I call this method for very large data and the performance is a
critical issue for me :(

I will be grateful for any ideas

One idea would be to precompute the list comprehensions in the if test.

p2 = [[row_t2[i] for i in keyColums] for row_t2 in t2]
for row_t1 in t1:
proj1 = [row_t1[i] for i in keyColumns]
for row_t1, proj2 in izip(t2, p2):
if proj1 == proj2:
...
--
Antoon Pardon

Nov 8 '06 #3

Gabriel Genellina

At Wednesday 8/11/2006 07:18, Farraige wrote:

for row_t1 in t1:
for row_t2 in t2:
if [row_t1[i] for i in keyColumns] == [row_t2[j] for j
in keyColumns]:
# the keys are the same
for colName in columnsToBeUpdated:
row_t1[colName] = row_t2[colName]

# go outside the inner loop - we found a row with
# the same key in the table
break

In my algorithm I have 2 for loops and I have no idea how to optimise
it (maybe with map? )
I call this method for very large data and the performance is a
critical issue for me :(

def getkey(row):
return tuple([row[i] for i in keyColumns])

Sort both tables using .sort(key=getkey); then traverse both in
paralell like in classic merge; in 1 pass you get the job done (not
counting the two initial sorts)
--
Gabriel Genellina
Softlab SRL

__________________________________________________
Correo Yahoo!
Espacio para todos tus mensajes, antivirus y antispam ¡gratis!
¡Abrí tu cuenta ya! - http://correo.yahoo.com.ar

Nov 8 '06 #4

Farraige

Thank you all for your very valuable clues!
Thanks to you I got nearly 97% perfomance improvement !!!!!!!!!!!!!!! I
can't believe it :)))
Once again thank you

Best wishes
Farraige

Nov 8 '06 #5

Similar topics

1971

Concurrency issues with "Tree" structures.

by: Robin Tucker | last post by:

Hi, I'm currently implementing a database with a tree structure in a table. The nodes in the tree are stored as records with a column called "Parent". The root of the tree has a "NULL" parent....

Microsoft SQL Server

9584

Starting University COSC and learning JAVA, advice please :D

by: David Van D | last post by:

Hi there, A few weeks until I begin my journey towards a degree in Computer Science at Canterbury University in New Zealand, Anyway the course tutors are going to be teaching us JAVA wth bluej...

Java

3282

Its very very urgent Please....Could you please help.plz help in solving queries

by: jenipriya | last post by:

Hi all... its very urgent.. please........i m a beginner in oracle.... Anyone please help me wit dese codes i hv tried... and correct the errors... The table structures i hav Employee (EmpID,...

Oracle Database

6277

Please Help - Undefined index & Invalid argument supplied for foreach()

by: mattehz | last post by:

Hey there, I am trying to upload old source files and came across these errors: Warning: Invalid argument supplied for foreach() in /home/mattehz/public_html/acssr/trunk/inc_html.php on line 59...

PHP

7207

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

General

7093

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

7291

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

7012

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

Windows Server

7468

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

General

5023

Access Europe - Using VBA to create a class based on a table - Wed 1 May

by: isladogs | last post by:

The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...

Microsoft Access / VBA

3180

Trying to create a lan-to-lan vpn between two differents networks

by: TSSRALBI | last post by:

Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The...

Networking - Hardware / Configuration

1522

transfer the data from one system to another through ip address

by: 6302768590 | last post by:

Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated ...

C# / C Sharp

748

How to add payments to a PHP MySQL app.

by: muto222 | last post by:

How can i add a mobile payment intergratation into php mysql website.

PHP