pytables - best practices / mem leaks

I have an H5 file with one group (off the root) holding two large main
tables, and I'm attempting to aggregate my data into 50+ new groups (off
the root) with two tables per group.

sys info:
PyTables version: 1.3.2
HDF5 version: 1.6.5
numarray version: 1.5.0
Zlib version: 1.2.3
BZIP2 version: 1.0.3 (15-Feb-2005)
Python version: 2.4.2 (#1, Jul 13 2006, 20:16:08)
[GCC 4.0.1 (Apple Computer, Inc. build 5250)]
Platform: darwin-Power Macintosh (v10.4.7)
Byte-ordering: big

Ran all PyTables tests included with the package and received an OK.
Using the following code I get one of three errors:

1. Illegal Instruction

2. Malloc(): trying to call free() twice

3. Bus Error

I believe all three stem from the same issue, a malloc() memory problem
in the PyTables C libraries. I also believe this may be due to how I've
written my sorting script.

The script runs fine until I'm sorting somewhere around group 20 to 30,
and then I hit one of the three errors above, depending on how and when
I flush() and close() the file. When I open the file after the error
using h5ls, all tables are in perfect order up to the crash, and if I
continue from that point everything runs fine until Python throws the
same error again after another 10 sorts or so. The somewhat random
crashing is what leads me to believe I have a memory leak or that my
method of doing this is incorrect.

Is there a better way to aggregate data using PyTables/Python? This
seems straightforward enough.

Thanks,
Conor

# Function to aggregate state data from the main neg/pos tables into
# per-state neg/pos tables.

import tables

def aggstate(state, h5file):

    print state

    class PosRecords(tables.IsDescription):
        sic = tables.IntCol(0, 1, 4, 0, None, 0)
        numsic = tables.IntCol(0, 1, 4, 0, None, 0)
        empsiz = tables.StringCol(1, '?', 1, None, 0)
        salvol = tables.StringCol(1, '?', 1, None, 0)
        popcod = tables.StringCol(1, '?', 1, None, 0)
        state = tables.StringCol(2, '?', 1, None, 0)
        zip = tables.IntCol(0, 1, 4, 0, None, 1)

    class NegRecords(tables.IsDescription):
        sic = tables.IntCol(0, 1, 4, 0, None, 0)
        numsic = tables.IntCol(0, 1, 4, 0, None, 0)
        empsiz = tables.StringCol(1, '?', 1, None, 0)
        salvol = tables.StringCol(1, '?', 1, None, 0)
        popcod = tables.StringCol(1, '?', 1, None, 0)
        state = tables.StringCol(2, '?', 1, None, 0)
        zip = tables.IntCol(0, 1, 4, 0, None, 1)

    # One new group per state, holding a pos table and a neg table.
    group1 = h5file.createGroup("/", state+"_raw_records", state+" raw records")

    table1 = h5file.createTable(group1, "pos_records", PosRecords,
                                state+" raw pos record table")
    table2 = h5file.createTable(group1, "neg_records", NegRecords,
                                state+" raw neg record table")

    # Copy this state's rows out of the main pos table.
    table = h5file.root.raw_records.pos_records
    point = table1.row
    for x in table.iterrows():
        if x['state'] == state:
            point['sic'] = x['sic']
            point['numsic'] = x['numsic']
            point['empsiz'] = x['empsiz']
            point['salvol'] = x['salvol']
            point['popcod'] = x['popcod']
            point['state'] = x['state']
            point['zip'] = x['zip']

            point.append()

    h5file.flush()

    # Copy this state's rows out of the main neg table.
    table = h5file.root.raw_records.neg_records
    point = table2.row
    for x in table.iterrows():
        if x['state'] == state:
            point['sic'] = x['sic']
            point['numsic'] = x['numsic']
            point['empsiz'] = x['empsiz']
            point['salvol'] = x['salvol']
            point['popcod'] = x['popcod']
            point['state'] = x['state']
            point['zip'] = x['zip']

            point.append()

    h5file.flush()

states = ['AL','AK','AZ','AR','CA','CO','CT','DC','DE','FL',
          'GA','HI','ID','IL','IN','IA','KS','KY','LA','ME',
          'MD','MA','MI','MN','MS','MO','MT','NE','NV','NH',
          'NJ','NM','NY','NC','ND','OH','OK','OR','PA','RI',
          'SC','SD','TN','TX','UT','VT','VA','WA','WV','WI',
          'WY']

h5file = tables.openFile("200309_data.h5", mode = 'a')

for state in states:
    aggstate(state, h5file)

h5file.close()

Jul 17 '06 #1

py_genetic wrote:
The problem with my posting above is that h5file.flush() should be
table.flush() (flush the table, not the whole file object). Although
h5file.flush() is an actual method, I don't believe it correctly writes
to the tables; it causes all kinds of issues as time goes on, and I
think it overlaps with .close(), causing more issues. I also now flush
table1 and table2 right after creating the new group and the two tables
on each iteration. Things are stable now. PyTables is great.
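
For anyone landing here later, the fix described above might look roughly like the sketch below. It is only an illustration under a few assumptions: it targets a current PyTables release (3.x) on Python 3, where the methods are spelled open_file, create_group and create_table instead of the openFile, createGroup and createTable used in the thread's 1.3.2 code; the single Record class, the build_demo helper and the three made-up rows are hypothetical stand-ins for the real 200309_data.h5 file. Each destination table is flushed once right after creation and once after its rows are appended, and no file-level flush is used.

```python
import os
import tempfile

import tables


class Record(tables.IsDescription):
    # Same column names as PosRecords/NegRecords in the thread,
    # written with modern column classes (an assumption, see lead-in).
    sic = tables.Int32Col()
    numsic = tables.Int32Col()
    empsiz = tables.StringCol(1)
    salvol = tables.StringCol(1)
    popcod = tables.StringCol(1)
    state = tables.StringCol(2)
    zip = tables.Int32Col()


def build_demo(path):
    """Hypothetical helper: write a tiny /raw_records group with made-up rows."""
    with tables.open_file(path, mode="w") as h5file:
        raw = h5file.create_group("/", "raw_records", "raw records")
        for name in ("pos_records", "neg_records"):
            table = h5file.create_table(raw, name, Record, name)
            row = table.row
            for i, st in enumerate([b"AL", b"AK", b"AL"]):
                row["sic"] = i
                row["numsic"] = i
                row["empsiz"] = b"A"
                row["salvol"] = b"B"
                row["popcod"] = b"C"
                row["state"] = st
                row["zip"] = 10000 + i
                row.append()
            table.flush()


def aggstate(h5file, state):
    """Copy one state's rows into /<state>_raw_records, flushing per table."""
    group = h5file.create_group("/", state + "_raw_records", state + " raw records")
    wanted = state.encode("ascii")  # string columns come back as bytes on Python 3
    counts = {}
    for name in ("pos_records", "neg_records"):
        src = h5file.get_node("/raw_records", name)
        dst = h5file.create_table(group, name, Record, state + " raw " + name)
        dst.flush()  # flush right after creating the table, as in the reply
        row = dst.row
        for x in src.iterrows():
            if x["state"] == wanted:
                for col in dst.colnames:
                    row[col] = x[col]
                row.append()
        dst.flush()  # flush the table itself, not the whole file
        counts[name] = int(dst.nrows)
    return counts


path = os.path.join(tempfile.mkdtemp(), "demo.h5")
build_demo(path)
with tables.open_file(path, mode="a") as h5file:
    results = {}
    for state in ("AL", "AK"):
        results[state] = aggstate(h5file, state)
print(results)
```

Using one description class for both tables is just a shortcut here, since the two classes in the original post are identical. Current PyTables also documents in-kernel selection through Table.where and Table.append_where, which may be worth trying in place of the Python-level loop on large tables.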

Jul 18 '06 #2
