By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
439,985 Members | 1,575 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 439,985 IT Pros & Developers. It's quick & easy.

help in debugging file.seek, file.read

P: n/a
I have a wierd sort of problem.
I'm writing a bunch of sets to a file (each element is a fixed length
string).

I was originally using the built-in sets type but due to a processing
issue, I had to shift to a Python implementation (see
http://groups.google.com/group/comp....70083be9a67f6).
I'm using Raymond Hettinger's very graciously provided TransactionSet
recipe from http://aspn.activestate.com/ASPN/Coo.../Recipe/511496

My old code was pretty speedy but the new code is quite slow and
according to some hotshot profiles it has nothing to do with the new
TransactionSet. *Some* of the file.seek and file.read calls
occasionally block for insane amounts (>10s) of time when reading no
more than 45 bytes of data from the file. So when I'm running load-
tests, this eventually happens like so:

Each i_id statement is the time taken for 100 successive commits.
Each RTH (Read Table Header) statement is the time taken for a single
read call for 45 bytes of data
# sz__TABLE_HEADER_FORMAT__ is 45
hdr = os.read(f.fileno(), sz__TABLE_HEADER_FORMAT__)
#hdr = f.read(sz__TABLE_HEADER_FORMAT__) # Tried this as well

Loading items...
i_id: 0 0.000111103057861
i_id: 100 1.01557397842
i_id: 200 1.14013886452
i_id: 300 1.16142892838
i_id: 400 1.16356801987
i_id: 500 1.36410307884
i_id: 600 1.34421014786
i_id: 700 1.30385017395
i_id: 800 1.48079919815
i_id: 900 1.41147589684
RTH: 0.582525968552
RTH: 2.77490496635
i_id: 1000 5.16863512993
i_id: 1100 1.73725795746
i_id: 1200 1.56621193886
i_id: 1300 1.81338000298
i_id: 1400 1.69464302063
i_id: 1500 1.74725604057
i_id: 1600 2.35222291946
i_id: 1700 1.85096788406
i_id: 1800 2.20518493652
i_id: 1900 1.94831299782
i_id: 2000 2.03350806236
i_id: 2100 2.32529306412
i_id: 2200 2.44498205185
RTH: 0.105868816376
i_id: 2300 3.65522289276
i_id: 2400 4.2119910717
i_id: 2500 4.21354198456
RTH: 0.115046024323
RTH: 0.122591972351
RTH: 2.88115119934
RTH: 10.5908679962
i_id: 2600 18.8498170376
i_id: 2700 2.42577004433
i_id: 2800 2.47392010689
i_id: 2900 2.88293218613

So I have no idea why this is happening (it is also happening with
seek operations).
Any guidance on how to debug this?

thanks,
Prateek

Apr 29 '07 #1
Share this Question
Share on Google+
2 Replies


P: n/a
Sorry, I forgot to mention - the RTH line only prints when the time
taken is 0.1 second (so that I don't pollute the output with other
calls that complete normally)

Apr 29 '07 #2

P: n/a
On Apr 30, 3:20 am, Prateek <sure...@gmail.comwrote:
Sorry, I forgot to mention - the RTH line only prints when the time
taken is 0.1 second (so that I don't pollute the output with other
calls that complete normally)
I have some more information on this problem. Turns out the issue is
with buffer syncing.
I was already doing a flush operation on the file after every commit
(which flushes the user buffers).

I added an os.fsync call which fixed the erratic behavior. But the
code is still horribly slow... 122s vs around 100s (without the
fsync).
Since fsync flushes the kernel buffers, I'm assuming this has
something to do with my O/S (Mac OS 10.4.9 Intel - Mac Book Pro
2.33Ghz Core 2 Duo, 2GB RAM).

Can anybody help?

Apr 30 '07 #3

This discussion thread is closed

Replies have been disabled for this discussion.