467,219 Members | 1,073 Online
Bytes | Developer Community
Ask Question

Home New Posts Topics Members FAQ

Post your question to a community of 467,219 developers. It's quick & easy.

help in debugging file.seek, file.read

I have a wierd sort of problem.
I'm writing a bunch of sets to a file (each element is a fixed length

I was originally using the built-in sets type but due to a processing
issue, I had to shift to a Python implementation (see
I'm using Raymond Hettinger's very graciously provided TransactionSet
recipe from http://aspn.activestate.com/ASPN/Coo.../Recipe/511496

My old code was pretty speedy but the new code is quite slow and
according to some hotshot profiles it has nothing to do with the new
TransactionSet. *Some* of the file.seek and file.read calls
occasionally block for insane amounts (>10s) of time when reading no
more than 45 bytes of data from the file. So when I'm running load-
tests, this eventually happens like so:

Each i_id statement is the time taken for 100 successive commits.
Each RTH (Read Table Header) statement is the time taken for a single
read call for 45 bytes of data
hdr = os.read(f.fileno(), sz__TABLE_HEADER_FORMAT__)
#hdr = f.read(sz__TABLE_HEADER_FORMAT__) # Tried this as well

Loading items...
i_id: 0 0.000111103057861
i_id: 100 1.01557397842
i_id: 200 1.14013886452
i_id: 300 1.16142892838
i_id: 400 1.16356801987
i_id: 500 1.36410307884
i_id: 600 1.34421014786
i_id: 700 1.30385017395
i_id: 800 1.48079919815
i_id: 900 1.41147589684
RTH: 0.582525968552
RTH: 2.77490496635
i_id: 1000 5.16863512993
i_id: 1100 1.73725795746
i_id: 1200 1.56621193886
i_id: 1300 1.81338000298
i_id: 1400 1.69464302063
i_id: 1500 1.74725604057
i_id: 1600 2.35222291946
i_id: 1700 1.85096788406
i_id: 1800 2.20518493652
i_id: 1900 1.94831299782
i_id: 2000 2.03350806236
i_id: 2100 2.32529306412
i_id: 2200 2.44498205185
RTH: 0.105868816376
i_id: 2300 3.65522289276
i_id: 2400 4.2119910717
i_id: 2500 4.21354198456
RTH: 0.115046024323
RTH: 0.122591972351
RTH: 2.88115119934
RTH: 10.5908679962
i_id: 2600 18.8498170376
i_id: 2700 2.42577004433
i_id: 2800 2.47392010689
i_id: 2900 2.88293218613

So I have no idea why this is happening (it is also happening with
seek operations).
Any guidance on how to debug this?


Apr 29 '07 #1
  • viewed: 1297
2 Replies
Sorry, I forgot to mention - the RTH line only prints when the time
taken is 0.1 second (so that I don't pollute the output with other
calls that complete normally)

Apr 29 '07 #2
On Apr 30, 3:20 am, Prateek <sure...@gmail.comwrote:
Sorry, I forgot to mention - the RTH line only prints when the time
taken is 0.1 second (so that I don't pollute the output with other
calls that complete normally)
I have some more information on this problem. Turns out the issue is
with buffer syncing.
I was already doing a flush operation on the file after every commit
(which flushes the user buffers).

I added an os.fsync call which fixed the erratic behavior. But the
code is still horribly slow... 122s vs around 100s (without the
Since fsync flushes the kernel buffers, I'm assuming this has
something to do with my O/S (Mac OS 10.4.9 Intel - Mac Book Pro
2.33Ghz Core 2 Duo, 2GB RAM).

Can anybody help?

Apr 30 '07 #3

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

7 posts views Thread by Grumfish | last post: by
5 posts views Thread by simon place | last post: by
6 posts views Thread by Neil Patel | last post: by
2 posts views Thread by DAVE P | last post: by
19 posts views Thread by Lee Crabtree | last post: by
4 posts views Thread by MikeJ | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.