473,326 Members | 2,196 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,326 software developers and data experts.

help in debugging file.seek, file.read

I have a wierd sort of problem.
I'm writing a bunch of sets to a file (each element is a fixed length
string).

I was originally using the built-in sets type but due to a processing
issue, I had to shift to a Python implementation (see
http://groups.google.com/group/comp....70083be9a67f6).
I'm using Raymond Hettinger's very graciously provided TransactionSet
recipe from http://aspn.activestate.com/ASPN/Coo.../Recipe/511496

My old code was pretty speedy but the new code is quite slow and
according to some hotshot profiles it has nothing to do with the new
TransactionSet. *Some* of the file.seek and file.read calls
occasionally block for insane amounts (>10s) of time when reading no
more than 45 bytes of data from the file. So when I'm running load-
tests, this eventually happens like so:

Each i_id statement is the time taken for 100 successive commits.
Each RTH (Read Table Header) statement is the time taken for a single
read call for 45 bytes of data
# sz__TABLE_HEADER_FORMAT__ is 45
hdr = os.read(f.fileno(), sz__TABLE_HEADER_FORMAT__)
#hdr = f.read(sz__TABLE_HEADER_FORMAT__) # Tried this as well

Loading items...
i_id: 0 0.000111103057861
i_id: 100 1.01557397842
i_id: 200 1.14013886452
i_id: 300 1.16142892838
i_id: 400 1.16356801987
i_id: 500 1.36410307884
i_id: 600 1.34421014786
i_id: 700 1.30385017395
i_id: 800 1.48079919815
i_id: 900 1.41147589684
RTH: 0.582525968552
RTH: 2.77490496635
i_id: 1000 5.16863512993
i_id: 1100 1.73725795746
i_id: 1200 1.56621193886
i_id: 1300 1.81338000298
i_id: 1400 1.69464302063
i_id: 1500 1.74725604057
i_id: 1600 2.35222291946
i_id: 1700 1.85096788406
i_id: 1800 2.20518493652
i_id: 1900 1.94831299782
i_id: 2000 2.03350806236
i_id: 2100 2.32529306412
i_id: 2200 2.44498205185
RTH: 0.105868816376
i_id: 2300 3.65522289276
i_id: 2400 4.2119910717
i_id: 2500 4.21354198456
RTH: 0.115046024323
RTH: 0.122591972351
RTH: 2.88115119934
RTH: 10.5908679962
i_id: 2600 18.8498170376
i_id: 2700 2.42577004433
i_id: 2800 2.47392010689
i_id: 2900 2.88293218613

So I have no idea why this is happening (it is also happening with
seek operations).
Any guidance on how to debug this?

thanks,
Prateek

Apr 29 '07 #1
2 1508
Sorry, I forgot to mention - the RTH line only prints when the time
taken is 0.1 second (so that I don't pollute the output with other
calls that complete normally)

Apr 29 '07 #2
On Apr 30, 3:20 am, Prateek <sure...@gmail.comwrote:
Sorry, I forgot to mention - the RTH line only prints when the time
taken is 0.1 second (so that I don't pollute the output with other
calls that complete normally)
I have some more information on this problem. Turns out the issue is
with buffer syncing.
I was already doing a flush operation on the file after every commit
(which flushes the user buffers).

I added an os.fsync call which fixed the erratic behavior. But the
code is still horribly slow... 122s vs around 100s (without the
fsync).
Since fsync flushes the kernel buffers, I'm assuming this has
something to do with my O/S (Mac OS 10.4.9 Intel - Mac Book Pro
2.33Ghz Core 2 Duo, 2GB RAM).

Can anybody help?

Apr 30 '07 #3

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

7
by: Grumfish | last post by:
Is there a way to see the next character of an input file object without advancing the position in the file?
5
by: simon place | last post by:
is the code below meant to produce rubbish?, i had expected an exception. f=file('readme.txt','w') f.write(' ') f.read() ( PythonWin 2.3 (#46, Jul 29 2003, 18:54:32) on win32. ) I got...
1
by: wtnt | last post by:
Hello. I've searched all over and haven't seen another thread with this problem. Please bear with me as I try to explain. thanks. :) I have some programs that need to be cross-platform...
6
by: Jamal | last post by:
I am working on binary files of struct ACTIONS I have a recursive qsort/mergesort hybrid that 1) i'm not a 100% sure works correctly 2) would like to convert to iteration Any comments or...
6
by: Neil Patel | last post by:
I have a log file that puts the most recent record at the bottom of the file. Each line is delimited by a \r\n Does anyone know how to seek to the end of the file and start reading backwards?
0
by: Lokkju | last post by:
I am pretty much lost here - I am trying to create a managed c++ wrapper for this dll, so that I can use it from c#/vb.net, however, it does not conform to any standard style of coding I have seen....
2
by: DAVE P | last post by:
below is code snippet what im trying to do is open a file and Read Bytes into a buffer find end of line (like crlf) in this case "~" The buffer will be the bytes up to the "~" then i need to...
19
by: Lee Crabtree | last post by:
Is there a class in the framework that allows me read text from a file in an unbuffered manner? That is, I'd like to be able to read lines in the same manner as StreamReader.ReadLine(), but I also...
4
by: MikeJ | last post by:
make a While loop ofs = TextFileServer("somefile") string srow while (ofs=false) { srow=ofs.getRow(); Console.Writeline(srow); }
0
by: DolphinDB | last post by:
Tired of spending countless mintues downsampling your data? Look no further! In this article, you’ll learn how to efficiently downsample 6.48 billion high-frequency records to 61 million...
0
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
1
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: Vimpel783 | last post by:
Hello! Guys, I found this code on the Internet, but I need to modify it a little. It works well, the problem is this: Data is sent from only one cell, in this case B5, but it is necessary that data...
1
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
1
by: Defcon1945 | last post by:
I'm trying to learn Python using Pycharm but import shutil doesn't work
1
by: Shællîpôpï 09 | last post by:
If u are using a keypad phone, how do u turn on JavaScript, to access features like WhatsApp, Facebook, Instagram....
0
by: af34tf | last post by:
Hi Guys, I have a domain whose name is BytesLimited.com, and I want to sell it. Does anyone know about platforms that allow me to list my domain in auction for free. Thank you
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.