Read a binary file and feed it to PyCURL

The problem: I am writing a file-upload utility in Python that uses
the walk() function to traverse a directory, find every file under
it, and upload each one to a remote server through the PyCURL curl
interface. The files are invariably binary, and the upload is done
via an HTTP PUT to the system.
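
For reference, the traversal step described above could look roughly like the sketch below. It assumes Python 3 and uses os.walk() (os.path.walk() exists only in Python 2). The name iter_files is hypothetical and is not part of the original tool.

```python
import os


def iter_files(root):
    """Yield the full path of every file (non-directory entry) under root.

    Hypothetical helper: the name and shape are assumptions for illustration.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)
```

Because this is a generator, each path can be handed to the upload function one at a time, so only one file needs to be open at any moment.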

I also need to perform the reverse - I need to GET those files and
write them to disk.

The problem I am seeing is memory usage. Currently, I call
os.path.walk(dir), and then I call the upload function. The upload
function basically looks like this:

f = open(filepath, "rb")
fs = os.path.getsize(filepath)

c = pycurl.Curl()
c.setopt(c.URL, target_url)
c.setopt(c.HTTPHEADER, ["User-Agent: Load Tool (PyCURL Load Tool)"])
c.setopt(c.PUT, 1)
c.setopt(c.READDATA, f)
c.setopt(c.INFILESIZE, int(fs))
c.setopt(c.NOSIGNAL, 1)
if verbose == 'true':
    c.setopt(c.VERBOSE, 1)
c.body = StringIO()
c.setopt(c.WRITEFUNCTION, c.body.write)
try:
    c.perform()
except:
    import traceback
    traceback.print_exc(file=sys.stderr)
    sys.stderr.flush()
f.close()
c.close()
sys.stdout.write(".")
sys.stdout.flush()

This opens the file via open(), which, as far as I can tell, reads
the file into memory. This, of course, causes problems when the client
machine only has 512 MB of RAM and we're uploading a 2-3 GB file
(setting aside the argument against doing this via HTTP PUT).

Does anyone know a more efficient way to do this? Please also note
that I am measuring metrics for each transfer, so I don't want to
split a file into chunks and upload them separately: I would then
only get metrics for the individual chunks, not for the whole file.
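
One possible direction, sketched under assumptions: open() on its own does not load a file's contents, so the upload can be driven by a read callback that libcurl calls repeatedly with a maximum size. The file is then pulled in bounded pieces while still travelling as a single PUT, so the transfer metrics cover the whole file. The names ChunkedReader and upload_file are hypothetical, the sketch assumes Python 3 with a reasonably recent pycurl, and whether this resolves the memory growth described here is not confirmed (the cause may lie elsewhere).

```python
import os


class ChunkedReader:
    """Wrap an open binary file and count the bytes handed to libcurl."""

    def __init__(self, fileobj):
        self.fileobj = fileobj
        self.bytes_sent = 0

    def read_callback(self, size):
        # libcurl passes the maximum number of bytes it wants right now.
        data = self.fileobj.read(size)
        self.bytes_sent += len(data)
        return data


def upload_file(filepath, target_url):
    """PUT one file in a single transfer; returns (status, response_body)."""
    import pycurl  # imported here so ChunkedReader is usable without pycurl
    from io import BytesIO

    response = BytesIO()
    c = pycurl.Curl()
    with open(filepath, "rb") as f:
        reader = ChunkedReader(f)
        c.setopt(pycurl.URL, target_url)
        c.setopt(pycurl.UPLOAD, 1)  # upload via PUT
        c.setopt(pycurl.READFUNCTION, reader.read_callback)
        # The _LARGE variant is meant for sizes beyond 2 GB.
        c.setopt(pycurl.INFILESIZE_LARGE, os.path.getsize(filepath))
        c.setopt(pycurl.NOSIGNAL, 1)
        c.setopt(pycurl.WRITEFUNCTION, response.write)
        try:
            c.perform()
            # Any getinfo() calls belong here, before close().
            status = c.getinfo(pycurl.RESPONSE_CODE)
        finally:
            c.close()
    return status, response.getvalue()
```

The server's response is still buffered in memory here, which is reasonable only if responses to PUT requests are small.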

The metrics are collected just before the c.close() call:

speed_up = c.getinfo(c.SPEED_UPLOAD)
size_up = c.getinfo(c.SIZE_UPLOAD)
ttime = c.getinfo(c.TOTAL_TIME)
ctime = c.getinfo(c.CONNECT_TIME)
sttime = c.getinfo(c.STARTTRANSFER_TIME)
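
The same calls could be gathered into one small helper, shown here as a sketch. The name collect_metrics and the dictionary keys are hypothetical. It has to be called after perform() and before close(), since the handle is no longer usable once closed.

```python
def collect_metrics(c):
    """Return transfer statistics from a curl handle (hypothetical helper).

    Call after c.perform() and before c.close().
    """
    return {
        "speed_upload": c.getinfo(c.SPEED_UPLOAD),              # bytes per second
        "size_upload": c.getinfo(c.SIZE_UPLOAD),                # bytes
        "total_time": c.getinfo(c.TOTAL_TIME),                  # seconds
        "connect_time": c.getinfo(c.CONNECT_TIME),              # seconds
        "starttransfer_time": c.getinfo(c.STARTTRANSFER_TIME),  # seconds
    }
```

Since the whole file goes out in one transfer, these values describe the file as a whole.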

Does anyone have any thoughts?

Thank you

-jesse
Jul 18 '05 #1