
File objects? - under the hood question


I didn't come across any illuminating discussion via Google, thus my question here (though it may be a neophyte question.) I am interested in the workings under the hood of Python's access of "files".

What is actually happening at the various stages when I create a file object and "read" it?

(1) >>> f = file("C:/GuidosParrot.txt","r")

(2) >>> hesjustsleeping = f.read()

At (1) have I loaded the file from hard drive into RAM when I create the file object? What does this object know and how did it find it out?

At (2) am I loading the file contents into RAM, or just assigning what is already in RAM to a variable?

Where is the work split between the OS and Python? I assume the OS is responsible for "presenting" the file to Python, so perhaps the OS assembles this file from the blocks on disk and loads it into RAM at the time the file object is created? Or would the OS simply have pointers that can assemble the file, and pass those pointers to Python?

Perhaps I've answered my question and the under-the-hood mechanics are handled on the OS side, and Python is just making requests of the OS...
My brain-teaser: What I'd like to do is read the last ~2K of a large number of large files on arbitrary servers across the net, without having to read each file from the beginning (which would be slow and resource inefficient)...


Eric Pederson
http://www.songzilla.blogspot.com
:::::::::::::::::::::::::::::::::::
domainNot="@something.com"
domainIs=domainNot.replace("s","z")
ePrefix="".join([chr(ord(x)+1) for x in "do"])
mailMeAt=ePrefix+domainIs
:::::::::::::::::::::::::::::::::

Jul 18 '05 #1


On Tue, 18 Jan 2005 22:53:10 -0800, Eric Pederson wrote:
> Perhaps I've answered my question and the under-the-hood mechanics are
> handled on the OS side, and Python is just making requests of the OS...

Almost by definition, the only correct way to read a file is to use the
file system, which on current operating systems means going through the OS
kernel.
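
For a local file, that split of work can be sketched roughly as follows. This is a minimal illustration in Python 3 (where `open()` replaces `file()`), not a description of the interpreter's internals; the name `read_tail` and the 2048-byte default are assumptions made for the example. Opening the file only obtains a handle from the OS, and `seek` moves the file position so that only the requested tail is read.

```python
import os

def read_tail(path, nbytes=2048):
    """Return up to the last `nbytes` bytes of the file at `path`."""
    # open() asks the OS for a file handle; the contents are not read here.
    with open(path, "rb") as f:
        # Move to the end of the file; seek() returns the new position,
        # which is the file size in bytes.
        size = f.seek(0, os.SEEK_END)
        # Step back by nbytes, clamped so we never seek before the start.
        f.seek(max(size - nbytes, 0))
        # Only now is data requested from the OS, and only the tail.
        return f.read()
```

Calling `read_tail("C:/GuidosParrot.txt")` would return at most the final 2048 bytes, however large the file is.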
> My brain-teaser: What I'd like to do is read the last ~2K of a large
> number of large files on arbitrary servers across the net, without having
> to read each file from the beginning (which would be slow and resource
> inefficient)...


"Across the net" is not specific enough. There are generally ways to do
that, subject to the support of the various relevant servers, but what
protocol are you talking about using? HTTP? FTP? NFS? Generally you issue
some sort of command to get the length, then some sort of "seek" or
"continuation" command to get to the end, but the details are different
for each protocol. (Unless someone has a convenient flattening library
around? I have something almost like that in my personal collection except
I have no intention of supporting "seek" behavior for various technical
reasons.)
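
As one illustration of the "get the length, then continue from an offset" pattern, here is a hedged Python 3 sketch for FTP using the standard `ftplib` module. The function name `ftp_tail` and its defaults are assumptions for the example, and it only works if the server supports the SIZE and REST commands, which not every server does.

```python
from ftplib import FTP

def ftp_tail(ftp: FTP, remote_path: str, nbytes: int = 2048) -> bytes:
    """Fetch up to the last `nbytes` bytes of `remote_path` over FTP."""
    # Switch to binary mode; many servers refuse SIZE in ASCII mode.
    ftp.voidcmd("TYPE I")
    # SIZE command: ask for the length of the remote file.
    size = ftp.size(remote_path)
    if size is None:
        raise ValueError("server did not report a size for %r" % remote_path)
    # Offset at which to start the transfer, clamped to the file start.
    offset = max(size - nbytes, 0)
    chunks = []
    # rest=offset makes ftplib send REST before RETR, so the transfer
    # starts at `offset` instead of at byte 0.
    ftp.retrbinary("RETR " + remote_path, chunks.append, rest=offset)
    return b"".join(chunks)
```

Usage would look like `ftp_tail(FTP("ftp.example.org", "user", "password"), "big.log")`, where the host and credentials are placeholders.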

And note that with the possible exception of that last one, there is no
relationship between these two questions. (Maybe you know that, maybe you
don't. :-) ) There is no argument you can pass to file() that will read an
HTTP file. (Pedants may note this isn't an absolute truth but it's "true
enough" that it's not worth sweating the details if you're still working
out what "file()" does.)
Jul 18 '05 #2

Jeremy responds:
[kind enough not to mention I must have had only 10% of my brain cells functioning when I posted]
> And note that with the possible exception of that last one, there is no
> relationship between these two questions.

Right, I just want there to be.

> There is no argument you can pass to file() that will read an
> HTTP file.

A file is a file, no matter where it resides; yes I know it's not that simple.

Here's the sort of thing (seek, then read) I think I want:

>>> IDV2=open(("http://musicsite.com/song453.mp3","rb")[:-128])
>>> song453.tags=IDV2.read()
>>> len(song453.tags)
128
But it's not a Python problem. :-(
Thanks for the responses and indulgence.
I'm OK now - the repair man fixed the coffee pot.

Eric Pederson
http://www.songzilla.blogspot.com

Jul 18 '05 #3

On Thu, 20 Jan 2005 21:06:31 -0800, Eric Pederson wrote:
> Here the sort of thing (seek, then read) I think I want:
> >>> IDV2=open(("http://musicsite.com/song453.mp3","rb")[:-128])
> >>> song453.tags=IDV2.read()
> >>> len(song453.tags)
> 128
> But it's not a Python problem. :-(


OK, HTTP. It's true that it isn't a Python problem, but the fact that this
is possible and even easy isn't generally known, since it involves
actually understanding HTTP as more than a protocol that says "give me
this file". :-{

You need to use the Range header in HTTP to request just the end. urllib
doesn't seem to like the Range header (it interprets the 206 response that
results as an error, at least in 2.3.4 which I'm using here, which I would
consider a bug; 2xx responses are "success"), but you can still do it with
httplib:

Python 2.3.4 (#1, Oct 26 2004, 20:13:42)
[GCC 3.4.2 (Gentoo Linux 3.4.2-r2, ssp-3.4.1-1, pie-8.7.6.5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import httplib
>>> connection = httplib.HTTPConnection("www.jerf.org")
>>> connection.request("GET", "/", headers = {"Range": "bytes=-100"})
>>> response = connection.getresponse()
>>> response.read()
'rch Google -->\r\n\t\t\t\t\t</DIV>\r\n\t\t\t\t\t<P CLASS="Seperator" />&nbsp;</P>\r\n\t\t\t<div>\r\n\t\t</body>\r\n\t</html>\r\n'

The bad news is I think you would have to chase redirects and such on your
own. Hopefully a urllib expert will pop in and show how to quickly tell
urllib to chill out when it gets a 206; I'm pretty sure it's easy, but I
can't quite rattle off how. Or maybe 2.4 has a better urllib.
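
For readers on Python 3, the same Range request can be sketched with `urllib.request`, which treats any 2xx status (including 206) as success and follows redirects by default. This is a minimal sketch rather than a tested recipe for any particular server; the names `build_tail_request` and `fetch_tail` are made up for the example, and a server is free to ignore the Range header and return the whole body with a 200.

```python
import urllib.request

def build_tail_request(url, nbytes=128):
    """Build a GET request asking for only the last `nbytes` bytes."""
    # "bytes=-N" is a suffix range: the final N bytes of the resource.
    return urllib.request.Request(url, headers={"Range": "bytes=-%d" % nbytes})

def fetch_tail(url, nbytes=128, timeout=10):
    """Return the last `nbytes` bytes of the resource at `url`."""
    request = build_tail_request(url, nbytes)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read()
        if response.status == 206:
            # Partial Content: the server honoured the Range header.
            return body
        # Otherwise the server sent the full body, so trim it locally.
        return body[-nbytes:]
```

With this, the ID3v1 case from earlier in the thread becomes something like `tags = fetch_tail("http://musicsite.com/song453.mp3", 128)`.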
Jul 18 '05 #4
