Download excel file from web?

Hi - experienced programmer but this is my first Python program.

This URL will retrieve an Excel spreadsheet containing (that day's)
MSCI stock index returns.

http://www.mscibarra.com/webapp/inde...EIPerfRegional

I want to write Python to download and save the file.

So far I've arrived at this:

# import pdb
import urllib2
from win32com.client import Dispatch

xlApp = Dispatch("Excel.Application")

# test 1
# xlApp.Workbooks.Add()
# xlApp.ActiveSheet.Cells(1,1).Value = 'A'
# xlApp.ActiveWorkbook.ActiveSheet.Cells(2,1).Value = 'B'
# xlBook = xlApp.ActiveWorkbook
# xlBook.SaveAs(Filename='C:\\test.xls')
# pdb.set_trace()
response = urllib2.urlopen(
    'http://www.mscibarra.com/webapp/indexperf/excel?'
    'priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897'
    '&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional')
# test 2 - returns check = False
check_for_data = urllib2.Request(
    'http://www.mscibarra.com/webapp/indexperf/excel?'
    'priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897'
    '&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').has_data()

xlApp = response.fp
print(response.fp.name)
print(xlApp.name)
xlApp.write
xlApp.Close
Jul 28 '08 #1
Whoops, hit Send when I wanted Preview. Looks like the html [quote] tag
doesn't work from groups.google.com (nice).

Anyway, in test 1 above, I determined how to instantiate an Excel
object, put some stuff in it, then save to disk.

So, in theory, I'm retrieving my excel spreadsheet with

response = urllib2.urlopen()

Except what then do I do with this?

Well, for one, I read some of the urllib2 documentation and found the
Request class with the method has_data() on it. It returns False.
Hmm, that's not encouraging.
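
A note on that False result: as far as I know, has_data() only reports whether the Request object carries a body to send to the server (POST data). It says nothing about whether the server has a file to return. The sketch below assumes Python 3, where urllib2 became urllib.request and, I believe, has_data() was dropped in favour of the data attribute. The example.com URL is a placeholder, and nothing is sent over the network.

```python
from urllib.request import Request

# Placeholder URL (an assumption for illustration); no network traffic happens here.
url = "http://example.com/report?format=excel"

get_request = Request(url)                       # no body, so this would be a GET
post_request = Request(url, data=b"asOf=today")  # a body turns it into a POST


def describe(request):
    """Return (HTTP method, whether the request carries a body)."""
    return request.get_method(), request.data is not None


print(describe(get_request))
print(describe(post_request))
```

So a plain download request like the one in this thread would be expected to report no data, and that is not a sign that the download failed.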

I suppose the trick is to understand what urllib2.urlopen is returning
to me, rummage around in there, and hopefully find my Excel file.

I use pdb to debug. This is interesting:

(Pdb) dir(response)
['__doc__', '__init__', '__iter__', '__module__', '__repr__', 'close',
 'code', 'fileno', 'fp', 'geturl', 'headers', 'info', 'msg', 'next', 'read',
 'readline', 'readlines', 'url']
(Pdb)

I suppose the members with __*__ are methods, and the names without the
underbars are attributes (variables) (?).

Or maybe this isn't at all the right direction to take (maybe there
are much better modules to do this stuff). Would be happy to learn if
that's the case (and if that gets the job done for me).

pat
Jul 28 '08 #2
pa**@well.com wrote:
> I suppose the members with __*__ are methods, and the names without the
> underbars are attributes (variables) (?).
No, these are the names of all attributes and methods. read is a method,
for example.
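
One way to check this for yourself is to ask each name from dir() whether it is callable. This is a minimal sketch, assuming Python 3; io.BytesIO stands in for the urlopen() response (both are file-like objects), and split_members is a hypothetical helper name, not part of any library.

```python
import io

# io.BytesIO stands in for the urlopen() response; both are file-like objects.
response = io.BytesIO(b"spreadsheet bytes")


def split_members(obj):
    """Split the public names from dir(obj) into methods and data attributes."""
    methods, attributes = [], []
    for name in dir(obj):
        if name.startswith("_"):
            continue  # skip private and double-underscore names
        target = methods if callable(getattr(obj, name)) else attributes
        target.append(name)
    return methods, attributes


methods, attributes = split_members(response)
print("methods:", methods)
print("attributes:", attributes)
```

Names such as read land in the methods list, while plain values such as closed land in the attributes list.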
> Or maybe this isn't at all the right direction to take (maybe there
> are much better modules to do this stuff). Would be happy to learn if
> that's the case (and if that gets the job done for me).

The docs (http://docs.python.org/lib/module-urllib2.html) are pretty
clear on this:

"""
This function returns a file-like object with two additional methods:
"""
And then for file-like objects:

http://docs.python.org/lib/bltin-file-objects.html
"""
read( [size])
Read at most size bytes from the file (less if the read hits EOF
before obtaining size bytes). If the size argument is negative or
omitted, read all data until EOF is reached. The bytes are returned as a
string object. An empty string is returned when EOF is encountered
immediately. (For certain files, like ttys, it makes sense to continue
reading after an EOF is hit.) Note that this method may call the
underlying C function fread() more than once in an effort to acquire as
close to size bytes as possible. Also note that when in non-blocking
mode, less data than what was requested may be returned, even if no size
parameter was given.
"""

Diez
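
The read() behaviour quoted above can be sketched as a loop that keeps asking for a fixed number of bytes until an empty result signals EOF. This assumes Python 3, where read() returns bytes rather than a string. io.BytesIO stands in for the HTTP response, and read_in_chunks and the 8192-byte chunk size are illustrative choices.

```python
import io


def read_in_chunks(fileobj, chunk_size=8192):
    """Yield successive chunks from a file-like object until EOF."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:  # an empty result means EOF was reached
            break
        yield chunk


# io.BytesIO stands in for the urlopen() response (an assumption for illustration).
fake_response = io.BytesIO(b"x" * 20000)
chunks = list(read_in_chunks(fake_response))
print([len(c) for c in chunks])  # -> [8192, 8192, 3616]
```

Calling read() with no argument does the same job in one step when the payload comfortably fits in memory.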
Jul 28 '08 #3
On Jul 28, 3:29 pm, "Diez B. Roggisch" <de...@nospam.web.de> wrote:
> No, these are the names of all attributes and methods. read is a method,
> for example.
Right, I got it backwards.

Just stumbled upon .read:

response = urllib2.urlopen(
    'http://www.mscibarra.com/webapp/indexperf/excel?'
    'priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897'
    '&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').read

Now the question is: what to do with this? I'll look at the
documentation that you point to.

thanx - pat
Jul 28 '08 #4
Or rather (next iteration):

response = urllib2.urlopen(
    'http://www.mscibarra.com/webapp/indexperf/excel?'
    'priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897'
    '&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').read(1000000)

The file is generally something like 26 KB so specifying 1,000,000
seems like a good idea (first approximation).

And then when I do:

print(response)

I get a whole lot of garbage (and some non-garbage), so I know I'm
onto something.

When I read the .read documentation further, it says that read() has
returned the data as a string object. Now - how do I convince Python
that the string object is in fact an excel file - and save it to disk?

pat
Jul 28 '08 #5
On Mon, Jul 28, 2008 at 7:43 PM, pa**@well.com <pa**@well.com> wrote:
> When I read the .read documentation further, it says that read() has
> returned the data as a string object. Now - how do I convince Python
> that the string object is in fact an excel file - and save it to disk?
You don't need to convince Python, just write it to a file.
More reading for you: http://docs.python.org/tut/node9.html
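
A minimal sketch of "just write it to a file", assuming Python 3. io.BytesIO stands in for the downloaded response, the payload is made-up binary data, and save_response is a hypothetical helper name. Python does not need to know the format: the bytes only have to arrive on disk unchanged.

```python
import io
import os
import tempfile


def save_response(response, path):
    """Write everything a file-like response returns to path, unchanged."""
    data = response.read()         # bytes in Python 3
    with open(path, "wb") as out:  # binary mode: no newline translation
        out.write(data)
    return len(data)


# Made-up binary payload standing in for the spreadsheet (an assumption).
payload = b"\x00\x01\n\x02\r\n\xff"
fake_response = io.BytesIO(payload)
target = os.path.join(tempfile.mkdtemp(), "msci.xls")

written = save_response(fake_response, target)
print(written, os.path.getsize(target))  # -> 7 7
```

With a real download, the object returned by urlopen() would take the place of fake_response.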

--
-- Guilherme H. Polo Goncalves
Jul 28 '08 #6
OK:

response = urllib2.urlopen(
    'http://www.mscibarra.com/webapp/indexperf/excel?'
    'priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897'
    '&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').read(1000000)
# print(response)
f = open("c:\\msci.xls",'w')
f.write(response)

OK, this makes the file, and there's a c:\msci.xls in place and it's
about the right size. But whether I make the second param to open 'w'
or 'wb', when I try to open msci.xls from the Windows file explorer,
Excel tells me that the file is corrupted.

pat
Jul 28 '08 #7
Nope - must have been stumbling over my own feet.

'wb' _is_ necessary (as I would expect).
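
A likely reason 'wb' matters here (an assumption, the thread does not spell it out): on Windows, a file opened in text mode rewrites every newline byte as a carriage return plus newline, which changes binary data. The sketch below assumes Python 3 and emulates that translation with io.TextIOWrapper, so it behaves the same on any platform. The payload is made up for illustration.

```python
import io

payload = b"\x00\x01\n\x02\n\xff"  # made-up binary data containing two 0x0A bytes

# Emulate a Windows text-mode file: newline="\r\n" rewrites every "\n" on write.
raw = io.BytesIO()
text_mode = io.TextIOWrapper(raw, encoding="latin-1", newline="\r\n")
text_mode.write(payload.decode("latin-1"))
text_mode.flush()
text_result = raw.getvalue()

# Binary mode writes the bytes untouched.
binary = io.BytesIO()
binary.write(payload)
binary_result = binary.getvalue()

print(len(payload), len(text_result), len(binary_result))  # -> 6 8 6
```

The two extra bytes in the text-mode result are enough to make a binary format unreadable, even though the file size looks about right.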

So it works:

# pdb.set_trace()
response = urllib2.urlopen(
    'http://www.mscibarra.com/webapp/indexperf/excel?'
    'priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897'
    '&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').read(1000000)
# print(response)
f = open("c:\\msci.xls", 'wb')  # binary mode, so the bytes are written unchanged
f.write(response)
f.flush()  # the parentheses are needed to actually call the method
f.close()

I know the f.flush and f.close are redundant - in the sense that both
flush the contents to disk. So I can probably just take out the
f.flush.
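
For anyone following along, here is a minimal sketch of the same save step. It assumes Python 3, and the helper names save_bytes and close_without_call, the sample bytes and the temporary path are made up for illustration. Binary mode matters on Windows because text mode rewrites newline bytes, which would damage a binary workbook. A with block closes the file on exit, and closing flushes, so a separate flush call should not be needed. The second helper shows that a method name without parentheses is only looked up, not called.

```python
import os
import tempfile


def save_bytes(data, path):
    """Write raw bytes to path and report whether the file ended up closed."""
    with open(path, "wb") as f:  # binary mode: bytes are written unchanged
        f.write(data)
    return f.closed  # leaving the with block closed (and flushed) the file


def close_without_call(path):
    """Show that `f.close` alone is only an attribute lookup, not a call."""
    f = open(path, "wb")
    f.close  # no parentheses: nothing happens here
    still_open = not f.closed
    f.close()  # the actual call
    return still_open, f.closed


# Hypothetical location and arbitrary sample bytes, for illustration only.
demo_path = os.path.join(tempfile.mkdtemp(), "msci.xls")
closed_after_with = save_bytes(b"\xd0\xcf\x11\xe0", demo_path)
still_open, closed_at_end = close_without_call(demo_path + ".tmp")
```

The with form is usually preferred because the file is closed even if the write raises an exception.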

Thanx for the help.

pat
Jul 28 '08 #8
On Mon, Jul 28, 2008 at 8:04 PM, pa**@well.com <pa**@well.com> wrote:
> OK:
>
> response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').read(1000000)
> # print(response)
> f = open("c:\\msci.xls",'w')
> f.write(response)

I would initially change that to:

response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional')

f = open("c:\\msci.xls", "wb")
for line in response:
    f.write(line)
f.close()

and then..

> OK this makes the file, and there's a c:\msci.xls in place and it's
> about the right size. But whether I make the second param to open 'w'
> or 'wb', when I try to open msci.xls from the Windows file explorer,
> excel tells me that the file is corrupted.

try it.
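
A rough Python 3 equivalent of the streaming approach, as a sketch only: urllib2 became urllib.request in Python 3, and shutil.copyfileobj does the block-by-block copy. The function name download, the opener parameter, the stand-in fake_opener and the example.invalid URL are all assumptions made for this illustration so that it can run without network access.

```python
import io
import os
import shutil
import tempfile
import urllib.request


def download(url, path, opener=urllib.request.urlopen):
    """Stream the body at url into path without holding it all in memory."""
    with opener(url) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)  # copies in blocks until EOF
    return os.path.getsize(path)


def fake_opener(url):
    """Hypothetical stand-in for urlopen so the sketch runs offline."""
    return io.BytesIO(b"fake workbook bytes")


size = download("http://example.invalid/file.xls",
                os.path.join(tempfile.mkdtemp(), "msci.xls"),
                opener=fake_opener)
```

With the default opener, passing the real spreadsheet URL and a target path should behave like the loop above, with both the response and the file closed by the with block.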
--
-- Guilherme H. Polo Goncalves
Jul 28 '08 #9
On Jul 28, 4:20 pm, "Guilherme Polo" <ggp...@gmail.com> wrote:
> I would initially change that to:
>
> response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&...)
>
> f = open("c:\\msci.xls", "wb")
> for line in response:
>     f.write(line)
> f.close()

A simple f.write(response) does work (click on a single row in Excel
and you get a single row).

But I can see that what you recommend Guilherme is probably safer -
thanx.

pat
Jul 28 '08 #10
On Jul 29, 12:41 am, "p...@well.com" <p...@well.com> wrote:

> A simple f.write(response) does work (click on a single row in Excel
> and you get a single row).
>
> But I can see that what you recommend Guilherme is probably safer -
> thanx.
>
> pat

If response contains a string then:

for line in response:
    f.write(line)

will actually be writing the string one character at a time!
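
A small sketch of the difference, assuming Python 3 and a made-up payload: iterating over a string yields one character at a time, while iterating over a file-like object yields one line at a time. (In Python 3 the downloaded data would be a bytes object, and iterating over bytes yields integers rather than characters, so the same caution applies.)

```python
import io

payload = "ab\ncd\n"  # made-up stand-in for the downloaded data

# Iterating over the string itself yields single characters.
chars = [piece for piece in payload]

# Iterating over a file-like object wrapping the same data yields lines.
lines = [piece for piece in io.StringIO(payload)]
```

Either way the pieces, written back in order, add up to the original data; the character-by-character version just makes many more write calls.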
Jul 29 '08 #11
On Jul 28, 5:39 pm, MRAB <goo...@mrabarnett.plus.com> wrote:

> If response contains a string then:
>
> for line in response:
>     f.write(line)
>
> will actually be writing the string one character at a time!

Hmm. In this case, response was a string object. (that's what
urllib2.urlopen().read() returns).

My concern was with line ending characters (delimiters). I was
thinking that if the string object doesn't contain line ending
delimiters then maybe the for loop was better. Although that begs the
question of how

for line in response:

recognizes lines (as defined by line ending delimiters) in the first
place.
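
On the question of how lines are recognized, a sketch under the assumption of Python 3 and a made-up payload: iterating over a file-like object returns what readline() returns, that is, everything up to and including the next newline byte. For binary data those "lines" are arbitrary pieces, but written back in order they reproduce the data. Reading fixed-size blocks avoids depending on newline bytes at all; copy_in_chunks is a hypothetical helper name.

```python
import io


def copy_in_chunks(source, target, chunk_size=8192):
    """Copy one file-like object to another in blocks; return the byte count."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty result means end of data
            break
        target.write(chunk)
        total += len(chunk)
    return total


binary = b"\x00\x01\n\x02\n\n\x03"  # made-up binary payload with newline bytes

# Iteration splits after each newline byte, wherever it happens to fall.
pieces = list(io.BytesIO(binary))

out = io.BytesIO()
copied = copy_in_chunks(io.BytesIO(binary), out, chunk_size=3)
```

A payload with no newline bytes at all would come back from the iterator as a single piece, so the for loop still works, just without any streaming benefit.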

pat
Jul 29 '08 #12
On Mon, Jul 28, 2008 at 9:39 PM, MRAB <go****@mrabarnett.plus.com> wrote:

> If response contains a string then:

Did you notice I removed the read(...) part?

> for line in response:
>     f.write(line)
>
> will actually be writing the string one character at a time!
-- Guilherme H. Polo Goncalves
Jul 29 '08 #13
On Jul 28, 6:05 pm, "Guilherme Polo" <ggp...@gmail.com> wrote:
> On Mon, Jul 28, 2008 at 9:39 PM, MRAB <goo...@mrabarnett.plus.com> wrote:
>> If response contains a string then:
>
> Did you notice I removed the read(...) part?
>
>> for line in response:
>>     f.write(line)
>>
>> will actually be writing the string one character at a time!
>
> -- Guilherme H. Polo Goncalves
Actually no, I didn't, Guilherme (although I'll take it out now).

Would leaving the .read() in urllib2.urlopen().read() imply, as MRAB would
seem to indicate, that the following for loop would act byte-by-byte?
And if so, how?

Even with the .read() in, it was very fast. But it looks like it
won't hurt (and very possibly helps) to take it out.

pat
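The difference MRAB describes can be seen without touching the network. This is a small sketch, assuming Python 3 and using `io.StringIO` as a stand-in for the file-like object that `urlopen()` returns; the sample text is invented:

```python
import io

text = "row1\nrow2\n"

# A file-like object iterates line by line.
stream = io.StringIO(text)
from_stream = [chunk for chunk in stream]

# The string that read() hands back iterates one character at a time.
from_string = [chunk for chunk in text]

print(from_stream)
print(len(from_string))
```

Either loop ends up writing the same data; iterating over the string just makes many more, much smaller write() calls.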
Jul 29 '08 #14
On Tue, Jul 29, 2008 at 1:47 AM, pa**@well.com <pa**@well.com> wrote:
> On Jul 28, 6:05 pm, "Guilherme Polo" <ggp...@gmail.com> wrote:
>> On Mon, Jul 28, 2008 at 9:39 PM, MRAB <goo...@mrabarnett.plus.com> wrote:
>>> If response contains a string then:
>>
>> Did you notice I removed the read(...) part?
>>
>>> for line in response:
>>>     f.write(line)
>>>
>>> will actually be writing the string one character at a time!
>>
>> -- Guilherme H. Polo Goncalves

> Actually no, I didn't, Guilherme (although I'll take it out now).
>
> Would leaving the .read() in urllib2.urlopen().read() imply, as MRAB would
> seem to indicate, that the following for loop would act byte-by-byte?
> And if so, how?

.read() returns a string, so yes.
The point of removing the .read(xxxxx) is that you no longer need to
guess how long the file is in order to read it entirely.
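Here is a sketch of the no-guessing approach, assuming Python 3, where `urllib2.urlopen` lives at `urllib.request.urlopen`. The helper name `save_stream` is hypothetical, and an in-memory `io.BytesIO` stands in for the HTTP response so the example runs offline:

```python
import io
import os
import shutil
import tempfile

def save_stream(response, path):
    """Copy a file-like object to path in binary mode, whatever its length."""
    with open(path, "wb") as out:
        # copyfileobj reads chunk after chunk until EOF, so no size is needed.
        shutil.copyfileobj(response, out)

# Stand-in for the object urlopen() would return: 26,000 dummy bytes.
fake_response = io.BytesIO(b"\x00\x01\n\xff" * 6500)
target = os.path.join(tempfile.mkdtemp(), "msci.xls")
save_stream(fake_response, target)

saved_size = os.path.getsize(target)
print(saved_size)
```

With a real URL, the equivalent call would be `save_stream(urllib.request.urlopen(url), target)`.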

> Even with the .read() in, it was very fast. But it looks like it
> won't hurt (and very possibly helps) to take it out.
>
> pat
-- Guilherme H. Polo Goncalves
Jul 29 '08 #16
