I am able to download this page (enclosed code), but I then want to
download a pdf file that I can view in a regular browser by clicking
on the "view" link. I don't know how to automate this next part of my
script. It seems like it uses Javascript.
The line in the page source says
href="javascript:openimagewin('JCCOGetImage.jsp?
refnum=DN2007036179');" tabindex=-1>
So, in summary, when I download this page, for each record, I would
like to initiate the "view" link.
Can anyone point me in the right direction?
When the "view" link is clicked on in IE or Firefox, it returns a pdf
file, so I should be able to download it with
urllib.urlretrieve('pdffile, 'c:\temp\pdffile')
Here is the following code I have been using
----------------------------------------------------------------
import urllib, urllib2
params = [
('booktype', 'L'),
('book', '930'),
('page', ''),
('hidPageName', 'S3Search'),
('DoItButton', 'Search'),]
data = urllib.urlencode(params)
f = urllib2.urlopen("http://www.landrecords.jcc.ky.gov/records/
S3DataLKUP.jsp", data)
s = f.read()
f.close()
open('jcolib.html','w').write(s) 7 3635
Jetus schrieb:
I am able to download this page (enclosed code), but I then want to
download a pdf file that I can view in a regular browser by clicking
on the "view" link. I don't know how to automate this next part of my
script. It seems like it uses Javascript.
The line in the page source says
href="javascript:openimagewin('JCCOGetImage.jsp?
refnum=DN2007036179');" tabindex=-1>
So, in summary, when I download this page, for each record, I would
like to initiate the "view" link.
Can anyone point me in the right direction?
When the "view" link is clicked on in IE or Firefox, it returns a pdf
file, so I should be able to download it with
urllib.urlretrieve('pdffile, 'c:\temp\pdffile')
Here is the following code I have been using
----------------------------------------------------------------
import urllib, urllib2
params = [
('booktype', 'L'),
('book', '930'),
('page', ''),
('hidPageName', 'S3Search'),
('DoItButton', 'Search'),]
data = urllib.urlencode(params)
f = urllib2.urlopen("http://www.landrecords.jcc.ky.gov/records/
S3DataLKUP.jsp", data)
s = f.read()
f.close()
open('jcolib.html','w').write(s)
Use something like the FireBug-extension to see what the
openimagewin-function ultimately creates as reqest. Then issue that,
parametrised from parsed information out of the above href.
There is no way to interpret the JS in Python, let alone mimic possible
browser dom behavior.
Diez
On May 12, 1:54*pm, Jetus <stevegi...@gmail.comwrote:
I am able to download this page (enclosed code), but I then want to
download a pdf file that I can view in a regular browser by clicking
on the "view" link. I don't know how to automate this next part of my
script. It seems like it uses Javascript.
The line in the page source says
href="javascript:openimagewin('JCCOGetImage.jsp?
refnum=DN2007036179');" tabindex=-1>
1) Use BeautifulSoup to extract the path:
JCCOGetImage.jsp?refnum=DN2007036179
from the html page.
2) The path is relative to the current url, so if the current url is: http://www.landrecords.jcc.ky.gov/re...S3DataLKUP.jsp
Then the url to the page you want is: http://www.landrecords.jcc.ky.gov/re...m=DN2007036179
You can use urlparse.urljoin() to join a relative path to the current
url:
import urlparse
base_url = 'http://www.landrecords.jcc.ky.gov/records/S3DataLKUP.jsp'
relative_url = 'JCCOGetImage.jsp?refnum=DN2007036179'
target_url = urlparse.urljoin(base_url, relative_url)
print target_url
--output:-- http://www.landrecords.jcc.ky.gov/re...m=DN2007036179
3) Python has a webbrowser module that allows you to open urls in a
browser:
import webbrowser
webbrowser.open("www.google.com")
You could also use system() or os.startfile()[Windows], to do the same
thing:
os.system(r'C:\"Program Files"\"Mozilla Firefox"\firefox.exe')
#You don't have to worry about directory names
#with spaces in them if you use startfile():
os.startfile(r'C:\Program Files\Mozilla Firefox\firefox.exe')
All the urls you posted give me errors when I try to open them in a
browser, so you will have to sort out those problems first.
On May 12, 4:59*pm, 7stud <bbxx789_0...@yahoo.comwrote:
On May 12, 1:54*pm, Jetus <stevegi...@gmail.comwrote:
I am able to download this page (enclosed code), but I then want to
download a pdf file that I can view in a regular browser by clicking
on the "view" link. I don't know how to automate this next part of my
script. It seems like it uses Javascript.
The line in the page source says
href="javascript:openimagewin('JCCOGetImage.jsp?
refnum=DN2007036179');" tabindex=-1>
1) Use BeautifulSoup to extract the path:
JCCOGetImage.jsp?refnum=DN2007036179
from the html page.
BeautifulSoup will allow you to locate and extract the href attribute:
javascript :openimagewin('JCCOGetImage.jsp?refnum=D N2007036179');
See: "The attributes of Tags" in the BS docs.
Then you can use string functions(preferable) or a regex to get
everything between the parentheses(remove the quotes around the path,
too)
Jetus wrote:
I am able to download this page (enclosed code), but I then want to
download a pdf file that I can view in a regular browser by clicking
on the "view" link. I don't know how to automate this next part of my
script. It seems like it uses Javascript.
The line in the page source says
href="javascript:openimagewin('JCCOGetImage.jsp?
refnum=DN2007036179');" tabindex=-1>
So, in summary, when I download this page, for each record, I would
like to initiate the "view" link.
Can anyone point me in the right direction?
When the "view" link is clicked on in IE or Firefox, it returns a pdf
file, so I should be able to download it with
urllib.urlretrieve('pdffile, 'c:\temp\pdffile')
Here is the following code I have been using
----------------------------------------------------------------
import urllib, urllib2
params = [
('booktype', 'L'),
('book', '930'),
('page', ''),
('hidPageName', 'S3Search'),
('DoItButton', 'Search'),]
data = urllib.urlencode(params)
f = urllib2.urlopen("http://www.landrecords.jcc.ky.gov/records/
S3DataLKUP.jsp", data)
s = f.read()
f.close()
open('jcolib.html','w').write(s)
You may want to take a look at mechanize, I'm having pretty good luck with using
it to do the types of things you describe. http://wwwsearch.sourceforge.net/mechanize/
-Larry
On May 12, 4:06 pm, "Diez B. Roggisch" <de...@nospam.web.dewrote:
Jetus schrieb:
I am able to download this page (enclosed code), but I then want to
download a pdf file that I can view in a regular browser by clicking
on the "view" link. I don't know how to automate this next part of my
script. It seems like it uses Javascript.
The line in the page source says
href="javascript:openimagewin('JCCOGetImage.jsp?
refnum=DN2007036179');" tabindex=-1>
So, in summary, when I download this page, for each record, I would
like to initiate the "view" link.
Can anyone point me in the right direction?
When the "view" link is clicked on in IE or Firefox, it returns a pdf
file, so I should be able to download it with
urllib.urlretrieve('pdffile, 'c:\temp\pdffile')
Here is the following code I have been using
----------------------------------------------------------------
import urllib, urllib2
params = [
('booktype', 'L'),
('book', '930'),
('page', ''),
('hidPageName', 'S3Search'),
('DoItButton', 'Search'),]
data = urllib.urlencode(params)
f = urllib2.urlopen("http://www.landrecords.jcc.ky.gov/records/
S3DataLKUP.jsp", data)
s = f.read()
f.close()
open('jcolib.html','w').write(s)
Use something like the FireBug-extension to see what the
openimagewin-function ultimately creates as reqest. Then issue that,
parametrised from parsed information out of the above href.
There is no way to interpret the JS in Python, let alone mimic possible
browser dom behavior.
Diez
Thanks Diez;
Never used Firebug, and could not find the http-header section, but it
lead me to Tamper Data, and that was perfect to give me the headers.
Thanks for the input.
On May 12, 6:59 pm, 7stud <bbxx789_0...@yahoo.comwrote:
On May 12, 1:54 pm, Jetus <stevegi...@gmail.comwrote:
I am able to download this page (enclosed code), but I then want to
download a pdf file that I can view in a regular browser by clicking
on the "view" link. I don't know how to automate this next part of my
script. It seems like it uses Javascript.
The line in the page source says
href="javascript:openimagewin('JCCOGetImage.jsp?
refnum=DN2007036179');" tabindex=-1>
1) Use BeautifulSoup to extract the path:
JCCOGetImage.jsp?refnum=DN2007036179
from the html page.
2) The path is relative to the current url, so if the current url is:
http://www.landrecords.jcc.ky.gov/re...S3DataLKUP.jsp
Then the url to the page you want is:
http://www.landrecords.jcc.ky.gov/re...jsp?refnum=DN2...
You can use urlparse.urljoin() to join a relative path to the current
url:
import urlparse
base_url = 'http://www.landrecords.jcc.ky.gov/records/S3DataLKUP.jsp'
relative_url = 'JCCOGetImage.jsp?refnum=DN2007036179'
target_url = urlparse.urljoin(base_url, relative_url)
print target_url
--output:--http://www.landrecords.jcc.ky.gov/records/JCCOGetImage.jsp?refnum=DN2...
3) Python has a webbrowser module that allows you to open urls in a
browser:
import webbrowser
webbrowser.open("www.google.com")
You could also use system() or os.startfile()[Windows], to do the same
thing:
os.system(r'C:\"Program Files"\"Mozilla Firefox"\firefox.exe')
#You don't have to worry about directory names
#with spaces in them if you use startfile():
os.startfile(r'C:\Program Files\Mozilla Firefox\firefox.exe')
All the urls you posted give me errors when I try to open them in a
browser, so you will have to sort out those problems first.
7Stud;
Thanks for sharing your knowledge!!
1)The proper url to the website is http://www.landrecords.jcc.ky.gov/records/S0Search.html.
2) The join won't work. I found that the request it sends is http://206.196.0.195/cgi-bin/webview...2=SDAAAA76070B
It looks like it generates a random code for param2...
I have two choices for generating this javascript,
I can click on the View, or in the form, if I put a "i" in the code
and click on the
option link, it will send me pdf file.
3) Was not sure why you suggested I use the Webbrowser module?
But I am glad to find out about it.
Op Mon, 12 May 2008 22:06:28 +0200, schreef Diez B. Roggisch:
There is no way to interpret the JS in Python,
There is at least one way:
<http://wwwsearch.sourceforge.net/python-spidermonkey/>
--
JanC This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics
by: Steve |
last post by:
hello all
im looking to make a download script as i cannot use the .htaccess to
control directory access to files as im using a php login system using
mysql and sessions.
i have been playing...
|
by: Tom Haughton |
last post by:
To make better use of my commuting time I want to download audio files from
websites such as:
http://freshair.npr.org/day_fa.jhtml?displayValue=day&todayDate=02/27/2004
for listening to them...
|
by: Frances Del Rio |
last post by:
I have a bunch of Photoshop images on my server, I want when user clicks
on a link to a Photoshop img for it to start downloading (like when you
click on a Word file..) is there a way to do this...
|
by: Tony K |
last post by:
I have the most peculiar problem with an ASP.NET page which we use for
downloading a file.
When the user clicks on a link, the link points to an ASPX page which
downloads the file selected.
...
|
by: Doug van Vianen |
last post by:
Hi,
Is there some way in JavaScript to stop the downloading of pictures from a
web page?
Thank you.
Doug van Vianen
|
by: apondu |
last post by:
Hi,
I am new to web services and i am facing a problem.
I am interested in downloading some file from internet and use it for
furthur processing For eg. i have a file at a the following URL and...
|
by: apondu |
last post by:
Hi,
I am new to web services and i am facing a problem.
I am interested in downloading some file from internet and use it for
furthur processing For eg. i have a file at a the following URL and...
|
by: mahvarjin |
last post by:
hi,
I am using passthru() function to download zip file. During downloading process I can't get response from other link. If I click other link in my page at the downloading time I got response...
|
by: isladogs |
last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM).
In this month's session, we are pleased to welcome back...
|
by: Vimpel783 |
last post by:
Hello!
Guys, I found this code on the Internet, but I need to modify it a little. It works well, the problem is this: Data is sent from only one cell, in this case B5, but it is necessary that data...
|
by: jfyes |
last post by:
As a hardware engineer, after seeing that CEIWEI recently released a new tool for Modbus RTU Over TCP/UDP filtering and monitoring, I actively went to its official website to take a look. It turned...
|
by: ArrayDB |
last post by:
The error message I've encountered is; ERROR:root:Error generating model response: exception: access violation writing 0x0000000000005140, which seems to be indicative of an access violation...
|
by: CloudSolutions |
last post by:
Introduction:
For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
|
by: Defcon1945 |
last post by:
I'm trying to learn Python using Pycharm but import shutil doesn't work
|
by: Shællîpôpï 09 |
last post by:
If u are using a keypad phone, how do u turn on JavaScript, to access features like WhatsApp, Facebook, Instagram....
|
by: af34tf |
last post by:
Hi Guys, I have a domain whose name is BytesLimited.com, and I want to sell it. Does anyone know about platforms that allow me to list my domain in auction for free. Thank you
|
by: Faith0G |
last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
| |