Bytes IT Community

Downloading a link with JavaScript in it

I am able to download this page (enclosed code), but I then want to
download a PDF file that I can view in a regular browser by clicking
on the "view" link. I don't know how to automate this next part of my
script. It seems to use JavaScript.
The line in the page source says:
href="javascript:openimagewin('JCCOGetImage.jsp?refnum=DN2007036179');" tabindex=-1>

So, in summary, when I download this page, for each record, I would
like to initiate the "view" link.
Can anyone point me in the right direction?

When the "view" link is clicked in IE or Firefox, it returns a PDF
file, so I should be able to download it with
urllib.urlretrieve('pdffile', r'c:\temp\pdffile')
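In Python 3, urlretrieve lives at urllib.request.urlretrieve. Below is a minimal sketch of that download step. The name download_pdf is illustrative. The check for the %PDF- signature at the start of the file is an extra safeguard in case the server sends back an HTML page instead of the PDF.

```python
import urllib.request


def download_pdf(url, dest_path):
    """Save url to dest_path and report whether the result looks like a PDF."""
    # Python 3 location of the Python 2 urllib.urlretrieve function.
    urllib.request.urlretrieve(url, dest_path)
    # PDF files start with the ASCII signature "%PDF-".
    with open(dest_path, "rb") as f:
        return f.read(5) == b"%PDF-"


# Hypothetical usage. The raw string keeps "\t" in the Windows path
# from being read as a tab character.
# download_pdf(pdf_url, r"c:\temp\pdffile.pdf")
```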

Here is the code I have been using:
----------------------------------------------------------------
import urllib, urllib2

# Form fields the search page submits
params = [
    ('booktype', 'L'),
    ('book', '930'),
    ('page', ''),
    ('hidPageName', 'S3Search'),
    ('DoItButton', 'Search'),
]

data = urllib.urlencode(params)

# Passing data makes urllib2 send a POST request
f = urllib2.urlopen("http://www.landrecords.jcc.ky.gov/records/S3DataLKUP.jsp", data)

s = f.read()
f.close()

# Save the results page for later parsing
with open('jcolib.html', 'w') as out:
    out.write(s)
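The script above is Python 2. In Python 3, urllib2 and the request-related parts of urllib were merged into urllib.request, and urlencode moved to urllib.parse. Below is a rough Python 3 equivalent that builds the same POST request. The name fetch_results is illustrative, and nothing here checks whether the site still answers at that address.

```python
import urllib.parse
import urllib.request

SEARCH_URL = "http://www.landrecords.jcc.ky.gov/records/S3DataLKUP.jsp"

# Same form fields as the Python 2 script.
params = [
    ("booktype", "L"),
    ("book", "930"),
    ("page", ""),
    ("hidPageName", "S3Search"),
    ("DoItButton", "Search"),
]

# urlencode returns str; a request body must be bytes.
data = urllib.parse.urlencode(params).encode("ascii")

# Supplying data makes this a POST, like urllib2.urlopen(url, data).
request = urllib.request.Request(SEARCH_URL, data=data)


def fetch_results(req, out_path="jcolib.html"):
    """Send the search and save the raw HTML bytes to out_path."""
    with urllib.request.urlopen(req) as response:
        html = response.read()
    with open(out_path, "wb") as f:
        f.write(html)
    return html
```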

Jun 27 '08 #1


Jetus wrote:
> The line in the page source says
> href="javascript:openimagewin('JCCOGetImage.jsp?refnum=DN2007036179');" tabindex=-1>

Use something like the Firebug extension to see what request the
openimagewin function ultimately creates. Then issue that request
yourself, parametrised with information parsed out of the above href.
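Below is a minimal Python 3 sketch of replaying such a request. It assumes you have copied the request URL and any headers that matter (for example Referer or Cookie) out of Firebug or a similar tool. It also assumes you have turned that URL into a template with a {refnum} slot for the value parsed from the href. The template, the header values and the build_replayed_request name are placeholders, not the site's real request.

```python
import urllib.parse
import urllib.request


def build_replayed_request(url_template, refnum, captured_headers):
    """Fill in a request URL copied from the browser's request log.

    url_template is a hypothetical string with a {refnum} placeholder
    where the value parsed out of the href should go.
    """
    url = url_template.format(refnum=urllib.parse.quote(refnum))
    req = urllib.request.Request(url)
    # Copy the headers the browser sent so the server sees a similar request.
    for name, value in captured_headers.items():
        req.add_header(name, value)
    return req


# Hypothetical usage (placeholder URL and header, not the real site):
# req = build_replayed_request("http://example.com/getimage?refnum={refnum}",
#                              "DN2007036179",
#                              {"Referer": "http://example.com/search"})
# pdf_bytes = urllib.request.urlopen(req).read()
```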

There is no way to interpret the JS in Python, let alone mimic possible
browser DOM behavior.

Diez
Jun 27 '08 #2

On May 12, 1:54 pm, Jetus <stevegi...@gmail.com> wrote:
> The line in the page source says
> href="javascript:openimagewin('JCCOGetImage.jsp?refnum=DN2007036179');" tabindex=-1>

1) Use BeautifulSoup to extract the path:

JCCOGetImage.jsp?refnum=DN2007036179

from the html page.
2) The path is relative to the current url, so if the current url is:

http://www.landrecords.jcc.ky.gov/re...S3DataLKUP.jsp

Then the url to the page you want is:

http://www.landrecords.jcc.ky.gov/re...m=DN2007036179

You can use urlparse.urljoin() to join a relative path to the current
url:
import urlparse

base_url = 'http://www.landrecords.jcc.ky.gov/records/S3DataLKUP.jsp'
relative_url = 'JCCOGetImage.jsp?refnum=DN2007036179'

target_url = urlparse.urljoin(base_url, relative_url)
print target_url

--output:--
http://www.landrecords.jcc.ky.gov/re...m=DN2007036179

3) Python has a webbrowser module that allows you to open URLs in a
browser:

import webbrowser

webbrowser.open("http://www.google.com")

You could also use os.system() or os.startfile() (Windows only) to do
the same thing:

import os

os.system(r'C:\"Program Files"\"Mozilla Firefox"\firefox.exe')

# You don't have to worry about directory names
# with spaces in them if you use startfile():
os.startfile(r'C:\Program Files\Mozilla Firefox\firefox.exe')

All the URLs you posted give me errors when I try to open them in a
browser, so you will have to sort out those problems first.
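Steps 1 and 2 can be combined into a short Python 3 sketch that runs once per record on the results page. It swaps in the standard library's html.parser for BeautifulSoup, so nothing extra needs installing. It also uses a regular expression to pull the path out of each href. ViewLinkParser and view_urls are illustrative names, and the page must already be decoded to a str. As a later reply in this thread reports, the joined URL may not be the request the browser actually sends, so treat the output as a starting point.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "http://www.landrecords.jcc.ky.gov/records/S3DataLKUP.jsp"
# Captures the single-quoted argument of openimagewin('...').
VIEW_ARG = re.compile(r"openimagewin\(\s*'([^']*)'\s*\)")


class ViewLinkParser(HTMLParser):
    """Collect the absolute URL behind every openimagewin link."""

    def __init__(self, base_url=BASE_URL):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if not href.startswith("javascript:"):
            return
        match = VIEW_ARG.search(href)
        if match:
            # Step 2: resolve the relative path against the results page URL.
            self.urls.append(urljoin(self.base_url, match.group(1)))


def view_urls(html, base_url=BASE_URL):
    """Return one absolute URL per openimagewin link found in html."""
    parser = ViewLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls
```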

Jun 27 '08 #3

On May 12, 4:59 pm, 7stud <bbxx789_0...@yahoo.com> wrote:
> 1) Use BeautifulSoup to extract the path:
>
> JCCOGetImage.jsp?refnum=DN2007036179
>
> from the html page.

BeautifulSoup will allow you to locate and extract the href attribute:

javascript:openimagewin('JCCOGetImage.jsp?refnum=DN2007036179');

See "The attributes of Tags" in the BeautifulSoup docs.

Then you can use string functions (preferable) or a regex to get
everything between the parentheses (remove the quotes around the path,
too).
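Below is a minimal Python 3 sketch of the string-function approach. It assumes the href value has already been extracted, as described above. The name extract_path is illustrative. The function raises ValueError if the value contains no parentheses.

```python
def extract_path(href):
    """Return the quoted argument of openimagewin(...) in an href value.

    Uses only string methods: take the text between the first "(" and
    the last ")", then strip whitespace and the surrounding quotes.
    """
    start = href.index("(") + 1
    end = href.rindex(")")
    return href[start:end].strip().strip("'\"")


# Example:
# extract_path("javascript:openimagewin('JCCOGetImage.jsp?refnum=DN2007036179');")
```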
Jun 27 '08 #4

Jetus wrote:
> So, in summary, when I download this page, for each record, I would
> like to initiate the "view" link.
> Can anyone point me in the right direction?

You may want to take a look at mechanize; I'm having pretty good luck
using it to do the types of things you describe.
http://wwwsearch.sourceforge.net/mechanize/

-Larry
Jun 27 '08 #5

On May 12, 4:06 pm, "Diez B. Roggisch" <de...@nospam.web.de> wrote:

> Use something like the Firebug extension to see what request the
> openimagewin function ultimately creates. Then issue that request
> yourself, parametrised with information parsed out of the above href.
>
> There is no way to interpret the JS in Python, let alone mimic possible
> browser DOM behavior.
>
> Diez

Thanks, Diez.
I had never used Firebug and could not find the HTTP-header section, but
it led me to Tamper Data, which was perfect for showing me the headers.
Thanks for the input.
Jun 27 '08 #6

On May 12, 6:59 pm, 7stud <bbxx789_0...@yahoo.com> wrote:
> All the URLs you posted give me errors when I try to open them in a
> browser, so you will have to sort out those problems first.

7stud,
Thanks for sharing your knowledge!

1) The proper URL to the website is http://www.landrecords.jcc.ky.gov/records/S0Search.html.

2) The join won't work. I found that the request it sends is
http://206.196.0.195/cgi-bin/webview...2=SDAAAA76070B
It looks like it generates a random code for param2.
I have two ways to trigger this JavaScript: I can click on "View", or,
in the form, I can put an "i" in the code and click on the option link,
and it sends me the PDF file.

3) I was not sure why you suggested the webbrowser module, but I am
glad to find out about it.
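One way to confirm which parameter changes is to capture the request URL from two separate clicks (for example with Tamper Data) and compare the query strings. Below is a Python 3 sketch of that comparison. The real URL is truncated above, so the example URLs in the usage note and the changed_params name are made up for illustration.

```python
from urllib.parse import parse_qs, urlsplit


def changed_params(url_a, url_b):
    """Return the names of query parameters whose values differ between two URLs."""
    qa = parse_qs(urlsplit(url_a).query)
    qb = parse_qs(urlsplit(url_b).query)
    # A parameter counts as changed if it is missing from one URL
    # or has a different value list in the other.
    return sorted(k for k in set(qa) | set(qb) if qa.get(k) != qb.get(k))
```

With two hypothetical captures such as `http://host.invalid/view?param1=X&param2=AAA` and `http://host.invalid/view?param1=X&param2=BBB`, only `param2` would be reported. Any parameter that stays constant across clicks can be copied as-is.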
Jun 27 '08 #7

On Mon, 12 May 2008 22:06:28 +0200, Diez B. Roggisch wrote:
> There is no way to interpret the JS in Python,
There is at least one way:
<http://wwwsearch.sourceforge.net/python-spidermonkey/>
--
JanC
Jun 27 '08 #8
