473,320 Members | 1,946 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,320 software developers and data experts.

downloading a link with javascript in it..

I am able to download this page (enclosed code), but I then want to
download a pdf file that I can view in a regular browser by clicking
on the "view" link. I don't know how to automate this next part of my
script. It seems like it uses Javascript.
The line in the page source says
href="javascript:openimagewin('JCCOGetImage.jsp?
refnum=DN2007036179');" tabindex=-1>

So, in summary, when I download this page, for each record, I would
like to initiate the "view" link.
Can anyone point me in the right direction?

When the "view" link is clicked on in IE or Firefox, it returns a pdf
file, so I should be able to download it with
urllib.urlretrieve('pdffile, 'c:\temp\pdffile')

Here is the following code I have been using
----------------------------------------------------------------
import urllib, urllib2

params = [
('booktype', 'L'),
('book', '930'),
('page', ''),
('hidPageName', 'S3Search'),
('DoItButton', 'Search'),]

data = urllib.urlencode(params)

f = urllib2.urlopen("http://www.landrecords.jcc.ky.gov/records/
S3DataLKUP.jsp", data)

s = f.read()
f.close()
open('jcolib.html','w').write(s)

Jun 27 '08 #1
7 3635
Jetus schrieb:
I am able to download this page (enclosed code), but I then want to
download a pdf file that I can view in a regular browser by clicking
on the "view" link. I don't know how to automate this next part of my
script. It seems like it uses Javascript.
The line in the page source says
href="javascript:openimagewin('JCCOGetImage.jsp?
refnum=DN2007036179');" tabindex=-1>

So, in summary, when I download this page, for each record, I would
like to initiate the "view" link.
Can anyone point me in the right direction?

When the "view" link is clicked on in IE or Firefox, it returns a pdf
file, so I should be able to download it with
urllib.urlretrieve('pdffile, 'c:\temp\pdffile')

Here is the following code I have been using
----------------------------------------------------------------
import urllib, urllib2

params = [
('booktype', 'L'),
('book', '930'),
('page', ''),
('hidPageName', 'S3Search'),
('DoItButton', 'Search'),]

data = urllib.urlencode(params)

f = urllib2.urlopen("http://www.landrecords.jcc.ky.gov/records/
S3DataLKUP.jsp", data)

s = f.read()
f.close()
open('jcolib.html','w').write(s)
Use something like the FireBug-extension to see what the
openimagewin-function ultimately creates as reqest. Then issue that,
parametrised from parsed information out of the above href.

There is no way to interpret the JS in Python, let alone mimic possible
browser dom behavior.

Diez
Jun 27 '08 #2
On May 12, 1:54*pm, Jetus <stevegi...@gmail.comwrote:
I am able to download this page (enclosed code), but I then want to
download a pdf file that I can view in a regular browser by clicking
on the "view" link. I don't know how to automate this next part of my
script. It seems like it uses Javascript.
The line in the page source says

href="javascript:openimagewin('JCCOGetImage.jsp?
refnum=DN2007036179');" tabindex=-1>
1) Use BeautifulSoup to extract the path:

JCCOGetImage.jsp?refnum=DN2007036179

from the html page.
2) The path is relative to the current url, so if the current url is:

http://www.landrecords.jcc.ky.gov/re...S3DataLKUP.jsp

Then the url to the page you want is:

http://www.landrecords.jcc.ky.gov/re...m=DN2007036179

You can use urlparse.urljoin() to join a relative path to the current
url:
import urlparse

base_url = 'http://www.landrecords.jcc.ky.gov/records/S3DataLKUP.jsp'
relative_url = 'JCCOGetImage.jsp?refnum=DN2007036179'

target_url = urlparse.urljoin(base_url, relative_url)
print target_url

--output:--
http://www.landrecords.jcc.ky.gov/re...m=DN2007036179

3) Python has a webbrowser module that allows you to open urls in a
browser:

import webbrowser

webbrowser.open("www.google.com")
You could also use system() or os.startfile()[Windows], to do the same
thing:

os.system(r'C:\"Program Files"\"Mozilla Firefox"\firefox.exe')

#You don't have to worry about directory names
#with spaces in them if you use startfile():
os.startfile(r'C:\Program Files\Mozilla Firefox\firefox.exe')
All the urls you posted give me errors when I try to open them in a
browser, so you will have to sort out those problems first.

Jun 27 '08 #3
On May 12, 4:59*pm, 7stud <bbxx789_0...@yahoo.comwrote:
On May 12, 1:54*pm, Jetus <stevegi...@gmail.comwrote:
I am able to download this page (enclosed code), but I then want to
download a pdf file that I can view in a regular browser by clicking
on the "view" link. I don't know how to automate this next part of my
script. It seems like it uses Javascript.
The line in the page source says
href="javascript:openimagewin('JCCOGetImage.jsp?
refnum=DN2007036179');" tabindex=-1>

1) Use BeautifulSoup to extract the path:

JCCOGetImage.jsp?refnum=DN2007036179

from the html page.
BeautifulSoup will allow you to locate and extract the href attribute:

javascript:openimagewin('JCCOGetImage.jsp?refnum=D N2007036179');

See: "The attributes of Tags" in the BS docs.

Then you can use string functions(preferable) or a regex to get
everything between the parentheses(remove the quotes around the path,
too)
Jun 27 '08 #4
Jetus wrote:
I am able to download this page (enclosed code), but I then want to
download a pdf file that I can view in a regular browser by clicking
on the "view" link. I don't know how to automate this next part of my
script. It seems like it uses Javascript.
The line in the page source says
href="javascript:openimagewin('JCCOGetImage.jsp?
refnum=DN2007036179');" tabindex=-1>

So, in summary, when I download this page, for each record, I would
like to initiate the "view" link.
Can anyone point me in the right direction?

When the "view" link is clicked on in IE or Firefox, it returns a pdf
file, so I should be able to download it with
urllib.urlretrieve('pdffile, 'c:\temp\pdffile')

Here is the following code I have been using
----------------------------------------------------------------
import urllib, urllib2

params = [
('booktype', 'L'),
('book', '930'),
('page', ''),
('hidPageName', 'S3Search'),
('DoItButton', 'Search'),]

data = urllib.urlencode(params)

f = urllib2.urlopen("http://www.landrecords.jcc.ky.gov/records/
S3DataLKUP.jsp", data)

s = f.read()
f.close()
open('jcolib.html','w').write(s)
You may want to take a look at mechanize, I'm having pretty good luck with using
it to do the types of things you describe.
http://wwwsearch.sourceforge.net/mechanize/

-Larry
Jun 27 '08 #5
On May 12, 4:06 pm, "Diez B. Roggisch" <de...@nospam.web.dewrote:
Jetus schrieb:
I am able to download this page (enclosed code), but I then want to
download a pdf file that I can view in a regular browser by clicking
on the "view" link. I don't know how to automate this next part of my
script. It seems like it uses Javascript.
The line in the page source says
href="javascript:openimagewin('JCCOGetImage.jsp?
refnum=DN2007036179');" tabindex=-1>
So, in summary, when I download this page, for each record, I would
like to initiate the "view" link.
Can anyone point me in the right direction?
When the "view" link is clicked on in IE or Firefox, it returns a pdf
file, so I should be able to download it with
urllib.urlretrieve('pdffile, 'c:\temp\pdffile')
Here is the following code I have been using
----------------------------------------------------------------
import urllib, urllib2
params = [
('booktype', 'L'),
('book', '930'),
('page', ''),
('hidPageName', 'S3Search'),
('DoItButton', 'Search'),]
data = urllib.urlencode(params)
f = urllib2.urlopen("http://www.landrecords.jcc.ky.gov/records/
S3DataLKUP.jsp", data)
s = f.read()
f.close()
open('jcolib.html','w').write(s)

Use something like the FireBug-extension to see what the
openimagewin-function ultimately creates as reqest. Then issue that,
parametrised from parsed information out of the above href.

There is no way to interpret the JS in Python, let alone mimic possible
browser dom behavior.

Diez
Thanks Diez;
Never used Firebug, and could not find the http-header section, but it
lead me to Tamper Data, and that was perfect to give me the headers.
Thanks for the input.
Jun 27 '08 #6
On May 12, 6:59 pm, 7stud <bbxx789_0...@yahoo.comwrote:
On May 12, 1:54 pm, Jetus <stevegi...@gmail.comwrote:
I am able to download this page (enclosed code), but I then want to
download a pdf file that I can view in a regular browser by clicking
on the "view" link. I don't know how to automate this next part of my
script. It seems like it uses Javascript.
The line in the page source says
href="javascript:openimagewin('JCCOGetImage.jsp?
refnum=DN2007036179');" tabindex=-1>

1) Use BeautifulSoup to extract the path:

JCCOGetImage.jsp?refnum=DN2007036179

from the html page.

2) The path is relative to the current url, so if the current url is:

http://www.landrecords.jcc.ky.gov/re...S3DataLKUP.jsp

Then the url to the page you want is:

http://www.landrecords.jcc.ky.gov/re...jsp?refnum=DN2...

You can use urlparse.urljoin() to join a relative path to the current
url:

import urlparse

base_url = 'http://www.landrecords.jcc.ky.gov/records/S3DataLKUP.jsp'
relative_url = 'JCCOGetImage.jsp?refnum=DN2007036179'

target_url = urlparse.urljoin(base_url, relative_url)
print target_url

--output:--http://www.landrecords.jcc.ky.gov/records/JCCOGetImage.jsp?refnum=DN2...

3) Python has a webbrowser module that allows you to open urls in a
browser:

import webbrowser

webbrowser.open("www.google.com")

You could also use system() or os.startfile()[Windows], to do the same
thing:

os.system(r'C:\"Program Files"\"Mozilla Firefox"\firefox.exe')

#You don't have to worry about directory names
#with spaces in them if you use startfile():
os.startfile(r'C:\Program Files\Mozilla Firefox\firefox.exe')

All the urls you posted give me errors when I try to open them in a
browser, so you will have to sort out those problems first.
7Stud;
Thanks for sharing your knowledge!!

1)The proper url to the website is http://www.landrecords.jcc.ky.gov/records/S0Search.html.

2) The join won't work. I found that the request it sends is
http://206.196.0.195/cgi-bin/webview...2=SDAAAA76070B
It looks like it generates a random code for param2...
I have two choices for generating this javascript,
I can click on the View, or in the form, if I put a "i" in the code
and click on the
option link, it will send me pdf file.

3) Was not sure why you suggested I use the Webbrowser module?
But I am glad to find out about it.
Jun 27 '08 #7
Op Mon, 12 May 2008 22:06:28 +0200, schreef Diez B. Roggisch:
There is no way to interpret the JS in Python,
There is at least one way:
<http://wwwsearch.sourceforge.net/python-spidermonkey/>
--
JanC
Jun 27 '08 #8

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

6
by: Steve | last post by:
hello all im looking to make a download script as i cannot use the .htaccess to control directory access to files as im using a php login system using mysql and sessions. i have been playing...
1
by: Tom Haughton | last post by:
To make better use of my commuting time I want to download audio files from websites such as: http://freshair.npr.org/day_fa.jhtml?displayValue=day&todayDate=02/27/2004 for listening to them...
1
by: Frances Del Rio | last post by:
I have a bunch of Photoshop images on my server, I want when user clicks on a link to a Photoshop img for it to start downloading (like when you click on a Word file..) is there a way to do this...
6
by: Tony K | last post by:
I have the most peculiar problem with an ASP.NET page which we use for downloading a file. When the user clicks on a link, the link points to an ASPX page which downloads the file selected. ...
23
by: Doug van Vianen | last post by:
Hi, Is there some way in JavaScript to stop the downloading of pictures from a web page? Thank you. Doug van Vianen
1
by: apondu | last post by:
Hi, I am new to web services and i am facing a problem. I am interested in downloading some file from internet and use it for furthur processing For eg. i have a file at a the following URL and...
1
by: apondu | last post by:
Hi, I am new to web services and i am facing a problem. I am interested in downloading some file from internet and use it for furthur processing For eg. i have a file at a the following URL and...
0
by: mahvarjin | last post by:
hi, I am using passthru() function to download zip file. During downloading process I can't get response from other link. If I click other link in my page at the downloading time I got response...
1
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: Vimpel783 | last post by:
Hello! Guys, I found this code on the Internet, but I need to modify it a little. It works well, the problem is this: Data is sent from only one cell, in this case B5, but it is necessary that data...
0
by: jfyes | last post by:
As a hardware engineer, after seeing that CEIWEI recently released a new tool for Modbus RTU Over TCP/UDP filtering and monitoring, I actively went to its official website to take a look. It turned...
0
by: ArrayDB | last post by:
The error message I've encountered is; ERROR:root:Error generating model response: exception: access violation writing 0x0000000000005140, which seems to be indicative of an access violation...
0
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
0
by: Defcon1945 | last post by:
I'm trying to learn Python using Pycharm but import shutil doesn't work
0
by: Shællîpôpï 09 | last post by:
If u are using a keypad phone, how do u turn on JavaScript, to access features like WhatsApp, Facebook, Instagram....
0
by: af34tf | last post by:
Hi Guys, I have a domain whose name is BytesLimited.com, and I want to sell it. Does anyone know about platforms that allow me to list my domain in auction for free. Thank you
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.