How can I get my https file fetch working?
Member | Join Date: Aug 2007 | Posts: 42
Hi y'all!
I'm new to Perl, and I'm trying to automate a file fetch.
I have a URL (in this example called 'https://GetMyFile') which, when I paste it into a browser, brings up the "File Download" pop-up ("Do you want to open or save this file?"). Clicking Save gives me the file I want.
I would like to get the same result automatically, without having to paste the URL into a browser, click Save and specify where to save the file.
So, here's my first attempt:
---------------------------------------------------------------------
use strict;
use warnings;
use WWW::Mechanize;
use LWP::UserAgent;
use LWP::Debug qw(+);

my $ua = LWP::UserAgent->new;
# Note: this proxy applies only to $ua, which is never used below;
# $mech is a separate agent, so its requests are "Not proxied".
$ua->proxy([qw( https http )], "myProxyAddress");

my $url = "https://GetMyFile";

my $mech = WWW::Mechanize->new();

print "Fetching $url\n";
$mech->get( $url, ':content_file' => 'C:\Tmp\myFile.zip' );
die "Ooops, that didn't work: ", $mech->response->status_line unless $mech->success;
--------------------------------------------------------------------
The thing is, I don't get the "Ooops" message; instead myFile.zip is downloaded to the correct location, but the file is corrupted. It seems it isn't downloaded completely, since the copy I download manually is much bigger.
Here are some debug printouts I get:
LWP::UserAgent::new: ()
LWP::UserAgent::proxy: ARRAY(someHexNumber) myProxyAddress
LWP::UserAgent::proxy: https myProxyAddress
LWP::UserAgent::proxy: http myProxyAddress
LWP::UserAgent::new: ()
LWP::UserAgent::request: ()
HTTP::Cookies::add_cookie_header: Checking GetMyFile for cookies
LWP::UserAgent::send_request: GET https://GetMyFile
LWP::UserAgent::_need_proxy: Not proxied
LWP::Protocol::http::request: ()
LWP::Protocol::collect: read 336 bytes
LWP::UserAgent::request: Simple response: Found
LWP::UserAgent::request: ()
HTTP::Cookies::add_cookie_header: Checking GetMyFile for cookies
LWP::UserAgent::send_request: GET https://GetMyFile
LWP::UserAgent::_need_proxy: Not proxied
LWP::Protocol::http::request: ()
LWP::Protocol::collect: read 439 bytes
LWP::Protocol::collect: read 176 bytes
LWP::UserAgent::request: Simple response: Found

... Then these printouts are repeated

...

LWP::UserAgent::_need_proxy: Not proxied
LWP::Protocol::http::request: ()
LWP::Protocol::collect: read 869 bytes
LWP::Protocol::collect: read 4096 bytes
LWP::Protocol::collect: read 4096 bytes
LWP::Protocol::collect: read 2395 bytes
LWP::UserAgent::request: Simple response: OK
Fetching https://GetMyFile
Any help or suggestions as to why I don't get the entire file (?) would be greatly appreciated!
Cheers
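A minimal sketch of the same fetch with two changes, assuming the proxy really is required: the proxy is set on the WWW::Mechanize object itself (it inherits proxy() from LWP::UserAgent, whereas the separate $ua in the script above never sends anything), and the response's Content-Type is checked so that an HTML page is not silently saved under a .zip name. The list of archive content types and the command-line arguments are assumptions, not something this server is known to send.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Content types accepted as "the archive". Assumption: the server labels
# the .zip with one of these; adjust after looking at a real response.
my %ARCHIVE_TYPES = map { $_ => 1 } qw(
    application/zip
    application/x-zip-compressed
    application/octet-stream
);

# True if a Content-Type header value names an archive rather than a web page.
sub is_archive_type {
    my ($content_type) = @_;
    # Keep only the MIME type, dropping parameters such as "; charset=...".
    my ($mime) = split /;/, ( $content_type // '' ), 2;
    return 0 unless defined $mime;
    $mime =~ s/^\s+|\s+$//g;
    return $ARCHIVE_TYPES{ lc $mime } ? 1 : 0;
}

sub fetch {
    my ( $url, $target, $proxy ) = @_;
    require WWW::Mechanize;    # loaded at run time so the helper above has no dependencies

    my $mech = WWW::Mechanize->new( autocheck => 0 );
    # Set the proxy on the agent that actually sends the request.
    $mech->proxy( [qw( https http )], $proxy ) if defined $proxy;

    print "Fetching $url\n";
    $mech->get( $url, ':content_file' => $target );
    die "Request failed: ", $mech->response->status_line, "\n"
        unless $mech->success;

    my $type = $mech->response->header('Content-Type');
    die "Got '", ( $type // 'no Content-Type' ), "' instead of an archive",
        " (probably the login page); check $target\n"
        unless is_archive_type($type);
    print "Saved $target\n";
}

# Usage: perl fetch.pl URL TARGET_FILE [PROXY]
fetch(@ARGV) if @ARGV >= 2;
```

For example: perl fetch.pl https://GetMyFile C:\Tmp\myFile.zip myProxyAddress. If the server answers with a login page, the script stops with the HTML Content-Type instead of reporting success.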
Expert | Join Date: Jan 2007 | Location: Southern California USA | Posts: 4,091
re: How can I get my https file fetch working?
Looks like it should work. Don't know what the problem is.
Site Moderator | Join Date: May 2007 | Location: New Hampshire | Posts: 2,571
re: How can I get my https file fetch working?
I agree with Kevin. Right off the bat, it looks like it might work, but I haven't gone through it thoroughly. What I can say is that you want to look at the book "Spidering Hacks". Specifically, this part here: http://books.google.com/books?id=4M2...PXiAg#PPA60,M1
That will help you with a fetch using the Mechanize module.
Regards,
Jeff
Member | Join Date: Aug 2007 | Posts: 42
re: How can I get my https file fetch working?
Hi! Thank you so much, guys, for the quick feedback!
I now know, though, that the problem is related to credentials.
When, in the script, I change
$mech->get( $url, ':content_file' => 'C:\Tmp\myFile.zip' );
to
$mech->get( $url, ':content_file' => 'C:\Tmp\myFile.html' );
I can see that the downloaded file is actually a web page, namely a login page.
I don't really know how to solve this, though; I will have to investigate further.
There is some auto-login ASP session involved when fetching files from that server. The browser probably handles a lot of that behind the scenes, and I don't know exactly what's going on, which of course I must in order to get my script to work. These enterprise networks... *sigh* :)
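Renaming the output to .html is a good manual check. A script can make the same check by reading the first bytes of the saved file: an ordinary zip archive starts with the signature PK\x03\x04 (or PK\x05\x06 for an empty archive), while a login page starts with something like <!DOCTYPE. A small sketch using only core Perl; the script name and usage line are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns 1 if the file starts with a zip signature, 0 otherwise.
# "PK\x03\x04" begins a normal archive; "PK\x05\x06" is an empty one.
sub looks_like_zip {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!\n";
    my $got = read $fh, my $magic, 4;
    close $fh;
    return 0 unless defined $got && $got == 4;
    return ( $magic eq "PK\x03\x04" || $magic eq "PK\x05\x06" ) ? 1 : 0;
}

# Usage: perl checkzip.pl C:\Tmp\myFile.zip
for my $file (@ARGV) {
    my $verdict = looks_like_zip($file) ? 'zip archive' : 'not a zip (HTML page?)';
    print "$file: $verdict\n";
}
```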
Site Moderator | Join Date: May 2007 | Location: New Hampshire | Posts: 2,571
re: How can I get my https file fetch working?
Check out the WWW::Mechanize module documentation on CPAN. I am pretty positive that it provides options for logging in to such pages; you just have to code for it.
I don't know if it will help any, but here is a script I wrote a while ago that logs into a website (you had to log in before you could see the list of files) and then downloads everything that was there:
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;
use WWW::Mechanize;
use MIME::Base64;

$|++;

my $username = "username";
my $password = "password";
my $url = "http://www.site.com/page.asp";
my $tempfile = "temp.txt";

my $agent = WWW::Mechanize->new();

# Send the Basic auth header with the first request. The empty second
# argument stops encode_base64 from appending a newline to the value.
my @args = (
    Authorization => "Basic " . encode_base64( $username . ':' . $password, '' )
);

# Two-argument form: use these credentials for any host and realm
# (the four-argument form wants "host:port" and the exact realm, not a URL).
$agent->credentials( $username, $password );

$agent->get( $url, @args );
Obviously, site name, username and password have all been changed to protect the innocent and the above values for each should be replaced with whatever you are using.
Regards,
Jeff
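The script above sends HTTP Basic credentials. The page described earlier in this thread, though, is an HTML login page, and to log in through it a script needs the form's name and field names. A minimal sketch for listing them, assuming WWW::Mechanize is installed: its forms() method returns HTML::Form objects, and describe_forms() (a helper written for this example, not part of any module) prints each form's name, method, action and inputs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Helper written for this example: one line per HTML::Form object,
#   form NAME (METHOD ACTION): field [type], field [type], ...
sub describe_forms {
    my @forms = @_;
    my @lines;
    for my $form (@forms) {
        my $name   = $form->attr('name') // '(unnamed)';
        my @fields = map { sprintf '%s [%s]', $_->name // '(no name)', $_->type }
                     $form->inputs;
        push @lines, sprintf 'form %s (%s %s): %s',
            $name, $form->method, $form->action, join( ', ', @fields );
    }
    return @lines;
}

# Fetch the page (normally the login page) and print what its forms expect.
sub dump_forms {
    my ($url) = @_;
    require WWW::Mechanize;
    my $mech = WWW::Mechanize->new();
    $mech->get($url);
    print "$_\n" for describe_forms( $mech->forms );
}

# Usage: perl forms.pl https://GetMyFile
dump_forms( $ARGV[0] ) if @ARGV;
```

Hidden inputs show up in the listing too; WWW::Mechanize submits those automatically, so only the visible credential fields need to be filled in.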
Expert | Join Date: Jan 2007 | Location: Southern California USA | Posts: 4,091
re: How can I get my https file fetch working?
Look into Win32::IE::Mechanize, which drives Internet Explorer itself and so can handle a lot more things than WWW::Mechanize can.
Member | Join Date: Aug 2007 | Posts: 42
re: How can I get my https file fetch working?
Hello again!
I appreciate all your efforts to help me out here!
I've been working on other things, but now it's time to get back to this. (I still haven't got it working).
Here's my current status:
The myFile.html I get from
$mech->get( $url, ':content_file' => 'C:\Tmp\myFile.html' );
(see previous posts if I'm unclear) has JavaScript on it. Here are some parts of the HTML file (including the JavaScript):
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<head>
<title>TheCompany Portal Login</title>
<link type="text/css" rel="stylesheet" href="styles.css">
<META HTTP-EQUIV="Pragma" CONTENT="no-cache">
<META HTTP-EQUIV="Expires" CONTENT="-1">
<meta content="text/html; charset=iso-8859-1" http-equiv="Content-Type">

<SCRIPT LANGUAGE="JavaScript">
function resetCredFields()
{
document.Login.PASSWORD.value = "";
}

function submitForm()
{
document.Login.submit();
}

function cancelLogin()
{
window.history.go(-1);
}

if (top.frames.length > 1)
{
top.location.href = document.location;
}

function checkEnter(event)
{
var code = 0;
NS4 = (document.layers) ? true : false;
if (NS4)
code = event.which;
else
code = event.keyCode;
if (code==13)
document.Login.submit();
}

</SCRIPT>

</head>

<BODY topmargin="0" leftmargin="0" marginwidth="0" marginheight="0">

<table height="95%" width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
... And so on and so forth.
I don't know anything about how WWW::Mechanize could work with JavaScript. Is that even possible? And if so, how can I provide the JavaScript with the right credentials?
Cheers
Member | Join Date: Aug 2007 | Posts: 42
re: How can I get my https file fetch working?
Sorry, sorry! I shouldn't waste your time with silly questions such as whether WWW::Mechanize works with JavaScript; that wasn't hard to find out for myself. The answer is no, unfortunately.
I'll have to figure out some other way to solve this. :/
Cheers
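WWW::Mechanize indeed doesn't run JavaScript, but the JavaScript shown in the excerpt above doesn't compute anything the server needs: submitForm() and checkEnter() just call document.Login.submit(), which is an ordinary form post that WWW::Mechanize can make itself with submit_form(). A minimal sketch, assuming the Login form is what the file URL returns, that the session cookie it sets is enough, and that requesting the same URL again afterwards returns the file. PASSWORD comes from the page's JavaScript; USERNAME is a hypothetical field name, to be replaced with whatever the form really uses.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# PASSWORD is the field the page's JavaScript clears; USERNAME is a
# hypothetical name for the user-id field.
my $USER_FIELD = 'USERNAME';
my $PASS_FIELD = 'PASSWORD';

# The field values that document.Login.submit() would have posted.
sub login_fields {
    my ( $username, $password ) = @_;
    return +{ $USER_FIELD => $username, $PASS_FIELD => $password };
}

sub fetch_after_login {
    my ( $url, $target, $username, $password ) = @_;
    require WWW::Mechanize;

    # autocheck makes every failed request die; the built-in cookie jar
    # keeps the session cookie between requests.
    my $mech = WWW::Mechanize->new( autocheck => 1 );

    $mech->get($url);    # expected to return the login page
    $mech->submit_form(
        form_name => 'Login',
        fields    => login_fields( $username, $password ),
    );

    # Assumption: once logged in, the same URL serves the file itself.
    $mech->get( $url, ':content_file' => $target );
    print "Saved $url to $target\n";
}

# Usage: perl getfile.pl URL TARGET_FILE USERNAME PASSWORD
fetch_after_login(@ARGV) if @ARGV == 4;
```

Because nothing here opens a browser, this kind of script can run unattended on a server, with no window and no Save prompt.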
Expert | Join Date: Jan 2007 | Location: Southern California USA | Posts: 4,091
re: How can I get my https file fetch working?
I guess you missed my previous post:
Look into Win32::IE::Mechanize, which can handle a lot more things than WWW::Mechanize can.
Member | Join Date: Aug 2007 | Posts: 42
re: How can I get my https file fetch working?
Hi!
KevinADC: Yes, that's right, I missed your Win32::IE::Mechanize suggestion; sorry about that!
Now that I've started looking into it, it seems to fit my needs somewhat better. It feels like I'm almost there, but I still don't see how to get my files downloaded without any manual user input whatsoever.
Right now an IE browser starts up and I get to the download prompt, but I don't want to have to click "Save" and choose a location manually. Plus, I don't want IE to show at all. Is that possible?
This script is to be run on a server, so I want everything to be "invisible".
Here's my current script:
use strict;
use warnings;
use Win32::IE::Mechanize;

# visible => 1 shows the Internet Explorer window
my $ie = Win32::IE::Mechanize->new( visible => 1 );

my $username = "user";
my $password = "pwd";

my $url = "http://weblink.To.TheFile";
my $realm;

$ie->credentials( 'myHostname:myPort', $realm, $username, $password );

print "Fetching $url\n";
$ie->get( $url, ':content_file' => 'C:\Temp\result\result.zip');
die "Ooops, this didn't work: ", $ie->response->status_line unless $ie->success;
Expert | Join Date: Jan 2007 | Location: Southern California USA | Posts: 4,091
re: How can I get my https file fetch working?
Sorry, but I don't know the answer or have any suggestions for your last questions. All I can suggest is to read the module's documentation carefully and see if there is anything that can help you solve those parts of your problem.
Expert | Join Date: Sep 2008 | Location: Sydney, Australia | Posts: 173
re: How can I get my https file fetch working?
That is not a Perl issue; it is a browser issue. You have to look into your browser settings, or use the first version of Google Chrome, which started the download as soon as a file link was clicked. (This was changed in newer versions because it is a security risk; that is why they have a Save option.)