"Seymour" <se************@gmail.com> writes:
> I am trying to find a way to sign onto my Wall Street Journal account
> (http://online.wsj.com/public/us) and automatically download various
> financial pages on stocks and mutual funds that I am interested in
> tracking. I have a subscription to this site and am trying to figure
[...]
> My questions are:
> 1. Is there an easier way to grab these pages from a password protected
> site, or is the use of Mechanoid a reasonable approach?
This is the first time I've heard of anybody using mechanoid. As the
author of mechanize, of which mechanoid is a fork, I've always been in the
dark about why its author decided to fork it (he hasn't emailed
me...).
I don't know if there's any activity on the mechanoid project, but I'm
certainly still working on mechanize, and there's an active mailing list:
http://wwwsearch.sourceforge.net/
https://lists.sourceforge.net/lists/...search-general
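This is a minimal sketch of what "signing on and grabbing pages" amounts to at the HTTP level. It is an illustration under stated assumptions, not mechanize's or mechanoid's API. It uses only Python's standard library (`urllib.request` and `http.cookiejar`) in place of mechanize. The login form's fields are POSTed, and the session cookie the site sets is kept in a cookie jar. Later page fetches then send that cookie back. The helper names, the login URL and the form field names are hypothetical placeholders. A real site (the WSJ included) may also need hidden fields, redirects or JavaScript, and that form bookkeeping is what mechanize does for you.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session_opener():
    """Build an opener whose CookieJar keeps any session cookie set at login."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def make_login_request(login_url, fields):
    """Build a POST request carrying the login form's fields, URL-encoded.

    The field names in `fields` are hypothetical: read the real ones off the
    site's login form.
    """
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(login_url, data=body, method="POST")


def fetch_pages(opener, login_request, page_urls):
    """Log in once, then download each page with the same cookie-aware opener.

    Returns a dict mapping each URL to the raw response bytes.
    """
    with opener.open(login_request) as response:
        response.read()  # the login response itself is not needed here
    pages = {}
    for url in page_urls:
        with opener.open(url) as response:
            pages[url] = response.read()
    return pages
```

Usage would be along the lines of `opener, jar = make_session_opener()` followed by `fetch_pages(opener, make_login_request(login_url, {...}), urls)`. You would fill in the real form's field names, which you can find in the login page's HTML or in what the browser is seen sending.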
> 2. Is there an easy way of recording a web surfing session in Firefox
> to see what the browser sends to the site? I am thinking that this
> might help me better understand the Mechanoid commands, and more easily
> program it. I do a fair amount of VBA Programming in Microsoft Excel
> and have always found the Macro Recording feature a very useful
> starting point which has greatly helped me get up to speed.
With Firefox, you can use the Live HTTP Headers extension:
http://livehttpheaders.mozdev.org/
The mechanize docs explain how to make mechanize display the HTTP
headers it sends.
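In mechanize itself, the switch for displaying headers is the `Browser.set_debug_http()` method. As a complement, here is a minimal standard-library sketch of another way to see exactly what a client sends. It is not part of mechanize or of the Firefox extension. It runs a throwaway local server that echoes back, and prints, each request's request line and headers. The class and function names and the port number are arbitrary choices made for this illustration. Note that this only shows requests sent to the local server itself; it does not intercept traffic to the real site.

```python
import http.server
import threading
import urllib.request


class HeaderEchoHandler(http.server.BaseHTTPRequestHandler):
    """Reply to every GET with the request line and headers the client sent."""

    def do_GET(self):
        report = self.requestline + "\n" + str(self.headers)
        body = report.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
        print(report)  # also show the request on the console

    def log_message(self, format, *args):
        pass  # silence the default one-line access log


def start_echo_server(port=0):
    """Start the echo server in a background thread; port=0 lets the OS pick one."""
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), HeaderEchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    # Run in the foreground on a fixed port; Ctrl-C stops it.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 8000), HeaderEchoHandler)
    print("Visit http://127.0.0.1:8000/ in Firefox to see what it sends.")
    server.serve_forever()
```

To compare a browser with a script, open http://127.0.0.1:8000/ in Firefox, then request the same URL from Python. Put the two header dumps side by side.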
Going further, there's certainly at least one HTTP-based recorder for
twill, which watches your browser traffic and generates twill
code for you (twill is a simple language for functional testing and
scraping built on top of mechanize):
http://twill.idyll.org/
http://darcs.idyll.org/%7Et/projects/scotch/doc/
That's not an entirely reliable process, but some people might find it
helpful.
I think there may be one for zope.testbrowser too (or for ZopeTestBrowser
(sp?), the standalone version that works without Zope), but I'm not
sure. (zope.testbrowser is also built on mechanize.) Despite the
name, I'm told it can be used for scraping as well as testing.
I would imagine that it would be fairly easy to modify or extend
Selenium IDE to emit mechanize or twill or zope.testbrowser (etc.)
code (perhaps even without any coding; I've used too many Firefox Selenium
plugins and now forget which had which features). Personally, though, I
would avoid using Selenium itself to automate tasks: unlike mechanize &c.,
Selenium drags in an entire browser, which brings with it some
inflexibility (though less than in the past). It does have advantages:
most obviously, it runs JavaScript, which mechanize does not.
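This toy sketch makes the "recorder that emits code" idea concrete. It covers only the translation step: turning a list of recorded actions into the source of a mechanize script. The step format is made up for this example and is not Selenium IDE's output or that of any real recorder. It uses `("open", url)`, `("fill", field_name, value)` and `("submit",)` tuples, and the function name `steps_to_mechanize` is hypothetical. The emitted lines follow ordinary mechanize `Browser` usage as shown in its documentation: `open()`, `select_form()`, item assignment to set form controls, and `submit()`. Always selecting the first form (`nr=0`) is a simplifying assumption.

```python
def steps_to_mechanize(steps):
    """Translate recorded steps into the source text of a mechanize script.

    Hypothetical step format: ("open", url), ("fill", field_name, value),
    ("submit",). Every "fill" is assumed to target the first form on the page.
    """
    lines = ["import mechanize", "", "br = mechanize.Browser()"]
    form_selected = False
    for step in steps:
        action = step[0]
        if action == "open":
            lines.append(f"br.open({step[1]!r})")
            form_selected = False  # a new page needs a fresh form selection
        elif action == "fill":
            if not form_selected:
                lines.append("br.select_form(nr=0)")
                form_selected = True
            lines.append(f"br[{step[1]!r}] = {step[2]!r}")
        elif action == "submit":
            lines.append("br.submit()")
            form_selected = False  # submitting leads to a new page
        else:
            raise ValueError(f"unsupported step: {step!r}")
    return "\n".join(lines) + "\n"
```

Like the twill recorder's output, generated code of this kind is a starting point to edit by hand rather than a finished script.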
John