Bytes | Software Development & Data Engineering Community

Mechanoid Web Browser - Recording Capability

I am trying to find a way to sign onto my Wall Street Journal account
(http://online.wsj.com/public/us) and automatically download various
financial pages on stocks and mutual funds that I am interested in
tracking. I have a subscription to this site and am trying to figure
out how to use Python, which I have been learning for the past
year, to log in automatically and capture a few different pages.
I have mastered capturing web pages from sites that don't require a
password, but am struggling with password-protected ones, and have been
trying to learn how to program the Mechanoid module
(http://cheeseshop.python.org/pypi/mechanoid) to get past that hurdle.

My questions are:
1. Is there an easier way to grab these pages from a password-protected
site, or is Mechanoid a reasonable approach?
2. Is there an easy way to record a web-surfing session in Firefox
to see what the browser sends to the site? I think this
might help me better understand the Mechanoid commands and make it
easier to program. I do a fair amount of VBA programming in Microsoft
Excel and have always found the macro recording feature a very useful
starting point that has greatly helped me get up to speed.

Thanks for your help/insights.
Seymour

Sep 16 '06 #1
"Seymour" <se************@gmail.com> writes:
> I am trying to find a way to sign onto my Wall Street Journal account
> (http://online.wsj.com/public/us) and automatically download various
> financial pages on stocks and mutual funds that I am interested in
> tracking. I have a subscription to this site and am trying to figure
> [...]
> My questions are:
> 1. Is there an easier way to grab these pages from a password protected
> site, or is the use of Mechanoid a reasonable approach?

This is the first time I've heard of anybody using mechanoid. As the
author of mechanize, of which mechanoid is a fork, I was always in the
dark about why its author decided to fork it (he hasn't emailed
me...).

I don't know if there's any activity on the mechanoid project, but I'm
certainly still working on mechanize, and there's an active mailing list:

http://wwwsearch.sourceforge.net/

https://lists.sourceforge.net/lists/...search-general

> 2. Is there an easy way of recording a web surfing session in Firefox
> to see what the browser sends to the site? I am thinking that this
> might help me better understand the Mechanoid commands, and more easily
> program it. I do a fair amount of VBA Programming in Microsoft Excel
> and have always found the Macro Recording feature a very useful
> starting point which has greatly helped me get up to speed.

With Firefox, you can use the Livehttpheaders extension:

http://livehttpheaders.mozdev.org/

The mechanize docs explain how to turn on display of the HTTP headers
it sends.
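If you'd rather not depend on mechanize, the Python 3 standard library has a rough equivalent: give urllib's HTTP and HTTPS handlers a non-zero debug level, and http.client prints the request it sends and the headers it gets back. This is a minimal sketch, not the mechanize API; the helper name `make_debug_opener` is hypothetical.

```python
import urllib.request


def make_debug_opener(level=1):
    """Return an opener that echoes HTTP traffic to stdout.

    Hypothetical helper. With debuglevel > 0, http.client prints the
    bytes it sends ("send:") plus the status line ("reply:") and each
    response header ("header:") it receives.
    """
    # Passing handler *instances* makes build_opener use them in place
    # of its default HTTPHandler/HTTPSHandler.
    return urllib.request.build_opener(
        urllib.request.HTTPHandler(debuglevel=level),
        urllib.request.HTTPSHandler(debuglevel=level),
    )
```

Calling `make_debug_opener().open(url)` fetches the page as usual while the traffic scrolls past on stdout, which is roughly what Livehttpheaders shows inside the browser.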
Going further, certainly there's at least one HTTP-based recorder for
twill, which actually watches your browser traffic and generates twill
code for you (twill is a simple language for functional testing and
scraping built on top of mechanize):

http://twill.idyll.org/

http://darcs.idyll.org/%7Et/projects/scotch/doc/
That's not an entirely reliable process, but some people might find it
helpful.

I think there may be one for zope.testbrowser too (or ZopeTestBrowser
(sp?), the standalone version that works without Zope) -- I'm not
sure. (zope.testbrowser is also built on mechanize.) Despite the
name, I'm told this can be used for scraping as well as testing.

I would imagine that it would be fairly easy to modify or extend
Selenium IDE to emit mechanize or twill or zope.testbrowser (etc.)
code (perhaps without any coding at all; I've used too many Firefox
Selenium plugins and now forget which had which features). Personally I
would avoid using Selenium itself to actually automate tasks, though,
since unlike mechanize &c., Selenium drags in an entire browser, which
brings with it some inflexibility (though not as bad as in the past).
It does have advantages, though: most obviously, it knows JavaScript.
John
Sep 17 '06 #2
"Seymour" <se************@gmail.com> writes:
> [...]
> struggling otherwise and have been trying to learn how to program the
> Mechanoid module (http://cheeseshop.python.org/pypi/mechanoid) to get
> past the password protected site hurdle.
>
> My questions are:
> 1. Is there an easier way to grab these pages from a password protected
> site, or is the use of Mechanoid a reasonable approach?
> [...]

Again, can't speak for mechanoid, but it should be straightforward
with mechanize (simplifying one of the examples from the URL below):
http://wwwsearch.sourceforge.net/mechanize/

from mechanize import Browser

br = Browser()
br.add_password("http://www.example.com/protected/", "joe", "password")
br.set_debug_http(True)  # Print HTTP headers.
br.open("http://www.example.com/protected/blah.html")
print(br.response().read())
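The add_password call above covers HTTP authentication, the kind where the browser pops up its own password dialog. For comparison, here is a minimal Python 3 standard-library sketch of HTTP Basic auth (urllib, not mechanize), reusing the example.com placeholders; the two helper names are hypothetical. Note that many subscription sites use an HTML login form plus cookies instead, which these handlers don't cover.

```python
import urllib.request


def make_password_manager(base_url, user, password):
    """Register credentials for every URL under base_url (hypothetical helper)."""
    # realm=None: the "default realm" manager falls back to these
    # credentials whatever realm name the server's 401 challenge uses.
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    manager.add_password(None, base_url, user, password)
    return manager


def make_auth_opener(manager):
    """Build an opener that answers HTTP Basic auth challenges (hypothetical helper)."""
    return urllib.request.build_opener(
        urllib.request.HTTPBasicAuthHandler(manager))
```

Then `make_auth_opener(manager).open("http://www.example.com/protected/blah.html").read()` fetches the page, with the handler resending the request with credentials when the server answers 401. As in the mechanize version, the host in the registered URL has to match the host you open.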
John
Sep 17 '06 #3
"Seymour" <se************@gmail.com> writes:
> I am trying to find a way to sign onto my Wall Street Journal account
> (http://online.wsj.com/public/us) and automatically download various
> financial pages on stocks and mutual funds that I am interested in
> tracking. I have a subscription to this site and am trying to figure
> out how to use python, which I have been trying to learn for the past
> year, to automatically login and capture a few different pages.
> [...]

Just to add: it's quite possible that site has a "no scraping"
condition in its terms of use. It seems to be standard legal boilerplate
on commercial sites these days. Not a good thing on the whole, I tend
to think, but you should be aware of it.
John
Sep 17 '06 #4
Thanks, John!
Lots of great leads in your post that I am busy looking at. I did try
one program, MaxQ, that records web surfing, and it seems to work great.
I plan to give all of your leads a try.
BTW, I am not sure how I came across Mechanoid before mechanize, but I
did and started to study it. Somehow I had the notion that
mechanize was a Perl script.
Thanks again,
Seymour

Sep 18 '06 #5
"Seymour" <se************@gmail.com> writes:
> Somehow I had the notion that Mechanize was a Pearl script.

mechanize the Python module started as a port of Andy Lester's Perl
module WWW::Mechanize (in turn based on Gisle Aas' libwww-perl), and
on some very high level has "the same" conceptual interface, but most
of the details (internal structure, features and bugs ;-) are
different to LWP and WWW::Mechanize due to the integration with
urllib2, httplib and friends, and with my own code. Most parts of the
code are no longer recognisable as having originated in LWP (and of
course, lots of it *didn't* originate there).
John

Sep 19 '06 #6
"Seymour" <se************@gmail.com> writes:
> [...]
> one program, MaxQ, that records web surfing. It seems to work great.
> [...]

There are lots of such programs about (ISTR twill used to use MaxQ for
its recording feature, but I think Titus got rid of it in favour of
his own code, for some reason). How useful they are depends on the
details of what you're doing: the information that goes across HTTP is
on a fairly low level, so, most obviously, you may need to send a
session ID that varies per request. Those programs usually
have some way of dealing with that specific problem, but you may run
into other problems that have the same origin. Don't let me put you
off if it gets your job done, but it's good to be a bit wary: all
current web-scraping approaches using free software suck in one way or
another.
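One common way around the per-request session ID problem is to stop replaying the recorded value: fetch the login page fresh, resend whatever hidden fields it currently carries, and let a cookie jar hold any session cookie the site sets. Below is a minimal Python 3 standard-library sketch under that assumption. The form field names (`user`, `password`, `session_id` in the test) and all helper names are hypothetical; the real field names come from the site's login form HTML or from a recorder's log.

```python
import http.cookiejar
import urllib.parse
import urllib.request
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        # Attributes without a value come through as None.
        if (attrs.get("type") or "").lower() == "hidden" and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def hidden_fields(html):
    """Return the hidden inputs of a freshly fetched page as a dict."""
    parser = HiddenFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


def build_login_request(action_url, login_page_html, user, password):
    """Build a POST that resends the page's *current* hidden fields."""
    data = dict(hidden_fields(login_page_html))
    # Hypothetical field names: copy the real ones from the login form.
    data.update({"user": user, "password": password})
    body = urllib.parse.urlencode(data).encode("ascii")
    return urllib.request.Request(action_url, data=body, method="POST")


def make_cookie_opener():
    """Opener that stores cookies set at login and sends them on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar
```

In use, the same opener would fetch the login page, submit `build_login_request(...)`, and then fetch each data page, so any session cookie set at login travels with the later requests. Sites that build their login with JavaScript need more than this, which is where a real browser (or Selenium) comes back in.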
John

Sep 19 '06 #7
You can try SWExplorerAutomation (SWEA) (http://webunittesting.com).
It works very well with password-protected sites. SWEA is a .NET API,
but you can use IronPython to access it.

Sep 20 '06 #8
