JavaScript web scraping test cases?

I've put together a Python package for scraping / testing pages that
depend on embedded JavaScript code (without depending on IE, Mozilla
or Konqueror, and with the DOM etc. all implemented in pure Python --
mostly a hacked 4DOM, with some bits from pxdom; the JavaScript
interpreter I'm using ATM is spidermonkey). It's still missing a lot
and is pre-alpha, but it works, just barely.

Anyway, the point of this post is that I'm looking for pages to test
it on, so if you have a page that you'd like scraped (one that uses
JavaScript in some non-trivial way, of course! -- for dynamically
modifying forms, setting cookies, or whatever), mail me the details:
better that than some randomly-selected site from the Internet.
Obviously, it should be something that doesn't violate any terms &
conditions of use or otherwise cause people trouble, and preferably
that doesn't require any signup.
[In fact, TBH, my completely ad-hoc methodology with this is to write
some web scraping code, discover that the JavaScript breaks things,
often by depending on some nonstandard DOM feature, hack the DOM a
bit, etc. Hopefully I'll reach a point in understanding where I can
rewrite the DOM from scratch ('scratch' here being 4DOM), properly, to
match some approximation of 'HTML DOM as deployed'...]
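
A minimal sketch of the "find out what the page depends on" step described above. This is not code from the package: the class name `MissingFeatureRecorder` and its parameters are made up for illustration, and no JavaScript interpreter is involved. It only shows one way a stand-in DOM object could log every attribute that a script touches but that the DOM does not implement yet (here the nonstandard `document.all` and `document.layers`).

```python
class MissingFeatureRecorder:
    """Hypothetical stand-in for a DOM node.

    Attributes listed in `implemented` are returned normally; any other
    attribute access is logged so you can see what still needs hacking in.
    """

    def __init__(self, name, implemented=None, log=None):
        self._name = name
        self._implemented = dict(implemented or {})
        self._log = log if log is not None else []

    def __getattr__(self, attr):
        # Called only when normal lookup fails, i.e. for DOM-level names.
        if attr.startswith("_"):
            raise AttributeError(attr)
        if attr in self._implemented:
            return self._implemented[attr]
        path = f"{self._name}.{attr}"
        self._log.append(path)
        # Hand back another recorder (sharing the log) so chained
        # accesses such as document.all.item are recorded too.
        return MissingFeatureRecorder(path, log=self._log)

    @property
    def missing(self):
        """Dotted paths of every unimplemented attribute accessed so far."""
        return list(self._log)


document = MissingFeatureRecorder("document", implemented={"title": "Test page"})
document.title        # implemented, not logged
document.all.item     # IE-style access, logged at both levels
document.layers       # Netscape 4-style access, logged
print(document.missing)
```

The logged paths give a rough to-do list of DOM features to add before the page's script can run.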
John
Jul 18 '05 #1
7 Replies


jj*@pobox.com (John J. Lee) writes:
[...]
> Anyway, the point of this post is that I'm looking for pages to test
> it on, so if you have a page that you'd like scraped (one that uses
> JavaScript in some non-trivial way, of course! -- for dynamically
> modifying forms, setting cookies, or whatever), mail me the details:
> better that than some randomly-selected site from the Internet.
> Obviously, it should be something that doesn't violate any terms &
> conditions of use or otherwise cause people trouble, and preferably
> that doesn't require any signup.

[...]

Nobody?

I'll get my coat. ;-)
John
Jul 18 '05 #2

On Fri, 22 Aug 2003, Skip Montanaro wrote:
>> Anyway, the point of this post is that I'm looking for pages to test
>> it on, so if you have a page that you'd like scraped (one that uses
>> JavaScript in some non-trivial way, of course! ...


John> Nobody?

> Sorry, I couldn't think of anything off the top of my head. In my own pages

[...]

Oh, I'm sure I'll have no trouble finding test cases -- I just thought
that, rather than some random sites that are of no use to anyone, there is
bound to be somebody out there who actually wanted to scrape a particular
page in the past, and had not bothered previously thanks to the
inconvenience of having to read & reproduce the effect of the JS code
(particularly code that messes about with forms). It would be nice to be
doing something useful at the same time as writing tests!

Of course, I already have those sites that gave rise to the 'itch' to do
this in the first place, but I'm sure there's lots of the browser object
model that they don't exercise...
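
To make the inconvenience concrete, here is a small sketch using only the standard library's `html.parser`. The page, the form and the field names are invented for illustration. A scraper that reads just the static markup sees an empty hidden field, because the value is only filled in when the script runs, so the scraper's author has to read the script and reproduce its effect by hand.

```python
from html.parser import HTMLParser

# Hypothetical page: the hidden "token" field is filled in by script.
PAGE = """
<html><body>
<form name="login" action="/submit">
  <input type="text" name="user" value="guest">
  <input type="hidden" name="token" value="">
</form>
<script>
  document.forms["login"].token.value = "abc" + (1 + 2);
</script>
</body></html>
"""


class FormFieldParser(HTMLParser):
    """Collect name/value pairs of <input> elements from static markup."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attr = dict(attrs)
            if "name" in attr:
                # A missing or empty value attribute becomes "".
                self.fields[attr["name"]] = attr.get("value") or ""


def static_form_fields(html):
    """Return the form fields as a non-JavaScript scraper would see them."""
    parser = FormFieldParser()
    parser.feed(html)
    return parser.fields


fields = static_form_fields(PAGE)
print(fields)  # the "token" value is still empty: the script never ran
```

Submitting these fields as they stand would send an empty `token`, which is exactly the kind of breakage that makes people give up on scraping such a page.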
John

Jul 18 '05 #3

>> Anyway, the point of this post is that I'm looking for pages to test
>> it on, so if you have a page that you'd like scraped (one that uses
>> JavaScript in some non-trivial way, of course! ...


John> Nobody?

Sorry, I couldn't think of anything off the top of my head. In my own pages
I've only ever used JS in trivial ways. Aside from a calendar on the Mojam
search results pages, I don't think JS is used on our sites at all. Still,
you're welcome to try it out on something like

http://www.mojam.com/concerts/search...lue=greg+brown

Skip

Jul 18 '05 #5

John ...

I'm not sure what types of applications
you're looking for, but I have some JavaScript plots
that might be interesting to test ...

http://fastq.com/~sckitching/JS/Circle_MH.htm

http://fastq.com/~sckitching/JS/DD_Circles.htm

http://fastq.com/~sckitching/JS/Parabola.htm

--
Cousin Stanley
Human Being
Phoenix, Arizona
Jul 18 '05 #6

"Cousin Stanley" <Co***********@hotmail.com> writes:
> I'm not sure what types of applications
> you're looking for,

The kind that people actually want to use <wink>.

As I said, there's no problem finding test cases, I just thought that
while I was about this, somebody might happen to be reading who was
actually trying to scrape a JS page.

> but I have some JavaScript plots
> that might be interesting to test ...
>
> http://fastq.com/~sckitching/JS/Circle_MH.htm

[...]

Konqueror 3.1 didn't show anything, Mozilla 1.4 printed some pretty
circles, then froze!
John
Jul 18 '05 #7

John ...

Although it's been a while since I tested these scripts
I thought I remembered testing successfully in both
Mozilla 0.95 and IE 5.1 at the time ...

I tested this morning using Moz 1.3.1 and 2 out of 3 failed,
but all 3 worked in IE 6 ...

The JS used in these scripts, although a bit hackish,
doesn't use any particular IE magic ...

I zipped up all 3 scripts for convenience,
if you want to look at the sources ...

http://fastq.com/~sckitching/JS/JS_Plots.zip

Differences in JS/DOM implementations from browser to browser
hurt my head and seem to be an endless source of problems
for web developers ...

--
Cousin Stanley
Human Being
Phoenix, Arizona
Jul 18 '05 #8
