
Example Script to parse web page links and extract data?

I'm hoping someone knows of an example script I can look at to help me build
mine.

I'm looking for an easy way to automate the web site browsing below and pull
out the data I'm searching for.
Here are the steps it needs to accomplish:

1) Log in to the site (a Windows login dialog appears when the page is first hit). *optional*

2) Choose a menu link from an ASP page (a script shows/hides menu items on
mouseover). *optional*

3) Fill in the basic search form with a zip code or city to pull all the data.

4) After the search, a table shows many links (sometimes hundreds) to the
actual data I need.
The links have this format: <a href="javascript:GetAgent('AA059')">

5) Each link opens a new window with a table containing the required data.
The URL each link opens is
http://armls.marketlinx.com/Roster/S...sp?PubID=AA059, where the
PubID is the record I need.
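Steps 1, 4 and 5 can be sketched with the Python standard library. This is a minimal sketch under assumptions: it assumes the login dialog is plain HTTP Basic authentication (a Windows-integrated/NTLM login would need a third-party handler instead), and `DETAIL_URL`, `make_opener`, `pub_id_from_href` and `detail_url` are hypothetical names; `DETAIL_URL` is a placeholder because the real detail-page path is truncated above.

```python
import re
import urllib.request
from urllib.parse import urlencode

# Hypothetical placeholder: the real detail-page path is elided in the post.
DETAIL_URL = "http://example.invalid/Roster/detail.asp"

# Matches hrefs such as javascript:GetAgent('AA059') and captures the PubID.
GET_AGENT_RE = re.compile(r"javascript:GetAgent\('([^']+)'\)")


def make_opener(url, user, password):
    """Build an opener that answers HTTP Basic auth challenges (assumed login type)."""
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, url, user, password)
    handler = urllib.request.HTTPBasicAuthHandler(passwords)
    return urllib.request.build_opener(handler)


def pub_id_from_href(href):
    """Return the PubID inside a GetAgent(...) href, or None if it doesn't match."""
    match = GET_AGENT_RE.fullmatch(href.strip())
    return match.group(1) if match else None


def detail_url(pub_id):
    """Build the URL of the page the GetAgent link is said to open."""
    return DETAIL_URL + "?" + urlencode({"PubID": pub_id})


if __name__ == "__main__":
    pub_id = pub_id_from_href("javascript:GetAgent('AA059')")
    print(pub_id)              # AA059
    print(detail_url(pub_id))  # http://example.invalid/Roster/detail.asp?PubID=AA059
```

With an opener from `make_opener(...)`, `opener.open(detail_url(pub_id)).read()` would fetch one detail page while sending the credentials whenever the server asks for them.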

Each row of the results table looks like this:

<tr>
  <td bgcolor="#C0C0C0" align="center">
    <a href="javascript:GetAgent('MA142')">
    <font face="Arial" size="2">6</font></a></td>
  <td><font face="Arial" size="2">
    <a href="javascript:GetAgent('MA142')">Alaze</a><br></font></td>
  <td><font face="Arial" size="2">Mark <br></font></td>
  <td><font face="Arial" size="2">MA142</font><br>
  </td>
  <td><font face="Arial" size="2">
    <a href="javascript:GetBroker('COLD56')">Banker Success
    Realty</a><br></font></td>
  <td>COLD56</td>
  <td><font face="Arial" size="2"><script LANGUAGE="javascript">
    <!--
    writePhoneNumber('480-999-9999');
    //--></script></td>
</tr>
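The row above can be pulled apart without a browser. A minimal sketch using the standard library's `html.parser`, assuming every result row has the same seven cells in the same order as the sample and that rows are not nested inside other tables; the `FIELDS` labels are hypothetical, not names the site uses. The phone number is only written by JavaScript, so it is recovered from the `writePhoneNumber('...')` call in the script text.

```python
import re
from html.parser import HTMLParser

# Captures the number passed to the page's writePhoneNumber('...') call.
PHONE_RE = re.compile(r"writePhoneNumber\('([^']*)'\)")

# Hypothetical labels for the seven cells, in the order of the sample row.
FIELDS = ["row", "last_name", "first_name", "pub_id", "broker", "broker_id", "phone"]


class RowParser(HTMLParser):
    """Collect the whitespace-normalised text of every <td> in each <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cells = None  # cells of the row being read, or None outside a row
        self._text = None   # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
            self._text = None
        elif tag == "td" and self._cells is not None:
            self._text = []

    def handle_endtag(self, tag):
        if tag == "td" and self._text is not None:
            self._cells.append(" ".join("".join(self._text).split()))
            self._text = None
        elif tag == "tr" and self._cells is not None:
            self.rows.append(self._cells)
            self._cells = None
            self._text = None

    def handle_data(self, data):
        # Script bodies arrive here too, which is how the phone number survives.
        if self._text is not None:
            self._text.append(data)


def parse_rows(html):
    """Turn each seven-cell row into a dict keyed by FIELDS."""
    parser = RowParser()
    parser.feed(html)
    parser.close()
    records = []
    for cells in parser.rows:
        if len(cells) != len(FIELDS):
            continue  # skip header or layout rows with a different shape
        record = dict(zip(FIELDS, cells))
        phone = PHONE_RE.search(record["phone"])
        record["phone"] = phone.group(1) if phone else ""
        records.append(record)
    return records


SAMPLE = """<tr>
<td bgcolor="#C0C0C0" align="center"><a href="javascript:GetAgent('MA142')"><font face="Arial" size="2">6</font></a></td>
<td><font face="Arial" size="2"><a href="javascript:GetAgent('MA142')">Alaze</a><br></font></td>
<td><font face="Arial" size="2">Mark <br></font></td>
<td><font face="Arial" size="2">MA142</font><br></td>
<td><font face="Arial" size="2"><a href="javascript:GetBroker('COLD56')">Banker Success
Realty</a><br></font></td>
<td>COLD56</td>
<td><font face="Arial" size="2"><script LANGUAGE="javascript"><!--
writePhoneNumber('480-999-9999');
//--></script></td>
</tr>"""

if __name__ == "__main__":
    for record in parse_rows(SAMPLE):
        print(record)
```

Keying the cells by position keeps the parser simple, but it means a change in the site's column order would silently mislabel fields; the length check only catches rows with a different number of cells.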

Sep 14 '05 #1


livin wrote:
> I'm looking for an easy way to automate the below web site browsing and pull
> the data I'm searching for.

This is a task that BeautifulSoup[1] is usually good for.

> 4) After search, table shows many links (hundreds sometimes) to the actual
> data I need.
> Links are this format... <a href="javascript:GetAgent('AA059')">
>
> 5) Each link opens new window with table providing required data.
> The URLs that each href opens is this...
> http://armls.marketlinx.com/Roster/S...sp?PubID=AA059 where the
> PubID is record I need.


I'm not entirely sure I got your problem description right, but I think
points 4 and 5 would look something like:

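import re
from urllib.request import urlopen
from bs4 import BeautifulSoup

base_url = 'http://armls.marketlinx.com/.../Member.asp?PubID=AA059'
html = urlopen(base_url).read()
soup = BeautifulSoup(html, 'html.parser')

# The parentheses must be escaped; the group captures the PubID.
link_matcher = re.compile(r"javascript:GetAgent\('([^']*)'\)")
for link_elem in soup('a', {'href': link_matcher}):
    pub_id = link_matcher.search(link_elem['href']).group(1)
    ...  # fetch and parse the page for pub_id here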
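For comparison, a sketch of the same link-collecting loop using only the standard library's `html.parser`, in case BeautifulSoup isn't available. In the sample row each agent is linked twice (the row number and the name both call `GetAgent('MA142')`), so this version de-duplicates the IDs while keeping page order. `AgentLinkCollector` and `collect_pub_ids` are hypothetical names.

```python
import re
from html.parser import HTMLParser

# Matches hrefs such as javascript:GetAgent('MA142') and captures the PubID.
GET_AGENT_RE = re.compile(r"javascript:GetAgent\('([^']+)'\)")


class AgentLinkCollector(HTMLParser):
    """Record the PubID from every <a href="javascript:GetAgent('...')"> link."""

    def __init__(self):
        super().__init__()
        self.pub_ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""  # valueless attributes come back as None
        match = GET_AGENT_RE.fullmatch(href.strip())
        if match:
            self.pub_ids.append(match.group(1))


def collect_pub_ids(html):
    """Return the unique PubIDs on a results page, in the order they first appear."""
    collector = AgentLinkCollector()
    collector.feed(html)
    collector.close()
    return list(dict.fromkeys(collector.pub_ids))


if __name__ == "__main__":
    page = (
        "<a href=\"javascript:GetAgent('MA142')\">6</a>"
        "<a href=\"javascript:GetAgent('MA142')\">Alaze</a>"
        "<a href=\"javascript:GetBroker('COLD56')\">Banker Success Realty</a>"
        "<a href=\"javascript:GetAgent('AA059')\">Other</a>"
    )
    print(collect_pub_ids(page))  # ['MA142', 'AA059']
```

`dict.fromkeys` is used for de-duplication because it keeps first-seen order, which a plain `set` would not.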

HTH,

STeVe

[1] http://www.crummy.com/software/BeautifulSoup/
Sep 14 '05 #2
