Example Script to parse web page links and extract data?

I'm hoping someone knows of an example script I can look at to help me build
mine.

I'm looking for an easy way to automate browsing the web site described below
and pull out the data I'm searching for.
Here are the steps it needs to accomplish:

1) Log in to the site (a Windows login dialog appears when the page is first
requested) *optional*

2) Choose a menu link on the ASP page (a script shows or hides menu items
on mouseover) *optional*

3) Fill in the basic search form with a zip code or city to pull all the data.

4) After the search, a table shows many links (sometimes hundreds) to the
actual data I need.
The links have this format: <a href="javascript:GetAgent('AA059')">

5) Each link opens a new window with a table providing the required data.
The URL that each href opens looks like this:
http://armls.marketlinx.com/Roster/S...sp?PubID=AA059 where the
PubID is the record I need.
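
Steps 1 and 3 might be sketched as follows with the Python 3 standard library. This is only a sketch under assumptions: it presumes the login dialog is HTTP Basic authentication (if the site uses NTLM instead, urllib will not handle it without extra work), and the form field name "zip", the helper names, and the URLs passed in are all hypothetical placeholders, since the post does not show the form.

```python
import urllib.parse
import urllib.request


def make_auth_opener(top_level_url, username, password):
    """Return an opener that answers HTTP Basic auth challenges (assumed scheme)."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # None means "use these credentials for any realm under top_level_url".
    password_mgr.add_password(None, top_level_url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    return urllib.request.build_opener(handler)


def build_search_request(search_url, zip_code):
    """Build a POST request for the search form.

    The field name "zip" is a guess; check the real form's <input> names.
    """
    form_data = urllib.parse.urlencode({"zip": zip_code}).encode("ascii")
    # Supplying data makes urllib send the request as a POST.
    return urllib.request.Request(search_url, data=form_data)
```

The search page would then be fetched with something like opener.open(build_search_request(url, "85001")).read(), once the real URL and field names are known.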

Table format looks like this:

<tr>
<td bgcolor="#C0C0C0" align="center">
<a href="javascript:GetAgent('MA142')">
<font face="Arial" size="2">6</font></a></td>
<td><font face="Arial" size="2">
<a href="javascript:GetAgent('MA142')">Alaze</a><br></font></td>
<td><font face="Arial" size="2">Mark <br></font></td>
<td><font face="Arial" size="2">MA142</font><br>
</td>
<td><font face="Arial" size="2">
<a href="javascript:GetBroker('COLD56')">Banker Success Realty</a><br></font></td>
<td>COLD56</td>
<td><font face="Arial" size="2"><script LANGUAGE="javascript">
<!--
writePhoneNumber('480-999-9999');
//--></script></td>
</tr>
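
As a rough illustration of what can be pulled out of a row like this, the sketch below uses plain regular expressions to capture the arguments of the GetAgent, GetBroker and writePhoneNumber calls. The sample string is a trimmed copy of the row above, and parse_row and CALL_PATTERNS are names made up for this example. It assumes every row uses single-quoted arguments exactly as shown.

```python
import re

# Trimmed copy of the sample row from the post.
SAMPLE_ROW = """
<tr>
<td><a href="javascript:GetAgent('MA142')">Alaze</a><br></td>
<td>Mark <br></td>
<td><a href="javascript:GetBroker('COLD56')">Banker Success Realty</a><br></td>
<td><script LANGUAGE="javascript">
<!--
writePhoneNumber('480-999-9999');
//--></script></td>
</tr>
"""

# Each pattern captures the single-quoted argument of one JavaScript call.
CALL_PATTERNS = {
    "agent_id": re.compile(r"GetAgent\('([^']*)'\)"),
    "broker_id": re.compile(r"GetBroker\('([^']*)'\)"),
    "phone": re.compile(r"writePhoneNumber\('([^']*)'\)"),
}


def parse_row(row_html):
    """Return the first argument found for each call, or None if absent."""
    record = {}
    for field, pattern in CALL_PATTERNS.items():
        match = pattern.search(row_html)
        record[field] = match.group(1) if match else None
    return record


print(parse_row(SAMPLE_ROW))
```

Regular expressions are enough here because the values sit inside JavaScript calls rather than in the HTML structure itself; the name cells would still need a real HTML parser.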

Sep 14 '05 #1
livin wrote:
> I'm looking for an easy way to automate the below web site browsing and pull
> the data I'm searching for.

This is a task that BeautifulSoup[1] is usually good for.

> 4) After search, table shows many links (hundreds sometimes) to the actual
> data I need.
> Links are this format... <a href="javascript:GetAgent('AA059')">
>
> 5) Each link opens new window with table providing required data.
> The URLs that each href opens is this...
> http://armls.marketlinx.com/Roster/S...sp?PubID=AA059 where the
> PubID is record I need.

I'm not entirely sure I got your problem description right, but I think
points 4 and 5 would look something like:

import re
import urllib
import BeautifulSoup

base_url = 'http://armls.marketlinx.com/.../Member.asp?PubID=AA059'
html = urllib.urlopen(base_url).read()
soup = BeautifulSoup.BeautifulSoup(html)

# Match hrefs such as javascript:GetAgent('AA059'); the parentheses are escaped
link_matcher = re.compile(r"javascript:GetAgent\('[^']*'\)")
for link_elem in soup('a', {'href': link_matcher}):
    ...
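
For readers on Python 3 without BeautifulSoup installed, the same idea can be sketched with the standard library's html.parser. This is an alternative to the snippet above and not a copy of it: AgentLinkCollector, collect_pub_ids and detail_urls are names invented for this example, and the detail URL prefix has to be supplied by the caller because the real path is elided in the post.

```python
import re
from html.parser import HTMLParser

# Captures the PubID inside javascript:GetAgent('...').
AGENT_HREF = re.compile(r"javascript:GetAgent\('([^']*)'\)")


class AgentLinkCollector(HTMLParser):
    """Collect unique PubIDs from <a href="javascript:GetAgent('...')"> links."""

    def __init__(self):
        super().__init__()
        self.pub_ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        match = AGENT_HREF.match(href)
        # Each agent is linked twice per row, so skip IDs already seen.
        if match and match.group(1) not in self.pub_ids:
            self.pub_ids.append(match.group(1))


def collect_pub_ids(html):
    """Return the PubIDs found in html, in order of first appearance."""
    collector = AgentLinkCollector()
    collector.feed(html)
    return collector.pub_ids


def detail_urls(pub_ids, detail_url_prefix):
    """Append each PubID to a caller-supplied prefix ending in 'PubID='."""
    return [detail_url_prefix + pub_id for pub_id in pub_ids]
```

Each URL returned by detail_urls could then be fetched and parsed in turn to get the per-agent table.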

HTH,

STeVe

[1] http://www.crummy.com/software/BeautifulSoup/
Sep 14 '05 #2
