scrape url out of brackets?

Any idea how to scrape a URL out of a file? For instance, if I want to
scrape out the href at the end, which is "www.cnn.com", is there a way to
do it?

<tr class="rulesbody"><td width="183" class="rulesbody"><a
href="www.cnn.com">

Dec 25 '05 #1
ho***********@gmail.com writes:
Any idea how to scrape a URL out of a file? For instance, if I want to
scrape out the href at the end, which is "www.cnn.com", is there a way to
do it?
<tr class="rulesbody"><td width="183" class="rulesbody"><a
href="www.cnn.com">


BeautifulSoup.

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Dec 25 '05 #2
Regular Expressions are the most common way.
http://docs.python.org/lib/module-re.html
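
For illustration, a minimal sketch of the regular-expression route, assuming Python 3 and using the snippet from the original question as input. The names SAMPLE, HREF_RE and hrefs_with_regex are made up for this example and are not from the thread.

```python
import re

# Hypothetical sample: the snippet from the original question, including
# the line break between "<a" and "href".
SAMPLE = '<tr class="rulesbody"><td width="183" class="rulesbody"><a\nhref="www.cnn.com">'

# Match href="..." or href='...' inside an <a ...> tag. \s also covers the
# newline after "<a"; \1 requires the closing quote to match the opening one.
HREF_RE = re.compile(r'<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1', re.IGNORECASE | re.DOTALL)


def hrefs_with_regex(html):
    """Return the quoted href value of each <a> tag, in document order."""
    return [match.group(2) for match in HREF_RE.finditer(html)]


if __name__ == "__main__":
    print(hrefs_with_regex(SAMPLE))  # prints ['www.cnn.com']
```

A pattern like this is brittle on real-world HTML (it would, for example, also pick up a data-href attribute, and it ignores unquoted values), so it is best treated as a quick option for markup you control.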

HTML parser is another
http://docs.python.org/lib/module-htmllib.html
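
A rough sketch of the parser route, assuming Python 3, where the standard library's html.parser module is used instead of the htmllib module linked above (htmllib is not available in Python 3). HrefCollector and hrefs_with_parser are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser

# Hypothetical sample: the snippet from the original question.
SAMPLE = '<tr class="rulesbody"><td width="183" class="rulesbody"><a\nhref="www.cnn.com">'


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names; attrs is a list of
        # (name, value) pairs, and value is None for bare attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def hrefs_with_parser(html):
    """Feed the markup to the parser and return the collected hrefs."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


if __name__ == "__main__":
    print(hrefs_with_parser(SAMPLE))  # prints ['www.cnn.com']
```

The parser is event-driven: it calls handle_starttag once per opening tag, so no pattern matching over the raw text is needed.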

Dec 25 '05 #3
So you recommend using some sort of for statement with the HTML parser,
where I tell it to only parse stuff found in the <tr> tag, for instance?
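
One way this idea could look, sketched under the assumption of Python 3 and the standard html.parser module: rather than an explicit for statement, the parser keeps a counter of open <tr> tags and only records links seen while that counter is above zero. RowHrefCollector, hrefs_in_rows and PAGE are hypothetical names and data for this example.

```python
from html.parser import HTMLParser

# Hypothetical page: one link outside any table row, one inside, one after.
PAGE = (
    '<a href="outside.html">o</a>'
    '<table><tr class="rulesbody"><td width="183" class="rulesbody">'
    '<a\nhref="www.cnn.com">cnn</a></td></tr></table>'
    '<a href="after.html">x</a>'
)


class RowHrefCollector(HTMLParser):
    """Collect href values only from <a> tags nested inside a <tr>."""

    def __init__(self):
        super().__init__()
        self.tr_depth = 0  # number of <tr> elements currently open
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.tr_depth += 1
        elif tag == "a" and self.tr_depth > 0:
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)

    def handle_endtag(self, tag):
        if tag == "tr" and self.tr_depth > 0:
            self.tr_depth -= 1


def hrefs_in_rows(html):
    """Return hrefs of links that appear between <tr> and </tr>."""
    parser = RowHrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


if __name__ == "__main__":
    print(hrefs_in_rows(PAGE))  # prints ['www.cnn.com']
```

This relies on every <tr> being closed explicitly; on sloppy markup that omits </tr>, the counter never drops back to zero and later links would be collected as well.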

Ravi Teja wrote:
Regular Expressions are the most common way.
http://docs.python.org/lib/module-re.html

HTML parser is another
http://docs.python.org/lib/module-htmllib.html


Dec 31 '05 #4
so here is the syntax folks!!!

for anchor in soup.fetch('a', {'target': '_blank'}):
    print anchor['href']

Jan 2 '06 #5
