Bytes | Software Development & Data Engineering Community

Help extracting info from HTML source

Hello All.
I am learning Python and have never worked with HTML. However, I would
like to write a simple script to audit my 100+ NetWare servers via their web
portal.

I was reading Chapter 8 of Dive Into Python, which deals with this topic.
In the web portal of the server, there is a section similar to this:

-- clients and <A
href="http://eugenia.blogsome.com/?s=ipkall">clever</a> services. <--

which I took from Slashdot, but what I'm talking about is using the word
'services' to represent the link to eugenia.blogsome.com.

What I'd like to do is save the two pieces of info keyed by the server
name, probably in a dictionary, such as server1['link'] for the page on
eugenia.blogsome.com and server1['description'] for 'services'.
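A minimal sketch of the dictionary layout described above, assuming one interesting link per server. The names `audit` and `record_link` are hypothetical, and the stored values are just the Slashdot example from this post.

```python
# Hypothetical audit store: server name -> {"link": ..., "description": ...}.
# Assumes one interesting link per server, following the
# server1['link'] / server1['description'] idea.
audit = {}

def record_link(audit, server, href, text):
    """Store (or overwrite) one server's link target and its visible text."""
    audit[server] = {"link": href, "description": text}

record_link(audit, "server1", "http://eugenia.blogsome.com/?s=ipkall", "services")

for server, info in sorted(audit.items()):
    print(server, info["description"], "->", info["link"])
# server1 services -> http://eugenia.blogsome.com/?s=ipkall
```

If a server turns out to have several links worth auditing, a list of such dicts per server would be a natural extension of the same idea.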

I've used the example from Dive Into Python to get the actual link from
the HTML source, but I don't know how to get the text that makes up the
hyperlink.

So in the portal, I've got a link 'Scheduled Server Reboot' pointing to, say,
/ScheduledTasks/ID000000003/ on Server1, using HTML similar to the clipped
source above.
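For readers on current Python, here is a minimal sketch of one way to get both pieces (the link target and the link text) using only the standard library's `html.parser` module instead of BeautifulSoup. The class `LinkCollector`, the helper `extract_links`, and the one-line portal markup are hypothetical; the real NetWare portal HTML may be shaped differently (extra nesting, relative URLs, odd encodings).

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a href="..."> element."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (href, text) pairs, in document order
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> arrives as "a".
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Text inside nested tags such as <b> is still collected.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the link text.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def extract_links(html):
    """Return a list of (href, text) tuples found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical portal markup modeled on the Server1 example above.
portal_html = '<p><a href="/ScheduledTasks/ID000000003/">Scheduled Server Reboot</a></p>'

for href, text in extract_links(portal_html):
    if text == "Scheduled Server Reboot":
        print(href)  # /ScheduledTasks/ID000000003/
```

Against a live server, the HTML string would come from something like `urllib.request.urlopen(url).read().decode(...)`; each server's portal URL and the page's character encoding are assumptions to check.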

Can someone please help me? Sure, I could manually go to each server, but I
wouldn't learn anything. I've learned some, but also have real deadlines,
so I eagerly hope for any assistance & instruction.

Thank you!
-Dave
Shelton, CT

Jan 26 '07 #1
Hello Shelton,
> I am learning Python and have never worked with HTML. However, I would
> like to write a simple script to audit my 100+ NetWare servers via their web
> portal.
Always use the right tool: BeautifulSoup
(http://www.crummy.com/software/BeautifulSoup/) is best for web
scraping (IMO).

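from urllib import urlopen
from BeautifulSoup import BeautifulSoup

html = urlopen("http://www.python.org").read()
soup = BeautifulSoup(html)
# soup("a", href=True) is shorthand for soup.findAll("a", href=True);
# href=True skips <a name="..."> anchors that have no href attribute.
for link in soup("a", href=True):
    # link.contents is the list of nodes inside <a>...</a>, i.e. the link text
    print link["href"], "-->", link.contents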

HTH,
--
Miki
http://pythonwise.blogspot.com/

Jan 26 '07 #2
In article <11**********************@h3g2000cwc.googlegroups.com>,
 "Miki" <mi*********@gmail.com> wrote:
> Always use the right tool: BeautifulSoup
> (http://www.crummy.com/software/BeautifulSoup/) is best for web
> scraping (IMO).

Agreed. HTML scraping gets really complicated once you get into it. It
might be interesting to write such a library just for your own
satisfaction, but if you want to get something done, use a module
that's already written, like BeautifulSoup. Another module that does
the same job but works differently (and more simply, IMO) is HTMLData by
Connelly Barnes:
http://oregonstate.edu/~barnesc/htmldata/

--
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more
Jan 26 '07 #3
