python html

Hi, I am looking for something that lets me go through
an HTML page and easily change the URLs for all the
links, images, hrefs, etc. If anyone knows
of something, please let me know. Thanks.

-steve


Aug 18 '05 #1
Steve Young <dr**********@yahoo.com> writes:
> Hi, I am looking for something that lets me go through
> an HTML page and easily change the URLs for all the
> links, images, hrefs, etc. If anyone knows
> of something, please let me know. Thanks.


I've been doing a lot of that today. But the tool I'm using is sh and
sed, because what I'm doing is captured nicely by regular expressions
on the URLs. You might consider that option.
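
For readers who would rather stay in Python, the same regular-expression idea can be sketched with the re module. This is not the sh/sed script mentioned above, only an illustration of the approach. The names rewrite_urls, URL_ATTR and BASE, and the example.com address, are made up for the example, and the pattern only handles quoted href and src attributes.

```python
import re
from urllib.parse import urljoin

# Hypothetical base URL, used only for illustration.
BASE = "http://example.com/mirror/"

# Matches href="..." or src="..." with either quote style.
# Unquoted attribute values are deliberately not handled.
URL_ATTR = re.compile(
    r'''(?P<attr>\b(?:href|src))\s*=\s*(?P<q>["'])(?P<url>.*?)(?P=q)''',
    re.IGNORECASE,
)

def rewrite_urls(html, transform):
    """Return html with every matched URL replaced by transform(url)."""
    def repl(m):
        new_url = transform(m.group("url"))
        return f'{m.group("attr")}={m.group("q")}{new_url}{m.group("q")}'
    return URL_ATTR.sub(repl, html)

if __name__ == "__main__":
    page = '<a href="index.html">home</a> <img src="pics/logo.png">'
    # Turn relative URLs into absolute ones against BASE.
    print(rewrite_urls(page, lambda u: urljoin(BASE, u)))
```

Like any regex over HTML, this will also match text that merely looks like an attribute (inside comments or scripts, say), so it is best suited to pages whose markup you already know.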

If you have well-formed HTML, you can use the HTMLParser module, and
write out the mangled data as it passes through your subclass of the
HTMLParser class.
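
A minimal sketch of that subclass idea, assuming Python 3, where the module lives at html.parser. The class name URLRewriter, the helper rewrite, the URL_ATTRS set and the example.com base are all invented for illustration. The output is re-serialized rather than copied byte for byte, so tag and attribute names come out lowercased, attribute values come out double-quoted, and processing instructions are dropped.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin

# Attributes assumed to carry URLs; extend this set as needed.
URL_ATTRS = {"href", "src", "action", "background"}

class URLRewriter(HTMLParser):
    """Re-emit the document while parsing, rewriting URL attributes."""

    def __init__(self, transform):
        # Keep entities in text as separate events so they can be
        # written back unchanged.
        super().__init__(convert_charrefs=False)
        self.transform = transform
        self.out = []

    def _tag(self, tag, attrs, close=""):
        parts = [tag]
        for name, value in attrs:
            if value is None:          # bare attribute such as "disabled"
                parts.append(name)
                continue
            if name in URL_ATTRS:
                value = self.transform(value)
            # The parser hands over unescaped values, so escape again.
            parts.append(f'{name}="{escape(value, quote=True)}"')
        return "<" + " ".join(parts) + close + ">"

    def handle_starttag(self, tag, attrs):
        self.out.append(self._tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._tag(tag, attrs, " /"))

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

def rewrite(html, transform):
    """Parse html and return it with URL attributes transformed."""
    parser = URLRewriter(transform)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)

if __name__ == "__main__":
    page = '<a href="a.html">x &amp; y</a><img src="p.png">'
    print(rewrite(page, lambda u: urljoin("http://example.com/dir/", u)))
```

The transform callback keeps the parsing separate from the rewriting policy, so the same class can make links absolute, point them at a proxy, or swap hostnames.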

If the HTML isn't well-formed (which is probably true for most of the
stuff on the web), you need a more understanding parser. I'd look into
using BeautifulSoup for this, though I've only used it to extract
information from web pages, not to modify them.

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Aug 19 '05 #2

Steve Young wrote:
> Hi, I am looking for something that lets me go through
> an HTML page and easily change the URLs for all the
> links, images, hrefs, etc. If anyone knows
> of something, please let me know. Thanks.


BeautifulSoup or PyMeld

Lorenzo

Aug 19 '05 #3
I do exactly that in my Python CGI proxy (approx). I wrote a very
simple parser called scraper.py that makes it easy.

It won't choke on bad HTML either.

http://www.voidspace.org.uk/python/recipes.shtml

All the best,

Fuzzyman
http://www.voidspace.org.uk/python

Aug 19 '05 #4
Try this:

http://miex.tigris.org

I wrote it to check for bad HTML, correct it, and optimize it.

Aug 20 '05 #5
