
How to apply text changes to HTML, keeping it intact if inside "a" tags

Hello,

I have HTML input to which I apply some changes.

Feature 1:
=======
I want to transform all the text, but if the text is inside
an "a href" tag, I want to leave the text as it is.

The HTML is not necessarily well-formed, so
I would like to do that using BeautifulSoup (or
maybe another tolerant parser).

As a test case, suppose I want to uppercase all the text
except the text that is within "a href" tags:

ExampleString = """
<footag>Lorem Ipsum</footag> is simply
dummy text of <a href="junk.html">the printing</a> and
<a href="junk2.html">typesetting <b>industry</b>.</a>
Thanks."""

When applying the text transform, I want to obtain:

<footag>LOREM IPSUM</footag> IS SIMPLY
DUMMY TEXT OF <a href="junk.html">the printing</a> AND
<a href="junk2.html">typesetting <b>industry</b>.</a>
THANKS."""
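A minimal sketch of Feature 1, for illustration only. It swaps in the standard library's html.parser as the "other tolerant parser" rather than BeautifulSoup, and it assumes the closing tags in the sample read </footag> and </a>. The names transform_outside_links and _OutsideLinkTransformer are hypothetical. The parser re-emits the markup as it goes and tracks whether it is inside an "a" element. Doctype declarations and processing instructions are dropped, and closing tags come out lowercased.

```python
from html.parser import HTMLParser


class _OutsideLinkTransformer(HTMLParser):
    """Re-emit markup, transforming only text that is not inside <a>."""

    def __init__(self, transform):
        # convert_charrefs=False keeps entities such as &amp; untouched.
        super().__init__(convert_charrefs=False)
        self.transform = transform
        self.out = []      # pieces of the rewritten document
        self.a_depth = 0   # > 0 while inside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.a_depth += 1
        # Raw source text of the tag, so attributes keep their quoting.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a" and self.a_depth:
            self.a_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data if self.a_depth else self.transform(data))

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def transform_outside_links(html, transform=str.upper):
    """Apply `transform` to every text node that has no <a> ancestor."""
    parser = _OutsideLinkTransformer(transform)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(transform_outside_links(
        'dummy text of <a href="junk.html">the printing</a> and more'))
```

Passing a different callable as transform replaces the uppercasing test case with the real text change.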
Feature 2:
========
Another thing I may want to do: If the text I would normally
transform is inside an "a href" tag, then do not transform it,
but insert the result of text transformation just after the "</a>".

Using the same example as input, applying
Feature 2 would give something like this:

<footag>LOREM IPSUM</footag> IS SIMPLY
DUMMY TEXT OF <a href="junk.html">the printing</a><feat2>THE
PRINTING</feat2> AND
<a href="junk2.html">typesetting
<b>industry</b>.</a><feat2>TYPESETTING <b>INDUSTRY</b>.</feat2>
THANKS."""
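A minimal sketch of Feature 2, again for illustration only and again using the standard library's html.parser in place of BeautifulSoup. The names add_transformed_copy and _LinkCopyBuilder are hypothetical; the feat2 wrapper tag comes from the example above. While inside an "a" element, the parser writes the content unchanged and also builds a transformed copy (inner tags kept, text transformed), which is emitted right after the closing </a>. An "a" element that is never closed gets no copy, and whitespace inside the copy is kept exactly as in the link.

```python
from html.parser import HTMLParser


class _LinkCopyBuilder(HTMLParser):
    """Transform text outside <a>; after each </a>, append a transformed copy."""

    def __init__(self, transform, wrapper):
        super().__init__(convert_charrefs=False)  # leave entities as written
        self.transform = transform
        self.wrapper = wrapper
        self.out = []      # pieces of the rewritten document
        self.copy = []     # transformed copy of the current link's content
        self.a_depth = 0   # > 0 while inside an <a> element

    def _emit(self, markup, is_text=False):
        changed = self.transform(markup) if is_text else markup
        if self.a_depth:
            self.out.append(markup)    # link content stays untouched
            self.copy.append(changed)  # transformed twin for the wrapper
        else:
            self.out.append(changed)

    def handle_starttag(self, tag, attrs):
        # Emit before bumping the depth so the outer <a> is not copied.
        self._emit(self.get_starttag_text())
        if tag == "a":
            self.a_depth += 1

    def handle_startendtag(self, tag, attrs):
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a" and self.a_depth:
            self.a_depth -= 1
            if self.a_depth == 0:
                copied = "".join(self.copy)
                self.out.append("</a>")
                self.out.append(f"<{self.wrapper}>{copied}</{self.wrapper}>")
                self.copy.clear()
                return
        self._emit(f"</{tag}>")

    def handle_data(self, data):
        self._emit(data, is_text=True)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")


def add_transformed_copy(html, transform=str.upper, wrapper="feat2"):
    """Return `html` with a transformed copy inserted after every link."""
    parser = _LinkCopyBuilder(transform, wrapper)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(add_transformed_copy(
        'see <a href="junk2.html">typesetting <b>industry</b>.</a> here'))
```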

========
Thanks for your help

Sep 27 '06 #1
1 Reply


vb******@gmail.com wrote:
> Hello,
>
> I have HTML input to which I apply some changes.
>
> Feature 1:
> =======
> I want to transform all the text, but if the text is inside
> an "a href" tag, I want to leave the text as it is.
>
> The HTML is not necessarily well-formed, so
> I would like to do that using BeautifulSoup (or
> maybe another tolerant parser).
<snip/>

Use BeautifulSoup + XSL. Writing your two features in XSL is close to a
no-brainer, and it is certainly the best tool for the job.

And there are a few XSLT implementations available for Python.

Diez
Sep 27 '06 #2
