How to apply text changes to HTML, keeping it intact if inside "a" tags

Hello,

I have HTML input to which I apply some changes.

Feature 1:
=======
I want to transform all the text, but if the text is inside
an "a href" tag, I want to leave the text as it is.

The HTML is not necessarily well-formed, so
I would like to do that using BeautifulSoup (or
maybe another tolerant parser).

As a test case, suppose I want to uppercase all the text
except the text that is within "a href" tags:

ExampleString = """
<footag>Lorem Ipsum</footag> is simply
dummy text of <a href="junk.html">the printing</a> and
<a href="junk2.html">typesetting <b>industry</b>.</a>
Thanks."""

When applying the text transform, I want to obtain:

<footag>LOREM IPSUM</footag> IS SIMPLY
DUMMY TEXT OF <a href="junk.html">the printing</a> AND
<a href="junk2.html">typesetting <b>industry</b>.</a>
THANKS.
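
A minimal sketch of Feature 1, assuming the standard library's `html.parser` is an acceptable tolerant parser in place of BeautifulSoup. It re-emits the markup as it streams past and transforms only the text found outside `a` elements. The names `UppercaseOutsideLinks` and `uppercase_outside_links` are made up for this illustration. End tags come back in lowercase, because the parser does not expose their original spelling.

```python
from html.parser import HTMLParser


class UppercaseOutsideLinks(HTMLParser):
    """Re-emit markup as parsed, transforming text that is not inside <a>."""

    def __init__(self, transform=str.upper):
        # convert_charrefs=False leaves entities such as &amp; untouched.
        super().__init__(convert_charrefs=False)
        self.transform = transform
        self.out = []
        self.link_depth = 0  # greater than 0 while inside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1
        self.out.append(self.get_starttag_text())  # raw tag, attributes intact

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth:
            self.link_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data if self.link_depth else self.transform(data))

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def uppercase_outside_links(html, transform=str.upper):
    parser = UppercaseOutsideLinks(transform)
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.out)
```

Calling `uppercase_outside_links(ExampleString)` on the test case above should reproduce the expected output. The sketch does not special-case `script` or `style` content, which would be transformed like any other text.
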
Feature 2:
========
Another thing I may want to do: If the text I would normally
transform is inside an "a href" tag, then do not transform it,
but insert the result of text transformation just after the "</a>".

Using the same example as input, applying
Feature 2 would give something like this:

<footag>LOREM IPSUM</footag> IS SIMPLY
DUMMY TEXT OF <a href="junk.html">the printing</a><feat2>THE PRINTING</feat2> AND
<a href="junk2.html">typesetting <b>industry</b>.</a><feat2>TYPESETTING <b>INDUSTRY</b>.</feat2>
THANKS.
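
A minimal sketch of Feature 2 under the same assumption (the standard library's `html.parser` standing in for BeautifulSoup). While a link is open, the text is emitted unchanged and a transformed copy of the link's content is collected; the copy is written out inside a wrapper element right after `</a>`. The names `EchoLinksTransformed` and `echo_links_transformed` are hypothetical, and the wrapper defaults to the `feat2` tag used in the example above.

```python
from html.parser import HTMLParser


class EchoLinksTransformed(HTMLParser):
    """Keep link text as is, then repeat it transformed after </a>."""

    def __init__(self, transform=str.upper, wrapper="feat2"):
        # convert_charrefs=False leaves entities such as &amp; untouched.
        super().__init__(convert_charrefs=False)
        self.transform = transform
        self.wrapper = wrapper
        self.out = []
        self.link_copy = None  # transformed copy of the open link's content

    def _markup(self, raw):
        # Tags and entities go out verbatim, and into the copy while in a link.
        self.out.append(raw)
        if self.link_copy is not None:
            self.link_copy.append(raw)

    def handle_starttag(self, tag, attrs):
        if tag == "a" and self.link_copy is None:
            self.out.append(self.get_starttag_text())
            self.link_copy = []  # start collecting; the <a> tag is not copied
        else:
            self._markup(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self._markup(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a" and self.link_copy is not None:
            copy = "".join(self.link_copy)
            self.out.append(f"</a><{self.wrapper}>{copy}</{self.wrapper}>")
            self.link_copy = None
        else:
            self._markup(f"</{tag}>")

    def handle_data(self, data):
        if self.link_copy is None:
            self.out.append(self.transform(data))
        else:
            self.out.append(data)  # link text stays untouched
            self.link_copy.append(self.transform(data))

    def handle_entityref(self, name):
        self._markup(f"&{name};")

    def handle_charref(self, name):
        self._markup(f"&#{name};")


def echo_links_transformed(html, transform=str.upper, wrapper="feat2"):
    parser = EchoLinksTransformed(transform, wrapper)
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.out)
```

If a link is never closed, this sketch leaves its text unchanged and silently drops the collected copy. Comments and doctype declarations are also dropped here; handlers for them could be added in the same style.
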

========
Thanks for your help

Sep 27 '06 #1
vb******@gmail.com wrote:
> Hello,
>
> I have HTML input to which I apply some changes.
>
> Feature 1:
> =======
> I want to transform all the text, but if the text is inside
> an "a href" tag, I want to leave the text as it is.
>
> The HTML is not necessarily well-formed, so
> I would like to do that using BeautifulSoup (or
> maybe another tolerant parser).
<snip/>

Use BeautifulSoup + XSL. Writing your two features in XSL is close to a
no-brainer, and it is certainly the best tool for the job.

There are also a few XSL implementations available for Python.

Diez
Sep 27 '06 #2
