Remove HTML tags (except anchor tag) from a string using regular expressions

Hello,

I want to remove all HTML tags from a string "content" except <a ...>xxx</a>.

My script reads like this:

###
import re
content = re.sub('<([^!>]([^>]|\n)*)>', '', content)
###

It works fine. It removes all HTML tags from "content".
Unfortunately, this also removes <a ...>xxx</a> occurrences.
Any idea how to modify this to remove all HTML tags except <a ...>xxx</a>?

Thanks in advance,
Nico
Jul 18 '05 #1
How about...

import re
content = re.sub('<([^!(a>)]([^(/a>)]|\n)*)>', '', content)
Seems to work for me.

HTH

-Anand

Jul 18 '05 #2
I meant
content = re.sub ('<[^!(a>)]([^>]|\n)*[^!(/a)]>', '', content)

Sorry for the mistake.
However, this also seems to print tags like <b>, <p>, etc.

-Anand

Jul 18 '05 #3
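
A note on why the two patterns above misbehave: inside square brackets, parentheses and multi-character sequences have no grouping meaning, so `[^!(a>)]` simply means "one character that is not `!`, `(`, `a`, `>` or `)`". It cannot express "not the tag `a`". One way to say that is a negative lookahead. The following is a minimal sketch, not a definitive solution: the names `TAG_EXCEPT_ANCHOR` and `strip_tags_except_anchor` are made up for illustration, and it assumes that no attribute value contains a literal `>`.

```python
import re

# Hypothetical pattern name. After "<", the negative lookahead (?!...) refuses
# to match when the tag name is exactly "a" (optionally preceded by "/").
# The name must be followed by whitespace or ">", so <abbr> is still removed.
# As in the original pattern, anything starting with "<!" is left untouched.
TAG_EXCEPT_ANCHOR = re.compile(r'<(?!/?a(?:\s|>))[^!>][^>]*>', re.IGNORECASE)

def strip_tags_except_anchor(content):
    """Remove every tag except <a ...> and </a> (illustrative helper)."""
    return TAG_EXCEPT_ANCHOR.sub('', content)

sample = ('<p>See <a href="http://example.com">this <b>link</b></a> '
          'and <abbr>etc</abbr>.</p>')
print(strip_tags_except_anchor(sample))
# See <a href="http://example.com">this link</a> and etc.
```

The `|\n` alternative from the original pattern is dropped here because a negated character class such as `[^>]` already matches newlines in Python.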
Nico Grubert wrote:

If it's not to learn, and you simply want it to work, try out this library:

http://zope.org/Members/chrisw/StripOGram/readme
--

hilsen/regards Max M, Denmark

http://www.mxm.dk/
IT's Mad Science
Jul 18 '05 #4
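
For readers who want a parser-based route without an extra dependency, here is a rough sketch using `html.parser` from the Python 3 standard library. It is not the StripOGram API and makes no claim about how that library works; the class name `AnchorOnlyFilter` and the function `keep_only_anchors` are invented for this example. Note one difference from the regex in the question: comments are dropped here, because the parser's default comment handler does nothing.

```python
from html.parser import HTMLParser

class AnchorOnlyFilter(HTMLParser):
    """Collect text and <a> tags; silently drop every other tag."""

    def __init__(self):
        # convert_charrefs=False keeps &lt; and friends as references
        # instead of decoding them into raw "<" characters.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            # Re-emit the start tag exactly as it appeared in the input.
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == 'a':
            self.parts.append('</a>')

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append('&%s;' % name)

    def handle_charref(self, name):
        self.parts.append('&#%s;' % name)

def keep_only_anchors(content):
    """Return content with all tags removed except anchors (illustrative)."""
    parser = AnchorOnlyFilter()
    parser.feed(content)
    parser.close()
    return ''.join(parser.parts)

print(keep_only_anchors(
    '<p>See <a href="http://example.com">this <b>link</b></a>.</p>'))
# See <a href="http://example.com">this link</a>.
```

Unlike the regex approach, this does not break when an attribute value contains `>`, since the parser tracks quoting itself.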

Max M wrote:
If it's not to learn, and you simply want it to work, try out this
library:

http://zope.org/Members/chrisw/StripOGram/readme

>>> stripogram.html2safehtml('''first > last''',valid_tags=('i','a','br'))
'first > last'
>>> stripogram.html2safehtml('''first < last''',valid_tags=('i','a','br'))
'first first '
Keeping in mind that bare ">" and "<" are invalid HTML (they should be &gt;
and &lt;), why did it leave the greater-than sign, and why are there two "first"s?
Jul 18 '05 #5
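
The thread never answers that question, and nothing here explains StripOGram's internals. Independently of any particular stripping library, one way to avoid the ambiguity is to escape plain text before it is mixed into HTML, so that a stripper never sees a bare `<` that it could read as the start of a tag. A small sketch, assuming Python 3's `html.escape`; the helper name `text_to_html` is hypothetical:

```python
from html import escape

def text_to_html(text):
    """Escape &, < and > so plain text is safe to embed in HTML (illustrative)."""
    # quote=False leaves " and ' alone, which is enough outside attribute values.
    return escape(text, quote=False)

print(text_to_html('first < last'))  # first &lt; last
print(text_to_html('first > last'))  # first &gt; last
```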
