need help with re module

Hello,

I have the string "<html>hello</a>world<anytag>ok" and I want to
extract all the text without the HTML tags. The result should be
something like: helloworldok

I have tried this:

from re import findall

chaine = """<html>hello</a>world<anytag>ok"""

print(findall('[a-zA-z][^(<.*>)].+?[a-zA-Z]', chaine))
# prints ['html', 'hell', 'worl', 'anyt', 'ag>o']

The result is not correct! What would be the correct regex to use?

Jun 20 '07 #1
On Jun 20, 9:58 am, linuxprog <linuxp...@gmail.com> wrote:
> print(findall('[a-zA-z][^(<.*>)].+?[a-zA-Z]', chaine))
> # prints ['html', 'hell', 'worl', 'anyt', 'ag>o']
>
> The result is not correct! What would be the correct regex to use?

[^(<.*>)] is a negated character class: it matches any single character
except "(", "<", ".", "*", ">" and ")". It most certainly doesn't do what
you want it to. Is it absolutely necessary that you use a regular
expression? There are a few HTML parsing libraries out there. The
easiest approach using re might be to do a search and replace on all
tags: just replace each tag with an empty string.
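
To see what that character class actually matches, here is a minimal sketch (Python 3 syntax; the sample string "a<b>.c*(d)" and the variable names are only illustrative, not from your code):

```python
import re

chaine = "<html>hello</a>world<anytag>ok"

# Inside [...], the characters ( < . * > ) are literals, not operators,
# so this negated class matches any ONE character that is not one of them.
negated = re.compile(r"[^(<.*>)]")
print(negated.findall("a<b>.c*(d)"))  # ['a', 'b', 'c', 'd']

# The same class written without the misleading grouping-like syntax.
equivalent = re.compile(r"[^()<>.*]")
print(negated.findall(chaine) == equivalent.findall(chaine))  # True
```

In other words, the parentheses and ".*" inside the brackets do not form a sub-pattern; they are just six characters being excluded.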

Matt

Jun 20 '07 #2
Here is an example:
>>> import re
>>> s = "<html>Hello</a>world<anytag>ok"
>>> matchtags = re.compile(r"<[^>]+>")
>>> matchtags.findall(s)
['<html>', '</a>', '<anytag>']
>>> matchtags.sub('', s)
'Helloworldok'

I probably shouldn't have shown you that. It will not work for all
HTML (for example, a ">" inside a quoted attribute value ends the match
too early), and you should probably be looking at something like
BeautifulSoup.
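
As a rough sketch of the parser-based route, here is a version that uses the standard library's html.parser module instead of BeautifulSoup (a swapped-in choice, only so the example has no third-party dependency). TextExtractor and strip_tags are hypothetical names, the "tricky" string is made up for illustration, and the code assumes Python 3:

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes and ignore every tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called by the parser for each run of text between tags.
        self.chunks.append(data)


def strip_tags(markup):
    """Return the text content of markup with all tags removed."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


print(strip_tags("<html>hello</a>world<anytag>ok"))  # helloworldok

# A ">" inside a quoted attribute value trips up the regex approach,
# while the parser treats the whole attribute as part of the tag.
tricky = '<a title="a > b">link</a>'
print(repr(re.sub(r"<[^>]+>", "", tricky)))  # ' b">link'
print(strip_tags(tricky))                    # link
```

Note that this keeps every text node, including the contents of script and style elements if the page has any, so treat it as a starting point and not a complete solution.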

Matt

Jun 20 '07 #3
