re pattern for matching JS/CSS

I'm working on a program to remove tags from an HTML document, leaving
just the content, and I want to keep it simple. I've finished the part
that removes simple tags, but I also want all CSS and JS removed. What
re pattern could I use to do that?

I've tried
'<script[\S\s]*/script>'
but that didn't work properly. My knowledge of Python is fairly basic,
so I'm still learning re.
What pattern would work?

Dec 15 '06 #1
ina

i80and wrote:
> I'm working on a program to remove tags from an HTML document, leaving
> just the content, and I want to keep it simple. I've finished the part
> that removes simple tags, but I also want all CSS and JS removed. What
> re pattern could I use to do that?
>
> I've tried
> '<script[\S\s]*/script>'
> but that didn't work properly. My knowledge of Python is fairly basic,
> so I'm still learning re.
> What pattern would work?

I use re.compile("<script.*?</script>", re.DOTALL)
for scripts. I strip these out first, since my tag-stripping re would
otherwise strip the script tags as well (leaving the code between them).
Hope this helps.
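
To make the difference concrete, here is a minimal sketch comparing the greedy pattern from the question with a non-greedy one. The sample markup is made up, and extending the pattern to cover style blocks and mixed-case tag names is an assumption on top of the pattern quoted above, not something taken from the thread.

```python
import re

# Hypothetical sample page with two separate script blocks.
html = (
    "<p>one</p><script>a()</script>"
    "<p>two</p><script>b()</script><p>three</p>"
)

# The pattern from the question: [\S\s]* is greedy, so the match runs from
# the first "<script" all the way to the LAST "/script>" in the document.
greedy = re.compile(r"<script[\S\s]*/script>")

# Non-greedy variation of the suggested pattern. Covering <style> and
# ignoring case are assumptions added for illustration.
lazy = re.compile(r"<(script|style)\b.*?</\1\s*>", re.DOTALL | re.IGNORECASE)

greedy_result = greedy.sub("", html)  # also swallows "<p>two</p>"
lazy_result = lazy.sub("", html)      # removes each block on its own

print(greedy_result)
print(lazy_result)
```

The greedy version deletes everything between the first opening tag and the last closing tag, which is probably what "didn't work properly" referred to.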

Dec 15 '06 #2
>> I've tried
>> '<script[\S\s]*/script>'
>> but that didn't work properly. My knowledge of Python is fairly basic,
>> so I'm still learning re.
>> What pattern would work?
>
> I use re.compile("<script.*?</script>", re.DOTALL)
> for scripts. I strip these out first, since my tag-stripping re would
> otherwise strip the script tags as well (leaving the code between them).
> Hope this helps.

This won't catch variations such as

<script
>
doEvil()
</script
>

which is valid html/xhtml (whitespace is allowed before the closing
">" of a tag, though not directly after the "<").

With invalid HTML, which an attacker may still try, one might find
something like

<scrip<script>hah</script>t>doEvil()</script>

which, once the text matched by your pattern is stripped out, leaves the valid but unwanted

<script>doEvil()</script>
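
The effect is easy to reproduce. The sketch below runs the suggested pattern once over the nested payload, then shows a stopgap that repeats the substitution until nothing changes. The helper name strip_until_stable is hypothetical, and the loop only addresses this particular trick, not the whitespace variations above.

```python
import re

SCRIPT_RE = re.compile(r"<script.*?</script>", re.DOTALL)

payload = "<scrip<script>hah</script>t>doEvil()</script>"

# A single pass removes the inner block and splices the outer pieces
# together into a brand-new script element.
once = SCRIPT_RE.sub("", payload)
print(once)


def strip_until_stable(text):
    """Repeat the substitution until a pass changes nothing (hypothetical helper)."""
    while True:
        stripped = SCRIPT_RE.sub("", text)
        if stripped == text:
            return text
        text = stripped


stable = strip_until_stable(payload)
print(repr(stable))
```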

I'd propose that it's better to use something such as
BeautifulSoup that actually parses the HTML, and then skim
through it whitelisting the tags you plan to allow, and skipping
the emission of any tags that don't make the whitelist.
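
As a rough sketch of that whitelisting idea, the following uses the standard library's html.parser in place of BeautifulSoup, so that it runs without third-party packages. The allowed-tag set, the class name WhitelistFilter, and the function clean are all assumptions made for illustration; attributes are dropped entirely to keep it short.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"p", "b", "i", "em", "strong"}  # hypothetical whitelist
DROP_CONTENT = {"script", "style"}              # discard tag AND its contents


class WhitelistFilter(HTMLParser):
    """Re-emit only whitelisted tags; everything else is silently skipped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a script/style element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            self.out.append(f"<{tag}>")  # attributes are deliberately dropped

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Escape text so stray "<" or ">" cannot form new tags.
            self.out.append(escape(data))


def clean(html_text):
    parser = WhitelistFilter()
    parser.feed(html_text)
    parser.close()  # flush any text the parser is still buffering
    return "".join(parser.out)


sample = "<p>Hi <b>there</b></p><script>doEvil()</script><style>p{}</style>"
print(clean(sample))
print(clean("<scrip<script>hah</script>t>doEvil()</script>"))
```

Because output is built only from whitelisted tag names and escaped text, a mangled input cannot reassemble into a script element, whatever the parser makes of it.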

-tkc


Dec 15 '06 #3
