re pattern for matching JS/CSS

i80and

I'm working on a program to remove tags from an HTML document, leaving
just the content, but I want to do it simply. I've finished a system
to remove simple tags, but I also want all CSS and JS to be removed.
What re pattern could I use to do that?

I've tried
'<script[\S\s]*/script>'
but that didn't work properly. I'm fairly basic in my knowledge of
Python, so I'm still trying to learn re.
What pattern would work?

Dec 15 '06 #1


ina

i80and wrote:

> I'm working on a program to remove tags from a HTML document, leaving
> just the content, but I want to do it simply. I've finished a system
> to remove simple tags, but I want all CSS and JS to be removed. What
> re pattern could I use to do that?
>
> I've tried
> '<script[\S\s]*/script>'
> but that didn't work properly. I'm fairly basic in my knowledge of
> Python, so I'm still trying to learn re.
> What pattern would work?

I use re.compile("<script.*?</script>", re.DOTALL)
for scripts. I strip these out first, since my tag-stripping re will
strip out the script tags as well. Hope this was of help.
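
The original post doesn't say how the first pattern failed, but a likely cause is greediness: [\S\s]* runs from the first <script to the last /script> and swallows everything in between. A minimal sketch of the difference follows. The sample string and the combined script/style pattern are my own assumptions, not something from the posts above.

```python
import re

# Hypothetical sample page with two script blocks and text between them.
html = "<p>one</p><script>a()</script><p>two</p><script>b()</script><p>three</p>"

# The pattern from the question: [\S\s]* is greedy.
greedy = re.compile(r"<script[\S\s]*/script>")
# The pattern from the reply: .*? is non-greedy, DOTALL lets . match newlines.
lazy = re.compile(r"<script.*?</script>", re.DOTALL)

greedy_result = greedy.sub("", html)  # also removes <p>two</p>
lazy_result = lazy.sub("", html)      # removes only the two script blocks

# Assumed extension to CSS: one pattern for <script> and <style> blocks.
# \1 makes the closing tag match whichever opening tag was found.
script_or_style = re.compile(r"<(script|style)\b.*?</\1\s*>",
                             re.DOTALL | re.IGNORECASE)
css_js_result = script_or_style.sub("", "<style>p{}</style>x<SCRIPT>a()</SCRIPT>y")
```

Run the script/style pass before any generic tag stripping, so the code and style text is removed along with the tags.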

Dec 15 '06 #2

Tim Chase

>> I've tried
>> '<script[\S\s]*/script>'
>> but that didn't work properly. I'm fairly basic in my knowledge of
>> Python, so I'm still trying to learn re.
>> What pattern would work?
>
> I use re.compile("<script.*?</script>",re.DOTALL)
> for scripts. I strip this out first since my tag stripping re will
> strip out script tags as well hope this was of help.

This won't catch variations such as

<
script

>

doEvil()
<
/
script

>

which is valid html/xhtml.
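
A quick way to check the whitespace point, as a sketch only: the sample below puts a newline before each closing >, which is the one variation I'm certain HTML permits. The more tolerant pattern is my own assumption and is still only a regex.

```python
import re

# The pattern from the earlier reply.
pattern = re.compile(r"<script.*?</script>", re.DOTALL)

# Hypothetical input: whitespace before the closing > of each tag.
spaced = "<script\n>\ndoEvil()\n</script\n>"

# The literal "</script>" never appears, so nothing is matched or removed.
missed = pattern.search(spaced) is None

# Assumed tweak: allow whitespace before the final > and ignore case.
tolerant = re.compile(r"<script\b.*?</script\s*>", re.DOTALL | re.IGNORECASE)
cleaned = tolerant.sub("", spaced)
```

Loosening the pattern handles this particular case, but each new variation needs another tweak.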

For less valid html, but still attemptable, one might find
something like

<scrip<script>hah</script>t>doEvil()</script>

which, if you nuke whatever your pattern matches, leaves the valid but unwanted

<script>doEvil()</script>
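
To see this happen, here is a small sketch with the pattern from the earlier reply. The strip_until_stable helper is hypothetical, one possible mitigation rather than a recommendation.

```python
import re

pattern = re.compile(r"<script.*?</script>", re.DOTALL)

sneaky = "<scrip<script>hah</script>t>doEvil()</script>"

# One pass removes only the inner "<script>hah</script>"; the leftover
# pieces "<scrip" and "t>doEvil()</script>" join into a new script block.
once = pattern.sub("", sneaky)


def strip_until_stable(text):
    """Hypothetical helper: repeat the substitution until nothing changes."""
    while True:
        stripped = pattern.sub("", text)
        if stripped == text:
            return stripped
        text = stripped


stable = strip_until_stable(sneaky)
```

Repeating the pass closes this particular hole, but it remains pattern matching on text rather than parsing.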

I'd propose that it's better to use something such as
BeautifulSoup that actually parses the HTML, and then skim
through it whitelisting the tags you plan to allow, and skipping
the emission of any tags that don't make the whitelist.
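
A rough sketch of the whitelist idea. To keep it free of third-party installs, it uses the standard library's html.parser instead of BeautifulSoup, so it is not BeautifulSoup's API. The ALLOWED_TAGS set, the class name, and strip_html are all hypothetical names chosen for illustration.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist; adjust to the tags you actually want to keep.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong"}
# Elements whose contents are dropped along with the tags.
DROP_WITH_CONTENT = {"script", "style"}


class WhitelistFilter(HTMLParser):
    def __init__(self):
        super().__init__()   # character references are decoded by default
        self.pieces = []     # output fragments
        self.skipping = 0    # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skipping += 1
        elif not self.skipping and tag in ALLOWED_TAGS:
            self.pieces.append("<%s>" % tag)   # attributes dropped on purpose

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skipping = max(0, self.skipping - 1)
        elif not self.skipping and tag in ALLOWED_TAGS:
            self.pieces.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skipping:
            # Re-escape text so stray < or & cannot form new tags.
            self.pieces.append(escape(data, quote=False))


def strip_html(markup):
    parser = WhitelistFilter()
    parser.feed(markup)
    parser.close()   # flush any buffered text
    return "".join(parser.pieces)


sample = ("<p>Hello <b>world</b></p><script>doEvil()</script>"
          "<style>p{color:red}</style><div>kept &amp; escaped</div>")
result = strip_html(sample)
```

Tags outside the whitelist are never emitted, and text is re-escaped on the way out, so the output can only contain tags that were explicitly allowed.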

-tkc

Dec 15 '06 #3
