regexp

Hello

I need to apply some regular expressions across more than one line,
and I would like to use modifiers like /m or /s in Perl.
For example:
re.sub("<script.*>.*</script>","",data)

will not cut out all the JavaScript code if it is spread over many lines.
I could use something like Perl's /s, which makes . match every character
(including newline). How can I do that?

Maybe there is another way to achieve the same result?

Thanks
Dec 19 '06 #1
vertigo wrote:
> I need to use some regular expressions for more than one line.
> And i would like to use some modificators like: /m or /s in perl.
> For example:
> re.sub("<script.*>.*</script>","",data)
>
> will not cut out all javascript code if it's spread on many lines.

that won't cut out all javascript code period.

</F>

Dec 19 '06 #2
On Tuesday 19 December 2006 13:15, vertigo wrote:
> Hello
>
> I need to use some regular expressions for more than one line.
> And i would like to use some modificators like: /m or /s in perl.
> For example:
> re.sub("<script.*>.*</script>","",data)
>
> will not cut out all javascript code if it's spread on many lines.
> I could use something like /s from perl which treats . as all signs
> (including new line). How can i do that ?
>
> Maybe there is other way to achieve the same results ?
>
> Thanx

Take a look at Chapter 8 of 'Dive Into Python.'
http://diveintopython.org/toc/index.html

You can modify the code there and get the results that you need. Buy the book
if you can :) It has lots of neat examples.

- Jonathan Curran
Dec 19 '06 #3

vertigo wrote:
>> I need to use some regular expressions for more than one line.
>> And i would like to use some modificators like: /m or /s in perl.
>> For example:
>> re.sub("<script.*>.*</script>","",data)
>> will not cut out all javascript code if it's spread on many lines.
>
> that won't cut out all javascript code period.

Do you have any idea what will?
I need to cut out everything except the plain text data.

Thanx
Dec 19 '06 #4
Hello
On Tuesday 19 December 2006 13:15, vertigo wrote:
>> Hello
>>
>> I need to use some regular expressions for more than one line.
>> And i would like to use some modificators like: /m or /s in perl.
>> For example:
>> re.sub("<script.*>.*</script>","",data)
>>
>> will not cut out all javascript code if it's spread on many lines.
>> I could use something like /s from perl which treats . as all signs
>> (including new line). How can i do that ?
>>
>> Maybe there is other way to achieve the same results ?
>>
>> Thanx
>
> Take a look at Chapter 8 of 'Dive Into Python.'
> http://diveintopython.org/toc/index.html

I read the whole regexp chapter, but there was no solution for my problem.
Example:

re.sub("<!--.*-->","",htmldata)
would remove only comments that are on a single line.
If a comment spans many lines, like this:
<!--start
of
comment, end-->

it would not work, because '.' does not match '\n'.

Does anybody know a solution for this particular problem?

Thanks
Dec 19 '06 #5
You want re.sub("(?s)<!--.*?-->", "", htmldata)

Explanation: To make the dot match all characters, including newlines,
you need to set the DOTALL flag. You can set the flag inline with the (?s)
syntax, which is explained in section 4.2.1 of the Python Library
Reference.

A more readable way to do this is:

obj = re.compile("<!--.*?-->", re.DOTALL)
re.sub("", htmldata)
Dec 19 '06 #6
Oops, I mean obj.sub("", htmldata)
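
For reference, here is a minimal runnable sketch of both forms with that correction applied. The sample htmldata string and the variable names are made up for illustration, and the code is written for current Python 3 rather than the version in use when this thread was posted.

```python
import re

# Hypothetical sample input: one comment spread over three lines.
htmldata = "before<!--start\nof\ncomment, end-->after"

# Inline flag: (?s) turns on DOTALL for the whole pattern.
inline_result = re.sub(r"(?s)<!--.*?-->", "", htmldata)

# Equivalent compiled form, passing the flag explicitly.
comment_re = re.compile(r"<!--.*?-->", re.DOTALL)
compiled_result = comment_re.sub("", htmldata)

# Without DOTALL the dot stops at "\n", so nothing is removed.
no_flag_result = re.sub(r"<!--.*?-->", "", htmldata)

print(inline_result)               # beforeafter
print(compiled_result)             # beforeafter
print(no_flag_result == htmldata)  # True
```

re.DOTALL (or re.S) is the counterpart of Perl's /s. Perl's /m corresponds to re.MULTILINE (or re.M), which changes where ^ and $ match and has no effect on the dot.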

Dec 19 '06 #7
Hello

Thanks for the help, I have one more question:

I noticed that when matching a regexp, Python tries to match as widely as
possible.
For example:
re.sub("<!--.*-->","",htmldata)
would cut out everything between the first "<!--" and the last "-->" in the
document.
Can I force re to match as narrowly as possible?
(That is, match the first "<!--" with the first "-->" after it, and repeat
this for as long as the pattern is still found.)

Thanx

Dec 19 '06 #8

vertigo> i noticed that while matching regexp python tries to match as wide as it's
vertigo> possible,
vertigo> for example:
vertigo> re.sub("<!--.*-->","",htmldata)
vertigo> would cut out everything before first "<!--" and last "-->" in the
vertigo> document.
vertigo> Can i force re to math as narrow as possible ?

http://docs.python.org/lib/re-syntax.html

Search for "greedy".

Skip
Dec 19 '06 #9
On Tuesday 19 December 2006 15:32, Paul Arthur wrote:
> On 2006-12-19, vertigo <sp**@spam.pl> wrote:
>> Hello
>>> Take a look at Chapter 8 of 'Dive Into Python.'
>>> http://diveintopython.org/toc/index.html
>> i read whole regexp chapter -
>
> Did you read Chapter 8? Regexes are 7; 8 is about processing HTML.
> Regexes are not well suited to this type of processing.
>
>> but there was no solution for my problem.
>> Example:
>>
>> re.sub("<!--.*-->","",htmldata)
>> would remove only comments which are in one line.
>> If comment is in many lines like this:
>> <!--start
>> of
>> commend, end-->
>>
>> it would not work. It's because '.' sign does not matches '\n' sign.
>>
>> Does anybody knows solution for this particular problem ?
>
> Yes. Use DOTALL mode.

Paul, I mentioned Chapter 8 so that Vertigo would take a look at the HTML
processing section. What Vertigo wants can be done with relative ease using
sgmllib.

Anyway, if you (Vertigo) want to use regular expressions to do this, you can
try one of the regular expression testing programs. I'm not quite sure of
the name, but there is one that comes with KDE.

- Jonathan Curran
Dec 20 '06 #10
Not just Python, but every Regex engine works this way. You want a ?
after your *, as in <!--(.*?)-->, if you want it to catch the first
available "-->".
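
A small sketch of the difference, using a made-up one-line input with two comments (the htmldata value and variable names are illustrative only):

```python
import re

# Hypothetical sample input with two separate comments.
htmldata = "a<!-- one -->b<!-- two -->c"

# Greedy: .* runs on to the LAST "-->", so the "b" between
# the two comments is removed as well.
greedy = re.sub(r"<!--.*-->", "", htmldata)

# Non-greedy: .*? stops at the FIRST "-->" after each "<!--".
lazy = re.sub(r"<!--.*?-->", "", htmldata)

print(greedy)  # ac
print(lazy)    # abc
```

re.sub already replaces every non-overlapping match, so no explicit loop is needed to repeat the procedure for each comment.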

At this point in your adventure, you might be wondering whether regular
expressions are more trouble than they are worth. They are. There are
two libraries you need to take a look at, and soon: BeautifulSoup for
parsing HTML, and PyParsing for parsing everything else. Take the time
you were planning to spend on deciphering regexes like
"(\d{1,3}\.){3}\d{1,3}" and spend it learning the basics of those
libraries instead -- you will not regret it.
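
To give a rough idea of what the parser-based approach looks like, here is a sketch that keeps only the text of a page. It deliberately swaps in html.parser from the Python 3 standard library so that it runs without installing anything; it is not BeautifulSoup or PyParsing code, and the names TextExtractor and extract_text are hypothetical helpers invented for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content, skipping <script> and <style> bodies."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Comments never reach this method, so they drop out for free.
        if self.skip_depth == 0:
            self.parts.append(data)


def extract_text(html):
    """Return the visible text of html with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


sample = """<p>Hello<!-- a
multi-line comment --> world</p>
<script type="text/javascript">
alert("hi");
</script>
<p>Bye</p>"""

print(extract_text(sample))  # Hello world Bye
```

The parser handles multi-line comments and script blocks without any flags, which is the main attraction over the regex approach.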

Dec 20 '06 #11
