regexp substitution - a lot of work!

Hi Python crazies!:))
There is a problem to be solved. I have a text and I have to parse
it using a lot of regular expressions. In (lin)u(ni)x I could write in
bash:

cat file | sed 's/../../' | sed 's/../../' .. .. .. > parsed_file

I write a parser in python and what I must do is:

regexp = re.compile(..)
result = regexp.search(data)
while result:
    data = data[:result.start()] + .. result.group(..) + \
           data[result.end():]
    result = regexp.search(data)

... for one regexp substitution

instead of just: s/../../

That is quite a lot of work! Don't you know some better and easier way?
Thanks in advance,

---------------------------------------_.)--
| Lukas Holcik (xh******@fi.muni.cz) (\=)*
----------------------------------------''--
Jul 18 '05 #1
Lukas Holcik <xh******@fi.muni.cz> writes:
Hi Python crazies!:))
There is a problem to be solved. I have a text and I have to parse
it using a lot of regular expressions. In (lin)u(ni)x I could write in
bash: cat file | sed 's/../../' | sed 's/../../' .. .. .. > parsed_file
In Unix you would actually do:

$ sed 's/pat1/rep1/; s/pat2/rep2/; ...' <infile >outfile

to do the replacements in one pass. (you will now anyway :)
I write a parser in python and what I must do is:

regexp = re.compile(..)
result = regexp.search(data)
while result:
    data = data[:result.start()] + .. result.group(..) + \
           data[result.end():]
    result = regexp.search(data)

... for one regexp substitution instead of just: s/../../
That is quite a lot of work! Don't you know some better and easier way?
Thanks in advance,


http://aspn.activestate.com/ASPN/Coo...n/Recipe/81330

Eddie
Jul 18 '05 #2
Lukas Holcik wrote:
That is quite a lot of work! Don't you know some better and easier way?
Thanks in advance,
Why not use re.sub?
regexp = re.compile(..)
result = regexp.search(data)
while result:
    data = data[:result.start()] + .. result.group(..) + \
           data[result.end():]
    result = regexp.search(data)

... for one regexp substitution

regexp = re.compile(a_pattern)
result = regexp.sub(a_replacement, data)

Or use a callback if you need to modify the match:

def add_one(match):
    return match.group(1) + '1'

regexp = re.compile(a_pattern)
result = regexp.sub(add_one, data)
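
As a rough illustration of the two forms above, here is a minimal sketch. The sample string, the item(\d+) pattern, and the bump callback are made up for the example and do not come from the thread.

```python
import re

# Hypothetical input and pattern, chosen only for illustration.
data = "item7 item42"
regexp = re.compile(r'item(\d+)')

# Replacement string: \1 refers to the first captured group.
tagged = regexp.sub(r'<\1>', data)

def bump(match):
    # The callback receives a match object and returns the replacement text.
    return 'item' + str(int(match.group(1)) + 1)

# Replacement function: called once per match.
bumped = regexp.sub(bump, data)

print(tagged)
print(bumped)
```

Note that sub() replaces every non-overlapping match in one call, so no explicit search loop is needed.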
Jul 18 '05 #3
Yes, sorry, I was in such a hurry I didn't find it in the documentation.
But when I want to use a lot of very different expressions with a lot of
different groupings, which would be easy to implement using s/(..)/x\1x/,
then it is quite problematic having to use re.sub(), isn't it?

for example these perlish expressions:
's/^<i>.*?</i>\s*//'
's/<b>\s*</b><br>.*//s'
's/^(?:<[^>]>)?\t(.*?)<br>/<p>\1</p>/'
's/\t+| {2,}/ /'
's/(<[^/][^>]*>)([^\n])/\1\n\1/'
's/(?<!\n)(</[^>]*>)/\n\1/'
's/ /\n/'
Wouldn't it be easier to call an external perl (or sed)? Thanks,
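
For what it is worth, a chain of substitutions like this can be kept inside Python as a list of rules applied in order. The following is only a sketch under some assumptions: the names RULES, COMPILED, apply_rules and the sample text are hypothetical, only four of the expressions above are carried over, Perl's /s modifier is mapped to re.DOTALL, and every match is replaced (Python's sub() replaces all matches by default, unlike Perl's s/// without /g; pass count=1 to replace only the first).

```python
import re

# (pattern, replacement, flags) triples, applied in order like a sed script.
RULES = [
    (r'^<i>.*?</i>\s*', '', 0),            # s/^<i>.*?</i>\s*//
    (r'<b>\s*</b><br>.*', '', re.DOTALL),  # s/<b>\s*</b><br>.*//s
    (r'\t+| {2,}', ' ', 0),                # s/\t+| {2,}/ /
    (r'(?<!\n)(</[^>]*>)', r'\n\1', 0),    # s/(?<!\n)(</[^>]*>)/\n\1/
]

# Compile once so the rules can be reused on many inputs.
COMPILED = [(re.compile(pattern, flags), repl) for pattern, repl, flags in RULES]

def apply_rules(text):
    """Apply every rule in sequence, feeding each result into the next rule."""
    for regexp, repl in COMPILED:
        text = regexp.sub(repl, text)
    return text

# Hypothetical sample input.
sample = "<i>intro</i>  <p>a\t\tb   c</p><b> </b><br>trailing\njunk"
print(apply_rules(sample))
```

Each rule sees the output of the previous one, which mirrors piping the text through several sed commands.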

---------------------------------------_.)--
| Lukas Holcik (xh******@fi.muni.cz) (\=)*
----------------------------------------''--

On Wed, 16 Jun 2004, Tuure Laurinolli wrote:
Why not use re.sub?

Jul 18 '05 #4
Lukas Holcik <xh******@fi.muni.cz> wrote in
news:Pi*******************************@nymfe30.fi.muni.cz:
Yes, sorry, I was in such a hurry I didn't find it in the
documentation. But when I want to use a lot of very different
expressions with a lot of different groupings, which would be easy to
implement using s/(..)/x\1x/, then it is quite problematic having to
use re.sub(), isn't it?


I don't understand your point. The Python equivalent is:

re.sub('(..)', r'x\1x', s)

or using a precompiled pattern:

pat.sub(r'x\1x', s)
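
A small runnable sketch of the calls above, using a made-up input string. The \g<1> form in the last call is standard re syntax for the same group reference and is useful when the group number is followed by a digit.

```python
import re

s = "abcd"  # hypothetical sample input

# Module-level function: each pair of characters is wrapped in x...x.
wrapped = re.sub('(..)', r'x\1x', s)

# Same substitution with a precompiled pattern.
pat = re.compile('(..)')
wrapped_again = pat.sub(r'x\1x', s)

# \g<1> is the unambiguous spelling of \1; here a literal 0 follows the group.
with_zero = pat.sub(r'\g<1>0', s)

print(wrapped, wrapped_again, with_zero)
```

The raw-string prefix (r'...') keeps Python from interpreting \1 as an octal escape before the re module sees it.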
Jul 18 '05 #5
Is it really that easy? Now I see Python is simply the best:)))!

I just didn't know how to use groups other than through
MatchObject.group(..). You already answered that, thanks!:)

---------------------------------------_.)--
| Lukas Holcik (xh******@fi.muni.cz) (\=)*
----------------------------------------''--

On Thu, 17 Jun 2004, Duncan Booth wrote:
I don't understand your point. The Python equivalent is:

re.sub('(..)', r'x\1x', s)

Jul 18 '05 #6
Lukas Holcik quoted someone writing:
I don't understand your point. The Python equivalent is:

re.sub('(..)', r'x\1x', s)

or using a precompiled pattern:

pat.sub(r'x\1x', s)


footnote: you can use a callback instead of the replacement pattern.
callbacks are often faster, and can lead to more readable code:

http://effbot.org/zone/re-sub.htm#callbacks

(as the other examples on that page show, you can do a lot of weird
stuff with re.sub callbacks...)
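
One common use of a callback, sketched here under assumptions (the REPLACEMENTS table, the lookup function and the sample sentence are hypothetical, and this is not taken from the linked page), is replacing several literal strings in a single pass by looking each match up in a dictionary:

```python
import re

# Hypothetical table of literal replacements.
REPLACEMENTS = {"cat": "dog", "red": "blue"}

# Build one alternation; longer keys first so they win over shorter prefixes.
keys = sorted(REPLACEMENTS, key=len, reverse=True)
pattern = re.compile("|".join(re.escape(key) for key in keys))

def lookup(match):
    # match.group(0) is the whole matched text, which is one of the keys.
    return REPLACEMENTS[match.group(0)]

result = pattern.sub(lookup, "a red cat")
print(result)
```

Because the text is scanned only once, a replacement is never rescanned and replaced again, which can happen when separate substitutions are chained.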

</F>


Jul 18 '05 #7
