Looking for help with Regular Expression

Hi,

I'm looking for a little advice about regular expressions. I want to
capture a string of text that falls between an opening square bracket
and a closing square bracket (e.g., "[" and "]") but I've run into a
small problem.

I've been using this: '''\[(.*?)\]''' as my pattern. I was expecting
this to be greedy but the funny thing is that it's not greedy enough in
some situations.

Here's my problem: The end of my string sometimes contains a cross
reference to a section in a book and the subsections are cited using
square brackets exactly like the one I'm using as the ending point in
my original regular expression.

E.g., the text string in my data looks like this: <core:emph
typestyle="it"> see</core:emph> discussion in
&#xa7;&#x2002;5 12.16[3][b]]

But my regular expression is stopping after the first "]" so after I
add the new markup the output looks like this:

<core:emph typestyle="it"> see</core:emph> discussion in
&#xa7;&#x2002;5 12.16[3]</fn:note>[b]]

So the last subsection is outside of the note tag. I want something
like this:

<core:emph typestyle="it"> see</core:emph> discussion in
&#xa7;&#x2002;5 12.16[3][b]]</fn:note>

I'm not sure how to make my capture more greedy so I've resorted to
cleaning up the data after I make the first round of replacements:

data = re.sub(r'''\[(\d*?)\]</fn:note>\[(\w)\]\]''',
'''[\1][\2]]</fn:note>''', data)

There's got to be a better way but I'm not sure what it is.

Thanks,

Greg

May 24 '06 #1
ProvoWallis wrote:
There's got to be a better way but I'm not sure what it is.


I do: Pyparsing.

from pyparsing import *
crossref = Suppress("[") + Word(alphanums, exact=1) + Suppress("]")
footnote = (
    Suppress("[") + SkipTo(crossref) +
    ZeroOrMore(crossref) + Suppress("]")
)

footnote.parseString("[&#xa7;&#x2002;5 12.16[3][b]]").asList()

py> footnote.parseString("[&#xa7;&#x2002;5 12.16[3][b]]").asList()
['&#xa7;&#x2002;5 12.16', '3', 'b']

James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/
May 24 '06 #2
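
For readers who would rather stay with the re module, here is a minimal sketch of a regex-only alternative. It assumes the cross-references are nested only one level deep, as in the sample string from the question. The name NOTE and the replacement template are illustrative assumptions, not taken from the thread.

```python
import re

# Assumption: brackets nest at most one level deep inside the note.
# The body is any run of non-bracket characters or complete "[...]"
# groups that themselves contain no brackets.
NOTE = re.compile(r"\[((?:[^\[\]]|\[[^\[\]]*\])*)\]")

text = "[&#xa7;&#x2002;5 12.16[3][b]]"

match = NOTE.search(text)
body = match.group(1)
print(body)  # &#xa7;&#x2002;5 12.16[3][b]

# Hypothetical replacement: wrap the captured body in <fn:note> tags.
wrapped = NOTE.sub(r"<fn:note>\1</fn:note>", text)
print(wrapped)  # <fn:note>&#xa7;&#x2002;5 12.16[3][b]</fn:note>
```

Because the two alternatives inside the group start with different characters, the pattern does not backtrack badly on long notes. Deeper nesting would need a real parser such as the pyparsing grammar above.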
'''\[(.*?)\]'''
?-> when this character follows a quantifier (*, +, ?, {n}, {n,}, {n,m}),
that quantifier is not greedy: it matches as little as possible

e.g.1
String: 512.16[3][b]]
Pattern:'''\[(.*)\]'''
This will match "[3][b]]"

e.g.2
String: 512.16[3][b]]
Pattern:'''\[(.*?)\]'''
This will match "[3]" and "[b]"
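
The difference between the two examples can be checked with re.findall. This is a small sketch using the same sample string; the variable names are only for illustration.

```python
import re

s = "512.16[3][b]]"

# Greedy: .* runs to the end of the string, then backs up to the last "]".
greedy = re.findall(r"\[.*\]", s)
print(greedy)  # ['[3][b]]']

# Non-greedy: .*? stops at the first "]" it can reach.
lazy = re.findall(r"\[.*?\]", s)
print(lazy)  # ['[3]', '[b]']

# A "?" placed after the group only makes the group optional;
# the .* inside it is still greedy.
optional_group = re.findall(r"\[(?:.*)?\]", s)
print(optional_group)  # ['[3][b]]']
```

So the original pattern '''\[(.*?)\]''' is already non-greedy, which is why it stops at the first "]".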

May 24 '06 #3
There seem to be a lot of regular expression questions lately. There is a
neat little RE demonstrator buried down in
Python24/Tools/Scripts/redemo.py, which makes it easy to experiment
with regular expressions and immediately see the effect of changes. It
would be helpful if it were mentioned in the RE documentation, although
I can understand why one might not want a language reference to deal
with informally-supported tools.

May 24 '06 #4
