local greediness ???

hi, all. I need to process a file with the following format:
$ cat sample
[(some text)2.3(more text)4.5(more text here)]
[(aa bb ccc)-1.2(kdk)12.0(xxxyyy)]
[(xxx)11.0(bbb\))8.9(end here)]
........

My goal is, for each line, to extract every '(.*)' (including the round
brackets) into one list, and to extract every float on the same line
into another list. Here is my code:

import re

p = re.compile(r'\[.*\]$')
num = re.compile(r'[-\d]+[.\d]*')
brac = re.compile(r'\(.*?\)')

# ifp is the already-opened input file
for line in ifp:
    if p.match(line):
        x = num.findall(line)
        y = brac.findall(line)
        print x, y, len(x), len(y)

Now, this works for most of the lines. However, I'm having problems with
lines such as line 3 above (in the sample file). Here, (bbb\)) contains
an escaped ')', and the re I use stops at it (because of the non-greedy
'?'). But I want the escaped ')' to be ignored. Is there such a thing as
local greediness? Can anyone suggest a way to deal with this?
Thanks.
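
For anyone who wants to reproduce this, here is a minimal Python 3 sketch of the failing case (the names `line` and `groups` are only for illustration). The non-greedy pattern stops at the first ')' it meets, escaped or not, so the second group comes back cut short:

```python
import re

brac = re.compile(r'\(.*?\)')              # the non-greedy pattern from the post
line = r'[(xxx)11.0(bbb\))8.9(end here)]'  # line 3 of the sample file

groups = brac.findall(line)
print(groups)   # ['(xxx)', '(bbb\\)', '(end here)']  <- '(bbb\\))' was wanted
```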

Apr 19 '06 #1
On 19/04/2006 3:09 PM, ty****@gmail.com wrote:
[snip]

For a start, your brac pattern is better rewritten to avoid the
non-greedy ? tag: r'\([^)]*\)' -- this says the middle part is zero or
more occurrences of a single character that is not a ')'

To handle the pesky backslash-as-escape, we need to extend that to: zero
or more occurrences of either (a) a single character that is not a ')'
or (b) the two-character string r"\)". This gives us something like this:

>>> brac = re.compile(r'\((?:\\\)|[^)])*\)')
>>> tests = r"(xxx)123.4(bbb\))5.6(end\Zhere)7.8()9.0(\))1.2(a b\)cd)"
>>> brac.findall(tests)
['(xxx)', '(bbb\\))', '(end\\Zhere)', '()', '(\\))', '(a b\\)cd)']

Pretty, isn't it? Maybe better done with a hand-coded state machine.
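
A rough idea of what such a hand-coded state machine might look like, in Python 3. This is only a sketch, not the poster's code: `paren_groups` is a made-up name, and it assumes that a backslash escapes whatever single character follows it and that groups never nest.

```python
def paren_groups(line):
    """Return every (...) group in line, honouring backslash escapes."""
    groups = []
    current = None    # characters of the group being collected; None when outside
    escaped = False   # True when the previous character was an escaping backslash
    for ch in line:
        if current is None:
            if ch == "(":            # an opening bracket starts a new group
                current = [ch]
            continue                 # anything else outside a group is skipped
        current.append(ch)
        if escaped:
            escaped = False          # this character was escaped; take it literally
        elif ch == "\\":
            escaped = True           # the next character is escaped
        elif ch == ")":
            groups.append("".join(current))   # an unescaped ')' closes the group
            current = None
    return groups


tests = r"(xxx)123.4(bbb\))5.6(end\Zhere)7.8()9.0(\))1.2(a b\)cd)"
print(paren_groups(tests))
```

On the `tests` string it returns the same six groups as the regex above. An unterminated group at the end of the line is silently dropped; whether that is the right behaviour depends on the data.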
Apr 19 '06 #2
How about using the numbers as delimiters:
>>> pat = re.compile(r"[\d\.\-]+")
>>> pat.split("[(some text)2.3(more text)4.5(more text here)]")
['[(some text)', '(more text)', '(more text here)]']
>>> pat.findall("[(some text)2.3(more text)4.5(more text here)]")
['2.3', '4.5']
>>> pat.split("[(xxx)11.0(bbb\))8.9(end here)] ")
['[(xxx)', '(bbb\\))', '(end here)] ']
>>> pat.findall("[(xxx)11.0(bbb\))8.9(end here)] ")
['11.0', '8.9']
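
Wrapped up as a function, the split idea might look like the Python 3 sketch below. The names `NUMBER` and `split_line` are invented for the example, and two assumptions go beyond the session above: the pattern is tightened to require a decimal point (as every number in the sample has one), and the outer square brackets are stripped before splitting.

```python
import re

# Assumption: every number looks like -1.2 or 12.0 (optional minus, decimal point).
NUMBER = re.compile(r"-?\d+\.\d+")


def split_line(line):
    """Split one '[...]' line into (bracketed groups, floats)."""
    inner = line.strip()[1:-1]          # drop the surrounding [ and ]
    numbers = [float(s) for s in NUMBER.findall(inner)]
    groups = NUMBER.split(inner)        # the numbers act as delimiters
    return groups, numbers


groups, numbers = split_line(r"[(xxx)11.0(bbb\))8.9(end here)]")
print(groups)    # ['(xxx)', '(bbb\\))', '(end here)']
print(numbers)   # [11.0, 8.9]
```

The escaped ')' needs no special handling here, because the brackets are never inspected. The price is that any number appearing inside a bracketed group, such as `(version 2.0)`, would also be treated as a delimiter and split that group apart.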

Apr 19 '06 #3
<ty****@gmail.com> wrote in message
news:11*********************@v46g2000cwv.googlegroups.com...
[snip]

Are you wedded to re's? Here's a pyparsing approach for your perusal. It
uses the new QuotedString class, treating your ()-enclosed elements as
custom quoted strings (including backslash escape support).

Some other things the parser does for you during parsing:
- converts the numeric strings to floats
- processes the \) escaped paren, returning just the )
Why not? While parsing, the parser "knows" it has just parsed a floating
point number (or an escaped character), so it may as well do the
conversion too.

-- Paul
(Download pyparsing at http://pyparsing.sourceforge.net.)

--------------------
test = r"""
[(some text)2.3(more text)4.5(more text here)]
[(aa bb ccc)-1.2(kdk)12.0(xxxyyy)]
[(xxx)11.0(bbb\))8.9(end here)]
"""
from pyparsing import oneOf,Combine,Optional,Word,nums,QuotedString,Suppress

# define a floating point number
sign = oneOf("+ -")
floatNum = Combine( Optional(sign) + Word(nums) + "." + Word(nums) )

# have parser convert to actual floats while parsing
floatNum.setParseAction(lambda s,l,t: float(t[0]))

# define a "quoted string" where ()'s are the opening and closing quotes
parenString = QuotedString("(",endQuoteChar=")",escChar="\\")

# define the overall entry structure
entry = ( Suppress("[") + parenString + floatNum + parenString + floatNum +
          parenString + Suppress("]") )

# scan for floats
for toks,start,end in floatNum.scanString(test):
    print toks[0]
print

# scan for paren strings
for toks,start,end in parenString.scanString(test):
    print toks[0]
print

# scan for entries
for toks,start,end in entry.scanString(test):
    print toks
print
--------------------
Gives:
2.3
4.5
-1.2
12.0
11.0
8.9

some text
more text
more text here
aa bb ccc
kdk
xxxyyy
xxx
bbb)
end here

['some text', 2.2999999999999998, 'more text', 4.5, 'more text here']
['aa bb ccc', -1.2, 'kdk', 12.0, 'xxxyyy']
['xxx', 11.0, 'bbb)', 8.9000000000000004, 'end here']
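
For comparison, the same "convert while you parse" idea can be approximated with nothing but the standard `re` module. The Python 3 sketch below is not pyparsing and not Paul's code; `TOKEN` and `parse_entry` are hypothetical names, and it assumes that a backslash escapes the single character after it and that every number carries a decimal point.

```python
import re

# One alternative per token type; finditer() then walks the line left to right.
TOKEN = re.compile(r"""
    \((?P<text>(?:\\.|[^\\)])*)\)   # bracketed text; a backslash escapes the next char
  | (?P<num>[-+]?\d+\.\d+)          # optionally signed number with a decimal point
""", re.VERBOSE)


def parse_entry(line):
    """Return the texts (unescaped, brackets removed) and floats of one line, in order."""
    result = []
    for m in TOKEN.finditer(line):
        if m.group("num") is not None:
            result.append(float(m.group("num")))        # convert while scanning
        else:
            # drop each escaping backslash, keeping the character it protected
            result.append(re.sub(r"\\(.)", r"\1", m.group("text")))
    return result


print(parse_entry(r"[(xxx)11.0(bbb\))8.9(end here)]"))
# ['xxx', 11.0, 'bbb)', 8.9, 'end here']
```

Because a whole bracketed group is consumed as a single token, digits inside the brackets are not mistaken for numbers. Unlike the `entry` expression above, this sketch does not check that each line has the exact text/number/text/number/text shape.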

Apr 19 '06 #4
