local greediness ???

tygerc

Hi, all. I need to process a file with the following format:
$ cat sample
[(some text)2.3(more text)4.5(more text here)]
[(aa bb ccc)-1.2(kdk)12.0(xxxyyy)]
[(xxx)11.0(bbb\))8.9(end here)]
........

My goal here is, for each line, to extract every '(...)' group (including
the round brackets) and put them in a list, and to extract every float on
the same line and put them in a list. Here is my code:

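import re

p = re.compile(r'\[.*\]$')         # a whole line wrapped in [ ... ]
num = re.compile(r'[-\d]+[.\d]*')  # numbers such as 2.3 or -1.2
brac = re.compile(r'\(.*?\)')      # shortest run from '(' to the next ')'

with open('sample') as ifp:
    for line in ifp:
        if p.match(line):
            x = num.findall(line)
            y = brac.findall(line)
            print(x, y, len(x), len(y))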

Now, this works for most of the lines. However, I'm having problems with
lines such as line 3 above (in the sample file). Here, (bbb\)) contains an
escaped ')', and the re I use stops at it (because of the non-greedy '?').
But I want that ')' to be ignored, since it's escaped. Is there such a
thing as local greediness? Can anyone suggest a way to deal with this here?
Thanks.
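A minimal Python 3 reproduction of the problem, using line 3 of the sample as a raw string: the non-greedy group ends at the first ')' it reaches, escaped or not, so the second group comes back as '(bbb\)' and the real closing ')' is skipped.

```python
import re

# The bracket pattern from the question: '(' then as little as possible, then ')'
brac = re.compile(r'\(.*?\)')

# Line 3 of the sample file; a raw string keeps the backslash intact
line = r'[(xxx)11.0(bbb\))8.9(end here)]'

groups = brac.findall(line)
print(groups)  # ['(xxx)', '(bbb\\)', '(end here)']
```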

Apr 19 '06 #1

John Machin

On 19/04/2006 3:09 PM, ty****@gmail.com wrote:

For a start, your brac pattern is better rewritten to avoid the
non-greedy ? tag: r'\([^)]*\)'. This says the middle part is zero or
more occurrences of a single character that is not a ')'.
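A quick Python 3 check on line 3 of the sample: the negated-class pattern returns exactly the same groups as the non-greedy one, because both stop at the first ')' they meet. On its own it is clearer, but it does not yet know about the escape.

```python
import re

lazy = re.compile(r'\(.*?\)')       # non-greedy version from the question
negated = re.compile(r'\([^)]*\)')  # zero or more characters that are not ')'

line = r'[(xxx)11.0(bbb\))8.9(end here)]'

# Neither pattern treats '\)' specially, so the results agree
same = lazy.findall(line) == negated.findall(line)
print(same)                   # True
print(negated.findall(line))  # ['(xxx)', '(bbb\\)', '(end here)']
```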

To handle the pesky backslash-as-escape, we need to extend that to: zero
or more occurrences of either (a) a single character that is not a ')'
or (b) the two-character string r"\)". This gives us something like this:

#>>> brac = re.compile(r'\((?:\\\)|[^)])*\)')
#>>> tests = r"(xxx)123.4(bbb\))5.6(end\Zhere)7.8()9.0(\))1.2(a b\)cd)"
#>>> brac.findall(tests)
['(xxx)', '(bbb\\))', '(end\\Zhere)', '()', '(\\))', '(a b\\)cd)']
#>>>

Pretty, isn't it? Maybe better done with a hand-coded state machine.
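A hand-coded state machine for the same job might look roughly like the Python 3 sketch below. It is an illustration, not code from this thread: the function name paren_groups is hypothetical, and it assumes a backslash escapes whatever single character follows it (not only ')'). Brackets and backslashes are kept in the output, as with the regex.

```python
def paren_groups(line):
    """Return every (...) group in line, treating backslash as an escape."""
    groups = []
    current = None   # None while outside a group, else the characters so far
    escaped = False  # True right after a backslash inside a group
    for ch in line:
        if current is None:
            if ch == '(':
                current = [ch]        # start of a new group
        elif escaped:
            current.append(ch)        # escaped character is taken literally
            escaped = False
        elif ch == '\\':
            current.append(ch)
            escaped = True
        elif ch == ')':
            current.append(ch)        # an unescaped ')' closes the group
            groups.append(''.join(current))
            current = None
        else:
            current.append(ch)
    # An unterminated group (current is not None here) is discarded
    return groups

tests = r"(xxx)123.4(bbb\))5.6(end\Zhere)7.8()9.0(\))1.2(a b\)cd)"
print(paren_groups(tests))
```

On this test string it returns the same six groups as the regex above. As written, an unterminated group at the end of a line is silently dropped; the explicit states make it easy to report that case instead.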

Apr 19 '06 #2

johnzenger

How about using the numbers as delimiters:

>>> pat = re.compile(r"[\d\.\-]+")
>>> pat.split("[(some text)2.3(more text)4.5(more text here)]")
['[(some text)', '(more text)', '(more text here)]']
>>> pat.findall("[(some text)2.3(more text)4.5(more text here)]")
['2.3', '4.5']
>>> pat.split("[(xxx)11.0(bbb\))8.9(end here)] ")
['[(xxx)', '(bbb\\))', '(end here)] ']
>>> pat.findall("[(xxx)11.0(bbb\))8.9(end here)] ")
['11.0', '8.9']
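The same idea wrapped into a per-line function, as a Python 3 sketch (the helper name split_line is hypothetical). It strips the outer '[' and ']' first so they don't stick to the first and last groups. Like any delimiter-based split, it assumes the bracketed text itself never contains digits, '.' or '-'.

```python
import re

# Delimiter pattern from the post: a run of digits, dots and minus signs
pat = re.compile(r"[\d\.\-]+")

def split_line(line):
    # Hypothetical helper; assumes the line is wrapped in a single [ ... ]
    body = line.strip()[1:-1]
    return pat.split(body), pat.findall(body)

groups, numbers = split_line(r"[(xxx)11.0(bbb\))8.9(end here)] ")
print(groups)   # ['(xxx)', '(bbb\\))', '(end here)']
print(numbers)  # ['11.0', '8.9']
```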

Apr 19 '06 #3

Paul McGuire

Are you wedded to re's? Here's a pyparsing approach for your perusal. It
uses the new QuotedString class, treating your ()-enclosed elements as
custom quoted strings (including backslash escape support).

Some other things the parser does for you during parsing:
- converts the numeric strings to floats
- processes the \) escaped paren, returning just the )

Why not? While parsing, the parser "knows" it has just parsed a floating
point number (or an escaped character), so it may as well do the conversion too.

-- Paul
(Download pyparsing at http://pyparsing.sourceforge.net.)

--------------------
test = r"""
[(some text)2.3(more text)4.5(more text here)]
[(aa bb ccc)-1.2(kdk)12.0(xxxyyy)]
[(xxx)11.0(bbb\))8.9(end here)]
"""
from pyparsing import oneOf,Combine,Optional,Word,nums,QuotedString,Suppress

# define a floating point number
sign = oneOf("+ -")
floatNum = Combine( Optional(sign) + Word(nums) + "." + Word(nums) )

# have parser convert to actual floats while parsing
floatNum.setParseAction(lambda s,l,t: float(t[0]))

# define a "quoted string" where ()'s are the opening and closing quotes
parenString = QuotedString("(",endQuoteChar=")",escChar="\\")

# define the overall entry structure
entry = (Suppress("[") + parenString + floatNum + parenString + floatNum +
         parenString + Suppress("]"))

# scan for floats
for toks,start,end in floatNum.scanString(test):
    print(toks[0])
print()

# scan for paren strings
for toks,start,end in parenString.scanString(test):
    print(toks[0])
print()

# scan for entries
for toks,start,end in entry.scanString(test):
    print(toks)
print()
--------------------
Gives:
2.3
4.5
-1.2
12.0
11.0
8.9

some text
more text
more text here
aa bb ccc
kdk
xxxyyy
xxx
bbb)
end here

['some text', 2.3, 'more text', 4.5, 'more text here']
['aa bb ccc', -1.2, 'kdk', 12.0, 'xxxyyy']
['xxx', 11.0, 'bbb)', 8.9, 'end here']
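For comparison, the two conversions described above (numeric strings to floats, '\)' to ')') can be approximated with only the standard re module. This Python 3 sketch is not pyparsing's behaviour: the name parse_entry is hypothetical, and it assumes a backslash escapes exactly one following character and that a float is an optional sign, digits, '.', digits, as in floatNum.

```python
import re

# Either an escape-aware (...) group (group 1) or a signed float (group 2)
token = re.compile(r'\(((?:\\.|[^\\)])*)\)|([-+]?\d+\.\d+)')
unescape = re.compile(r'\\(.)')  # backslash + character -> character

def parse_entry(line):
    out = []
    for m in token.finditer(line):
        text, number = m.groups()
        if number is not None:
            out.append(float(number))            # numeric token
        else:
            out.append(unescape.sub(r'\1', text))  # drop escape backslashes
    return out

print(parse_entry(r'[(xxx)11.0(bbb\))8.9(end here)]'))
# ['xxx', 11.0, 'bbb)', 8.9, 'end here']
```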

Apr 19 '06 #4
