Bytes | Software Development & Data Engineering Community

Issue with regular expressions

Hi,

I'm fairly new to Python and I haven't used regular expressions
enough to be able to achieve what I want.
I'd like to extract the terms from a string so I can then search my
database.

query = ' " some words" with and "without quotes " '
p = re.compile(magic_regular_expression)   # <--- the magic happens
m = p.match(query)

I'd like m.groups() to return:
('some words', 'with', 'and', 'without quotes')

Is that achievable with a single regular expression, and if so, what
would it be?

Any help would be much appreciated.

Thanks!!

Julien
Jun 27 '08 #1
Julien -

I dabbled with re's for a few minutes trying to get your solution,
then punted and used pyparsing instead. Pyparsing will run slower
than re, but many people find it much easier to work with readable
class names and instances than with re's typoglyphics:

from pyparsing import (OneOrMore, Word, printables, dblQuotedString,
                       removeQuotes)

# when a quoted string is found, remove the quotes,
# then strip whitespace from the contents
dblQuotedString.setParseAction(removeQuotes,
                               lambda s: s[0].strip())

# define terms to be found in query string
term = dblQuotedString | Word(printables)
query_terms = OneOrMore(term)

# parse query string to extract terms
query = ' " some words" with and "without quotes " '
print(tuple(query_terms.parseString(query)))

Gives:
('some words', 'with', 'and', 'without quotes')

The pyparsing wiki is at http://pyparsing.wikispaces.com. You'll find
an examples page that includes a search query parser, and pointers to
a number of online documentation and presentation sources.

-- Paul
Jun 27 '08 #2
Hi,

I think re is not the best tool for you. Maybe there's a regular
expression that does what you want but it will be quite complex and hard
to maintain.

I suggest you split the query on the double quotes and process the
alternating outside/inside chunks. Something like:

import re

def spulit(s):
    inq = False
    for term in s.split('"'):
        if inq:
            # inside quotes: trim and collapse internal whitespace
            yield re.sub(r'\s+', ' ', term.strip())
        else:
            # outside quotes: each word is a separate term
            for word in term.split():
                yield word
        inq = not inq

for token in spulit(' " some words" with and "without quotes " '):
    print(token)
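One property of the split-on-quotes approach worth knowing before using it on user input: it simply flips between outside and inside on every `"`, so an unterminated quote turns the rest of the string into one phrase, and an empty pair `""` yields an empty term. A quick check with made-up inputs (the function is repeated so the snippet runs on its own):

```python
import re

def spulit(s):
    """Split on double quotes; every second chunk was inside quotes."""
    inq = False
    for term in s.split('"'):
        if inq:
            # Inside quotes: trim and collapse internal whitespace.
            yield re.sub(r'\s+', ' ', term.strip())
        else:
            # Outside quotes: every whitespace-separated word is a term.
            for word in term.split():
                yield word
        inq = not inq

print(list(spulit(' " some words" with and "without quotes " ')))
# An unterminated quote: the trailing chunk is still treated as quoted.
print(list(spulit('find "open phrase')))
# An empty pair of quotes yields an empty term a caller may want to drop.
print(list(spulit('a "" b')))
```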
Cheers,
RB
Jun 27 '08 #3
import re
from re import DOTALL, VERBOSE

re.compile(r"""
    # ---- Double Quote Text ----
    "          # match a double quote
    (          # - two possibilities:
      \\.      #   a backslash followed by any character (including newline)
      |        #   OR
      [^"]     #   any character except a double quote
    )*         # - zero or more times
    "          # finally match a double quote

    |          # ======== OR ========

    # ---- Single Quote Text ----
    '          # match a single quote
    (          # - two possibilities:
      \\.      #   a backslash followed by any character (including newline)
      |        #   OR
      [^']     #   any character except a single quote
    )*         # - zero or more times
    '          # finally match a single quote
    """, DOTALL|VERBOSE)

I've used this before to find double-quoted and single-quoted strings
in a file (there is more to it that looks for C++ and C style quotes,
but that isn't needed here). Perhaps you can take it a step further and
leave these matches unchanged?

re.compile(r""""(\\.|[^"])*"|'(\\.|[^'])*'""", DOTALL)

That is the same pattern, without the comments, on a single line :)
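One way the quoted-string pattern could be taken a step further for the original question is sketched below. It is an adaptation under assumptions, not the poster's code: only the double-quote branch is kept, `[^"]` is tightened to `[^"\\]` so a backslash always starts an escape, the quoted content is captured, and a `(\S+)` branch picks up bare words. The names `TERM_RE` and `terms` are hypothetical.

```python
import re

# Hypothetical adaptation: double-quoted text (backslash escapes allowed)
# with its content captured in group 1, or a bare word in group 2.
TERM_RE = re.compile(r'"((?:\\.|[^"\\])*)"|(\S+)', re.DOTALL)

def terms(query):
    """Return quoted phrases (whitespace-normalized) and bare words."""
    result = []
    for m in TERM_RE.finditer(query):
        quoted, bare = m.group(1), m.group(2)
        if quoted is not None:
            # Collapse runs of whitespace; escape sequences are left as-is.
            result.append(' '.join(quoted.split()))
        else:
            result.append(bare)
    return result

print(terms(' " some words" with and "without quotes " '))
print(terms(r'say "a \"quoted\" word"'))
```

Testing `quoted is not None` rather than truthiness keeps an empty `""` distinguishable from a bare word.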
Jun 27 '08 #4
Here's one way with a single regexp plus an extra filter function.
>>> import re
>>> q = ' " some words" with and "without quotes " '
>>> p = re.compile('("([^"]+)")|([^ \t]+)')
>>> m = p.findall(q)
>>> m
[('" some words"', ' some words', ''), ('', '', 'with'), ('', '',
'and'), ('"without quotes "', 'without quotes ', '')]
>>> def f(t):
...     if t[0] == '':
...         return t[2]
...     else:
...         return t[1]
...
>>> list(map(f, m))
[' some words', 'with', 'and', 'without quotes ']

If you want to strip the leading/trailing whitespace from the quoted
strings, change the last return statement to
"return t[1].strip()".
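The replies here use findall() rather than the match().groups() call from the original question. A short demonstration of why, on a made-up input: a capturing group inside a repetition only remembers its last match, so groups() always has the fixed size set by the pattern, never one entry per term.

```python
import re

# Hypothetical pattern: one or more whitespace-separated words, each captured.
pattern = re.compile(r'(?:\s*(\S+))+')
m = pattern.match('with and more')
print(m.groups())   # ('more',): the repeated group keeps only its last capture

# findall() returns one result per match, however many there are.
print(re.findall(r'\S+', 'with and more'))   # ['with', 'and', 'more']
```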

Paul
Jun 27 '08 #5
I don't think you can achieve this with a single regular expression.
Your best bet is to use p.findall() to find all plausible matches, and
then rework them a bit. For example:

>>> p = re.compile(r'"[^"]*"|[\S]+')
>>> p.findall(query)
['" some words"', 'with', 'and', '"without quotes "']

At that point, you can easily iterate through the list and remove the
quotes and excess whitespace.
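A minimal sketch of that clean-up step, reusing the pattern above. The helper name `clean` is hypothetical, and dropping empty terms (as produced by `""`) is an assumption about what a database search wants.

```python
import re

query = ' " some words" with and "without quotes " '
p = re.compile(r'"[^"]*"|[\S]+')

def clean(term):
    """Remove surrounding double quotes and collapse inner whitespace."""
    if len(term) >= 2 and term[0] == term[-1] == '"':
        term = term[1:-1]
    return ' '.join(term.split())

# Assumption: empty terms (e.g. from "") are not useful search terms.
cleaned = [c for c in (clean(t) for t in p.findall(query)) if c]
print(cleaned)
```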
Jun 27 '08 #6
You can't do it simply and completely with regular expressions alone
because of the requirement to strip the quotes and normalize
whitespace, but it's not too hard to write a function to do it. Viz:

import re

wordre = re.compile('"[^"]+"|[a-zA-Z]+').findall

def findwords(src):
    ret = []
    for x in wordre(src):
        if x[0] == '"':
            # strip off the quotes and normalise spaces
            ret.append(' '.join(x[1:-1].split()))
        else:
            ret.append(x)
    return ret

query = ' " Some words" with and "without quotes " '
print(findwords(query))

Running this gives
['Some words', 'with', 'and', 'without quotes']
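Note that `[a-zA-Z]+` only picks up letters, so unquoted terms containing digits or punctuation get chopped. A sketch comparing it with `\S+` on a made-up query; this `findwords` takes the word pattern as an extra parameter, which the function above does not, and the stricter quote check is an added guard against a bare token such as `"abc`.

```python
import re

def findwords(src, word_pattern):
    """Quoted phrases (whitespace-normalized) plus unquoted words."""
    ret = []
    for x in re.findall('"[^"]+"|' + word_pattern, src):
        if len(x) >= 2 and x[0] == x[-1] == '"':
            # strip off the quotes and normalise spaces
            ret.append(' '.join(x[1:-1].split()))
        else:
            ret.append(x)
    return ret

query = 'python2.5 "regular  expressions" C++'
print(findwords(query, r'[a-zA-Z]+'))   # letters only
print(findwords(query, r'\S+'))         # any run of non-space characters
```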

HTH

Harvey
Jun 27 '08 #7
I don't know if it is possible to do it all with one regex, but it
doesn't seem practical. I would check out the shlex module.

>>> import shlex
>>> query = ' " some words" with and "without quotes " '
>>> shlex.split(query)
[' some words', 'with', 'and', 'without quotes ']

To get rid of the leading and trailing space you can then use strip:

>>> [s.strip() for s in shlex.split(query)]
['some words', 'with', 'and', 'without quotes']

The only remaining problem is collapsing extra whitespace in the middle
of a term, for which re might still be a good solution:

>>> import re
>>> [re.sub(r"\s+", ' ', s.strip()) for s in shlex.split(query)]
['some words', 'with', 'and', 'without quotes']
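Two things to keep in mind if shlex is used on user-typed queries: it applies shell rules, so single quotes group words too, and it raises ValueError on an unmatched quote, which means an apostrophe in a word like "don't" breaks the split. A sketch with made-up inputs; `split_query` is a hypothetical wrapper around the one-liner above.

```python
import re
import shlex

def split_query(query):
    """Split with shlex, then trim and collapse whitespace in each term."""
    return [re.sub(r'\s+', ' ', s.strip()) for s in shlex.split(query)]

samples = [
    ' " some words" with and "without quotes " ',
    "'single quoted  phrase' too",   # single quotes group words as well
    '"unbalanced phrase',            # no closing double quote
    "don't panic",                   # the apostrophe opens a quote
]
for q in samples:
    try:
        print(split_query(q))
    except ValueError as exc:
        # shlex.split reports unmatched quotes by raising ValueError
        print('cannot split %r: %s' % (q, exc))
```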

Matt
Jun 27 '08 #8
On Apr 29, 9:20 am, Paul McGuire <pt...@austin.rr.com> wrote:
On Apr 29, 8:46 am, Julien <jpha...@gmail.com> wrote:
I'd like to select terms in a string, so I can then do a search in my
database.
query = ' " some words" with and "without quotes " '
p = re.compile(magic_regular_expression)   # <--- the magic happens
m = p.match(query)
I'd like m.groups() to return:
('some words', 'with', 'and', 'without quotes')
Oh! It wasn't until Matimus's post that I saw that you wanted the
interior whitespace within the quoted strings collapsed also. Just
add another parse action to the chain of functions on dblQuotedString:

# when a quoted string is found, remove the quotes,
# then strip whitespace from the contents, then
# collapse interior whitespace
dblQuotedString.setParseAction(removeQuotes,
                               lambda s: s[0].strip(),
                               lambda s: " ".join(s[0].split()))

Plugging this into the previous script now gives:
('some words', 'with', 'and', 'without quotes')

-- Paul
Jun 27 '08 #9
As other replies mention, there is no single expression since you are
doing two things: find all matches and substitute extra spaces within
the quoted matches. It can be done with two expressions though:

import re

def normquery(text, findterms=re.compile(r'"([^"]+)"|(\S+)').findall,
              normspace=re.compile(r'\s{2,}').sub):
    return [normspace(' ', (t[0] or t[1]).strip())
            for t in findterms(text)]

>>> normquery(' "some words" with and "without quotes " ')
['some words', 'with', 'and', 'without quotes']
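`normquery` uses `\s{2,}`, which only rewrites runs of two or more whitespace characters, so a single tab or newline inside a quoted phrase is kept as-is. A sketch comparing it with a `\s+` variant on a made-up phrase; `normquery_all` is a hypothetical name, and whether a lone tab counts as "extra" whitespace is a judgment call.

```python
import re

def normquery(text, findterms=re.compile(r'"([^"]+)"|(\S+)').findall,
              normspace=re.compile(r'\s{2,}').sub):
    return [normspace(' ', (t[0] or t[1]).strip()) for t in findterms(text)]

# Hypothetical variant: every whitespace run, even a single tab, becomes ' '.
def normquery_all(text, findterms=re.compile(r'"([^"]+)"|(\S+)').findall,
                  normspace=re.compile(r'\s+').sub):
    return [normspace(' ', (t[0] or t[1]).strip()) for t in findterms(text)]

phrase = 'find "tab\tseparated"'
print(normquery(phrase))       # the single tab is left in place
print(normquery_all(phrase))   # the tab is replaced by a space
```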

HTH,
George
Jun 27 '08 #10
With simpleparse:

----------------------------------------------------------

from simpleparse.parser import Parser
from simpleparse.common import strings   # provides the 'string' production
from simpleparse.dispatchprocessor import DispatchProcessor, getString

grammar = '''
text := (quoted / unquoted / ws)+
quoted := string
unquoted := -ws+
ws := [ \t\r\n]+
'''

class MyProcessor(DispatchProcessor):

    def __init__(self, groups):
        self.groups = groups

    def quoted(self, val, buffer):
        # drop the surrounding quotes and collapse internal whitespace
        self.groups.append(' '.join(getString(val, buffer)[1:-1].split()))

    def unquoted(self, val, buffer):
        self.groups.append(getString(val, buffer))

    def ws(self, val, buffer):
        pass

query = ' " some words" with and "without quotes " '
groups = []
parser = Parser(grammar, 'text')
proc = MyProcessor(groups)
parser.parse(query, processor=proc)

print(groups)
----------------------------------------------------------

G.
Jun 27 '08 #11
