
Parsing a search string

Happy new year! Since I have run out of alcohol, I'll ask a question that I
haven't really worked out an answer for yet. Is there an elegant way to turn
something like:
moo cow "farmer john" -zug


into:

['moo', 'cow', 'farmer john'], ['zug']

I'm trying to parse a search string so I can use it for SQL WHERE constraints,
preferably without horrifying regular expressions. Uhh yeah.

From 2005,
Freddie

Jul 18 '05 #1
15 Replies


Hello,
I just posted on something similar earlier ;)
First of all, you might want to try shlex; it is in the standard
library.
If you don't know what cStringIO is, don't worry about it; it is just used to
give a file-like object to pass to shlex.
If you have a file, just pass it in opened, for
example: a = shlex.shlex(open('mytxt.txt', 'r'))

py>import shlex
py>import cStringIO
py>d = cStringIO.StringIO()
py>d.write('moo cow "farmer john" -zug')
py>d.seek(0)
py>a = shlex.shlex(d)
py>a.get_token()
'moo'
py>a.get_token()
'cow'
py>a.get_token()
'"farmer john"'
py>a.get_token()
'-'
py>a.get_token()
'zug'
py>a.get_token()
''
# ok, we try again; this time we add '-' to valid chars so we can get it
# grouped as a single token.
py>d.seek(0)
py>a = shlex.shlex(d)
py>a.wordchars += '-' # add the hyphen
py>a.get_token()
'moo'
py>a.get_token()
'cow'
py>a.get_token()
'"farmer john"'
py>a.get_token()
'-zug'
py>a.get_token()
''

Hth,
M.E.Farmer

Jul 18 '05 #2

That's not bad going, considering you've only run out of alcohol at 6 in
the morning and are *then* asking Python questions.

Anyway - you could write a character-by-character parser function that
would do that in a few minutes...

My 'listquote' module has one - but it splits on commas, not whitespace.
Sounds like you're looking for a one-liner though... regular
expressions *could* do it...
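
A rough sketch of such a character-by-character splitter (not the listquote
code; it only knows about double quotes and a leading '-'):

def split_terms(text):
    # Split on whitespace, keep "double quoted" phrases together,
    # and put terms with a leading '-' into a separate exclusion list.
    include, exclude = [], []
    buf, in_quotes, negate = '', False, False
    for ch in text + ' ':              # trailing space flushes the last term
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == '-' and not buf and not in_quotes:
            negate = True              # leading '-' marks an excluded term
        elif ch.isspace() and not in_quotes:
            if buf:
                if negate:
                    exclude.append(buf)
                else:
                    include.append(buf)
            buf, negate = '', False
        else:
            buf += ch
    return include, exclude

split_terms('moo cow "farmer john" -zug')
# -> (['moo', 'cow', 'farmer john'], ['zug'])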

Regards,

Fuzzy
http://www.voidspace.org.uk/atlantib...tml#llistquote

Jul 18 '05 #3

Freddie wrote:
Happy new year! Since I have run out of alcohol, I'll ask a question that I
haven't really worked out an answer for yet. Is there an elegant way to turn
something like:
> moo cow "farmer john" -zug


into:

['moo', 'cow', 'farmer john'], ['zug']

I'm trying to parse a search string so I can use it for SQL WHERE constraints,
preferably without horrifying regular expressions. Uhh yeah.


The shlex approach, finished:

import shlex

searchstring = 'moo cow "farmer john" -zug'
lexer = shlex.shlex(searchstring)
lexer.wordchars += '-'
poslist, neglist = [], []
while 1:
    token = lexer.get_token()
    # token is '' on eof
    if not token: break
    # remove quotes
    if token[0] in '"\'':
        token = token[1:-1]
    # select which list to put it in
    if token[0] == '-':
        neglist.append(token[1:])
    else:
        poslist.append(token)
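
(For the example string, this should end up with poslist == ['moo', 'cow',
'farmer john'] and neglist == ['zug'].)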

regards,
Reinhold
Jul 18 '05 #4

As I noted before, shlex requires a file-like object or an open file.
py> import shlex
py> a = shlex.shlex('fgfgfg dgfgfdgfdg')
py> a.get_token()
Traceback (most recent call last):
File "<input>", line 1, in ?
File ".\shlex.py", line 74, in get_token
raw = self.read_token()
File ".\shlex.py", line 100, in read_token
nextchar = self.instream.read(1)
AttributeError: 'str' object has no attribute 'read'

M.E.Farmer

Jul 18 '05 #5

M.E.Farmer wrote:
As I noted before, shlex requires a file-like object or an open file.
py> import shlex
py> a = shlex.shlex('fgfgfg dgfgfdgfdg')
py> a.get_token()
Traceback (most recent call last):
File "<input>", line 1, in ?
File ".\shlex.py", line 74, in get_token
raw = self.read_token()
File ".\shlex.py", line 100, in read_token
nextchar = self.instream.read(1)
AttributeError: 'str' object has no attribute 'read'


Which Python version are you using?

The docs say that since Py2.3 strings are accepted.

regards,
Reinhold
Jul 18 '05 #6

I am right in the middle of doing text parsing so I used your example as a
mental exercise. :-)

Here's a NDFA for your text:

b 0 1-9 a-Z , . + - ' " \n
S0: S0 E E S1 E E E S3 E S2 E
S1: T1 E E S1 E E E E E E T1
S2: S2 E E S2 E E E E E T2 E
S3: T3 E E S3 E E E E E E T3

and the end-states are:

E: error in text
T1: You have the words: moo, cow
T2: You get "farmer john" (w quotes)
T3: You get zug

Can't guarantee that I did it right - I did it really quickly - and it's
*specific* to your text string.

Now I just need to hire a programmer to write some clean Python parsing code.
:-)

--
It's me

"Freddie" <li**********@zebra-madcowdisease.giraffe-org> wrote in message
news:kX****************@text.usenetserver.com...
Happy new year! Since I have run out of alcohol, I'll ask a question that I haven't really worked out an answer for yet. Is there an elegant way to turn something like:
> moo cow "farmer john" -zug
into:

['moo', 'cow', 'farmer john'], ['zug']

I'm trying to parse a search string so I can use it for SQL WHERE
constraints, preferably without horrifying regular expressions. Uhh yeah.

From 2005,
Freddie


Jul 18 '05 #7

Ah! That is what the __future__ brings, I guess...
Damn that progress, making me outdated ;)
Python 2.2.3 (a lot of extensions I use are stuck there, so I still
use it)
M.E.Farmer

Jul 18 '05 #8

M.E.Farmer wrote:
Ah! that is what the __future__ brings I guess.........
Damn that progress making me outdated ;)
Python 2.2.3 ( a lot of extensions I use are stuck there , so I still
use it)


I'm also positively surprised by how many cute little additions there are in
every new Python version. Great thanks to the great devs!

Reinhold
Jul 18 '05 #9

"It's me" wrote:
Here's a NDFA for your text:

b 0 1-9 a-Z , . + - ' " \n
S0: S0 E E S1 E E E S3 E S2 E
S1: T1 E E S1 E E E E E E T1
S2: S2 E E S2 E E E E E T2 E
S3: T3 E E S3 E E E E E E T3


Now if I only had an NDFA for parsing that syntax...

:)
Andrew
da***@dalkescientific.com

Jul 18 '05 #10


"Andrew Dalke" <da***@dalkescientific.com> wrote in message
news:pa****************************@dalkescientific.com...
"It's me" wrote:
Here's a NDFA for your text:

b 0 1-9 a-Z , . + - ' " \n
S0: S0 E E S1 E E E S3 E S2 E
S1: T1 E E S1 E E E E E E T1
S2: S2 E E S2 E E E E E T2 E
S3: T3 E E S3 E E E E E E T3
Now if I only had an NDFA for parsing that syntax...


Just finished one (don't ask me to show it - very clumsy Python code -
still in learning mode). :)

Here's one for parsing integers:

# b 0 1-9 , . + - ' " a-Z \n
# S0: S0 S0 S1 T0 E S2 S2 E E E T0
# S1: S3 S1 S1 T1 E E E E E E T1
# S2: E S2 S1 E E E E E E E E
# S3: S3 T2 T2 T1 T2 T2 T2 T2 T2 T2 T1

T0: you got a null token
T1: you got a good token, separator was ","
T2: you got a good token b, separator was " "
E: bad token


Jul 18 '05 #11

Freddie wrote:
I'm trying to parse a search string so I can use it for SQL WHERE
constraints, preferably without horrifying regular expressions. Uhh yeah.


If you're interested, I've written a function that parses query strings
using a customizable version of Google's search syntax.

Features include:
- Binary operators like OR
- Unary operators like '-' for exclusion
- Customizable modifiers like Google's site:, intitle:, inurl: syntax
- *No* query is an error (invalid characters are fixed up, etc.)
- Result is a dictionary in one of two possible forms, both geared
towards being input to a search method for your database

I'd be glad to post the code, although I'd probably want to have a last
look at it before I let others see it...
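
(Not Brian's code - purely to make the feature list above concrete, here is a
stripped-down sketch of the kind of dictionary such a parser might return; the
function name and dictionary keys are made up, and OR handling is left out:)

import shlex

def parse_query(query):
    # Hypothetical illustration only: split a Google-style query into
    # included terms, excluded terms, and 'modifier:value' pairs.
    result = {'include': [], 'exclude': [], 'modifiers': {}}
    for token in shlex.split(query):
        if token.startswith('-'):
            result['exclude'].append(token[1:])
        elif ':' in token:
            name, value = token.split(':', 1)
            result['modifiers'][name] = value
        else:
            result['include'].append(token)
    return result

parse_query('moo "farmer john" -zug site:example.com')
# -> include ['moo', 'farmer john'], exclude ['zug'],
#    modifiers {'site': 'example.com'}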

--
Brian Beck
Adventurer of the First Order
Jul 18 '05 #12

Andrew Dalke wrote:
"It's me" wrote:
Here's a NDFA for your text:

b 0 1-9 a-Z , . + - ' " \n
S0: S0 E E S1 E E E S3 E S2 E
S1: T1 E E S1 E E E E E E T1
S2: S2 E E S2 E E E E E T2 E
S3: T3 E E S3 E E E E E E T3


Now if I only had an NDFA for parsing that syntax...


Parsing your sentence as written ("if I only had"): If you were the
sole keeper of the secret??

Parsing it as intended ("if only I had"), and ignoring the smiley:
Looks like a fairly straight-forward state-transition table to me. The
column headings are not aligned properly in the message, b means blank,
a-Z is bletchworthy, but the da Vinci code it ain't.

If only we had an NDFA (whatever that is) for guessing what acronyms
mean ...

Where I come from:
DFA = deterministic finite-state automaton
NFA = non-det......
SFA = content-free
NFI = concept-free
NDFA = National Dairy Farmers' Association

HTH, and Happy New Year!

Jul 18 '05 #13


"John Machin" <sj******@lexicon.net> wrote in message
news:11**********************@c13g2000cwb.googlegroups.com...
Andrew Dalke wrote:
"It's me" wrote:
Here's a NDFA for your text:

b 0 1-9 a-Z , . + - ' " \n
S0: S0 E E S1 E E E S3 E S2 E
S1: T1 E E S1 E E E E E E T1
S2: S2 E E S2 E E E E E T2 E
S3: T3 E E S3 E E E E E E T3
Now if I only had an NDFA for parsing that syntax...


Parsing your sentence as written ("if I only had"): If you were the
sole keeper of the secret??

Parsing it as intended ("if only I had"), and ignoring the smiley:
Looks like a fairly straight-forward state-transition table to me.


Exactly.

The column headings are not aligned properly in the message, b means blank,
a-Z is bletchworthy, but the da Vinci code it ain't.

If only we had an NDFA (whatever that is) for guessing what acronyms
mean ...


I believe (I am not a computer science major):

NDFA = non-deterministic finite automata

and:

S: state
T: terminal
E: error

So, S1 means State #1, T1 means Terminal #1, and so forth...

You are correct that parsing that table is not hard.

a) Set up a stack and place the buffer onto the stack, starting with S0
b) For each character that comes off the stack, look up the next state
for that token
c) If it's not a T or E state, jump to that state
d) If it's a T or E state, finish
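
To make those steps concrete, here is a minimal sketch of that kind of
table-driven loop. The table is a simplified three-state version (blank /
word character / leading '-'), not the exact NDFA quoted above, and it
doesn't handle quoted phrases:

def char_class(ch):
    # Map each input character onto a column of the table.
    if ch.isspace():
        return 'blank'
    if ch == '-':
        return 'minus'
    return 'word'

# state -> {character class -> next state}
# S0 = between tokens, S1 = inside a word, S2 = inside a '-word'
# T_WORD / T_NEG are terminal states, E is the error state.
TABLE = {
    'S0': {'blank': 'S0', 'word': 'S1', 'minus': 'S2'},
    'S1': {'blank': 'T_WORD', 'word': 'S1', 'minus': 'E'},
    'S2': {'blank': 'T_NEG', 'word': 'S2', 'minus': 'E'},
}

def scan(text):
    include, exclude = [], []
    state, buf = 'S0', ''
    for ch in text + ' ':          # a trailing blank flushes the last token
        nxt = TABLE[state][char_class(ch)]
        if nxt == 'E':
            raise ValueError('unexpected character %r' % ch)
        elif nxt == 'T_WORD':
            include.append(buf)
            state, buf = 'S0', ''
        elif nxt == 'T_NEG':
            exclude.append(buf)
            state, buf = 'S0', ''
        else:
            if nxt in ('S1', 'S2') and ch != '-':
                buf += ch
            state = nxt
    return include, exclude

scan('moo cow -zug')
# -> (['moo', 'cow'], ['zug'])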
Jul 18 '05 #14

Reinhold Birkenfeld wrote:
Freddie wrote:
Happy new year! Since I have run out of alcohol, I'll ask a question that I
haven't really worked out an answer for yet. Is there an elegant way to turn
something like:
> moo cow "farmer john" -zug


into:

['moo', 'cow', 'farmer john'], ['zug']

I'm trying to parse a search string so I can use it for SQL WHERE constraints,
preferably without horrifying regular expressions. Uhh yeah.

The shlex approach, finished:

searchstring = 'moo cow "farmer john" -zug'
lexer = shlex.shlex(searchstring)
lexer.wordchars += '-'
poslist, neglist = [], []
while 1:
    token = lexer.get_token()
    # token is '' on eof
    if not token: break
    # remove quotes
    if token[0] in '"\'':
        token = token[1:-1]
    # select which list to put it in
    if token[0] == '-':
        neglist.append(token[1:])
    else:
        poslist.append(token)

regards,
Reinhold


Thanks for this, though there was one issue:
>>> lexer = shlex.shlex('moo cow +"farmer john" -dog')
>>> lexer.wordchars += '-+'
>>> while 1:
...     tok = lexer.get_token()
...     if not tok: break
...     print tok
...
moo
cow
+"farmer
john"
-dog

The '+"farmer john"' part would be turned into two separate words, '+"farmer'
and 'john"'. I ended up using shlex.split() (which the docs say is new in
Python 2.3), which gives me the desired result. Thanks for the help from
yourself and M.E.Farmer :)

Freddie
>>> shlex.split('moo cow +"farmer john" -"evil dog"')
['moo', 'cow', '+farmer john', '-evil dog']
>>> shlex.split('moo cow +"farmer john" -"evil dog" +elephant')
['moo', 'cow', '+farmer john', '-evil dog', '+elephant']
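
For completeness, a small sketch (not from the original post) of folding the
shlex.split() output back into the two lists from the original question; the
function name is made up:

import shlex

def split_search(query):
    # Split a search string into (include, exclude) term lists.
    include, exclude = [], []
    for token in shlex.split(query):
        if token.startswith('-'):
            exclude.append(token[1:])
        elif token.startswith('+'):
            include.append(token[1:])
        else:
            include.append(token)
    return include, exclude

split_search('moo cow +"farmer john" -"evil dog"')
# -> (['moo', 'cow', 'farmer john'], ['evil dog'])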

Jul 18 '05 #15


py>b = shlex.shlex(a)
py>while 1:
....     tok = b.get_token()
....     if not tok: break
....     print tok
....
moo
cow
+
"farmer john"
-
dog

Just wanted to share this in case it might be relevant.
It seems that if we don't add '+-' to wordchars, we get a different split
around "farmer john": the quoted phrase stays together as one token, while
'+' and '-' come out as separate tokens.
M.E.Farmer

Jul 18 '05 #16
