need simple parsing ability

[python 2.3.3, x86 linux]
For each run of my app, I have a known set of (<100) wafer names.
Names are sometimes simply integers, sometimes a short string, and
sometimes a short string followed by an integer, e.g.:

5, 6, 7, 8, 9, bar, foo_6, foo_7, foo_8, foo_9, foo_10, foo_11

I need to read user input of a subset of these. The user will type a
set of names separated by commas (with optional white space), but there
may also be sequences indicated by a dash between two integers, e.g.:

"9-11" meaning 9,10,11
"foo_11-13" meaning foo_11, foo_12, and foo_13.
"foo_9-11" meaning foo_9,foo_10,foo_11, or
"bar09-11" meaning bar09,bar10,bar11

(Yes, I have to deal with integers with and without leading zeros)
[I'll proclaim inverse sequences like "foo_11-9" invalid]
So a sample input might be:

9,foo7-9,2-4,xxx meaning 9,foo7,foo8,foo9,2,3,4,xxx

The order of the resultant list of names is not important; I have
to sort them later anyway.

Fancy error recovery is not needed; an invalid input string will be
peremptorily wiped from the screen with an annoyed beep.

Can anyone suggest a clean way of doing this? I don't mind
installing and importing some parsing package, as long as my code
using it is clear and simple. Performance is not an issue.
-- George Young
--
"Are the gods not just?" "Oh no, child.
What would become of us if they were?" (CSL)
Jul 18 '05 #1
COMMA = ","
OPT_WS = "[ \t]*"
STEM = "([a-zA-Z_]*)"
NUMBER = "([0-9]+)"
OPT_NUMBER = NUMBER + "?"
OPT_SECOND_NUMBER = "(?:-" + NUMBER + ")?"
END = "$"   # anchor the match so trailing junk such as "foo7-9x" is rejected

import re
splitter = re.compile(COMMA + OPT_WS).split
print `STEM + OPT_NUMBER + OPT_SECOND_NUMBER + END`
parser = re.compile(STEM + OPT_NUMBER + OPT_SECOND_NUMBER + END).match

def expand(stem, n0, n1):
    if not n1:
        # no range: a plain name, with or without a trailing number
        if n0:
            yield "%s%s" % (stem, n0)
        else:
            yield stem
        return
    l = len(n0)   # width of the first number, used to keep leading zeros
    n0 = int(n0, 10)
    n1 = int(n1, 10)

    for i in range(n0, n1+1):
        yield "%s%0*d" % (stem, l, i)

def parse_string(line):
    items = splitter(line)
    parsed_items = [parser(i) for i in items]
    for i, pi in zip(items, parsed_items):
        if pi is None:   # test the match object, not the item text
            raise ValueError, "Invalid item: %r" % i
        stem = pi.group(1)
        n0 = pi.group(2)
        n1 = pi.group(3)
        if n1 and not n0:
            raise ValueError, "Invalid item: %r" % i
        for j in expand(stem, n0, n1):
            yield j

def test():
    s = "9,foo7-9,bar_09-12,2-4,spam"
    print s, list(parse_string(s))

if __name__ == "__main__":
    test()


Jul 18 '05 #2
On Fri, 16 Jul 2004, george young wrote:
> I need to read user input of a subset of these. The user will type a
> set of names separated by commas (with optional white space), but there
> may also be sequences indicated by a dash between two integers, e.g.:
>
> "9-11" meaning 9,10,11
> "foo_11-13" meaning foo_11, foo_12, and foo_13.
> "foo_9-11" meaning foo_9,foo_10,foo_11, or
> "bar09-11" meaning bar09,bar10,bar11
>
> (Yes, I have to deal with integers with and without leading zeros)
> [I'll proclaim inverse sequences like "foo_11-9" invalid]
> So a sample input might be:
>
> 9,foo7-9,2-4,xxx meaning 9,foo7,foo8,foo9,2,3,4,xxx
>
> The order of the resultant list of names is not important; I have
> to sort them later anyway.


The following should do the trick, using nothing more than the built-in
re package:

---

import re

def expand(pattern):
    r = re.search('\d+-\d+$',pattern)
    if r is None:
        yield pattern
        return
    s,e = r.group().split('-')
    for n in xrange(int(s),int(e)+1):
        yield pattern[:r.start()]+str(n)

def expand_list(pattern_list):
    return [ w for pattern in pattern_list.split(',')
               for w in expand(pattern) ]

print expand_list('9,foo7-9,2-4,xxx')

---

If you want to let the syntax be a little more lenient, replace
"pattern_list.split(',')" in expand_list() with
"re.split('\s*,\s*',pattern_list)". This will allow spaces to surround
commas.
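
As a rough illustration of the lenient split described above, here is a small sketch. It is written for current Python 3 rather than the thread's Python 2.3, and split_names is a made-up helper name. Stripping the whole input first is an added assumption, so that leading or trailing blanks do not end up in the first or last name.

```python
import re

def split_names(pattern_list):
    # Split on commas and swallow any whitespace on either side of each one.
    # The outer strip() is an assumption added for this sketch.
    return re.split(r'\s*,\s*', pattern_list.strip())

print(split_names('9 , foo7-9,2-4 ,  xxx'))
```

Each item that comes back can then be handed to expand() unchanged.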

Note that because this uses generators, it won't work on Pythons prior to
2.3.

Hope this helps!

Jul 18 '05 #3


Here is one possible way to do that with just Python:
ns = '9,foo7-9,2-4,xxx,5, 6, 7, 8, 9, bar, foo_6, foo_10, foo_11'

# list of plain, clean names
ns = [n.strip() for n in ns.split(',')]
# expand names with range
fs = []
for n in ns:
    r = n.split('-')
    if len(r) != 2:  # simple name
        fs.append(n)
    else:  # name with range
        h = r[0].rstrip('0123456789')  # header
        for i in range(int(r[0][len(h):]), int(r[1])):
            fs.append(h + str(i))
# remove duplicitates
fs = dict([(n, i) for i, n in enumerate(fs)])
fs = fs.keys()
# sort
fs.sort()

print fs
/Jean Brouwers


Jul 18 '05 #4

With two fixes, one bug and one typo:

ns = '9,foo7-9,2-4,xxx,5, 6, 7, 8, 9, bar, foo_6, foo_10, foo_11'

# list of plain, clean names
ns = [n.strip() for n in ns.split(',')]
# expand names with range
fs = []
for n in ns:
    r = n.split('-')
    if len(r) != 2:  # simple name
        fs.append(n)
    else:  # name with range
        h = r[0].rstrip('0123456789')  # header
        for i in range(int(r[0][len(h):]), 1 + int(r[1])):
            fs.append(h + str(i))
# remove duplicates
fs = dict([(n, i) for i, n in enumerate(fs)])
fs = fs.keys()
# sort, maybe
fs.sort()

print fs
/Jean Brouwers

Jul 18 '05 #5
On Fri, 16 Jul 2004 17:10:03 GMT
Jean Brouwers <JB***********************@no.spam.net> threw this fish to the penguins:
> With two fixes, one bug and one typo:
>
> ns = '9,foo7-9,2-4,xxx,5, 6, 7, 8, 9, bar, foo_6, foo_10, foo_11'
>
> # list of plain, clean names
> ns = [n.strip() for n in ns.split(',')]
> # expand names with range
> fs = []
> for n in ns:
>     r = n.split('-')
>     if len(r) != 2:  # simple name
>         fs.append(n)
>     else:  # name with range
>         h = r[0].rstrip('0123456789')  # header
>         for i in range(int(r[0][len(h):]), 1 + int(r[1])):
>             fs.append(h + str(i))


Mmm, not quite. If ns=='foo08-11', your fs==[foo8, foo9, foo10, foo11]
which is wrong. It should yield fs==[foo08, foo09, foo10, foo11].
I.e., it must maintain leading zeros in ranges.

(I'm contracting out construction of a special circle of hell for users
who define [foo7, foo08, foo9, foo10] -- they won't be around to complain
that it parses wrong ;-)
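
The leading-zero requirement can be sketched as follows: take the width of the first number exactly as typed and pad every generated number to that width. This is a hedged Python 3 illustration, not code from the thread. expand_range is a hypothetical helper, and it assumes the stem and the two numbers have already been split apart.

```python
def expand_range(stem, start, end):
    """Expand e.g. ('foo', '08', '11') into foo08 .. foo11."""
    # The width of the first number, as typed, decides the padding.
    width = len(start)
    first, last = int(start), int(end)
    if first > last:
        # The original question declares inverse ranges such as foo_11-9 invalid.
        raise ValueError("inverse range: %s%s-%s" % (stem, start, end))
    return [stem + str(n).zfill(width) for n in range(first, last + 1)]

print(expand_range('foo', '08', '11'))   # ['foo08', 'foo09', 'foo10', 'foo11']
print(expand_range('foo', '8', '11'))    # ['foo8', 'foo9', 'foo10', 'foo11']
```

Because zfill only ever adds zeros, an unpadded start such as 8 leaves 10 and 11 at their natural width.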

In article <20*************************@ll.mit.edu>, george young
<gr*@ll.mit.edu> wrote:
> "bar09-11" meaning bar09,bar10,bar11
>  ^^^^^^^^          ^^^^^^^^^^^^^^^^^
> (Yes, I have to deal with integers with and without leading zeros)

Jul 18 '05 #6

Here's a pyparsing solution. The best way to read this is to first look
over the grammar definitions, then to the parse actions attached to the
different bits of the grammar. The most complicated part is the parse
action for integer ranges, in which we try to keep leading zeroes if they
were given in the original string.

You said exception handling is not a big deal, but it is built into
pyparsing. So use as much or as little as you like.

-- Paul
# download pyparsing at http://pyparsing.sourceforge.net

from pyparsing import Word,delimitedList,alphas,alphanums,nums,Literal, \
     StringEnd,ParseException

# define basic grammar
integer = Word(nums)
integerRange = integer.setResultsName("start") + "-" + \
               integer.setResultsName("end")
word = Word(alphas+"_")
wordRange = word.setResultsName("base") + ( integerRange | integer )
waferList = delimitedList( integerRange | integer | wordRange | word ) + \
            StringEnd()

# define parse actions (to expand range references)
def expandIntRange(st,loc,toks):
    expandedNums = range( int(toks.start), int(toks.end)+1 )
    # make sure leading zeroes are retained
    if toks.start.startswith('0'):
        return [ "%0*d"%(len(toks.start),n) for n in expandedNums ]
    else:
        return [ str(n) for n in expandedNums ]

def expandWordRange(st,loc,toks):
    baseNumPairs = zip( [toks.base]*(len(toks)-1), toks[1:] )
    return [ "".join(pair) for pair in baseNumPairs ]

# attach parse actions to grammar elements
integerRange.setParseAction( expandIntRange )
wordRange.setParseAction( expandWordRange )

# run tests (last one an error)
testData = """
9,foo7-9,2-4,xxx
9,foo_7- 9, 2-4, xxx
9 , foo07-09,2 - 4, bar6, xxx
9,foo7-9,2-4,xxx,5, 6, 7, 8, 9, bar, foo_6, foo_10-11
9,foo7-9,2-4,xxx,5- 9, bar, foo_06, foo_010-11
9,foo7-9,2-4,xxx,foo_099-101
9,f07-09-12,xxx
"""

for t in testData.split("\n")[1:-1]:
    try:
        print t
        print waferList.parseString(t)
    except ParseException, pe:
        print t
        print (" "*pe.loc) + "^"
        print pe.msg
    print

=====================
output:
9,foo7-9,2-4,xxx
['9', 'foo7', 'foo8', 'foo9', '2', '3', '4', 'xxx']

9,foo_7- 9, 2-4, xxx
['9', 'foo_7', 'foo_8', 'foo_9', '2', '3', '4', 'xxx']

9 , foo07-09,2 - 4, bar6, xxx
['9', 'foo07', 'foo08', 'foo09', '2', '3', '4', 'bar6', 'xxx']

9,foo7-9,2-4,xxx,5, 6, 7, 8, 9, bar, foo_6, foo_10-11
['9', 'foo7', 'foo8', 'foo9', '2', '3', '4', 'xxx', '5', '6', '7', '8', '9',
'bar', 'foo_6', 'foo_10', 'foo_11']

9,foo7-9,2-4,xxx,5- 9, bar, foo_06, foo_010-11
['9', 'foo7', 'foo8', 'foo9', '2', '3', '4', 'xxx', '5', '6', '7', '8', '9',
'bar', 'foo_06', 'foo_010', 'foo_011']

9,foo7-9,2-4,xxx,foo_099-101
['9', 'foo7', 'foo8', 'foo9', '2', '3', '4', 'xxx', 'foo_099', 'foo_100',
'foo_101']

9,f07-09-12,xxx
9,f07-09-12,xxx
        ^
Expected end of text


Jul 18 '05 #7
On Fri, 16 Jul 2004, george young wrote:
> Mmm, not quite. If ns=='foo08-11', your fs==[foo8, foo9, foo10, foo11]
> which is wrong. It should yield fs==[foo08, foo09, foo10, foo11].
> I.e., it must maintain leading zeros in ranges.


An updated version of what I previously posted should do the trick:

---

import re

def expand(pattern):
    r = re.search('\d+-\d+$',pattern)
    if r is None:
        yield pattern
        return
    s,e = r.group().split('-')
    l = len(s)
    for n in xrange(int(s),int(e)+1):
        yield pattern[:r.start()]+'%0*d' % (l,n)

def expand_list(pattern_list):
    return [ w for pattern in re.split('\s*,\s*',pattern_list)
               for w in expand(pattern) ]

pattern_list = '9,foo07-11,2-4,xxx'

print expand_list(pattern_list)

# --> ['9', 'foo07', 'foo08', 'foo09', 'foo10', 'foo11', '2', '3', '4', 'xxx']

---

Why do I feel like there's a contest going? ;)

Jul 18 '05 #8

Another fix, to handle leading zeros.
ns = '9,2-4,xxx,5, bar, foo_6-11,x07-9'

# list of plain, clean names
ns = [n.strip() for n in ns.split(',')]
# list of names and expanded names
fs = []
for n in ns:
    r = n.split('-')
    if len(r) == 2:  # expand name with range
        h = r[0].rstrip('0123456789')  # header
        r[0] = r[0][len(h):]
        if r[0][0] != '0':
            h += '%d'
        else:  # leading zeros
            w = [len(i) for i in r]
            if w[1] > w[0]:
                raise ValueError, 'bad range: ' + n
            h += '%%0%dd' % max(w)
        for i in range(int(r[0],10), 1+int(r[1],10)):
            fs.append(h % i)
    else:  # simple name
        fs.append(n)
# remove duplicates
fs = dict([(n, i) for i, n in enumerate(fs)]).keys()
# sort, maybe
fs.sort()

print fs
['2', '3', '4', '5', '9', 'bar', 'foo_10', 'foo_11', 'foo_6',
'foo_7', 'foo_8', 'foo_9', 'x07', 'x08', 'x09', 'xxx']
There is still a question about a range specification like

foo09-123

which is treated as an error in the code above.
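
One way to see why foo09-123 is awkward: the zero-padded start fixes a width of two digits, and 123 cannot be written in two. A small Python 3 sketch of that check follows. padding_fits is a hypothetical name, and treating a lone '0' as unpadded is an assumption made for this sketch.

```python
def padding_fits(start, end):
    """Return True if the number `end` fits the width implied by `start`."""
    zero_padded = len(start) > 1 and start.startswith('0')
    if not zero_padded:
        # No leading zero: numbers are written at their natural width.
        return True
    # Compare the natural width of the end value with the padded width.
    return len(str(int(end))) <= len(start)

print(padding_fits('09', '123'))  # False
print(padding_fits('07', '9'))    # True
print(padding_fits('9', '123'))   # True
```

Whether a False result should be an error, or should fall back to unpadded numbers, is a policy choice the thread leaves open.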

/Jean Brouwers


Jul 18 '05 #9

A further (final?) update that also checks for some range errors.

/Jean Brouwers
ns = '9,2-4,xxx, bar, foo_6-11,x07-9, 0-1, 00-1'

# list of names and expanded names
fs = []
for n in ns.split(','):
    n = n.strip()
    r = n.split('-')
    if len(r) == 2:  # expand name with range
        h = r[0].rstrip('0123456789')  # header
        r[0] = r[0][len(h):]
        # range can't be empty
        if not (r[0] and r[1]):
            raise ValueError, 'empty range: ' + n
        # handle leading zeros
        if r[0] == '0' or r[0][0] != '0':
            h += '%d'
        else:
            w = [len(i) for i in r]
            if w[1] > w[0]:
                raise ValueError, 'wide range: ' + n
            h += '%%0%dd' % max(w)
        # check range
        r = [int(i, 10) for i in r]
        if r[0] > r[1]:
            raise ValueError, 'bad range: ' + n
        for i in range(r[0], r[1]+1):
            fs.append(h % i)
    else:  # simple name
        fs.append(n)

# remove duplicates
fs = dict([(n, i) for i, n in enumerate(fs)]).keys()
# sort, maybe
fs.sort()

print fs
['0', '00', '01', '1', '2', '3', '4', '9', 'bar', 'foo_10',
'foo_11', 'foo_6', 'foo_7', 'foo_8', 'foo_9', 'x07', 'x08', 'x09',
'xxx']



Jul 18 '05 #10
george young wrote:

....
> Mmm, not quite. If ns=='foo08-11', your fs==[foo8, foo9, foo10, foo11]
> which is wrong. It should yield fs==[foo08, foo09, foo10, foo11].
> I.e., it must maintain leading zeros in ranges.
>
> Can anyone suggest a clean way of doing this? I don't mind
> installing and importing some parsing package, as long as my code
> using it is clear and simple. Performance is not an issue.
> -- George Young


Here's a $50 solution. Using the code below, you can do:

mywafers = allwafers('q9-43, s12, a11-23')

Then you can use tests like:

if 'wafer23' in mywafers:
...

---

import sets

def elements(wafers):
    '''get wafer names from wafer-names and wafer-name-ranges'''
    for element in wafers:
        if '-' not in element:
            yield element  # simple name
            continue
        # name with range
        start, final = element.split('-')
        preamble = start.rstrip('0123456789')  # header
        initial = start[len(preamble):]
        if len(final) == len(initial) or initial.startswith('0'):
            # use fixed length strings.
            assert len(initial) >= len(final)  # eg: foo01-009
            pattern = preamble + '%0' + str(len(initial)) + 'd'
        else:
            # unequal length: should be foo08-09, not foo008-9
            assert not (final.startswith('0') or initial.startswith('0'))
            pattern = preamble + '%d'
        for number in range(int(initial), int(final)+1):
            yield pattern % number

def allwafers(spec):
    return sets.Set(elements([element.strip()
                              for element in spec.split(',')]))

if __name__ == '__main__':
    ns = ('9,foo7-9,2-4,xxx,5, 6, 7, 8, 9, bar, foo_6, foo_10,'
          ' foo_11, q08-12')
    nr = ('9 foo7 foo8 foo9 2 3 4 xxx 5 6 7 8 9 bar foo_6 foo_10 '
          ' foo_11 q08 q09 q10 q11 q12').split()
    assert sets.Set(nr) == allwafers(ns)
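
For readers on current Python, where the sets module no longer exists (it was removed in Python 3 in favour of the built-in set type), the same idea can be sketched as below. This is an illustrative rewrite under stated assumptions, not the code posted above: all_wafers and the RANGE pattern are hypothetical names, only a start number with a leading zero triggers padding, and no range validation is attempted.

```python
import re

# Hypothetical pattern: optional stem, then "first-last" at the end of the item.
RANGE = re.compile(r'^(.*?)(\d+)-(\d+)$')

def all_wafers(spec):
    """Return the set of wafer names described by a comma-separated spec."""
    names = set()
    for item in spec.split(','):
        item = item.strip()
        m = RANGE.match(item)
        if m is None:
            names.add(item)          # simple name, no range
            continue
        stem, first, last = m.groups()
        # Pad to the typed width only when the start has a leading zero.
        width = len(first) if first.startswith('0') else 0
        for n in range(int(first), int(last) + 1):
            names.add(stem + str(n).zfill(width))
    return names

wafers = all_wafers('q9-11, s12, a08-10')
print('q10' in wafers)   # True
print('a9' in wafers)    # False
```

A set makes the membership test cheap and drops duplicates for free, which fits the stated requirement that order does not matter.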
--
-Scott David Daniels
Sc***********@Acm.Org
Jul 18 '05 #11
