
need simple parsing ability

[python 2.3.3, x86 linux]
For each run of my app, I have a known set of (<100) wafer names.
Names are sometimes simply integers, sometimes a short string, and
sometimes a short string followed by an integer, e.g.:

5, 6, 7, 8, 9, bar, foo_6, foo_7, foo_8, foo_9, foo_10, foo_11

I need to read user input of a subset of these. The user will type a
set of names separated by commas (with optional white space), but there
may also be sequences indicated by a dash between two integers, e.g.:

"9-11" meaning 9,10,11
"foo_11-13" meaning foo_11, foo_12, and foo_13.
"foo_9-11" meaning foo_9,foo_10,foo_11, or
"bar09-11" meaning bar09,bar10,bar11

(Yes, I have to deal with integers with and without leading zeros)
[I'll proclaim inverse sequences like "foo_11-9" invalid]
So a sample input might be:

9,foo7-9,2-4,xxx meaning 9,foo7,foo8,foo9,2,3,4,xxx

The order of the resultant list of names is not important; I have
to sort them later anyway.

Fancy error recovery is not needed; an invalid input string will be
peremptorily wiped from the screen with an annoyed beep.

Can anyone suggest a clean way of doing this? I don't mind
installing and importing some parsing package, as long as my code
using it is clear and simple. Performance is not an issue.
-- George Young
--
"Are the gods not just?" "Oh no, child.
What would become of us if they were?" (CSL)
Jul 18 '05 #1
10 Replies


COMMA = ","
OPT_WS = "[ \t]*"
STEM = "([a-zA-Z_]*)"
NUMBER = "([0-9]+)"
OPT_NUMBER = NUMBER + "?"
OPT_SECOND_NUMBER = "(?:-" + NUMBER + ")?"

import re
splitter = re.compile(COMMA + OPT_WS).split
print `STEM + OPT_NUMBER + OPT_SECOND_NUMBER`
# the trailing "$" rejects items with trailing junk, such as "f07-09-12"
parser = re.compile(STEM + OPT_NUMBER + OPT_SECOND_NUMBER + "$").match

def expand(stem, n0, n1):
    # no range: yield the single name as typed
    if not n1:
        if n0:
            yield "%s%s" % (stem, n0)
        else:
            yield stem
        return
    # range: pad every number to the width of the first one
    l = len(n0)
    n0 = int(n0, 10)
    n1 = int(n1, 10)

    for i in range(n0, n1+1):
        yield "%s%0*d" % (stem, l, i)

def parse_string(line):
    items = splitter(line)
    parsed_items = [parser(i) for i in items]
    for i, pi in zip(items, parsed_items):
        if pi is None:  # test the match object, not the input string
            raise ValueError, "Invalid item: %r" % i
        stem = pi.group(1)
        n0 = pi.group(2)
        n1 = pi.group(3)
        if n1 and not n0:
            raise ValueError, "Invalid item: %r" % i
        for j in expand(stem, n0, n1):
            yield j

def test():
    s = "9,foo7-9,bar_09-12,2-4,spam"
    print s, list(parse_string(s))


Jul 18 '05 #2


The following should do the trick, using nothing more than the built-in
re package:

---

import re

def expand(pattern):
    r = re.search('\d+-\d+$',pattern)
    if r is None:
        yield pattern
        return
    s,e = r.group().split('-')
    for n in xrange(int(s),int(e)+1):
        yield pattern[:r.start()]+str(n)

def expand_list(pattern_list):
    return [ w for pattern in pattern_list.split(',')
               for w in expand(pattern) ]

print expand_list('9,foo7-9,2-4,xxx')

---

If you want to let the syntax be a little more lenient, replace
"pattern_list.split(',')" in expand_list() with
"re.split('\s*,\s*',pattern_list)". This will allow spaces to surround
commas.
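
As a small illustration of the more lenient splitting described above, here is a sketch in present-day Python 3 syntax (not the 2.3 used in this thread). The helper name split_names is made up for the example, and stripping the ends of the whole string is an extra assumption, not part of the suggestion above:

```python
import re

def split_names(pattern_list):
    """Split on commas, ignoring whitespace around them and at both ends."""
    return re.split(r'\s*,\s*', pattern_list.strip())

print(split_names('9, foo7-9 ,2-4,  xxx'))  # ['9', 'foo7-9', '2-4', 'xxx']
```

Each piece can then be handed to a range expander unchanged, since the whitespace is consumed by the separator pattern.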

Note that because this uses generators, it won't work as written on Pythons
prior to 2.3 (Python 2.2 would also need "from __future__ import generators").

Hope this helps!

Jul 18 '05 #3



Here is one possible way to do that with just Python:
ns = '9,foo7-9,2-4,xxx,5, 6, 7, 8, 9, bar, foo_6, foo_10, foo_11'

# list of plain, clean names
ns = [n.strip() for n in ns.split(',')]
# expand names with range
fs = []
for n in ns:
    r = n.split('-')
    if len(r) != 2:  # simple name
        fs.append(n)
    else:  # name with range
        h = r[0].rstrip('0123456789')  # header
        for i in range(int(r[0][len(h):]), int(r[1])):
            fs.append(h + str(i))
# remove duplicitates
fs = dict([(n, i) for i, n in enumerate(fs)])
fs = fs.keys()
# sort
fs.sort()

print fs
/Jean Brouwers


Jul 18 '05 #4


With two fixes, one bug and one typo:

ns = '9,foo7-9,2-4,xxx,5, 6, 7, 8, 9, bar, foo_6, foo_10, foo_11'

# list of plain, clean names
ns = [n.strip() for n in ns.split(',')]
# expand names with range
fs = []
for n in ns:
    r = n.split('-')
    if len(r) != 2:  # simple name
        fs.append(n)
    else:  # name with range
        h = r[0].rstrip('0123456789')  # header
        for i in range(int(r[0][len(h):]), 1 + int(r[1])):
            fs.append(h + str(i))
# remove duplicates
fs = dict([(n, i) for i, n in enumerate(fs)])
fs = fs.keys()
# sort, maybe
fs.sort()

print fs
/Jean Brouwers

Jul 18 '05 #5

On Fri, 16 Jul 2004 17:10:03 GMT
Jean Brouwers <JB***********************@no.spam.net> threw this fish to the penguins:
With two fixes, one bug and one typo:

ns = '9,foo7-9,2-4,xxx,5, 6, 7, 8, 9, bar, foo_6, foo_10, foo_11'

# list of plain, clean names
ns = [n.strip() for n in ns.split(',')]
# expand names with range
fs = []
for n in ns:
    r = n.split('-')
    if len(r) != 2:  # simple name
        fs.append(n)
    else:  # name with range
        h = r[0].rstrip('0123456789')  # header
        for i in range(int(r[0][len(h):]), 1 + int(r[1])):
            fs.append(h + str(i))


Mmm, not quite. If ns=='foo08-11', your fs==[foo8, foo9, foo10, foo11]
which is wrong. It should yield fs==[foo08, foo09, foo10, foo11].
I.e., it must maintain leading zeros in ranges.

(I'm contracting out construction of a special circle of hell for users
who define [foo7, foo08, foo9, foo10] -- they won't be around to complain
that it parses wrong ;-)
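
To make the leading-zero requirement concrete, here is a minimal sketch in present-day Python 3 syntax (not the 2.3 used in this thread). The helper padded_range is hypothetical and is not code from any poster; it assumes the width of the first number in the range decides the padding:

```python
def padded_range(prefix, start, end):
    """Expand prefix+start .. prefix+end, padding to the width of `start`."""
    width = len(start)
    # '%0*d' takes the field width from the argument tuple
    return ['%s%0*d' % (prefix, width, n)
            for n in range(int(start), int(end) + 1)]

print(padded_range('foo', '08', '11'))  # ['foo08', 'foo09', 'foo10', 'foo11']
print(padded_range('foo', '8', '11'))   # ['foo8', 'foo9', 'foo10', 'foo11']
```

Because the width is only a minimum, a start without leading zeros falls out of the same formula with no special case.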

Jul 18 '05 #6


Here's a pyparsing solution. The best way to read this is to first look
over the grammar definitions, then to the parse actions attached to the
different bits of the grammar. The most complicated part is the parse
action for integer ranges, in which we try to keep leading zeroes if they
were given in the original string.

You said exception handling is not a big deal, but it is built into
pyparsing. So use as much or as little as you like.

-- Paul
# download pyparsing at http://pyparsing.sourceforge.net

from pyparsing import Word,delimitedList,alphas,alphanums,nums,Literal,\
    StringEnd,ParseException

# define basic grammar
integer = Word(nums)
integerRange = integer.setResultsName("start") + "-" + \
               integer.setResultsName("end")
word = Word(alphas+"_")
wordRange = word.setResultsName("base") + ( integerRange | integer )
waferList = delimitedList( integerRange | integer | wordRange | word ) + \
            StringEnd()

# define parse actions (to expand range references)
def expandIntRange(st,loc,toks):
    expandedNums = range( int(toks.start), int(toks.end)+1 )
    # make sure leading zeroes are retained
    if toks.start.startswith('0'):
        return [ "%0*d"%(len(toks.start),n) for n in expandedNums ]
    else:
        return [ str(n) for n in expandedNums ]

def expandWordRange(st,loc,toks):
    baseNumPairs = zip( [toks.base]*(len(toks)-1), toks[1:] )
    return [ "".join(pair) for pair in baseNumPairs ]

# attach parse actions to grammar elements
integerRange.setParseAction( expandIntRange )
wordRange.setParseAction( expandWordRange )

# run tests (last one an error)
testData = """
9,foo7-9,2-4,xxx
9,foo_7- 9, 2-4, xxx
9 , foo07-09,2 - 4, bar6, xxx
9,foo7-9,2-4,xxx,5, 6, 7, 8, 9, bar, foo_6, foo_10-11
9,foo7-9,2-4,xxx,5- 9, bar, foo_06, foo_010-11
9,foo7-9,2-4,xxx,foo_099-101
9,f07-09-12,xxx
"""

for t in testData.split("\n")[1:-1]:
    try:
        print t
        print waferList.parseString(t)
    except ParseException, pe:
        print t
        print (" "*pe.loc) + "^"
        print pe.msg
    print

=====================
output:
9,foo7-9,2-4,xxx
['9', 'foo7', 'foo8', 'foo9', '2', '3', '4', 'xxx']

9,foo_7- 9, 2-4, xxx
['9', 'foo_7', 'foo_8', 'foo_9', '2', '3', '4', 'xxx']

9 , foo07-09,2 - 4, bar6, xxx
['9', 'foo07', 'foo08', 'foo09', '2', '3', '4', 'bar6', 'xxx']

9,foo7-9,2-4,xxx,5, 6, 7, 8, 9, bar, foo_6, foo_10-11
['9', 'foo7', 'foo8', 'foo9', '2', '3', '4', 'xxx', '5', '6', '7', '8', '9',
'bar', 'foo_6', 'foo_10', 'foo_11']

9,foo7-9,2-4,xxx,5- 9, bar, foo_06, foo_010-11
['9', 'foo7', 'foo8', 'foo9', '2', '3', '4', 'xxx', '5', '6', '7', '8', '9',
'bar', 'foo_06', 'foo_010', 'foo_011']

9,foo7-9,2-4,xxx,foo_099-101
['9', 'foo7', 'foo8', 'foo9', '2', '3', '4', 'xxx', 'foo_099', 'foo_100',
'foo_101']

9,f07-09-12,xxx
9,f07-09-12,xxx
        ^
Expected end of text


Jul 18 '05 #7

On Fri, 16 Jul 2004, george young wrote:
Mmm, not quite. If ns=='foo08-11', your fs==[foo8, foo9, foo10, foo11]
which is wrong. It should yield fs==[foo08, foo09, foo10, foo11].
I.e., it must maintain leading zeros in ranges.


An updated version of what I previously posted should do the trick:

---

import re

def expand(pattern):
    r = re.search('\d+-\d+$',pattern)
    if r is None:
        yield pattern
        return
    s,e = r.group().split('-')
    l = len(s)
    for n in xrange(int(s),int(e)+1):
        yield pattern[:r.start()]+'%0*d' % (l,n)

def expand_list(pattern_list):
    return [ w for pattern in re.split('\s*,\s*',pattern_list)
               for w in expand(pattern) ]

pattern_list = '9,foo07-11,2-4,xxx'

print expand_list(pattern_list)

# --> ['9', 'foo07', 'foo08', 'foo09', 'foo10', 'foo11', '2', '3', '4', 'xxx']

---

Why do I feel like there's a contest going? ;)

Jul 18 '05 #8


Another fix, to handle leading zeros.
ns = '9,2-4,xxx,5, bar, foo_6-11,x07-9'

# list of plain, clean names
ns = [n.strip() for n in ns.split(',')]
# list of names and expanded names
fs = []
for n in ns:
    r = n.split('-')
    if len(r) == 2:  # expand name with range
        h = r[0].rstrip('0123456789')  # header
        r[0] = r[0][len(h):]
        if r[0][0] != '0':
            h += '%d'
        else:  # leading zeros
            w = [len(i) for i in r]
            if w[1] > w[0]:
                raise ValueError, 'bad range: ' + n
            h += '%%0%dd' % max(w)
        for i in range(int(r[0],10), 1+int(r[1],10)):
            fs.append(h % i)
    else:  # simple name
        fs.append(n)
# remove duplicates
fs = dict([(n, i) for i, n in enumerate(fs)]).keys()
# sort, maybe
fs.sort()

print fs
['2', '3', '4', '5', '9', 'bar', 'foo_10', 'foo_11', 'foo_6',
'foo_7', 'foo_8', 'foo_9', 'x07', 'x08', 'x09', 'xxx']

There is still a question about a range specification like

    foo09-123

which is treated as an error in the code above.
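
One possible answer to that open question, offered purely as an assumption about what a user would want and not as anything decided in this thread: treat the width of a zero-padded start as a minimum, so that numbers which outgrow it are written in full. A sketch in present-day Python 3 syntax, with a made-up helper name:

```python
def expand_min_width(prefix, start, end):
    """Pad to the width of `start` only when `start` has a leading zero."""
    width = len(start) if start.startswith('0') else 0
    # str.zfill never truncates, so 123 stays '123' with width 2
    return [prefix + str(n).zfill(width)
            for n in range(int(start), int(end) + 1)]

names = expand_min_width('foo', '09', '123')
print(names[0], names[1], names[-1])  # foo09 foo10 foo123
```

Whether foo09-123 should be accepted at all depends on how the wafer names were defined in the first place, so rejecting it remains a defensible choice.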

/Jean Brouwers


Jul 18 '05 #9


A further (final?) update that also checks for some range errors.

/Jean Brouwers
ns = '9,2-4,xxx, bar, foo_6-11,x07-9, 0-1, 00-1'

# list of names and expanded names
fs = []
for n in ns.split(','):
    n = n.strip()
    r = n.split('-')
    if len(r) == 2:  # expand name with range
        h = r[0].rstrip('0123456789')  # header
        r[0] = r[0][len(h):]
        # range can't be empty
        if not (r[0] and r[1]):
            raise ValueError, 'empty range: ' + n
        # handle leading zeros
        if r[0] == '0' or r[0][0] != '0':
            h += '%d'
        else:
            w = [len(i) for i in r]
            if w[1] > w[0]:
                raise ValueError, 'wide range: ' + n
            h += '%%0%dd' % max(w)
        # check range
        r = [int(i, 10) for i in r]
        if r[0] > r[1]:
            raise ValueError, 'bad range: ' + n
        for i in range(r[0], r[1]+1):
            fs.append(h % i)
    else:  # simple name
        fs.append(n)

# remove duplicates
fs = dict([(n, i) for i, n in enumerate(fs)]).keys()
# sort, maybe
fs.sort()

print fs
['0', '00', '01', '1', '2', '3', '4', '9', 'bar', 'foo_10',
'foo_11', 'foo_6', 'foo_7', 'foo_8', 'foo_9', 'x07', 'x08', 'x09',
'xxx']



Jul 18 '05 #10

george young wrote:

....
Mmm, not quite. If ns=='foo08-11', your fs==[foo8, foo9, foo10, foo11]
which is wrong. It should yield fs==[foo08, foo09, foo10, foo11].
I.e., it must maintain leading zeros in ranges.

Can anyone suggest a clean way of doing this? I don't mind
installing and importing some parsing package, as long as my code
using it is clear and simple. Performance is not an issue.
-- George Young


Here's a $50 solution. Using the code below, you can do:

mywafers = allwafers('q9-43, s12, a11-23')

Then you can use tests like:

if 'wafer23' in mywafers:
...

---

import sets

def elements(wafers):
    '''get wafer names from wafer-names and wafer-name-ranges'''
    for element in wafers:
        if '-' not in element:
            yield element  # simple name
            continue
        # name with range
        start, final = element.split('-')
        preamble = start.rstrip('0123456789')  # header
        initial = start[len(preamble):]
        if len(final) == len(initial) or initial.startswith('0'):
            # use fixed length strings.
            assert len(initial) >= len(final)  # eg: foo01-009
            pattern = preamble + '%0' + str(len(initial)) + 'd'
        else:
            # unequal length: should be foo08-09, not foo008-9
            assert not (final.startswith('0') or initial.startswith('0'))
            pattern = preamble + '%d'
        for number in range(int(initial), int(final)+1):
            yield pattern % number

def allwafers(spec):
    return sets.Set(elements([element.strip()
                              for element in spec.split(',')]))

if __name__ == '__main__':
    ns = ('9,foo7-9,2-4,xxx,5, 6, 7, 8, 9, bar, foo_6, foo_10,'
          ' foo_11, q08-12')
    nr = ('9 foo7 foo8 foo9 2 3 4 xxx 5 6 7 8 9 bar foo_6 foo_10 '
          ' foo_11 q08 q09 q10 q11 q12').split()
    assert sets.Set(nr) == allwafers(ns)
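
For readers on a current interpreter: the sets module used above was removed in Python 3, where the built-in set type does the same job. The following is only a rough sketch of the same idea in Python 3, not a port of the code above. The name expand_spec is made up, and raising ValueError for inverse or malformed ranges is an assumption based on the original question:

```python
def expand_spec(spec):
    """Return the set of wafer names described by a comma-separated spec."""
    names = set()
    for element in spec.split(','):
        element = element.strip()
        if '-' not in element:
            names.add(element)  # simple name
            continue
        # name with range; more than one dash fails to unpack (ValueError)
        start, final = element.split('-')
        preamble = start.rstrip('0123456789')
        initial = start[len(preamble):]
        if not (initial.isdigit() and final.isdigit()):
            raise ValueError('bad range: %r' % element)
        if int(initial) > int(final):
            raise ValueError('inverse range: %r' % element)
        # keep leading zeros by padding to the width of the first number
        width = len(initial) if initial.startswith('0') else 0
        for number in range(int(initial), int(final) + 1):
            names.add(preamble + str(number).zfill(width))
    return names

wafers = expand_spec('q08-12, s12, 9')
print('q10' in wafers)  # True
```

Returning a set keeps the membership test cheap and drops duplicates for free; the caller can still apply sorted() when an ordered list is needed.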
--
-Scott David Daniels
Sc***********@Acm.Org
Jul 18 '05 #11
