need simple parsing ability

george young

[python 2.3.3, x86 linux]
For each run of my app, I have a known set of (<100) wafer names.
Names are sometimes simply integers, sometimes a short string, and
sometimes a short string followed by an integer, e.g.:

5, 6, 7, 8, 9, bar, foo_6, foo_7, foo_8, foo_9, foo_10, foo_11

I need to read user input of a subset of these. The user will type a
set of names separated by commas (with optional white space), but there
may also be sequences indicated by a dash between two integers, e.g.:

"9-11" meaning 9,10,11
"foo_11-13" meaning foo_11, foo_12, and foo_13.
"foo_9-11" meaning foo_9,foo_10,foo_11, or
"bar09-11" meaning bar09,bar10,bar11

(Yes, I have to deal with integers with and without leading zeros)
[I'll proclaim inverse sequences like "foo_11-9" invalid]
So a sample input might be:

9,foo7-9,2-4,xxx meaning 9,foo7,foo8,foo9,2,3,4,xxx

The order of the resultant list of names is not important; I have
to sort them later anyway.

Fancy error recovery is not needed; an invalid input string will be
peremptorily wiped from the screen with an annoyed beep.

Can anyone suggest a clean way of doing this? I don't mind
installing and importing some parsing package, as long as my code
using it is clear and simple. Performance is not an issue.
-- George Young
--
"Are the gods not just?" "Oh no, child.
What would become of us if they were?" (CSL)

Jul 18 '05 #1

Jeff Epler

COMMA = ","
OPT_WS = "[ \t]*"
STEM = "([a-zA-Z_]*)"
NUMBER = "([0-9]+)"
OPT_NUMBER = NUMBER + "?"
OPT_SECOND_NUMBER = "(?:-" + NUMBER + ")?"

import re
# allow blanks on either side of each comma
splitter = re.compile(OPT_WS + COMMA + OPT_WS).split
# show the assembled item pattern
print `STEM + OPT_NUMBER + OPT_SECOND_NUMBER`
# "$" anchors the match, so trailing junk such as "foo7-9x" is rejected
parser = re.compile(STEM + OPT_NUMBER + OPT_SECOND_NUMBER + "$").match

def expand(stem, n0, n1):
    if not n1:
        if n0:
            yield "%s%s" % (stem, n0)
        else:
            yield stem
        return
    l = len(n0)          # width of the first number, keeps leading zeros
    n0 = int(n0, 10)
    n1 = int(n1, 10)
    if n0 > n1:
        raise ValueError, "Inverse range: %s%d-%d" % (stem, n0, n1)
    for i in range(n0, n1+1):
        yield "%s%0*d" % (stem, l, i)

def parse_string(line):
    items = splitter(line.strip())
    parsed_items = [parser(i) for i in items]
    for i, pi in zip(items, parsed_items):
        if pi is None or not i:
            raise ValueError, "Invalid item: %r" % i
        stem = pi.group(1)
        n0 = pi.group(2)
        n1 = pi.group(3)
        if n1 and not n0:
            raise ValueError, "Invalid item: %r" % i
        for j in expand(stem, n0, n1):
            yield j

def test():
    s = "9,foo7-9,bar_09-12,2-4,spam"
    print s, list(parse_string(s))

test()
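
For readers on Python 3, here is a minimal sketch of the same regular-expression approach. It is not Jeff's code: it assumes Python 3.4+ for `re.fullmatch` (which anchors at both ends, so trailing junk is rejected), uses named groups, and adds the check from the original question that inverse ranges such as "foo_11-9" are invalid. The function name `expand_names` is hypothetical.

```python
import re

# One item: optional letters/underscores, optional number, optional "-number".
ITEM = re.compile(r"(?P<stem>[A-Za-z_]*)(?P<start>[0-9]+)?(?:-(?P<end>[0-9]+))?")
# Commas with optional surrounding blanks or tabs.
SEP = re.compile(r"[ \t]*,[ \t]*")


def expand_names(line):
    """Expand a comma-separated name list; raise ValueError on bad input."""
    names = []
    for item in SEP.split(line.strip()):
        m = ITEM.fullmatch(item)
        if not item or m is None:
            raise ValueError("invalid item: %r" % item)
        stem, start, end = m.group("stem", "start", "end")
        if end is None:
            names.append(stem + (start or ""))
            continue
        if start is None:
            raise ValueError("range without a start number: %r" % item)
        lo, hi = int(start), int(end)
        if lo > hi:
            raise ValueError("inverse range: %r" % item)
        width = len(start)  # keeps leading zeros such as the "0" in "09"
        names.extend("%s%0*d" % (stem, width, n) for n in range(lo, hi + 1))
    return names


print(expand_names("9,foo7-9, bar09-11 ,2-4,xxx"))
```

Any `ValueError` raised here is the natural place to hook in the "annoyed beep".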

Jul 18 '05 #2

Christopher T King

The following should do the trick, using nothing more than the built-in
re package:

---

import re

def expand(pattern):
    # look for a trailing "digits-digits" range, e.g. the "7-9" in "foo7-9"
    r = re.search(r'\d+-\d+$', pattern)
    if r is None:
        yield pattern
        return
    s, e = r.group().split('-')
    for n in xrange(int(s), int(e)+1):
        yield pattern[:r.start()] + str(n)

def expand_list(pattern_list):
    return [ w for pattern in pattern_list.split(',')
               for w in expand(pattern) ]

print expand_list('9,foo7-9,2-4,xxx')

---

If you want to let the syntax be a little more lenient, replace
"pattern_list.split(',')" in expand_list() with
"re.split(r'\s*,\s*', pattern_list)". This will allow spaces to surround
commas.

Note that because this uses generators, it won't work on Pythons prior to
2.3.

Hope this helps!
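
To see why the `re.split` variant is more forgiving, here is a small Python 3 comparison (a sketch, not part of Christopher's post). `str.split(',')` keeps the blanks around each name, while splitting on `\s*,\s*` drops them everywhere except at the two ends of the string, which is why `strip()` is applied first.

```python
import re

line = " 9 , foo7-9,2-4 ,xxx "

# Plain split keeps the surrounding blanks in each piece.
plain = line.split(",")
# Regex split swallows blanks on either side of each comma;
# strip() handles the blanks at the two ends of the line.
tolerant = re.split(r"\s*,\s*", line.strip())

print(plain)     # [' 9 ', ' foo7-9', '2-4 ', 'xxx ']
print(tolerant)  # ['9', 'foo7-9', '2-4', 'xxx']
```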

Jul 18 '05 #3

Jean Brouwers

Here is one possible way to do that with just Python:
ns = '9,foo7-9,2-4,xxx,5, 6, 7, 8, 9, bar, foo_6, foo_10, foo_11'

# list of plain, clean names
ns = [n.strip() for n in ns.split(',')]
# expand names with range
fs = []
for n in ns:
    r = n.split('-')
    if len(r) != 2: # simple name
        fs.append(n)
    else: # name with range
        h = r[0].rstrip('0123456789') # header
        for i in range(int(r[0][len(h):]), int(r[1])):
            fs.append(h + str(i))
# remove duplicitates
fs = dict([(n, i) for i, n in enumerate(fs)])
fs = fs.keys()
# sort
fs.sort()

print fs
/Jean Brouwers

Jul 18 '05 #4

Jean Brouwers

With two fixes, one bug and one typo:

ns = '9,foo7-9,2-4,xxx,5, 6, 7, 8, 9, bar, foo_6, foo_10, foo_11'

# list of plain, clean names
ns = [n.strip() for n in ns.split(',')]
# expand names with range
fs = []
for n in ns:
    r = n.split('-')
    if len(r) != 2: # simple name
        fs.append(n)
    else: # name with range
        h = r[0].rstrip('0123456789') # header
        for i in range(int(r[0][len(h):]), 1 + int(r[1])):
            fs.append(h + str(i))
# remove duplicates
fs = dict([(n, i) for i, n in enumerate(fs)])
fs = fs.keys()
# sort, maybe
fs.sort()

print fs
/Jean Brouwers
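
The `dict(...).keys()` idiom above is a Python 2.3-era way to drop duplicates (the built-in `set` type arrived in Python 2.4; 2.3 had the `sets` module). In Python 3 the same dedupe-and-sort step can be written as below. This is a sketch, and the input list is just sample data.

```python
names = ['9', 'foo7', 'foo8', '9', 'bar', 'foo7', 'xxx']

# set() drops the duplicates; sorted() returns a new, sorted list.
unique_sorted = sorted(set(names))
# dict.fromkeys() also drops duplicates but keeps first-seen order
# (dicts preserve insertion order from Python 3.7 on).
first_seen = list(dict.fromkeys(names))

print(unique_sorted)  # ['9', 'bar', 'foo7', 'foo8', 'xxx']
print(first_seen)     # ['9', 'foo7', 'foo8', 'bar', 'xxx']
```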

Jul 18 '05 #5

george young

On Fri, 16 Jul 2004 17:10:03 GMT
Jean Brouwers <JB************ ***********@no. spam.net> threw this fish to the penguins:

> With two fixes, one bug and one typo:
>
> ns = '9,foo7-9,2-4,xxx,5, 6, 7, 8, 9, bar, foo_6, foo_10, foo_11'
>
> # list of plain, clean names
> ns = [n.strip() for n in ns.split(',')]
> # expand names with range
> fs = []
> for n in ns:
>     r = n.split('-')
>     if len(r) != 2: # simple name
>         fs.append(n)
>     else: # name with range
>         h = r[0].rstrip('0123456789') # header
>         for i in range(int(r[0][len(h):]), 1 + int(r[1])):
>             fs.append(h + str(i))

Mmm, not quite. If ns=='foo08-11', your fs==[foo8, foo9, foo10, foo11]
which is wrong. It should yield fs==[foo08, foo09, foo10, foo11].
I.e., it must maintain leading zeros in ranges.

(I'm contracting out construction of a special circle of hell for users
who define [foo7, foo08, foo9, foo10] -- they won't be around to complain
that it parses wrong ;-)
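
One way to meet this requirement, and the formatting trick several replies in this thread rely on, is to take the `*` width for `'%0*d'` from the start token as typed. Padding to the width of "7" is a no-op, so the same rule covers ranges with and without leading zeros. A minimal Python 3 sketch; `expand_range` is a hypothetical helper name.

```python
def expand_range(prefix, start, end):
    """Expand prefix+start .. prefix+end, keeping the width of `start`."""
    lo, hi = int(start, 10), int(end, 10)
    if lo > hi:
        raise ValueError("inverse range: %s%s-%s" % (prefix, start, end))
    width = len(start)  # "08" -> width 2, "7" -> width 1 (no padding)
    return ["%s%0*d" % (prefix, width, n) for n in range(lo, hi + 1)]


print(expand_range("foo", "08", "11"))  # ['foo08', 'foo09', 'foo10', 'foo11']
print(expand_range("foo", "7", "9"))    # ['foo7', 'foo8', 'foo9']
```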


Jul 18 '05 #6

Paul McGuire

Here's a pyparsing solution. The best way to read this is to look over the
grammar definitions first, then at the parse actions attached to the
different bits of the grammar. The most complicated part is the parse
action for integer ranges, in which we try to keep leading zeroes if they
were given in the original string.

You said error recovery is not a big deal, but exception handling is built
into pyparsing, so use as much or as little of it as you like.

-- Paul
# download pyparsing at http://pyparsing.sourceforge.net

from pyparsing import Word, delimitedList, alphas, alphanums, nums, \
     Literal, StringEnd, ParseException

# define basic grammar
integer = Word(nums)
integerRange = integer.setResultsName("start") + "-" + \
               integer.setResultsName("end")
word = Word(alphas+"_")
wordRange = word.setResultsName("base") + ( integerRange | integer )
waferList = delimitedList( integerRange | integer | wordRange | word ) + \
            StringEnd()

# define parse actions (to expand range references)
def expandIntRange(st,loc,toks):
    expandedNums = range( int(toks.start), int(toks.end)+1 )
    # make sure leading zeroes are retained
    if toks.start.startswith('0'):
        return [ "%0*d"%(len(toks.start),n) for n in expandedNums ]
    else:
        return [ str(n) for n in expandedNums ]

def expandWordRange(st,loc,toks):
    baseNumPairs = zip( [toks.base]*(len(toks)-1), toks[1:] )
    return [ "".join(pair) for pair in baseNumPairs ]

# attach parse actions to grammar elements
integerRange.setParseAction( expandIntRange )
wordRange.setParseAction( expandWordRange )

# run tests (last one an error)
testData = """
9,foo7-9,2-4,xxx
9,foo_7- 9, 2-4, xxx
9 , foo07-09,2 - 4, bar6, xxx
9,foo7-9,2-4,xxx,5, 6, 7, 8, 9, bar, foo_6, foo_10-11
9,foo7-9,2-4,xxx,5- 9, bar, foo_06, foo_010-11
9,foo7-9,2-4,xxx,foo_099-101
9,f07-09-12,xxx
"""

for t in testData.split("\n")[1:-1]:
    try:
        print t
        print waferList.parseString(t)
    except ParseException, pe:
        print t
        print (" "*pe.loc) + "^"
        print pe.msg
    print

=====================
output:
9,foo7-9,2-4,xxx
['9', 'foo7', 'foo8', 'foo9', '2', '3', '4', 'xxx']

9,foo_7- 9, 2-4, xxx
['9', 'foo_7', 'foo_8', 'foo_9', '2', '3', '4', 'xxx']

9 , foo07-09,2 - 4, bar6, xxx
['9', 'foo07', 'foo08', 'foo09', '2', '3', '4', 'bar6', 'xxx']

9,foo7-9,2-4,xxx,5, 6, 7, 8, 9, bar, foo_6, foo_10-11
['9', 'foo7', 'foo8', 'foo9', '2', '3', '4', 'xxx', '5', '6', '7', '8', '9',
'bar', 'foo_6', 'foo_10', 'foo_11']

9,foo7-9,2-4,xxx,5- 9, bar, foo_06, foo_010-11
['9', 'foo7', 'foo8', 'foo9', '2', '3', '4', 'xxx', '5', '6', '7', '8', '9',
'bar', 'foo_06', 'foo_010', 'foo_011']

9,foo7-9,2-4,xxx,foo_099-101
['9', 'foo7', 'foo8', 'foo9', '2', '3', '4', 'xxx', 'foo_099', 'foo_100',
'foo_101']

9,f07-09-12,xxx
9,f07-09-12,xxx
^
Expected end of text
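
Pyparsing reports where the parse failed (`pe.loc`), which is what makes the caret display above possible. If you stay with the standard library, a rough analogue is to find the offset of the first comma-separated item that does not look like a name or a range. The Python 3 sketch below is an assumption throughout: the item pattern and the name `first_bad_offset` are not part of pyparsing, the caret will not necessarily land on the same column pyparsing picks, and unlike Paul's grammar it does not allow blanks inside a range such as "2 - 4".

```python
import re

# A name is letters/underscores followed by a number or number range,
# or letters/underscores alone (e.g. "xxx").
ITEM = re.compile(r"[A-Za-z_]*[0-9]+(?:-[0-9]+)?|[A-Za-z_]+")


def first_bad_offset(line):
    """Return the column of the first invalid item, or None if all are valid."""
    pos = 0
    for item in line.split(","):
        name = item.strip()
        if not name or ITEM.fullmatch(name) is None:
            # skip the item's leading blanks so the caret lands on it
            return pos + len(item) - len(item.lstrip())
        pos += len(item) + 1  # +1 for the comma
    return None


line = "9,f07-09-12,xxx"
col = first_bad_offset(line)
if col is not None:
    print(line)
    print(" " * col + "^")
```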

Jul 18 '05 #7

Christopher T King

On Fri, 16 Jul 2004, george young wrote:

> Mmm, not quite. If ns=='foo08-11', your fs==[foo8, foo9, foo10, foo11]
> which is wrong. It should yield fs==[foo08, foo09, foo10, foo11].
> I.e., it must maintain leading zeros in ranges.

An updated version of what I previously posted should do the trick:

---

import re

def expand(pattern):
    r = re.search(r'\d+-\d+$', pattern)
    if r is None:
        yield pattern
        return
    s, e = r.group().split('-')
    l = len(s)
    for n in xrange(int(s), int(e)+1):
        yield pattern[:r.start()] + '%0*d' % (l, n)

def expand_list(pattern_list):
    return [ w for pattern in re.split(r'\s*,\s*', pattern_list)
               for w in expand(pattern) ]

pattern_list = '9,foo07-11,2-4,xxx'

print expand_list(pattern_list)

# --> ['9', 'foo07', 'foo08', 'foo09', 'foo10', 'foo11', '2', '3', '4', 'xxx']

---

Why do I feel like there's a contest going? ;)

Jul 18 '05 #8

Jean Brouwers

Another fix, to handle leading zeros.
ns = '9,2-4,xxx,5, bar, foo_6-11,x07-9'

# list of plain, clean names
ns = [n.strip() for n in ns.split(',')]
# list of names and expanded names
fs = []
for n in ns:
    r = n.split('-')
    if len(r) == 2: # expand name with range
        h = r[0].rstrip('0123456789') # header
        r[0] = r[0][len(h):]
        if r[0][0] != '0':
            h += '%d'
        else: # leading zeros
            w = [len(i) for i in r]
            if w[1] > w[0]:
                raise ValueError, 'bad range: ' + n
            h += '%%0%dd' % max(w)
        for i in range(int(r[0],10), 1+int(r[1],10)):
            fs.append(h % i)
    else: # simple name
        fs.append(n)
# remove duplicates
fs = dict([(n, i) for i, n in enumerate(fs)]).keys()
# sort, maybe
fs.sort()

print fs

['2', '3', '4', '5', '9', 'bar', 'foo_10', 'foo_11', 'foo_6',
'foo_7', 'foo_8', 'foo_9', 'x07', 'x08', 'x09', 'xxx']
There is still a question about a range specification like

foo09-123

which is treated as an error in the code above.

/Jean Brouwers
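
On the open question about "foo09-123": one alternative to rejecting it is to treat the width of the start number as a minimum. `'%0*d'` only pads and never truncates, so numbers that outgrow the width just get longer; this is what the `'%0*d'` formatting in Jeff Epler's and Christopher T King's replies does with such input. The Python 3 sketch below offers both policies; the function name `expand_padded` and its `strict` flag are hypothetical.

```python
def expand_padded(prefix, start, end, strict=False):
    """Expand a numeric range, padding each number to the width of `start`.

    With strict=True, a zero-padded start whose end is wider (e.g. "09-123")
    is rejected, mirroring the 'wide range' check in the code above.
    """
    if strict and len(start) > 1 and start.startswith("0") and len(end) > len(start):
        raise ValueError("wide range: %s%s-%s" % (prefix, start, end))
    lo, hi = int(start, 10), int(end, 10)
    if lo > hi:
        raise ValueError("bad range: %s%s-%s" % (prefix, start, end))
    width = len(start)
    # "%0*d" pads up to `width` but leaves longer numbers intact.
    return ["%s%0*d" % (prefix, width, n) for n in range(lo, hi + 1)]


names = expand_padded("foo", "09", "123")
print(names[:3], names[-3:])
# ['foo09', 'foo10', 'foo11'] ['foo121', 'foo122', 'foo123']
```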


Jul 18 '05 #9

Jean Brouwers

A further (final?) update, which also checks for some range errors.

/Jean Brouwers
ns = '9,2-4,xxx, bar, foo_6-11,x07-9, 0-1, 00-1'

# list of names and expanded names
fs = []
for n in ns.split(','):
    n = n.strip()
    r = n.split('-')
    if len(r) == 2: # expand name with range
        h = r[0].rstrip('0123456789') # header
        r[0] = r[0][len(h):]
        # range can't be empty
        if not (r[0] and r[1]):
            raise ValueError, 'empty range: ' + n
        # handle leading zeros
        if r[0] == '0' or r[0][0] != '0':
            h += '%d'
        else:
            w = [len(i) for i in r]
            if w[1] > w[0]:
                raise ValueError, 'wide range: ' + n
            h += '%%0%dd' % max(w)
        # check range
        r = [int(i, 10) for i in r]
        if r[0] > r[1]:
            raise ValueError, 'bad range: ' + n
        for i in range(r[0], r[1]+1):
            fs.append(h % i)
    else: # simple name
        fs.append(n)

# remove duplicates
fs = dict([(n, i) for i, n in enumerate(fs)]).keys()
# sort, maybe
fs.sort()

print fs

['0', '00', '01', '1', '2', '3', '4', '9', 'bar', 'foo_10',
'foo_11', 'foo_6', 'foo_7', 'foo_8', 'foo_9', 'x07', 'x08', 'x09',
'xxx']
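
The original question says each run has a known set of fewer than 100 wafer names and that invalid input just gets a beep, so after expansion it may be worth checking the result against that set. Also note that the plain `sort()` above puts 'foo_10' before 'foo_6'. The Python 3 sketch below does both; `KNOWN`, `check_names` and `natural_key` are hypothetical names, and the numeric-aware sort key is a common idiom rather than anything proposed in this thread.

```python
import re

# Hypothetical wafer names for one run (taken from the example in the question).
KNOWN = {"5", "6", "7", "8", "9", "bar",
         "foo_6", "foo_7", "foo_8", "foo_9", "foo_10", "foo_11"}


def natural_key(name):
    """Sort key that compares digit runs numerically, so foo_6 < foo_10."""
    parts = re.split(r"([0-9]+)", name)
    # re.split with a capturing group puts the digit runs at odd indexes.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


def check_names(expanded, known=KNOWN):
    """Return the unique names in natural order; raise ValueError on unknown ones."""
    unknown = set(expanded) - known
    if unknown:
        raise ValueError("unknown wafer name(s): " + ", ".join(sorted(unknown)))
    return sorted(set(expanded), key=natural_key)


print(check_names(["foo_10", "9", "foo_6", "bar", "9"]))
# ['9', 'bar', 'foo_6', 'foo_10']
```

A caller could catch `ValueError` from this check (and from the parsing step) and sound the beep there.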


Jul 18 '05 #10
