On Tue, 2006-04-18 at 17:25 -0700,
b8*******@yahoo.com wrote:
Hi,
I have a bunch of strings like
a53bc_531.txt
a53bc_2285.txt
...
a53bc_359.txt
and I want to extract the numbers 531, 2285, ..., 359.
Some ways:
1) Regular expressions, as you said (the character class is [0-9], not
[1-9], so that numbers containing a zero also match):
>>> from re import compile
>>> find = compile(r"a53bc_([0-9]+)\.txt").findall
>>> find('a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt')
['531', '2285', '359']
2) Using str.split:
>>> [x.split('.')[0].split('_')[1] for x in 'a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt'.splitlines()]
['531', '2285', '359']
3) Using indexes (be careful: this only works if every name has the same
6-character prefix and 4-character suffix):
>>> [x[6:-4] for x in 'a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt'.splitlines()]
['531', '2285', '359']
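
To make the "be careful" concrete, here is a minimal sketch of the three approaches as functions. The function names and the file name "a53bcd_1050.txt" are made up for illustration, and the regex is a looser variant (any prefix, digits before ".txt") rather than the exact pattern above:

```python
import re

# Hypothetical helpers; only the three techniques come from the post.
NUMBER_RE = re.compile(r"_(\d+)\.txt$")

def by_regex(name):
    """Return the digits between '_' and a trailing '.txt', or None."""
    match = NUMBER_RE.search(name)
    return match.group(1) if match else None

def by_split(name):
    """Take what lies between the first '_' and the first '.'."""
    return name.split('.')[0].split('_')[1]

def by_index(name):
    """Assumes a 6-character prefix and a 4-character suffix."""
    return name[6:-4]

names = ["a53bc_531.txt", "a53bc_2285.txt", "a53bc_359.txt"]
print([by_regex(n) for n in names])   # ['531', '2285', '359']

# A made-up name with a longer prefix breaks only the fixed offsets.
odd = "a53bcd_1050.txt"
print([by_regex(odd), by_split(odd), by_index(odd)])
# ['1050', '1050', '_1050']
```

The slicing version silently returns a wrong string instead of raising an error, which is why it is only safe when the layout of the names is guaranteed.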
Measuring speeds:
$ python2.4 -m timeit \
    -s 'from re import compile; find = compile("a53bc_([1-9]*)\\.txt").findall; s = "a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt"' \
    'find(s)'
100000 loops, best of 3: 3.03 usec per loop
$ python2.4 -m timeit \
    -s 's = "a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt\n"[:-1]' \
    "[x.split('.')[0].split('_')[1] for x in s.splitlines()]"
100000 loops, best of 3: 7.64 usec per loop
$ python2.4 -m timeit \
    -s 's = "a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt\n"[:-1]' \
    "[x[6:-4] for x in s.splitlines()]"
100000 loops, best of 3: 2.47 usec per loop
$ python2.4 -m timeit \
    -s 'from re import compile; find = compile("a53bc_([1-9]*)\\.txt").findall; s = ("a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt\n"*1000)[:-1]' \
    'find(s)'
1000 loops, best of 3: 1.95 msec per loop
$ python2.4 -m timeit \
    -s 's = ("a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt\n" * 1000)[:-1]' \
    "[x.split('.')[0].split('_')[1] for x in s.splitlines()]"
100 loops, best of 3: 6.51 msec per loop
$ python2.4 -m timeit \
    -s 's = ("a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt\n" * 1000)[:-1]' \
    "[x[6:-4] for x in s.splitlines()]"
1000 loops, best of 3: 1.53 msec per loop
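
If you would rather run the comparison from a script than from the shell, something like the following should work with the timeit module. It is only a sketch: the dictionary of candidates and the loop counts are my own choices, it assumes a Python recent enough for timeit.repeat to accept a callable, and the pattern uses [0-9]+ rather than the [1-9]* that was timed above. Absolute numbers will differ from machine to machine.

```python
import re
import timeit

# Same input as the larger benchmark: 3000 file names, one per line.
s = ("a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt\n" * 1000)[:-1]
find = re.compile(r"a53bc_([0-9]+)\.txt").findall

# Hypothetical labels for the three approaches from the post.
candidates = {
    "regex": lambda: find(s),
    "split": lambda: [x.split('.')[0].split('_')[1] for x in s.splitlines()],
    "index": lambda: [x[6:-4] for x in s.splitlines()],
}

# All three must agree before their speeds are worth comparing.
results = [func() for func in candidates.values()]
assert results[0] == results[1] == results[2]

for label, func in candidates.items():
    # Best of 3 runs of 100 calls each, reported per call in milliseconds.
    best = min(timeit.repeat(func, number=100, repeat=3)) / 100
    print("%-5s %.3f msec per call" % (label, best * 1000))
```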
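Summary: using indexes is the fastest in both tests (2.47 usec vs. 3.03 usec
for three names, 1.53 msec vs. 1.95 msec for 3000), but it is less powerful
than regexps because it depends on fixed prefix and suffix lengths. The
split version is the slowest (7.64 usec and 6.51 msec).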
HTH,
--
Felipe.