extracting a substring

Hi,
I have a bunch of strings like
a53bc_531.txt
a53bc_2285.txt
....
a53bc_359.txt

and I want to extract the numbers 531, 2285, ...,359.

One thing for sure is that these numbers are the ONLY part that is
changing; all the other characters are always fixed.

I know I should use regular expressions, but I'm not familiar with
Python, so any quick help would be appreciated, such as which commands
or idioms to use. Thanks a lot!

Apr 19 '06 #1
On Tue, 2006-04-18 at 17:25 -0700, b8*******@yahoo.com wrote:
Hi,
I have a bunch of strings like
a53bc_531.txt
a53bc_2285.txt
...
a53bc_359.txt

and I want to extract the numbers 531, 2285, ...,359.


Some ways:

1) Regular expressions, as you said:

>>> from re import compile
>>> find = compile("a53bc_([0-9]+)\\.txt").findall
>>> find('a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt')
['531', '2285', '359']

2) Using ''.split:

>>> [x.split('.')[0].split('_')[1] for x in 'a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt'.splitlines()]
['531', '2285', '359']

3) Using indexes (be careful!):

>>> [x[6:-4] for x in 'a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt'.splitlines()]
['531', '2285', '359']
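
The "be careful" in option 3 matters because a bare slice such as [6:-4] silently returns wrong text when a name does not have exactly the expected prefix and suffix. A minimal sketch of a guarded version, assuming the prefix "a53bc_" and the suffix ".txt" from the question; the helper name extract_number is made up for illustration:

```python
PREFIX = "a53bc_"   # fixed start, taken from the question
SUFFIX = ".txt"     # fixed end, taken from the question


def extract_number(name, prefix=PREFIX, suffix=SUFFIX):
    """Return the digits between prefix and suffix, or raise ValueError."""
    if not (name.startswith(prefix) and name.endswith(suffix)):
        raise ValueError("unexpected file name: %r" % name)
    # Derive the slice bounds from the lengths instead of hard-coding 6 and -4.
    middle = name[len(prefix):len(name) - len(suffix)]
    if not middle.isdigit():
        raise ValueError("no number in file name: %r" % name)
    return middle


names = ["a53bc_531.txt", "a53bc_2285.txt", "a53bc_359.txt"]
print([extract_number(n) for n in names])
```

The slice itself stays as cheap as before; the two checks only add a guard against names that do not follow the scheme.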

Measuring speeds:

$ python2.4 -m timeit -s 'from re import compile; find = compile("a53bc_([1-9]*)\\.txt").findall; s = "a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt"' 'find(s)'
100000 loops, best of 3: 3.03 usec per loop

$ python2.4 -m timeit -s 's = "a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt\n"[:-1]' "[x.split('.')[0].split('_')[1] for x in s.splitlines()]"
100000 loops, best of 3: 7.64 usec per loop

$ python2.4 -m timeit -s 's = "a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt\n"[:-1]' "[x[6:-4] for x in s.splitlines()]"
100000 loops, best of 3: 2.47 usec per loop

$ python2.4 -m timeit -s 'from re import compile; find = compile("a53bc_([1-9]*)\\.txt").findall; s = ("a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt\n"*1000)[:-1]' 'find(s)'
1000 loops, best of 3: 1.95 msec per loop

$ python2.4 -m timeit -s 's = ("a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt\n" * 1000)[:-1]' "[x.split('.')[0].split('_')[1] for x in s.splitlines()]"
100 loops, best of 3: 6.51 msec per loop

$ python2.4 -m timeit -s 's = ("a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt\n" * 1000)[:-1]' "[x[6:-4] for x in s.splitlines()]"
1000 loops, best of 3: 1.53 msec per loop

Summary: slicing by index is less powerful than a regexp, but faster
(1.53 msec versus 1.95 msec on 3000 lines). The split approach is the
slowest of the three (6.51 msec).
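
The same comparison can be rerun from a script with the standard timeit module. This is only a sketch: the function names by_regex, by_split and by_slice are invented here, the pattern uses [0-9]+ so that numbers containing a 0 also match, and the timings will differ from the python2.4 figures above depending on machine and interpreter.

```python
import re
import timeit

# Same three-line sample as in the measurements, repeated 1000 times.
s = ("a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt\n" * 1000)[:-1]
find = re.compile(r"a53bc_([0-9]+)\.txt").findall


def by_regex():
    return find(s)


def by_split():
    return [x.split(".")[0].split("_")[1] for x in s.splitlines()]


def by_slice():
    return [x[6:-4] for x in s.splitlines()]


# All three approaches must agree before their speeds are compared.
assert by_regex() == by_split() == by_slice()

for func in (by_regex, by_split, by_slice):
    # Best of 3 repeats, 100 calls each, reported per call.
    best = min(timeit.repeat(func, number=100, repeat=3)) / 100
    print("%-8s %.3f msec per call" % (func.__name__, best * 1e3))
```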

HTH,

--
Felipe.

Apr 19 '06 #2
Try this:

>>> import re
>>> pattern = re.compile(r"a53bc_([0-9]+)\.txt")
>>> s = "a53bc_531.txt"
>>> match = pattern.match(s)
>>> if match:
...     print(int(match.group(1)))
... else:
...     print("No match")
...
531
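
One caveat with pattern.match: it anchors only at the start of the string, so a name with extra characters after ".txt" still matches. A small sketch of a stricter check, assuming Python 3.4 or later, where re.fullmatch is available; the helper name number_from is hypothetical:

```python
import re

PATTERN = re.compile(r"a53bc_([0-9]+)\.txt")


def number_from(name):
    """Return the number as an int, or None if the whole name does not fit."""
    m = PATTERN.fullmatch(name)
    return int(m.group(1)) if m else None


print(number_from("a53bc_531.txt"))      # 531
print(number_from("a53bc_531.txt.bak"))  # None: trailing text is rejected
print(number_from("a53bc_.txt"))         # None: at least one digit is required
```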


Hope that helps,
Gary Herron
Apr 19 '06 #3
You don't need a regex for this. As long as the prefix and suffix have fixed
lengths, the following will do:

>>> "a53bc_531.txt"[6:-4]
'531'
>>> "a53bc_2285.txt"[6:-4]
'2285'
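
If the prefix or the extension does not have a fixed length, the slice bounds cannot be hard-coded. One possible alternative, sketched here with os.path.splitext and str.rpartition; the helper name trailing_number and the second file name are invented for illustration:

```python
import os.path


def trailing_number(name):
    """Return the digits between the last underscore and the extension."""
    stem, _ext = os.path.splitext(name)      # "a53bc_531.txt" -> "a53bc_531"
    _head, sep, tail = stem.rpartition("_")  # split at the last underscore
    if not sep or not tail.isdigit():
        raise ValueError("no trailing number in %r" % name)
    return tail


print(trailing_number("a53bc_531.txt"))           # 531
print(trailing_number("longer_prefix_2285.dat"))  # 2285
```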

--
Dale Strickland-Clark
Riverhall Systems - www.riverhall.co.uk

Apr 19 '06 #4
b8*******@yahoo.com wrote:
Hi,
I have a bunch of strings like
a53bc_531.txt
a53bc_2285.txt
...
a53bc_359.txt

and I want to extract the numbers 531, 2285, ...,359.

One thing for sure is that these numbers are the ONLY part that is
changing; all the other characters are always fixed.


In that case a fixed slice will do what you want:

In [1]: s='a53bc_531.txt'

In [2]: s[6:-4]
Out[2]: '531'

Kent
Apr 19 '06 #5
rx
and I want to extract the numbers 531, 2285, ...,359.

One thing for sure is that these numbers are the ONLY part that is
changing; all the other characters are always fixed.


I'm not sure what you mean by "always fixed", but I guess it means that
you have n files with one fixed start and a changing ending, m files with
another fixed start and a changing ending, and so on.

import re

filenames = ['ac99_124.txt', 'ac99_344.txt', 'ac99_445.txt']
# Compile the pattern once, outside the loop; "\." matches a literal dot.
pattern = re.compile(r'[^_]*_(?P<number>[^.]*)\.txt')
numbers = []
for name in filenames:
    numbers.append(int(pattern.match(name).group('number')))

This sets numbers to [124, 344, 445].
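
If there really are several groups of files, each with its own fixed start, the numbers can be collected per prefix. A rough sketch under that assumption: the 'zz01_7.txt' name and the variable numbers_by_prefix are made up for the example, and re.fullmatch needs Python 3.4 or later.

```python
import re
from collections import defaultdict

# Hypothetical mix of two prefixes; only the 'ac99_' names come from the post.
filenames = ['ac99_124.txt', 'ac99_344.txt', 'zz01_7.txt', 'ac99_445.txt']

pattern = re.compile(r'(?P<prefix>[^_]*)_(?P<number>[0-9]+)\.txt')

numbers_by_prefix = defaultdict(list)
for name in filenames:
    m = pattern.fullmatch(name)
    if m is None:
        continue  # skip names that do not follow the scheme
    numbers_by_prefix[m.group('prefix')].append(int(m.group('number')))

print(dict(numbers_by_prefix))
# {'ac99': [124, 344, 445], 'zz01': [7]}
```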
Apr 19 '06 #6
