Bytes | Software Development & Data Engineering Community

extracting a substring

Hi,
I have a bunch of strings like
a53bc_531.txt
a53bc_2285.txt
....
a53bc_359.txt

and I want to extract the numbers 531, 2285, ...,359.

One thing for sure is that these numbers are the ONLY part that is
changing; all the other characters are always fixed.

I know I should use regular expressions, but I'm not familiar with
Python, so any quick help would be appreciated, such as which commands
or idioms to use. Thanks a lot!

Apr 19 '06 #1
On Tue, 2006-04-18 at 17:25 -0700, b8*******@yahoo.com wrote:
> Hi,
> I have a bunch of strings like
> a53bc_531.txt
> a53bc_2285.txt
> ...
> a53bc_359.txt
>
> and I want to extract the numbers 531, 2285, ...,359.


Some ways:

1) Regular expressions, as you said:
>>> from re import compile
>>> # [0-9]+ matches one or more digits, including 0
>>> find = compile("a53bc_([0-9]+)\\.txt").findall
>>> find('a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt')
['531', '2285', '359']

2) Using ''.split:
>>> [x.split('.')[0].split('_')[1] for x in 'a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt'.splitlines()]
['531', '2285', '359']

3) Using indexes (be careful!):
>>> [x[6:-4] for x in 'a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt'.splitlines()]
['531', '2285', '359']
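
The split-based variant can also be written with os.path.splitext and str.rpartition, which keep working if the fixed prefix ever contains a dot or more than one underscore. This is only a sketch; the helper name number_from_name is made up for illustration and does not come from the thread.

```python
import os.path


def number_from_name(filename):
    """Return the text between the last underscore and the extension."""
    # 'a53bc_531.txt' -> ('a53bc_531', '.txt')
    stem, _ext = os.path.splitext(filename)
    # Split at the LAST underscore, so underscores in the prefix are harmless.
    _prefix, _sep, number = stem.rpartition('_')
    return number


names = 'a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt'.splitlines()
numbers = [number_from_name(n) for n in names]
print(numbers)  # ['531', '2285', '359']
```

Note that a name without any underscore comes back as its whole stem, so this version is no stricter than the one-liner about what it accepts.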

Measuring speeds:

$ python2.4 -m timeit \
    -s 'from re import compile; find = compile("a53bc_([0-9]+)\\.txt").findall; s = "a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt"' \
    'find(s)'
100000 loops, best of 3: 3.03 usec per loop

$ python2.4 -m timeit \
    -s 's = "a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt\n"[:-1]' \
    "[x.split('.')[0].split('_')[1] for x in s.splitlines()]"
100000 loops, best of 3: 7.64 usec per loop

$ python2.4 -m timeit \
    -s 's = "a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt\n"[:-1]' \
    "[x[6:-4] for x in s.splitlines()]"
100000 loops, best of 3: 2.47 usec per loop

$ python2.4 -m timeit \
    -s 'from re import compile; find = compile("a53bc_([0-9]+)\\.txt").findall; s = ("a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt\n"*1000)[:-1]' \
    'find(s)'
1000 loops, best of 3: 1.95 msec per loop

$ python2.4 -m timeit \
    -s 's = ("a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt\n" * 1000)[:-1]' \
    "[x.split('.')[0].split('_')[1] for x in s.splitlines()]"
100 loops, best of 3: 6.51 msec per loop

$ python2.4 -m timeit \
    -s 's = ("a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt\n" * 1000)[:-1]' \
    "[x[6:-4] for x in s.splitlines()]"
1000 loops, best of 3: 1.53 msec per loop

Summary: using indexes is less powerful than regexps, but faster;
''.split is the slowest of the three.
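
The same comparison can be rerun from a script with the standard timeit module. What follows is a rough sketch, assuming a current Python 3 interpreter; the absolute numbers will differ from the Python 2.4 figures above, the function names are invented for this example, and the pattern uses [0-9]+ so that numbers containing a 0 are accepted.

```python
import re
import timeit

# 3000 file names, one per line, built the same way as in the timings above.
s = ("a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt\n" * 1000)[:-1]
find = re.compile(r"a53bc_([0-9]+)\.txt").findall


def by_regex():
    return find(s)


def by_split():
    return [x.split('.')[0].split('_')[1] for x in s.splitlines()]


def by_slice():
    return [x[6:-4] for x in s.splitlines()]


# All three must agree before their timings are worth comparing.
assert by_regex() == by_split() == by_slice()

loops = 100
for func in (by_regex, by_split, by_slice):
    seconds = timeit.timeit(func, number=loops)
    print("%-8s %.3f msec per loop" % (func.__name__, seconds / loops * 1000))
```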

HTH,

--
Felipe.

Apr 19 '06 #2
b8*******@yahoo.com wrote:
> Hi,
> I have a bunch of strings like
> a53bc_531.txt
> a53bc_2285.txt
> ...
> a53bc_359.txt
>
> and I want to extract the numbers 531, 2285, ...,359.
>
> One thing for sure is that these numbers are the ONLY part that is
> changing; all the other characters are always fixed.
>
> I know I should use regular expressions, but I'm not familiar with
> Python, so any quick help would be appreciated, such as which commands
> or idioms to use. Thanks a lot!

Try this:
>>> import re
>>> # escape the dot so it only matches a literal "."; require at least one digit
>>> pattern = re.compile(r"a53bc_([0-9]+)\.txt")
>>> s = "a53bc_531.txt"
>>> match = pattern.match(s)
>>> if match:
...     print(int(match.group(1)))
... else:
...     print("No match")
...
531
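
To apply this to a whole list of names, the match-or-no-match check can be wrapped in a small function. This is a sketch only: extract_number is a hypothetical helper, and it assumes Python 3.4 or later for fullmatch, which is used here so that trailing text such as ".txt.bak" is rejected.

```python
import re

PATTERN = re.compile(r"a53bc_([0-9]+)\.txt")


def extract_number(filename):
    """Return the number embedded in filename as an int, or None if it does not fit."""
    match = PATTERN.fullmatch(filename)
    if match is None:
        return None
    return int(match.group(1))


for name in ["a53bc_531.txt", "a53bc_2285.txt", "a53bc_.txt", "a53bc_359.txt.bak"]:
    print(name, "->", extract_number(name))
```

Returning None instead of printing "No match" leaves the caller free to skip, log, or reject names that do not fit.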


Hope that helps,
Gary Herron
Apr 19 '06 #3
You don't need a regex for this. As long as the prefix and suffix are fixed
lengths, the following will do:
>>> "a53bc_531.txt"[6:-4]
'531'
>>> "a53bc_2285.txt"[6:-4]
'2285'

b8*******@yahoo.com wrote:
> Hi,
> I have a bunch of strings like
> a53bc_531.txt
> a53bc_2285.txt
> ...
> a53bc_359.txt
>
> and I want to extract the numbers 531, 2285, ...,359.
>
> One thing for sure is that these numbers are the ONLY part that is
> changing; all the other characters are always fixed.
>
> I know I should use regular expressions, but I'm not familiar with
> Python, so any quick help would be appreciated, such as which commands
> or idioms to use. Thanks a lot!


--
Dale Strickland-Clark
Riverhall Systems - www.riverhall.co.uk

Apr 19 '06 #4
b8*******@yahoo.com wrote:
> Hi,
> I have a bunch of strings like
> a53bc_531.txt
> a53bc_2285.txt
> ...
> a53bc_359.txt
>
> and I want to extract the numbers 531, 2285, ...,359.
>
> One thing for sure is that these numbers are the ONLY part that is
> changing; all the other characters are always fixed.


In that case a fixed slice will do what you want:

In [1]: s='a53bc_531.txt'

In [2]: s[6:-4]
Out[2]: '531'
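
A fixed slice returns something for any input, so it can help to check the fixed parts before trusting the result. The sketch below assumes the prefix "a53bc_" and the suffix ".txt" from the question; slice_number is an invented name, not something posted in the thread.

```python
PREFIX = "a53bc_"
SUFFIX = ".txt"


def slice_number(filename):
    """Slice out the middle of filename after checking the fixed prefix and suffix."""
    if not (filename.startswith(PREFIX) and filename.endswith(SUFFIX)):
        raise ValueError("unexpected file name: %r" % filename)
    # len(PREFIX) == 6 and len(SUFFIX) == 4, so this is the same slice as s[6:-4].
    return filename[len(PREFIX):-len(SUFFIX)]


print(slice_number("a53bc_531.txt"))   # 531
print(slice_number("a53bc_2285.txt"))  # 2285
```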

Kent
Apr 19 '06 #5
rx
> and I want to extract the numbers 531, 2285, ...,359.
>
> One thing for sure is that these numbers are the ONLY part that is
> changing; all the other characters are always fixed.


I'm not sure what you mean by "always fixed", but I guess it means that
you have n files with one fixed start and a changing ending, m files with
another fixed start and a changing ending, and so on.

import re

filenames = ['ac99_124.txt', 'ac99_344.txt', 'ac99_445.txt']
# Compile once: anything up to the first underscore, then the number, then ".txt".
pattern = re.compile(r'[^_]*_(?P<number>[^.]*)\.txt')
numbers = []
for name in filenames:
    numbers.append(int(pattern.match(name).group('number')))

This sets numbers to: [124, 344, 445]
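
If several prefixes really do occur, as guessed above, the numbers can be collected per prefix. This is a sketch with made-up file names (the "bd07_" entries are not from the thread), assuming Python 3.4 or later for fullmatch.

```python
import re
from collections import defaultdict

# Hypothetical input mixing two prefixes.
filenames = ['ac99_124.txt', 'bd07_12.txt', 'ac99_344.txt', 'bd07_7.txt']

pattern = re.compile(r'(?P<prefix>[^_]*)_(?P<number>[0-9]+)\.txt')
numbers_by_prefix = defaultdict(list)
for name in filenames:
    match = pattern.fullmatch(name)
    if match:  # names that do not fit the pattern are skipped
        numbers_by_prefix[match.group('prefix')].append(int(match.group('number')))

print(dict(numbers_by_prefix))  # {'ac99': [124, 344], 'bd07': [12, 7]}
```
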
Apr 19 '06 #6
