extracting a substring

b83503104

Hi,
I have a bunch of strings like
a53bc_531.txt
a53bc_2285.txt
....
a53bc_359.txt

and I want to extract the numbers 531, 2285, ...,359.

One thing for sure is that these numbers are the ONLY part that is
changing; all the other characters are always fixed.

I know I should use regular expressions, but I'm not familiar with
Python, so any quick help would be appreciated, such as which commands
or idioms to use. Thanks a lot!

Apr 19 '06 #1


Felipe Almeida Lessa

On Tue, 2006-04-18 at 17:25 -0700, b8*******@yahoo.com wrote:

Hi,
I have a bunch of strings like
a53bc_531.txt
a53bc_2285.txt
...
a53bc_359.txt

and I want to extract the numbers 531, 2285, ...,359.

Some ways:

1) Regular expressions, as you said:

>>> from re import compile
>>> find = compile(r"a53bc_([0-9]+)\.txt").findall
>>> find('a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt')
['531', '2285', '359']

2) Using ''.split:

>>> [x.split('.')[0].split('_')[1] for x in 'a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt'.splitlines()]
['531', '2285', '359']

3) Using indexes (be careful!):

>>> [x[6:-4] for x in 'a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt'.splitlines()]
['531', '2285', '359']

Measuring speeds:

With three lines of input:

$ python2.4 -m timeit -s 'from re import compile; find = compile("a53bc_([1-9]*)\\.txt").findall; s = "a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt"' 'find(s)'
100000 loops, best of 3: 3.03 usec per loop

$ python2.4 -m timeit -s 's = "a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt\n"[:-1]' "[x.split('.')[0].split('_')[1] for x in s.splitlines()]"
100000 loops, best of 3: 7.64 usec per loop

$ python2.4 -m timeit -s 's = "a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt\n"[:-1]' "[x[6:-4] for x in s.splitlines()]"
100000 loops, best of 3: 2.47 usec per loop

With the same input repeated 1000 times:

$ python2.4 -m timeit -s 'from re import compile; find = compile("a53bc_([1-9]*)\\.txt").findall; s = ("a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt\n"*1000)[:-1]' 'find(s)'
1000 loops, best of 3: 1.95 msec per loop

$ python2.4 -m timeit -s 's = ("a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt\n" * 1000)[:-1]' "[x.split('.')[0].split('_')[1] for x in s.splitlines()]"
100 loops, best of 3: 6.51 msec per loop

$ python2.4 -m timeit -s 's = ("a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt\n" * 1000)[:-1]' "[x[6:-4] for x in s.splitlines()]"
1000 loops, best of 3: 1.53 msec per loop

Summary: using indexes is less powerful than regexps, but faster.
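
The same comparison can be run from inside Python with the standard timeit module. What follows is only a rough sketch, not the commands used for the figures above: the helper names by_regex, by_split and by_slice are made up for illustration, the pattern uses [0-9]+ so that numbers containing a zero are matched too, and the absolute timings will depend on the machine and the Python version.

```python
import re
import timeit

# Same sample data as in the post, repeated to get a larger input.
DATA = ("a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt\n" * 1000)[:-1]

# Hypothetical helper names, chosen for this sketch only.
_find = re.compile(r"a53bc_([0-9]+)\.txt").findall


def by_regex(s):
    return _find(s)


def by_split(s):
    return [x.split('.')[0].split('_')[1] for x in s.splitlines()]


def by_slice(s):
    return [x[6:-4] for x in s.splitlines()]


if __name__ == "__main__":
    for func in (by_regex, by_split, by_slice):
        # Best of 3 repeats of 100 calls each, reported per call.
        best = min(timeit.repeat(lambda: func(DATA), number=100, repeat=3)) / 100
        print("%-8s %.3f msec per call" % (func.__name__, best * 1e3))
```

All three helpers return the same list of strings for this input, so the only thing being compared is how long each one takes.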

HTH,

--
Felipe.

Apr 19 '06 #2

Gary Herron

b8*******@yahoo.com wrote:

Hi,
I have a bunch of strings like
a53bc_531.txt
a53bc_2285.txt
...
a53bc_359.txt

and I want to extract the numbers 531, 2285, ...,359.

One thing for sure is that these numbers are the ONLY part that is
changing; all the other characters are always fixed.

I know I should use regular expressions, but I'm not familiar with
Python, so any quick help would be appreciated, such as which commands
or idioms to use. Thanks a lot!

Try this:

>>> import re
>>> pattern = re.compile(r"a53bc_([0-9]+)\.txt")
>>> s = "a53bc_531.txt"
>>> match = pattern.match(s)
>>> if match:
...     print(int(match.group(1)))
... else:
...     print("No match")
...
531
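
To handle a whole list of names rather than a single string, the same kind of pattern can be applied in a loop. This is only a sketch: the function name extract_numbers and the sample list are assumptions made for illustration, and names that do not match are simply skipped.

```python
import re

# Escaped dot and [0-9]+ so only names like "a53bc_<digits>.txt" match.
PATTERN = re.compile(r"a53bc_([0-9]+)\.txt$")


def extract_numbers(names):
    """Return the numeric part of every matching name as an int."""
    numbers = []
    for name in names:
        match = PATTERN.match(name)
        if match:  # silently skip names that do not fit the pattern
            numbers.append(int(match.group(1)))
    return numbers


print(extract_numbers(["a53bc_531.txt", "a53bc_2285.txt", "readme.txt", "a53bc_359.txt"]))
# prints [531, 2285, 359]
```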

Hope that helps,
Gary Herron

Apr 19 '06 #3

Dale Strickland-Clark

You don't need a regex for this. As long as the prefix and suffix have
fixed lengths, the following will do:

>>> "a53bc_531.txt"[6:-4]
'531'
>>> "a53bc_2285.txt"[6:-4]
'2285'
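
The slice bounds 6 and -4 are just the lengths of the prefix "a53bc_" and the suffix ".txt". A small sketch that derives them instead of hard-coding them, and that checks the name really has the expected shape; the helper name middle is an assumption made for illustration.

```python
PREFIX = "a53bc_"
SUFFIX = ".txt"


def middle(name, prefix=PREFIX, suffix=SUFFIX):
    """Return the part of name between prefix and suffix.

    Raises ValueError instead of silently returning a wrong slice
    when the name does not have the expected shape.
    """
    too_short = len(name) < len(prefix) + len(suffix)
    if too_short or not (name.startswith(prefix) and name.endswith(suffix)):
        raise ValueError("unexpected name: %r" % (name,))
    # Equivalent to name[6:-4] for the default prefix and suffix.
    return name[len(prefix):len(name) - len(suffix)]


print(middle("a53bc_531.txt"))   # prints 531
print(middle("a53bc_2285.txt"))  # prints 2285
```

A plain slice would return some substring for any input, so the check is what guards against names that do not follow the pattern.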

b8*******@yahoo.com wrote:
Hi,
I have a bunch of strings like
a53bc_531.txt
a53bc_2285.txt
...
a53bc_359.txt

and I want to extract the numbers 531, 2285, ...,359.

One thing for sure is that these numbers are the ONLY part that is
changing; all the other characters are always fixed.

I know I should use regular expressions, but I'm not familiar with
Python, so any quick help would be appreciated, such as which commands
or idioms to use. Thanks a lot!

--
Dale Strickland-Clark
Riverhall Systems - www.riverhall.co.uk

Apr 19 '06 #4

Kent Johnson

b8*******@yahoo.com wrote:

Hi,
I have a bunch of strings like
a53bc_531.txt
a53bc_2285.txt
...
a53bc_359.txt

and I want to extract the numbers 531, 2285, ...,359.

One thing for sure is that these numbers are the ONLY part that is
changing; all the other characters are always fixed.

In that case a fixed slice will do what you want:

In [1]: s='a53bc_531.txt'

In [2]: s[6:-4]
Out[2]: '531'

Kent

Apr 19 '06 #5

and I want to extract the numbers 531, 2285, ...,359.

One thing for sure is that these numbers are the ONLY part that is
changing; all the other characters are always fixed.

I'm not sure what you mean by "always fixed", but I guess it means that
you have n files with one fixed start and a changing ending, m files with
another fixed start and a changing ending, ...

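import re
filenames = ['ac99_124.txt', 'ac99_344.txt', 'ac99_445.txt']
# Compile once, outside the loop; the dot before "txt" is escaped so it matches literally.
pattern = re.compile(r'[^_]*_(?P<number>[^.]*)\.txt')
numbers = []
for name in filenames:
    numbers.append(int(pattern.match(name).group('number')))

This sets numbers to: [124, 344, 445]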
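
If there really are several families of files, each with its own fixed start, the pattern can also capture the prefix so that the numbers are grouped per family. A sketch under that assumption; the second family zz1_ and the function name numbers_by_prefix are made up for illustration.

```python
import re
from collections import defaultdict

# Prefix is everything before the first underscore; number is the digits before ".txt".
PATTERN = re.compile(r'(?P<prefix>[^_]*)_(?P<number>[0-9]+)\.txt$')


def numbers_by_prefix(filenames):
    """Map each prefix to the list of numbers found for it, in input order."""
    groups = defaultdict(list)
    for name in filenames:
        match = PATTERN.match(name)
        if match:  # ignore names that do not fit the pattern
            groups[match.group('prefix')].append(int(match.group('number')))
    return dict(groups)


result = numbers_by_prefix(['ac99_124.txt', 'ac99_344.txt', 'zz1_7.txt', 'ac99_445.txt'])
print(result)
```

For the sample list this gives [124, 344, 445] under 'ac99' and [7] under 'zz1'.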

Apr 19 '06 #6
