Array of Chars to String

P: n/a
Hello,

I am looking for a nice way to take only those characters from a string that
are in another string and make a new string:

>>> astr = "Bob Carol Ted Alice"
>>> letters = "adB"
>>> some_func(astr,letters)
"Bad"

I can write this like this:

astr = "Bob Carol Ted Alice"
letters = "adB"

from sets import Set  # "import sets" alone would leave the name Set undefined
alist = [lttr for lttr in astr if lttr in Set(letters)]
newstr = ""
for lttr in alist:
    newstr += lttr

But this seems ugly. I especially don't like "newstr += lttr" because it makes
a new string every time. I am thinking that something like this has to be a
function somewhere already or that I can make it more efficient using a
built-in tool.

Any ideas?

James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/
Jul 19 '05 #1
11 Replies


P: n/a
James Stroud <js*****@mbi.ucla.edu> writes:
But this seems ugly. I especially don't like "newstr += lttr" because it makes
a new string every time. I am thinking that something like this has to be a
function somewhere already or that I can make it more efficient using a
built-in tool.


"".join

'as
Jul 19 '05 #2
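
A minimal sketch of the "".join suggestion above, assuming Python 3. The function name keep_letters is hypothetical and does not appear in the thread. The matching characters are collected first and joined once, so no intermediate strings are built along the way:

```python
def keep_letters(s, letters):
    """Return the characters of s that also occur in letters, in order."""
    # Collect the matching characters, then build the result with one join.
    return "".join([ch for ch in s if ch in letters])


print(keep_letters("Bob Carol Ted Alice", "adB"))  # prints Bad
```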

P: n/a
rbt
James Stroud wrote:
> Hello,
>
> I am looking for a nice way to take only those characters from a string that
> are in another string and make a new string:
>
> >>> astr = "Bob Carol Ted Alice"
> >>> letters = "adB"
> >>> some_func(astr,letters)
> "Bad"


astr = "Bob Carol Ted Alice"
letters = "adB"
both = [x for x in astr if x in letters]
print both
['B', 'a', 'd']

Jul 19 '05 #3

P: n/a
On Tue, 19 Apr 2005 13:33:17 -0700, James Stroud <js*****@mbi.ucla.edu> wrote:
> Hello,
>
> I am looking for a nice way to take only those characters from a string that
> are in another string and make a new string:
>
> >>> astr = "Bob Carol Ted Alice"
> >>> letters = "adB"
> >>> some_func(astr,letters)
> "Bad"
>
> I can write this like this:
>
> astr = "Bob Carol Ted Alice"
> letters = "adB"
>
> from sets import Set
> alist = [lttr for lttr in astr if lttr in Set(letters)]
> newstr = ""
> for lttr in alist:
>     newstr += lttr
>
> But this seems ugly. I especially don't like "newstr += lttr" because it makes
> a new string every time. I am thinking that something like this has to be a
> function somewhere already or that I can make it more efficient using a
> built-in tool.
>
> Any ideas?
>
> James

I think this will be worth it if your string to modify is _very_ long:

>>> def some_func(s, letters, table=''.join([chr(i) for i in xrange(256)])):
...     return s.translate(table,
...        ''.join([chr(i) for i in xrange(256) if chr(i) not in letters]))
...
>>> some_func("Bob Carol Ted Alice", 'adB')
'Bad'

see help(str.translate)

If you want to use it in a loop, with the same "letters" I'd want to eliminate the repeated
calculation of the deletions. You could make a factory function that returns a function
that uses deletions from a closure cell. But don't optimize prematurely ;-)
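
A minimal sketch of the factory idea just described, assuming Python 3, where the two-argument form of translate lives on bytes rather than str. The names make_filter and keep are hypothetical. The deletion string is computed once and then reused through the closure on every call:

```python
def make_filter(letters):
    """Return a function that keeps only the bytes found in letters."""
    # Computed once per factory call: every byte value that is NOT whitelisted.
    deletions = bytes(b for b in range(256) if b not in letters)

    def keep(s):
        # A table of None means "no remapping"; the second argument
        # lists the byte values to drop.
        return s.translate(None, deletions)

    return keep


keep_bad = make_filter(b"adB")
print(keep_bad(b"Bob Carol Ted Alice"))  # prints b'Bad'
```

Calling keep_bad in a loop repeats only the translate step, not the construction of the deletion string.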

Regards,
Bengt Richter
Jul 19 '05 #4

P: n/a
Bengt Richter wrote:
> I think this will be worth it if your string to modify is _very_ long:
>
> >>> def some_func(s, letters, table=''.join([chr(i) for i in xrange(256)])):
> ...     return s.translate(table,
> ...        ''.join([chr(i) for i in xrange(256) if chr(i) not in letters]))
> ...
> >>> some_func("Bob Carol Ted Alice", 'adB')
> 'Bad'

According to my measurements the string doesn't have to be long at all before
your method is faster - cool use of str.translate:

>>> def some_func(s, letters, table=''.join([chr(i) for i in xrange(256)])):
...     return s.translate(table,
...        ''.join([chr(i) for i in xrange(256) if chr(i) not in letters]))
...
>>> some_func("Bob Carol Ted Alice", 'adB')
'Bad'
>>> def func_join(s, letters):
...     return "".join(letter for letter in s if letter in set(letters))
...
>>> def func_join1(s, letters):
...     return "".join(letter for letter in s if letter in letters)
...
>>> for multiplier in (1, 10, 100, 1000, 10000):
...     print "List multiplier: %s" % multiplier
...     print shell.timefunc(func_join, "Bob Carol Ted Alice" * multiplier, 'adB')
...     print shell.timefunc(func_join1, "Bob Carol Ted Alice" * multiplier, 'adB')
...     print shell.timefunc(some_func, "Bob Carol Ted Alice" * multiplier, 'adB')
...
List multiplier: 1
func_join(...) 11267 iterations, 44.38usec per call
func_join1(...) 38371 iterations, 13.03usec per call
some_func(...) 1230 iterations, 406.69usec per call
List multiplier: 10
func_join(...) 1381 iterations, 362.40usec per call
func_join1(...) 7984 iterations, 62.63usec per call
some_func(...) 1226 iterations, 407.94usec per call
List multiplier: 100
func_join(...) 140 iterations, 3.59msec per call
func_join1(...) 873 iterations, 0.57msec per call
some_func(...) 1184 iterations, 422.42usec per call
List multiplier: 1000
func_join(...) 15 iterations, 35.50msec per call
func_join1(...) 90 iterations, 5.57msec per call
some_func(...) 949 iterations, 0.53msec per call
List multiplier: 10000
func_join(...) 2 iterations, 356.53msec per call
func_join1(...) 9 iterations, 55.59msec per call
some_func(...) 313 iterations, 1.60msec per call


Michael

Jul 19 '05 #5

P: n/a
Michael Spencer wrote:
> Bengt Richter wrote:
> > I think this will be worth it if your string to modify is _very_ long:
> >
> > >>> def some_func(s, letters, table=''.join([chr(i) for i in xrange(256)])):
> > ...     return s.translate(table,
> > ...        ''.join([chr(i) for i in xrange(256) if chr(i) not in letters]))
> > ...
> > >>> some_func("Bob Carol Ted Alice", 'adB')
> > 'Bad'
>
> According to my measurements the string doesn't have to be long at all
> before your method is faster - cool use of str.translate:

...and here's a version that appears faster than "".join across all lengths of
strings:

>>> import string
>>> def some_func1(s, letters, table=string.maketrans("","")):
...     return s.translate(table, table.translate(table, letters))
...
>>> some_func1("Bob Carol Ted Alice", "adB")
'Bad'

Timings follow:

>>> def some_func(s, letters, table=''.join([chr(i) for i in xrange(256)])):
...     return s.translate(table,
...        ''.join([chr(i) for i in xrange(256) if chr(i) not in letters]))
...
>>> def some_func1(s, letters, table=string.maketrans("","")):
...     return s.translate(table, table.translate(table, letters))
...
>>> for multiplier in (1, 10, 100, 1000, 10000):
...     print "List multiplier: %s" % multiplier
...     print shell.timefunc(some_func, "Bob Carol Ted Alice" * multiplier, 'adB')
...     print shell.timefunc(some_func1, "Bob Carol Ted Alice" * multiplier, 'adB')
...
List multiplier: 1
some_func(...) 1224 iterations, 408.57usec per call
some_func1(...) 61035 iterations, 8.19usec per call
List multiplier: 10
some_func(...) 1223 iterations, 408.95usec per call
some_func1(...) 54420 iterations, 9.19usec per call
List multiplier: 100
some_func(...) 1190 iterations, 420.48usec per call
some_func1(...) 23436 iterations, 21.34usec per call
List multiplier: 1000
some_func(...) 951 iterations, 0.53msec per call
some_func1(...) 3870 iterations, 129.21usec per call
List multiplier: 10000
some_func(...) 309 iterations, 1.62msec per call
some_func1(...) 417 iterations, 1.20msec per call
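
The same trick can be sketched for Python 3, where string.maketrans no longer exists and the identity table comes from bytes.maketrans instead. This assumes bytes input; the names IDENTITY and some_func1_py3 are hypothetical. Deleting the whitelisted bytes from the identity table leaves exactly the bytes to strip from the input:

```python
# Identity table: 256 bytes, each mapped to itself.
IDENTITY = bytes.maketrans(b"", b"")


def some_func1_py3(s, letters, table=IDENTITY):
    # Remove the whitelisted bytes from the identity table; what is left
    # is the set of bytes that should be deleted from s.
    unwanted = table.translate(table, letters)
    return s.translate(table, unwanted)


print(some_func1_py3(b"Bob Carol Ted Alice", b"adB"))  # prints b'Bad'
```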


Jul 19 '05 #6

P: n/a
On Tue, 19 Apr 2005 17:00:02 -0700, Michael Spencer <ma**@telcopartners.com> wrote:
> Michael Spencer wrote:
> > Bengt Richter wrote:
> > > I think this will be worth it if your string to modify is _very_ long:
> > >
> > > >>> def some_func(s, letters, table=''.join([chr(i) for i in xrange(256)])):
> > > ...     return s.translate(table,
> > > ...        ''.join([chr(i) for i in xrange(256) if chr(i) not in letters]))
> > > ...
> > > >>> some_func("Bob Carol Ted Alice", 'adB')
> > > 'Bad'
> >
> > According to my measurements the string doesn't have to be long at all
> > before your method is faster - cool use of str.translate:
>
> ...and here's a version that appears faster than "".join across all lengths of
> strings:
>
> >>> import string
> >>> def some_func1(s, letters, table=string.maketrans("","")):
> ...     return s.translate(table, table.translate(table, letters))
> ...
> >>> some_func1("Bob Carol Ted Alice", "adB")
> 'Bad'

Good one! ;-)

BTW, since str has .translate, why not .maketrans?

Anyway, this will be something to keep in mind when doing character-based joinery ;-)
> Timings follow:

Let's just say improved ;-)
(or see parent post)

Regards,
Bengt Richter
Jul 19 '05 #7

P: n/a
Michael Spencer wrote:
> def func_join(s, letters):
> ...     return "".join(letter for letter in s if letter in set(letters))

Make that

def func_join(s, letters):
    letter_set = set(letters)
    return "".join(letter for letter in s if letter in letter_set)

for a fair timing of a set lookup as opposed to set creation.
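
The difference can be made visible by counting how often the set is built. This is a minimal sketch assuming Python 3; the SetBuilder class, the make_set parameter and func_join_hoisted are hypothetical helpers added only for the illustration:

```python
class SetBuilder:
    """Callable that builds a set and counts how many times it was asked to."""

    def __init__(self):
        self.builds = 0

    def __call__(self, letters):
        self.builds += 1
        return set(letters)


def func_join(s, letters, make_set=set):
    # The condition runs once per character, so the set is rebuilt each time.
    return "".join(letter for letter in s if letter in make_set(letters))


def func_join_hoisted(s, letters, make_set=set):
    # The set is built once, before the generator expression starts.
    letter_set = make_set(letters)
    return "".join(letter for letter in s if letter in letter_set)


text = "Bob Carol Ted Alice"
inline, hoisted = SetBuilder(), SetBuilder()
func_join(text, "adB", make_set=inline)
func_join_hoisted(text, "adB", make_set=hoisted)
print(inline.builds, hoisted.builds)  # prints 19 1
```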

Peter

Jul 19 '05 #8

P: n/a
Bengt Richter wrote:
... BTW, since str has .translate, why not .maketrans?

Probably because, while I can imagine u'whatever'.translate using a
256-wide table (and raising exceptions for the rest), I have
more problems imagining the size of the table for a UCS-4 unicode
setup (32 bits per character). I suppose it could be done, but a
naive program might be in for a big shock about memory consumption.

--Scott David Daniels
Sc***********@Acm.Org
Jul 19 '05 #9
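
For comparison, Python 3's str.translate accepts a mapping from code points to replacements, so no full-width table is needed for Unicode text. A minimal sketch under that assumption; keep_letters_unicode is a hypothetical name, and the sparse-mapping approach is swapped in here for illustration rather than taken from the thread:

```python
def keep_letters_unicode(s, letters):
    """Keep only the characters of s that occur in letters."""
    # Sparse mapping: only code points that actually occur in s and are
    # not whitelisted get an entry; None means "delete this character".
    delete_map = {ord(ch): None for ch in set(s) if ch not in letters}
    return s.translate(delete_map)


print(keep_letters_unicode("Bob Carol Ted Alice", "adB"))  # prints Bad
```

The mapping grows with the number of distinct characters in the input, not with the size of the character set.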

P: n/a
Peter Otten wrote:
> Michael Spencer wrote:
> > def func_join(s, letters):
> > ...     return "".join(letter for letter in s if letter in set(letters))
>
> Make that
>
> def func_join(s, letters):
>     letter_set = set(letters)
>     return "".join(letter for letter in s if letter in letter_set)
>
> for a fair timing of a set lookup as opposed to set creation.
>
> Peter

Sorry - yes! I tripped over the early binding of the outer loop versus the
late binding of the condition.
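
The binding behaviour mentioned here can be observed directly. A minimal sketch, assuming Python 3; binding_demo and noisy are hypothetical helpers. The outermost iterable of a generator expression is evaluated when the expression is created, while the condition is evaluated again for every item as the generator is consumed:

```python
def binding_demo():
    log = []

    def noisy(label, value):
        # Record each evaluation so the order can be inspected afterwards.
        log.append(label)
        return value

    gen = (ch for ch in noisy("iterable", "abc") if ch in noisy("condition", "ab"))
    seen_at_creation = list(log)  # snapshot before the generator is consumed
    result = "".join(gen)
    return seen_at_creation, log, result


at_creation, final_log, result = binding_demo()
print(at_creation)  # ['iterable']
print(final_log)    # ['iterable', 'condition', 'condition', 'condition']
print(result)       # ab
```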

Anyway, here are the revised timings, which confirm the speed-advantage of the
translate approach. And, as before, with such a short list of white-listed
letters, it does not pay to create a set at all, even outside the loop. Note
the speed advantage of func_translate1 is 50:1 for long strings, so as Bengt
pointed out, it's worth keeping this in mind for character-based filtering/joining.

>>> def func_join1(s, letters):
...     return "".join(letter for letter in s if letter in letters)
...
>>> def func_join2(s, letters):
...     letter_set = set(letters)
...     return "".join(letter for letter in s if letter in letter_set)
...
>>> def func_translate1(s, letters, table=string.maketrans("","")):
...     return s.translate(table, table.translate(table, letters))
...
>>> for multiplier in (1, 10, 100, 1000, 10000):
...     print "List multiplier: %s" % multiplier
...     print shell.timefunc(func_translate1, "Bob Carol Ted Alice" * multiplier, 'adB')
...     print shell.timefunc(func_join1, "Bob Carol Ted Alice" * multiplier, 'adB')
...     print shell.timefunc(func_join2, "Bob Carol Ted Alice" * multiplier, 'adB')
...
List multiplier: 1
func_translate1(...) 62295 iterations, 8.03usec per call
func_join1(...) 36510 iterations, 13.69usec per call
func_join2(...) 30139 iterations, 16.59usec per call
List multiplier: 10
func_translate1(...) 53145 iterations, 9.41usec per call
func_join1(...) 7821 iterations, 63.93usec per call
func_join2(...) 7031 iterations, 71.12usec per call
List multiplier: 100
func_translate1(...) 23170 iterations, 21.58usec per call
func_join1(...) 858 iterations, 0.58msec per call
func_join2(...) 777 iterations, 0.64msec per call
List multiplier: 1000
func_translate1(...) 3761 iterations, 132.96usec per call
func_join1(...) 87 iterations, 5.76msec per call
func_join2(...) 81 iterations, 6.18msec per call
List multiplier: 10000
func_translate1(...) 407 iterations, 1.23msec per call
func_join1(...) 9 iterations, 56.27msec per call
func_join2(...) 8 iterations, 64.76msec per call


Jul 19 '05 #10

P: n/a
Michael Spencer wrote:
> Anyway, here are the revised timings...
> ...     print shell.timefunc(func_translate1, "Bob Carol Ted Alice" * multiplier, 'adB')


What is shell.timefunc?

Thanks,
Kent
Jul 19 '05 #11

P: n/a
Kent Johnson wrote:
> Michael Spencer wrote:
> > Anyway, here are the revised timings...
> > ...     print shell.timefunc(func_translate1, "Bob Carol Ted Alice" * multiplier, 'adB')
>
> What is shell.timefunc?


This snippet, which I attach to my interactive shell, since I find timeit
awkward to use in that context:

import sys
import time

def _get_timer():
    # time.clock has the finer resolution on Windows, time.time elsewhere
    if sys.platform == "win32":
        return time.clock
    else:
        return time.time

def timefunc(func, *args, **kwds):
    timer = _get_timer()
    count, totaltime = 0, 0
    # call func repeatedly until at least 0.5 seconds of run time accumulate
    while totaltime < 0.5:
        t1 = timer()
        res = func(*args, **kwds)
        t2 = timer()
        totaltime += (t2-t1)
        count += 1
    # report microseconds for fast calls, milliseconds otherwise
    if count > 1000:
        unit = "usec"
        timeper = totaltime * 1000000 / count
    else:
        unit = "msec"
        timeper = totaltime * 1000 / count
    return "%s(...) %s iterations, %.2f%s per call" % \
        (func.__name__, count, timeper, unit)
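
A rough Python 3 equivalent can be sketched with time.perf_counter, since time.clock was removed in Python 3.8. The min_time parameter is a hypothetical addition that is not part of the snippet above; it only makes the measurement window adjustable:

```python
import time


def timefunc(func, *args, min_time=0.5, **kwds):
    """Call func repeatedly for at least min_time seconds; report the mean."""
    count, totaltime = 0, 0.0
    while totaltime < min_time:
        t1 = time.perf_counter()
        func(*args, **kwds)
        t2 = time.perf_counter()
        totaltime += t2 - t1
        count += 1
    # Same unit heuristic as the original: many iterations means fast calls.
    if count > 1000:
        unit, timeper = "usec", totaltime * 1000000 / count
    else:
        unit, timeper = "msec", totaltime * 1000 / count
    return "%s(...) %s iterations, %.2f%s per call" % (
        func.__name__, count, timeper, unit)


print(timefunc(sorted, "Bob Carol Ted Alice" * 100, min_time=0.05))
```

For one-off measurements outside an interactive shell, the standard timeit module remains the more usual choice.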

Michael

Jul 19 '05 #12
