Array of Chars to String

Hello,

I am looking for a nice way to take only those characters from a string that
are in another string and make a new string:
astr = "Bob Carol Ted Alice"
letters = "adB"
some_func(astr,letters)

"Bad"

I can write this like this:

astr = "Bob Carol Ted Alice"
letters = "adB"

from sets import Set
alist = [lttr for lttr in astr if lttr in Set(letters)]
newstr = ""
for lttr in alist:
    newstr += lttr

But this seems ugly. I especially don't like "newstr += lttr" because it makes
a new string every time. I am thinking that something like this has to be a
function somewhere already or that I can make it more efficient using a
built-in tool.

Any ideas?

James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/
Jul 19 '05 #1
James Stroud <js*****@mbi.ucla.edu> writes:
But this seems ugly. I especially don't like "newstr += lttr" because it makes
a new string every time. I am thinking that something like this has to be a
function somewhere already or that I can make it more efficient using a
built-in tool.


"".join

'as
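
A minimal sketch of the "".join suggestion, assuming a current Python 3 interpreter rather than the Python 2 used in this thread. The name keep_letters is hypothetical; the set is built once, before the loop, so each membership test is a single lookup.

```python
def keep_letters(astr, letters):
    # Build the lookup set once, outside the generator expression.
    wanted = set(letters)
    # join assembles the result in one pass instead of repeated "+=".
    return "".join(ch for ch in astr if ch in wanted)


print(keep_letters("Bob Carol Ted Alice", "adB"))  # prints Bad
```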
Jul 19 '05 #2
rbt
James Stroud wrote:
Hello,

I am looking for a nice way to take only those characters from a string that
are in another string and make a new string:

astr = "Bob Carol Ted Alice"
letters = "adB"
some_func(astr,letters)
"Bad"


astr = "Bob Carol Ted Alice"
letters = "adB"
both = [x for x in astr if x in letters]
print both
['B', 'a', 'd']

Jul 19 '05 #3
On Tue, 19 Apr 2005 13:33:17 -0700, James Stroud <js*****@mbi.ucla.edu> wrote:
Hello,

I am looking for a nice way to take only those characters from a string that
are in another string and make a new string:
astr = "Bob Carol Ted Alice"
letters = "adB"
some_func(astr,letters)
"Bad"

I can write this like this:

astr = "Bob Carol Ted Alice"
letters = "adB"

from sets import Set
alist = [lttr for lttr in astr if lttr in Set(letters)]
newstr = ""
for lttr in alist:
    newstr += lttr

But this seems ugly. I especially don't like "newstr += lttr" because it makes
a new string every time. I am thinking that something like this has to be a
function somewhere already or that I can make it more efficient using a
built-in tool.

Any ideas?

James

I think this will be worth it if your string to modify is _very_ long:
>>> def some_func(s, letters, table=''.join([chr(i) for i in xrange(256)])):
...     return s.translate(table,
...         ''.join([chr(i) for i in xrange(256) if chr(i) not in letters]))
...
>>> some_func("Bob Carol Ted Alice", 'adB')
'Bad'

see help(str.translate)

If you want to use it in a loop, with the same "letters" I'd want to eliminate the repeated
calculation of the deletions. You could make a factory function that returns a function
that uses deletions from a closure cell. But don't optimize prematurely ;-)
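
One way the factory idea could look, as a sketch only: it assumes Python 3 and text that fits in Latin-1, so that the 256-entry byte table from the example above still applies. The names make_filter and keep_only are hypothetical.

```python
def make_filter(letters):
    """Return a function that keeps only the characters found in `letters`."""
    keep = set(letters.encode("latin-1"))
    # Computed once per factory call and captured by the closure below.
    deletions = bytes(b for b in range(256) if b not in keep)

    def keep_only(s):
        # bytes.translate(None, delete) removes every byte listed in `delete`.
        return s.encode("latin-1").translate(None, deletions).decode("latin-1")

    return keep_only


bad = make_filter("adB")
print(bad("Bob Carol Ted Alice"))  # prints Bad
```

The deletion string is computed when make_filter runs, so calling bad(...) repeatedly in a loop does not rebuild it.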

Regards,
Bengt Richter
Jul 19 '05 #4
Bengt Richter wrote:
I think this will be worth it if your string to modify is _very_ long:
>>> def some_func(s, letters, table=''.join([chr(i) for i in xrange(256)])):
...     return s.translate(table,
...         ''.join([chr(i) for i in xrange(256) if chr(i) not in letters]))
...
>>> some_func("Bob Carol Ted Alice", 'adB')
'Bad'

According to my measurements the string doesn't have to be long at all before
your method is faster - cool use of str.translate:
>>> def some_func(s, letters, table=''.join([chr(i) for i in xrange(256)])):
...     return s.translate(table,
...         ''.join([chr(i) for i in xrange(256) if chr(i) not in letters]))
...
>>> some_func("Bob Carol Ted Alice", 'adB')
'Bad'
>>> def func_join(s, letters):
...     return "".join(letter for letter in s if letter in set(letters))
...
>>> def func_join1(s, letters):
...     return "".join(letter for letter in s if letter in letters)
...
>>> for multiplier in (1, 10, 100, 1000, 10000):
...     print "List multiplier: %s" % multiplier
...     print shell.timefunc(func_join, "Bob Carol Ted Alice" * multiplier, 'adB')
...     print shell.timefunc(func_join1, "Bob Carol Ted Alice" * multiplier, 'adB')
...     print shell.timefunc(some_func, "Bob Carol Ted Alice" * multiplier, 'adB')
...
List multiplier: 1
func_join(...) 11267 iterations, 44.38usec per call
func_join1(...) 38371 iterations, 13.03usec per call
some_func(...) 1230 iterations, 406.69usec per call
List multiplier: 10
func_join(...) 1381 iterations, 362.40usec per call
func_join1(...) 7984 iterations, 62.63usec per call
some_func(...) 1226 iterations, 407.94usec per call
List multiplier: 100
func_join(...) 140 iterations, 3.59msec per call
func_join1(...) 873 iterations, 0.57msec per call
some_func(...) 1184 iterations, 422.42usec per call
List multiplier: 1000
func_join(...) 15 iterations, 35.50msec per call
func_join1(...) 90 iterations, 5.57msec per call
some_func(...) 949 iterations, 0.53msec per call
List multiplier: 10000
func_join(...) 2 iterations, 356.53msec per call
func_join1(...) 9 iterations, 55.59msec per call
some_func(...) 313 iterations, 1.60msec per call


Michael

Jul 19 '05 #5
Michael Spencer wrote:
Bengt Richter wrote:
> I think this will be worth it if your string to modify is _very_ long:

>>> def some_func(s, letters, table=''.join([chr(i) for i in xrange(256)])):
...     return s.translate(table,
...         ''.join([chr(i) for i in xrange(256) if chr(i) not in letters]))
...
>>> some_func("Bob Carol Ted Alice", 'adB')
'Bad'

According to my measurements the string doesn't have to be long at all
before your method is faster - cool use of str.translate:

...and here's a version that appears faster than "".join across all lengths of
strings:
>>> import string
>>> def some_func1(s, letters, table=string.maketrans("","")):
...     return s.translate(table, table.translate(table, letters))
...
>>> some_func1("Bob Carol Ted Alice", "adB")
'Bad'
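
The double translate is easier to follow when split into two steps. This is a sketch, assuming Python 3 and byte strings (the closest modern equivalent of the Python 2 str used here); IDENTITY and unwanted are illustrative names.

```python
IDENTITY = bytes(range(256))  # same contents as bytes.maketrans(b"", b"")


def some_func1(s, letters):
    # Step 1: delete the wanted bytes from the identity table, leaving
    # exactly the bytes that should be removed from s.
    unwanted = IDENTITY.translate(None, letters)
    # Step 2: delete those unwanted bytes from s.
    return s.translate(None, unwanted)


print(some_func1(b"Bob Carol Ted Alice", b"adB"))  # prints b'Bad'
```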
Timings follow:
>>> def some_func(s, letters, table=''.join([chr(i) for i in xrange(256)])):
...     return s.translate(table,
...         ''.join([chr(i) for i in xrange(256) if chr(i) not in letters]))
...
>>> def some_func1(s, letters, table=string.maketrans("","")):
...     return s.translate(table, table.translate(table, letters))
...
>>> for multiplier in (1, 10, 100, 1000, 10000):
...     print "List multiplier: %s" % multiplier
...     print shell.timefunc(some_func, "Bob Carol Ted Alice" * multiplier, 'adB')
...     print shell.timefunc(some_func1, "Bob Carol Ted Alice" * multiplier, 'adB')
...
List multiplier: 1
some_func(...) 1224 iterations, 408.57usec per call
some_func1(...) 61035 iterations, 8.19usec per call
List multiplier: 10
some_func(...) 1223 iterations, 408.95usec per call
some_func1(...) 54420 iterations, 9.19usec per call
List multiplier: 100
some_func(...) 1190 iterations, 420.48usec per call
some_func1(...) 23436 iterations, 21.34usec per call
List multiplier: 1000
some_func(...) 951 iterations, 0.53msec per call
some_func1(...) 3870 iterations, 129.21usec per call
List multiplier: 10000
some_func(...) 309 iterations, 1.62msec per call
some_func1(...) 417 iterations, 1.20msec per call


Jul 19 '05 #6
On Tue, 19 Apr 2005 17:00:02 -0700, Michael Spencer <ma**@telcopartners.com> wrote:
Michael Spencer wrote:
Bengt Richter wrote:
> I think this will be worth it if your string to modify is _very_ long:

>>> def some_func(s, letters, table=''.join([chr(i) for i in xrange(256)])):
...     return s.translate(table,
...         ''.join([chr(i) for i in xrange(256) if chr(i) not in letters]))
...
>>> some_func("Bob Carol Ted Alice", 'adB')
'Bad'

According to my measurements the string doesn't have to be long at all
before your method is faster - cool use of str.translate:

...and here's a version that appears faster than "".join across all lengths of
strings:
>>> import string
>>> def some_func1(s, letters, table=string.maketrans("","")):
...     return s.translate(table, table.translate(table, letters))
...
>>> some_func1("Bob Carol Ted Alice", "adB")
'Bad'

Good one! ;-)

BTW, since str has .translate, why not .maketrans?

Anyway, this will be something to keep in mind when doing character-based joinery ;-)
Timings follow:

Let's just say improved ;-)
(or see parent post)

Regards,
Bengt Richter
Jul 19 '05 #7
Michael Spencer wrote:
>>> def func_join(s, letters):
...     return "".join(letter for letter in s if letter in set(letters))

Make that

def func_join(s, letters):
    letter_set = set(letters)
    return "".join(letter for letter in s if letter in letter_set)

for a fair timing of a set lookup as opposed to set creation.

Peter

Jul 19 '05 #8
Bengt Richter wrote:
... BTW, since str has .translate, why not .maketrans?

Probably because, while I can imagine u'whatever'.translate using a
256-wide table (and raising exceptions for the rest), I have
more problems imagining the size of the table for a UCS-4 unicode
setup (32 bits per character). I suppose it could be done, but a
naïve program might be in for a big shock about memory consumption.
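
For reference, Python 3 sidesteps the table-size problem: str.translate accepts any mapping from code point to replacement, and str.maketrans exists as a static method. The sketch below assumes Python 3; the KeepOnly class and keep_only function are hypothetical names. Only the wanted characters are stored, and everything else is deleted through __missing__.

```python
class KeepOnly(dict):
    """Sparse translation table: code points not stored are deleted."""

    def __missing__(self, codepoint):
        # Returning None tells str.translate to drop this character.
        return None


def keep_only(s, letters):
    # One entry per wanted character, regardless of the size of Unicode.
    table = KeepOnly((ord(ch), ch) for ch in letters)
    return s.translate(table)


print(keep_only("Bob Carol Ted Alice", "adB"))  # prints Bad
```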

--Scott David Daniels
Sc***********@Acm.Org
Jul 19 '05 #9
Peter Otten wrote:
Michael Spencer wrote:

>>> def func_join(s, letters):
...     return "".join(letter for letter in s if letter in set(letters))

Make that

def func_join(s, letters):
    letter_set = set(letters)
    return "".join(letter for letter in s if letter in letter_set)

for a fair timing of a set lookup as opposed to set creation.

Peter

Sorry - yes! I tripped up over the early binding of the outer loop versus the
late binding of the condition.
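
A small sketch of that late binding, assuming Python 3; counted_set is a hypothetical helper that only counts how often the set is constructed. The condition of a generator expression is re-evaluated for every item, so the set is rebuilt once per character.

```python
calls = 0


def counted_set(letters):
    # Wrapper around set() that records how many times it is invoked.
    global calls
    calls += 1
    return set(letters)


s = "Bob Carol Ted Alice"
result = "".join(ch for ch in s if ch in counted_set("adB"))
print(result, calls)  # the set was rebuilt once per character of s
```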

Anyway, here are the revised timings, which confirm the speed-advantage of the
translate approach. And, as before, with such a short list of white-listed
letters, it does not pay to create a set at all, even outside the loop. Note
the speed advantage of func_translate1 is roughly 50:1 for long strings, so as Bengt
pointed out, it's worth keeping this in mind for character-based filtering/joining.
>>> def func_join1(s, letters):
...     return "".join(letter for letter in s if letter in letters)
...
>>> def func_join2(s, letters):
...     letter_set = set(letters)
...     return "".join(letter for letter in s if letter in letter_set)
...
>>> def func_translate1(s, letters, table=string.maketrans("","")):
...     return s.translate(table, table.translate(table, letters))
...
>>> for multiplier in (1, 10, 100, 1000, 10000):
...     print "List multiplier: %s" % multiplier
...     print shell.timefunc(func_translate1, "Bob Carol Ted Alice" * multiplier, 'adB')
...     print shell.timefunc(func_join1, "Bob Carol Ted Alice" * multiplier, 'adB')
...     print shell.timefunc(func_join2, "Bob Carol Ted Alice" * multiplier, 'adB')
...
List multiplier: 1
func_translate1(...) 62295 iterations, 8.03usec per call
func_join1(...) 36510 iterations, 13.69usec per call
func_join2(...) 30139 iterations, 16.59usec per call
List multiplier: 10
func_translate1(...) 53145 iterations, 9.41usec per call
func_join1(...) 7821 iterations, 63.93usec per call
func_join2(...) 7031 iterations, 71.12usec per call
List multiplier: 100
func_translate1(...) 23170 iterations, 21.58usec per call
func_join1(...) 858 iterations, 0.58msec per call
func_join2(...) 777 iterations, 0.64msec per call
List multiplier: 1000
func_translate1(...) 3761 iterations, 132.96usec per call
func_join1(...) 87 iterations, 5.76msec per call
func_join2(...) 81 iterations, 6.18msec per call
List multiplier: 10000
func_translate1(...) 407 iterations, 1.23msec per call
func_join1(...) 9 iterations, 56.27msec per call
func_join2(...) 8 iterations, 64.76msec per call


Jul 19 '05 #10
Michael Spencer wrote:
Anyway, here are the revised timings...
... print shell.timefunc(func_translate1, "Bob Carol Ted Alice" *
multiplier, 'adB')


What is shell.timefunc?

Thanks,
Kent
Jul 19 '05 #11
Kent Johnson wrote:
Michael Spencer wrote:
Anyway, here are the revised timings...
... print shell.timefunc(func_translate1, "Bob Carol Ted Alice" *
multiplier, 'adB')

What is shell.timefunc?


This snippet, which I attach to my interactive shell, since I find timeit
awkward to use in that context:

import sys
import time

def _get_timer():
    # On Windows, time.clock has the finer resolution; elsewhere use time.time
    if sys.platform == "win32":
        return time.clock
    else:
        return time.time

def timefunc(func, *args, **kwds):
    timer = _get_timer()
    count, totaltime = 0, 0
    # Repeat the call until at least half a second of run time has accumulated
    while totaltime < 0.5:
        t1 = timer()
        res = func(*args, **kwds)
        t2 = timer()
        totaltime += (t2-t1)
        count += 1
    # Report microseconds for fast functions, milliseconds otherwise
    if count > 1000:
        unit = "usec"
        timeper = totaltime * 1000000 / count
    else:
        unit = "msec"
        timeper = totaltime * 1000 / count
    return "%s(...) %s iterations, %.2f%s per call" % \
        (func.__name__, count, timeper, unit)
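
For comparison, a rough equivalent built on the standard timeit module, sketched for Python 3. The time_per_call helper is hypothetical; unlike timefunc it runs a fixed number of calls instead of looping for half a second.

```python
import timeit


def func_join1(s, letters):
    return "".join(letter for letter in s if letter in letters)


def time_per_call(func, *args, number=1000):
    # timeit.timeit returns the total seconds taken for `number` calls.
    total = timeit.timeit(lambda: func(*args), number=number)
    return total / number


usec = time_per_call(func_join1, "Bob Carol Ted Alice" * 10, "adB") * 1e6
print("func_join1(...) %.2fusec per call" % usec)
```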

Michael

Jul 19 '05 #12
