s.split() on multiple separators

Hello everyone,

OK, so I want to split a string c into words using several different
separators from a list (dels).

I can do this the following C-like way:
>>> c=' abcde abc cba fdsa bcd '.split()
>>> dels='ce '
>>> for j in dels:
...     cp=[]
...     for i in xrange(0,len(c)-1):
...         cp.extend(c[i].split(j))
...     c=cp
...
>>> c
['ab', 'd', '', 'ab', '', '']

But surely there is a more Pythonic way to do this?

I cannot do this:
>>> for i in dels:
...     c=[x.split(i) for x in c]

because x.split(i) is itself a list, so c would become a list of lists.

Sep 30 '07 #1
On 9/30/07, mr****@gmail.com <mr****@gmail.com> wrote:
> Hello everyone,
>
> OK, so I want to split a string c into words using several different
> separators from a list (dels).

Have a look at this recipe:

http://aspn.activestate.com/ASPN/Coo.../Recipe/303342

which contains several ways to solve the problem. You could either
translate all your separators to a single one and then split on it,
or (maybe the simpler solution) go for the list comprehension
solution.
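
Both ideas can be sketched roughly as follows. This is only an illustration, not the recipe's own code: it assumes Python 3 (str.maketrans), assumes every separator is a single character, and the helper names split_by_translate and split_by_flattening are made up here.

```python
def split_by_translate(text, dels):
    """Map every separator to the first one, then split once on it."""
    if not dels:
        return [text]
    target = dels[0]
    # Translation table: each separator character -> the target separator.
    table = str.maketrans({d: target for d in dels})
    return text.translate(table).split(target)


def split_by_flattening(text, dels):
    """Split on each separator in turn, flattening the nested lists."""
    parts = [text]
    for d in dels:
        # The double 'for' flattens the list of lists that split() produces.
        parts = [piece for part in parts for piece in part.split(d)]
    return parts


c = ' abcde abc cba fdsa bcd '
print(split_by_translate(c, 'ce '))
print(split_by_flattening(c, 'ce '))
```

The flattening comprehension is also one way around the "x.split(i) is a list" problem from the original question.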

francesco
Sep 30 '07 #2
> OK, so I want to split a string c into words using several different
> separators from a list (dels).
>
> I can do this the following C-like way:
> >>> c=' abcde abc cba fdsa bcd '.split()
> >>> dels='ce '
> >>> for j in dels:
> ...     cp=[]
> ...     for i in xrange(0,len(c)-1):
> ...         cp.extend(c[i].split(j))
> ...     c=cp
> ...
> >>> c
> ['ab', 'd', '', 'ab', '', '']

Given your original string, I'm not sure how that would be the
expected result of "split c on the characters in dels".

While there's a certain faction of pythonistas that don't esteem
regular expressions (or at least find them overused/misused,
which I'd certainly agree to), they may be able to serve your
purposes well:
>>> c=' abcde abc cba fdsa bcd '
>>> import re
>>> r = re.compile('[ce ]')
>>> r.split(c)
['', 'ab', 'd', '', 'ab', '', '', 'ba', 'fdsa', 'b', 'd', '']

given that a regexp object has a split() method.
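
If the separators arrive as a string or list rather than a hand-written pattern, the character class could be built from them. A minimal sketch, assuming Python 3, single-character separators, and a non-empty dels; split_on_any is a hypothetical helper name, not part of the re module.

```python
import re


def split_on_any(text, dels):
    """Split text on any single character found in dels (must be non-empty)."""
    # re.escape protects characters that are special inside [...],
    # such as ']', '^', '-' or a backslash.
    pattern = re.compile('[' + re.escape(''.join(dels)) + ']')
    return pattern.split(text)


c = ' abcde abc cba fdsa bcd '
print(split_on_any(c, 'ce '))
print(split_on_any('a.b-c]d', ['.', '-', ']']))
```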

-tkc

Sep 30 '07 #4
mr****@gmail.com wrote:
> Hello everyone,
>
> OK, so I want to split a string c into words using several different
> separators from a list (dels).
>
> I can do this the following C-like way:
>
> c=' abcde abc cba fdsa bcd '.split()
> dels='ce '
> for j in dels:
>     cp=[]
>     for i in xrange(0,len(c)-1):

The "-1" looks like a bug; remember in Python 'stop' bounds
are exclusive. The indexes of c are simply xrange(len(c)).

Python 2.3 and up offers: for (i, word) in enumerate(c):

>         cp.extend(c[i].split(j))
>     c=cp
> c
> ['ab', 'd', '', 'ab', '', '']

The bug lost some words, such as 'fdsa'.

> But surely there is a more Pythonic way to do this?

When string.split() doesn't quite cut it, try re.split(), or
maybe re.findall(). Is one of these what you want?

import re

c = ' abcde abc cba fdsa bcd '

print re.split('[ce ]', c)

print re.split('[ce ]+', c)

print re.findall('[^ce ]+', c)
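
For comparison, here is how the three calls differ in their handling of empty fields. This is the same code as a sketch in Python 3 syntax (print as a function), which is an assumption; the thread itself uses Python 2.

```python
import re

c = ' abcde abc cba fdsa bcd '

# One separator character per split point: empty strings appear between
# adjacent separators and at both ends.
print(re.split('[ce ]', c))
# ['', 'ab', 'd', '', 'ab', '', '', 'ba', 'fdsa', 'b', 'd', '']

# A run of separators counts as one split point; only the leading and
# trailing empty strings remain.
print(re.split('[ce ]+', c))
# ['', 'ab', 'd', 'ab', 'ba', 'fdsa', 'b', 'd', '']

# Collect runs of non-separator characters instead: no empty strings.
print(re.findall('[^ce ]+', c))
# ['ab', 'd', 'ab', 'ba', 'fdsa', 'b', 'd']
```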
--
--Bryan
Sep 30 '07 #5
On Sep 30, 8:53 am, mrk...@gmail.com wrote:
> Hello everyone,
>
> OK, so I want to split a string c into words using several different
> separators from a list (dels).
>
> I can do this the following C-like way:
> >>> c=' abcde abc cba fdsa bcd '.split()
> >>> dels='ce '
> >>> for j in dels:
> ...     cp=[]
> ...     for i in xrange(0,len(c)-1):
> ...         cp.extend(c[i].split(j))
> ...     c=cp
> ...
> >>> c
> ['ab', 'd', '', 'ab', '', '']
>
> But surely there is a more Pythonic way to do this?
>
> I cannot do this:
> >>> for i in dels:
> ...     c=[x.split(i) for x in c]
>
> because x.split(i) is a list.

E:\Ruby>irb
irb(main):001:0> ' abcde abc cba fdsa bcd '.split(/[ce ]/)
=> ["", "ab", "d", "", "ab", "", "", "ba", "fdsa", "b", "d"]

Sep 30 '07 #6
> > ['ab', 'd', '', 'ab', '', '']
>
> Given your original string, I'm not sure how that would be the
> expected result of "split c on the characters in dels".

Oops, the inner loop should be:

    for i in xrange(0,len(c)):

Now it works.

> >>> c=' abcde abc cba fdsa bcd '
> >>> import re
> >>> r = re.compile('[ce ]')
> >>> r.split(c)
> ['', 'ab', 'd', '', 'ab', '', '', 'ba', 'fdsa', 'b', 'd', '']
>
> given that a regexp object has a split() method.

That's probably the optimal solution. Thanks!

Regards,
Marcin

Sep 30 '07 #7
On 30 Sep, 20:27, William James <w_a_x_...@yahoo.com> wrote:
> On Sep 30, 8:53 am, mrk...@gmail.com wrote:
> E:\Ruby>irb
> irb(main):001:0> ' abcde abc cba fdsa bcd '.split(/[ce ]/)
> => ["", "ab", "d", "", "ab", "", "", "ba", "fdsa", "b", "d"]

That's acceptable only if you write a perfect Ruby-to-Python
translator. ;-P

Regards,
Marcin

Sep 30 '07 #8
> > c=' abcde abc cba fdsa bcd '.split()
> > dels='ce '
> > for j in dels:
> >     cp=[]
> >     for i in xrange(0,len(c)-1):
>
> The "-1" looks like a bug; remember in Python 'stop' bounds
> are exclusive. The indexes of c are simply xrange(len(c)).

Yep. Just found it out, though this seems a bit counterintuitive to
me, even if it makes for more elegant code: I forgot that the high
stop bound is exclusive.
From my POV, if I want a sequence from here to there, it should include
both here and there.

I do understand the consequence of making the high bound exclusive, which
is more elegant code: xrange(len(c)). But it does seem a bit
illogical...

> print re.split('[ce ]', c)

Yes, that does the job. Thanks.

Regards,
Marcin

Sep 30 '07 #9
On Sep 30, 8:16 pm, mrk...@gmail.com wrote:
> > > c=' abcde abc cba fdsa bcd '.split()
> > > dels='ce '
> > > for j in dels:
> > >     cp=[]
> > >     for i in xrange(0,len(c)-1):
> >
> > The "-1" looks like a bug; remember in Python 'stop' bounds
> > are exclusive. The indexes of c are simply xrange(len(c)).
>
> Yep. Just found it out, though this seems a bit counterintuitive to
> me, even if it makes for more elegant code: I forgot about the high
> stop bound.

You made the common mistake of using a loop index instead of iterating
directly.
Instead of:
    for i in xrange(len(c)):
        cp.extend(c[i].split(j))

Just write:
    for words in c:
        cp.extend(words.split(j))

Then you won't make a bounds mistake, and this snippet becomes a LOT
more readable.

(Of course, you're better off using re.split here, but the
principle is good).
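
Put together, the original nested loop with direct iteration might look like the sketch below. The function name split_each is made up for illustration, and the code is written so it runs on Python 3 (it uses no xrange).

```python
def split_each(words, dels):
    """Split every word on each separator in turn, iterating directly."""
    for sep in dels:
        pieces = []
        for word in words:  # no index, so no off-by-one bound to get wrong
            pieces.extend(word.split(sep))
        words = pieces
    return words


c = ' abcde abc cba fdsa bcd '.split()
print(split_each(c, 'ce '))
# ['ab', 'd', '', 'ab', '', '', 'ba', 'fdsa', 'b', 'd']
```

Unlike the buggy version, this keeps 'fdsa' and the pieces of 'bcd'. It differs from re.split('[ce ]', ...) on the raw string only in the leading and trailing empty strings, which the initial .split() has already removed.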

--
Paul Hankin

Sep 30 '07 #10
On Sun, 30 Sep 2007 16:16:30 -0300, <mr****@gmail.com> wrote:
> From my POV, if I want sequence from here to there, it should include
> both here and there.
>
> I do understand the consequences of making high bound exclusive, which
> is more elegant code: xrange(len(c)). But it does seem a bit
> illogical...

See this note from E. W. Dijkstra (1982), where he argues that the
half-open convention, which is the one Python uses, is the best choice.
http://www.cs.utexas.edu/users/EWD/t...xx/EWD831.html
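
A few properties of the half-open convention can be checked directly. This is a small sketch using Python 3's range (the thread's xrange is the Python 2 equivalent); the values of start, stop and k are arbitrary examples.

```python
c = ['abcde', 'abc', 'cba', 'fdsa', 'bcd']

# The stop bound is excluded, so range(len(c)) yields exactly the valid indexes.
assert list(range(len(c))) == [0, 1, 2, 3, 4]

# The length of a half-open range is simply stop - start.
start, stop = 2, 5
assert len(range(start, stop)) == stop - start

# Adjacent ranges meet without overlap or gap: [0, k) followed by [k, n).
k = 3
assert list(range(0, k)) + list(range(k, len(c))) == list(range(len(c)))

# The empty range needs no special case: start == stop.
assert list(range(4, 4)) == []

# The bug from the original post: stopping at len(c) - 1 drops the last index.
assert list(range(0, len(c) - 1)) == [0, 1, 2, 3]
```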

--
Gabriel Genellina

Oct 1 '07 #11
Gabriel Genellina wrote:
> On Sun, 30 Sep 2007 16:16:30 -0300, <mr****@gmail.com> wrote:
> > From my POV, if I want sequence from here to there, it should include
> > both here and there.
> >
> > I do understand the consequences of making high bound exclusive, which
> > is more elegant code: xrange(len(c)). But it does seem a bit
> > illogical...
>
> See this note from E.W.Dijkstra in 1982 where he says that the Python
> convention is the best choice.
> http://www.cs.utexas.edu/users/EWD/t...xx/EWD831.html

The only thing I agreed with was his conclusion. Clever man.

[david]
Oct 3 '07 #12
