
s.split() on multiple separators

Hello everyone,

OK, so I want to split a string c into words using several different
separators from a list (dels).

I can do this the following C-like way:
>>> c=' abcde abc cba fdsa bcd '.split()
>>> dels='ce '
>>> for j in dels:
...     cp=[]
...     for i in xrange(0,len(c)-1):
...         cp.extend(c[i].split(j))
...     c=cp
...
>>> c
['ab', 'd', '', 'ab', '', '']

But. Surely there is a more Pythonic way to do this?

I cannot do this:
>>> for i in dels:
...     c=[x.split(i) for x in c]

because x.split(i) is a list.

Sep 30 '07 #1
On 9/30/07, mr****@gmail.com <mr****@gmail.com> wrote:
> Hello everyone,
>
> OK, so I want to split a string c into words using several different
> separators from a list (dels).

Have a look at this recipe:

http://aspn.activestate.com/ASPN/Coo.../Recipe/303342

which contains several ways to solve the problem. You could either
translate all your separators to a single one and then split on it,
or (maybe the simpler solution) go for the list comprehension
solution.
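
A minimal sketch of the first approach, assuming every separator is a single character and at least one is given. The helper name split_on_any is made up for illustration, and the code is not taken from the linked recipe:

```python
def split_on_any(text, separators):
    """Split text on any single character in separators, dropping empty pieces."""
    # Use the first separator as the canonical one and map all others onto it.
    canonical = separators[0]
    for sep in separators[1:]:
        text = text.replace(sep, canonical)
    # Adjacent separators produce empty strings; filter them out.
    return [piece for piece in text.split(canonical) if piece]


print(split_on_any(' abcde abc cba fdsa bcd ', 'ce '))
# ['ab', 'd', 'ab', 'ba', 'fdsa', 'b', 'd']
```

Dropping the empty pieces is a choice made in this sketch; remove the filter to keep them.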

francesco
Sep 30 '07 #2
> OK, so I want to split a string c into words using several different
> separators from a list (dels).
>
> I can do this the following C-like way:
> >>> c=' abcde abc cba fdsa bcd '.split()
> >>> dels='ce '
> >>> for j in dels:
> ...     cp=[]
> ...     for i in xrange(0,len(c)-1):
> ...         cp.extend(c[i].split(j))
> ...     c=cp
> ...
> >>> c
> ['ab', 'd', '', 'ab', '', '']

Given your original string, I'm not sure how that would be the
expected result of "split c on the characters in dels".

While there's a certain faction of pythonistas that don't esteem
regular expressions (or at least find them overused/misused,
which I'd certainly agree to), they may be able to serve your
purposes well:
>>> c=' abcde abc cba fdsa bcd '
>>> import re
>>> r = re.compile('[ce ]')
>>> r.split(c)
['', 'ab', 'd', '', 'ab', '', '', 'ba', 'fdsa', 'b', 'd', '']

given that a regexp object has a split() method.
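
If the separators come from a variable rather than a literal, the character class can be built at run time. A small sketch, assuming single-character separators and a non-empty separator string; make_splitter is a hypothetical helper name, not something from the thread:

```python
import re


def make_splitter(separators):
    """Compile a pattern that matches any one character from separators."""
    # re.escape protects characters such as ']' '^' '-' that are special
    # inside a character class.
    return re.compile('[' + re.escape(separators) + ']')


splitter = make_splitter('ce ')
parts = splitter.split(' abcde abc cba fdsa bcd ')
print(parts)
# ['', 'ab', 'd', '', 'ab', '', '', 'ba', 'fdsa', 'b', 'd', '']
```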

-tkc

Sep 30 '07 #4
mr****@gmail.com wrote:
> Hello everyone,
>
> OK, so I want to split a string c into words using several different
> separators from a list (dels).
>
> I can do this the following C-like way:
>
> c=' abcde abc cba fdsa bcd '.split()
> dels='ce '
> for j in dels:
>     cp=[]
>     for i in xrange(0,len(c)-1):

The "-1" looks like a bug; remember in Python 'stop' bounds
are exclusive. The indexes of c are simply xrange(len(c)).

Python 2.3 and up offers: for (i, word) in enumerate(c):

>         cp.extend(c[i].split(j))
>     c=cp
> c
> ['ab', 'd', '', 'ab', '', '']

The bug lost some words, such as 'fdsa'.
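
The effect of the exclusive stop bound can be seen in isolation. This sketch uses Python 3's range() where the thread's Python 2 code uses xrange(); the variable names are chosen for illustration:

```python
words = ['abcde', 'abc', 'cba', 'fdsa', 'bcd']

# range(0, len(words) - 1) stops one short, so the last item is never visited.
visited_buggy = [words[i] for i in range(0, len(words) - 1)]

# range(len(words)) covers every valid index because the stop bound is exclusive.
visited_fixed = [words[i] for i in range(len(words))]

# enumerate() yields (index, item) pairs, so no bound has to be written at all.
pairs = list(enumerate(words))

print(visited_buggy)  # ['abcde', 'abc', 'cba', 'fdsa']
print(visited_fixed)  # ['abcde', 'abc', 'cba', 'fdsa', 'bcd']
print(pairs[0])       # (0, 'abcde')
```

In the quoted loop the list is rebuilt on every pass, so a different last item is skipped each time: 'bcd' on the first pass and 'fdsa' on the second.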

> But. Surely there is a more Pythonic way to do this?

When string.split() doesn't quite cut it, try re.split(), or
maybe re.findall(). Is one of these what you want?

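import re

c = ' abcde abc cba fdsa bcd '

# Split on every single separator character; adjacent separators yield empty strings.
print(re.split('[ce ]', c))

# Split on runs of separators; only the leading and trailing empty strings remain.
print(re.split('[ce ]+', c))

# Collect the runs of non-separator characters instead; no empty strings at all.
print(re.findall('[^ce ]+', c))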
--
--Bryan
Sep 30 '07 #5
On Sep 30, 8:53 am, mrk...@gmail.com wrote:
> Hello everyone,
>
> OK, so I want to split a string c into words using several different
> separators from a list (dels).
>
> I can do this the following C-like way:
> >>> c=' abcde abc cba fdsa bcd '.split()
> >>> dels='ce '
> >>> for j in dels:
> ...     cp=[]
> ...     for i in xrange(0,len(c)-1):
> ...         cp.extend(c[i].split(j))
> ...     c=cp
> ...
> >>> c
> ['ab', 'd', '', 'ab', '', '']
>
> But. Surely there is a more Pythonic way to do this?
>
> I cannot do this:
> >>> for i in dels:
> ...     c=[x.split(i) for x in c]
>
> because x.split(i) is a list.

E:\Ruby>irb
irb(main):001:0> ' abcde abc cba fdsa bcd '.split(/[ce ]/)
=> ["", "ab", "d", "", "ab", "", "", "ba", "fdsa", "b", "d"]

Sep 30 '07 #6
> ['ab', 'd', '', 'ab', '', '']
>
> Given your original string, I'm not sure how that would be the
> expected result of "split c on the characters in dels".

Oops, the inner loop should be:

    for i in xrange(0,len(c)):

Now it works.
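
For reference, the whole loop with that one-line fix applied, written as a sketch in Python 3 (range() in place of xrange()). The variable text is introduced here only so that the result can be compared with re.split():

```python
import re

text = ' abcde abc cba fdsa bcd '
dels = 'ce '

c = text.split()
for j in dels:
    cp = []
    for i in range(0, len(c)):  # full index range, no "-1"
        cp.extend(c[i].split(j))
    c = cp

print(c)
# ['ab', 'd', '', 'ab', '', '', 'ba', 'fdsa', 'b', 'd']

# For this input the result equals the re.split() output without its
# leading and trailing empty strings.
print(c == re.split('[ce ]', text)[1:-1])
# True
```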

> >>> c=' abcde abc cba fdsa bcd '
> >>> import re
> >>> r = re.compile('[ce ]')
> >>> r.split(c)
> ['', 'ab', 'd', '', 'ab', '', '', 'ba', 'fdsa', 'b', 'd', '']
>
> given that a regexp object has a split() method.

That's probably the optimum solution. Thanks!

Regards,
Marcin

Sep 30 '07 #7
On 30 Sep, 20:27, William James <w_a_x_...@yahoo.com> wrote:
> On Sep 30, 8:53 am, mrk...@gmail.com wrote:
> E:\Ruby>irb
> irb(main):001:0> ' abcde abc cba fdsa bcd '.split(/[ce ]/)
> => ["", "ab", "d", "", "ab", "", "", "ba", "fdsa", "b", "d"]

That's acceptable only if you write a perfect Ruby-to-Python
translator. ;-P

Regards,
Marcin

Sep 30 '07 #8
> c=' abcde abc cba fdsa bcd '.split()
> dels='ce '
> for j in dels:
>     cp=[]
>     for i in xrange(0,len(c)-1):
>
> The "-1" looks like a bug; remember in Python 'stop' bounds
> are exclusive. The indexes of c are simply xrange(len(c)).

Yep. Just found it out, though this seems a bit counterintuitive to
me, even if it makes for more elegant code: I forgot about the high
stop bound.

From my POV, if I want sequence from here to there, it should include
both here and there.

I do understand the consequences of making high bound exclusive, which
is more elegant code: xrange(len(c)). But it does seem a bit
illogical...

> print re.split('[ce ]', c)

Yes, that does the job. Thanks.

Regards,
Marcin

Sep 30 '07 #9
On Sep 30, 8:16 pm, mrk...@gmail.com wrote:
> > c=' abcde abc cba fdsa bcd '.split()
> > dels='ce '
> > for j in dels:
> >     cp=[]
> >     for i in xrange(0,len(c)-1):
> >
> > The "-1" looks like a bug; remember in Python 'stop' bounds
> > are exclusive. The indexes of c are simply xrange(len(c)).
>
> Yep. Just found it out, though this seems a bit counterintuitive to
> me, even if it makes for more elegant code: I forgot about the high
> stop bound.

You made the common mistake of using a loop index instead of iterating
directly.
Instead of:
    for i in xrange(len(c)):
        cp.extend(c[i].split(j))

Just write:
    for words in c:
        cp.extend(words.split(j))

Then you won't make a bounds mistake, and this snippet becomes a LOT
more readable.

(Of course, you're better off using re.split here, but the
principle is good).
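
Direct iteration also makes the list-comprehension version from the original question workable: a second for clause flattens the lists returned by split(), which avoids the nested-list problem. A sketch, with split_all as a hypothetical helper name:

```python
def split_all(words, separators):
    """Split every word on each separator in turn, keeping empty pieces."""
    for sep in separators:
        # The inner "for piece in ..." flattens the list that split() returns,
        # so the result stays a flat list of strings.
        words = [piece for word in words for piece in word.split(sep)]
    return words


print(split_all(' abcde abc cba fdsa bcd '.split(), 'ce '))
# ['ab', 'd', '', 'ab', '', '', 'ba', 'fdsa', 'b', 'd']
```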

--
Paul Hankin

Sep 30 '07 #10
On Sun, 30 Sep 2007 16:16:30 -0300, <mr****@gmail.com> wrote:
> From my POV, if I want sequence from here to there, it should include
> both here and there.
>
> I do understand the consequences of making high bound exclusive, which
> is more elegant code: xrange(len(c)). But it does seem a bit
> illogical...

See this note from E.W.Dijkstra in 1982 where he says that the Python
convention is the best choice.
http://www.cs.utexas.edu/users/EWD/t...xx/EWD831.html
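
A short illustration of properties commonly cited in favour of the half-open convention. This is a sketch in Python 3 with made-up bounds, and it does not reproduce the text of the linked note:

```python
lo, mid, hi = 2, 5, 9

first = list(range(lo, mid))   # [2, 3, 4]
second = list(range(mid, hi))  # [5, 6, 7, 8]

# The length is simply the difference of the bounds.
print(len(first) == mid - lo)                   # True

# Adjacent ranges share a bound and join without overlap or gap.
print(first + second == list(range(lo, hi)))    # True

# An empty range needs no special case: start == stop.
print(list(range(lo, lo)) == [])                # True
```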

--
Gabriel Genellina

Oct 1 '07 #11
Gabriel Genellina wrote:
> On Sun, 30 Sep 2007 16:16:30 -0300, <mr****@gmail.com> wrote:
> > From my POV, if I want sequence from here to there, it should include
> > both here and there.
> >
> > I do understand the consequences of making high bound exclusive, which
> > is more elegant code: xrange(len(c)). But it does seem a bit
> > illogical...
>
> See this note from E.W.Dijkstra in 1982 where he says that the Python
> convention is the best choice.
> http://www.cs.utexas.edu/users/EWD/t...xx/EWD831.html

The only thing I agreed with was his conclusion. Clever man.

[david]
Oct 3 '07 #12
