s.split() on multiple separators

mrkafk

Hello everyone,

OK, so I want to split a string c into words using several different
separators from a list (dels).

I can do this the following C-like way:

>>> c=' abcde abc cba fdsa bcd '.split()
>>> dels='ce '
>>> for j in dels:
...     cp=[]
...     for i in xrange(0,len(c)-1):
...         cp.extend(c[i].split(j))
...     c=cp
...
>>> c
['ab', 'd', '', 'ab', '', '']

But. Surely there is a more Pythonic way to do this?

I cannot do this:

>>> for i in dels:
...     c=[x.split(i) for x in c]

because x.split(i) is a list.
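
One way around the nested-list problem, as a minimal sketch: a comprehension with two for clauses flattens the pieces as it goes. It is written for Python 3 (the thread itself uses Python 2), and the names pieces, word and piece are illustrative, not from the original post.

```python
# Sketch (Python 3): split on each separator in turn, keeping the list flat.
c = ' abcde abc cba fdsa bcd '
dels = 'ce '

pieces = c.split()  # initial whitespace split, as in the post above
for sep in dels:
    # The second "for" clause unpacks each list returned by split(),
    # so the result is a flat list of strings rather than a list of lists.
    pieces = [piece for word in pieces for piece in word.split(sep)]

print(pieces)
# ['ab', 'd', '', 'ab', '', '', 'ba', 'fdsa', 'b', 'd']
```

Empty strings appear wherever a word starts or ends with a separator or two separators are adjacent.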

Sep 30 '07 #1


Francesco Guerrieri

On 9/30/07, mr****@gmail.com <mr****@gmail.com> wrote:

Hello everyone,

OK, so I want to split a string c into words using several different
separators from a list (dels).

Have a look at this recipe:

http://aspn.activestate.com/ASPN/Coo.../Recipe/303342

which contains several ways to solve the problem. You could either
translate all your separators to a single one and then split on it,
or (maybe the simpler solution) go for the list comprehension
solution.
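
The first approach (mapping every separator to a single one, then splitting once) might look like the following sketch. It is not taken from the recipe; split_on_any is a hypothetical helper name, and the code assumes Python 3 and single-character separators.

```python
def split_on_any(text, separators):
    """Split text on any of the single-character separators (sketch)."""
    first = separators[0]
    # Build a translation table that rewrites every separator to the first one.
    table = str.maketrans({sep: first for sep in separators})
    # After translation only one separator remains, so a plain split() suffices.
    return text.translate(table).split(first)


print(split_on_any(' abcde abc cba fdsa bcd ', 'ce '))
# ['', 'ab', 'd', '', 'ab', '', '', 'ba', 'fdsa', 'b', 'd', '']
```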

francesco

Sep 30 '07 #2

Francesco Guerrieri

On 9/30/07, mr****@gmail.com <mr****@gmail.com> wrote:

Hello everyone,

OK, so I want to split a string c into words using several different
separators from a list (dels).

Sep 30 '07 #3

Tim Chase

OK, so I want to split a string c into words using several different
separators from a list (dels).

I can do this the following C-like way:

>>> c=' abcde abc cba fdsa bcd '.split()
>>> dels='ce '
>>> for j in dels:
...     cp=[]
...     for i in xrange(0,len(c)-1):
...         cp.extend(c[i].split(j))
...     c=cp
...
>>> c
['ab', 'd', '', 'ab', '', '']

Given your original string, I'm not sure how that would be the
expected result of "split c on the characters in dels".

While there's a certain faction of pythonistas that don't esteem
regular expressions (or at least find them overused/misused,
which I'd certainly agree to), they may be able to serve your
purposes well:

>>> c=' abcde abc cba fdsa bcd '
>>> import re
>>> r = re.compile('[ce ]')
>>> r.split(c)
['', 'ab', 'd', '', 'ab', '', '', 'ba', 'fdsa', 'b', 'd', '']

given that a regexp object has a split() method.
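
If the separators live in a variable such as dels rather than being typed into the pattern, the regexp can be built from them. The following is a sketch only, for Python 3; compile_splitter is a hypothetical helper name, and joining the escaped separators with '|' is one possible construction, not something stated in this thread.

```python
import re


def compile_splitter(separators):
    """Build a compiled regexp that matches any one of the separators (sketch)."""
    # Longest first, so a separator that is a prefix of another one
    # (':' and '::', say) does not win the alternation too early.
    ordered = sorted(separators, key=len, reverse=True)
    # re.escape keeps characters such as '.' or '|' literal.
    return re.compile('|'.join(re.escape(sep) for sep in ordered))


splitter = compile_splitter('ce ')
print(splitter.split(' abcde abc cba fdsa bcd '))
# ['', 'ab', 'd', '', 'ab', '', '', 'ba', 'fdsa', 'b', 'd', '']

# Multi-character separators work the same way.
print(compile_splitter(['::', ',']).split('a::b,c'))
# ['a', 'b', 'c']
```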

-tkc

Sep 30 '07 #4

Bryan Olson

mr****@gmail.com wrote:

Hello everyone,

OK, so I want to split a string c into words using several different
separators from a list (dels).

I can do this the following C-like way:

c=' abcde abc cba fdsa bcd '.split()
dels='ce '
for j in dels:
    cp=[]
    for i in xrange(0,len(c)-1):

The "-1" looks like a bug; remember in Python 'stop' bounds
are exclusive. The indexes of c are simply xrange(len(c)).

Python 2.3 and up offers: for (i, word) in enumerate(c):

        cp.extend(c[i].split(j))
    c=cp
c
['ab', 'd', '', 'ab', '', '']

The bug lost some words, such as 'fdsa'.
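
A small sketch of the off-by-one, written for Python 3 (where range plays the role of xrange); the variable names buggy_indexes and full_indexes are illustrative.

```python
c = ['abcde', 'abc', 'cba', 'fdsa', 'bcd']

# The stop bound is exclusive, so subtracting 1 skips the last element.
buggy_indexes = list(range(0, len(c) - 1))
full_indexes = list(range(len(c)))
print(buggy_indexes)  # [0, 1, 2, 3]
print(full_indexes)   # [0, 1, 2, 3, 4]

# enumerate() yields (index, item) pairs, so no bound has to be written at all.
for i, word in enumerate(c):
    print(i, word)
```

Because the outer loop runs once per separator, the last element of the current list is dropped on every pass, which is how more than one word goes missing.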

But. Surely there is a more Pythonic way to do this?

When string.split() doesn't quite cut it, try re.split(), or
maybe re.findall(). Is one of these what you want?

import re

c = ' abcde abc cba fdsa bcd '

print re.split('[ce ]', c)

print re.split('[ce ]+', c)

print re.findall('[^ce ]+', c)
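
For comparison, here is a sketch of the same three calls under Python 3 (print as a function), with the results they give for this string; the variable names are illustrative.

```python
import re

c = ' abcde abc cba fdsa bcd '

# Split at every single separator character: adjacent separators
# and separators at the ends produce empty strings.
every_separator = re.split('[ce ]', c)
print(every_separator)
# ['', 'ab', 'd', '', 'ab', '', '', 'ba', 'fdsa', 'b', 'd', '']

# Treat a run of separators as one: only the ends can yield empty strings.
runs_of_separators = re.split('[ce ]+', c)
print(runs_of_separators)
# ['', 'ab', 'd', 'ab', 'ba', 'fdsa', 'b', 'd', '']

# Collect the runs of non-separator characters: no empty strings at all.
words_only = re.findall('[^ce ]+', c)
print(words_only)
# ['ab', 'd', 'ab', 'ba', 'fdsa', 'b', 'd']
```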
--
--Bryan

Sep 30 '07 #5

William James

On Sep 30, 8:53 am, mrk...@gmail.com wrote:

Hello everyone,

OK, so I want to split a string c into words using several different
separators from a list (dels).

I can do this the following C-like way:

>>> c=' abcde abc cba fdsa bcd '.split()
>>> dels='ce '
>>> for j in dels:
...     cp=[]
...     for i in xrange(0,len(c)-1):
...         cp.extend(c[i].split(j))
...     c=cp
...
>>> c
['ab', 'd', '', 'ab', '', '']

But. Surely there is a more Pythonic way to do this?

I cannot do this:

>>> for i in dels:
...     c=[x.split(i) for x in c]

because x.split(i) is a list.

E:\Ruby>irb
irb(main):001:0> ' abcde abc cba fdsa bcd '.split(/[ce ]/)
=> ["", "ab", "d", "", "ab", "", "", "ba", "fdsa", "b", "d"]

Sep 30 '07 #6

mrkafk

['ab', 'd', '', 'ab', '', '']

Given your original string, I'm not sure how that would be the
expected result of "split c on the characters in dels".

Oops, the inner loop should be:

for i in xrange(0,len(c)):

Now it works.

>>> c=' abcde abc cba fdsa bcd '
>>> import re
>>> r = re.compile('[ce ]')
>>> r.split(c)
['', 'ab', 'd', '', 'ab', '', '', 'ba', 'fdsa', 'b', 'd', '']

given that a regexp object has a split() method.

That's probably the optimum solution. Thanks!

Regards,
Marcin

Sep 30 '07 #7

mrkafk

On 30 Sep, 20:27, William James <w_a_x_...@yahoo.com> wrote:

On Sep 30, 8:53 am, mrk...@gmail.com wrote:

E:\Ruby>irb
irb(main):001:0> ' abcde abc cba fdsa bcd '.split(/[ce ]/)
=> ["", "ab", "d", "", "ab", "", "", "ba", "fdsa", "b", "d"]

That's acceptable only if you write a perfect Ruby-to-Python
translator. ;-P

Regards,
Marcin

Sep 30 '07 #8

mrkafk

c=' abcde abc cba fdsa bcd '.split()
dels='ce '
for j in dels:
    cp=[]
    for i in xrange(0,len(c)-1):

The "-1" looks like a bug; remember in Python 'stop' bounds
are exclusive. The indexes of c are simply xrange(len(c)).

Yep. Just found it out, though this seems a bit counterintuitive to
me, even if it makes for more elegant code: I forgot about the high
stop bound.

From my POV, if I want sequence from here to there, it should include
both here and there.

I do understand the consequences of making high bound exclusive, which
is more elegant code: xrange(len(c)). But it does seem a bit
illogical...

print re.split('[ce ]', c)

Yes, that does the job. Thanks.

Regards,
Marcin

Sep 30 '07 #9

Paul Hankin

On Sep 30, 8:16 pm, mrk...@gmail.com wrote:

c=' abcde abc cba fdsa bcd '.split()
dels='ce '
for j in dels:
    cp=[]
    for i in xrange(0,len(c)-1):

The "-1" looks like a bug; remember in Python 'stop' bounds
are exclusive. The indexes of c are simply xrange(len(c)).

Yep. Just found it out, though this seems a bit counterintuitive to
me, even if it makes for more elegant code: I forgot about the high
stop bound.

You made a common mistake of using a loop index instead of iterating
directly.
Instead of:
    for i in xrange(len(c)):
        cp.extend(c[i].split(j))

Just write:
    for words in c:
        cp.extend(words.split(j))

Then you won't make a bounds mistake, and this snippet becomes a LOT
more readable.
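
Put together, the whole loop with direct iteration might look like this sketch (Python 3; the names sep and word are illustrative, while c, dels and cp follow the original post).

```python
c = ' abcde abc cba fdsa bcd '.split()
dels = 'ce '

for sep in dels:
    cp = []
    # Iterate over the items themselves: there is no index and
    # therefore no stop bound to get wrong.
    for word in c:
        cp.extend(word.split(sep))
    c = cp

print(c)
# ['ab', 'd', '', 'ab', '', '', 'ba', 'fdsa', 'b', 'd']
```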

(Of course, you're better off using re.split here, but the
principle is good).

--
Paul Hankin

Sep 30 '07 #10

Gabriel Genellina

On Sun, 30 Sep 2007 16:16:30 -0300, <mr****@gmail.com> wrote:

From my POV, if I want sequence from here to there, it should include
both here and there.

I do understand the consequences of making high bound exclusive, which
is more elegant code: xrange(len(c)). But it does seem a bit
illogical...

See this note from E.W.Dijkstra in 1982 where he argues that the
convention Python uses (lower bound inclusive, upper bound exclusive)
is the best choice.
http://www.cs.utexas.edu/users/EWD/t...xx/EWD831.html
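
A few of the properties usually cited in favour of half-open ranges, as a sketch in Python 3. This is an illustration, not a quotation from the note, and the values a, m, b and the list items are arbitrary.

```python
a, m, b = 2, 4, 7

whole = list(range(a, b))
print(whole)  # [2, 3, 4, 5, 6]

# The length is simply the difference of the bounds.
assert len(whole) == b - a

# Adjacent ranges share a bound and join with no gap and no overlap.
assert list(range(a, m)) + list(range(m, b)) == whole

# An empty range needs no special case: equal bounds mean no elements.
assert list(range(a, a)) == []

# With zero-based indexing, range(len(items)) covers exactly the valid indexes.
items = ['w', 'x', 'y', 'z']
assert [items[i] for i in range(len(items))] == items
```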

--
Gabriel Genellina

Oct 1 '07 #11

[david]

Gabriel Genellina wrote:

On Sun, 30 Sep 2007 16:16:30 -0300, <mr****@gmail.com> wrote:

From my POV, if I want sequence from here to there, it should include
both here and there.

I do understand the consequences of making high bound exclusive, which
is more elegant code: xrange(len(c)). But it does seem a bit
illogical...

See this note from E.W.Dijkstra in 1982 where he says that the Python
convention is the best choice.
http://www.cs.utexas.edu/users/EWD/t...xx/EWD831.html

The only thing I agreed with was his conclusion. Clever man.

[david]

Oct 3 '07 #12
