
Problem splitting a string

I have this simple string:

mystr = 'this_NP is_VL funny_JJ'

I want to split it into a list like this:

['this', 'NP', 'is', 'VL', 'funny', 'JJ']

1. I tried mystr.split('_| '), but this gave me:

['this_NP is_VL funny_JJ']

It is not split at all.

2. I tried mystr.split('_'), and this gave me:

['this', 'NP is', 'VL funny', 'JJ']

in which the space is not used as a delimiter.

3. I tried mystr.split(' '), and this gave me:

['this_NP', 'is_VL', 'funny_JJ']

in which '_' is not used as a delimiter.

I think the documentation does say that the
separator/delimiter can be a string representing all
delimiters we want to use.

How do I split the string by using both ' ' and '_' as
the delimiters at once?

Thanks.


Oct 15 '05 #1
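
A minimal sketch of why attempt 1 returns the string unsplit: str.split treats its argument as one literal separator string, not as a pattern or a set of characters. This assumes Python 3 syntax, and the second sample string is made up for illustration.

```python
mystr = 'this_NP is_VL funny_JJ'

# '_| ' is searched for as the literal three-character substring "_| ".
# It never occurs in mystr, so the whole string comes back as one item.
no_match = mystr.split('_| ')

# The same separator does split a (made-up) string that contains "_| ".
literal_match = 'a_| b_| c'.split('_| ')

print(no_match)       # ['this_NP is_VL funny_JJ']
print(literal_match)  # ['a', 'b', 'c']
```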
8 Replies


Anthony Liu wrote:
I have this simple string:

mystr = 'this_NP is_VL funny_JJ'

I want to split it and give me a list as

['this', 'NP', 'is', 'VL', 'funny', 'JJ']

1. I tried mystr.split('_| '), but this gave me:

['this_NP is_VL funny_JJ']

It is not split at all.


Use re.split:

import re
re.split('_| ', mystr)

['this', 'NP', 'is', 'VL', 'funny', 'JJ']

--
Erik Max Francis && ma*@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 20 N 121 53 W && AIM erikmaxfrancis
To love without criticism is to be betrayed.
-- Djuna Barnes
Oct 15 '05 #2
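
A small sketch expanding on the re.split suggestion, assuming Python 3. It shows the alternation from the reply, an equivalent character class, and a `+` variant that collapses runs of delimiters. The doubled-space input is made up for illustration.

```python
import re

mystr = 'this_NP is_VL funny_JJ'

# Alternation, as in the reply: split on "_" or on a single space.
by_alternation = re.split('_| ', mystr)

# Equivalent character class: any one character from the set {"_", " "}.
by_char_class = re.split('[_ ]', mystr)

# With "+", a run of adjacent delimiters counts as one separator,
# so the doubled space does not produce an empty string in the result.
collapsed = re.split('[_ ]+', 'this_NP  is_VL')

print(by_alternation)  # ['this', 'NP', 'is', 'VL', 'funny', 'JJ']
print(by_char_class)   # ['this', 'NP', 'is', 'VL', 'funny', 'JJ']
print(collapsed)       # ['this', 'NP', 'is', 'VL']
```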

On Fri, 14 Oct 2005 21:52:07 -0700, Anthony Liu wrote:
I have this simple string:

mystr = 'this_NP is_VL funny_JJ'

I want to split it and give me a list as

['this', 'NP', 'is', 'VL', 'funny', 'JJ']

I think the documentation does say that the
separator/delimiter can be a string representing all
delimiters we want to use.

No, the delimiter is the delimiter, not a list of delimiters.

The only exception is a delimiter of None (the default), which splits on runs of whitespace.

[Aside: I think a split-on-any-delimiter function would be useful.]

How do I split the string by using both ' ' and '_' as
the delimiters at once?


Something like this:

mystr = 'this_NP is_VL funny_JJ'
L1 = mystr.split()  # splits on whitespace
L2 = []
for item in L1:
    L2.extend(item.split('_'))

You can *almost* do that as a one-liner:

L2 = [item.split('_') for item in mystr.split()]

except that gives a list like this:

[['this', 'NP'], ['is', 'VL'], ['funny', 'JJ']]

which needs flattening.

--
Steven.

Oct 15 '05 #3
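
Two hedged ways to get the flat list from the split-then-split approach, assuming Python 3. itertools.chain.from_iterable is a standard-library call that did not exist when this thread was written, and the replace-based variant only suits inputs like this one, where turning '_' into whitespace is harmless.

```python
from itertools import chain

mystr = 'this_NP is_VL funny_JJ'

# One sub-list per whitespace-separated token, as in the one-liner above.
nested = [item.split('_') for item in mystr.split()]

# chain.from_iterable joins the sub-lists end to end.
flat = list(chain.from_iterable(nested))

# Alternative for this particular input: turn "_" into a space first,
# then let split() with no argument handle all the whitespace.
flat_via_replace = mystr.replace('_', ' ').split()

print(flat)              # ['this', 'NP', 'is', 'VL', 'funny', 'JJ']
print(flat_via_replace)  # ['this', 'NP', 'is', 'VL', 'funny', 'JJ']
```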

Anthony Liu <an***********@yahoo.com> writes:
How do I split the string by using both ' ' and '_' as
the delimiters at once?


Use re.split.
Oct 15 '05 #4

Steven D'Aprano <st***@REMOVETHIScyber.com.au> wrote:
...
You can *almost* do that as a one-liner:

No 'almost' about it...

L2 = [item.split('_') for item in mystr.split()]

except that gives a list like this:

[['this', 'NP'], ['is', 'VL'], ['funny', 'JJ']]

which needs flattening.


.....because the flattening is easy:

[ x for x in y.split('_') for y in z.split(' ') ]
Alex
Oct 15 '05 #5

On Sat, 15 Oct 2005 10:51:41 +0200, Alex Martelli wrote:
Steven D'Aprano <st***@REMOVETHIScyber.com.au> wrote:
...
You can *almost* do that as a one-liner:


No 'almost' about it...
L2 = [item.split('_') for item in mystr.split()]

except that gives a list like this:

[['this', 'NP'], ['is', 'VL'], ['funny', 'JJ']]

which needs flattening.


....because the flattening is easy:

[ x for x in y.split('_') for y in z.split(' ') ]

py> mystr = 'this_NP is_VL funny_JJ'
py> [x for x in y.split('_') for y in mystr.split(' ')]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
NameError: name 'y' is not defined

This works, but isn't flattened:

py> [x for x in [y.split('_') for y in mystr.split(' ')]]
[['this', 'NP'], ['is', 'VL'], ['funny', 'JJ']]

--
Steven.

Oct 15 '05 #6

Use re.split, as this is the fastest and cleanest way.
However, if you have to split a lot of strings, it is best to compile the pattern once and reuse it:

import re
delimiters = re.compile('_| ')

def split(x):
    return delimiters.split(x)

split('this_NP is_VL funny_JJ')

['this', 'NP', 'is', 'VL', 'funny', 'JJ']

Stani
--
SPE - Stani's Python Editor http://pythonide.stani.be

Oct 15 '05 #7

"SPE - Stani's Python Editor" wrote:
Use re.split, as this is the fastest and cleanest way.
However, if you have to split a lot of strings, the best is:

import re
delimiters = re.compile('_| ')

def split(x):
    return delimiters.split(x)


or, shorter:

import re
split = re.compile('_| ').split

to quickly build a splitter for an arbitrary set of separator characters, use

separators = "_ :+"

split = re.compile("[" + re.escape(separators) + "]").split

to deal with arbitrary separators, you need to be a little bit more careful
when you prepare the pattern:

separators = sep1, sep2, sep3, sep4, ...

pattern = "|".join(re.escape(p) for p in reversed(sorted(separators)))
split = re.compile(pattern).split

</F>

Oct 15 '05 #8
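
A sketch of why the ordering matters when one separator is a prefix of another, assuming Python 3. The separators and the build_split helper are hypothetical, and this version sorts by length (longest first) rather than using reversed(sorted(...)) as in the post. Both put a longer separator ahead of its own prefix, because regex alternation tries the alternatives from left to right.

```python
import re

# Hypothetical separators; "->" is a prefix of "->>".
separators = ["->", "->>", "::"]

def build_split(seps):
    # Longest first, so a longer separator wins over any prefix of it.
    ordered = sorted(seps, key=len, reverse=True)
    pattern = "|".join(re.escape(s) for s in ordered)
    return re.compile(pattern).split

split = build_split(separators)
result = split("a->b->>c::d")

# For comparison: the same alternation with the short prefix listed first.
naive = re.compile("|".join(re.escape(s) for s in separators)).split
naive_result = naive("a->b->>c::d")

print(result)        # ['a', 'b', 'c', 'd']
print(naive_result)  # ['a', 'b', '>c', 'd']
```

In the naive version "->" matches before "->>" is ever tried, so a stray ">" is left attached to the next field.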

Steven D'Aprano wrote:
On Sat, 15 Oct 2005 10:51:41 +0200, Alex Martelli wrote:
[ x for x in y.split('_') for y in z.split(' ') ]


py> mystr = 'this_NP is_VL funny_JJ'
py> [x for x in y.split('_') for y in mystr.split(' ')]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
NameError: name 'y' is not defined


The order of the 'for' clauses is backwards:
[x for y in mystr.split(' ') for x in y.split('_')]

['this', 'NP', 'is', 'VL', 'funny', 'JJ']

Kent
Oct 15 '05 #9
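
A short sketch of the rule behind the corrected order, assuming Python 3: the 'for' clauses in a list comprehension read left to right in the same order as the equivalent nested 'for' statements, so the loop that defines y has to come before the loop that uses it.

```python
mystr = 'this_NP is_VL funny_JJ'

# Comprehension: the outer loop (y) is written first, the inner loop (x) second.
flat = [x for y in mystr.split(' ') for x in y.split('_')]

# The same thing spelled out as nested statements, in the same order.
flat_loops = []
for y in mystr.split(' '):
    for x in y.split('_'):
        flat_loops.append(x)

print(flat)                # ['this', 'NP', 'is', 'VL', 'funny', 'JJ']
print(flat == flat_loops)  # True
```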
