Problem splitting a string

Anthony Liu
#1: Oct 15 '05
I have this simple string:

mystr = 'this_NP is_VL funny_JJ'

I want to split it and get this list:

['this', 'NP', 'is', 'VL', 'funny', 'JJ']

1. I tried mystr.split('_| '), but this gave me:

['this_NP is_VL funny_JJ']

It was not split at all.

2. I tried mystr.split('_'), and this gave me:

['this', 'NP is', 'VL funny', 'JJ']

Here the space is not used as a delimiter.

3. I tried mystr.split(' '), and this gave me:

['this_NP', 'is_VL', 'funny_JJ']

Here '_' is not used as a delimiter.

I think the documentation does say that the
separator/delimiter can be a string representing all
delimiters we want to use.

How do I split the string using both ' ' and '_' as
delimiters at once?

Thanks.

Erik Max Francis
#2: Oct 15 '05

re: Problem splitting a string


Anthony Liu wrote:
> I have this simple string:
>
> mystr = 'this_NP is_VL funny_JJ'
>
> I want to split it and get this list:
>
> ['this', 'NP', 'is', 'VL', 'funny', 'JJ']
>
> 1. I tried mystr.split('_| '), but this gave me:
>
> ['this_NP is_VL funny_JJ']
>
> It was not split at all.

Use re.split:

>>> import re
>>> re.split('_| ', mystr)
['this', 'NP', 'is', 'VL', 'funny', 'JJ']

--
Erik Max Francis && max@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 20 N 121 53 W && AIM erikmaxfrancis
To love without criticism is to be betrayed.
-- Djuna Barnes
Steven D'Aprano
#3: Oct 15 '05

re: Problem splitting a string


On Fri, 14 Oct 2005 21:52:07 -0700, Anthony Liu wrote:
> I have this simple string:
>
> mystr = 'this_NP is_VL funny_JJ'
>
> I want to split it and get this list:
>
> ['this', 'NP', 'is', 'VL', 'funny', 'JJ']

> I think the documentation does say that the
> separator/delimiter can be a string representing all
> delimiters we want to use.

No, the delimiter is a single string that must match as a whole, not a
list of delimiters.

The only exception is delimiter=None (the default), which splits on runs
of any whitespace.
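
A small sketch of that behaviour, using the string from the original post. The variable names (whole_sep, default_split, spaced) are only illustrative:

```python
mystr = 'this_NP is_VL funny_JJ'

# The separator is matched as one literal string. The three-character
# string '_| ' never occurs in mystr, so nothing is split.
whole_sep = mystr.split('_| ')

# With no separator (or None), split() breaks on runs of whitespace.
default_split = mystr.split()

# An explicit ' ' is not the same as None: adjacent spaces produce
# empty strings in the result.
spaced = 'a  b'

print(whole_sep)          # ['this_NP is_VL funny_JJ']
print(default_split)      # ['this_NP', 'is_VL', 'funny_JJ']
print(spaced.split(' '))  # ['a', '', 'b']
print(spaced.split())     # ['a', 'b']
```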

[Aside: I think a split-on-any-delimiter function would be useful.]

> How do I split the string using both ' ' and '_' as
> delimiters at once?

Something like this:

mystr = 'this_NP is_VL funny_JJ'
L1 = mystr.split()  # splits on whitespace
L2 = []
for item in L1:
    L2.extend(item.split('_'))

You can *almost* do that as a one-liner:

L2 = [item.split('_') for item in mystr.split()]

except that gives a list like this:

[['this', 'NP'], ['is', 'VL'], ['funny', 'JJ']]

which needs flattening.

--
Steven.

Paul Rubin
#4: Oct 15 '05

re: Problem splitting a string


Anthony Liu <antonyliu2002@yahoo.com> writes:
> How do I split the string using both ' ' and '_' as
> delimiters at once?

Use re.split.
Alex Martelli
#5: Oct 15 '05

re: Problem splitting a string


Steven D'Aprano <steve@REMOVETHIScyber.com.au> wrote:
...
> You can *almost* do that as a one-liner:

No 'almost' about it...

> L2 = [item.split('_') for item in mystr.split()]
>
> except that gives a list like this:
>
> [['this', 'NP'], ['is', 'VL'], ['funny', 'JJ']]
>
> which needs flattening.

...because the flattening is easy:

[ x for x in y.split('_') for y in z.split(' ') ]


Alex
Steven D'Aprano
#6: Oct 15 '05

re: Problem splitting a string


On Sat, 15 Oct 2005 10:51:41 +0200, Alex Martelli wrote:

> Steven D'Aprano <steve@REMOVETHIScyber.com.au> wrote:
> ...
>> You can *almost* do that as a one-liner:
>
> No 'almost' about it...
>
>> L2 = [item.split('_') for item in mystr.split()]
>>
>> except that gives a list like this:
>>
>> [['this', 'NP'], ['is', 'VL'], ['funny', 'JJ']]
>>
>> which needs flattening.
>
> ...because the flattening is easy:
>
> [ x for x in y.split('_') for y in z.split(' ') ]


py> mystr = 'this_NP is_VL funny_JJ'
py> [x for x in y.split('_') for y in mystr.split(' ')]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
NameError: name 'y' is not defined


This works, but isn't flattened:

py> [x for x in [y.split('_') for y in mystr.split(' ')]]
[['this', 'NP'], ['is', 'VL'], ['funny', 'JJ']]



--
Steven.

SPE - Stani's Python Editor
#7: Oct 15 '05

re: Problem splitting a string


Use re.split, as this is the fastest and cleanest way.
However, if you have to split a lot of strings, it is best to compile
the pattern once:

import re
delimiters = re.compile('_| ')

def split(x):
    return delimiters.split(x)

>>> split('this_NP is_VL funny_JJ')
['this', 'NP', 'is', 'VL', 'funny', 'JJ']

Stani
--
SPE - Stani's Python Editor http://pythonide.stani.be

Fredrik Lundh
#8: Oct 15 '05

re: Problem splitting a string


"SPE - Stani's Python Editor" wrote:

> Use re.split, as this is the fastest and cleanest way.
> However, if you have to split a lot of strings, it is best to compile
> the pattern once:
>
> import re
> delimiters = re.compile('_| ')
>
> def split(x):
>     return delimiters.split(x)

or, shorter:

import re
split = re.compile('_| ').split

to quickly build a splitter for an arbitrary set of separator characters, use

separators = "_ :+"

split = re.compile("[" + re.escape(separators) + "]").split
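
As a rough illustration of the character-class splitter (the sample inputs are made up for this sketch), note that two adjacent separator characters produce an empty string between them:

```python
import re

separators = "_ :+"

# re.escape() backslash-escapes any characters that are special in a
# regular expression, so the string can be placed inside [...] safely.
split = re.compile("[" + re.escape(separators) + "]").split

print(split('this_NP is_VL funny_JJ'))  # ['this', 'NP', 'is', 'VL', 'funny', 'JJ']
print(split('a:b+c'))                   # ['a', 'b', 'c']
print(split('a_ b'))                    # ['a', '', 'b']
```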

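to deal with arbitrary separator strings (not just single characters), you
need to be a little bit more careful when you prepare the pattern, so that
a longer separator is tried before any shorter one it starts with: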

separators = sep1, sep2, sep3, sep4, ...

pattern = "|".join(re.escape(p) for p in reversed(sorted(separators)))
split = re.compile(pattern).split
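
To see why the ordering matters, here is a sketch with a hypothetical set of separators in which some are prefixes of others. A regex alternation tries its alternatives left to right, and a reverse sort places each longer string ahead of its prefixes. The separator list, the sample text, and the naive_* names are assumptions made for this example only:

```python
import re

# Hypothetical separators, chosen so that some are prefixes of others.
separators = ["-", "->", ":", "::"]

# Reverse-sorted order puts "->" before "-" and "::" before ":".
pattern = "|".join(re.escape(p) for p in reversed(sorted(separators)))
split = re.compile(pattern).split

# For comparison: ascending order tries the short prefixes first.
naive_pattern = "|".join(re.escape(p) for p in sorted(separators))
naive_split = re.compile(naive_pattern).split

text = "a->b::c-d:e"
print(split(text))        # ['a', 'b', 'c', 'd', 'e']
print(naive_split(text))  # ['a', '>b', '', 'c', 'd', 'e']
```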

</F>



Kent Johnson
#9: Oct 15 '05

re: Problem splitting a string


Steven D'Aprano wrote:
> On Sat, 15 Oct 2005 10:51:41 +0200, Alex Martelli wrote:
>> [ x for x in y.split('_') for y in z.split(' ') ]
>
> py> mystr = 'this_NP is_VL funny_JJ'
> py> [x for x in y.split('_') for y in mystr.split(' ')]
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> NameError: name 'y' is not defined

The order of the 'for' clauses is backwards. The outer loop has to come first:

>>> [x for y in mystr.split(' ') for x in y.split('_')]
['this', 'NP', 'is', 'VL', 'funny', 'JJ']
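
One way to remember the order is that the 'for' clauses of a comprehension read in the same order as the equivalent nested loops. A minimal sketch (the names flat and flat_comp are just for illustration):

```python
mystr = 'this_NP is_VL funny_JJ'

# Nested-loop form: the outer loop comes first, the inner loop second.
flat = []
for y in mystr.split(' '):
    for x in y.split('_'):
        flat.append(x)

# The same loops, in the same order, written as a comprehension.
flat_comp = [x for y in mystr.split(' ') for x in y.split('_')]

print(flat)       # ['this', 'NP', 'is', 'VL', 'funny', 'JJ']
print(flat_comp)  # ['this', 'NP', 'is', 'VL', 'funny', 'JJ']
```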

Kent