Problem splitting a string
I have this simple string:
mystr = 'this_NP is_VL funny_JJ'
I want to split it to get this list:
['this', 'NP', 'is', 'VL', 'funny', 'JJ']
1. I tried mystr.split('_| '), but this gave me:
['this_NP is_VL funny_JJ']
It is not split at all.
2. I tried mystr.split('_'), and this gave me:
['this', 'NP is', 'VL funny', 'JJ']
in which the space is not used as a delimiter.
3. I tried mystr.split(' '), and this gave me:
['this_NP', 'is_VL', 'funny_JJ']
in which '_' is not used as a delimiter.
I think the documentation does say that the
separator/delimiter can be a string representing all
delimiters we want to use.
How do I split the string using both ' ' and '_' as
the delimiters at once?
Thanks.
re: Problem splitting a string
Anthony Liu wrote:
> I have this simple string:
>
> mystr = 'this_NP is_VL funny_JJ'
>
> I want to split it to get this list:
>
> ['this', 'NP', 'is', 'VL', 'funny', 'JJ']
>
> 1. I tried mystr.split('_| '), but this gave me:
>
> ['this_NP is_VL funny_JJ']
>
> It is not split at all.
Use re.split:
>>> import re
>>> re.split('_| ', mystr)
['this', 'NP', 'is', 'VL', 'funny', 'JJ']
--
Erik Max Francis && max@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 20 N 121 53 W && AIM erikmaxfrancis
To love without criticism is to be betrayed.
-- Djuna Barnes
re: Problem splitting a string
On Fri, 14 Oct 2005 21:52:07 -0700, Anthony Liu wrote:
> I have this simple string:
>
> mystr = 'this_NP is_VL funny_JJ'
>
> I want to split it to get this list:
>
> ['this', 'NP', 'is', 'VL', 'funny', 'JJ']
> I think the documentation does say that the
> separator/delimiter can be a string representing all
> delimiters we want to use.
No, the delimiter is matched literally as one whole string; it is not a list of delimiters.
The only exception is a delimiter of None (or no argument at all), which splits on any run of whitespace.
[Aside: I think a split-on-any-delimiter function would be useful.]
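To make that concrete, here is a small sketch of how str.split treats its argument. The third string, 'a_| b_| c', is made up purely for illustration and is not from the original question.

```python
mystr = 'this_NP is_VL funny_JJ'

# The separator is matched literally, as one whole string.
# The three-character string '_| ' never occurs in mystr, so nothing is split.
whole = mystr.split('_| ')

# With no argument (or None), any run of whitespace acts as the separator.
words = mystr.split()

# A multi-character separator splits only where that exact string occurs.
# (Illustrative input, not from the thread.)
parts = 'a_| b_| c'.split('_| ')

print(whole)
print(words)
print(parts)
```

So '_| ' only looks like "underscore or space" if it is read as a regular expression, which str.split does not do.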
> How do I split the string using both ' ' and '_' as
> the delimiters at once?
Something like this:
mystr = 'this_NP is_VL funny_JJ'
L1 = mystr.split()  # splits on whitespace
L2 = []
for item in L1:
    L2.extend(item.split('_'))  # split each word on '_' and append the pieces
You can *almost* do that as a one-liner:
L2 = [item.split('_') for item in mystr.split()]
except that gives a list like this:
[['this', 'NP'], ['is', 'VL'], ['funny', 'JJ']]
which needs flattening.
--
Steven.
re: Problem splitting a string
Anthony Liu <antonyliu2002@yahoo.com> writes:
> How do I split the string using both ' ' and '_' as
> the delimiters at once?
Use re.split.
re: Problem splitting a string
Steven D'Aprano <steve@REMOVETHIScyber.com.au> wrote:
...
> You can *almost* do that as a one-liner:
No 'almost' about it...
> L2 = [item.split('_') for item in mystr.split()]
>
> except that gives a list like this:
>
> [['this', 'NP'], ['is', 'VL'], ['funny', 'JJ']]
>
> which needs flattening.
...because the flattening is easy:
[ x for x in y.split('_') for y in z.split(' ') ]
Alex
re: Problem splitting a string
On Sat, 15 Oct 2005 10:51:41 +0200, Alex Martelli wrote:
> Steven D'Aprano <steve@REMOVETHIScyber.com.au> wrote:
> ...
>> You can *almost* do that as a one-liner:
>
> No 'almost' about it...
>
>> L2 = [item.split('_') for item in mystr.split()]
>>
>> except that gives a list like this:
>>
>> [['this', 'NP'], ['is', 'VL'], ['funny', 'JJ']]
>>
>> which needs flattening.
>
> ...because the flattening is easy:
>
> [ x for x in y.split('_') for y in z.split(' ') ]
py> mystr = 'this_NP is_VL funny_JJ'
py> [x for x in y.split('_') for y in mystr.split(' ')]
Traceback (most recent call last):
File "<stdin>", line 1, in ?
NameError: name 'y' is not defined
This works, but isn't flattened:
py> [x for x in [y.split('_') for y in mystr.split(' ')]]
[['this', 'NP'], ['is', 'VL'], ['funny', 'JJ']]
--
Steven.
re: Problem splitting a string
Use re.split, as this is the fastest and cleanest way.
However, if you have to split a lot of strings, it is best to compile the pattern once:
import re
delimiters = re.compile('_| ')
def split(x):
    return delimiters.split(x)
>>> split('this_NP is_VL funny_JJ')
['this', 'NP', 'is', 'VL', 'funny', 'JJ']
Stani
--
SPE - Stani's Python Editor http://pythonide.stani.be
re: Problem splitting a string
"SPE - Stani's Python Editor" wrote:
> Use re.split, as this is the fastest and cleanest way.
> However, if you have to split a lot of strings, it is best to compile the pattern once:
>
> import re
> delimiters = re.compile('_| ')
>
> def split(x):
>     return delimiters.split(x)
or, shorter:
import re
split = re.compile('_| ').split
to quickly build a splitter for an arbitrary set of separator characters, use
separators = "_ :+"
split = re.compile("[" + re.escape(separators) + "]").split
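to deal with arbitrary separator strings (not just single characters), you need
to be a little bit more careful when you prepare the pattern, so that a longer
separator is tried before any separator that is a prefix of it: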
separators = sep1, sep2, sep3, sep4, ...
pattern = "|".join(re.escape(p) for p in reversed(sorted(separators)))
split = re.compile(pattern).split
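A rough illustration of why the ordering matters, using three made-up separators ('=', '==' and ', ') that are assumptions for this sketch and not part of the original post. Regex alternation takes the first alternative that matches, so if '=' is listed before '==', the longer separator is never used.

```python
import re

# Hypothetical separators for illustration only; '=' is a prefix of '=='.
separators = ['=', '==', ', ']

# Reverse-sorted order puts '==' ahead of '=' in the alternation.
pattern = "|".join(re.escape(p) for p in reversed(sorted(separators)))
split = re.compile(pattern).split

# Naive version: joins the separators in the order given, so '=' comes first.
naive = re.compile("|".join(re.escape(p) for p in separators)).split

print(split('a==b, c=d'))   # '==' is consumed as a single separator
print(naive('a==b, c=d'))   # '==' is consumed as two '=' with '' in between
```

With the naive ordering, an empty string shows up between the two '=' characters, which is usually not what you want.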
</F>
re: Problem splitting a string
Steven D'Aprano wrote:
> On Sat, 15 Oct 2005 10:51:41 +0200, Alex Martelli wrote:
>> [ x for x in y.split('_') for y in z.split(' ') ]
>
> py> mystr = 'this_NP is_VL funny_JJ'
> py> [x for x in y.split('_') for y in mystr.split(' ')]
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> NameError: name 'y' is not defined
The order of the 'for' clauses is backwards:
>>> [x for y in mystr.split(' ') for x in y.split('_')]
['this', 'NP', 'is', 'VL', 'funny', 'JJ']
['this', 'NP', 'is', 'VL', 'funny', 'JJ']
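One way to remember the order: the 'for' clauses in a list comprehension read left to right exactly like nested 'for' statements, outermost first. A small sketch (the names flat and flat_loops are just for this example):

```python
mystr = 'this_NP is_VL funny_JJ'

# Comprehension: the clause that defines y must come before the one that uses it.
flat = [x for y in mystr.split(' ') for x in y.split('_')]

# The same thing written as explicit nested loops.
flat_loops = []
for y in mystr.split(' '):      # outer loop: defines y
    for x in y.split('_'):      # inner loop: may use y
        flat_loops.append(x)

print(flat)
print(flat == flat_loops)
```

Written the other way round, the inner clause runs first and refers to y before it exists, hence the NameError.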
Kent