
Behaviour of str.split

Will McGugan
#1: Jul 19 '05
Hi,

I'm curious about the behaviour of the str.split() when applied to empty
strings.

"".split() returns an empty list, however..

"".split("*") returns a list containing one empty string.

I would have expected the second example to have also returned an empty
list. What am I missing?


TIA,

Will McGugan


--
http://www.willmcgugan.com
"".join( [ {'*':'@','^':'.'}.get(c,None) or chr(97+(ord(c)-84)%26) for c
in "jvyy*jvyyzpthtna^pbz" ] )
runes
#2: Jul 19 '05


The behaviour of "".split("*") is not that strange, as the split point
always disappears. re.split() has a nice option to keep the
split point, which str.split should have too, I think.
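The re.split() option mentioned above is a capturing group around the separator pattern. A minimal sketch, with illustrative sample strings:

```python
import re

# Without a group, re.split() discards the separators, just like str.split("*").
plain = re.split(r"\*", "a*b*c")

# With a capturing group around the separator, the separators are kept
# in the result, interleaved with the fields.
kept = re.split(r"(\*)", "a*b*c")

# The empty string still yields one empty field, matching "".split("*").
empty = re.split(r"(\*)", "")

print(plain)  # ['a', 'b', 'c']
print(kept)   # ['a', '*', 'b', '*', 'c']
print(empty)  # ['']
```

Because the separators are kept, "".join(kept) gives back the original string.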

One expectation I keep fighting within myself is that

"mystring".split('') should return ['m', 'y', 's', 't', 'r', 'i', 'n',
'g']. But I guess it's in keeping with "There should be one-- and
preferably only one --obvious way to do it." that it doesn't.

Tim N. van der Leeuw
#3: Jul 19 '05



runes wrote:
> The behaviour of "".split("*") is not that strange, as the split point
> always disappears. re.split() has a nice option to keep the
> split point, which str.split should have too, I think.
>
> One expectation I keep fighting within myself is that
>
> "mystring".split('') should return ['m', 'y', 's', 't', 'r', 'i', 'n',
> 'g']. But I guess it's in keeping with "There should be one-- and
> preferably only one --obvious way to do it." that it doesn't.

Fortunately, this is easy to write as list("mystring").
Actually, for me it's not so counter-intuitive that "mystring".split('')
doesn't work; what are you trying to split on?
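Both points can be checked quickly in Python 3: list() yields the characters, while str.split() rejects an empty separator with a ValueError rather than guessing where to split. A small sketch:

```python
# list() iterates over the string, giving one element per character.
chars = list("mystring")

# str.split() refuses an empty separator instead of splitting
# between characters.
try:
    "mystring".split("")
    empty_sep_error = None
except ValueError as exc:
    empty_sep_error = exc

print(chars)  # ['m', 'y', 's', 't', 'r', 'i', 'n', 'g']
print(type(empty_sep_error).__name__)  # ValueError
```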

Anyway, I usually need to split on something more complicated, so I
usually split with regexes.

cheers,

--Tim

runes
#4: Jul 19 '05


[Tim N. van der Leeuw]
> Fortunately, this is easy to write as list("mystring").

Sure, and map(None, "mystring")
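map(None, "mystring") relies on Python 2, where map() with None as the function simply collects the items of a single sequence into a list. In Python 3, map() needs a real callable, so a sketch of the equivalents looks like this:

```python
s = "mystring"

# Python 3 equivalents of Python 2's map(None, s):
as_list = list(s)                  # the idiomatic spelling
as_comprehension = [c for c in s]  # explicit, but longer

# map(None, ...) fails in Python 3 once it is iterated, because None
# is not callable.
try:
    list(map(None, s))
    map_none_error = None
except TypeError as exc:
    map_none_error = exc
```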

Anyway, I have come to terms with this behaviour, more or less ;-)

Rune

John Machin
#5: Jul 19 '05


On Mon, 18 Apr 2005 16:16:00 +0100, Will McGugan
<news@NOwillmcguganSPAM.com> wrote:

>Hi,
>
>I'm curious about the behaviour of the str.split() when applied to empty
>strings.
>
>"".split() returns an empty list, however..
>
>"".split("*") returns a list containing one empty string.
>
>I would have expected the second example to have also returned an empty
>list. What am I missing?
>

You are missing a perusal of the documentation. Had you done so, you
would have noticed that the actual behaviour that you mentioned is
completely the reverse of what is in the documentation!

"""
Splitting an empty string with a specified separator returns an empty
list.
If sep is not specified or is None, a different splitting algorithm is
applied. <snip> Splitting an empty string or a string consisting of
just whitespace will return "['']".
"""
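The quoted passage is the 2005 wording, in which (as the replies below confirm) the two empty-string statements are swapped. A quick check of the actual behaviour in a current Python 3 interpreter:

```python
no_sep = "".split()          # default (whitespace) algorithm
blank = "   \t\n".split()    # whitespace-only input, same algorithm
with_sep = "".split("*")     # explicit separator

print(no_sep, blank, with_sep)  # [] [] ['']
```

So it is the whitespace algorithm that returns [] for empty or all-whitespace input, and the explicit-separator form that returns [''].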

As you stumbled on this first, you may have the honour of submitting a
patch to the docs and getting your name on the roll of contributors.
Get in quickly, before X** L** does :-)

Cheers,

John
Raymond Hettinger
#6: Jul 19 '05


[Will McGugan]
> >I'm curious about the behaviour of the str.split() when applied to empty
> >strings.
> >
> >"".split() returns an empty list, however..
> >
> >"".split("*") returns a list containing one empty string.

[John Machin]
> You are missing a perusal of the documentation. Had you done so, you
> would have noticed that the actual behaviour that you mentioned is
> completely the reverse of what is in the documentation!

Nuts! I've got it from here and will get it fixed up.

<lament>
str.split() has to be one of the most frequently revised pieces of
documentation. In this case, the two statements about splitting empty strings
need to be swapped. Previously, all the changes occurred because
someone, somewhere would always find a way to misread whatever was there.

In the absence of reading the docs, a person's intuition seems to lead them to
guess that the behavior will be different than it actually is. Unfortunately,
one person's intuition is often at odds with another's.
</lament>


Raymond Hettinger


Greg Ewing
#7: Jul 19 '05


Will McGugan wrote:
> Hi,
>
> I'm curious about the behaviour of the str.split() when applied to empty
> strings.
>
> "".split() returns an empty list, however..
>
> "".split("*") returns a list containing one empty string.

Both of these make sense as limiting cases.

Consider

>>> "a b c".split()
['a', 'b', 'c']
>>> "a b".split()
['a', 'b']
>>> "a".split()
['a']
>>> "".split()
[]

and

>>> "**".split("*")
['', '', '']
>>> "*".split("*")
['', '']
>>> "".split("*")
['']

The split() method is really doing two somewhat different things
depending on whether it is given an argument, and the end-cases
come out differently.
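One way to see the two behaviours is that the no-argument form acts like splitting on runs of whitespace and then discarding empty strings, which is why an empty input ends up as []. The helper whitespace_split_sketch below is hypothetical, a model rather than CPython's actual implementation, and is only checked here against plain ASCII whitespace:

```python
import re

def whitespace_split_sketch(s):
    """Hypothetical model of s.split() with no argument: split on runs
    of whitespace, then drop the empty strings left at either end."""
    return [field for field in re.split(r"\s+", s) if field]

samples = ["a b c", "a b", "a", "", "  a  b  ", " \t\n"]
for s in samples:
    # The model agrees with the built-in on these ASCII samples.
    assert whitespace_split_sketch(s) == s.split(), s
```

With an explicit separator nothing is filtered out, so the single empty field from "".split("*") survives as [''].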

--
Greg Ewing, Computer Science Dept,
University of Canterbury,
Christchurch, New Zealand
http://www.cosc.canterbury.ac.nz/~greg
David Fraser
#8: Jul 19 '05


Greg Ewing wrote:
> Will McGugan wrote:
>
>> Hi,
>>
>> I'm curious about the behaviour of the str.split() when applied to
>> empty strings.
>>
>> "".split() returns an empty list, however..
>>
>> "".split("*") returns a list containing one empty string.
>
>
> Both of these make sense as limiting cases.
>
> Consider
>
> >>> "a b c".split()
> ['a', 'b', 'c']
> >>> "a b".split()
> ['a', 'b']
> >>> "a".split()
> ['a']
> >>> "".split()
> []
>
> and
>
> >>> "**".split("*")
> ['', '', '']
> >>> "*".split("*")
> ['', '']
> >>> "".split("*")
> ['']
>
> The split() method is really doing two somewhat different things
> depending on whether it is given an argument, and the end-cases
> come out differently.
>
You don't really explain *why* they make sense as limiting cases, as
your examples are quite different.

Consider

>>> "a*b*c".split("*")
['a', 'b', 'c']
>>> "a*b".split("*")
['a', 'b']
>>> "a".split("*")
['a']
>>> "".split("*")
['']

Now how is this logical when compared with split() above?
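A possible answer is the invariant that the explicit-separator form always produces exactly one more field than there are separators. The empty string contains no separators, so it yields exactly one (empty) field. A small check, where field_count_holds is a hypothetical helper:

```python
def field_count_holds(s, sep):
    # With an explicit separator no fields are ever dropped, so the
    # number of fields is the number of separators plus one.
    return len(s.split(sep)) == s.count(sep) + 1

samples = ["a*b*c", "a*b", "a", "", "*", "**", "*a", "a*"]
results = {s: field_count_holds(s, "*") for s in samples}
print(all(results.values()))  # True

# The whitespace form follows no such rule: "" has no whitespace yet gives [].
print(len("".split()))  # 0
```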

David
Bengt Richter
#9: Jul 19 '05


On Wed, 20 Apr 2005 10:55:18 +0200, David Fraser <davidf@sjsoft.com> wrote:

>Greg Ewing wrote:
>> Will McGugan wrote:
>>
>>> Hi,
>>>
>>> I'm curious about the behaviour of the str.split() when applied to
>>> empty strings.
>>>
>>> "".split() returns an empty list, however..
>>>
>>> "".split("*") returns a list containing one empty string.
>>
>>
>> Both of these make sense as limiting cases.
>>
>> Consider
>>
>> >>> "a b c".split()
>> ['a', 'b', 'c']
>> >>> "a b".split()
>> ['a', 'b']
>> >>> "a".split()
>> ['a']
>> >>> "".split()
>> []
>>
>> and
>>
>> >>> "**".split("*")
>> ['', '', '']
>> >>> "*".split("*")
>> ['', '']
>> >>> "".split("*")
>> ['']
>>
>> The split() method is really doing two somewhat different things
>> depending on whether it is given an argument, and the end-cases
>> come out differently.
>>
>You don't really explain *why* they make sense as limiting cases, as
>your examples are quite different.
>
>Consider
>
> >>> "a*b*c".split("*")
>['a', 'b', 'c']
> >>> "a*b".split("*")
>['a', 'b']
> >>> "a".split("*")
>['a']
> >>> "".split("*")
>['']
>
>Now how is this logical when compared with split() above?

The trouble is that s.split(arg) and s.split() are two different functions.

The first is 1:1 and reversible, like arg.join(s.split(arg)) == s.
The second is neither 1:1 nor reversible: '<<various whitespace>>'.join(s.split()) == s ?? Not usually.

I think you can do it with the equivalent whitespace regex, preserving the split-out whitespace
substrings and ''.joining those back with the others, but not with split(). I.e.,

>>> def splitjoin(s, splitter=None):
...     return (splitter is None and '<<whitespace>>' or splitter).join(s.split(splitter))
...
>>> splitjoin('a*b*c', '*')
'a*b*c'
>>> splitjoin('a*b', '*')
'a*b'
>>> splitjoin('a', '*')
'a'
>>> splitjoin('', '*')
''
>>> splitjoin('a b c')
'a<<whitespace>>b<<whitespace>>c'
>>> splitjoin('a b ')
'a<<whitespace>>b'
>>> splitjoin(' b ')
'b'
>>> splitjoin('')
''

>>> splitjoin('*****','*')
'*****'
Note why that works:

>>> '*****'.split('*')
['', '', '', '', '', '']
>>> '*a'.split('*')
['', 'a']
>>> 'a*'.split('*')
['a', '']

>>> splitjoin('*a','*')
'*a'
>>> splitjoin('a*','*')
'a*'
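The regex route for the whitespace case can be sketched as follows. Wrapping the whitespace pattern in a capturing group keeps every whitespace run in the result, so "".join() restores the original exactly, whereas split() maps different inputs to the same list. The sample string is illustrative:

```python
import re

s = "  a \t b\n"

# split() throws the whitespace away, so distinct inputs collapse together.
lossy = s.split()
assert lossy == "a b".split()  # ['a', 'b'] either way

# A capturing group keeps every whitespace run in the result list.
pieces = re.split(r"(\s+)", s)
restored = "".join(pieces)

print(lossy)          # ['a', 'b']
print(pieces)         # ['', '  ', 'a', ' \t ', 'b', '\n', '']
print(restored == s)  # True
```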

Regards,
Bengt Richter
David Fraser
#10: Jul 19 '05


Bengt Richter wrote:
> On Wed, 20 Apr 2005 10:55:18 +0200, David Fraser <davidf@sjsoft.com> wrote:
>
>
>>Greg Ewing wrote:
>>
>>>Will McGugan wrote:
>>>
>>>
>>>>Hi,
>>>>
>>>>I'm curious about the behaviour of the str.split() when applied to
>>>>empty strings.
>>>>
>>>>"".split() returns an empty list, however..
>>>>
>>>>"".split("*") returns a list containing one empty string.
>>>
>>>
>>>Both of these make sense as limiting cases.
>>>
>>>Consider
>>>
>>> >>> "a b c".split()
>>>['a', 'b', 'c']
>>> >>> "a b".split()
>>>['a', 'b']
>>> >>> "a".split()
>>>['a']
>>> >>> "".split()
>>>[]
>>>
>>>and
>>>
>>> >>> "**".split("*")
>>>['', '', '']
>>> >>> "*".split("*")
>>>['', '']
>>> >>> "".split("*")
>>>['']
>>>
>>>The split() method is really doing two somewhat different things
>>>depending on whether it is given an argument, and the end-cases
>>>come out differently.
>>>
>>
>>You don't really explain *why* they make sense as limiting cases, as
>>your examples are quite different.
>>
>>Consider
>>
>> >>> "a*b*c".split("*")
>>['a', 'b', 'c']
>> >>> "a*b".split("*")
>>['a', 'b']
>> >>> "a".split("*")
>>['a']
>> >>> "".split("*")
>>['']
>>
>>Now how is this logical when compared with split() above?
>
>
> The trouble is that s.split(arg) and s.split() are two different functions.
>
> The first is 1:1 and reversible, like arg.join(s.split(arg)) == s.
> The second is neither 1:1 nor reversible: '<<various whitespace>>'.join(s.split()) == s ?? Not usually.
>
> I think you can do it with the equivalent whitespace regex, preserving the split-out whitespace
> substrings and ''.joining those back with the others, but not with split(). I.e.,
>
> >>> def splitjoin(s, splitter=None):
> ...     return (splitter is None and '<<whitespace>>' or splitter).join(s.split(splitter))
> ...
> >>> splitjoin('a*b*c', '*')
> 'a*b*c'
> >>> splitjoin('a*b', '*')
> 'a*b'
> >>> splitjoin('a', '*')
> 'a'
> >>> splitjoin('', '*')
> ''
> >>> splitjoin('a b c')
> 'a<<whitespace>>b<<whitespace>>c'
> >>> splitjoin('a b ')
> 'a<<whitespace>>b'
> >>> splitjoin(' b ')
> 'b'
> >>> splitjoin('')
> ''
>
> >>> splitjoin('*****','*')
> '*****'
> Note why that works:
>
> >>> '*****'.split('*')
> ['', '', '', '', '', '']
> >>> '*a'.split('*')
> ['', 'a']
> >>> 'a*'.split('*')
> ['a', '']
>
> >>> splitjoin('*a','*')
> '*a'
> >>> splitjoin('a*','*')
> 'a*'

Thanks, this makes sense.
So ideally, if we weren't dealing with backward compatibility, these
functions might have different names: "split" (with an argument) and
"spacesplit" (without one).
In fact, it would be nice to allow an argument to "spacesplit" specifying
the characters regarded as 'space'.
But it's not worth breaking current code over :-)
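"spacesplit" is only a hypothetical name; no such str method exists. A minimal sketch of what it might look like, assuming it takes an optional string of characters to treat as 'space' and, like the no-argument split(), collapses runs and drops empty fields:

```python
import re

def spacesplit(s, chars=None):
    """Hypothetical: split s on runs of any character in `chars` and
    drop empty fields, like str.split() with no argument.

    chars=None falls back to the built-in whitespace splitting.
    """
    if chars is None:
        return s.split()
    if not chars:
        raise ValueError("chars must not be empty")
    # Build a character class from the given characters; re.escape keeps
    # characters such as ']' or '-' from being read as regex syntax.
    pattern = "[" + re.escape(chars) + "]+"
    return [field for field in re.split(pattern, s) if field]
```

For example, spacesplit("a,,b;c", ",;") gives ['a', 'b', 'c']: the doubled comma is collapsed instead of producing the empty field that "a,,b".split(",") would.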

David