
Small inconsistency between string.split and "".split

Hi all,

While writing a small program to help another poster at c.l.py, I found
a small inconsistency between the handling of keyword parameters by
string.split() and by the split() method of strings. I wonder if anyone
else has stumbled on it before, and whether there is a good reason for it
to work the way it does.

Both implementations take two parameters: the separator string and
the maximum number of splits (maxsplit). However, string.split() accepts
maxsplit as a keyword parameter, while mystring.split() doesn't. In my
case, this meant that I had to resort to string.split() in my example,
in order to avoid having to deal with the separator.
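The module-level string.split() of Python 2 is, to my understanding, a thin Python-coded wrapper that forwards to the method, and any Python-coded function gets keyword arguments for free. A minimal sketch of such a wrapper, written for Python 3; the name module_split is hypothetical:

```python
def module_split(s, sep=None, maxsplit=-1):
    """Hypothetical wrapper modeled on Python 2's string.split().

    Because this function is written in Python, callers may pass
    sep and maxsplit either positionally or by keyword.
    """
    # Forward positionally, so the call would also work with a method
    # that only accepts positional arguments.
    return s.split(sep, maxsplit)


# Keyword form: only maxsplit is given, sep keeps its None default.
print(module_split("this is my string", maxsplit=1))  # ['this', 'is my string']
# Positional form, equivalent to the call above.
print(module_split("this is my string", None, 1))     # ['this', 'is my string']
```

The design point is simply that the keyword support lives in the wrapper's Python signature, not in the method it calls.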

** BTW, I had to avoid dealing with the separator for another annoying
reason: I thought that I could do something like this:

mystring.split(string.whitespace, 2)

to preserve the default whitespace-detecting behavior. But it doesn't
work this way with either implementation of split().
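The reason this fails is that sep is treated as one literal delimiter, not as a set of characters, so string.whitespace only matches that exact six-character sequence. A small Python 3 sketch contrasting it with sep=None and with re.split(), which is one possible way (not discussed in the thread) to split on a character class with a limit:

```python
import re
import string

text = "alpha beta\tgamma delta"

# string.whitespace (" \t\n\r\x0b\x0c") is used as ONE literal separator;
# that exact sequence never occurs in text, so nothing is split.
print(text.split(string.whitespace, 2))    # ['alpha beta\tgamma delta']

# sep=None selects the default "runs of whitespace" behavior, with a limit.
print(text.split(None, 2))                 # ['alpha', 'beta', 'gamma delta']

# A regular expression treats the whitespace characters as a class.
print(re.split(r"\s+", text, maxsplit=2))  # ['alpha', 'beta', 'gamma delta']
```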

----
Carlos Ribeiro
Consultoria em Projetos
blog: http://rascunhosrotos.blogspot.com
blog: http://pythonnotes.blogspot.com
mail: ca********@gmail.com
mail: ca********@yahoo.com
Jul 18 '05 #1
11 Replies


Carlos Ribeiro wrote:
> While writing a small program to help other poster at c.l.py, I found
> a small inconsistency between the handling of keyword parameters of
> string.split() and the split() method of strings. I wonder if someone
> else had ever stumbled on it before, and if it has a good reason to
> work like it is.
>
> Both implementations take two parameters: the separator character and
> the max number of splits (maxsplit). However, string.split() accept
> maxsplit as a keyword parameter, while mystring.split() doesn't. In my
> case, it meant that I had to resort to string.split() in my example,
> in order to avoid having to deal with the separator.

Works here:

c:\>python
Python 2.3.4 (#53, May 25 2004, 21:17:02) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> s = 'this is my string'
>>> s.split()
['this', 'is', 'my', 'string']
>>> s.split('s')
['thi', ' i', ' my ', 'tring']
>>> s.split('s', 1)
['thi', ' is my string']
>>> s.split('s', 2)
['thi', ' i', ' my string']

> ** BTW, I had to avoid dealing with the separator for another annoying
> reason: I thought that I could do something like this:
>
> mystring.split(string.whitespace, 2)
>
> to preserve the default whitespace detecting behavior. But it won't
> work this way with neither implementation of split().


I think this works though:

>>> s.split(None, 2)
['this', 'is', 'my string']
>>> s.split(None, 1)
['this', 'is my string']
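Passing None is not the same as passing a single space, though: with None, runs of whitespace count as one separator and leading/trailing whitespace yields no empty strings, while an explicit " " splits on every single space. A quick Python 3 illustration with an arbitrary sample string:

```python
s = "  this is  my string "

# sep=None (or no argument): runs of whitespace act as one separator,
# and leading/trailing whitespace produces no empty strings.
print(s.split())      # ['this', 'is', 'my', 'string']
print(s.split(None))  # ['this', 'is', 'my', 'string']

# An explicit single space: every space is a separator, so repeated
# or edge spaces yield empty strings.
print(s.split(" "))   # ['', '', 'this', 'is', '', 'my', 'string', '']
```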

-Peter
Jul 18 '05 #2

On Mon, 13 Sep 2004 13:09:26 -0400, Peter Hansen <pe***@engcorp.com> wrote:
> Works here:
> <snip>
> >>> s.split('s', 1)
> ['thi', ' is my string']
> >>> s.split('s', 2)


Unfortunately, this is *not* what I meant to ask about. What I am
saying is that:

import string
string.split(mystring, maxsplit=1)

works, while

mystring.split(maxsplit=1)

doesn't. In short, the built-in string method doesn't accept keyword
parameters, while the string.split() function does. Alas, the "None"
trick is not documented -- and without knowing about it, I had no
other way around it.
Jul 18 '05 #3

On Mon, Sep 13, 2004 at 02:41:33PM -0300, Carlos Ribeiro wrote:
> ...
> ... Alas, the "None"
> trick is not documented -- and without knowing about it, I had no
> other way around.


In 2.3.4 Python Library Reference section 2.3.6.1 String Methods,

"""
split([sep [,maxsplit]])

Return a list of the words in the string, using sep as the
delimiter string. If maxsplit is given, at most maxsplit
splits are done. If sep is not specified or None, any
whitespace string is a separator.
"""

I think the "None" trick has been documented here ever since string
methods were introduced.

-Inyeol
Jul 18 '05 #4

On Mon, 13 Sep 2004 10:59:27 -0700, Inyeol Lee <in********@siimage.com> wrote:
> I think "None" trick was documented here since string method was
> introduced.


I get it now. The problem is that I had only read the docstring --
yes, not the manual, and I admit it was laziness on my part ;-) But
anyway... the keyword parameter handling is inconsistent, *and* the
docstring could mention something about sep=None. Here it is:

split(s [,sep [,maxsplit]]) -> list of strings

Return a list of the words in the string s, using sep as the
delimiter string. If maxsplit is given, splits at no more than
maxsplit places (resulting in at most maxsplit+1 words). If sep
is not specified, any whitespace string is a separator.

(split and splitfields are synonymous)

It seems that sep=None can safely be read as "sep is not specified".
The other way round is not so clear: from "not specified" alone, you
would not guess that None is the value to pass.
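The docstring's bound ("at most maxsplit+1 words") is easy to check empirically. A small Python 3 sketch using an arbitrary three-word string; note that Python 3's str.split() documents -1 as the default maxsplit, meaning no limit:

```python
s = "a b c"

for n in range(5):
    parts = s.split(None, n)
    # The docstring's bound: never more than n + 1 pieces.
    assert len(parts) <= n + 1
    print(n, parts)

# A negative maxsplit (the default is -1) means "no limit".
print(s.split(None, -1) == s.split())  # True
```

With only two separators in the string, asking for three or four splits simply yields the same three words, which is why the docstring says "at most".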

Jul 18 '05 #5

Carlos Ribeiro wrote:
> On Mon, 13 Sep 2004 10:59:27 -0700, Inyeol Lee <in********@siimage.com> wrote:
>> I think "None" trick was documented here since string method was
>> introduced.
>
> I got it now. The problem is that I had just read the docstring --
> yes, not the manual, and admit it, it was lazyness of my part ;-) But
> anyway... the keyword parameter handling is inconsistent, *and* the
> docstring could mention something about sep="None".


I've fixed the docstring for both unicode.split() and
string.split() to give a hint about the None default. Note
that the docstring for str.split() already *did* mention
the None option.

Bye,
Walter Dörwald

Jul 18 '05 #6

Walter,

On Tue, 14 Sep 2004 12:01:29 +0200, Walter Dörwald
<wa****@livinglogic.de> wrote:
> Carlos Ribeiro wrote:
>> [...]
> I've fixed the docstring for both unicode.split() and
> string.split() to give a hint about the None default. Note
> that the docstring for str.split() already *did* mention
> the None option.


I don't know if you can do it, but isn't it easy to modify the split
method to accept maxsplit as a keyword parameter? It would make it
consistent with string.split(), and as far as I'm aware, it should not
cause any sizeable performance penalty. But the most important reason
is that keyword parameters for rarely used options make code more
readable; for example,

mystring.split(maxsplit=2)

reads better than:

mystring.split(None, 2)

That's my opinion, anyway...
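For reference, the keyword spelling suggested here is accepted by str.split() in current Python 3 releases (the 2.3 method discussed in this thread rejects it). A quick Python 3 check that both spellings agree, with mystring as an arbitrary sample value:

```python
mystring = "this is my string"

# Keyword form: sep is simply left at its default.
by_keyword = mystring.split(maxsplit=2)
# Positional form: needs an explicit None placeholder for sep.
by_position = mystring.split(None, 2)

print(by_keyword)                 # ['this', 'is', 'my string']
print(by_keyword == by_position)  # True
```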

Jul 18 '05 #7

Carlos Ribeiro <ca********@gmail.com> wrote:
> Walter,
>
> On Tue, 14 Sep 2004 12:01:29 +0200, Walter Dörwald
> <wa****@livinglogic.de> wrote:
>> Carlos Ribeiro wrote:
>> I've fixed the docstring for both unicode.split() and
>> string.split() to give a hint about the None default. Note
>> that the docstring for str.split() already *did* mention
>> the None option.
>
> I don't know if you can do it, but isn't easy to modify the split
> method to accept maxsplit as a keyword parameter? It would make it
> [...]

Feasible, not hard, not trivial. The problem is different...:

kallisti:~/downloads/Python-2.4a3 alex$ find . -name '*.c' | xargs cat | grep -c 'METH_KEYWORDS'
92
kallisti:~/downloads/Python-2.4a3 alex$ find . -name '*.c' | xargs cat | grep -c 'METH_VARARGS'
1272
kallisti:~/downloads/Python-2.4a3 alex$ find . -name '*.c' | xargs cat | grep -c 'METH_'
2429

In other words: throughout the current C sources for Python (across all
platforms etc) there are about 2429 specifications of how various
functions (methods included, of course) take their parameters. Of
these, about half are METH_VARARGS (400 are METH_NOARGS, i.e. functions
and methods accepting no explicit arguments, and 739 are METH_O,
accepting just one), and less than 4% accept keyword-style arguments.
Many of those are pretty recent additions, too, and some play special
roles which you just couldn't fulfil otherwise (e.g. consider the
optional key= vs cmp= arguments that 2.4 accepts for the list.sort
method -- they are mutually exclusive...).
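The figures above come from grep -c, which counts matching lines rather than individual occurrences, over the concatenated .c files. A rough Python 3 equivalent, using a hypothetical count_matching_lines() helper, might look like this:

```python
import os


def count_matching_lines(root, needle, suffix=".c"):
    """Count lines containing `needle` in files under `root` ending in `suffix`.

    Like `grep -c`, a line with several matches is counted only once.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps odd bytes in old sources from aborting the scan.
            with open(path, encoding="utf-8", errors="replace") as f:
                total += sum(1 for line in f if needle in line)
    return total
```

For example, count_matching_lines("Python-2.4a3", "METH_KEYWORDS") would scan an unpacked source tree. Results could differ slightly from the pipeline, e.g. where cat joins the last line of a file lacking a trailing newline to the first line of the next file.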

Having ALL C-coded functions and methods that take any arguments also
accept them in keyword form would surely lead to a more
consistent language, once the impact of thousands of modifications to
the source stabilizes again -- a slightly bigger and slower interpreter,
no doubt, but probably only slightly. But these thousands of changes
would require very substantial and disruptive editing -- substantial
manpower to perform them all, AND to ensure they're all well tested (I
suspect the set of unit tests would have to more than double to do a
halfway decent job). It would have to be among the major targets of a
given Python release, I suspect, and raising enthusiasm for such a job
might not be easy, even though Python would be a better language in
consequence. Maybe it will be feasible as part of the 3.0 release,
which is slated to be incompatible anyway... remove METH_VARARGS
altogether, breaking compatibility with all existing extensions, so
EVERY C-coded function in the future, if it takes any argument at all,
will HAVE to take them in keyword form, too.

Until it's feasible to perform such a sweeping change, justifying
changes to ONE specific method of an object which has dozens is going to
be pretty hard. Perhaps, if someone volunteered a patch to make ALL
methods of string and unicode objects accept arguments in
keyword form as well as positionally, with all the needed tests & docs,
in time for Python 2.4's first beta in a couple of weeks, it might be
accepted (if separate but similar patches also existed for methods of
other built-in types, that would help all of their acceptance chances,
IMHO). But a patch to change ONE method out of dozens, I suspect, would
be shot down -- the slight, useful extra functionality might be judged
not to be worth the increase in inconsistency in this area (which IMHO
must, sadly, count as a wart in today's Python, sigh).
Alex


Jul 18 '05 #8

al*****@yahoo.com (Alex Martelli) writes:
> Having ALL C-coded functions and methods that accept any argument
> accept keyword-style arguments in particular would surely lead to a
> more consistent language,

[...]

This whole area isn't particularly pretty. In general it would be
better to expose more of an extension function's signature *outside*
the function, for efficiency, introspection and even things like
psyco. METH_O and METH_NOARGS are a step in this direction -- but you
can't pass a keyword argument to a METH_O function (not that one would
want to very often, but it's still a potential inconsistency).
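In current Python (3.8 and later), the METH_O situation can be mimicked in pure Python with a positional-only parameter, which callers cannot pass by name, while an ordinary parameter works either way; inspect.signature() then exposes the difference from outside the function, the kind of introspection mentioned above. A sketch with hypothetical function names:

```python
import inspect


def meth_o_like(arg, /):
    """Like a METH_O function: exactly one argument, positional only."""
    return arg


def keywords_like(arg):
    """Like METH_VARARGS|METH_KEYWORDS: the argument has a usable name."""
    return arg


print(keywords_like(arg=1))  # 1
try:
    meth_o_like(arg=1)  # rejected: the name is not part of the call interface
except TypeError as exc:
    print("TypeError:", exc)

# The parameter kind is visible without calling the function.
for func in (meth_o_like, keywords_like):
    param = inspect.signature(func).parameters["arg"]
    print(func.__name__, param.kind == inspect.Parameter.POSITIONAL_ONLY)
```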

I wonder what Pyrex does...

My thoughts on this area, like many others, can probably be summarized
as "I hate C".

Cheers,
mwh

--
Enlightenment is probably antithetical to impatience.
-- Erik Naggum, comp.lang.lisp
Jul 18 '05 #9

Michael Hudson <mw*@python.net> wrote:
> al*****@yahoo.com (Alex Martelli) writes:
>> Having ALL C-coded functions and methods that accept any argument
>> accept keyword-style arguments in particular would surely lead to a
>> more consistent language,
> [...]
>
> This whole area isn't particularly pretty. In general it would be

Indeed, it isn't.

> better to expose more of an extension functions signature *outside*
> the function, for efficiency, introspection and even things like

...and consistency with the way Python-coded functions work.

> psyco. METH_O, METH_NOARGS are a step in this direction -- but you
> can't pass a keyword argument to a METH_O function (not that one would
> want to, very often, but it's still a potential inconsistency).

Right; it could be remedied by letting a macro otherwise equivalent to
METH_O know about that one argument's name.

> I wonder what Pyrex does...

for:

def example(aa, bb):
    pass

it generates (name mangling apart, I'm demangling for legibility):

static PyObject* example(PyObject *self, PyObject *args, PyObject *kwds)
{
    PyObject *aa = 0;
    PyObject *bb = 0;
    static char *argnames[] = {"aa", "bb", 0};

    if (!PyArg_ParseTupleAndKeywords(args, kwds, "OO", argnames, &aa, &bb))
        return 0;

etc, etc, and METH_VARARGS|METH_KEYWORDS in the PyMethodDef array. IOW,
nothing strange, and all correct, it seems to me.
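Roughly speaking, the "OO" format plus argnames lets each of aa and bb arrive either by position or by keyword, with duplicates and missing values rejected. A simplified Python model of just that binding step; bind_args() is a hypothetical helper that ignores format codes and optional arguments, and is not the C API itself:

```python
def bind_args(args, kwds, argnames):
    """Bind positional and keyword arguments to names; all are required."""
    if len(args) > len(argnames):
        raise TypeError(f"takes at most {len(argnames)} arguments ({len(args)} given)")
    # Positional arguments fill the leading names in order.
    bound = dict(zip(argnames, args))
    for name, value in (kwds or {}).items():
        if name not in argnames:
            raise TypeError(f"{name!r} is an invalid keyword argument")
        if name in bound:
            raise TypeError(f"argument {name!r} given by position and by keyword")
        bound[name] = value
    missing = [name for name in argnames if name not in bound]
    if missing:
        raise TypeError(f"missing required argument(s): {', '.join(missing)}")
    return tuple(bound[name] for name in argnames)


print(bind_args((1,), {"bb": 2}, ["aa", "bb"]))          # (1, 2)
print(bind_args((), {"bb": 2, "aa": 1}, ["aa", "bb"]))   # (1, 2)
```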
Alex


Jul 18 '05 #10

al*****@yahoo.com (Alex Martelli) writes:
> Michael Hudson <mw*@python.net> wrote:
>> al*****@yahoo.com (Alex Martelli) writes:
>>> Having ALL C-coded functions and methods that accept any argument
>>> accept keyword-style arguments in particular would surely lead to a
>>> more consistent language,
>> [...]
>>
>> This whole area isn't particularly pretty. In general it would be
>
> Indeed, it isn't.
>
>> better to expose more of an extension functions signature *outside*
>> the function, for efficiency, introspection and even things like
>
> ...and consistency with the way Python-coded functions work.

Heh, yes, that too :-)

>> psyco. METH_O, METH_NOARGS are a step in this direction -- but you
>> can't pass a keyword argument to a METH_O function (not that one would
>> want to, very often, but it's still a potential inconsistency).
>
> Right; it could be remedied by letting a macro otherwise equivalent to
> METH_O know about that one argument's name.

But... how? I guess the PyMethodDef struct could grow an ml_signature
field... wouldn't it be nice if you could do:

static PyObject*
foo(PyObject* ob, int index)
{
    ...;
}

PyMethodDef methods[] = {
    {"foo", foo, "O[ob]i[index]", "docstring"},
    {NULL, NULL}
};

? Even nicer if you didn't have to write the signature by hand.

Unfortunately, I don't think you can do this in standard C.
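One way to read the hypothetical "O[ob]i[index]" notation is as a sequence of PyArg_ParseTuple-style format codes ("O" for an object, "i" for an int), each followed by the parameter name in brackets. A Python 3 sketch of a parser for that notation; both the notation and parse_signature() are hypothetical, and CPython's PyMethodDef has no ml_signature field:

```python
import re

# Assumes single-letter format codes, each followed by [name].
_PART = re.compile(r"([A-Za-z])\[([A-Za-z_][A-Za-z0-9_]*)\]")


def parse_signature(sig):
    """Split e.g. "O[ob]i[index]" into [("O", "ob"), ("i", "index")]."""
    parts = []
    pos = 0
    while pos < len(sig):
        match = _PART.match(sig, pos)
        if match is None:
            raise ValueError(f"bad signature {sig!r} at offset {pos}")
        parts.append((match.group(1), match.group(2)))
        pos = match.end()
    return parts


print(parse_signature("O[ob]i[index]"))  # [('O', 'ob'), ('i', 'index')]
```

Such a list would carry both the conversion codes needed for positional parsing and the names needed to accept the same arguments by keyword.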
>> I wonder what Pyrex does...
>
> for:
>
> def example(aa, bb):
>     pass
>
> it generates (name mangling apart, I'm demangling for legibility):
>
> static PyObject* example(PyObject *self, PyObject *args, PyObject *kwds)
> {
>     PyObject *aa = 0;
>     PyObject *bb = 0;
>     static char *argnames[] = {"aa", "bb", 0};
>
>     if (!PyArg_ParseTupleAndKeywords(args, kwds, "OO", argnames, &aa, &bb))
>         return 0;
>
> etc, etc, and METH_VARARGS|METH_KEYWORDS in the PyMethodDef array. IOW,
> nothing strange, and all correct, it seems to me.


Cool. I should use pyrex more, I suspect.

Cheers,
mwh

--
As it seems to me, in Perl you have to be an expert to correctly make
a nested data structure like, say, a list of hashes of instances. In
Python, you have to be an idiot not to be able to do it, because you
just write it down. -- Peter Norvig, comp.lang.functional
Jul 18 '05 #11

Michael Hudson <mw*@python.net> wrote:
> ...
>> Right; it could be remedied by letting a macro otherwise equivalent to
>> METH_O know about that one argument's name.
> But... how? I guess the PyMethodDef struct could grow an ml_signature
> field... wouldn't it be nice if you could do:

Right, something like that. As long as we need backwards compatibility
(==all the way to 3.0) that needs to be handled with care, of course...

> static PyObject*
> foo(PyObject* ob, int index)
> {
>     ...;
> }
>
> PyMethodDef methods[] = {
>     {"foo", foo, "O[ob]i[index]", "docstring"},
>     {NULL, NULL}
> }
>
> ? Even nicer if you didn't have to write the signature by hand.
>
> Unfortunately, I don't think you can do this in standard C.


I don't think so, either -- unless you put macros in TWO places,
perhaps:

DEF_PYFUN(foo, (PyObject* ob, int index))
{
...
}

PyMethodDef methods[] = {
REF_PYFUN(foo, "docstring"),
{0}
};

This, I suspect, might be possible, with DEF_PYFUN stashing the sig
string someplace (e.g. in a __def_pyfun__foo global) and REF_PYFUN
pulling out a reference to it...
>> nothing strange, and all correct, it seems to me.
>
> Cool. I should use pyrex more, I suspect.


Me too, I suspect -- it's really a cool way to write extensions for
Python.
Alex
Jul 18 '05 #12
