Small inconsistency between string.split and "".split

Hi all,

While writing a small program to help another poster at c.l.py, I found
a small inconsistency between the handling of keyword parameters of
string.split() and the split() method of strings. I wonder if someone
else had ever stumbled on it before, and if it has a good reason to
work like it is.

Both implementations take two parameters: the separator character and
the max number of splits (maxsplit). However, string.split() accepts
maxsplit as a keyword parameter, while mystring.split() doesn't. In my
case, it meant that I had to resort to string.split() in my example,
in order to avoid having to deal with the separator.

** BTW, I had to avoid dealing with the separator for another annoying
reason: I thought that I could do something like this:

mystring.split(string.whitespace, 2)

to preserve the default whitespace detecting behavior. But it won't
work this way with either implementation of split().

----
Carlos Ribeiro
Consultoria em Projetos
blog: http://rascunhosrotos.blogspot.com
blog: http://pythonnotes.blogspot.com
mail: ca********@gmail.com
mail: ca********@yahoo.com
Jul 18 '05 #1
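
The whitespace point in the post above can be checked directly. This is a minimal sketch for current Python 3, not the 2.3-era interpreter discussed in the thread, and the sample string is made up for illustration. Passing string.whitespace as sep makes split() search for that whole multi-character string as one literal delimiter, so nothing matches; passing None selects the default whitespace-run behavior.

```python
import string

# Sample text is made up for illustration; note the double space and the tab.
sample = "this is  my\tstring"

# sep=string.whitespace is treated as ONE literal delimiter (the whole
# multi-character string), which does not occur in the sample.
literal_split = sample.split(string.whitespace, 2)

# sep=None asks for the default behavior: split on runs of whitespace.
default_split = sample.split(None, 2)

print(literal_split)
print(default_split)
```

The first call returns the sample unchanged inside a one-element list, while the second performs two whitespace splits and leaves the rest intact.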
Carlos Ribeiro wrote:
While writing a small program to help another poster at c.l.py, I found
a small inconsistency between the handling of keyword parameters of
string.split() and the split() method of strings. I wonder if someone
else had ever stumbled on it before, and if it has a good reason to
work like it is.

Both implementations take two parameters: the separator character and
the max number of splits (maxsplit). However, string.split() accepts
maxsplit as a keyword parameter, while mystring.split() doesn't. In my
case, it meant that I had to resort to string.split() in my example,
in order to avoid having to deal with the separator.
Works here:

c:\>python
Python 2.3.4 (#53, May 25 2004, 21:17:02) [MSC v.1200 32 bit (Intel)] on
win32
Type "help", "copyright", "credits" or "license" for more information.
>>> s = 'this is my string'
>>> s.split()
['this', 'is', 'my', 'string']
>>> s.split('s')
['thi', ' i', ' my ', 'tring']
>>> s.split('s', 1)
['thi', ' is my string']
>>> s.split('s', 2)
['thi', ' i', ' my string']

** BTW, I had to avoid dealing with the separator for another annoying
reason: I thought that I could do something like this:

mystring.split(string.whitespace, 2)

to preserve the default whitespace detecting behavior. But it won't
work this way with either implementation of split().


I think this works though:
>>> s.split(None, 2)
['this', 'is', 'my string']
>>> s.split(None, 1)
['this', 'is my string']

-Peter
Jul 18 '05 #2
On Mon, 13 Sep 2004 13:09:26 -0400, Peter Hansen <pe***@engcorp.com> wrote:
Works here:
<snip>
>>> s.split('s', 1)
['thi', ' is my string']
>>> s.split('s', 2)


Unfortunately, this is *not* what I had meant to ask for. What I am
saying is that:

import string
string.split(mystring, maxsplit=1)

works, while

mystring.split(maxsplit=1)

doesn't. In short, the built-in string method doesn't accept keyword
parameters while the string.split() function does. Alas, the "None"
trick is not documented -- and without knowing about it, I had no
other way around.
--
Carlos Ribeiro
Consultoria em Projetos
blog: http://rascunhosrotos.blogspot.com
blog: http://pythonnotes.blogspot.com
mail: ca********@gmail.com
mail: ca********@yahoo.com
Jul 18 '05 #3
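
A note for readers on current Python 3: the method form now accepts maxsplit by keyword, so the two spellings below are equivalent. This is a sketch for a modern interpreter and does not reproduce the 2.3 behavior Carlos describes, where the keyword form was rejected. The variable names are made up for illustration.

```python
mystring = "this is my string"

# Positional form, as used in the thread: sep=None, maxsplit=1.
by_position = mystring.split(None, 1)

# Keyword form; accepted by str.split in current Python 3.
by_keyword = mystring.split(maxsplit=1)

print(by_position)
print(by_keyword)
```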
On Mon, Sep 13, 2004 at 02:41:33PM -0300, Carlos Ribeiro wrote:
....
... Alas, the "None"
trick is not documented -- and without knowing about it, I had no
other way around.


In 2.3.4 Python Library Reference section 2.3.6.1 String Methods,

"""
split([sep [,maxsplit]])

Return a list of the words in the string, using sep as the
delimiter string. If maxsplit is given, at most maxsplit
splits are done. If sep is not specified or None, any
whitespace string is a separator.
"""

I think "None" trick was documented here since string method was
introduced.

-Inyeol
Jul 18 '05 #4
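
The manual passage quoted above makes two claims that are easy to check: leaving sep out and passing sep=None behave the same, and at most maxsplit splits are done. A small sketch for current Python 3 follows; the sample text and variable names are assumptions for illustration only.

```python
# Made-up sample with leading, trailing, and mixed whitespace.
text = "  alpha beta\tgamma  delta "

# Omitting sep and passing sep=None give the same result.
no_sep = text.split()
none_sep = text.split(None)

# With maxsplit=2 at most two splits happen, so at most three items come back.
limited = text.split(None, 2)

print(no_sep)
print(len(limited))
```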
On Mon, 13 Sep 2004 10:59:27 -0700, Inyeol Lee <in********@siimage.com> wrote:
I think "None" trick was documented here since string method was
introduced.


I got it now. The problem is that I had just read the docstring --
yes, not the manual, and I admit it, it was laziness on my part ;-) But
anyway... the keyword parameter handling is inconsistent, *and* the
docstring could mention something about sep=None. Here it is:

split(s [,sep [,maxsplit]]) -> list of strings

Return a list of the words in the string s, using sep as the
delimiter string. If maxsplit is given, splits at no more than
maxsplit places (resulting in at most maxsplit+1 words). If sep
is not specified, any whitespace string is a separator.

(split and splitfields are synonymous)

It seems that sep=None can be safely understood as "sep is not
specified". The other way round is not so clear.

--
Carlos Ribeiro
Consultoria em Projetos
blog: http://rascunhosrotos.blogspot.com
blog: http://pythonnotes.blogspot.com
mail: ca********@gmail.com
mail: ca********@yahoo.com
Jul 18 '05 #5
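
Since the confusion here came from reading the docstring instead of the manual, it may help to see how to check a docstring programmatically. This is a rough sketch for current Python 3; mentions() is a hypothetical helper, and the exact docstring wording differs between Python versions, so the results are printed without assuming what they say about None.

```python
def mentions(func, word):
    """Return True if func's docstring contains word (hypothetical helper)."""
    doc = func.__doc__ or ""
    return word in doc


# Ask whether the built-in method's docstring names the parameter and None.
has_maxsplit = mentions(str.split, "maxsplit")
has_none = mentions(str.split, "None")

print(has_maxsplit, has_none)
```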
Carlos Ribeiro wrote:
On Mon, 13 Sep 2004 10:59:27 -0700, Inyeol Lee <in********@siimage.com> wrote:
I think "None" trick was documented here since string method was
introduced.


I got it now. The problem is that I had just read the docstring --
yes, not the manual, and I admit it, it was laziness on my part ;-) But
anyway... the keyword parameter handling is inconsistent, *and* the
docstring could mention something about sep=None.


I've fixed the docstring for both unicode.split() and
string.split() to give a hint about the None default. Note
that the docstring for str.split() already *did* mention
the None option.

Bye,
Walter Dörwald

Jul 18 '05 #6
Walter,

On Tue, 14 Sep 2004 12:01:29 +0200, Walter Dörwald
<wa****@livinglogic.de> wrote:
Carlos Ribeiro wrote:
I've fixed the docstring for both unicode.split() and
string.split() to give a hint about the None default. Note
that the docstring for str.split() already *did* mention
the None option.


I don't know if you can do it, but isn't it easy to modify the split
method to accept maxsplit as a keyword parameter? It would make it
consistent with string.split(), and as far as I'm aware, it should not
cause any sizeable performance penalty. But the most important reason
is that keyword parameters for often-unused options make code more
readable; for example,

mystring.split(maxsplit=2)

reads better than:

mystring.split(None, 2)

That's my opinion, anyway...

--
Carlos Ribeiro
Consultoria em Projetos
blog: http://rascunhosrotos.blogspot.com
blog: http://pythonnotes.blogspot.com
mail: ca********@gmail.com
mail: ca********@yahoo.com
Jul 18 '05 #7
Carlos Ribeiro <ca********@gmail.com> wrote:
Walter,

On Tue, 14 Sep 2004 12:01:29 +0200, Walter Dörwald
<wa****@livinglogic.de> wrote:
Carlos Ribeiro wrote:
I've fixed the docstring for both unicode.split() and
string.split() to give a hint about the None default. Note
that the docstring for str.split() already *did* mention
the None option.
I don't know if you can do it, but isn't it easy to modify the split
method to accept maxsplit as a keyword parameter? It would make it


Feasible, not hard, not trivial. The problem is different...:

kallisti:~/downloads/Python-2.4a3 alex$ find . -name '*.c' | xargs cat |
grep -c 'METH_KEYWORDS'
92
kallisti:~/downloads/Python-2.4a3 alex$ find . -name '*.c' | xargs cat |
grep -c 'METH_VARARGS'
1272
kallisti:~/downloads/Python-2.4a3 alex$ find . -name '*.c' | xargs cat |
grep -c 'METH_'
2429

In other words: throughout the current C sources for Python (across all
platforms etc) there are about 2429 specifications of how various
functions (methods, of course, included) take their parameters. Of
these, about half are METH_VARARGS (400 are METH_NOARGS, i.e. functions
and methods accepting no explicit arguments, and 739 are METH_O,
accepting just one), and less than 4% accept keyword-style arguments.
Many of those are pretty recent additions, too, and some play special
roles which you just couldn't fulfil otherwise (e.g. consider the
optional key= vs cmp= arguments that 2.4 accepts for the list.sort
method -- they are mutually exclusive...).

Having ALL C-coded functions and methods that accept any argument accept
keyword-style arguments in particular would surely lead to a more
consistent language, once the impact of thousands of modifications to
the source stabilizes again -- a slightly bigger and slower interpreter,
no doubt, but probably only slightly. But these thousands of changes
will require very substantial and disruptive editing -- substantial
manpower to perform them all, AND ensure they're all well tested (I
suspect the set of unit tests would have to more than double to do a
halfway decent job). It would have to be among the major targets of a
given Python release, I suspect, and raising enthusiasm for such a job
might not be easy, even though Python would be a better language in
consequence. Maybe it will be feasible as part of the 3.0 release,
which is slated to be incompatible anyway... remove the METH_VARARGS
altogether, breaking compatibility with all existing extensions, so
EVERY C-coded function in the future, if it takes any argument at all,
will HAVE to take them in keyword form, too.

Until it's feasible to perform such a sweeping change, justifying
changes to ONE specific method of an object which has dozens is going to
be pretty hard. Perhaps, if someone volunteered a patch to make ALL
methods of string and unicode objects specifically accept arguments in
keyword form as well as positionally, with all the needed tests & docs,
in time for Python 2.4's first beta in a couple of weeks, it might be
accepted (if separate but similar patches also existed for methods of
other built-in types, that would help all of their acceptance chances,
IMHO). But a patch to change ONE method out of dozens, I suspect, would
be shot down -- the slight, useful extra functionality might be judged
to not be worth the increase in inconsistency in this area (which IMHO
must, sadly, count as a wart in today's Python, sigh).
Alex


Jul 18 '05 #8
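
The find | xargs cat | grep -c pipeline in Alex's post counts matching lines across all .c files. A rough Python 3 equivalent is sketched below; count_matching_lines() is a hypothetical helper, and the tiny source tree is invented so the sketch runs without a CPython checkout. The last lines only redo the arithmetic behind "about half" and "less than 4%" using the counts reported in the post.

```python
import tempfile
from pathlib import Path


def count_matching_lines(root, needle, pattern="*.c"):
    """Count lines containing needle in files under root matching pattern.

    Mirrors `grep -c`, which counts matching lines rather than occurrences.
    """
    total = 0
    for path in Path(root).rglob(pattern):
        if not path.is_file():
            continue
        with path.open(errors="replace") as handle:
            total += sum(1 for line in handle if needle in line)
    return total


# Tiny made-up source tree; the file contents are illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "a.c").write_text(
        '{"f", f, METH_VARARGS},\n{"g", g, METH_O},\n'
    )
    (Path(tmp) / "sub" / "b.c").write_text(
        '{"h", h, METH_VARARGS | METH_KEYWORDS},\nint x;\n'
    )
    (Path(tmp) / "notes.txt").write_text("METH_O is ignored here\n")
    demo_all = count_matching_lines(tmp, "METH_")
    demo_kw = count_matching_lines(tmp, "METH_KEYWORDS")

# Shares implied by the counts reported in the post (1272, 92, 2429).
varargs_share = 1272 / 2429
keywords_share = 92 / 2429

print(demo_all, demo_kw)
print(f"{varargs_share:.1%} {keywords_share:.1%}")
```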
al*****@yahoo.com (Alex Martelli) writes:
Having ALL C-coded functions and methods that accept any argument
accept keyword-style arguments in particular would surely lead to a
more consistent language,


[...]

This whole area isn't particularly pretty. In general it would be
better to expose more of an extension function's signature *outside*
the function, for efficiency, introspection and even things like
psyco. METH_O, METH_NOARGS are a step in this direction -- but you
can't pass a keyword argument to a METH_O function (not that one would
want to, very often, but it's still a potential inconsistency).

I wonder what Pyrex does...

My thoughts on this area, like many others, can probably be summarized
as "I hate C".

Cheers,
mwh

--
Enlightenment is probably antithetical to impatience.
-- Erik Naggum, comp.lang.lisp
Jul 18 '05 #9
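
Michael's point about one-argument C functions rejecting keywords can still be observed from Python. The sketch below targets current Python 3 and uses the built-in len() as the C-coded example; that len() is declared with METH_O in CPython is my understanding, not something the thread states. py_len() is a made-up Python-coded stand-in for comparison.

```python
def py_len(obj):
    """Python-coded stand-in for len(); the name is made up for this sketch."""
    return len(obj)


# A Python-coded function accepts its parameter by keyword.
keyword_result = py_len(obj=[1, 2, 3])

# The C-coded built-in rejects the keyword form.
try:
    len(obj=[1, 2, 3])
    builtin_accepts_keyword = True
except TypeError:
    builtin_accepts_keyword = False

print(keyword_result, builtin_accepts_keyword)
```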
Michael Hudson <mw*@python.net> wrote:
al*****@yahoo.com (Alex Martelli) writes:
Having ALL C-coded functions and methods that accept any argument
accept keyword-style arguments in particular would surely lead to a
more consistent language,
[...]

This whole area isn't particularly pretty. In general it would be


Indeed, it isn't.
better to expose more of an extension function's signature *outside*
the function, for efficiency, introspection and even things like
....and consistency with the way Python-coded functions work.
psyco. METH_O, METH_NOARGS are a step in this direction -- but you
can't pass a keyword argument to a METH_O function (not that one would
want to, very often, but it's still a potential inconsistency).
Right; it could be remedied by letting a macro otherwise equivalent to
METH_O know about that one argument's name.

I wonder what Pyrex does...
for:
def example(aa, bb):
    pass

it generates (name mangling apart, I'm demangling for legibility):

static PyObject* example(PyObject *self, PyObject *args, PyObject *kwds)
{
    PyObject *aa = 0;
    PyObject *bb = 0;
    static char *argnames[] = {"aa", "bb", 0};

    if(!PyArg_ParseTupleAndKeywords(args,kwds,"OO",argnames,&aa,&bb))
        return 0;

etc, etc, and METH_VARARGS|METH_KEYWORDS in the PyMethodDef array. IOW,
nothing strange, and all correct, it seems to me.
Alex


Jul 18 '05 #10
al*****@yahoo.com (Alex Martelli) writes:
Michael Hudson <mw*@python.net> wrote:
al*****@yahoo.com (Alex Martelli) writes:
Having ALL C-coded functions and methods that accept any argument
accept keyword-style arguments in particular would surely lead to a
more consistent language,


[...]

This whole area isn't particularly pretty. In general it would be


Indeed, it isn't.
better to expose more of an extension function's signature *outside*
the function, for efficiency, introspection and even things like


...and consistency with the way Python-coded functions work.


Heh, yes, that too :-)
psyco. METH_O, METH_NOARGS are a step in this direction -- but you
can't pass a keyword argument to a METH_O function (not that one would
want to, very often, but it's still a potential inconsistency).


Right; it could be remedied by letting a macro otherwise equivalent to
METH_O know about that one argument's name.


But... how? I guess the PyMethodDef struct could grow an ml_signature
field... wouldn't it be nice if you could do:

static PyObject*
foo(PyObject* ob, int index)
{
    ...;
}

PyMethodDef methods[] = {
    {"foo", foo, "O[ob]i[index]", "docstring"},
    {NULL, NULL}
}

? Even nicer if you didn't have to write the signature by hand.

Unfortunately, I don't think you can do this in standard C.
I wonder what Pyrex does...


for:
def example(aa, bb):
    pass

it generates (name mangling apart, I'm demangling for legibility):

static PyObject* example(PyObject *self, PyObject *args, PyObject *kwds)
{
    PyObject *aa = 0;
    PyObject *bb = 0;
    static char *argnames[] = {"aa", "bb", 0};

    if(!PyArg_ParseTupleAndKeywords(args,kwds,"OO",argnames,&aa,&bb))
        return 0;

etc, etc, and METH_VARARGS|METH_KEYWORDS in the PyMethodDef array. IOW,
nothing strange, and all correct, it seems to me.


Cool. I should use pyrex more, I suspect.

Cheers,
mwh

--
As it seems to me, in Perl you have to be an expert to correctly make
a nested data structure like, say, a list of hashes of instances. In
Python, you have to be an idiot not to be able to do it, because you
just write it down. -- Peter Norvig, comp.lang.functional
Jul 18 '05 #11
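
For comparison with the ml_signature idea, this is what "signature visible outside the function" looks like for a Python-coded function in current Python 3. The example() function copies the shape of the Pyrex example quoted in the thread; the return value is added only so the call has something to show.

```python
import inspect


def example(aa, bb):
    """Same shape as the Pyrex example quoted in the thread."""
    return aa, bb


# Parameter names are visible from outside the function body.
param_names = list(inspect.signature(example).parameters)

# Because the names are known, callers may mix positional and keyword styles.
mixed_call = example(1, bb=2)

print(param_names, mixed_call)
```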
Michael Hudson <mw*@python.net> wrote:
...
Right; it could be remedied by letting a macro otherwise equivalent to
METH_O know about that one argument's name.
But... how? I guess the PyMethodDef struct could grow an ml_signature
field... wouldn't it be nice if you could do:


Right, something like that. As long as we need backwards compatibility
(==all the way to 3.0) that needs to be handled with care, of course...

static PyObject*
foo(PyObject* ob, int index)
{
    ...;
}

PyMethodDef methods[] = {
    {"foo", foo, "O[ob]i[index]", "docstring"},
    {NULL, NULL}
}

? Even nicer if you didn't have to write the signature by hand.

Unfortunately, I don't think you can do this in standard C.


I don't think so, either -- unless you put macros in TWO places,
perhaps:

DEF_PYFUN(foo, (PyObject* ob, int index))
{
    ...
}

PyMethodDef methods[] = {
    REF_PYFUN(foo, "docstring"),
    {0}
};

This, I suspect, might be possible, with DEF_PYFUN stashing the sig
string someplace (e.g. in a __def_pyfun__foo global) and REF_PYFUN
pulling out a reference to it...
nothing strange, and all correct, it seems to me.


Cool. I should use pyrex more, I suspect.


Me too, I suspect -- it's really a cool way to write extensions for
Python.
Alex
Jul 18 '05 #12
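
The two-macro idea (DEF_PYFUN stashes the signature, REF_PYFUN pulls it back out) is only speculated about in the thread, and nothing here shows how it would be done in C. As a loose analogy in Python 3, the same "record at definition, look up at reference" pattern can be sketched with a decorator and a registry. All names below (_SIGNATURES, def_pyfun, ref_pyfun) are hypothetical.

```python
import inspect

# Hypothetical registry standing in for the "__def_pyfun__foo" globals.
_SIGNATURES = {}


def def_pyfun(func):
    """Definition-side step: record the signature string next to the function."""
    _SIGNATURES[func.__name__] = str(inspect.signature(func))
    return func


def ref_pyfun(func, doc):
    """Reference-side step: build a method-table entry using the stored signature."""
    return (func.__name__, func, _SIGNATURES[func.__name__], doc)


@def_pyfun
def foo(ob, index):
    return ob[index]


# The table entry carries the signature without it being written by hand.
methods = [ref_pyfun(foo, "docstring")]

print(methods[0][2])
```

The signature string is written once, at the definition, and the table entry picks it up by name, which is the property the two-macro sketch was after.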
