new string method in 2.5 (partition)

John Salerno

Forgive my excitement, especially if you are already aware of this, but
this seems like the kind of feature that is easily overlooked (yet could
be very useful):

Both 8-bit and Unicode strings have new partition(sep) and
rpartition(sep) methods that simplify a common use case.

The find(S) method is often used to get an index which is then used to
slice the string and obtain the pieces that are before and after the
separator. partition(sep) condenses this pattern into a single method
call that returns a 3-tuple containing the substring before the
separator, the separator itself, and the substring after the separator.
If the separator isn't found, the first element of the tuple is the
entire string and the other two elements are empty. rpartition(sep) also
returns a 3-tuple but starts searching from the end of the string; the
"r" stands for 'reverse'.

Some examples:

>>> ('http://www.python.org').partition('://')
('http', '://', 'www.python.org')
>>> ('file:/usr/share/doc/index.html').partition('://')
('file:/usr/share/doc/index.html', '', '')
>>> (u'Subject: a quick question').partition(':')
(u'Subject', u':', u' a quick question')
>>> 'www.python.org'.rpartition('.')
('www.python', '.', 'org')
>>> 'www.python.org'.rpartition(':')
('', '', 'www.python.org')

(Implemented by Fredrik Lundh following a suggestion by Raymond Hettinger.)
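
For comparison, here is a minimal sketch of the find-and-slice pattern described above next to the partition version. The helper names split_with_find and split_with_partition are made up for illustration and are not part of the standard library.

```python
def split_with_find(s, sep):
    # Older pattern: locate the separator, then slice around it by hand.
    i = s.find(sep)
    if i < 0:
        # Mirror what partition returns when the separator is missing.
        return (s, '', '')
    return (s[:i], sep, s[i + len(sep):])


def split_with_partition(s, sep):
    # The same result in a single method call.
    return s.partition(sep)


for text in ('http://www.python.org', 'file:/usr/share/doc/index.html'):
    assert split_with_find(text, '://') == split_with_partition(text, '://')
```

Both helpers should agree for any non-empty separator; the difference is only in how much bookkeeping the caller has to do.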

Sep 19 '06 #1

metaperl

sweet thanks for the heads up.

Sep 19 '06 #2

richard.charts

I'm confused.
What's the difference between this and string.split?

Sep 19 '06 #3

Lawrence Oluyede

ri************@gmail.com wrote:

What's the difference between this and string.split?

>>> ('http://www.python.org').partition('://')
('http', '://', 'www.python.org')
>>> ('http://www.python.org').split('://')
['http', 'www.python.org']

--
Lawrence - http://www.oluyede.org/blog
"Nothing is more dangerous than an idea
if it's the only one you have" - E. A. Chartier

Sep 19 '06 #4

John Salerno

ri************@gmail.com wrote:

I'm confused.
What's the difference between this and string.split?

>>> s = 'hello, world'
>>> s.split(',')
['hello', ' world']
>>> s.partition(',')
('hello', ',', ' world')

split returns a list of the substrings on either side of the specified
argument.

partition returns a tuple of the substring on the left of the argument,
the argument itself, and the substring on the right. rpartition reads
from right to left.

But you raise a good point. Notice this:

>>> s = 'hello, world, how are you'
>>> s.split(',')
['hello', ' world', ' how are you']
>>> s.partition(',')
('hello', ',', ' world, how are you')

split will return all substrings. partition (and rpartition) only return
the substrings before and after the first occurrence of the argument.

Sep 19 '06 #5

Bruno Desthuilliers

John Salerno wrote:

Forgive my excitement, especially if you are already aware of this, but
this seems like the kind of feature that is easily overlooked (yet could
be very useful):
Both 8-bit and Unicode strings have new partition(sep) and
rpartition(sep) methods that simplify a common use case.
The find(S) method is often used to get an index which is then used to
slice the string and obtain the pieces that are before and after the
separator.

Err... is it me being dumb, or is it a perfect use case for str.split ?

partition(sep) condenses this pattern into a single method
call that returns a 3-tuple containing the substring before the
separator, the separator itself, and the substring after the separator.
If the separator isn't found, the first element of the tuple is the
entire string and the other two elements are empty. rpartition(sep) also
returns a 3-tuple but starts searching from the end of the string; the
"r" stands for 'reverse'.

Some examples:

>>> ('http://www.python.org').partition('://')
('http', '://', 'www.python.org')
>>> ('file:/usr/share/doc/index.html').partition('://')
('file:/usr/share/doc/index.html', '', '')
>>> (u'Subject: a quick question').partition(':')
(u'Subject', u':', u' a quick question')
>>> 'www.python.org'.rpartition('.')
('www.python', '.', 'org')
>>> 'www.python.org'.rpartition(':')
('', '', 'www.python.org')

I must definitively be dumb, but so far I fail to see how it's better
than split and rsplit:

>>> 'http://www.python.org'.split('://')
['http', 'www.python.org']
>>> 'file:/usr/share/doc/index.html'.split('://')
['file:/usr/share/doc/index.html']
>>> u'Subject: a quick question'.split(': ')
[u'Subject', u'a quick question']
>>> u'Subject: a quick question'.rsplit(': ')
[u'Subject', u'a quick question']
>>> 'www.python.org'.rsplit('.', 1)
['www.python', 'org']

There are IMVHO much more exciting new features in 2.5 (enhanced generators,
try/except/finally, ternary operator, with: statement etc...)

Sep 19 '06 #6

Tim Chase

partition(sep) condenses this pattern into a single method
call that returns a 3-tuple containing the substring before
the separator, the separator itself, and the substring after
the separator. If the separator isn't found, the first
element of the tuple is the entire string and the other two
elements are empty. rpartition(sep) also returns a 3-tuple
but starts searching from the end of the string; the "r"
stands for 'reverse'.

I'm confused. What's the difference between this and
string.split?

(please don't top-post...I've inverted and trimmed for the sake
of readability)

I too am a bit confused but I can see uses for it, and there
could be a good underlying reason to do as much. Split doesn't
return the separator. Partition is also guaranteed to return a
3-tuple. E.g.

>>> s1 = 'one'
>>> s2 = 'one|two'
>>> len(s1.split('|', 1))
1
>>> len(s2.split('|', 1))
2

which could make a difference when doing tuple-assignment:

>>> v1, v2 = s2.split('|', 1)
>>> # works fine
>>> v1, v2 = s1.split('|', 1)
[traceback]

whereas one could consistently do something like

>>> v1, _, v2 = s1.partition('|')

without fear of a traceback to deal with.
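
To make the tuple-assignment point concrete, here is a small sketch. The helper names unpack_with_split and unpack_with_partition are hypothetical and exist only for this comparison.

```python
def unpack_with_split(s, sep='|'):
    # Raises ValueError when sep is absent: split() then returns a
    # one-element list, which cannot be unpacked into two names.
    left, right = s.split(sep, 1)
    return left, right


def unpack_with_partition(s, sep='|'):
    # partition() always yields exactly three items, so unpacking is safe.
    left, _, right = s.partition(sep)
    return left, right


print(unpack_with_partition('one|two'))   # ('one', 'two')
print(unpack_with_partition('one'))       # ('one', '')
try:
    unpack_with_split('one')
except ValueError as exc:
    print('split version failed: %s' % exc)
```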

Just a few thoughts...

-tkc

Sep 19 '06 #7

Tim Chase

But you raise a good point. Notice this:

>>> s = 'hello, world, how are you'
>>> s.split(',')
['hello', ' world', ' how are you']
>>> s.partition(',')
('hello', ',', ' world, how are you')

split will return all substrings. partition (and rpartition) only return
the substrings before and after the first occurrence of the argument.

The split()/rsplit() functions do take an optional argument for
the maximum number of splits to make, just FYI...

>>> help("".split)
Help on built-in function split:

split(...)
    S.split([sep [,maxsplit]]) -> list of strings

    Return a list of the words in the string S, using sep as the
    delimiter string. If maxsplit is given, at most maxsplit
    splits are done. If sep is not specified or is None, any
    whitespace string is a separator.

(as I use this on a regular basis when mashing up various text
files in a data conversion process)
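
A quick side-by-side of maxsplit against partition, reusing the sample string from the quoted post. This is only an illustration of the calls already discussed in the thread.

```python
s = 'hello, world, how are you'

print(s.split(','))       # ['hello', ' world', ' how are you']
print(s.split(',', 1))    # ['hello', ' world, how are you']
print(s.rsplit(',', 1))   # ['hello, world', ' how are you']
print(s.partition(','))   # ('hello', ',', ' world, how are you')
print(s.rpartition(','))  # ('hello, world', ',', ' how are you')
```

With maxsplit=1 the pieces match what partition gives, minus the separator; the list is still only one element long when the separator is missing.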

-tkc

Sep 19 '06 #8

John Salerno

Bruno Desthuilliers wrote:

Err... is it me being dumb, or is it a perfect use case for str.split ?

Hmm, I suppose you could get nearly the same functionality as using
split(':', 1), but with partition you also get the separator returned as
well.

There are IMVHO much more exciting new features in 2.5 (enhanced generators,
try/except/finally, ternary operator, with: statement etc...)

I definitely agree, but I figure everyone knows about those already.
There are also the startswith() and endswith() string methods that are
new and seem neat as well.

Sep 19 '06 #9

Thomas Heller

John Salerno wrote:

Bruno Desthuilliers wrote:

Err... is it me being dumb, or is it a perfect use case for str.split ?

Hmm, I suppose you could get nearly the same functionality as using
split(':', 1), but with partition you also get the separator returned as
well.

Well, x.split(":", 1) returns a list of one or two elements, depending on x,
while x.partition(":") always returns a three-tuple.

Thomas

Sep 19 '06 #10

George Sakkis

Bruno Desthuilliers wrote:

I must definitively be dumb, but so far I fail to see how it's better
than split and rsplit:

I fail to see it too. What's the point of returning the separator since
the caller passes it anyway* ?

George

* unless the separator can be a regex, but I don't think so.

Sep 19 '06 #11

Larry Bates

John Salerno wrote:

Bruno Desthuilliers wrote:

Err... is it me being dumb, or is it a perfect use case for str.split ?

Hmm, I suppose you could get nearly the same functionality as using
split(':', 1), but with partition you also get the separator returned as
well.

There are IMVHO much more exciting new features in 2.5 (enhanced
generators, try/except/finally, ternary operator, with: statement etc...)

I definitely agree, but I figure everyone knows about those already.
There are also the startswith() and endswith() string methods that are
new and seem neat as well.

FYI- .startswith() and .endswith() string methods aren't new in 2.5.
They have been around since at least 2.3.

Larry Bates

Sep 19 '06 #12

John Salerno

Larry Bates wrote:

John Salerno wrote:

Bruno Desthuilliers wrote:

Err... is it me being dumb, or is it a perfect use case for str.split ?

Hmm, I suppose you could get nearly the same functionality as using
split(':', 1), but with partition you also get the separator returned as
well.

There are IMVHO much more exciting new features in 2.5 (enhanced
generators, try/except/finally, ternary operator, with: statement etc...)

I definitely agree, but I figure everyone knows about those already.
There are also the startswith() and endswith() string methods that are
new and seem neat as well.

FYI- .startswith() and .endswith() string methods aren't new in 2.5.
They have been around since at least 2.3.

Larry Bates

Oops, just a slight change in their functionality:

The startswith() and endswith() methods of string types now accept
tuples of strings to check for.

def is_image_file (filename):
    return filename.endswith(('.gif', '.jpg', '.tiff'))

(Implemented by Georg Brandl following a suggestion by Tom Lynn.)

Sep 19 '06 #13

Jack Diederich

On Tue, Sep 19, 2006 at 07:23:50PM +0000, John Salerno wrote:

Bruno Desthuilliers wrote:

Err... is it me being dumb, or is it a perfect use case for str.split ?

Hmm, I suppose you could get nearly the same functionality as using
split(':', 1), but with partition you also get the separator returned as
well.

There are IMVHO much more exciting new features in 2.5 (enhanced generators,
try/except/finally, ternary operator, with: statement etc...)

I definitely agree, but I figure everyone knows about those already.
There are also the startswith() and endswith() string methods that are
new and seem neat as well.

Partition is much, much nicer than index() or find() for many
(but not all) applications.

diff for cgi.py parsing "var=X"
-    i = p.find('=')
-    if i >= 0:
-        name = p[:i]
-        value = p[i+1:]
+    (name, sep_found, value) = p.partition('=')

Notice that preserving the separator makes for a nice boolean
to test if the partition was successful. Partition raises an
error if you pass an empty separator.

partition also has the very desirable feature of returning the original
string when the separator isn't found

ex/

script = 'foo.cgi?a=7'
script, sep, params = script.partition('?')

"script" will be "foo.cgi" even if there are no params. With
find or index you have to slice the string by hand and with split
you would do something like.

try:
    script, params = script.split('?')
except ValueError: pass

or

parts = script.split('?', 1)
script = parts[0]
params = ''.join(parts[1:])

Grep your source for index, find, and split and try rewriting
the code with partition. Not every instance will turn out cleaner
but many will.
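
Putting the two snippets above together, a rough sketch of splitting a script name from its parameters might look like this. parse_pairs is a made-up helper for illustration, not the code in cgi.py, and returning None for a name without '=' is an assumption of this sketch.

```python
def parse_pairs(query):
    # Hypothetical helper: turn 'a=7&b=&c' into (name, value) pairs.
    pairs = []
    for item in query.split('&'):
        if not item:
            continue
        name, sep_found, value = item.partition('=')
        # sep_found is '=' when present and '' otherwise, so it works
        # as the boolean test mentioned above.
        pairs.append((name, value if sep_found else None))
    return pairs


script, sep, params = 'foo.cgi?a=7&b=&c'.partition('?')
print(script)
print(parse_pairs(params))
```

If the URL has no '?', params is simply '' and parse_pairs returns an empty list, so no special case is needed in the caller.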

Long-live-partition-ly,

-Jack

Sep 19 '06 #14

Duncan Booth

"George Sakkis" <ge***********@gmail.com> wrote:

Bruno Desthuilliers wrote:

I must definitively be dumb, but so far I fail to see how it's better
than split and rsplit:

I fail to see it too. What's the point of returning the separator since
the caller passes it anyway* ?

The separator is only returned if it was found; otherwise you get back an
empty string. Concatenating the elements of the tuple that is returned
always gives you the original string.

It is quite similar to using split(sep,1), but reduces the amount of
special case handling for cases where the separator isn't found.
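
Both properties are easy to check. A minimal sketch follows; check_round_trip is a hypothetical name used only here.

```python
def check_round_trip(s, sep):
    head, found, tail = s.partition(sep)
    # Joining the three pieces always rebuilds the input string.
    assert head + found + tail == s
    # The middle item is the separator only when it occurred in s.
    assert found == (sep if sep in s else '')
    return head, found, tail


print(check_round_trip('name=value', '='))  # ('name', '=', 'value')
print(check_round_trip('name', '='))        # ('name', '', '')
```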

Sep 19 '06 #15

Terry Reedy

"Bruno Desthuilliers" <bd*****************@free.quelquepart.fr> wrote in
message news:45***********************@news.free.fr...

Err... is it me being dumb, or is it a perfect use case for str.split ?

s.partition() was invented and its design settled on as a result of looking
at some awkward constructions in the standard library and other actual use
cases. Sometimes it replaces s.find or s.index instead of s.split. In
some cases, it is meant to be used within a loop. I was not involved and
so would refer you to the pydev discussions.
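
One guess at what use within a loop could look like is sketched below. It is not taken from the pydev discussions, and iter_fields is a made-up name.

```python
def iter_fields(line, sep=','):
    # Hypothetical helper: yield fields one at a time, peeling each one
    # off the front of the remaining text with partition().
    rest = line
    while True:
        field, found, rest = rest.partition(sep)
        yield field
        if not found:
            # No separator left, so that was the last field.
            break


for field in iter_fields('a,b,,c'):
    print(repr(field))
```

For a non-empty separator this should yield the same items as line.split(sep), but lazily, which can matter if the caller stops early.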

tjr

Sep 19 '06 #16

MonkeeSage

s = ("There should be one -- and preferably only one -- obvious way to "
     "do it").partition('only one')
print s[0]+'more than one'+s[2]

;)

Regards,
Jordan

Sep 20 '06 #17

Irmen de Jong

Terry Reedy wrote:

"Bruno Desthuilliers" <bd*****************@free.quelquepart.fr> wrote in
message news:45***********************@news.free.fr...

Err... is it me being dumb, or is it a perfect use case for str.split ?

s.partition() was invented and its design settled on as a result of looking
at some awkward constructions in the standard library and other actual use
cases. Sometimes it replaces s.find or s.index instead of s.split. In
some cases, it is meant to be used within a loop. I was not involved and
so would refer you to the pydev discussions.

While there is the functional aspect of the new partition method, I was
wondering about the following /technical/ aspect:

Because the result of partition is a non mutable tuple type containing
three substrings of the original string, is it perhaps also the case
that partition works without allocating extra memory for 3 new string
objects and copying the substrings into them?
I can imagine that the tuple type returned by partition is actually
a special object that contains a few internal pointers into the
original string to point at the locations of each substring.
Although a quick type check of the result object revealed that
it was just a regular tuple type, so I don't think the above is true...

--Irmen

Sep 20 '06 #18

Steve Holden

Irmen de Jong wrote:

Terry Reedy wrote:

"Bruno Desthuilliers" <bd*****************@free.quelquepart.fr> wrote in
message news:45***********************@news.free.fr...

Err... is it me being dumb, or is it a perfect use case for str.split ?

s.partition() was invented and its design settled on as a result of looking
at some awkward constructions in the standard library and other actual use
cases. Sometimes it replaces s.find or s.index instead of s.split. In
some cases, it is meant to be used within a loop. I was not involved and
so would refer you to the pydev discussions.

While there is the functional aspect of the new partition method, I was
wondering about the following /technical/ aspect:

Because the result of partition is a non mutable tuple type containing
three substrings of the original string, is it perhaps also the case
that partition works without allocating extra memory for 3 new string
objects and copying the substrings into them?
I can imagine that the tuple type returned by partition is actually
a special object that contains a few internal pointers into the
original string to point at the locations of each substring.
Although a quick type check of the result object revealed that
it was just a regular tuple type, so I don't think the above is true...

It's not.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden

Sep 20 '06 #19

Bruno Desthuilliers

John Salerno wrote:

Bruno Desthuilliers wrote:

Err... is it me being dumb, or is it a perfect use case for str.split ?

Hmm, I suppose you could get nearly the same functionality as using
split(':', 1), but with partition you also get the separator returned as
well.

Well, you already know it since you use it to either split() or
partition the string !-)

Not to say these two new methods are necessarily useless - sometimes a
small improvement to an API greatly simplifies a lot of common use cases.

There are IMVHO much more exciting new features in 2.5 (enhanced
generators, try/except/finally, ternary operator, with: statement etc...)

I definitely agree, but I figure everyone knows about those already.
There are also the startswith() and endswith() string methods that are
new

Err... 'new' ???

and seem neat as well.

Sep 20 '06 #20

Gabriel Genellina

At Wednesday 20/9/2006 15:11, Irmen de Jong wrote:

Because the result of partition is a non mutable tuple type containing
three substrings of the original string, is it perhaps also the case
that partition works without allocating extra memory for 3 new string
objects and copying the substrings into them?
I can imagine that the tuple type returned by partition is actually
a special object that contains a few internal pointers into the
original string to point at the locations of each substring.
Although a quick type check of the result object revealed that
it was just a regular tuple type, so I don't think the above is true...

Nope, a python string has both a length *and* a null terminator (for
ease of interfacing C routines, I guess) so you can't just share a substring.

Gabriel Genellina
Softlab SRL

Sep 20 '06 #21

Irmen de Jong

Gabriel Genellina wrote:

Nope, a python string has both a length *and* a null terminator (for
ease of interfacing C routines, I guess) so you can't just share a
substring.

Of course, that makes perfect sense. Should have thought a little
bit further myself .... :)

--Irmen

Sep 20 '06 #22

Fredrik Lundh

Irmen de Jong wrote:

Because the result of partition is a non mutable tuple type containing
three substrings of the original string, is it perhaps also the case
that partition works without allocating extra memory for 3 new string
objects and copying the substrings into them?

nope. the core string type doesn't support sharing, and given the
typical use cases for partition, I doubt it would be more efficient
than actually creating the new strings.

(note that partition reuses the original string and the separator,
where possible)

(and yes, you're not the first one who thought of this. check the
python-dev archives from May this year for more background).

</F>

Sep 21 '06 #23

Lawrence D'Oliveiro

In message <ma**************************************@python.org>, Gabriel
Genellina wrote:

... a python string has both a length *and* a null terminator (for
ease of interfacing C routines ...

How does that work for strings with embedded nulls? Or are the C routines
simply fooled into seeing a truncated part of the string?

Sep 22 '06 #24

Duncan Booth

Lawrence D'Oliveiro <ld*@geek-central.gen.new_zealand> wrote:

In message <ma**************************************@python.org>, Gabriel
Genellina wrote:

... a python string has both a length *and* a null terminator (for
ease of interfacing C routines ...

How does that work for strings with embedded nulls? Or are the C routines
simply fooled into seeing a truncated part of the string?

If passed to a C library function it would mean that the C code would
generally only use up to the first embedded null. However the Python
standard library will usually check for nulls first so it can throw an
error:

>>> with open('test.txt', 'r') as f:
...     print f.read()
...
Hello world
>>> with open('test.txt\x00junk', 'r') as f:
...     print f.read()
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: file() argument 1 must be (encoded string without NULL bytes), not str

What actually happens is that Python argument parsing code will reject
values with embedded nulls if asked to convert a parameter to a C string
('s', 'z', 'es', or 'et' formats), but will allow them if converting to a C
string and a length ('s#', 'z#', 'es#', or 'et#').
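
The same check can be wrapped in a small function. This is a sketch only: rejects_embedded_null is a hypothetical helper, and the exception type is an assumption that depends on the Python version (TypeError in the session above, ValueError on newer interpreters).

```python
def rejects_embedded_null(path):
    # Returns True when open() refuses the name before it reaches the OS.
    try:
        handle = open(path)
    except (TypeError, ValueError):
        # The argument was rejected because of the embedded null.
        return True
    except (IOError, OSError):
        # The name was accepted, but the file could not be opened.
        return False
    handle.close()
    return False


print(rejects_embedded_null('test.txt\x00junk'))
```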

Sep 22 '06 #25

Gabriel Genellina

At Friday 22/9/2006 04:53, Lawrence D'Oliveiro wrote:

... a python string has both a length *and* a null terminator (for
ease of interfacing C routines ...

How does that work for strings with embedded nulls? Or are the C routines
simply fooled into seeing a truncated part of the string?

This is for simple char* strings, ASCIIZ. If your C code can accept
embedded nulls, it surely has made other provisions - like receiving the
buffer length as a parameter. If not, it will see only a truncated string.

Gabriel Genellina
Softlab SRL

Sep 22 '06 #26
