startswith( prefix[, start[, end]]) Query

Hi

The documentation for startswith(prefix[, start[, end]]) states:

Return True if string starts with the prefix, otherwise return False.
prefix can also be a tuple of prefixes to look for. However, when I try
to pass a tuple of prefixes I get the following error:

TypeError: expected a character buffer object

For example:

    file = f.readlines()
    for line in file:
        if line.startswith(("abc", "df")):
            CODE

This generates the above error.

To work around this problem, I am currently chaining individual
startswith calls,
i.e. if line.startswith("if") or line.startswith("df"),
but I know there must be a way to define all my prefixes in one tuple.

Thanks in advance

Sep 6 '07 #1
cj***@bath.ac.uk wrote:
> Hi
>
> startswith( prefix[, start[, end]]) States:
>
> Return True if string starts with the prefix, otherwise return False.
> prefix can also be a tuple of prefixes to look for.

That particular aspect of the functionality (the multiple
prefixes in a tuple) was only added in Python 2.5. If you're
using <= 2.4 you'll need to use "or" or some other approach,
e.g. looping over a sequence of prefixes.
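
A minimal sketch of the looping approach, not taken from the thread. The helper name startswith_any is made up for illustration. It tries the tuple form first and falls back to testing one prefix at a time if the interpreter raises TypeError, which is what 2.4 does according to the error quoted above.

```python
def startswith_any(text, prefixes):
    """Return True if text starts with any of the given prefixes.

    Hypothetical helper, named here for illustration only.
    """
    prefixes = tuple(prefixes)
    try:
        # Python 2.5 and later accept a tuple of prefixes directly.
        return text.startswith(prefixes)
    except TypeError:
        # Older interpreters reject the tuple, so test one prefix at a time.
        for prefix in prefixes:
            if text.startswith(prefix):
                return True
        return False


print(startswith_any("dfoo", ("abc", "df")))
print(startswith_any("xyz", ("abc", "df")))
```

On an interpreter that supports the tuple form the except branch is never reached, so the fallback costs nothing there.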

TJG
Sep 6 '07 #2
On Sep 6, 7:09 am, cj...@bath.ac.uk wrote:
> Hi
>
> startswith( prefix[, start[, end]]) States:
>
> Return True if string starts with the prefix, otherwise return False.
> prefix can also be a tuple of prefixes to look for. However when I try
> and add a tuple of prefixes I get the following error:
>
> TypeError: expected a character buffer object
>
> For example:
>
>     file = f.readlines()
>     for line in file:
>         if line.startswith(("abc","df"))
>             CODE
>
> It would generate the above error

(snipped)

You seem to be using an older version of Python.
For me it works as advertised with 2.5.1,
but runs into the problem you described with 2.4.4:

Python 2.5.1c1 (r251c1:54692, Apr 17 2007, 21:12:16)
[GCC 4.0.0 (Apple Computer, Inc. build 5026)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> line = "foobar"
>>> if line.startswith(("foo", "bar")): print line
...
foobar
>>> if line.startswith(("foo", "bar")):
...     print line
...
foobar
VS.

Python 2.4.4 (#1, Oct 18 2006, 10:34:39)
[GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> line = "foobar"
>>> if line.startswith(("foo", "bar")): print line
...
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: expected a character buffer object
--
Hope this helps,
Steven

Sep 6 '07 #3
cj***@bath.ac.uk wrote:
> Hi
>
> startswith( prefix[, start[, end]]) States:
>
> Return True if string starts with the prefix, otherwise return False.
> prefix can also be a tuple of prefixes to look for. However when I try
> and add a tuple of prefixes I get the following error:
>
> TypeError: expected a character buffer object
>
> For example:
>
>     file = f.readlines()
>     for line in file:

slightly OT, but:
1/ you should not use 'file' as an identifier, it shadows the builtin
file type
2/ FWIW, it's also a pretty bad naming choice for a list of lines - why
not just name this list 'lines' ?-)
3/ anyway, unless you need to store this whole list in memory, you'd be
better off using the iterator idiom (Python files are iterables):

    f = open('some_file.ext')
    for line in f:
        print line

>         if line.startswith(("abc","df"))
>             CODE
>
> It would generate the above error

May I suggest that you read the appropriate version of the doc ? That
is, the one corresponding to your installed Python version ?-)

Passing a tuple to str.startswith is new in 2.5. I bet you're trying it
on a 2.4 or older version.

> To overcome this problem, I am currently just joining individual
> startswith methods
> i.e. if line.startswith("if") or line.startswith("df")
> but know there must be a way to define all my suffixes in one tuple.

You may want to try with a regexp, but I'm not sure it's worth it (hint:
the timeit module is great for quick small benchmarks).
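
A rough sketch of what the regexp variant could look like. This is only one possible reading of the suggestion; the function name starts_with_any_re is invented for the example, and the prefixes are the ones from the original question.

```python
import re

prefixes = ("abc", "df")  # example prefixes from the original question

# re.escape keeps any regex metacharacters in a prefix literal.
# match() only succeeds at the start of the string, so no "^" is needed.
pattern = re.compile("|".join(re.escape(p) for p in prefixes))


def starts_with_any_re(line):
    # Hypothetical helper: True if line begins with one of the prefixes.
    return pattern.match(line) is not None


print(starts_with_any_re("dfoo"))
print(starts_with_any_re("xabc"))
```

Compiling the pattern once outside the loop matters here, since recompiling per line would likely dominate any benchmark.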

Else, you could as well write your own testing function:

    def str_starts_with(astring, *prefixes):
        # Bind the method once so it is not looked up on every iteration.
        startswith = astring.startswith
        for prefix in prefixes:
            if startswith(prefix):
                return True
        return False

    for line in f:
        if str_starts_with(line, 'abc', 'de', 'xxx'):
            pass  # CODE HERE

HTH
Sep 6 '07 #4
On 06/09/07, Bruno Desthuilliers
<br********************@wtf.websiteburo.oops.com> wrote:
>
> You may want to try with a regexp, but I'm not sure it's worth it (hint:
> the timeit module is great for quick small benchmarks).
>
> Else, you could as well write your own testing function:
>
>     def str_starts_with(astring, *prefixes):
>         startswith = astring.startswith
>         for prefix in prefixes:
>             if startswith(prefix):
>                 return True
>         return False
>
>     for line in f:
>         if str_starts_with(line, 'abc', 'de', 'xxx'):
>             # CODE HERE

Isn't slicing still faster than startswith? As you mention timeit,
then you should probably add slicing to the pot too :)

    if astring[:len(prefix)] == prefix:
        do_stuff()

:)
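
For reference, a small sketch checking that the slice comparison agrees with startswith for a single prefix, including a prefix longer than the string and an empty prefix. The function name starts_with_slice and the sample strings are made up for the example.

```python
def starts_with_slice(text, prefix):
    # Compare the leading len(prefix) characters with the prefix itself.
    # A slice past the end of text is simply shorter, so this never raises.
    return text[:len(prefix)] == prefix


samples = [
    ("foobar", "foo"),  # real prefix
    ("foobar", "bar"),  # substring, but not at the start
    ("fo", "foo"),      # prefix longer than the string
    ("foobar", ""),     # empty prefix
]

for text, prefix in samples:
    assert starts_with_slice(text, prefix) == text.startswith(prefix)
    print(repr(text), repr(prefix), starts_with_slice(text, prefix))
```

Note that the slice form only handles one prefix at a time, so it still needs a loop to cover several prefixes.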
Sep 6 '07 #5
> Else, you could as well write your own testing function:
>
>     def str_starts_with(astring, *prefixes):
>         startswith = astring.startswith
>         for prefix in prefixes:
>             if startswith(prefix):
>                 return True
>         return False

What is the reason for

    startswith = astring.startswith
    startswith(prefix)

instead of

    astring.startswith(prefix)

Sep 7 '07 #6
TheFlyingDutchman wrote:
> > Else, you could as well write your own testing function:
> >
> >     def str_starts_with(astring, *prefixes):
> >         startswith = astring.startswith
> >         for prefix in prefixes:
> >             if startswith(prefix):
> >                 return True
> >         return False
>
> What is the reason for
>     startswith = astring.startswith
>     startswith(prefix)
>
> instead of
>     astring.startswith(prefix)

It's an optimization: the assignment creates a "bound method" (i.e. a
method associated with a specific string instance) and avoids having to
look up the startswith method of astring for each iteration of the inner
loop.

Probably not really necessary, though, and they do say that premature
optimization is the root of all evil ...
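
A small illustration of the bound-method point, as a sketch rather than a benchmark. The variable names follow the function quoted above; the sample string is invented.

```python
astring = "abcdef"

# Looking up the attribute once yields a bound method object that
# remembers which string it belongs to.
startswith = astring.startswith
print(startswith.__self__ is astring)

# Calling the saved bound method gives the same answers as the dotted
# call, without repeating the attribute lookup on each iteration.
for prefix in ("abc", "df", "xxx"):
    assert startswith(prefix) == astring.startswith(prefix)
```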

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
--------------- Asciimercial ------------------
Get on the web: Blog, lens and tag the Internet
Many services currently offer free registration
----------- Thank You for Reading -------------

Sep 7 '07 #7
Steve Holden wrote:
> TheFlyingDutchman wrote:
> > > Else, you could as well write your own testing function:
> > >
> > >     def str_starts_with(astring, *prefixes):
> > >         startswith = astring.startswith
> > >         for prefix in prefixes:
> > >             if startswith(prefix):
> > >                 return True
> > >         return False
> >
> > What is the reason for
> >     startswith = astring.startswith
> >     startswith(prefix)
> >
> > instead of
> >     astring.startswith(prefix)
>
> It's an optimization: the assignment creates a "bound method" (i.e. a
> method associated with a specific string instance) and avoids having to
> look up the startswith method of astring for each iteration of the inner
> loop.
>
> Probably not really necessary, though, and they do say that premature
> optimization is the root of all evil ...

I wouldn't call this one "premature" optimization, since it doesn't
change the algorithm, doesn't introduce (much) complication, and is
proven to really save on lookup time.

Now I do agree that unless you have quite a lot of prefixes to test, it
might not be that necessary in this particular case...
Sep 7 '07 #8
"Tim Williams" <li********@tdw.net> wrote:
> Isn't slicing still faster than startswith? As you mention timeit,
> then you should probably add slicing to the pot too :)

Possibly, but there are so many other factors that affect the timing
that writing it clearly should be your first choice.

Some timings:

    @echo off
    setlocal
    cd \python25\lib
    echo "startswith"
    ..\python timeit.py -s "s='abracadabra1'*1000;t='abracadabra2'" s.startswith(t)
    ..\python timeit.py -s "s='abracadabra1'*1000;t='abracadabra1'" s.startswith(t)
    echo "prebound startswith"
    ..\python timeit.py -s "s='abracadabra1'*1000;t='abracadabra2';startswith=s.startswith" startswith(t)
    ..\python timeit.py -s "s='abracadabra1'*1000;t='abracadabra1';startswith=s.startswith" startswith(t)
    echo "slice with len"
    ..\python timeit.py -s "s='abracadabra1'*1000;t='abracadabra2'" s[:len(t)]==t
    ..\python timeit.py -s "s='abracadabra1'*1000;t='abracadabra1'" s[:len(t)]==t
    echo "slice with magic number"
    ..\python timeit.py -s "s='abracadabra1'*1000;t='abracadabra2'" s[:12]==t
    ..\python timeit.py -s "s='abracadabra1'*1000;t='abracadabra1'" s[:12]==t

and typical output from this is:

"startswith"
1000000 loops, best of 3: 0.542 usec per loop
1000000 loops, best of 3: 0.514 usec per loop
"prebound startswith"
1000000 loops, best of 3: 0.472 usec per loop
1000000 loops, best of 3: 0.474 usec per loop
"slice with len"
1000000 loops, best of 3: 0.501 usec per loop
1000000 loops, best of 3: 0.456 usec per loop
"slice with magic number"
1000000 loops, best of 3: 0.34 usec per loop
1000000 loops, best of 3: 0.315 usec per loop

So for these particular strings, the naive slice wins if the comparison is
true, but loses to the pre-bound method if the comparison fails. The slice is
taking a hit from calling len every time, so pre-calculating the length
(which should be possible in the same situations as pre-binding startswith)
might be worthwhile, but I would still favour using startswith unless I knew
the code was time critical.
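
The same comparison can be run from inside Python with the timeit module instead of a batch file. This is a sketch only: the loop count is an arbitrary choice, the "precomputed length" case is the variation suggested above rather than one of the measured runs, and the numbers will differ from machine to machine.

```python
import timeit

# Same strings as in the batch file; n holds the precomputed prefix length.
setup = (
    "s = 'abracadabra1' * 1000; t = 'abracadabra1'; "
    "startswith = s.startswith; n = len(t)"
)

statements = [
    ("startswith", "s.startswith(t)"),
    ("prebound startswith", "startswith(t)"),
    ("slice with len", "s[:len(t)] == t"),
    ("slice with precomputed length", "s[:n] == t"),
]

loops = 100000  # arbitrary; raise it for steadier numbers
results = {}
for label, stmt in statements:
    # repeat() returns one total per run; take the smallest of the three.
    best = min(timeit.repeat(stmt, setup=setup, repeat=3, number=loops))
    results[label] = best
    print("%-30s %.3f usec per loop" % (label, best / loops * 1e6))
```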
Sep 7 '07 #9
Bruno Desthuilliers wrote:
> Steve Holden wrote:
> [...]
> > Probably not really necessary, though, and they do say that premature
> > optimization is the root of all evil ...
>
> I wouldn't call this one "premature" optimization, since it doesn't
> change the algorithm, doesn't introduce (much) complication, and is
> proven to really save on lookup time.
>
> Now I do agree that unless you have quite a lot of prefixes to test, it
> might not be that necessary in this particular case...

The defense rests.

regards
Steve

Sep 7 '07 #10
Duncan Booth <du**********@invalid.invalid> wrote in
news:Xn*************************@127.0.0.1:

I went through your example to get timings for my machine, and I
ran into an issue I didn't expect.

My bat file did the following 10 times in a row:
(the command line wraps in this post)

    call timeit -s "s='abracadabra1'*1000;t='abracadabra2';
    startswith=s.startswith" startswith(t)

... giving me these times:

1000000 loops, best of 3: 0.483 usec per loop
1000000 loops, best of 3: 0.49 usec per loop
1000000 loops, best of 3: 0.489 usec per loop
1000000 loops, best of 3: 0.491 usec per loop
1000000 loops, best of 3: 0.488 usec per loop
1000000 loops, best of 3: 0.492 usec per loop
1000000 loops, best of 3: 0.49 usec per loop
1000000 loops, best of 3: 0.493 usec per loop
1000000 loops, best of 3: 0.486 usec per loop
1000000 loops, best of 3: 0.489 usec per loop

Then I thought that a shorter name for the lookup might affect the
timings, so I changed the bat file, which now did the following 10
times in a row:

    timeit -s "s='abracadabra1'*1000;t='abracadabra2';
    sw=s.startswith" sw(t)

... giving me these times:
1000000 loops, best of 3: 0.516 usec per loop
1000000 loops, best of 3: 0.512 usec per loop
1000000 loops, best of 3: 0.514 usec per loop
1000000 loops, best of 3: 0.517 usec per loop
1000000 loops, best of 3: 0.515 usec per loop
1000000 loops, best of 3: 0.518 usec per loop
1000000 loops, best of 3: 0.523 usec per loop
1000000 loops, best of 3: 0.513 usec per loop
1000000 loops, best of 3: 0.514 usec per loop
1000000 loops, best of 3: 0.515 usec per loop

In other words, the shorter name did seem to affect the timings,
but in a negative way. Why it would actually change at all is
beyond me, but it is consistently this way on my machine.

Can anyone explain this?

--
rzed
Sep 8 '07 #11
Steve Holden wrote:
> Bruno Desthuilliers wrote:
> > Steve Holden wrote:
> >
> > [...]
> > > Probably not really necessary, though, and they do say that premature
> > > optimization is the root of all evil ...
> >
> > I wouldn't call this one "premature" optimization, since it doesn't
> > change the algorithm, doesn't introduce (much) complication, and is
> > proven to really save on lookup time.
> >
> > Now I do agree that unless you have quite a lot of prefixes to test,
> > it might not be that necessary in this particular case...
>
> The defense rests.

Sorry, I don't understand this one (please bear with a poor French boy).

Sep 11 '07 #12
