strip() using strings instead of chars

In Python programs, you will quite frequently find code like the
following for removing a certain prefix from a string:

    if url.startswith('http://'):
        url = url[7:]

Similarly for stripping suffixes:

    if filename.endswith('.html'):
        filename = filename[:-5]

My problem with this is that it's cumbersome and error prone to count
the number of chars of the prefix or suffix. If you want to change it
from 'http://' to 'https://', you must not forget to change the 7 to 8.
If you write len('http://') instead of the 7, you see this is actually
a DRY problem.

Things get even worse if you have several prefixes to consider:

    if url.startswith('http://'):
        url = url[7:]
    elif url.startswith('https://'):
        url = url[8:]

You can't make use of url.startswith(('http://', 'https://')) here.

Here is another concrete example taken from the standard lib:

    if chars.startswith(BOM_UTF8):
        chars = chars[3:].decode("utf-8")

This avoids hardcoding the BOM_UTF8, but its length is still hardcoded,
and the programmer had to know it or look it up when writing this line.

So my suggestion is to add another string method, say "stripstr" that
behaves like "strip", but instead of stripping *characters* strips
*strings* (similarly for lstrip and rstrip). Then in the case above,
you could simply write url = url.lstripstr('http://') or
url = url.lstripstr(('http://', 'https://')).
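
The proposal can be sketched with plain functions. This is only an illustration, not an existing str API: the names lstripstr, rstripstr and stripstr come from the post, while the helper _as_tuple, the free-function form (instead of methods) and the choice to strip repeatedly, as strip() does with characters, are assumptions.

```python
def _as_tuple(affixes):
    """Accept a single string or any iterable of strings (hypothetical helper)."""
    if isinstance(affixes, str):
        return (affixes,)
    return tuple(affixes)


def lstripstr(s, prefixes):
    """Repeatedly remove any of the given prefix strings from the start of s."""
    # Drop empty strings: they always match and would never make progress.
    prefixes = [p for p in _as_tuple(prefixes) if p]
    stripped = True
    while stripped:
        stripped = False
        for prefix in prefixes:
            if s.startswith(prefix):
                s = s[len(prefix):]
                stripped = True
                break
    return s


def rstripstr(s, suffixes):
    """Repeatedly remove any of the given suffix strings from the end of s."""
    suffixes = [x for x in _as_tuple(suffixes) if x]
    stripped = True
    while stripped:
        stripped = False
        for suffix in suffixes:
            if s.endswith(suffix):
                s = s[:-len(suffix)]
                stripped = True
                break
    return s


def stripstr(s, affixes):
    """Strip the given strings from both ends of s."""
    return rstripstr(lstripstr(s, affixes), affixes)


url = lstripstr('https://example.com', ('http://', 'https://'))
```

With this sketch, no length is ever counted by hand. When the candidate strings overlap (say 'http' and 'http://'), the result depends on the order in which they are tried, so a tuple is safer than a set in that case.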

The new function would actually subsume the old strip function: you
would have strip('aeiou') == stripstr(set('aeiou')).

Instead of a new function, we could also add another parameter to strip
(lstrip, rstrip) for passing strings or changing the behavior, or we
could create functions with the signature of startswith and endswith
which instead of only checking whether the string starts or ends with
the substring, remove the substring (startswith and endswith have
additional "start" and "end" index parameters that may be useful).

Or did I overlook anything and there is already a good idiom for this?

Btw, in most other languages, "strip" is called "trim" and behaves
like Python's strip, i.e. considers the parameter as a set of chars.
There is one notable exception: in MySQL, trim behaves like the stripstr
proposed above (unlike in SQLite, PostgreSQL and Oracle).

-- Christoph
Jul 11 '08 #1
Christoph Zwerschke wrote:
> In Python programs, you will quite frequently find code like the
> following for removing a certain prefix from a string:
>
>     if url.startswith('http://'):
>         url = url[7:]

DRY/SPOT violation. Should be written as:

    prefix = 'http://'
    if url.startswith(prefix):
        url = url[len(prefix):]

(snip)
> My problem with this is that it's cumbersome and error prone to count
> the number of chars of the prefix or suffix.

cf. above

> If you want to change it
> from 'http://' to 'https://', you must not forget to change the 7 to 8.
> If you write len('http://') instead of the 7, you see this is actually
> a DRY problem.

cf. above

> Things get even worse if you have several prefixes to consider:
>
>     if url.startswith('http://'):
>         url = url[7:]
>     elif url.startswith('https://'):
>         url = url[8:]
>
> You can't make use of url.startswith(('http://', 'https://')) here.

    for prefix in ('http://', 'https://'):
        if url.startswith(prefix):
            url = url[len(prefix):]
            break

For more complex use cases, you may want to consider regexps,
specifically re.sub:

    >>> import re
    >>> pat = re.compile(r"(^https?://|\.txt$)")
    >>> urls = ['http://toto.com', 'https://titi.com', 'tutu.com',
    ...         'file://tata.txt']
    >>> [pat.sub('', u) for u in urls]
    ['toto.com', 'titi.com', 'tutu.com', 'file://tata']

Not to dismiss your suggestion, but I thought you might like to know how
to solve your problem with what's currently available !-)
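
The re.sub approach above hardcodes its pattern. As a rough sketch, the pattern can also be built from a list of literal prefixes; the factory name make_prefix_remover and the variable strip_scheme are made up for this illustration, and re.escape is used so that characters such as '.' in a prefix are matched literally.

```python
import re


def make_prefix_remover(prefixes):
    """Return a function that removes one leading prefix, if present.

    Hypothetical helper, not part of the re module.
    """
    # Try longer prefixes first so 'https://' wins over a shorter overlap.
    alternatives = sorted(prefixes, key=len, reverse=True)
    pattern = re.compile(
        r"\A(?:" + "|".join(re.escape(p) for p in alternatives) + ")"
    )
    return lambda s: pattern.sub("", s, count=1)


strip_scheme = make_prefix_remover(["http://", "https://"])
urls = ['http://toto.com', 'https://titi.com', 'tutu.com']
cleaned = [strip_scheme(u) for u in urls]
```

The pattern is anchored with \A, so only a prefix at the very start of the string is removed.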

Jul 11 '08 #2
Bruno Desthuilliers wrote:
> DRY/SPOT violation. Should be written as:
>
>     prefix = 'http://'
>     if url.startswith(prefix):
>         url = url[len(prefix):]

That was exactly my point. This formulation is a bit better, but it
still violates DRY, because you need to type "prefix" two times. It is
exactly this idiom that I see so often and that I wanted to simplify.
Your suggestions work, but I somehow feel such a simple task should have
a simpler formulation in Python, i.e. something like

    url = url.lstripstr(('http://', 'https://'))

instead of

    for prefix in ('http://', 'https://'):
        if url.startswith(prefix):
            url = url[len(prefix):]
            break

-- Christoph
Jul 11 '08 #3
On Fri, 11 Jul 2008 16:45:20 +0200, Christoph Zwerschke wrote:
> Bruno Desthuilliers wrote:
>> DRY/SPOT violation. Should be written as:
>>
>>     prefix = 'http://'
>>     if url.startswith(prefix):
>>         url = url[len(prefix):]
>
> That was exactly my point. This formulation is a bit better, but it
> still violates DRY, because you need to type "prefix" two times. It is
> exactly this idiom that I see so often and that I wanted to simplify.
> Your suggestions work, but I somehow feel such a simple task should have
> a simpler formulation in Python, i.e. something like
>
>     url = url.lstripstr(('http://', 'https://'))

I would prefer a name like `remove_prefix()` instead of a variant with
`strip` and abbreviations in it.
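
A function with that name might look like the sketch below. The name remove_prefix is taken from the suggestion above, while remove_suffix, the signatures and the remove-at-most-once behaviour are assumptions made for illustration. Python 3.9 and later also provide str.removeprefix() and str.removesuffix(), which remove a single string at most once.

```python
def remove_prefix(s, prefix):
    """Return s without the leading prefix, or s unchanged if absent."""
    if prefix and s.startswith(prefix):
        return s[len(prefix):]
    return s


def remove_suffix(s, suffix):
    """Return s without the trailing suffix, or s unchanged if absent."""
    # The emptiness check matters here: s[:-0] would return ''.
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s


url = remove_prefix('http://toto.com', 'http://')
filename = remove_suffix('index.html', '.html')
```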

Ciao,
Marc 'BlackJack' Rintsch
Jul 11 '08 #4
Christoph Zwerschke <ci**@online.de> wrote:
> In Python programs, you will quite frequently find code like the
> following for removing a certain prefix from a string:
>
>     if url.startswith('http://'):
>         url = url[7:]

If I came across this code I'd want to know why they weren't using
urlparse.urlsplit()...

> Similarly for stripping suffixes:
>
>     if filename.endswith('.html'):
>         filename = filename[:-5]

... and I'd want to know why os.path.splitext() wasn't appropriate here.

> My problem with this is that it's cumbersome and error prone to count
> the number of chars of the prefix or suffix. If you want to change it
> from 'http://' to 'https://', you must not forget to change the 7 to 8.
> If you write len('http://') instead of the 7, you see this is actually
> a DRY problem.
>
> Things get even worse if you have several prefixes to consider:
>
>     if url.startswith('http://'):
>         url = url[7:]
>     elif url.startswith('https://'):
>         url = url[8:]
>
> You can't make use of url.startswith(('http://', 'https://')) here.

No you can't, so you definitely want to be parsing the URL properly. I
can't actually think of a use for stripping off the scheme without either
saving it somewhere or doing further parsing of the url.
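
For reference, a short sketch of the two specialized functions mentioned in this reply. It assumes Python 3, where the urlparse module referred to above lives at urllib.parse; the example URL and file name are made up.

```python
import os.path
from urllib.parse import urlsplit

# Split the URL into its components instead of slicing off the scheme.
parts = urlsplit('http://toto.com/index.html')
scheme = parts.scheme   # the part before '://'
host = parts.netloc     # the network location
path = parts.path       # everything after the host

# Split a file name into stem and extension without counting characters.
stem, ext = os.path.splitext('index.html')
```

Both calls keep the removed part (scheme, extension) available, which is the point made above about not discarding it.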
Jul 11 '08 #5
Duncan Booth wrote:
>>     if url.startswith('http://'):
>>         url = url[7:]
>
> If I came across this code I'd want to know why they weren't using
> urlparse.urlsplit()...

Right, such code can have a smell since in the case of urls, file names,
config options etc. there are specialized functions available. But I'm
not sure whether the need for removing string prefixes/suffixes in general
is really so rare that we shouldn't bother to offer a simpler solution.

-- Christoph
Jul 12 '08 #6
Christoph Zwerschke <ci**@online.de> wrote:
> Duncan Booth wrote:
>>>     if url.startswith('http://'):
>>>         url = url[7:]
>>
>> If I came across this code I'd want to know why they weren't using
>> urlparse.urlsplit()...
>
> Right, such code can have a smell since in the case of urls, file names,
> config options etc. there are specialized functions available. But I'm
> not sure whether the need for removing string prefixes/suffixes in general
> is really so rare that we shouldn't bother to offer a simpler solution.

One of the great things about Python is that it resists bloating the
builtin classes with lots of methods that just seem like a good idea at the
time. If a lot of people make a case for this function then it might get
added, but I think it is unlikely given how simple it is to write a
function to do this for yourself.
Jul 12 '08 #7
