Better way to sift parts of URL . . .

I am working on a script that splits a URL into a page and a url. The
examples below are the conditions I expect a user to pass to the
script. In all cases, "http://www.example.org/test/" is the URL, and
the page comprises parts that have upper case letters (note, 5 & 6 are
the same as earlier examples, sans the 'test').

1. http://www.example.org/test/Main/AnotherPage (page = Main/AnotherPage)
2. http://www.example.org/test/Main (page = Main + '/' + default_page)
3. http://www.example.org/test (page = default_group + '/' + default_page)
4. http://www.example.org/test/ (page = default_group + '/' + default_page)
5. http://www.example.org/ (page = default_group + '/' + default_page)
6. http://www.example.org/Main/AnotherPage (page = Main/AnotherPage)

Right now, I'm doing a simple split off condition 1:

# url_in holds the URL passed by the user ("in" is a reserved word in Python)
page = '.'.join(url_in.split('/')[-2:])
url = '/'.join(url_in.split('/')[:-2]) + '/'
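
To illustrate why this two-line split only covers condition 1, here is a rough sketch that wraps it in a function and runs it on conditions 1 and 2. The names simple_split and address are made up for the example and are not part of the original script.

```python
def simple_split(address):
    """Split off the last two '/'-separated segments as the page."""
    parts = address.split('/')
    page = '.'.join(parts[-2:])
    url = '/'.join(parts[:-2]) + '/'
    return url, page

# Condition 1: group and page are both present, so the split works.
print(simple_split("http://www.example.org/test/Main/AnotherPage"))

# Condition 2: only the group is present, so 'test' is mistaken for the group
# and the base URL loses its 'test/' segment.
print(simple_split("http://www.example.org/test/Main"))
```

Conditions 3 through 5 go wrong in a similar way, because the split always takes exactly two segments regardless of what they contain.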

Before I start winding my way down a complex path, I wanted to see if
anybody had an elegant approach to this problem.

Thanks in advance.
Ben

Apr 18 '06 #1
Here is what I came up with:

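def siftUrl(s):
    # Drop the scheme ('http:') and split the rest on '/'
    s = s.split('//')[1]
    bits = s.split('/')

    if '' in bits: bits.remove('')
    if len(bits) > 1:
        group = bits[-2]
        page = bits[-1]
        # strip() returns a new string, so the result must be assigned
        group = group.strip('/')
        page = page.strip('/')
    else:
        group = 'test'
        page = 'test'

    if group == group.capitalize():
        # Both group and page were given
        page = '/'.join([group, page])
        url = '/'.join(s.split('/')[:-2]) + '/'
    elif page == page.capitalize():
        # Only one capitalized part was given
        page = '/'.join(['Main', page])
        url = '/'.join(s.split('/')[:-1]) + '/'
    else:
        # Nothing capitalized: fall back to the defaults
        page = '/'.join(['Main', 'Main'])
        url = s

    url = 'http://' + url
    return url, page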

Apr 18 '06 #2

Ben> I am working on a script that splits a URL into a page and a
Ben> url.

I couldn't tell quite what you mean to accomplish from your example. (In
particular, I don't know what you mean by "default_group", as it's never
defined, and I don't know why the desired output of examples 1 and 6 is the
same, since the URLs are clearly different.) You don't mention having tried
the urlparse module, so I thought I should ask: have you tried using
urlparse?

Skip
Apr 18 '06 #3
Sorry.

I'm writing a Python script that retrieves the source of a wiki
page, edits it, and re-posts the changed content. The wiki organizes pages
into groups and pages (e.g. ThisGroup/ThisPage). The URL segments that are
camel-cased (or otherwise start with a capital letter) are the group and
page for a given page. When the URL passed is incomplete (i.e., it has
the base URL and the group, or only the base URL), the wiki falls back to
defaults (e.g. a base URL and group return the default page for
that group, and a bare URL returns the default page for the default group).
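
The rules described above could be sketched roughly as follows. This is only one possible reading, written for Python 3, where the urlparse module lives in urllib.parse. The function name split_wiki_url and the constants DEFAULT_GROUP and DEFAULT_PAGE are hypothetical, the default value 'Main' is borrowed from the code posted elsewhere in this thread, and the sketch assumes that no segment of the base URL starts with a capital letter.

```python
from urllib.parse import urlsplit

DEFAULT_GROUP = "Main"  # assumed default group name
DEFAULT_PAGE = "Main"   # assumed default page name

def split_wiki_url(address):
    """Return (base_url, group, page) for a wiki URL."""
    parts = urlsplit(address)
    # Drop empty segments left by leading, trailing, or doubled slashes.
    segments = [seg for seg in parts.path.split("/") if seg]

    # Peel capitalized segments off the end: at most two (group, page).
    wiki_names = []
    while segments and len(wiki_names) < 2 and segments[-1][0].isupper():
        wiki_names.insert(0, segments.pop())

    group = wiki_names[0] if len(wiki_names) >= 1 else DEFAULT_GROUP
    page = wiki_names[1] if len(wiki_names) == 2 else DEFAULT_PAGE

    # Whatever is left of the path belongs to the base URL.
    base = "%s://%s/" % (parts.scheme, parts.netloc)
    if segments:
        base += "/".join(segments) + "/"
    return base, group, page

for address in [
    "http://www.example.org/test/Main/AnotherPage",
    "http://www.example.org/test/Main",
    "http://www.example.org/test/",
    "http://www.example.org/",
]:
    print(address, "->", split_wiki_url(address))
```

A single capitalized segment is treated as the group, which matches condition 2 in the original post. If the base path can itself contain capitalized segments, this approach cannot tell them apart from wiki names.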

I'm playing with urlparse now. Looks like I can do the same thing in a
lot fewer steps. I'll post results.

Ben

--
Ben Wilson
" Mundus vult decipi, ergo decipiatur"
Apr 18 '06 #4
"Ben Wilson" <da****@gmail.com> wrote in message
news:11**********************@z34g2000cwc.googlegroups.com...
I am working on a script that splits a URL into a page and a url. The
examples below are the conditions I expect a user to pass to the
script. In all cases, "http://www.example.org/test/" is the URL, and
the page comprises parts that have upper case letters (note, 5 & 6 are
the same as earlier examples, sans the 'test').

1. http://www.example.org/test/Main/AnotherPage (page =
Main/AnotherPage)
2. http://www.example.org/test/Main (page = Main + '/' +
default_page)
3. http://www.example.org/test (page = default_group + '/' +
default_page)
4. http://www.example.org/test/ (page = default_group + '/' +
default_page)
5. http://www.example.org/ (page = default_group + '/' +
default_page)
6. http://www.example.org/Main/AnotherPage (page = Main/AnotherPage)

Right now, I'm doing a simple split off condition 1:

page = '.'.join(in.split('/')[-2:])
url = '/'.join(in.split('/')[:-2]) + '/'

Before I start winding my way down a complex path, I wanted to see if
anybody had an elegant approach to this problem.

Thanks in advance.
Ben


Standard Python includes urlparse. Possible help?

-- Paul

import urlparse

urls = [
    "http://www.example.org/test/Main/AnotherPage",  # (page = Main/AnotherPage)
    "http://www.example.org/test/Main",  # (page = Main + '/' + default_page)
    "http://www.example.org/test",  # (page = default_group + '/' + default_page)
    "http://www.example.org/test/",  # (page = default_group + '/' + default_page)
    "http://www.example.org/",  # (page = default_group + '/' + default_page)
    "http://www.example.org",  # no trailing slash; appears in the output below
    "http://www.example.org/Main/AnotherPage",
]

for u in urls:
    print u
    parts = urlparse.urlparse(u)
    print parts
    scheme, netloc, path, params, query, frag = parts
    # Skip the empty string that precedes the leading '/'
    print path.split("/")[1:]
    print
prints:
http://www.example.org/test/Main/AnotherPage
('http', 'www.example.org', '/test/Main/AnotherPage', '', '', '')
['test', 'Main', 'AnotherPage']

http://www.example.org/test/Main
('http', 'www.example.org', '/test/Main', '', '', '')
['test', 'Main']

http://www.example.org/test
('http', 'www.example.org', '/test', '', '', '')
['test']

http://www.example.org/test/
('http', 'www.example.org', '/test/', '', '', '')
['test', '']

http://www.example.org/
('http', 'www.example.org', '/', '', '', '')
['']

http://www.example.org
('http', 'www.example.org', '', '', '', '')
[]

http://www.example.org/Main/AnotherPage
('http', 'www.example.org', '/Main/AnotherPage', '', '', '')
['Main', 'AnotherPage']
Apr 18 '06 #5
This is what I ended up with. Slightly different approach:

import urlparse

def sUrl(s):
    page = group = ''
    bits = urlparse.urlsplit(s)
    # scheme + '://' + host, e.g. 'http://www.example.org/'
    url = '://'.join([bits[0], bits[1]]) + '/'
    query = bits[2].split('/')  # path components (not the query string)
    if '' in query: query.remove('')
    if len(query) > 1: page = query.pop()
    if len(query) > 0 and query[-1] == query[-1].capitalize(): group = query.pop()
    if len(query): url += '/'.join(query) + '/'
    if page == '': page = 'Main'
    if group == '': group = 'Main'
    page = '.'.join([group, page])
    print " URL: (%s) PAGE: (%s)" % (url, page)

urls = [
    "http://www.example.org/test/Main/AnotherPage",  # (page = Main/AnotherPage)
    "http://www.example.org/test/Main",  # (page = Main + '/' + default_page)
    "http://www.example.org/test",  # (page = default_group + '/' + default_page)
    "http://www.example.org/test/",  # (page = default_group + '/' + default_page)
    "http://www.example.org/",  # (page = default_group + '/' + default_page)
    "http://www.example.org/Main/AnotherPage",
]

for u in urls:
    print "Testing:", u
    sUrl(u)
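
One detail worth noting about the code above: list.remove('') removes only the first empty string, so a trailing slash still leaves an empty component behind. A small sketch of the difference, written for Python 3 and using made-up helper names (components_remove, components_filter):

```python
def components_remove(path):
    """Mimic the approach above: remove only the first empty string."""
    parts = path.split('/')
    if '' in parts:
        parts.remove('')
    return parts

def components_filter(path):
    """Drop every empty string, wherever it appears."""
    return [part for part in path.split('/') if part]

for path in ['/test/Main', '/test/', '/']:
    print(repr(path), components_remove(path), components_filter(path))
```

Filtering is not necessarily better here, since the posted function relies on the leftover empty string in some cases, but it does make the component list independent of trailing slashes.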

Apr 19 '06 #6
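In practice, I had to change this:
    if len(query) > 0 and query[-1] == query[-1].capitalize(): group = query.pop()

to this:
    if len(query) > 0 and query[-1][:1] == query[-1].capitalize()[:1]: group = query.pop()

This is because I only wanted to test the case of the first letter of
the string. (Slicing with [:1] rather than indexing with [0] avoids an
IndexError when the last element is an empty string, as happens for
"http://www.example.org/".)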
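
The reason the whole-string comparison falls short is that str.capitalize() lower-cases everything after the first character, so a camel-cased name never equals its capitalized form. A minimal sketch in Python 3 follows; the helper names are invented for the example, and the second helper uses str.isupper() as a swapped-in test rather than the exact comparison from the post.

```python
def matches_capitalize(name):
    """Whole-string test: the name must equal its capitalize() form."""
    return name == name.capitalize()

def first_letter_upper(name):
    """Test only the first character; an empty string counts as False."""
    return name[:1].isupper()

for name in ['Main', 'AnotherPage', 'test', '']:
    print(repr(name), matches_capitalize(name), first_letter_upper(name))
```

The two first-letter tests are not identical: for a name that starts with a digit or punctuation, comparing against capitalize() reports a match while isupper() does not.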

Apr 19 '06 #7
