"Ben Wilson" <da****@gmail.com> wrote in message
news:11**********************@z34g2000cwc.googlegroups.com...
I am working on a script that splits a URL into a page and a url. The
examples below are the conditions I expect a user to pass to the
script. In all cases, "http://www.example.org/test/" is the URL, and
the page comprises parts that have upper case letters (note, 5 & 6 are
the same as earlier examples, sans the 'test').
1. http://www.example.org/test/Main/AnotherPage (page =
Main/AnotherPage)
2. http://www.example.org/test/Main (page = Main + '/' +
default_page)
3. http://www.example.org/test (page = default_group + '/' +
default_page)
4. http://www.example.org/test/ (page = default_group + '/' +
default_page)
5. http://www.example.org/ (page = default_group + '/' +
default_page)
6. http://www.example.org/Main/AnotherPage (page = Main/AnotherPage)
Right now, I'm doing a simple split off condition 1:
page = '.'.join(url_in.split('/')[-2:])
url = '/'.join(url_in.split('/')[:-2]) + '/'
Before I start winding my way down a complex path, I wanted to see if
anybody had an elegant approach to this problem.
Thanks in advance.
Ben
Standard Python includes urlparse. Possible help?
-- Paul
import urlparse

urls = [
    "http://www.example.org/test/Main/AnotherPage",  # (page = Main/AnotherPage)
    "http://www.example.org/test/Main",  # (page = Main + '/' + default_page)
    "http://www.example.org/test",  # (page = default_group + '/' + default_page)
    "http://www.example.org/test/",  # (page = default_group + '/' + default_page)
    "http://www.example.org/",  # (page = default_group + '/' + default_page)
    "http://www.example.org",  # bare host, no trailing slash
    "http://www.example.org/Main/AnotherPage",
    ]

for u in urls:
    print u
    parts = urlparse.urlparse(u)
    print parts
    scheme, netloc, path, params, query, frag = parts
    # drop the empty string that precedes the leading "/"
    print path.split("/")[1:]
    print
prints:
http://www.example.org/test/Main/AnotherPage
('http', 'www.example.org', '/test/Main/AnotherPage', '', '', '')
['test', 'Main', 'AnotherPage']

http://www.example.org/test/Main
('http', 'www.example.org', '/test/Main', '', '', '')
['test', 'Main']

http://www.example.org/test
('http', 'www.example.org', '/test', '', '', '')
['test']

http://www.example.org/test/
('http', 'www.example.org', '/test/', '', '', '')
['test', '']

http://www.example.org/
('http', 'www.example.org', '/', '', '', '')
['']

http://www.example.org
('http', 'www.example.org', '', '', '', '')
[]

http://www.example.org/Main/AnotherPage
('http', 'www.example.org', '/Main/AnotherPage', '', '', '')
['Main', 'AnotherPage']
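
Going one step further than the output above, here is a minimal sketch of how the path segments might be turned into the (url, page) pair for all six conditions. It is written for Python 3, where urlparse lives in urllib.parse. The names split_url, DEFAULT_GROUP and DEFAULT_PAGE (and their values "Main" and "HomePage") are made up for illustration, and the rule "a page part starts with an upper case letter" is only one reading of Ben's description, so treat it as an assumption.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical defaults, chosen only for this illustration.
DEFAULT_GROUP = "Main"
DEFAULT_PAGE = "HomePage"


def split_url(address, default_group=DEFAULT_GROUP, default_page=DEFAULT_PAGE):
    """Return (base_url, page) for an address like those in conditions 1-6."""
    parts = urlsplit(address)
    # Ignore empty segments produced by leading or trailing slashes.
    segments = [s for s in parts.path.split("/") if s]

    # Assumption: up to two trailing segments that start with an upper
    # case letter form the page (group, then page name).
    page_parts = []
    while segments and len(page_parts) < 2 and segments[-1][:1].isupper():
        page_parts.insert(0, segments.pop())

    if not page_parts:
        # Conditions 3, 4 and 5: nothing page-like in the path.
        page_parts = [default_group, default_page]
    elif len(page_parts) == 1:
        # Condition 2: only the group was given.
        page_parts.append(default_page)

    # Whatever is left is the base path; always end it with a slash.
    base_path = "/" + "".join(s + "/" for s in segments)
    base_url = urlunsplit((parts.scheme, parts.netloc, base_path, "", ""))
    return base_url, "/".join(page_parts)
```

With those defaults, split_url("http://www.example.org/test/Main") returns ("http://www.example.org/test/", "Main/HomePage"). A lower case group or page name would be treated as part of the URL, so the upper case test may need replacing with whatever rule the site really uses.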