Better way to sift parts of URL . . .

I am working on a script that splits a URL into a page and a url. The
examples below are the conditions I expect a user to pass to the
script. In all cases, "http://www.example.org/test/" is the URL, and
the page comprises parts that have upper case letters (note, 5 & 6 are
the same as earlier examples, sans the 'test').

1. http://www.example.org/test/Main/AnotherPage (page =
Main/AnotherPage)
2. http://www.example.org/test/Main (page = Main + '/' +
default_page)
3. http://www.example.org/test (page = default_group + '/' +
default_page)
4. http://www.example.org/test/ (page = default_group + '/' +
default_page)
5. http://www.example.org/ (page = default_group + '/' +
default_page)
6. http://www.example.org/Main/AnotherPage (page = Main/AnotherPage)

Right now, I'm doing a simple split off condition 1:

page = '.'.join(url_in.split('/')[-2:])
url = '/'.join(url_in.split('/')[:-2]) + '/'
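To see where the two-line split stops working, here is a minimal sketch that wraps it in a function and runs it on a few of the listed conditions. The names `naive_split` and `url_in` are illustrative only (`in` itself is a reserved word in Python, so it cannot be used as a variable name).

```python
def naive_split(url_in):
    """Apply the simple 'last two parts are the page' split."""
    parts = url_in.split('/')
    page = '.'.join(parts[-2:])
    url = '/'.join(parts[:-2]) + '/'
    return url, page


# Condition 1 comes out as intended.
print(naive_split("http://www.example.org/test/Main/AnotherPage"))
# Condition 2 treats 'test' as the group.
print(naive_split("http://www.example.org/test/Main"))
# Condition 5 consumes the host name.
print(naive_split("http://www.example.org/"))
```

Only condition 1 gives the intended result; the shorter URLs need the path separated from the host before deciding which parts are group and page.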

Before I start winding my way down a complex path, I wanted to see if
anybody had an elegant approach to this problem.

Thanks in advance.
Ben

Apr 18 '06 #1
Here is what I came up with:

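def siftUrl(s):
    # Drop the scheme, then split the remainder on slashes.
    s = s.split('//')[1]
    bits = s.split('/')

    if '' in bits: bits.remove('')
    if len(bits) > 1:
        # str.strip returns a new string, so the result must be assigned.
        group = bits[-2].strip('/')
        page = bits[-1].strip('/')
    else:
        group = 'test'
        page = 'test'

    if group == group.capitalize():
        # Both a group and a page were given.
        page = '/'.join([group,page])
        url = '/'.join(s.split('/')[:-2]) + '/'
    elif page == page.capitalize():
        # Only one capitalized part was given; use the default group.
        page = '/'.join(['Main',page])
        url = '/'.join(s.split('/')[:-1]) + '/'
    else:
        # Neither was given; fall back to the defaults.
        page = '/'.join(['Main','Main'])
        url = s

    url = 'http://' + url
    return url, page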

Apr 18 '06 #2

Ben> I am working on a script that splits a URL into a page and a
Ben> url.

I couldn't tell quite what you mean to accomplish from your example. (In
particular, I don't know what you mean by "default_group", as it's never
defined, and I don't know why the desired output of examples 1 and 6 is the
same, since the URLs are clearly different.) You don't mention having tried
the urlparse module, so I thought I should ask: have you tried using
urlparse?

Skip
Apr 18 '06 #3
Sorry.

I'm writing a python script that retrieves source contents of a wiki
page, edits, and re-posts changed content. The wiki breaks pages into
groups and pages (e.g. ThisGroup/ThisPage). The sections that are
camel cased (or otherwise contain title case) are the group and page
for a given page. When a url is passed that is incomplete (i.e., has
the base URL and the Group, or only the base URL), the wiki resorts to
defaults (e.g. a base URL and Group would return the default page for
that group, and a bare URL returns the base page for the base group).

I'm playing with urlparse now. Looks like I can do the same thing in a
lot fewer steps. I'll post results.
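
For illustration, the rules described above could be sketched as follows in Python 3, where the `urlparse` module lives at `urllib.parse`. This is only a sketch under assumptions: the function name `split_wiki_url`, the `Main` defaults, and the rule "a part is a group or page name if it starts with an upper-case letter" are not confirmed wiki settings.

```python
from urllib.parse import urlsplit

# Assumed defaults, not confirmed wiki settings.
DEFAULT_GROUP = "Main"
DEFAULT_PAGE = "Main"


def split_wiki_url(url, default_group=DEFAULT_GROUP, default_page=DEFAULT_PAGE):
    """Return (base_url, "Group/Page") for a wiki URL."""
    bits = urlsplit(url)
    # Drop empty segments left by leading, trailing, or doubled slashes.
    segments = [seg for seg in bits.path.split("/") if seg]

    # Peel off at most two trailing segments that start with an upper-case letter.
    names = []
    while segments and len(names) < 2 and segments[-1][0].isupper():
        names.insert(0, segments.pop())

    # A single name is treated as the group; missing names use the defaults.
    group = names[0] if names else default_group
    page = names[1] if len(names) == 2 else default_page

    # Whatever path remains belongs to the base URL.
    base = "%s://%s/" % (bits.scheme, bits.netloc)
    if segments:
        base += "/".join(segments) + "/"
    return base, group + "/" + page
```

With these assumptions, `split_wiki_url("http://www.example.org/test/Main")` returns `("http://www.example.org/test/", "Main/Main")`, and a bare `http://www.example.org/` falls back to both defaults.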

Ben

--
Ben Wilson
" Mundus vult decipi, ergo decipiatur"
Apr 18 '06 #4
"Ben Wilson" <da****@gmail.com> wrote in message
news:11**********************@z34g2000cwc.googlegroups.com...

Standard Python includes urlparse. Possible help?

-- Paul

import urlparse

urls = [
    "http://www.example.org/test/Main/AnotherPage",  # page = Main/AnotherPage
    "http://www.example.org/test/Main",  # page = Main + '/' + default_page
    "http://www.example.org/test",  # page = default_group + '/' + default_page
    "http://www.example.org/test/",  # page = default_group + '/' + default_page
    "http://www.example.org/",  # page = default_group + '/' + default_page
    "http://www.example.org",
    "http://www.example.org/Main/AnotherPage",
]

for u in urls:
    print u
    parts = urlparse.urlparse(u)
    print parts
    # urlparse returns a 6-tuple; only the path matters here.
    scheme,netloc,path,params,query,frag = parts
    # The path starts with '/', so the first split element is always empty.
    print path.split("/")[1:]
    print

prints:
http://www.example.org/test/Main/AnotherPage
('http', 'www.example.org', '/test/Main/AnotherPage', '', '', '')
['test', 'Main', 'AnotherPage']

http://www.example.org/test/Main
('http', 'www.example.org', '/test/Main', '', '', '')
['test', 'Main']

http://www.example.org/test
('http', 'www.example.org', '/test', '', '', '')
['test']

http://www.example.org/test/
('http', 'www.example.org', '/test/', '', '', '')
['test', '']

http://www.example.org/
('http', 'www.example.org', '/', '', '', '')
['']

http://www.example.org
('http', 'www.example.org', '', '', '', '')
[]

http://www.example.org/Main/AnotherPage
('http', 'www.example.org', '/Main/AnotherPage', '', '', '')
['Main', 'AnotherPage']
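
The output above shows empty strings turning up whenever the path is just `/` or ends in a slash. A small sketch of one way to filter them out, written for Python 3, where the same functions are imported from `urllib.parse`; `path_segments` is a hypothetical helper name, not part of the standard library.

```python
from urllib.parse import urlsplit


def path_segments(url):
    """Return the non-empty path segments of url, in order."""
    # urlsplit returns a named tuple, so the path is available as .path.
    return [seg for seg in urlsplit(url).path.split("/") if seg]
```

With this helper, `http://www.example.org/test/` and `http://www.example.org/test` both give `['test']`, and a URL with no path at all gives an empty list.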
Apr 18 '06 #5
This is what I ended up with. Slightly different approach:

import urlparse

def sUrl(s):
    page = group = ''
    bits = urlparse.urlsplit(s)
    # bits[0] is the scheme without the colon, so join with '://'.
    url = '://'.join([bits[0],bits[1]]) + '/'
    # bits[2] is the path; despite the name, query holds its parts.
    query = bits[2].split('/')
    if '' in query: query.remove('')
    if len(query) > 1: page = query.pop()
    if len(query) > 0 and query[-1] == query[-1].capitalize(): group = query.pop()
    # Anything left over is part of the base URL.
    if len(query): url += '/'.join(query) + '/'
    if page == '': page = 'Main'
    if group == '': group = 'Main'
    page = '.'.join([group,page])
    print " URL: (%s) PAGE: (%s)" % (url, page)

urls = [
    "http://www.example.org/test/Main/AnotherPage",  # page = Main/AnotherPage
    "http://www.example.org/test/Main",  # page = Main + '/' + default_page
    "http://www.example.org/test",  # page = default_group + '/' + default_page
    "http://www.example.org/test/",  # page = default_group + '/' + default_page
    "http://www.example.org/",  # page = default_group + '/' + default_page
    "http://www.example.org/Main/AnotherPage",
]

for u in urls:
    print "Testing:",u
    sUrl(u)

Apr 19 '06 #6
In practice, I had to change this:
    if len(query) > 0 and query[-1] == query[-1].capitalize(): group = query.pop()

to this:

    if len(query) > 0 and query[-1][0] == query[-1].capitalize()[0]: group = query.pop()

This is because I only wanted to test the case of the first letter of
the string.
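
The reason the first version misses camel-cased names is that `str.capitalize()` also lower-cases everything after the first character. A short sketch of the difference, with a possible alternative for the first-letter test; `starts_upper` is a hypothetical helper name and the code is written for Python 3.

```python
def starts_upper(name):
    """True if name is non-empty and its first character is upper case."""
    return bool(name) and name[0].isupper()


# capitalize() lower-cases the rest of the string, so a camel-cased
# name never compares equal to its capitalized form.
print("AnotherPage" == "AnotherPage".capitalize())
print(starts_upper("AnotherPage"))
print(starts_upper("test"))
```

One difference from the comparison against `capitalize()[0]`: a part that starts with a digit or punctuation counts as a match there, because such characters are unchanged by `capitalize()`, whereas `isupper()` returns False for them.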

Apr 19 '06 #7
