Better way to sift parts of URL . . .

Ben Wilson

I am working on a script that splits a URL into a page and a url. The
examples below are the conditions I expect a user to pass to the
script. In all cases, "http://www.example.org/test/" is the URL, and
the page comprises parts that have upper case letters (note, 5 & 6 are
the same as earlier examples, sans the 'test').

1. http://www.example.org/test/Main/AnotherPage (page =
Main/AnotherPage)
2. http://www.example.org/test/Main (page = Main + '/' +
default_page)
3. http://www.example.org/test (page = default_group + '/' +
default_page)
4. http://www.example.org/test/ (page = default_group + '/' +
default_page)
5. http://www.example.org/ (page = default_group + '/' +
default_page)
6. http://www.example.org/Main/AnotherPage (page = Main/AnotherPage)

Right now, I'm doing a simple split off condition 1:

page = '.'.join(in.split('/')[-2:])
url = '/'.join(in.split('/')[:-2]) + '/'

Before I start winding my way down a complex path, I wanted to see if
anybody had an elegant approach to this problem.

Thanks in advance.
Ben

Apr 18 '06 #1

Subscribe Post Reply

1439

Ben Wilson

Here is what I came up with:

def siftUrl(s):
s = s.split('//')[1]
bits = s.split('/')

if '' in bits: bits.remove('')
if len(bits) > 1:
group = bits[-2]
page = bits[-1]
group.strip('/')
page.strip('/')
else:
group = 'test'
page = 'test'

if group == group.capitalize():
page = '/'.join([group,page])
url = '/'.join(s.split('/')[:-2]) + '/'
elif page == page.capitalize():
page = '/'.join(['Main',page])
url = '/'.join(s.split('/')[:-1]) + '/'
else:
page = '/'.join(['Main','Main'])
url = s

url = 'http://' + url
return url, page

Apr 18 '06 #2

skip

Ben> I am working on a script that splits a URL into a page and a
Ben> url.

I couldn't tell quite what you mean to accomplish from your example. (In
particular, I don't know what you mean by "default_group", as it's never
defined, and I don't know why the desired output of examples 1 and 6 is the
same, since the URLs are clearly different.) You don't mention having tried
the urlparse module, so I thought I should ask: have you tried using
urlparse?

Skip

Apr 18 '06 #3

Ben Wilson

Sorry.

I'm writing a python script that retrieves source contents of a wiki
page, edits, and re-posts changed content. The wiki breaks pages into
groups and pages (e.g. ThisGroup/ThisPage). The sections that are
camel cased (or otherwise contain title case) are the group and page
for a given page. When a url is passed that is incomplete (i.e., has
the base URL and the Group, or only the base URL), the wiki resorts to
defaults (e.g. a base URL and Group would return the default page for
that group, and a bare URL returns the base page for the base group).

I'm playing with urlparse now. Looks like I can do the same thing in a
lot fewer steps. I'll post results.

Ben

On 4/18/06, sk**@pobox.com <sk**@pobox.com> wrote:

Ben> I am working on a script that splits a URL into a page and a
Ben> url.

I couldn't tell quite what you mean to accomplish from your example. (In
particular, I don't know what you mean by "default_group", as it's never
defined, and I don't know why the desired output of examples 1 and 6 is the
same, since the URLs are clearly different.) You don't mention having tried
the urlparse module, so I thought I should ask: have you tried using
urlparse?

Skip

--
Ben Wilson
" Mundus vult decipi, ergo decipiatur"

Apr 18 '06 #4

Paul McGuire

"Ben Wilson" <da****@gmail.com> wrote in message
news:11**********************@z34g2000cwc.googlegr oups.com...

I am working on a script that splits a URL into a page and a url. The
examples below are the conditions I expect a user to pass to the
script. In all cases, "http://www.example.org/test/" is the URL, and
the page comprises parts that have upper case letters (note, 5 & 6 are
the same as earlier examples, sans the 'test').

1. http://www.example.org/test/Main/AnotherPage (page =
Main/AnotherPage)
2. http://www.example.org/test/Main (page = Main + '/' +
default_page)
3. http://www.example.org/test (page = default_group + '/' +
default_page)
4. http://www.example.org/test/ (page = default_group + '/' +
default_page)
5. http://www.example.org/ (page = default_group + '/' +
default_page)
6. http://www.example.org/Main/AnotherPage (page = Main/AnotherPage)

Right now, I'm doing a simple split off condition 1:

page = '.'.join(in.split('/')[-2:])
url = '/'.join(in.split('/')[:-2]) + '/'

Before I start winding my way down a complex path, I wanted to see if
anybody had an elegant approach to this problem.

Thanks in advance.
Ben

Standard Python includes urlparse. Possible help?

-- Paul

import urlparse

urls = [
"http://www.example.org/test/Main/AnotherPage", # (page =
Main/AnotherPage)
"http://www.example.org/test/Main", # (page = Main + '/' + default_page)
"http://www.example.org/test", # (page = default_group + '/' +
default_page)
"http://www.example.org/test/", # (page = default_group + '/' +
default_page)
"http://www.example.org/", # (page = default_group + '/' + default_page)
"http://www.example.org/Main/AnotherPage",
]

for u in urls:
print u
parts = urlparse.urlparse(u)
print parts
scheme,netloc,path,params,query,frag = parts
print path.split("/")[1:]
print

prints:
http://www.example.org/test/Main/AnotherPage
('http', 'www.example.org', '/test/Main/AnotherPage', '', '', '')
['test', 'Main', 'AnotherPage']

http://www.example.org/test/Main
('http', 'www.example.org', '/test/Main', '', '', '')
['test', 'Main']

http://www.example.org/test
('http', 'www.example.org', '/test', '', '', '')
['test']

http://www.example.org/test/
('http', 'www.example.org', '/test/', '', '', '')
['test', '']

http://www.example.org/
('http', 'www.example.org', '/', '', '', '')
['']

http://www.example.org
('http', 'www.example.org', '', '', '', '')
[]

http://www.example.org/Main/AnotherPage
('http', 'www.example.org', '/Main/AnotherPage', '', '', '')
['Main', 'AnotherPage']

Apr 18 '06 #5

Ben Wilson

This is what I ended up with. Slightly different approach:

import urlparse

def sUrl(s):
page = group = ''
bits = urlparse.urlsplit(s)
url = '//'.join([bits[0],bits[1]]) + '/'
query = bits[2].split('/')
if '' in query: query.remove('')
if len(query) > 1: page = query.pop()
if len(query) > 0 and query[-1] == query[-1].capitalize(): group =
query.pop()
if len(query): url += '/'.join(query) + '/'
if page == '': page = 'Main'
if group == '': group = 'Main'
page = '.'.join([group,page])
print " URL: (%s) PAGE: (%s)" % (url, page)
urls = [
"http://www.example.org/test/Main/AnotherPage", # (page =
Main/AnotherPage)
"http://www.example.org/test/Main", # (page = Main + '/' +
default_page)
"http://www.example.org/test", # (page = default_group + '/' +
default_page)
"http://www.example.org/test/", # (page = default_group + '/' +
default_page)
"http://www.example.org/", # (page = default_group + '/' +
default_page)
"http://www.example.org/Main/AnotherPage",
]

for u in urls:
print "Testing:",u
sUrl(u)

Apr 19 '06 #6

Ben Wilson

In practice, I had to change this:
if len(query) > 0 and query[-1] == query[-1].capitalize(): group =
query.pop()

to this:
if len(query) > 0 and query[-1][0] == query[-1].capitalize()[0]:
group = query.pop()

This is because I only wanted to test the case of the first letter of
the string.

Apr 19 '06 #7

Similar topics

220

What's better about Ruby than Python?

by: Brandon J. Van Every | last post by:

What's better about Ruby than Python? I'm sure there's something. What is it? This is not a troll. I'm language shopping and I want people's answers. I don't know beans about Ruby or have...

Python

133

Java's performance far better that optimized C++

by: Gaurav | last post by:

http://www.sys-con.com/story/print.cfm?storyid=45250 Any comments? Thanks Gaurav

C / C++

ASP.NET 3.0: Compelling scenarios for combining Master Pages and Web Parts?

by: Michael Herman $Parallelspace$ | last post by:

1. What are some compelling solutions for using Master/Content pages with Web Pages? 2. If a content area has a web part zone with web parts, what is the user experience like when "editting" the...

ASP.NET

which is better, string concatentation or substitution?

by: John Salerno | last post by:

My initial feeling is that concatenation might take longer than substitution, but that it is also easier to read: def p(self, paragraph): self.source += '<p>' + paragraph + '</p>\n\n' vs. ...

Python

Better way to update two tables ?

by: TD | last post by:

I need help understanding a better way to do this. I have an unbound form named frmParts that has two subforms neither of which is bound to frmParts. One subform is named frmSubParts and it's...

Microsoft Access / VBA

$parts = $struct->parts;

by: bill | last post by:

Turning on error_reporting(E_ALL); was quite an eye opener as to how much "fixing" PHP will do for the sloppy coder. I fixed all of the errors except: Notice: Undefined property: parts in...

PHP

Bug in my C++ program seems really strange. (Update on debugging progress)

by: mike3 | last post by:

Hi. I seem to have made some progress on finding that bug in my program. I deactivated everything in the bignum package that was used except for the returning of BigFloat objects. I even...

C / C++

204

Making C better (by borrowing from C++)

by: Masood | last post by:

I know that this topic may inflame the "C language Taleban", but is there any prospect of some of the neat features of C++ getting incorporated in C? No I am not talking out the OO stuff. I am...

C / C++

Insert Multiple Parts Problem

by: bonneylake | last post by:

Hey Everyone, Well recently i been inserting multiple fields for a section in my form called "serial". Well now i am trying to insert multiple fields for the not only the serial section but also...

ColdFusion

How to turn on java script in a villaon keypad mobile phone

by: Charles Arthur | last post by:

How do i turn on java script on a villaon, callus and itel keypad mobile phone

Java

Migrating Website to Cloud - Emmanuel Katto

by: emmanuelkatto | last post by:

Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel

General

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

General

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

Windows Server

AI Job Threat for Devs

by: agi2029 | last post by:

Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

Career Advice

Access Europe - Using VBA to create a class based on a table - Wed 1 May

by: isladogs | last post by:

The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...

Microsoft Access / VBA