How to strip the domain name in Python?

Marko.Cain.23

Hi,

I have a list of URLs like these, and I am trying to extract the
domain name from each one:

http://www.cnn.com
www.yahoo.com
http://www.ebay.co.uk

This is the code I am using:

pattern = re.compile("http:\\\\(.*)\.(.*)", re.S)
match = re.findall(pattern, line)

if (match):
    s1, s2 = match[0]
    print s2

But none of the sites matched. Can you please tell me what I am
missing?

Thank you.

Apr 14 '07 #1


Alex Martelli


You're using backslashes in your RE pattern, to start with, while
the URLs contain forward slashes (or no slashes at all, in the case
of the second one).

Anyway, forget REs, and use the standard library module urlparse,
specifically its urlparse.urlsplit function.

Alex

Apr 14 '07 #2
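
For readers trying the urlsplit suggestion today, here is a minimal sketch of what it reports for the three sample URLs. It assumes Python 3, where the function lives in urllib.parse rather than in the urlparse module named above; the variable names are only illustrative.

```python
# Python 3; in Python 2 this was `from urlparse import urlsplit`.
from urllib.parse import urlsplit

urls = ["http://www.cnn.com", "www.yahoo.com", "http://www.ebay.co.uk"]

# Map each URL to the (netloc, path) pair that urlsplit reports.
parts = {url: (urlsplit(url).netloc, urlsplit(url).path) for url in urls}

for url, (netloc, path) in parts.items():
    print(f"{url!r}: netloc={netloc!r} path={path!r}")
```

Note that the scheme-less 'www.yahoo.com' ends up in the path, not in netloc, because urlsplit only recognises a network location when it is introduced by '//'.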

Michael Bentley


Change

re.compile("http:\\\\(.*)\.(.*)", re.S)

to

re.compile("http:\/\/(.*)\.(.*)", re.S)

Apr 14 '07 #3
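
A small sketch of why the first pattern never matches and the suggested one does. It assumes Python 3 and writes both patterns as raw strings, which is a presentational choice not made in the thread; the regexes themselves are the same, except that the forward slashes are left unescaped because '/' is not special in Python's re syntax.

```python
import re

line = "http://www.cnn.com"

# Same regex as the original "http:\\\\(.*)\.(.*)": after string-literal
# processing the engine sees "http:\\", i.e. "http:" followed by ONE
# literal backslash, which never occurs in these URLs.
original = re.compile(r"http:\\(.*)\.(.*)", re.S)

# Same regex as the suggested "http:\/\/(.*)\.(.*)", minus the
# unnecessary escaping of the forward slashes.
fixed = re.compile(r"http://(.*)\.(.*)", re.S)

print(original.findall(line))          # []
print(bool(fixed.findall(line)))       # True
print(fixed.findall("www.yahoo.com"))  # [] because there is no "http://" prefix
```

The corrected pattern still requires the "http://" prefix, so the second sample URL remains unmatched.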

Marko.Cain.23


Thanks. I tried this:

pattern = re.compile("http:\/\/(.*)\.(.*)", re.S)
match = re.findall(pattern, line)

if (match):
    s1, s2 = match[0]
    print s2

But when 'line' is http://www.cnn.com, 's2' is 'com', whereas I want
'cnn.com' (everything after the first '.'). How can I do that?

Apr 14 '07 #4
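
The 'com' result comes from the first (.*) being greedy: it consumes as much as it can, so the literal dot in the pattern ends up matching the last '.' in the line. As a sketch only (Python 3 assumed, and not something proposed in the thread itself), making the first group non-greedy with (.*?) moves the split to the first dot:

```python
import re

line = "http://www.cnn.com"

# Greedy first group: grabs everything up to the LAST dot.
greedy = re.findall(r"http://(.*)\.(.*)", line)

# Non-greedy first group: stops at the FIRST dot.
lazy = re.findall(r"http://(.*?)\.(.*)", line)

print(greedy)  # [('www.cnn', 'com')]
print(lazy)    # [('www', 'cnn.com')]
```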

Marko.Cain.23


Can anyone please help me with my problem? I still can't solve it.

Basically, I want to keep only the text after the first '.' in the
URL's host name:

http://www.cnn.com -> cnn.com

Apr 15 '07 #5

Marc 'BlackJack' Rintsch


from urlparse import urlsplit

def get_domain(url):
    net_location = urlsplit(url)[1]
    return '.'.join(net_location.rsplit('.', 2)[-2:])

def main():
    print get_domain('http://www.cnn.com')

Ciao,
Marc 'BlackJack' Rintsch

Apr 15 '07 #6

Marko.Cain.23


Thanks for your help.

But if the input string is "http://www.ebay.co.uk/", I only get
"co.uk".

How can I change it so that it works for both www.ebay.co.uk and
www.cnn.com?

Apr 15 '07 #7
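
To see where "co.uk" comes from, it helps to look at what rsplit('.', 2) returns for each host name. The sketch below assumes Python 3 and uses a hypothetical helper name, last_two_labels, for the expression in the posted get_domain.

```python
hosts = ["www.cnn.com", "www.ebay.co.uk"]

def last_two_labels(host):
    """Keep the last two dot-separated labels, as the rsplit approach does."""
    return ".".join(host.rsplit(".", 2)[-2:])

for host in hosts:
    # rsplit('.', 2) splits at most twice, starting from the right.
    print(host, "->", host.rsplit(".", 2), "->", last_two_labels(host))

# www.cnn.com -> ['www', 'cnn', 'com'] -> cnn.com
# www.ebay.co.uk -> ['www.ebay', 'co', 'uk'] -> co.uk
```

Counting labels from the right cannot distinguish a two-label suffix such as "co.uk" from a registered name followed by "com", which is why the same expression gives the wanted answer for one host and not the other.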

Steve Holden


>>> def get_domain(url):
...     net_location = urlsplit(url)[1]
...     return net_location.split(".", 1)[1]
...
>>> print get_domain('http://www.cnn.com')
cnn.com
>>> print get_domain('http://www.ebay.co.uk')
ebay.co.uk
>>>

regards
Steve

Apr 16 '07 #8

Michael Bentley


from urlparse import urlsplit

def get_domain(url):
    net_location = (
        urlsplit(url)[1]
        and urlsplit(url)[1].split('.')
        or urlsplit(url)[2].split('.')
    )  # tricksy way to get long line into email
    if net_location[0].lower() == 'www':
        net_location = net_location[1:]
    return '.'.join(net_location)

def main():
    testItems = ['http://www.cnn.com',
                 'www.yahoo.com',
                 'http://www.ebay.co.uk']

    for testItem in testItems:
        print get_domain(testItem)

if __name__ == '__main__':
    main()

Apr 16 '07 #9
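
The code in this thread is Python 2 (print statement, urlparse module). A rough Python 3 sketch of the same idea follows. It is not a port endorsed by any poster: prefixing '//' to scheme-less input and using the hostname attribute are assumptions made here to keep the example short, and only a leading "www." label is dropped.

```python
from urllib.parse import urlsplit

def get_domain(url):
    """Return the host part of `url` without a leading 'www.' label."""
    # urlsplit only fills in the network location when the host is
    # introduced by '//', so add that prefix for scheme-less input
    # such as 'www.yahoo.com'.
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    host = urlsplit(url).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return host

if __name__ == "__main__":
    for test_item in ["http://www.cnn.com",
                      "www.yahoo.com",
                      "http://www.ebay.co.uk/"]:
        print(get_domain(test_item))
```

Unlike split(".", 1)[1], this leaves a host such as "cnn.com" untouched instead of reducing it to "com", because it removes the first label only when that label is "www".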
