How to strip the domain name in Python?

Marko.Cain.23

Hi,

I have a list of URLs like this:

http://www.cnn.com
www.yahoo.com
http://www.ebay.co.uk

and I am trying to strip out the domain name using the following code:

import re

pattern = re.compile("http:\\\\(.*)\.(.*)", re.S)
match = re.findall(pattern, line)

if (match):
    s1, s2 = match[0]
    print s2

but none of the sites matched. Can you please tell me what I am missing?

Thank you.

Apr 14 '07 #1


Alex Martelli

You're using reverse slashes in your RE pattern, to start with, while
the URLs contain plain slashes (or don't have any slashes, in the case
of the second one).

Anyway, forget REs and use the standard library module urlparse,
specifically its urlparse.urlsplit function.
Alex

Apr 14 '07 #2
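A minimal Python 3 sketch of Alex's suggestion (the Python 2 urlparse module became urllib.parse in Python 3; the sample list and loop are only for illustration). It prints the netloc and path that urlsplit returns for each URL in the original list. A string without a scheme, like www.yahoo.com, is parsed as a relative path, so its netloc comes back empty.

```python
from urllib.parse import urlsplit

samples = ["http://www.cnn.com", "www.yahoo.com", "http://www.ebay.co.uk"]

for url in samples:
    parts = urlsplit(url)
    # netloc holds the host only when the string starts with a scheme like 'http://';
    # a bare 'www.yahoo.com' is treated as a relative path instead.
    print(repr(parts.netloc), repr(parts.path))
```

So any netloc-based solution has to decide what to do with scheme-less entries such as www.yahoo.com.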

Michael Bentley

Change re.compile("http:\\\\(.*)\.(.*)", re.S) to re.compile("http:\/\/(.*)\.(.*)", re.S)

Apr 14 '07 #3
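A quick Python 3 check of both patterns, assuming line holds one of the sample URLs. The thread's original string "http:\\\\(.*)\.(.*)" is written here as the equivalent raw string, which avoids Python's invalid-escape warnings. It shows why the first pattern finds nothing, and that the greedy (.*) in the corrected one runs up to the last dot.

```python
import re

line = "http://www.cnn.com"

# Same regex as the thread's "http:\\\\(.*)\.(.*)": the regex "\\" matches ONE
# literal backslash, so this looks for "http:\" and never matches a real URL.
original = re.compile(r"http:\\(.*)\.(.*)", re.S)

# Corrected: match the forward slashes literally ("/" needs no escaping in a regex).
corrected = re.compile(r"http://(.*)\.(.*)", re.S)

print(original.findall(line))   # []
print(corrected.findall(line))  # [('www.cnn', 'com')]  greedy (.*) stops at the LAST dot
```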

Marko.Cain.23

Thanks. I tried this:

pattern = re.compile("http:\/\/(.*)\.(.*)", re.S)
match = re.findall(pattern, line)

if (match):
    s1, s2 = match[0]
    print s2

but when 'line' is http://www.cnn.com, I get 'com' in s2, but I want
'cnn.com' (everything after the first '.'). How can I do that?

Apr 14 '07 #4
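One way to stay with a regex and get cnn.com is to make the first group non-greedy, so it stops at the first dot rather than the last. This is a Python 3 sketch; the helper name after_first_dot, the optional https?:// prefix (so that www.yahoo.com matches too) and the [^/]* group that drops a trailing path are illustrative choices, not something posted in the thread.

```python
import re

# (.*?) is non-greedy, so it stops at the FIRST dot instead of the last one.
# The scheme is optional so bare hosts such as 'www.yahoo.com' also match,
# and ([^/]*) stops at the next '/', dropping any trailing path.
PATTERN = re.compile(r"(?:https?://)?(.*?)\.([^/]*)")

def after_first_dot(line):
    """Return what follows the first '.' (up to any '/'), or None if there is no dot."""
    match = PATTERN.match(line)
    if match:
        return match.group(2)
    return None

for line in ["http://www.cnn.com", "www.yahoo.com", "http://www.ebay.co.uk/"]:
    print(after_first_dot(line))
```

For the three sample inputs this yields cnn.com, yahoo.com and ebay.co.uk; it is still a text heuristic, not a URL parser.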

Marko.Cain.23

Can anyone please help me with my problem? I still can't solve it.

Basically, I want to extract the text after the first '.' in a URL:

http://www.cnn.com -> cnn.com

Apr 15 '07 #5

Marc 'BlackJack' Rintsch

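from urlparse import urlsplit

def get_domain(url):
    net_location = urlsplit(url)[1]   # index 1 is the netloc, e.g. 'www.cnn.com'
    # keep only the last two dot-separated labels
    return '.'.join(net_location.rsplit('.', 2)[-2:])

def main():
    print get_domain('http://www.cnn.com')

if __name__ == '__main__':
    main()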

Ciao,
Marc 'BlackJack' Rintsch

Apr 15 '07 #6
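Marc's function ported to Python 3 as a sketch (urllib.parse instead of urlparse, and the named netloc attribute instead of index [1]; the logic is unchanged). rsplit('.', 2) splits at most twice from the right and [-2:] keeps the last two labels, which is why only co.uk survives for www.ebay.co.uk, and why a scheme-less www.yahoo.com gives an empty string.

```python
from urllib.parse import urlsplit

def get_domain(url):
    net_location = urlsplit(url).netloc
    # Split off at most two labels from the right, then keep the last two.
    return '.'.join(net_location.rsplit('.', 2)[-2:])

print(get_domain('http://www.cnn.com'))     # cnn.com
print(get_domain('http://www.ebay.co.uk'))  # co.uk
```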

Marko.Cain.23

Thanks for your help.

But if the input string is "http://www.ebay.co.uk/", I only get "co.uk".

How can I change it so that it works for both www.ebay.co.uk and www.cnn.com?

Apr 15 '07 #7

Steve Holden

>>> def get_domain(url):
...     net_location = urlsplit(url)[1]
...     return net_location.split(".", 1)[1]
...
>>> print get_domain('http://www.cnn.com')
cnn.com
>>> print get_domain('http://www.ebay.co.uk')
ebay.co.uk
>>>

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
Recent Ramblings http://holdenweb.blogspot.com

Apr 16 '07 #8
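Steve's session assumes every input has a scheme. For www.yahoo.com from the original list, urlsplit leaves netloc empty, and ''.split('.', 1)[1] raises IndexError. A Python 3 sketch of a more forgiving variant follows; the name after_first_label and the fallback to the path component are assumptions for illustration, not code from the thread.

```python
from urllib.parse import urlsplit

def after_first_label(url):
    parts = urlsplit(url)
    # Without a scheme, urlsplit puts 'www.yahoo.com' in .path, not .netloc.
    host = parts.netloc or parts.path.split('/', 1)[0]
    # partition() always returns a 3-tuple, so a host with no dot yields ''
    # instead of raising IndexError the way split('.', 1)[1] would.
    _, _, rest = host.partition('.')
    return rest

for url in ['http://www.cnn.com', 'www.yahoo.com', 'http://www.ebay.co.uk']:
    print(after_first_label(url))
```

Like Steve's version, this drops whatever the first label is, so http://cnn.com would give just com.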

Michael Bentley

from urlparse import urlsplit

def get_domain(url):
    net_location = (
        urlsplit(url)[1]
        and urlsplit(url)[1].split('.')
        or urlsplit(url)[2].split('.')
    ) # tricksy way to get long line into email
    if net_location[0].lower() == 'www':
        net_location = net_location[1:]
    return '.'.join(net_location)

def main():
    testItems = ['http://www.cnn.com',
                 'www.yahoo.com',
                 'http://www.ebay.co.uk']

    for testItem in testItems:
        print get_domain(testItem)

if __name__ == '__main__':
    main()

Apr 16 '07 #9
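A Python 3 sketch of Michael's approach, using the netloc and path attributes and a plain 'or' in place of the and/or idiom (the two are equivalent here, since a non-empty netloc always splits into a non-empty list). Only a leading www label is dropped; every other label is kept.

```python
from urllib.parse import urlsplit

def get_domain(url):
    parts = urlsplit(url)
    # Scheme-less input such as 'www.yahoo.com' lands in .path, not .netloc.
    host = parts.netloc or parts.path
    labels = host.split('.')
    # Drop a leading 'www' label only; other subdomains are kept.
    if labels[0].lower() == 'www':
        labels = labels[1:]
    return '.'.join(labels)

def main():
    test_items = ['http://www.cnn.com', 'www.yahoo.com', 'http://www.ebay.co.uk']
    for item in test_items:
        print(get_domain(item))

if __name__ == '__main__':
    main()
```

With this rule a hypothetical input such as http://news.bbc.co.uk comes back unchanged as news.bbc.co.uk, since only 'www' is stripped.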
