Bytes | Software Development & Data Engineering Community

how to strip the domain name in python?

Hi,

I have a list of URLs like this, and I am trying to strip out the
domain name using the following code:

http://www.cnn.com
www.yahoo.com
http://www.ebay.co.uk

pattern = re.compile("http:\\\\(.*)\.(.*)", re.S)
match = re.findall(pattern, line)

if (match):
    s1, s2 = match[0]

    print s2

but none of the sites matched. Can you please tell me what I am
missing?

Thank you.

Apr 14 '07 #1
<Ma***********@gmail.com> wrote:
> pattern = re.compile("http:\\\\(.*)\.(.*)", re.S)
> [...]
> but none of the site matched, can you please tell me what am i
> missing?

You're using reverse slashes in your RE pattern, to start with, while
the URLs contain plain slashes (or don't have any slashes, in the case
of the second one).

Anyway, forget REs, and use standard library module urlparse,
specifically its urlparse.urlsplit function.

Alex
Apr 14 '07 #2
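
A minimal sketch of the urlsplit suggestion, assuming Python 3, where the urlparse module from this thread is available as urllib.parse. The helper name host_of is made up for illustration. One thing worth noticing: urlsplit only reports a network location when the string has a scheme (or starts with '//'), so the bare 'www.yahoo.com' from the list ends up in the path component instead.

```python
from urllib.parse import urlsplit

def host_of(url):
    """Return the network location of url, or '' if urlsplit finds none."""
    return urlsplit(url).netloc

# Show where each sample URL's host lands: netloc or path.
for url in ["http://www.cnn.com", "www.yahoo.com", "http://www.ebay.co.uk"]:
    parts = urlsplit(url)
    print(repr(parts.netloc), repr(parts.path))
```

So any approach built on the network location probably needs a fallback for URLs written without 'http://'.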

On Apr 13, 2007, at 11:49 PM, Ma***********@gmail.com wrote:
> pattern = re.compile("http:\\\\(.*)\.(.*)", re.S)
> [...]
> but none of the site matched, can you please tell me what am i
> missing?

Change re.compile("http:\\\\(.*)\.(.*)", re.S) to
re.compile("http:\/\/(.*)\.(.*)", re.S)

Apr 14 '07 #3
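
To illustrate why the original pattern never matched: in an ordinary string literal, "\\\\" is two backslash characters, which the regex engine reads as one escaped literal backslash. The pattern therefore looks for "http:" followed by a backslash, not for "http://". The sketch below assumes Python 3 and uses raw strings; the variable names are illustrative only.

```python
import re

line = "http://www.cnn.com"

# r"http:\\" is the same regex as the "http:\\\\" literal in the question:
# "http:" followed by one literal backslash.
backslash_pattern = re.compile(r"http:\\(.*)\.(.*)", re.S)

# A forward slash has no special meaning in Python regexes, so it can be
# written unescaped.
slash_pattern = re.compile(r"http://(.*)\.(.*)", re.S)

print(backslash_pattern.findall(line))  # empty list: the URL has no backslash
print(slash_pattern.findall(line))
```

The "\/" escapes in the suggested fix are harmless, but plain "/" does the same job.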
On Apr 14, 12:02 am, Michael Bentley <mich...@jedimindworks.com>
wrote:
> change re.compile("http:\\\\(.*)\.(.*)", re.S) to
> re.compile("http:\/\/(.*)\.(.*)", re.S)

Thanks. I tried this:

pattern = re.compile("http:\/\/(.*)\.(.*)", re.S)
match = re.findall(pattern, line)

if (match):
    s1, s2 = match[0]

    print s2

but when 'line' is http://www.cnn.com, I get 'com' for s2, but I
want 'cnn.com' (everything after the first '.'). How can I do that?


Apr 14 '07 #4
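
The 'com' result comes from greedy matching: the first (.*) grabs as much as it can, so the "\." in the pattern ends up matching the last dot in the line. A hedged sketch of one way around it, assuming Python 3, is to make the first group non-greedy with (.*?) so that it stops at the first dot. The names greedy and lazy are illustrative.

```python
import re

line = "http://www.cnn.com"

greedy = re.compile(r"http://(.*)\.(.*)", re.S)
lazy = re.compile(r"http://(.*?)\.(.*)", re.S)

print(greedy.findall(line))  # first group runs up to the last '.'
print(lazy.findall(line))    # first group stops at the first '.'
```

This still requires the 'http://' prefix, so it does not cover a line like 'www.yahoo.com'.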
On Apr 14, 10:36 am, Marko.Cain...@gmail.com wrote:
> but when the 'line' is http://www.cnn.com, I get 's2' com,
> but i want 'cnn.com' (everything after the first '.'), how can I do
> that?

Can anyone please help me with my problem? I still can't solve it.

Basically, I want to extract the text after the first '.' in the URL:

http://www.cnn.com -> cnn.com

Apr 15 '07 #5
In <11*********************@y5g2000hsa.googlegroups.com>, Marko.Cain.23
wrote:
> Basically, I want to strip out the text after the first '.' in url
> address:
>
> http://www.cnn.com -> cnn.com

from urlparse import urlsplit

def get_domain(url):
    net_location = urlsplit(url)[1]
    return '.'.join(net_location.rsplit('.', 2)[-2:])

def main():
    print get_domain('http://www.cnn.com')

Ciao,
Marc 'BlackJack' Rintsch
Apr 15 '07 #6
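
For readers on Python 3, a rough equivalent of the rsplit approach could look like the sketch below. It assumes urllib.parse in place of the old urlparse module, and the function name last_two_labels is invented here. Keeping the last two labels assumes the registered name is exactly two labels long, which holds for cnn.com but not for names under a two-part suffix such as co.uk.

```python
from urllib.parse import urlsplit

def last_two_labels(url):
    """Join the last two dot-separated labels of the URL's host."""
    host = urlsplit(url).netloc
    # rsplit with maxsplit=2 splits from the right, so the last two
    # items are always the final two labels.
    return ".".join(host.rsplit(".", 2)[-2:])

print(last_two_labels("http://www.cnn.com"))
print(last_two_labels("http://www.ebay.co.uk"))
```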
On Apr 15, 11:57 am, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
> def get_domain(url):
>     net_location = urlsplit(url)[1]
>     return '.'.join(net_location.rsplit('.', 2)[-2:])

Thanks for your help.

But if the input string is "http://www.ebay.co.uk/", I only get
"co.uk".

How can I change it so that it works for both www.ebay.co.uk and
www.cnn.com?

Apr 15 '07 #7
Ma***********@gmail.com wrote:
> But if the input string is "http://www.ebay.co.uk/", I only get
> "co.uk"
>
> how can I change it so that it works for both www.ebay.co.uk and
> www.cnn.com?

>>> def get_domain(url):
...     net_location = urlsplit(url)[1]
...     return net_location.split(".", 1)[1]
...
>>> print get_domain('http://www.cnn.com')
cnn.com
>>> print get_domain('http://www.ebay.co.uk')
ebay.co.uk
>>>

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
Recent Ramblings http://holdenweb.blogspot.com

Apr 16 '07 #8
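
A small Python 3 sketch of the same "drop everything up to the first dot" idea, using str.partition so that a host without any dot gives an empty string rather than an IndexError. The function name after_first_dot is hypothetical. The approach assumes every host starts with one label to throw away (such as 'www'), so a URL like http://cnn.com would lose its 'cnn' part.

```python
from urllib.parse import urlsplit

def after_first_dot(url):
    """Return the host with its first label removed ('' if there is no dot)."""
    host = urlsplit(url).netloc
    # partition splits at the first '.', returning (before, sep, after).
    _first, _dot, rest = host.partition(".")
    return rest

print(after_first_dot("http://www.cnn.com"))
print(after_first_dot("http://www.ebay.co.uk"))
print(after_first_dot("http://cnn.com"))  # shows the limitation noted above
```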

On Apr 15, 2007, at 4:24 PM, Ma***********@gmail.com wrote:
> But if the input string is "http://www.ebay.co.uk/", I only get
> "co.uk"
>
> how can I change it so that it works for both www.ebay.co.uk and
> www.cnn.com?

from urlparse import urlsplit

def get_domain(url):
    net_location = (
        urlsplit(url)[1]
        and urlsplit(url)[1].split('.')
        or urlsplit(url)[2].split('.')
    )  # tricksy way to get long line into email
    if net_location[0].lower() == 'www':
        net_location = net_location[1:]
    return '.'.join(net_location)

def main():
    testItems = ['http://www.cnn.com',
                 'www.yahoo.com',
                 'http://www.ebay.co.uk']

    for testItem in testItems:
        print get_domain(testItem)

if __name__ == '__main__':
    main()
Apr 16 '07 #9
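
The same idea can be sketched for Python 3 as follows, assuming urllib.parse and using the invented name strip_www. It falls back to the path when there is no network location (the scheme-less 'www.yahoo.com' case) and removes a leading 'www' label only when one is present.

```python
from urllib.parse import urlsplit

def strip_www(url):
    """Return the host of url without a leading 'www' label."""
    parts = urlsplit(url)
    # Without a scheme, urlsplit puts 'www.yahoo.com' in the path instead.
    host = parts.netloc or parts.path
    labels = host.split(".")
    if labels[0].lower() == "www":
        labels = labels[1:]
    return ".".join(labels)

for url in ["http://www.cnn.com", "www.yahoo.com", "http://www.ebay.co.uk"]:
    print(strip_www(url))
```

Because of the path fallback, a scheme-less input that also carries a path (for example 'www.yahoo.com/news') keeps that path in the result, so this is only a fit for lists of bare host names like the one in the question.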
