Hi,
I have a list of URL names like this, and I am trying to strip out the
domain name using the following code:
http://www.cnn.com
www.yahoo.com
http://www.ebay.co.uk
pattern = re.compile("http:\\\\(.*)\.(.*)", re.S)
match = re.findall(pattern, line)
if (match):
    s1, s2 = match[0]
    print s2
But none of the sites matched. Can you please tell me what I am
missing?
Thank you.
Ma***********@gmail.com wrote:
> but none of the site matched, can you please tell me what am i
> missing?
You're using backslashes in your RE pattern, to start with, while
the URLs contain forward slashes (or don't have any slashes, in the case
of the second one).
Anyway, forget REs, and use standard library module urlparse,
specifically its urlparse.urlsplit function.
Alex
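A minimal sketch of the urlsplit suggestion, assuming Python 3, where the urlparse module became urllib.parse (on Python 2 the import would be `from urlparse import urlsplit`). The helper name `host_of` is made up for illustration. It also shows what happens with the second URL, which has no scheme: urlsplit leaves the network location empty and puts the host in the path.

```python
from urllib.parse import urlsplit

def host_of(url):
    """Return the host part of url (hypothetical helper, not from the thread)."""
    parts = urlsplit(url)
    # Without a "scheme://" prefix, urlsplit leaves .netloc empty and puts
    # "www.yahoo.com" in .path, so fall back to the first path segment.
    return parts.netloc or parts.path.split("/", 1)[0]

for url in ["http://www.cnn.com", "www.yahoo.com", "http://www.ebay.co.uk"]:
    print(host_of(url))
```

This only isolates the host name; trimming it down to the "domain" is a separate step.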
On Apr 13, 2007, at 11:49 PM, Ma***********@gmail.com wrote:
> pattern = re.compile("http:\\\\(.*)\.(.*)", re.S)
> but none of the site matched, can you please tell me what am i
> missing?
Change re.compile("http:\\\\(.*)\.(.*)", re.S) to
re.compile("http:\/\/(.*)\.(.*)", re.S)
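To see why the original pattern never matched, here is a small sketch, assuming Python 3 and raw strings (the thread's code is Python 2). The raw string r"http:\\" holds the same characters as the original "http:\\\\" literal, and in a regex that pair of backslashes matches one literal backslash. The names `broken` and `fixed` are illustrative only.

```python
import re

line = "http://www.cnn.com"

# Original pattern: the regex \\ matches one literal backslash, so this
# looks for "http:\" and cannot match a URL written with forward slashes.
broken = re.compile(r"http:\\(.*)\.(.*)", re.S)

# Forward slashes are not special in a regex, so they need no escaping.
fixed = re.compile(r"http://(.*)\.(.*)", re.S)

print(broken.findall(line))  # []
print(fixed.findall(line))   # [('www.cnn', 'com')]
```

Note that the corrected pattern still requires the "http://" prefix, so a bare entry such as www.yahoo.com would not match it.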
On Apr 14, 12:02 am, Michael Bentley <mich...@jedimindworks.com> wrote:
> change re.compile("http:\\\\(.*)\.(.*)", re.S) to
> re.compile("http:\/\/(.*)\.(.*)", re.S)
Thanks. I tried this, but when 'line' is http://www.cnn.com, s2 is 'com'.
I want 'cnn.com' (everything after the first '.'). How can I do that?
pattern = re.compile("http:\/\/(.*)\.(.*)", re.S)
match = re.findall(pattern, line)
if (match):
    s1, s2 = match[0]
    print s2
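The 'com' result comes from greedy matching: the first (.*) takes as much as it can, so the split lands on the last dot. One possible fix, sketched here under the assumption of Python 3 and raw strings, is to make the first group non-greedy with (.*?) so the split lands on the first dot instead. The variable names are illustrative.

```python
import re

line = "http://www.cnn.com"

# Greedy: the first (.*) grabs as much as possible, so the pattern's
# literal dot ends up matching the LAST dot in the line.
greedy = re.findall(r"http://(.*)\.(.*)", line)

# Non-greedy: (.*?) grabs as little as possible, so the pattern's
# literal dot matches the FIRST dot in the line.
lazy = re.findall(r"http://(.*?)\.(.*)", line)

print(greedy)  # [('www.cnn', 'com')]
print(lazy)    # [('www', 'cnn.com')]
```

This still depends on the "http://" prefix being present and does not strip a trailing path.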
On Apr 14, 10:36 am, Marko.Cain...@gmail.com wrote:
> but when the 'line' is http://www.cnn.com, I get 's2' com,
> but i want 'cnn.com' (everything after the first '.'), how can I do
> that?
Can anyone please help me with my problem? I still can't solve it.
Basically, I want to keep only the text after the first '.' in a URL:
http://www.cnn.com -> cnn.com
In <11*********************@y5g2000hsa.googlegroups.com>, Marko.Cain.23
wrote:
> Can anyone please help me with my problem? I still can't solve it.
> Basically, I want to strip out the text after the first '.' in url
> address:
> http://www.cnn.com -> cnn.com
from urlparse import urlsplit
def get_domain(url):
    net_location = urlsplit(url)[1]
    return '.'.join(net_location.rsplit('.', 2)[-2:])
def main():
    print get_domain('http://www.cnn.com')
Ciao,
    Marc 'BlackJack' Rintsch
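For readers on Python 3, the same idea can be sketched as follows, assuming urllib.parse in place of the Python 2 urlparse module. The function name `last_two_labels` is made up to describe what the rsplit expression does: it keeps the last two dot-separated labels of the host.

```python
from urllib.parse import urlsplit

def last_two_labels(url):
    """Keep only the last two dot-separated labels of the host."""
    net_location = urlsplit(url).netloc
    # rsplit with maxsplit=2 splits from the right; [-2:] keeps the tail.
    return ".".join(net_location.rsplit(".", 2)[-2:])

print(last_two_labels("http://www.cnn.com"))      # cnn.com
print(last_two_labels("http://www.ebay.co.uk/"))  # co.uk
```

Counting labels from the right works for two-part names like cnn.com, but a fixed count cannot tell that co.uk is a two-label suffix.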
On Apr 15, 11:57 am, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
> def get_domain(url):
>     net_location = urlsplit(url)[1]
>     return '.'.join(net_location.rsplit('.', 2)[-2:])
Thanks for your help.
But if the input string is "http://www.ebay.co.uk/", I only get
"co.uk".
How can I change it so that it works for both www.ebay.co.uk and www.cnn.com?
Ma***********@gmail.com wrote:
> But if the input string is "http://www.ebay.co.uk/", I only get
> "co.uk"
> how can I change it so that it works for both www.ebay.co.uk and www.cnn.com?
>>> from urlparse import urlsplit
>>> def get_domain(url):
...     net_location = urlsplit(url)[1]
...     return net_location.split(".", 1)[1]
...
>>> print get_domain('http://www.cnn.com')
cnn.com
>>> print get_domain('http://www.ebay.co.uk')
ebay.co.uk
>>>
regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
Recent Ramblings http://holdenweb.blogspot.com
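A Python 3 sketch of the same approach, assuming urllib.parse, with one extra input that is not in the thread to show a limitation: dropping the first label unconditionally also shortens hosts that have no "www" prefix. The name `drop_first_label` is illustrative.

```python
from urllib.parse import urlsplit

def drop_first_label(url):
    """Return the host with its first dot-separated label removed."""
    net_location = urlsplit(url).netloc
    # split with maxsplit=1 cuts at the first dot only.
    return net_location.split(".", 1)[1]

print(drop_first_label("http://www.cnn.com"))     # cnn.com
print(drop_first_label("http://www.ebay.co.uk"))  # ebay.co.uk
print(drop_first_label("http://cnn.com"))         # com
```

So this version fits inputs that always carry exactly one leading label to discard.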
On Apr 15, 2007, at 4:24 PM, Ma***********@gmail.com wrote:
> But if the input string is "http://www.ebay.co.uk/", I only get
> "co.uk"
> how can I change it so that it works for both www.ebay.co.uk and www.cnn.com?
from urlparse import urlsplit
def get_domain(url):
    net_location = (
        urlsplit(url)[1]
        and urlsplit(url)[1].split('.')
        or urlsplit(url)[2].split('.')
    ) # tricksy way to get long line into email
    if net_location[0].lower() == 'www':
        net_location = net_location[1:]
    return '.'.join(net_location)
def main():
    testItems = ['http://www.cnn.com',
                 'www.yahoo.com',
                 'http://www.ebay.co.uk']
    for testItem in testItems:
        print get_domain(testItem)
if __name__ == '__main__':
    main()
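The same logic can be sketched for Python 3, assuming urllib.parse and the named attributes netloc and path instead of the index lookups [1] and [2]. It falls back to the path when there is no scheme, and drops a leading "www" label only.

```python
from urllib.parse import urlsplit

def get_domain(url):
    """Return the host without a leading 'www' label."""
    parts = urlsplit(url)
    # Bare hosts such as "www.yahoo.com" have an empty netloc and
    # end up in path, so use whichever of the two is non-empty.
    host = parts.netloc or parts.path
    labels = host.split(".")
    # Only a leading "www" is removed; other subdomains are kept.
    if labels[0].lower() == "www":
        labels = labels[1:]
    return ".".join(labels)

for item in ["http://www.cnn.com", "www.yahoo.com", "http://www.ebay.co.uk"]:
    print(get_domain(item))
```

With the three test items from the thread this prints cnn.com, yahoo.com, and ebay.co.uk.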