Hi,
I have a list of URL names like this:

http://www.cnn.com
www.yahoo.com
http://www.ebay.co.uk

and I am trying to strip out the domain name using the following code:

pattern = re.compile("http:\\\\(.*)\.(.*)", re.S)
match = re.findall(pattern, line)
if (match):
    s1, s2 = match[0]
    print s2

but none of the sites matched. Can you please tell me what I am missing?
Thank you.

<Ma***********@gmail.com> wrote:
> pattern = re.compile("http:\\\\(.*)\.(.*)", re.S)
> but none of the site matched, can you please tell me what am i
> missing?
You're using reverse slashes in your RE pattern, to start with, while
the URLs contain plain slashes (or don't have any slashes, in the case
of the second one).
Anyway, forget REs, and use standard library module urlparse,
specifically its urlparse.urlsplit function.
Alex
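
As a rough illustration of the urlsplit suggestion, here is a small sketch. It is written for Python 3, where the function lives in urllib.parse (in Python 2, which this thread uses, it is urlparse.urlsplit). The helper name show_parts is made up for this example. Note that a URL without a scheme, such as www.yahoo.com, ends up entirely in the path component with an empty network location.

```python
from urllib.parse import urlsplit  # Python 2: from urlparse import urlsplit

def show_parts(url):
    """Return (scheme, netloc, path) for a URL string (hypothetical helper)."""
    parts = urlsplit(url)
    return parts.scheme, parts.netloc, parts.path

for url in ["http://www.cnn.com", "www.yahoo.com", "http://www.ebay.co.uk"]:
    print(url, "->", show_parts(url))
```

The second test URL is the case to watch: any code that reads only the network location will see an empty string for it.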

On Apr 13, 2007, at 11:49 PM, Ma***********@gmail.com wrote:
> pattern = re.compile("http:\\\\(.*)\.(.*)", re.S)
> but none of the site matched, can you please tell me what am i
> missing?
Change re.compile("http:\\\\(.*)\.(.*)", re.S) to
re.compile("http:\/\/(.*)\.(.*)", re.S)

On Apr 14, 12:02 am, Michael Bentley <mich...@jedimindworks.com> wrote:
> change re.compile("http:\\\\(.*)\.(.*)", re.S) to
> re.compile("http:\/\/(.*)\.(.*)", re.S)
Thanks. I tried this, but when 'line' is http://www.cnn.com, I get 'com' for 's2',
but I want 'cnn.com' (everything after the first '.'). How can I do
that?

pattern = re.compile("http:\/\/(.*)\.(.*)", re.S)
match = re.findall(pattern, line)
if (match):
    s1, s2 = match[0]
    print s2

On Apr 14, 10:36 am, Marko.Cain...@gmail.com wrote:
> but when the 'line' is http://www.cnn.com, I get 's2' com,
> but i want 'cnn.com' (everything after the first '.'), how can I do
> that?
Can anyone please help me with my problem? I still can't solve it.
Basically, I want to extract the text after the first '.' in a URL:
http://www.cnn.com -> cnn.com
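
The 'com' result comes from greedy matching: the first (.*) grabs as much as it can, so the \. in the pattern ends up matching the last dot rather than the first. One possible fix, sketched here in Python 3 syntax, is to make the first group non-greedy with (.*?). The names PATTERN and after_first_dot are made up for this example, and making http:// optional is an added assumption so that www.yahoo.com also matches.

```python
import re

# (?:http://)? makes the scheme optional (an assumption, so bare hosts match too).
# (.*?) is non-greedy: it stops at the FIRST dot instead of the last one.
PATTERN = re.compile(r"^(?:http://)?(.*?)\.(.*)$")

def after_first_dot(line):
    """Return everything after the first '.' in the input, or None if no match."""
    match = PATTERN.match(line.strip())
    if match:
        s1, s2 = match.groups()
        return s2
    return None

for line in ["http://www.cnn.com", "www.yahoo.com", "http://www.ebay.co.uk"]:
    print(line, "->", after_first_dot(line))
```

This only looks at the text, so a trailing slash or path stays in the result, which is one reason a URL parser may be the more robust choice.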

In <11*********************@y5g2000hsa.googlegroups.com>, Marko.Cain.23 wrote:
> Can anyone please help me with my problem? I still can't solve it.
> Basically, I want to strip out the text after the first '.' in url
> address:
> http://www.cnn.com -> cnn.com

from urlparse import urlsplit

def get_domain(url):
    net_location = urlsplit(url)[1]
    return '.'.join(net_location.rsplit('.', 2)[-2:])

def main():
    print get_domain('http://www.cnn.com')

Ciao,
Marc 'BlackJack' Rintsch

On Apr 15, 11:57 am, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
> from urlparse import urlsplit
>
> def get_domain(url):
>     net_location = urlsplit(url)[1]
>     return '.'.join(net_location.rsplit('.', 2)[-2:])
Thanks for your help.
But if the input string is "http://www.ebay.co.uk/", I only get
"co.uk"
How can I change it so that it works for both www.ebay.co.uk and www.cnn.com?

Ma***********@gmail.com wrote:
> But if the input string is "http://www.ebay.co.uk/", I only get
> "co.uk"
> how can I change it so that it works for both www.ebay.co.uk and www.cnn.com?
>>> from urlparse import urlsplit
>>> def get_domain(url):
...     net_location = urlsplit(url)[1]
...     return net_location.split(".", 1)[1]
...
>>> print get_domain('http://www.cnn.com')
cnn.com
>>> print get_domain('http://www.ebay.co.uk')
ebay.co.uk
>>>
regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
Recent Ramblings http://holdenweb.blogspot.com
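
Dropping the first label works for the two hosts shown, but it assumes every host starts with a prefix such as www. A quick Python 3 sketch of the same function (urllib.parse replaces the Python 2 urlparse module) shows what happens when that assumption does not hold. The third URL is made up for illustration.

```python
from urllib.parse import urlsplit  # Python 2: from urlparse import urlsplit

def get_domain(url):
    """Drop the first dot-separated label of the host, as in the session above."""
    net_location = urlsplit(url)[1]
    return net_location.split(".", 1)[1]

print(get_domain("http://www.cnn.com"))      # cnn.com
print(get_domain("http://www.ebay.co.uk/"))  # ebay.co.uk
print(get_domain("http://cnn.com"))          # com (there is no 'www' to drop)
```

A URL without a scheme, such as www.yahoo.com, has an empty network location, so this version raises IndexError for it.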

On Apr 15, 2007, at 4:24 PM, Ma***********@gmail.com wrote:
> But if the input string is "http://www.ebay.co.uk/", I only get
> "co.uk"
> how can I change it so that it works for both www.ebay.co.uk and www.cnn.com?

from urlparse import urlsplit

def get_domain(url):
    net_location = (
        urlsplit(url)[1]
        and urlsplit(url)[1].split('.')
        or urlsplit(url)[2].split('.')
    )  # tricksy way to get long line into email
    if net_location[0].lower() == 'www':
        net_location = net_location[1:]
    return '.'.join(net_location)

def main():
    testItems = ['http://www.cnn.com',
                 'www.yahoo.com',
                 'http://www.ebay.co.uk']
    for testItem in testItems:
        print get_domain(testItem)

if __name__ == '__main__':
    main()
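
For readers on Python 3, the same idea can be written with an explicit fallback instead of the and/or trick. This is only a sketch of the approach above, not a general solution: it still just strips a leading www, so hosts with other subdomains are returned unchanged. The split("/")[0] step is an added assumption, used to drop any path when the host was parsed as part of the path.

```python
from urllib.parse import urlsplit  # Python 2: from urlparse import urlsplit

def get_domain(url):
    """Return the host without a leading 'www' label."""
    parts = urlsplit(url)
    # Without a scheme, urlsplit puts the host in the path, so fall back to it.
    host = parts.netloc or parts.path.split("/")[0]
    labels = host.split(".")
    if labels[0].lower() == "www":
        labels = labels[1:]
    return ".".join(labels)

for item in ["http://www.cnn.com", "www.yahoo.com", "http://www.ebay.co.uk"]:
    print(get_domain(item))
```

For the three test items this prints cnn.com, yahoo.com and ebay.co.uk.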